Summary

summary statistics


Course
Statistiek 1

Institution
Rijksuniversiteit Groningen (RuG)

Book
How to Do Linguistics with R

Statistics I, in English. Professor Yfke Ongena, year 2020. Bachelor Dutch Language and Culture, year 2.


Summarized whole book? No
Which chapters are summarized? Some parts
Uploaded on October 18, 2020
File latest updated on October 19, 2020
Number of pages 54
Written in 2020/2021
Type Summary


Book Title: How to Do Linguistics with R

Author(s): Natalia Levshina

Edition: November 2015
ISBN: 9789027212252

Summary statistics
EXAM 2/11/2020 15:00-17:00

All weekly steps should be enough to prepare for the exam

You will not have to use RStudio during the exam, but you do have to be able to recognize
functions from the practica

It will be an online exam that you take at home.

There will be both open-ended and MC questions
25 MC questions
6 open-ended questions, not very long answers like the lab sessions

The webinar exercises are most representative of the exam (as is the practice exam, of
course). All calculations you need to do will be discussed in the webinar exercises, so if you can do
these exercises you are well prepared for the calculations. The exam also includes
theoretical questions

Week 1 - introduction
With statistics you can review and analyse the results of experiments

Statistical analyses are used to understand the data, for:

- descriptive statistics: summarizing/describing the characteristics of a sample
• Describe (sample) data without drawing conclusions
• Measures of central tendency: show which values are typical, f.e. Mean, Median, Mode
• Measures of variation/dispersion (spread): show how variable the data are, f.e. range, IQR,
variance, standard deviation
f.e.: the mean speech rate of the sample of a hundred Belgian Dutch speakers
- inferential statistics: relating variables to each other and evaluating the relationships between
variables (generalising the outcome of a sample to a population).
• Using the characteristics of a sample to draw conclusions about the entire population
f.e.: the speech rate of Dutch speaking people from the Netherlands is significantly higher than
the speech rate of Dutch speaking Belgians, based on a sample of 200 people.
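As a rough illustration of the descriptive measures listed above, the sketch below computes them with Python's standard statistics module. The speech rates are made-up values for the example, not data from the course, and the course itself uses R rather than Python.

```python
import statistics

# Hypothetical speech rates (syllables per second) for a small sample.
speech_rates = [4.1, 4.4, 4.4, 4.7, 5.0, 5.2, 5.6]

# Measures of central tendency: which values are typical?
mean = statistics.mean(speech_rates)
median = statistics.median(speech_rates)
mode = statistics.mode(speech_rates)

# Measures of variation/dispersion: how variable are the data?
value_range = max(speech_rates) - min(speech_rates)
q1, q2, q3 = statistics.quantiles(speech_rates, n=4)  # quartiles
iqr = q3 - q1                                  # interquartile range
variance = statistics.variance(speech_rates)   # sample variance (divides by n - 1)
sd = statistics.stdev(speech_rates)            # sample standard deviation

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={value_range:.2f} IQR={iqr:.2f} variance={variance:.2f} sd={sd:.2f}")
```

These numbers only describe the sample; nothing is concluded about the population yet.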

statistical tests to relate a sample to a population:
• Comparing two groups to each other, or one group to a fixed value
• Associating 2 variables
• The internal consistency of questions in a questionnaire

For both kinds of statistics, the data has to be variable. This means that the cases we compare have
different values.

For example: if you want to know whether the amount of beer a country produces depends on the
amount of beer consumed in that country, it does not make sense to investigate this if the amount of
beer consumed is exactly the same in every country.

Population= a group representing all objects of interest
For example: for an investigation focussing on the speech rate of Dutch speakers in the Netherlands
versus Belgium, the population is all Dutch-speaking people in the Netherlands and Belgium.
Parameters = the values obtained from a population
f.e.: the mean speech rate of all Dutch speakers from Belgium

Sample= a subset of the population that is used to represent it, without having to investigate the
entire population
f.e.: one hundred Dutch speakers from Belgium whose speech rate is measured
NL ‘steekproef’

statistics =
1 the method to analyse data
2 the measurements that are obtained from a sample
f.e.: the mean speech rate of the sample of a hundred Belgian Dutch speakers
Important: the sample has to be representative of the population
sampling error= the difference between the sample statistic and the population parameter.
The smaller the sampling error, the more representative the sample.
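The difference between a parameter, a statistic and the sampling error can be sketched as follows. The population here is simulated with made-up numbers (a mean of 5.0 and an sd of 0.8 are assumptions for the example), and the sample is drawn at random so that every member has an equal chance of being selected.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same result on every run

# Hypothetical population: speech rates of 10,000 speakers.
population = [random.gauss(5.0, 0.8) for _ in range(10_000)]
parameter = statistics.mean(population)   # population parameter

# Random sample of 100 speakers, drawn without replacement.
sample = random.sample(population, 100)
statistic = statistics.mean(sample)       # sample statistic

# Sampling error: sample statistic minus population parameter.
sampling_error = statistic - parameter
print(f"parameter={parameter:.3f} statistic={statistic:.3f} error={sampling_error:.3f}")
```

In real research the parameter is usually unknown, which is why the sampling error can only be estimated.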

Random sampling= the best way to draw a representative sample, because everyone in the
population has an equal chance of getting selected.
Representative sampling= using a sample that represents certain characteristics of a population,
such as ethnic group or sex.
Downside: you could overlook certain variables
Convenience sampling: the least reliable way of sampling, but the most frequently used. It means
you use data that is easily accessible, but therefore also less random.

We always need 2 types of hypotheses for statistical reasoning:
1. Research hypothesis/alternative hypothesis (Ha): ‘educated guess’
there is a relationship between two measured phenomena
Directional (expecting one value to be bigger than the other, or f.e. ‘if X increases, Y decreases’)
or non-directional (just expecting a difference between the two variables; X ≠ Y)
2. Null hypothesis (H0): there is no relationship between two measured phenomena/variables
If a significant difference is found→reject H0 and accept Ha
If no significant difference is found→retain H0

Distribution= NL ‘verdeling’. How values of a variable are distributed.
You can visualize the distribution in a graph, f.e. with the x-axis representing the value, and the y-axis
the frequency/number of occurrences.

normal distribution (normale verdeling):
• Bell-shaped
• Symmetric
• Area under the curve is 100%
• About 68% of the observations lie within one standard deviation (sd) of the mean
• Distances from the mean can be expressed in sd’s
• The mean, mode and median are exactly the same in a normal distribution.
• Standard normal distr.: mean=0, sd=1.
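The properties of the standard normal distribution can be checked with a small sketch using NormalDist from Python's standard library. The helper name share_within is made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean = 0, sd = 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of observations within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1_sd = share_within(1)  # roughly 0.68
within_2_sd = share_within(2)  # roughly 0.95

# Mean, median and mode coincide in a normal distribution.
same_centre = standard_normal.mean == standard_normal.median == standard_normal.mode

print(round(within_1_sd, 3), round(within_2_sd, 3), same_centre)
```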

p-value (probability value): the probability of finding a result at least as extreme as the one
observed, in case H0 is true.
So: how big is the chance that you record this value ‘by chance’?
If the p-value is smaller than the alpha level (p<α), H0 can be rejected
The p-value says something about the chance of finding this particular result in random samples from
the population.
The p-value is not itself the chance of a type one error; that chance is capped by the alpha level you
choose beforehand
One-sided to two-sided test: p*2
Two-sided to one-sided test: p/2 (only if the effect is in the predicted direction)
So: a one-sided test is more likely to give a significant result
significance level= boundary for the chance that you reject H0, even if it is true (type I error)
represented by the α-value (default 0.05)
the smaller the significance level, the smaller the chance of a type one error and the bigger the
chance of a type two error
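The relation between one-sided and two-sided p-values can be sketched as follows. The example assumes a test statistic that follows the standard normal distribution under H0 (a z-value, which these notes do not introduce), and the value 1.8 is made up.

```python
from statistics import NormalDist

def p_values(z):
    """Return (one-sided upper-tail p, two-sided p) for a z statistic."""
    std = NormalDist()  # standard normal: mean 0, sd 1
    one_sided = 1 - std.cdf(z)               # Ha: the value is larger than under H0
    two_sided = 2 * (1 - std.cdf(abs(z)))    # Ha: the value differs in either direction
    return one_sided, two_sided

alpha = 0.05
one_sided, two_sided = p_values(1.8)

reject_one_sided = one_sided < alpha   # significant at the 0.05 level
reject_two_sided = two_sided < alpha   # not significant at the 0.05 level

print(f"one-sided p={one_sided:.3f}, two-sided p={two_sided:.3f}")
```

With the same data the one-sided test rejects H0 and the two-sided test does not, which is why the direction of the hypothesis has to be decided before looking at the results.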

type one error= ‘false positive’, rejecting a true H0
type two error= ‘false negative’, failing to reject a false H0 (so a true Ha is missed)
type two errors are often caused by low power (f.e. when n, the sample size, is very small, a.k.a. not
many individual cases were tested)

Effect size: how big the effect is, in other words: how strong the relationship/association between two
variables is
Used to quantify the difference between two variables
n increases→Effect size stays the same
n increases→p-value: lower (p-values depend on sample size and effect size)
n increases→t-value: higher (see week 4; because the standard error gets smaller, the same difference lies more standard errors away from the mean expected under H0)
The larger the sample, the sooner you will get a significant result
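A minimal sketch of why the effect size stays the same while the t-value grows with n. It assumes a one-sample comparison against a fixed reference mean and uses Cohen's d as the effect size measure; both are assumptions for the example (the notes do not name a specific measure), and all numbers are made up.

```python
import math

def cohens_d(sample_mean, reference_mean, sd):
    """Effect size: difference expressed in standard deviations (no n involved)."""
    return (sample_mean - reference_mean) / sd

def t_value(sample_mean, reference_mean, sd, n):
    """One-sample t-value: difference expressed in standard errors."""
    standard_error = sd / math.sqrt(n)  # shrinks as n grows
    return (sample_mean - reference_mean) / standard_error

# Same observed mean (5.2), reference mean (5.0) and sd (0.8); only n changes.
for n in (10, 40, 160):
    d = cohens_d(5.2, 5.0, 0.8)
    t = t_value(5.2, 5.0, 0.8, n)
    print(f"n={n:3d}  d={d:.2f}  t={t:.2f}")
```

The effect size is identical in every row, while the t-value doubles each time n is multiplied by four.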

one-tailed or two-tailed test:
One-tailed is used for a directional hypothesis.
f.e.: Ha = X<Y
you want to look at the left end/tail of the curve, because that is where X is smaller than Y
the end of the curve investigated is the shaded part
p-value of a one-sided test is half the size of the p-value of a two-sided test, provided the effect is in the predicted direction

Variable = a characteristic of a test object/the individuals that you study that does not have a fixed
value. The value of a variable can be measured.
f.e.: variables of a set of words are word length and word class/gender
-univariate
-bivariate
-multivariate

Units: persons or objects you are studying, who have certain characteristics
Variables: the characteristics, they can have different values
Values: the values the variables can have

Response= dependent variable: the value that changes as a result of some other parameter of
interest
Explanatory = independent variable: the variable that influences the outcome (and determines the
response value)
Example: you are investigating if word frequency influences the response time (time it takes people
to recognize the word). The explanatory variable is the word frequency, the response variable is the
response time.

Measurement levels:
The measurement level of a variable determines the possibilities for statistical analysis (from low to
high):
• Nominal
unordered categories
frequency table is possible
f.e. country of birth, favorite band
binary variable: variable with two levels, f.e. male/female
• Ordinal
ordered (ranked) scale, amount of difference between categories is unclear: intervals between
the scale points are not exactly the same
f.e. Likert-scale (strongly agree – agree – neither agree nor disagree – disagree – strongly
disagree / please rate on a scale from 1 to 5…) or year of birth in groups (After 1950, Between
1941 and 1950, Between 1931 and 1940, Between 1921 and 1930, In 1920 or earlier) or
education levels (MBO, HBO, WO etc.)
• Interval
Numerical with meaningful difference (so the intervals between the scale points are the
same/‘known distances’) but no true 0
f.e. temperature in degrees Celsius or year of birth in numbers
does not allow for multiplication
• Ratio
numerical with meaningful difference and true 0 (value of 0 has a clear meaning)
f.e.: the number of occurrences of a certain word in a text or travel time to work
Allows for multiplication: if in text A there are ten occurrences of a word and in text B 100, you can
say that the word occurs 10 times more often in text B
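The difference between interval and ratio level can be illustrated with a short sketch. The temperatures are made up, and the conversion to Kelvin (a scale with a true 0) is brought in only for comparison; it is not part of the notes.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius (no true 0) to Kelvin (true 0)."""
    return celsius + 273.15

# Interval level: dividing Celsius values gives a number, but not a meaningful one.
ratio_celsius = 20 / 10  # 2.0, yet 20 °C is not "twice as warm" as 10 °C
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)  # close to 1

# Ratio level: word counts have a true 0, so the ratio is meaningful.
word_count_ratio = 100 / 10  # the word occurs 10 times more often in text B

print(ratio_celsius, round(ratio_kelvin, 3), word_count_ratio)
```

Differences are meaningful on both scales (20 °C is 10 degrees warmer than 10 °C), but only the ratio-level variable supports statements such as "10 times more often".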

Quantitative/numerical = interval and ratio
Qualitative/categorical = nominal and ordinal

Parametric test= based on assumptions about the distribution of a quantitative variable (f.e. that it is normally distributed)
Non-parametric test does not assume anything about the distribution of the data.

quantitative-scaled variables can be divided into:
-continuous
The value can be divided, f.e. height in cm (something can be 1 or 2, but also 1.75 cm high)
