Summary How to do linguistics with R, Natalia Levshina (LCX046B05)
Summary How to do Linguistics with R - Natalia Levshina, Chap. 6, 7, 8, 12, 13
short overview statistics
Written for
Rijksuniversiteit Groningen (RuG)
Communicatie- En Informatiewetenschappen
Statistics 2 (LIX002X05)
p > 0.05 = not significant – fail to reject the null (we never "accept" it) – p is big
p < 0.05 = significant – reject the null – p is small
Week one: 1-Way ANOVA
Comparing means of groups: Do two groups have the same population
mean? We can use a t test and find out with its p-value
Example: is there a difference in effectiveness between two methods for
reading lessons for second-graders?
Do three or more groups have the same population mean? We can use
ANOVA
1-way ANOVA can be used for this type of question:
• Do three or more groups have the same population mean and are
the populations classified according to one factor?
o Eg: Is there a difference in the effectiveness between three
methods for reading lessons for second-graders?
o We could compare the 3 groups with multiple t-tests, but that is not ideal
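A minimal sketch of a 1-way ANOVA for the reading-lessons question, using base R's aov(). The data frame `scores`, its columns and all the numbers are made up for illustration; they are not from the course or the textbook.

```r
# Hypothetical reading scores for three teaching methods (made-up numbers)
scores <- data.frame(
  method = factor(rep(c("A", "B", "C"), each = 5)),
  score  = c(72, 75, 71, 78, 74,   # method A
             80, 83, 79, 85, 82,   # method B
             70, 68, 73, 69, 71)   # method C
)

# 1-way ANOVA: one numeric response, one categorical predictor (factor)
fit <- aov(score ~ method, data = scores)
anova_table <- summary(fit)[[1]]
print(anova_table)

# F statistic and p-value for the factor 'method' (first row of the table)
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]
```

If p_value is below 0.05, we reject the null hypothesis that all three methods have the same population mean.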
Issue with multiple t-tests: inflation of surprise
• When one performs multiple comparisons on the same data, the
probability of finding a surprising result increases: the chance of a Type
I error increases
Consider 3 groups: A, B, C. We have 3 pairwise comparisons: (A, B), (A, C), (B,
C)
• Significance level 0.05, so the probability of no Type I error in a single
test is 95% (5% chance of a false positive)
• For 3 groups you have to run 3 t-tests; treating the tests as independent:
o Probability of no Type I: 0.95*0.95*0.95 = 0.857
o Probability of Type I error is 1-0.857 = 14.3% -- much higher than
5%
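The arithmetic above can be reproduced in a few lines of R. This is only a sketch: the variable names are made up, and the calculation assumes the tests are independent, as in the notes.

```r
alpha    <- 0.05   # significance level of each single t-test
n_groups <- 3      # groups A, B, C

# Number of pairwise comparisons: (A, B), (A, C), (B, C)
n_tests <- choose(n_groups, 2)

# Probability of making no Type I error in any of the tests
p_no_error <- (1 - alpha)^n_tests

# Probability of at least one Type I error across all tests
p_any_error <- 1 - p_no_error

print(round(c(no_error = p_no_error, any_error = p_any_error), 3))
```

Changing n_groups shows how quickly the inflation grows as more groups are compared.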
Why we use ANOVA instead
___________________________________
ANOVA stands for ANalysis Of VAriance
• The name refers to variance, yet this technique is about comparing
the means (of 3 or more groups)
• Predictor variable(s) are categorical factors (in 1-way ANOVA, 1
predictor)
• ANOVA is a family of statistical tests.
• 3 types:
1. 1-way
i. Observations are independent
ii. 1 experimental condition
2. Factorial
i. Observations are independent
ii. 2 or more experimental conditions. We can measure:
1. Individual effects
2. interactions
3. Repeated measures
i. Each subject is tested more than once, or
ii. Each stimulus is presented more than once
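As a rough sketch, the three types differ mainly in the model formula passed to aov(). All data and variable names below are hypothetical, and the Error() term is one common base-R way to specify repeated measures, not necessarily the one used in the course.

```r
set.seed(1)

# Hypothetical between-group data: every subject is measured once
between <- data.frame(
  treatment = factor(rep(c("t1", "t2", "t3"), each = 4)),
  gender    = factor(rep(c("f", "m"), times = 6)),
  outcome   = rnorm(12, mean = 50, sd = 5)
)

# 1. 1-way: one factor
one_way <- aov(outcome ~ treatment, data = between)

# 2. Factorial: two factors; '*' adds both individual effects and the interaction
factorial <- aov(outcome ~ treatment * gender, data = between)

# Hypothetical within-subject data: each subject sees all three word types
within <- data.frame(
  subject   = factor(rep(1:6, each = 3)),
  word_type = factor(rep(c("w1", "w2", "w3"), times = 6)),
  rt        = rnorm(18, mean = 600, sd = 40)
)

# 3. Repeated measures: Error() marks word_type as varying within subject
repeated <- aov(rt ~ word_type + Error(subject / word_type), data = within)

summary(factorial)
summary(repeated)
```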
Variable types
• Between-group: different groups or subjects assigned to different
conditions
o Eg: patients taking 3 different treatments
• Within-subject: the same subjects tested in more than 1 condition
o Eg: subjects reacting to 3 different types of words (each
subject sees all the types)
What does this imply?
• Only between-group variables: 1-way and factorial ANOVAs
• Only within-subject: repeated measures ANOVA
• Both types: mixed ANOVA (not covered in course)
~ ANOVA is a special case of linear regression (LR). In fact R implements
ANOVA as LR
• everything you can do with ANOVA you can do with regression
~ ANOVA is falling out of use in favour of LR
• Still, you will find ANOVAs in papers so you should be able to
interpret them
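A small sketch of the ANOVA/regression equivalence: fitting the same model with aov() and with lm() gives the same F statistic. The data are made up for illustration.

```r
# Hypothetical scores for three groups (made-up numbers)
scores <- data.frame(
  method = factor(rep(c("A", "B", "C"), each = 5)),
  score  = c(72, 75, 71, 78, 74,
             80, 83, 79, 85, 82,
             70, 68, 73, 69, 71)
)

# Same model, fitted once as an ANOVA and once as a linear regression
fit_aov <- aov(score ~ method, data = scores)
fit_lm  <- lm(score ~ method, data = scores)

# F statistic from the ANOVA table of each fit
f_aov <- summary(fit_aov)[[1]][["F value"]][1]
f_lm  <- anova(fit_lm)[["F value"]][1]

# The overall F test reported by summary(lm) is the same number again
f_lm_summary <- summary(fit_lm)$fstatistic[["value"]]

print(c(aov = f_aov, lm = f_lm, lm_summary = f_lm_summary))
```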
Assumptions:
• The observations are independent from each other
• The response variable is at least interval-scaled
o It's numerical
• The residuals are normally distributed (each sample is drawn from a
normally distributed population)
o We check whether the residuals follow a normal distribution
▪ A residual is the error: how far the model is from the
data
o Normality test per group
▪ 𝐻0: each group follows a normal distribution
▪ Dependent, independent
▪ Groups 1 and 2 do not follow a normal distribution (𝑝 <
0.05)
▪ p needs to be bigger than 0.05 for a group to count as
normally distributed
• The variance is homoscedastic
o The variances in all groups are (roughly) equal
o So we want a non-significant p-value (p > 0.05) from the test of
equal variances
▪ In the example, the variance assumption is not met
o Fligner-Killeen test: for non-normal data
▪ Alternative to Levene's test when the data are not
normally distributed
▪ In the example, the data are not normally distributed and
the variance is not the same in all the groups
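A sketch of how these two assumptions could be checked in base R. The data are made up, and the choice of the Shapiro-Wilk test for normality is an assumption: the notes do not name the normality test that was used. Levene's test is not in base R (it is usually run with an add-on package), so only the Fligner-Killeen test is shown.

```r
# Hypothetical scores for three groups (made-up numbers)
scores <- data.frame(
  method = factor(rep(c("A", "B", "C"), each = 5)),
  score  = c(72, 75, 71, 78, 74,
             80, 83, 79, 85, 82,
             70, 68, 73, 69, 71)
)

fit <- aov(score ~ method, data = scores)

# Normality of the residuals (Shapiro-Wilk: an assumed choice of test)
normality <- shapiro.test(residuals(fit))

# Normality per group: one p-value per level of 'method'
per_group_p <- tapply(scores$score, scores$method,
                      function(x) shapiro.test(x)$p.value)

# Homogeneity of variance: Fligner-Killeen test
variance_check <- fligner.test(score ~ method, data = scores)

print(normality$p.value)
print(per_group_p)
print(variance_check$p.value)
```

In each case H0 is that the assumption holds, so p > 0.05 means there is no evidence that the assumption is violated.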
Alternative tests when the data are not normally distributed and/or the
variances are not the same across the groups
1. Variance not homogeneous: Welch one-way test, oneway.test()
2. Non-normality: Kruskal-Wallis test, kruskal.test()
3. Both assumptions violated (as in the current example): non-parametric
ANOVA, oneway_test()
• We still get a significant result
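A sketch of the first two alternatives with base R functions, on made-up data. oneway_test() is left out because it comes from an add-on package rather than base R.

```r
# Hypothetical scores for three groups (made-up numbers)
scores <- data.frame(
  method = factor(rep(c("A", "B", "C"), each = 5)),
  score  = c(72, 75, 71, 78, 74,
             80, 83, 79, 85, 82,
             70, 68, 73, 69, 71)
)

# Variances not homogeneous: Welch one-way test
# (var.equal = FALSE is the default; written out here for clarity)
welch <- oneway.test(score ~ method, data = scores, var.equal = FALSE)

# Non-normal data: Kruskal-Wallis rank-based test
kw <- kruskal.test(score ~ method, data = scores)

print(welch$p.value)
print(kw$p.value)
```

Both tests have the same null hypothesis in spirit as the 1-way ANOVA (no difference between the groups), so a p-value below 0.05 is again read as a significant difference.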