COMBINED SUMMARY:

MAIN CONCEPTS, LECTURE CONCEPTS, AND THEORY FROM PRACTICAL EXERCISES


Q&A: Last Tutorial Meeting
DESIGNS
Multivariate (MANOVA) => refers to the number of dependent variables; includes several DVs
Mixed ANOVA => two types of factors: within- and between-subjects
Factorial designs => variables with distinct categories; different nominal levels
 3x2 factorial design => e.g., three different time points (within) and two different categories (between)

FACTORIAL DESIGNS
Effects in a 2x2x2 mixed ANOVA:
 3x main effects
 3x two-way interactions
 1x three-way interaction
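
A quick way to verify this count is to enumerate all non-empty subsets of the three factors; a minimal Python sketch (the factor names A, B, C are illustrative placeholders, not from the course):

from itertools import combinations

factors = ["A", "B", "C"]  # the three factors of a 2x2x2 design
labels = {1: "main effects", 2: "two-way interactions", 3: "three-way interactions"}
for k in range(1, 4):
    effects = list(combinations(factors, k))
    print(f"{len(effects)}x {labels[k]}: {effects}")
# 3x main effects, 3x two-way interactions, 1x three-way interactions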

MAIN EFFECTS:
Ho: there is no difference between the groups
Ha: there is a difference between the groups


TWO-WAY INTERACTIONS
(i.e., Ho: use the words Different and Same)
Ho: the mean differences between the groups are the same (= 0)
 E.g., the mean RT difference between the control and experimental group for condition 1 is the same as the mean RT difference between the control and experimental group on condition 2
OR
 (µgroup1, condition1 − µgroup1, condition2) = (µgroup2, condition1 − µgroup2, condition2)
The difference between condition 1 and condition 2 is the same for the experimental group and the control group
- Regardless of the effect of Group (or Condition) => the effect on the outcome will be the same (i.e., the slopes are the same)
(i.e., Ha: use the words Different and Different)
Ha: the differences between the groups will be different (≠ 0)
 E.g., the difference between condition 1 and condition 2 is different for the experimental group and the control group
 E.g., the mean RT difference between the control and experimental groups in condition 1 is different than the mean RT difference between the control and experimental groups on condition 2
There is a difference in effect between the groups
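
A tiny numeric illustration of these hypotheses, using made-up cell means (all values hypothetical):

# Hypothetical mean RTs (ms) in a 2 (group) x 2 (condition) design
means = {
    ("control", "cond1"): 500, ("control", "cond2"): 540,
    ("experimental", "cond1"): 450, ("experimental", "cond2"): 490,
}
diff_control = means[("control", "cond1")] - means[("control", "cond2")]
diff_experimental = means[("experimental", "cond1")] - means[("experimental", "cond2")]
# Equal condition differences in both groups (both -40) => consistent with Ho (no interaction)
print(diff_control == diff_experimental)  # True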


THREE-WAY INTERACTIONS
Ho: the mean RT difference for the experimental group on Experimental task 1 using Condition 1 is the same as the mean RT difference for the control group on Experimental task 1 using Condition 1 (i.e., the two-way interaction is the same at each level of the third factor)
Ha: the mean difference for … is different than the mean difference for …


NO INTERACTION
There are two factors influencing an outcome – but their influence is independent of the other (i.e., they do not affect each other)
 E.g., what you (1) eat has nothing to do with your genes – and (2) your genes have nothing to do with what you eat => but they still influence your weight independently
If they both influenced each other => interaction effect
- And their influence on each other also influences the outcome (e.g., weight)
E.g., a lamp going on and off:
(1) Power => predictor 1
(2) Switch (on or off) => predictor 2
Whether the lamp is on or off => depends on their combined mechanism
 You have power and the switch is on => their effects on the lamp depend on both working together

LEVENE’S TEST VS SPHERICITY
Levene’s test => between-subjects factors only
Sphericity => within-subjects factors only


DUMMY VARIABLES AND B-COEFFICIENTS
Dummy variables => use unstandardized coefficients to interpret
 With dummies => the b-coefficient reflects the difference between a specific group and the reference group
Continuous variables => use standardized coefficients to interpret
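
A small numpy check that the dummy’s b-coefficient equals the group-mean difference (made-up scores; plain least squares rather than any particular stats package):

import numpy as np

y = np.array([10., 12., 11., 15., 17., 16.])   # hypothetical outcome
dummy = np.array([0., 0., 0., 1., 1., 1.])     # 0 = reference group, 1 = other group

X = np.column_stack([np.ones_like(dummy), dummy])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(b[1])                                         # 5.0: the b for the dummy
print(y[dummy == 1].mean() - y[dummy == 0].mean())  # 5.0: the group-mean difference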


T-VALUES AND DEGREES OF FREEDOM

t-value of 2 => enough to reject Ho at .05
 If t-value > 2 or t-value < −2, then it is significant at .05
 The negative/positive sign of the t-value => only tells us the direction of the relationship; nothing about its significance
The larger the df => the larger the sample size => the smaller the p-value (i.e., significance)
The larger the t-value => the smaller the p-value
The larger the sample size => the smaller the necessary t-value for reaching significance
The larger the df => the more impressive the F-value (or t-value) is
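
These rules of thumb can be checked with scipy’s t-distribution (the df values below are illustrative):

from scipy import stats

for df in (10, 30, 100):
    p = 2 * stats.t.sf(2.0, df)       # two-sided p-value for t = 2
    crit = stats.t.ppf(0.975, df)     # critical t for alpha = .05, two-sided
    print(df, round(p, 4), round(crit, 3))
# Larger df => smaller p for the same t, and a smaller critical t (approaching 1.96)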


MODERATION
Moderation (interaction effect) => the effect differs between the levels of the moderator
 The main effect => the difference on average over both levels


Week 1

SAMPLES
Data is collected from a small subset of the population (i.e., a sample) and used to infer something about the population as a whole
Samples are used because one is interested in populations – but cannot collect data from every human being in a given population


MEAN
The mean is a simple statistical model of the center of a distribution of scores – it is a hypothetical estimate of the typical score
Variance – or standard deviation – is used to infer how accurately the mean represents the given data
The standard deviation is a measure of how much error is associated with the mean
 The smaller the SD – the more accurately the mean represents the data

The mean: x̄ = (∑ of observed scores) / (total number of scores) = ∑xᵢ / N


STANDARD DEVIATION VS STANDARD ERROR
The standard deviation => how much the observations in a given sample differ from the mean value within that sample; how well the sample mean represents the sample
The standard error => how well the sample mean represents the population mean
 It is the SD of the sampling distribution of a statistic
For a given statistic (e.g., the mean) => the standard error represents how much variability there is in this statistic across samples from the same population
 Large values => indicate that a statistic from a given sample may not be an accurate reflection of the population from which the sample came
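
A simulation sketch of “the SE is the SD of the sampling distribution” (population parameters are made up):

import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(100, 15, size=100_000)  # hypothetical population

# SD of 5,000 sample means (n = 25 each) vs the analytic SE
sample_means = [rng.choice(population, size=25).mean() for _ in range(5_000)]
print(np.std(sample_means))             # empirical SD of the sampling distribution
print(population.std() / np.sqrt(25))   # ≈ the same value: SE = SD / √N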


GOODNESS OF FIT
The SS, variance, and SD – all reflect how well the mean fits the observed sample data
 Large values (relative to the scale of measurement) => suggest the mean is a poor fit of the observed scores
 Small values => suggest a good fit
They are all measures of dispersion – with large values indicating a spread-out distribution of scores – and small values showing a more tightly packed distribution
These measures all represent the same thing – but differ in how they express it:
 The SS => is a total and is therefore affected by the number of data points:
SS = ∑(xᵢ − x̄)²
 The variance => is the average variability – but in squared units:
s² = ∑(xᵢ − x̄)² / (N − 1) = SS / df
 The SD => is the average variation – but converted back into the original units of measurement:
s = √s² = √(SS / (N − 1))
- The size of the SD can be compared to the mean – as they are in the same units of measurement
The standard error of the mean:
SE_mean = s / √N
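
A direct check of these relationships on a toy sample (the scores are made up):

import numpy as np

x = np.array([4., 6., 7., 9., 14.])
ss = np.sum((x - x.mean()) ** 2)   # SS: a total, grows with the number of data points
var = ss / (len(x) - 1)            # variance: SS / df, in squared units
sd = np.sqrt(var)                  # SD: back in the original units
se = sd / np.sqrt(len(x))          # SE of the mean
print(ss, var, sd, se)             # 58.0 14.5 3.807... 1.702...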


CONFIDENCE INTERVAL
A 95% confidence interval => an interval constructed such that 95% of samples will contain the true population value within the CI limits
Large samples => smaller SE => less variability => more reliable
CI = X̄ ± (t(n−1) × SE)
The relationship between CIs and null hypothesis testing:
 95% CIs that just about touch end-to-end represent a p-value for testing Ho: µ1 = µ2 of approximately .01
 If there is a gap between the upper limit of one 95% CI and the lower limit of another 95% CI, then p < .01
 A p-value of .05 is represented by moderate overlap between the bars (approx. half the value of the margin of error)
As the sample gets smaller => the SE gets larger => the margin of error of the sample mean gets larger
- The CIs would widen and could potentially overlap
- When two CIs overlap by more than half the average margin of error (i.e., the distance from the mean to the upper or lower limit) – do not reject Ho


MARGIN OF ERROR
The margin of error => t(df) x SE
It is the distance from the mean to the upper or lower CI limit
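
A minimal scipy sketch computing the margin of error and a 95% CI (toy data):

import numpy as np
from scipy import stats

x = np.array([98., 104., 101., 95., 107., 99.])
se = x.std(ddof=1) / np.sqrt(len(x))
t_crit = stats.t.ppf(0.975, df=len(x) - 1)   # t(n−1) for a 95% CI
margin = t_crit * se                         # margin of error = t(df) x SE
print(x.mean() - margin, x.mean() + margin)  # lower and upper CI limits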


TEST STATISTIC
A test statistic => is a statistic for which the frequency of particular values is known
The observed value of such a statistic – is used to (1) test hypotheses, or (2) establish whether a model is a reasonable representation of what is happening in the population
A test statistic is the ratio of variance explained by the model (effect; df = k) and variance not explained by the model (error; df = N − k − 1)


TYPE I AND TYPE II ERRORS
A Type I error occurs when one concludes there is a genuine effect in the population – when there really isn’t one (Ho is true)
A Type II error occurs when one concludes that there is no effect in the population – when there really is one (Ha is true)
There is a trade-off between both errors:
 Lowering the Type I error risk (=> alpha) – lowers the probability of detecting a genuine effect => increasing the Type II error risk


                  Ho is True              Ho is False
Reject Ho         Type I Error            Correct
                  (false positive)        (true positive)
                  Probability = α         Probability = 1 − β
Accept Ho         Correct                 Type II Error
                  (true negative)         (false negative)
                  Probability = 1 − α     Probability = β


In general – Type I errors (false positives) are considered more undesirable than Type II errors (false negatives) => because the real and ethical costs of implementing a new treatment or changing policy based on false effects are higher than the costs of incorrectly accepting the current treatment or policy


EFFECT SIZE

An effect size => an objective and standardized measure of the magnitude of an observed effect
Measures include Cohen’s d, Pearson’s correlation coefficient r, and η²
An important advantage of effect sizes is that they are not directly affected by sample size
- In contrast, p-values get smaller (for a given ES) as the sample size increases
NB: Effect sizes are standardized based on the SD (e.g., Cohen’s d expresses the difference between two group means in units of SD) – whereas test statistics divide the raw effect by the SE
 Small effects can be statistically significant – as long as the sample size is large
 Statistically significant effects are not always practically relevant
 It is recommended to report p-values, CIs, and effect sizes => the three measures provide complementary information
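
A minimal Cohen’s d computation for two independent groups, using the pooled SD (the scores are made up):

import numpy as np

g1 = np.array([5., 7., 6., 8., 9.])
g2 = np.array([3., 4., 5., 4., 6.])
n1, n2 = len(g1), len(g2)
pooled_sd = np.sqrt(((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2))
d = (g1.mean() - g2.mean()) / pooled_sd  # group difference in units of SD
print(d)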


POWER
Power => the probability that a test will detect an effect of a particular size (a value of 0.8 is a good level to aim for)
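
Required group sizes for a target power can be sketched with statsmodels (the numbers are illustrative; assumes an independent-samples t-test):

from statsmodels.stats.power import TTestIndPower

# n per group needed to detect d = 0.5 at alpha = .05 with power = .80
n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n))  # ≈ 64 per group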


RESEARCHER DEGREES OF FREEDOM
Researcher dfs => the flexibility of researchers in various aspects of data collection, data analysis, and the reporting of results
The false-positive rate exceeds the fixed level of 5% in case of flexibility in:
(1) Choosing among dependent variables
(2) Choosing sample size
(3) Using covariates
(4) Reporting subsets of experimental conditions
Multiple testing => results in an inflated Type I error risk
 E.g., carrying out 5 significance tests => results in an overall false-positive risk of 1 − (.95)⁵ ≈ .23
 The overall risk becomes 23% instead of 5%
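
The inflated overall risk follows directly from this formula, for any number of independent tests:

# Overall false-positive risk for k independent tests at alpha = .05
alpha = 0.05
for k in (1, 5, 10, 20):
    print(k, round(1 - (1 - alpha) ** k, 3))
# k = 5 gives 0.226 => roughly 23% instead of 5%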


STANDARDIZED AND UNSTANDARDIZED REGRESSION COEFFICIENTS
Unstandardized regression coefficients (b-values) => refer to the unstandardized variables
Standardized regression coefficients (beta-values) => refer to the standardized variables (i.e., z-scores)
 The number of SDs by which the outcome will change as a result of a one-SD change in the predictor
 They are all measured on the same scale => they are comparable and can be used to judge the relative contribution of each predictor in explaining the DV, given the predictors that are included in the regression equation
 When a new predictor is added to the regression model => all weights may change, thus the relative contributions may change too
 Need to know the SDs of all variables in order to interpret beta-values literally
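
A numpy sketch showing that, for simple regression, beta = b × (SD of predictor / SD of outcome) (the data are simulated):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(50, 10, 200)
y = 2.0 * x + rng.normal(0, 20, 200)  # hypothetical outcome

b = np.polyfit(x, y, 1)[0]            # unstandardized slope
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
beta = np.polyfit(zx, zy, 1)[0]       # standardized slope (fit on z-scores)
print(np.isclose(beta, b * x.std(ddof=1) / y.std(ddof=1)))  # True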


R-SQUARED
R-squared can be derived from the degree to which the points (in a scatter plot depicting observed vs predicted values) lie on a straight line
R² = SSmodel / SStotal




F-STATISTIC
F = MSmodel / MSresidual
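
Both quantities can be computed from the sums of squares of a fitted model; a numpy sketch for simple regression (simulated data, k = 1 predictor):

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 100)
y = 3 * x + rng.normal(0, 2, 100)

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

k, n = 1, len(y)
ss_total = np.sum((y - y.mean()) ** 2)
ss_model = np.sum((y_hat - y.mean()) ** 2)
ss_resid = np.sum((y - y_hat) ** 2)

r_squared = ss_model / ss_total
f = (ss_model / k) / (ss_resid / (n - k - 1))  # MSmodel / MSresidual
print(r_squared, f)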


OUTLIER
An outlier => a score very different from the rest of the data; it can bias the model being fitted to the data
Check for outliers by looking at (1) residual statistics and (2) influence statistics
- Need both types of statistics, as an outlier does not necessarily show both a large residual and a deviant influence value


RESIDUAL STATISTICS
These statistics provide information about the impact each case has on the model’s ability to predict all cases => i.e., how each case influences the model as a whole
Rules of thumb for identifying outliers that may be cause for concern:
 Cook’s distance => a general measure of the influence of a point on the values of the regression coefficients; Cook’s distances > 1 may be cause for concern
 Leverage => an observation with an outlying value on a predictor variable is called a point with high leverage; points with high leverage can have a large effect on the estimates of the regression coefficients
 Mahalanobis distance => closely related to the leverage statistic, but has a different scale; it indicates the distance of cases from the means of the predictor variables; influential cases have values > 25 in large samples, values > 15 in smaller samples, and values > 11 in small samples

INFLUENCE STATISTICS
More specific measures of influence => for each case, assess how the regression coefficient is changed by including that case
 DFB0 and DFB1 => indicate the difference in the regression coefficients b0 and b1 between the model for the complete sample and the model when a particular case is deleted
- They are dependent on the unit of measurement of all variables
 DFF => indicates the difference between the current predicted value for a particular case and the predicted value for this case based on the model fitted to the rest of the cases
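
These diagnostics can be sketched with statsmodels’ OLS influence tools (simulated data with one injected outlier):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.normal(0, 1, 50)
y = 2 * x + rng.normal(0, 1, 50)
y[0] += 10  # inject a hypothetical outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
influence = fit.get_influence()
cooks_d = influence.cooks_distance[0]  # Cook's distance per case (> 1 is a concern)
leverage = influence.hat_matrix_diag   # hat values (high leverage = outlying on x)
dfbetas = influence.dfbetas            # standardized change in b0 and b1 per deleted case
print(np.argmax(cooks_d))              # case 0 stands out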


MULTICOLLINEARITY
Refers to the correlations between predictors – there are three different ways to detect it:
1. Correlations between predictors that are higher than .80 or .90
2. VIF of a predictor > 10
3. Tolerance of a predictor < .10
Multicollinearity can cause several problems:
 It affects the value of b-slopes; b can become negative for predictors with positive correlations to the outcome
 It limits the size of R-squared; adding new (but correlated) predictors does not improve the explained proportion of variance
 It makes it difficult to determine the importance of predictors
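
VIF and tolerance can be computed by regressing each predictor on the others; a minimal numpy version with two deliberately correlated predictors (simulated data):

import numpy as np

rng = np.random.default_rng(4)
x1 = rng.normal(0, 1, 200)
x2 = 0.95 * x1 + rng.normal(0, 0.3, 200)  # strongly correlated with x1

# R-squared from regressing x1 on the other predictor, then VIF = 1 / (1 − R²)
slope, intercept = np.polyfit(x2, x1, 1)
resid = x1 - (slope * x2 + intercept)
r2 = 1 - resid.var() / x1.var()
tolerance = 1 - r2
vif = 1 / tolerance
print(vif, tolerance)  # VIF > 10 / tolerance < .10 => multicollinearity flagged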



Lecture 1

STATISTICAL MODELS
In statistics, we fit models to the data => a statistical model is used to represent what is happening in the real world
Models consist of parameters and variables:
1. Variables => measured constructs (e.g., fatigue) that vary across people in the sample
2. Parameters => estimated from the data; they represent constant relations between the variables in the model
Model parameters are computed in the sample => to estimate the value in the population


THE MEAN AS SIMPLE MODEL
The mean is a model of what happens in the real world – it is the typical score
