Summary ARMS Midterm Grasple (+ answers!)

Institution
Universiteit Utrecht (UU)

These are all the Grasple lessons you have to know for the ARMS midterm. In this document you can find all the answers to every question in Grasple, highlighted information that will be important and everything that was mentioned in Grasple. Good luck with studying!

Preview: 10 of the 111 pages

Uploaded on 18 December 2022
Number of pages: 111
Written in 2022/2023
Type: Summary

arms
grasple
advanced methods and statistics
midterm


GRASPLE MIDTERM
2022-2023

Anne-Wil Biesheuvel

Contents:
Week 1:
  The Bayesian approach
  Assumptions I
  Assumptions II
  Multiple linear regression, including hierarchical MLR
  Creating dummy variables
  Multiple regression with dummy variables (interpretation)
Week 2:
  Factorial ANOVA: visually assessing main and interaction effects
  Factorial ANOVA
  Follow-up testing (frequentist only)
  About multiple testing and error rates
  Informative hypotheses (Bayes only)
  Creating a JASP file
  Publishing your data and analyses
Week 3:
  Averages and corrected averages
  ANCOVA (Frequentist)
  ANCOVA as regression
  ANCOVA (Bayesian)
  Supporting the null hypothesis
Week 4:
  Within factors and between factors
  The sphericity assumption
  Repeated measures ANOVA with one factor
  Two within factors: interpretation
  Mixed design RMA
Week 5:
  Moderation vs. mediation
  Bootstrapping
  Mediation analysis

Week 1:

The Bayesian approach:
 The Bayesian framework is based on the posterior distribution of one or more
parameters. Suppose we are interested in estimating a mean μ representing a grade (scale
0-10).
o The data set provides information about what reasonable
values for μ could be (through what is called the likelihood function).
o The prior distribution also provides information, namely the knowledge or
belief about μ before we examine our data.
 The posterior is a compromise (combination) of the prior and the likelihood. Let's
examine this visually (the formulas are beyond the scope of this course).

 The resulting mean in the sample (observed data set) turns out to be 8 (M=8.0).
 Which prior could provide a posterior mean of 4?
Middle: Small values for μ in this prior distribution are much more likely than larger
values. The compromise between data (M=8) and prior will therefore result in a value
(substantially) lower than 8.
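
The formulas are outside the scope of the course, but for the curious the compromise can be illustrated with a small sketch. It assumes a normal prior for μ and normally distributed grades with a known standard deviation; the function name and all numbers except M = 8.0 are hypothetical and not taken from Grasple.

```python
def posterior_mean(prior_mean, prior_sd, sample_mean, sigma, n):
    """Posterior mean of mu for a normal prior and normal data with known sigma."""
    prior_precision = 1.0 / prior_sd**2   # how certain the prior is
    data_precision = n / sigma**2         # how informative the data are
    weight_data = data_precision / (prior_precision + data_precision)
    # Weighted average: the more precise source pulls the posterior towards it.
    return weight_data * sample_mean + (1.0 - weight_data) * prior_mean

# Assumed values: n = 10 grades with sample mean M = 8.0 and sigma = 1.5.
informative = posterior_mean(prior_mean=3.0, prior_sd=0.5, sample_mean=8.0, sigma=1.5, n=10)
near_flat = posterior_mean(prior_mean=3.0, prior_sd=100.0, sample_mean=8.0, sigma=1.5, n=10)

print(round(informative, 2))  # pulled well below 8 by the strong prior on low values
print(round(near_flat, 2))    # practically equal to the sample mean
```

A prior that is concentrated on low values drags the posterior mean down, while a very wide (almost flat) prior leaves the estimate close to the observed mean.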

 So, we have seen in these questions that priors can affect the posterior estimates.
But the first prior distribution (on the left) also showed an example of an ignorant
(uninformative, flat) prior.
 Priors are sometimes seen as the bottleneck of the Bayesian approach because you
have to specify something, and it can affect the results.
 Others consider priors an advantage of the Bayesian approach because we do not
start our research from scratch. We often build on earlier research or on existing
knowledge. This can then be incorporated in the prior and allows science to be
cumulative.

 So:
Bayesian statistics assumes that we know more than just the frequency of an event in
a data set. We have some prior (= existing) knowledge (or beliefs) before we look at
our own data. In a Bayesian analysis, we add this prior knowledge or belief to the
analysis.

 When you are using a Bayesian approach for your own research question, you will be
confronted with one very important issue:

 What prior (previous knowledge or beliefs) do you want to add to your own data
analysis? (Spoiler: There are many views on this.)
o I would ask an expert in the field. Experts might already know more about the
thing that interests you. Ultimately, it is up to you as a researcher to decide
whether this is an option for your research - and how much weight you want
to give it!
o I would look at previous studies that are close to my own research question. It
is up to you as a researcher to decide whether you want to include subjective
knowledge or beliefs in your analysis.
o I would not want to include previous studies or beliefs in my own research. It
is up to you as a researcher to decide whether you want to include subjective
knowledge or beliefs in your analysis.

 Another important aspect of the Bayesian framework is the definition of probability.
 In classical / frequentist statistics there is one underlying simple definition: The
probability of an event is assumed to be the frequency with which it occurs.
 For example, if 150 out of 1000 people smoke, we could say that the probability that
a randomly picked person from that group of 1000 smokes is 0.15 (or 15%). This is
the understanding of probability that is applied in the frequentist tests you know.
 In Bayesian statistics, we use a different way of looking at probabilities.

 The foundation of Bayesian statistics is Bayes' theorem.

 Central to Bayes' theorem are conditional probabilities.
 E.g., P(A given B): What is the probability that A will happen or is true, given that we
know B has happened or is true?
 If we fill in that A stands for a hypothesis of interest and B for data we collected,
then P(A given B) represents the probability of our hypothesis given the data we
observed in our study. Is that not exactly what we are interested in?
 Note the difference with the definition of the p-value: "the probability of observing
these (or more extreme) data assuming that the null hypothesis is true".
 To obtain P(A|B) (A|B is another way of writing A given B) an ingredient is needed
that is not part of the frequentist approach: P(A), the prior probability of the
hypothesis.
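
To make the role of the prior P(A) concrete, here is a minimal sketch of Bayes' theorem, P(A|B) = P(B|A) × P(A) / P(B), for a hypothesis A and its complement. The function name and the input probabilities are made up for illustration only.

```python
def posterior_probability(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) via Bayes' theorem, with P(B) from the law of total probability."""
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b

# Hypothetical inputs: prior P(A) = 0.5, and the data B are four times as
# likely when A is true (0.8) as when A is not true (0.2).
p_a_given_b = posterior_probability(prior_a=0.5, p_b_given_a=0.8, p_b_given_not_a=0.2)
print(round(p_a_given_b, 3))
```

Changing `prior_a` changes the result, which is exactly why the prior is a required ingredient that the frequentist approach does not have.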

 The Bayesian use of conditional probabilities means that we approach an analysis in a
different way.
 We integrate previous knowledge and beliefs about the thing we are interested in
and then update our knowledge and beliefs based on the evidence we find in our
data.

 The different definition of probability used in the Bayesian framework also implies
that the interpretation of results is somewhat different. And according to Bayesians:
the Bayesian interpretation is more intuitive.
 Let us first look at estimation using 95% estimation intervals.

 A frequentist interval is called a confidence interval. A Bayesian interval is called a
credible interval. Read the following definition:
 "If we were to repeat this experiment many times and calculate an interval each
time, 95% of the intervals would include the true parameter value (and 5% would not)."
 Does this definition of an estimation interval belong to the confidence or the credible
interval?
Confidence: We cannot talk about the probability that the true value is in the interval
because it either is or is not. There are no probabilities connected to parameters
because they do not represent something that can frequently be repeated (see the
definition of probability in the frequentist framework).
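
The repeated-experiment reading of the confidence interval can be checked with a small simulation. This is only a sketch under simplifying assumptions: normally distributed scores, a known standard deviation (so a z-interval with 1.96 is used), and hypothetical values for the true mean, sigma and sample size.

```python
import random
import statistics

def coverage(true_mean=7.0, sigma=1.5, n=25, repetitions=2000, seed=1):
    """Fraction of 95% z-intervals (known sigma) that contain the true mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / n**0.5
    hits = 0
    for _ in range(repetitions):
        # One "experiment": draw a fresh sample and build its interval.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = statistics.fmean(sample)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / repetitions

rate = coverage()
print(rate)  # close to 0.95; the exact value depends on the seed
```

Each single interval either contains the true mean or it does not; the 95% describes the long-run proportion of intervals that do.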

 A frequentist interval is called a confidence interval. A Bayesian interval is called a
credible interval. Read the following definition:
 "There is 95% probability that the true value is in the interval."
 Does this definition of an estimation interval belong to the confidence or the credible
interval?
Credible

 As stated before: The different definition of probability used in the Bayesian
framework also implies that the interpretation of results is somewhat different. And
according to Bayesians: the Bayesian interpretation is more intuitive.
 And after discussing estimation intervals we will now turn to hypothesis testing.
 The definition of a (frequentist) p-value is perhaps not exactly what we are looking
for. It is the probability of observing the same or more extreme data given that the
null hypothesis is true. But this does not provide information on how likely it is that
the null is true given the data.
 A Bayesian probability can provide information about this: How likely is the null, or
any other hypothesis, given the data we observed?

 It is however important to know that Bayesians measure the relative support for
hypotheses. Two hypotheses are compared, or tested against one another, using the
Bayes factor (BF).
 A BF12 of 10 means that the support for H1 is 10 times stronger than the support for
H2.
 This does not imply that H1 is an excellent or perfect or true hypothesis; there may
be an H3 that receives much more support than H1.
 Thinking about useful, reasonable and informative hypotheses is thus step 1,
because only the formulated hypotheses are tested (against one another).
 Bayes factors and their interpretations will return later in the course.

 A BF is not a probability but BFs can be transformed into (relative) probabilities.

 First we have to define prior model probabilities: i.e., how likely is each hypothesis
before seeing the data.
 The most common choice is that before seeing data each hypothesis is considered
equally likely. This provides:
o for interest in 2 hypotheses H1 and H2: P(H1) = P(H2) = 0.5
o for interest in 3 hypotheses H1, H2 and H3: P(H1) = P(H2) = P(H3) = 0.333
o for interest in 10 hypotheses H1, ..., H10: P(H1) = ... = P(H10) = 0.1
 The prior probabilities add up to one because they are relative probabilities divided
over the hypotheses of interest. (Note that this also holds for unequal prior
probabilities, which could be defined just as well.)
 The posterior model probabilities (PMP) also add up to one (and they are also
relative probabilities).

 Consider a set of just 2 hypotheses, H1 and H2. The relative probability before
collecting data is chosen to be equal, that is, P(H1) = P(H2) = 0.5. The data show
that H1 receives 3 times more support than H2, that is, BF12 = 3. Given the equal prior
probabilities, the resulting PMPs for H1 and H2 will represent that same relative
support. Can you formulate the PMPs without needing a formal equation? Note that
they need to add up to 1.
 For equal prior probabilities and BF12 = 3, what is the PMP of H1?
0.75: PMP(H1) = 0.75 and PMP(H2) = 0.25, which also shows that H1 receives 3 times
stronger support (0.75 / 0.25 = 3).
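
For more than two hypotheses or unequal priors the same reasoning can be written out as a short calculation. The sketch below assumes every Bayes factor is expressed against the same reference hypothesis (here H2, so its own factor is 1); the function name is hypothetical.

```python
def posterior_model_probabilities(bayes_factors, priors=None):
    """Convert Bayes factors (each hypothesis vs. a common reference) into PMPs."""
    k = len(bayes_factors)
    if priors is None:
        priors = [1.0 / k] * k          # equal prior model probabilities
    weights = [bf * p for bf, p in zip(bayes_factors, priors)]
    total = sum(weights)
    return [w / total for w in weights]  # rescale so the PMPs sum to 1

# BF12 = 3: H1 gets 3 times the support of the reference H2.
pmp_h1, pmp_h2 = posterior_model_probabilities([3.0, 1.0])
print(pmp_h1, pmp_h2)  # 0.75 0.25
```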

Assumptions I:
 For this lesson, all output is provided. How to obtain the output is explained in a
lesson, with a title that starts with JASP.
 Consider that we are interested in predicting how satisfied young people are with
their lives, with several predictor variables.
 For this research, data was collected from 98 randomly selected young people
through questionnaires.

 Within this datafile are the following variables:
 Satisfaction: measured with the Life Satisfaction Scale (1-100)
 Age: measured in years
 Gender: (0 = male, 1 = female)
 Sports: sport participation measured in number of hours per week
 Parents: support from parents (scale of 1-10)
 Teachers: support from teachers (scale of 1-10)
 SES: socio-economic status (1 = low, 2 = medium, 3 = high)
 Within this lesson, we will examine the assumptions we should check.

 Assumptions about the measurement level of variables in MLR:
 Assumption: the dependent variable is a continuous measure (Interval or Ratio).
 Assumption: the independent variables are continuous or dichotomous.

 Is Satisfaction a continuous variable?

Yes: Satisfaction is measured on a scale of 1-100. Composite scales can be used as if
they were continuous. If the dependent variable is nominal or ordinal, it is not
possible to use linear regression. There are other regression methods that are
suitable for dependent variables with other measurement levels; however, these are
beyond the scope of this course.

 The independent variable(s) must be continuous or dichotomous (nominal with two
categories).
 Are all independent variables in this study either continuous or dichotomous? (Age,
sports and gender)
Yes: age and sports participation are of interval measurement. Gender is a
dichotomous variable in this data set, that is, there are two categories present: male
and female.

 Another assumption in MLR is linearity of relations (the L in MLR).
 Assumption: there are linear relationships between the dependent variable and each
of the continuous independent variables.
 This can be checked using scatterplots. A scatterplot has the (continuous) predictor
on the x-axis and the outcome on the y-axis and uses dots to represent the
combination of x-y scores for each case in the data.
 A linear relation means that the scatterplot of scores has an oval shape that can be
described reasonably well by a straight line (i.e., not a curved or S-shaped
relationship).
 Examples of a curved relation and an S-shaped relation:

 Examine the 4 plots below. The first is a histogram showing the distribution of the
Satisfaction scores. This can be informative (e.g., for spotting outliers in Satisfaction),
but not for the investigation of linear relations. The other 3 are scatterplots although
only the 2nd and 4th are interesting for investigating linearity between x and y
(because only Age and Sports are continuous variables; Gender is not).

 Is the relationship between age and satisfaction linear?
Yes, as data points form an oval, the relationship could be described using a straight
line. Therefore it is possible to include this variable (age) as an independent variable
within the analysis.

 Is the relationship between sports participation and satisfaction linear?
Yes, as data points form an oval, the relationship could be described using a straight line.
Therefore it is possible to include this variable (sports) as an independent variable
within the analysis.

 Non-linear relations
 When a relation between a continuous predictor (x) and the outcome (y) is not linear,
you can add additional terms to the regression model to accommodate the non-
linearity. We will only discuss one example.
 Assume the relation has one curve (see plot at bottom). Then a quadratic relation
may represent the observed relation between x and y better than the linear relation.

 This is achieved by computing a new variable, the squared version of the original X,
and running the regression with both variables, X and X², as predictors. You then get
2 parameter estimates, B1 and B2, where:
o B1 informs you about the steepness of the overall slope (the linear trend in the
curved relation). The p-value when testing B1 informs you whether the linear
trend is zero (horizontal) or not (when p<.05).
o B2 informs you about how curved the relation is, or stated differently, it
measures the change in slope with increasing X. In the plot below, for instance,
we see that the line is steeper for larger values of X. The p-value when testing B2
informs you whether the change in slope is significantly non-zero. It basically tells
you if the quadratic relation is a better model for your data than the linear
relation.
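
As an illustration of the quadratic model described above, the sketch below simulates data with a known curve and estimates B1 and B2 by ordinary least squares using NumPy. The data, the true coefficients and the variable names are all made up; the course itself uses JASP, and p-values are not computed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated example: y = 2 + 0.5*x + 0.3*x^2 + noise (hypothetical values).
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + 0.3 * x**2 + rng.normal(0.0, 1.0, size=x.size)

# Design matrix with an intercept, X and the new squared variable X^2.
design = np.column_stack([np.ones_like(x), x, x**2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
b0, b1, b2 = coef

print(f"B1 (linear term): {b1:.2f}")
print(f"B2 (quadratic term): {b2:.2f}")  # positive: slope gets steeper as X increases
```

A B2 estimate clearly above zero matches a line that becomes steeper for larger values of X; a negative B2 would mean the slope flattens or turns downward.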

 Assumption: there are no outliers.
 An outlier is a case that deviates strongly from other cases in the data set. This can be
on one variable (e.g. everybody in the data has values between 20-25 on this variable
but one person scored 35), on 2 variables (e.g., one dot in the scatterplot is far
outside the oval cloud that contains the other dots), or on a combination of even
more variables (then numerical instead of visual inspection is easier).
 For now, we focus on the x-y relation (for each x separately), so we will start by
looking at scatterplots.

 When looking at these figures, do you think that the condition is met that no outliers
are present?

No: The scatterplot for satisfaction and age shows that there is a respondent who is
much younger (8 years) than the other respondents. Since this study was
conducted amongst young people, this age is unlikely. Based on this, it is logical to
decide not to include this respondent in the rest of the analyses, as this respondent
does not belong to the target group of the study. There are no outliers in the scatter
plot for satisfaction and sports participation.

 It is not always an easy decision how to deal with outliers. Sometimes it is clear that it
must be a typo (data entry error) and then you can either correct (if information is
available to do that) or delete the value (because you know it is wrong).
 Often it is not clear why the outlier exists. If it has a large impact on the results, you can still
decide to remove it. Sometimes it is instead changed to a less extreme value. E.g., if a case
scored much higher than all others, you can change the score to, for instance, the
mean + 2 × SD. This way the case still has a large score, but not one so extreme that it will
completely dominate the results of the analysis.
 Transparency about any alterations to the data (and the motivation
for making them) is very important.
 For our example, where one case had an unexpectedly low age, we will compare the
results after removing this case with the 'before removing the outlier' results.
 Plots are provided on the next slide.
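
The option of changing an extreme score to the mean + 2 × SD can be sketched as follows. The scores are invented, and the choice to compute the mean and SD from the other cases (without the outlier) is an assumption; the notes do not specify this.

```python
import statistics

# Hypothetical scores: most lie between 20 and 25, one case scored 35.
scores = [20, 21, 22, 22, 23, 24, 25, 35]

outlier = max(scores)
others = [s for s in scores if s != outlier]

# Assumption: mean and SD are taken from the remaining cases only.
cap = statistics.mean(others) + 2 * statistics.stdev(others)

# Replace only scores above the cap; all other scores stay as they are.
adjusted = [min(s, cap) for s in scores]

print(round(cap, 2))
print(adjusted)
```

Whichever rule is used, report it together with the original value so that the alteration stays transparent.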

 Is the relationship between age and satisfaction after removing the outlier stronger
or weaker?
Before removing the outlier:

After removing the outlier:

Stronger: The line in the age-satisfaction plot tracks the data more closely than
before. This means the relationship is now a stronger linear relationship.

 The influence of a violated model assumption on the results can be severe. Therefore
it is important to visualize your data. This is also shown by Anscombe's quartet
(Anscombe, 1973), four data sets that share several statistical
properties. The variables X and Y have (nearly) the same mean and the same variance
across all data sets, and the correlation and regression line are also practically
identical.
 The figure below shows how the scatter plots for X and Y look for each data set.
Consider which of the four data sets meets the assumptions of a linear regression.
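
The shared summary statistics can be verified directly. The sketch below uses the published values of Anscombe's quartet and computes the mean, variance and Pearson correlation per data set; the helper name `pearson_r` is only illustrative.

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared X of data sets 1-3
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

summary = {}
for name, (x, y) in quartet.items():
    summary[name] = (
        statistics.fmean(x), statistics.variance(x),
        statistics.fmean(y), statistics.variance(y),
        pearson_r(x, y),
    )
    print(name, [round(v, 2) for v in summary[name]])
```

The printed rows agree to about two decimals, yet the scatterplots look completely different, which is why the numbers alone cannot tell you whether the regression assumptions hold.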

 Which of the four data sets meets the assumptions of a linear regression?
