
Summary Grasple lessons ARMS

This summary contains all the material from the Grasple lessons provided by Utrecht University for the course ARMS. It includes material needed for the skills exam (e.g. how to do analyses) as well as material for the theory exam. Important concepts are (bold) colored or instructions in italics...

  • 8 December 2022
  • 30 pages
  • 2022/2023
  • Summary
Samenvattingenvoorieder
Grasple week 1 Bayes and MLR
The Bayesian framework is based on the posterior distribution of one or more parameters. The
posterior combines two sources of information, the prior and the likelihood:
• The data set tells us which values of μ (the mean) are plausible: the likelihood function
• We also have knowledge and beliefs about μ before examining the data: the prior distribution

A prior distribution gives the probability of each possible value of μ (e.g. with a flat prior,
all values are equally likely).
(The posterior mean thus lies between the μ of the likelihood and the μ of the prior.)
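
A minimal sketch of why the posterior ends up between the prior and the likelihood. It assumes a normal prior and normally distributed data with a known standard deviation, which is a simplifying assumption that the lessons do not state. The function name and the example numbers are hypothetical.

```python
import math

def normal_posterior(prior_mean, prior_sd, data_mean, data_sd, n):
    """Posterior for a mean under a normal prior and normal data (known SD).

    The posterior mean is a precision-weighted average of the prior mean
    and the sample mean, so it always lies between the two.
    """
    prior_precision = 1.0 / prior_sd**2   # how certain the prior is
    data_precision = n / data_sd**2       # how certain the data are
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean
                 + data_precision * data_mean) / post_precision
    post_sd = math.sqrt(1.0 / post_precision)
    return post_mean, post_sd

# Hypothetical numbers: prior belief around 100, one observation of 110.
post_mean, post_sd = normal_posterior(prior_mean=100, prior_sd=10,
                                      data_mean=110, data_sd=10, n=1)
print(post_mean, post_sd)
```

With equally precise prior and data, the posterior mean lands about halfway between them. With more observations (larger n), the data get more weight and the posterior moves towards the sample mean.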

By using a prior, you don't start your research from scratch, so scientific knowledge can accumulate.
(The prior can also be seen as a bottleneck, because it can affect the results.)

As a researcher, you decide what kind of previous knowledge you want to add.

Definition of probability:

• In classical/frequentist statistics: the (long-run) frequency with which an event occurs
• In Bayesian statistics: it is based on Bayes' theorem, where conditional probabilities are
central. These probabilities are about e.g. P(A given B): 'what is the probability of A given
that B has happened or is true'. If A stands for a hypothesis of interest and B for the data we
collected, then P(A given B) represents the probability of our hypothesis given the data we
observed in our study.
(To get P(A|B), we need P(A): the prior probability of the hypothesis.)
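
As an illustration, Bayes' theorem for a hypothesis A and data B can be sketched as follows. The function name and the example probabilities are hypothetical and only serve to show how the prior P(A) enters the calculation.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) using Bayes' theorem.

    prior_a          : P(A), prior probability of the hypothesis
    p_b_given_a      : P(B|A), probability of the data if A is true
    p_b_given_not_a  : P(B|not A), probability of the data if A is false
    """
    # P(B): total probability of the data under both possibilities
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / evidence

# Hypothetical example: prior of 0.5, data four times as likely under A
print(bayes_posterior(prior_a=0.5, p_b_given_a=0.8, p_b_given_not_a=0.2))
```

Changing `prior_a` while keeping the other two arguments fixed shows how strongly the prior can affect the posterior.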

A frequentist interval is called a confidence interval. A Bayesian interval is called a credible interval.
• Confidence interval: "If we were to repeat this experiment many times and calculate an
interval each time, 95% of the intervals would include the true parameter value (and 5% would not)."
• Credible interval: "There is a 95% probability that the true value is in the interval."

P-value (frequentist): the probability of observing the same or more extreme data, given that the null
hypothesis is true.
A Bayesian probability gives information about how likely a hypothesis is, given the observed
data. The Bayes factor (BF) measures the relative support for one hypothesis compared to another.
(e.g. a BF12 of 10 means that the support for H1 is 10 times stronger than the support for H2)

A BF is not a probability, but BFs can be transformed into (relative) probabilities.
First we have to define prior model probabilities, i.e. how likely each hypothesis is before seeing the
data. You could consider all hypotheses equally likely, so P(H1) = P(H2) = 0.5 or P(H1) = P(H2) = P(H3) = 0.33 (the
probabilities add up to 1, also when they are not equal). Adding up to 1 applies to both prior
model probabilities and posterior model probabilities (PMP).
(With equal prior probabilities and BF12 = 3, the PMP of H1 is 0.75 and the PMP of H2 is 0.25.)
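
The conversion from a Bayes factor to posterior model probabilities can be sketched for two hypotheses as below. The function name is hypothetical. It uses the standard relation that posterior odds equal the Bayes factor times the prior odds.

```python
def pmp_from_bf(bf12, prior_h1=0.5):
    """Posterior model probabilities for H1 and H2 from BF12.

    prior_h1 is the prior model probability of H1; H2 gets 1 - prior_h1,
    so the prior probabilities add up to 1.
    """
    prior_h2 = 1 - prior_h1
    weight_h1 = bf12 * prior_h1   # support for H1, weighted by its prior
    weight_h2 = prior_h2          # support for H2 (reference), weighted by its prior
    total = weight_h1 + weight_h2
    return weight_h1 / total, weight_h2 / total

pmp_h1, pmp_h2 = pmp_from_bf(bf12=3)   # equal priors
print(pmp_h1, pmp_h2)                  # 0.75 0.25
```

With unequal priors the same BF gives different PMPs, e.g. `pmp_from_bf(3, prior_h1=0.25)` gives both hypotheses a PMP of 0.5.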

Assumptions about MLR (multiple linear regression):

About the measurement level of variables in MLR:
• The dependent variable is a continuous measure (interval or ratio)
• The independent variables are continuous or dichotomous (nominal with two categories)
About linearity of relations:
• There is a linear relationship between the dependent variable and each of the continuous
independent variables. (This can be checked using scatterplots, with the (continuous)
predictor on the x-axis and the outcome on the y-axis. The cloud of points has to be oval (not S-shaped or curved)
to meet this assumption; the relationship can then best be described with a straight line.)
Other:
• No outliers (an outlier is a case that deviates strongly from the other cases in the data set, on
one variable or on multiple variables)

When the relationship is not linear, you can add additional terms to the regression model to
accommodate the non-linearity. If the shape is for example a curve, a quadratic relation may fit
better than a linear one, so you can add a new variable (the squared version of the original X) and
run the regression with both variables (X and X²). You get two parameters (B1 and B2) out of
this, where:
- B1: informs you about the steepness of the overall slope (the linear trend in the curved relation). The
p-value when testing B1 informs you whether the linear trend differs significantly from zero (horizontal) or not (it does when p < .05)
- B2: informs you about how curved the relation is; it measures the change in slope with increasing X.
The p-value when testing B2 informs you whether the change in slope is significantly non-zero. It
basically tells you if the quadratic relation is a better model for your data than the linear relation.
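
Adding the squared predictor can be sketched with NumPy as below. This is only an illustration: the function name and the noise-free example data are hypothetical, and the sketch estimates the coefficients only (it does not compute the p-values for B1 and B2).

```python
import numpy as np

def fit_quadratic(x, y):
    """Least-squares fit of y = B0 + B1*X + B2*X^2; returns (B0, B1, B2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept, original X, and the new squared variable
    design = np.column_stack([np.ones_like(x), x, x**2])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    b0, b1, b2 = coefs
    return b0, b1, b2

# Hypothetical curved data without noise
x = np.arange(-5, 6)
y = 1 + 2 * x + 0.5 * x**2
b0, b1, b2 = fit_quadratic(x, y)
print(b0, b1, b2)
```

A B2 close to zero would suggest that the extra squared term adds little over the straight line.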

If you have an outlier, you can sometimes correct it (e.g. when you made a typo), delete it, or make it
less extreme (e.g. by replacing it with mean + 2*SD). But all in all, transparency is important.

It is important to visualize your data, because the influence of a violated model assumption on the
results can be severe.
Anscombe's Quartet: four data sets that have several (nearly) equal statistical properties, such as means and correlations, but look very different when plotted.

We want to check various assumptions:

• Absence of outliers
o Univariate and bivariate: check with a histogram or boxplot for one variable, and with a scatterplot for two
variables.
o Multivariate (for combinations of all variables in the model): this can be assessed whilst performing the
analysis.
o On the basis of the Casewise diagnostics, standardized residuals and Cook's
Distance, it is possible to assess whether there are outliers in the Y-space and XY-space,
respectively.
With standardized residuals we check whether there are outliers in the Y-space. As a rule of
thumb, the values must be between -3.3 and +3.3. Values smaller than
-3.3 or greater than +3.3 indicate potential outliers.
With Cook's Distance it is possible to check whether there are outliers within the XY-space.
An outlier in the XY-space is an extreme combination of X (all X-variables) and Y scores.
Cook's distance indicates the overall influence of a respondent on the model. As a rule of
thumb, values for Cook's distance must be lower than 1. Values higher than
1 indicate influential respondents (influential cases).
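
The two rules of thumb above can be sketched with NumPy as follows. This is an illustration under assumptions: the function name and the data are hypothetical, standardized residuals are taken here as residual divided by the residual standard error (software packages may use slightly different definitions), and Cook's distance uses the usual leverage-based formula.

```python
import numpy as np

def outlier_diagnostics(X, y):
    """Return (standardized residuals, Cook's distances) for an OLS model."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # single predictor -> one column
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    p = design.shape[1]                     # number of parameters incl. intercept
    hat = design @ np.linalg.pinv(design)   # hat matrix H, predicted = H @ y
    leverage = np.diag(hat)
    residuals = y - hat @ y
    mse = residuals @ residuals / (n - p)   # residual variance
    standardized = residuals / np.sqrt(mse)
    cooks = residuals**2 * leverage / (p * mse * (1 - leverage) ** 2)
    return standardized, cooks

# Hypothetical data: ten cases on a line plus one extreme X-Y combination
x = list(range(10)) + [30]
y = [2 * v for v in range(10)] + [0]
standardized, cooks = outlier_diagnostics(x, y)
print(np.where(np.abs(standardized) > 3.3)[0])  # potential outliers in Y-space
print(np.where(cooks > 1)[0])                   # influential cases in XY-space
```

In this example the last case is flagged by Cook's distance even though its standardized residual is unremarkable, which illustrates why both diagnostics are checked.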

When you have to make a choice about whether or not to remove an outlier, a number of things can
be helpful:

• Does this participant belong to the group about which you want to make inferences?
o If not, do not include the participant in the analysis.
• Is the extreme value of the participant theoretically possible?
o If not, do not include the participant in the analysis.
o If so, you could run the analysis with and without the participant and compare
results.

When changing the data, you have to do it for good reasons (not p-hacking) and be transparent
about it. Transparency can also be achieved by preregistration.



• Absence of multicollinearity
Multicollinearity means that the relationship between two or more independent
variables is too strong.
Some association between predictors is not a problem for MLR, but a very strong association (r
above .8 or .9) is.
If you include overly related variables in your model, this has three consequences:
o The regression coefficients (B) are unreliable,
o It limits the magnitude of R (the correlation between Y and Ŷ),
o The importance of individual independent variables can hardly be determined, if at
all.
Whether multicollinearity is an issue can be determined on the basis of the statistics
Tolerance or VIF (Variance Inflation Factor). You can use the following rules of thumb:
o Values for the Tolerance smaller than .2 indicate a potential problem.
o Values for the Tolerance smaller than .1 indicate a problem.
o The variance inflation factor (VIF) is equal to 1/Tolerance. So for the VIF, values
greater than 5 indicate a potential problem and values greater than 10 indicate a problem.
When you run into multicollinearity, you have to remove the variables that cause the
problem (those that are highly correlated) or combine variables into a scale (e.g. using factor
analysis).
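
Tolerance and VIF can be computed by hand as sketched below. The sketch assumes the standard definition that the Tolerance of a predictor is 1 - R² from regressing that predictor on all other predictors (the lessons only give VIF = 1/Tolerance). The function name and the example data are hypothetical.

```python
import numpy as np

def tolerance_and_vif(X):
    """Return (tolerance, vif) arrays, one value per predictor column of X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    tolerance = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        # Regress predictor j on the remaining predictors
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_res = resid @ resid
        ss_tot = np.sum((target - target.mean()) ** 2)
        tolerance[j] = ss_res / ss_tot      # equals 1 - R^2
    return tolerance, 1.0 / tolerance

# Hypothetical predictors that are almost copies of each other
X = np.column_stack([[1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.1]])
tolerance, vif = tolerance_and_vif(X)
print(tolerance, vif)   # Tolerance far below .1, VIF far above 10
```

For unrelated predictors the Tolerance is close to 1 and the VIF close to 1, so the rules of thumb are not triggered.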

• Homoscedasticity
Homoscedasticity means that the spread of the residuals must be approximately the same across all
values of the predicted Y. We check this by plotting the (standardized) residuals against
the (standardized) predicted values. If, for every predicted value (x-axis), there is approximately the same amount of
spread on the y-axis, you have homoscedasticity; otherwise you have heteroscedasticity.
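
Next to the plot, a rough numerical impression can be obtained by comparing the spread of the residuals for low and high predicted values. This is only a heuristic sketch, not a formal test and not a method from the lessons. The function name, the split into two halves and the example data are all assumptions made for illustration.

```python
import numpy as np

def residual_spread_ratio(x, y):
    """SD of residuals for high predicted values divided by SD for low ones.

    A ratio near 1 is in line with homoscedasticity; a ratio far from 1
    suggests that the spread changes with the predicted value.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    predicted = design @ coef
    residuals = y - predicted
    order = np.argsort(predicted)           # sort cases by predicted value
    half = len(order) // 2
    low = residuals[order[:half]]
    high = residuals[order[half:]]
    return np.std(high) / np.std(low)

# Hypothetical data: the error grows with x (heteroscedastic)
x = np.arange(1, 21)
sign = np.where(x % 2 == 1, 1.0, -1.0)
y = 2 * x + 0.2 * x * sign
print(residual_spread_ratio(x, y))
```

The residual plot remains the main check; the ratio only summarizes what a fan-shaped plot would show.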
