In this summary you will find all the important material from the Grasple lessons for the course Advanced Research Methods & Statistics. All material from week 1 through week 5 is covered, including supporting illustrations.
Grasple lessons week 1:
The posterior is a compromise (or combination) of the prior and likelihood. The information
in the dataset provides information about what reasonable values for µ could be (through what
is called the likelihood function), but the prior distribution also provides information, that is,
the knowledge or belief about µ before we examine our data. Priors can therefore affect the
posterior estimates. The Bayesian approach makes it possible to build on existing knowledge
and allows science to be cumulative.
In classical/frequentist statistics the probability of an event is defined as the long-run
frequency with which it occurs. The foundation of Bayesian statistics is
Bayes' theorem. Central to Bayes' theorem are conditional probabilities, e.g. P(A given B):
What is the probability that A will happen or is true given that we know B has happened or is
true? If we fill in that A stands for a hypothesis of interest and B for data we collected, then
P(A given B) represents the probability of our hypothesis given the data we observed in our
study. To obtain P(A given B) an ingredient is needed that is not part of the frequentist
approach: P(A), the prior probability of the hypothesis. We integrate previous knowledge and
beliefs about the thing we are interested in and then update our knowledge and beliefs based
on the evidence we find in our data.
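This updating step can be made concrete with a small numeric sketch of Bayes' theorem. All probabilities below are made-up values for illustration only:

```python
# Bayes' theorem: P(H | D) = P(D | H) * P(H) / P(D)
# Hypothetical numbers, chosen only to show the mechanics.
prior = 0.5             # P(H): prior probability of the hypothesis
likelihood = 0.8        # P(D | H): probability of the data if H is true
likelihood_not_h = 0.2  # P(D | not H): probability of the data if H is false

# P(D) via the law of total probability
evidence = likelihood * prior + likelihood_not_h * (1 - prior)

posterior = likelihood * prior / evidence
print(round(posterior, 3))  # 0.8
```

Note how the prior of 0.5 is updated to a posterior of 0.8 because the data are far more probable under H than under its complement.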
It is important to know that Bayesians measure the relative support for hypotheses. Two
hypotheses are compared, or tested against one another, using the Bayes Factor (BF). A BF12
of 10 means that the support for H1 is 10 times stronger than the support for H2. A BF is not a
probability, but BFs can be transformed into (relative) probabilities. First we have to define
prior model probabilities: how likely each hypothesis is before seeing the data. The most
common choice is to consider each hypothesis equally likely before seeing any data. The
prior probabilities add up to 1, because they are relative probabilities divided over the
hypotheses of interest. The posterior model probabilities (PMPs) also add up to 1.
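Assuming equal prior model probabilities, the conversion from a Bayes Factor to posterior model probabilities is a one-line computation. A sketch using the BF12 = 10 example from the text:

```python
# Convert a Bayes Factor into posterior model probabilities (PMPs),
# assuming equal prior model probabilities for H1 and H2.
bf12 = 10.0  # support for H1 is 10x stronger than for H2

pmp1 = bf12 / (bf12 + 1)  # posterior probability of H1
pmp2 = 1 / (bf12 + 1)     # posterior probability of H2

print(round(pmp1, 3), round(pmp2, 3))  # 0.909 0.091
```

With unequal prior model probabilities, the same logic applies after weighting each hypothesis by its prior; the PMPs still sum to 1.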
Assumptions about the measurement level of variables in MLR:
The dependent variable is a continuous measure (interval or ratio).
The independent variables are continuous or dichotomous.
There are linear relationships between the dependent variable and each of the
continuous independent variables. This can be checked using scatterplots. A
scatterplot has the predictor on the x-axis and the outcome on the y-axis and uses dots
to represent the combination of x-y scores for each case in the data. A linear relation
means that the scatterplot of scores has an oval shape that can be described reasonably
well by a straight line. When a relation between a continuous predictor and the outcome
is not linear, you can add additional terms to the regression model to accommodate the
non-linearity.
There are no outliers. An outlier is a case that deviates strongly from other cases in the
data set. It is not always an easy decision how to deal with outliers. Sometimes it is
clear that it must be a typo and then you can either correct or delete the value. Often it
is not clear why the outlier exists. If it has large impact on results you can still decide
to remove it or change it to a less extreme value. Very important is transparency about
any alterations to the data and the motivation for doing so.
There are also assumptions that can be evaluated during a regression analysis. This step is
taken before the results can be interpreted. We want to check various assumptions:
Absence of outliers. With the standardized residuals we check whether there are
outliers in the Y-space. As a rule of thumb, it can be assumed that the values must be
between -3.3 and +3.3. Those smaller than -3.3, or greater than +3.3, indicate potential
outliers. With Cook’s Distance it is possible to check whether there are outliers within
the XY-space. An outlier in the XY-space is an extreme combination of X and Y
scores. Cook’s Distance indicates the overall influence of a respondent on the model.
As a rule of thumb, we maintain that values for Cook’s Distance must be lower than 1.
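Both rules of thumb can be applied directly once the diagnostics have been computed. In the sketch below the standardized residuals and Cook's Distance values are made up; in practice they come from your regression output:

```python
# Flag potential outliers using the rules of thumb from the text:
# standardized residuals outside [-3.3, +3.3] (Y-space) and
# Cook's Distance of 1 or more (XY-space). Values are hypothetical.
std_residuals = [0.5, -1.2, 3.8, 0.1, -3.5]
cooks_d = [0.02, 0.10, 1.30, 0.01, 0.90]

y_outliers = [i for i, r in enumerate(std_residuals) if abs(r) > 3.3]
xy_outliers = [i for i, d in enumerate(cooks_d) if d >= 1]

print(y_outliers)   # [2, 4]: potential outliers in the Y-space
print(xy_outliers)  # [2]: influential case in the XY-space
```

Note that case 2 is flagged by both rules: it has an extreme residual and a large overall influence on the model.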
Absence of multicollinearity. Multicollinearity indicates whether the relationship
between two or more independent variables is too strong (r above .8 / .9). If you
include overly related variables in your model, this has three consequences: (1) the
regression coefficients (B) are unreliable, (2) it limits the magnitude of R (the
correlation between Y and Ŷ) and (3) the importance of individual independent
variables can hardly be determined, if at all. Determining whether multicollinearity is
an issue can be done on the basis of the statistics Tolerance or VIF. Values for the
Tolerance smaller than .2 indicate a potential problem. Values for the Tolerance
smaller than .1 indicate a problem. For the VIF, values greater than 10 indicate a
problem. When you run into multicollinearity, you will have to figure out which
variables cause the problem and then either remove one or more variables or combine
variables in scales.
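The Tolerance and VIF cut-offs follow from a single quantity: R²j, the R-squared obtained when predictor j is regressed on all other predictors. A minimal sketch with a hypothetical R²j:

```python
# Tolerance and VIF for one predictor, given R^2_j: the R-squared from
# regressing predictor j on all other predictors (hypothetical value).
r2_j = 0.92  # predictor j is strongly related to the other predictors

tolerance = 1 - r2_j  # < .2 potential problem, < .1 problem
vif = 1 / tolerance   # > 10 indicates a problem

print(round(tolerance, 2), round(vif, 1))  # 0.08 12.5
```

Here the Tolerance falls below .1 and the VIF exceeds 10, so this predictor would signal a multicollinearity problem under both rules of thumb.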
Homoscedasticity. This condition means that the spread of the residuals must be
approximately the same across all values for the predicted y. We assess this by
plotting the residuals against the predicted values. If for every predicted value there is
approximately the same amount of vertical spread in the residuals, the condition is met.
Normally distributed residuals. If the normality of residuals assumption is violated you
need to find the cause. This can be caused by one of the predictors or the outcome not
being normally distributed. But it can also be caused by non-linear relations between x
and y that were not spotted and dealt with before.
The multiple correlation coefficient R indicates the correlation between the observed
satisfaction scores (Y) and the predicted satisfaction scores (Ŷ). It is used to say something
about how good the model is at predicting satisfaction. Normally, the squared version of R, R
squared (R2), is used to assess how much variance of the dependent variable is explained by
the model. The adjusted R2 is an estimate of the proportion of explained variance in the
population. It adjusts the value of R2 on the basis of the sample size (n) and the number of
predictors in the model (k). The F-test tests whether the model as a whole is significant. The
R-square change is the change in R-square when adding predictors to a model. It measures the
increase in explained variance when adding predictors.
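The adjustment of R² for sample size and number of predictors uses the standard formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). A sketch with hypothetical values:

```python
# Adjusted R^2: estimate of the proportion of explained variance in the
# population, correcting R^2 for sample size (n) and predictors (k).
# The values below are hypothetical.
r2 = 0.40  # R-squared in the sample
n = 100    # sample size
k = 3      # number of predictors

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))  # 0.381
```

The adjusted value is always at most the sample R²; the penalty grows as more predictors are added relative to the sample size.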
To convert a variable into a dummy variable, there are seven steps:
1. Count the number of groups that your variable has and subtract 1 from this.
2. Create as many new variables as you calculated in the first step. These are your
dummy variables. For example, if you have a variable that has 5 categories, you would
create 4 dummy variables.
3. Choose which group will become your reference group. This is the group with which
you can compare all your other groups. You make this choice based on the
comparisons which are most relevant and interesting to make.
4. Give your reference group the value 0 on all dummy variables.
5. For your first dummy variable, give the value 1 to the first group which you want to
compare with your reference group. All other groups receive the value 0 for this
dummy variable.
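The steps above can be sketched for a hypothetical three-group variable with "low" chosen as the reference group, which yields 3 − 1 = 2 dummy variables:

```python
# Dummy-code a 3-group variable; "low" is the (hypothetical) reference
# group, so we create k - 1 = 2 dummy variables.
groups = ["low", "medium", "high", "medium", "low"]

dummy_medium = [1 if g == "medium" else 0 for g in groups]
dummy_high = [1 if g == "high" else 0 for g in groups]

# The reference group "low" scores 0 on every dummy variable.
print(dummy_medium)  # [0, 1, 0, 1, 0]
print(dummy_high)    # [0, 0, 1, 0, 0]
```

In a regression model, the coefficient of each dummy then compares that group's mean with the reference group ("low").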