(ENG) This is the summary of all exam material. The summary is divided into 5 weeks where all parts of the lectures, seminars and grasple lessons are covered. It includes all models, concepts, and handy overviews for interpreting the results of statistical models.
ADVANCED RESEARCH METHODS & STATISTICS
Week 1
Frequentist approach vs Bayesian estimation
In frequentist statistics we don't use prior information; all relevant information for
inference is contained in the likelihood function. From the distribution you can read off the
likelihood of a data point. The underlying definition of probability in frequentist statistics
is simple: the probability of an event is the relative frequency with which it occurs.
In Bayesian estimation you may have prior information, and you update this information
with the data. If you choose a prior, you start with expectations and adjust them as the
data come in. Your prior can be strong or weak, depending on how certain your expectations
are. The prior influences the posterior: prior + data = posterior. When you have no (or only
weak) expectations, the data will overrule the prior. In frequentist statistics we don't use
a prior, so the inferences rest on the data alone. Different priors will lead to different
posteriors.
Bayesian statistics assumes that we know more than just the frequency of an event in
a data set. We have some prior (= existing) knowledge (or beliefs) before we look at our own
data. In a Bayesian analysis, we add this prior knowledge or belief to the analysis.
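The "prior + data = posterior" idea can be sketched with a small example. This is a minimal sketch, assuming a proportion with a Beta prior (a standard textbook case); the function name `posterior_mean`, the data, and the two priors are hypothetical and not taken from the lectures.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a proportion under a Beta(prior_a, prior_b) prior."""
    post_a = prior_a + successes             # prior "successes" + observed successes
    post_b = prior_b + (trials - successes)  # prior "failures" + observed failures
    return post_a / (post_a + post_b)

# Hypothetical data: 8 successes in 10 trials (sample proportion 0.8).
weak = posterior_mean(1, 1, 8, 10)      # flat prior, almost no expectations
strong = posterior_mean(50, 50, 8, 10)  # strong prior belief that the proportion is 0.5

print(f"weak prior:   {weak:.3f}")    # 0.750
print(f"strong prior: {strong:.3f}")  # 0.527
```

With the weak prior the posterior stays close to the data (0.8); with the strong prior it is pulled towards the prior expectation (0.5).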
The lady tasting tea
This is a famous anecdote about a lady who claimed that tea poured into milk and milk poured
into tea taste different. Fisher ran an experiment with the lady that consisted of 6 trials.
She got 5 out of 6 correct, but he wondered whether she was just guessing. He tested the null
hypothesis that the lady was guessing and found a p-value of p = .109. However, another
scientist noted that with a different sampling plan (and the same result, 5 out of 6) you
could get a p-value of p = .031.
Therefore, it is important to remember that results (and the conclusion) depend on
things not observed and on the sampling plan. Thus, the same data can give different results!
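The two p-values can be reproduced with a short calculation. This is a sketch under an assumption: the summary does not name the two sampling plans, so the code assumes (A) a fixed number of 6 trials and (B) testing until the first mistake, which happen to give the two values quoted above.

```python
from math import comb

n_trials, n_correct, p_guess = 6, 5, 0.5

# Plan A (assumed): the number of trials is fixed at 6 in advance (binomial).
# p-value = P(5 or 6 correct out of 6 | guessing)
p_binomial = sum(
    comb(n_trials, k) * p_guess**k * (1 - p_guess)**(n_trials - k)
    for k in range(n_correct, n_trials + 1)
)

# Plan B (assumed): keep testing until the first mistake (negative binomial).
# p-value = P(at least 5 correct answers before the first mistake | guessing)
p_stop_at_first_error = p_guess**n_correct

print(round(p_binomial, 3))             # 0.109
print(round(p_stop_at_first_error, 3))  # 0.031
```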
Bayesian probability vs. frequentist probability
Posterior Model Probabilities (PMPs): these are the prior probabilities (for hypotheses)
updated with the Bayes factor (BF).
The prior probabilities add up to 1 because they are relative probabilities divided over the
hypotheses of interest. Note this is also the case for unequal prior probabilities that could be
defined just as well: If we are interested in two hypotheses H1 and H2, and we think that H1
is more likely a priori, we could assign P(H1) = 0.6 and P(H2) = 0.4 (this is just an example;
other values are possible as well -- but they should add up to 1).
The posterior model probabilities (PMP) also add up to one (and they are also relative
probabilities).
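As an illustration of how prior probabilities are updated with the BF, here is a minimal sketch for two hypotheses. The priors 0.6 and 0.4 come from the example above; the Bayes factor of 3 and the function name `posterior_model_probs` are hypothetical.

```python
def posterior_model_probs(prior_h1, prior_h2, bf12):
    """Update two prior model probabilities with the Bayes factor BF12."""
    # Posterior odds = Bayes factor * prior odds
    posterior_odds = bf12 * (prior_h1 / prior_h2)
    pmp_h1 = posterior_odds / (1 + posterior_odds)
    return pmp_h1, 1 - pmp_h1

pmp_h1, pmp_h2 = posterior_model_probs(0.6, 0.4, bf12=3.0)
print(round(pmp_h1, 3), round(pmp_h2, 3))  # 0.818 0.182
```

The two PMPs again add up to 1, just like the priors.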
In the Bayesian framework you can test one hypothesis against another: you compare the
support for each hypothesis. In the frequentist framework you test the null hypothesis
without prior information. This reflects a difference in how the two approaches define
probability:
- Frequentist: probability is the relative frequency of events.
- Bayesian: probability is the degree of belief.
So, the same word is used for different things, and this leads to differences in the
correct interpretation of statistical results.
Bayesian probability of a hypothesis being true depends on two criteria:
1. How sensible it is, based on prior knowledge (the prior)
2. How well it fits the new evidence (data)
Frequentist 95% confidence interval (CI)
If we were to repeat this experiment many times and calculate a CI each time, 95% of the
intervals would include the true parameter value (and 5% would not). If you do an experiment
once, you don't know whether your CI is one of the 95% or one of the 5%. This makes the
interpretation complicated. A CI also does not answer the question of how likely it is that
the null is true given the data.
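The "repeat the experiment many times" interpretation can be checked with a small simulation. This is a sketch under simplifying assumptions: normally distributed data, a known standard deviation, and hypothetical values for the true mean, sample size and number of repetitions.

```python
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is reproducible

TRUE_MEAN, SIGMA, N, REPS = 100.0, 15.0, 30, 2000
z = NormalDist().inv_cdf(0.975)      # about 1.96
half_width = z * SIGMA / N**0.5      # sigma is treated as known

hits = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    m = mean(sample)
    # Does this experiment's 95% CI contain the true mean?
    if m - half_width <= TRUE_MEAN <= m + half_width:
        hits += 1

coverage = hits / REPS
print(f"share of intervals containing the true mean: {coverage:.3f}")
```

The share should land close to 0.95, but any single interval either contains the true mean or it does not.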
Bayesian 95% credible interval
There is a 95% probability that the true value lies in the credible interval (given the data
and the prior). This interpretation is much more straightforward and intuitive. If zero (0)
is included in the interval, a zero effect remains plausible for that coefficient, which is
comparable to a non-significant effect.
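A credible interval can be read directly from the posterior distribution. The sketch below is an illustration only: it assumes a proportion with a flat Beta(1, 1) prior and hypothetical data (8 successes in 10 trials), and it approximates the interval by drawing from the posterior rather than using the method of any particular software package.

```python
import random

random.seed(2)  # reproducible draws

# Hypothetical data: 8 successes in 10 trials, flat Beta(1, 1) prior.
# The posterior is then Beta(1 + 8, 1 + 2) = Beta(9, 3).
draws = sorted(random.betavariate(9, 3) for _ in range(20000))

lower = draws[int(0.025 * len(draws))]   # roughly the 2.5th percentile
upper = draws[int(0.975 * len(draws))]   # roughly the 97.5th percentile

print(f"95% credible interval: [{lower:.2f}, {upper:.2f}]")
```

Given this prior and these data, 95% of the posterior probability lies between the two printed bounds.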
Multiple Linear Regression (MLR)
In an MLR we add more predictor variables compared to a simple linear regression (SLR).
Regression formula
Ŷ: Estimation (the predicted value of y)
b0: Intercept
b1: Slope of x1
b2: Slope of x2
ei: Residual
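The symbols above fit together as Ŷ = b0 + b1·x1 + b2·x2, with the residual being the difference between the observed and the predicted value. A minimal sketch with hypothetical coefficients and one hypothetical observation:

```python
def predict(b0, b1, b2, x1, x2):
    """Predicted value: y_hat = b0 + b1*x1 + b2*x2."""
    return b0 + b1 * x1 + b2 * x2

# Hypothetical coefficients and one hypothetical observation.
b0, b1, b2 = 2.0, 0.5, -1.5
x1, x2, y_observed = 4.0, 1.0, 3.0

y_hat = predict(b0, b1, b2, x1, x2)   # 2.0 + 0.5*4.0 - 1.5*1.0 = 2.5
residual = y_observed - y_hat         # e_i = y_i - y_hat_i = 0.5

print(y_hat, residual)  # 2.5 0.5
```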
Regression output
R²: How much variance do our predictors explain? Adding more predictors increases R² (it
can never decrease), but adjusted R² gets a penalty for more predictors (to account for
overfitting, more on overfitting later in this summary).
R: Multiple correlation coefficient, the correlation between the observed y and ŷ.
Adj. R²: Estimate of how much variance can be explained outside of this sample. This value
is always smaller than R².
Unstandardized coefficient: B values. However, you cannot use these coefficients to compare
influence because they have different scales.
Standardized coefficient: These values are on the same scale, and therefore you can use
them to compare influence. The most important predictor is the one with the biggest number
in absolute terms.
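To make the penalty in adjusted R² concrete, here is a minimal sketch using the standard formulas R² = 1 - SS_residual / SS_total and adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1). The scores and predictions are hypothetical, and the function names are illustrative only.

```python
from statistics import mean

def r_squared(y, y_hat):
    """Proportion of variance in y explained by the predictions y_hat."""
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Penalise R^2 for the number of predictors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical observed scores and predictions from a model with k = 2 predictors.
y     = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0]
y_hat = [3.5, 4.5, 4.5, 6.0, 7.5, 7.0]

r2 = r_squared(y, y_hat)
adj = adjusted_r_squared(r2, n=len(y), k=2)
print(f"R2 = {r2:.3f}, adjusted R2 = {adj:.3f}")  # R2 = 0.943, adjusted R2 = 0.905
```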
Bayesian analysis output
Null model: The model in which all B values are zero. You can look at this as a sort of
“control group”.
BF10 = 10: Support for H1 is 10 times stronger than for H0
BF10 = 1: Support for H1 is as strong as support for H0
BF10 < 1: More support for H0 than for H1
95% CI: Range of values between the lower and upper bound of the credible interval
BFinclusion: This shows whether the model improves with this predictor
BF = 3: Some support
BF = 10: Strong support
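The reading rules in this table can be written down as a small helper. This is a sketch only: the cut-offs 3 and 10 come from the table, the label "weak" for values below 3 is an assumption, and the function name `describe_bf10` is hypothetical. A BF10 below 1 is turned around into BF01 = 1 / BF10 to express the support for H0.

```python
def describe_bf10(bf10):
    """Verbal label for BF10, using the cut-offs 3 and 10 from the table."""
    if bf10 <= 0:
        raise ValueError("a Bayes factor is always positive")
    if bf10 == 1:
        return "equal support for H1 and H0"
    favoured = "H1" if bf10 > 1 else "H0"
    strength = bf10 if bf10 > 1 else 1 / bf10   # BF01 = 1 / BF10
    if strength >= 10:
        label = "strong"
    elif strength >= 3:
        label = "some"
    else:
        label = "weak"   # assumption: the table gives no label below 3
    return f"{label} support for {favoured}"

print(describe_bf10(10))    # strong support for H1
print(describe_bf10(0.2))   # some support for H0
print(describe_bf10(1))     # equal support for H1 and H0
```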
Hierarchical MLR
Comparing two nested models.
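One common frequentist way to compare two nested models is the F test for the change in R². The summary does not state which test is used, so the sketch below is an assumption; the R² values, sample size and function name `f_change` are hypothetical.

```python
def f_change(r2_small, r2_large, n, k_small, k_large):
    """F statistic for the R^2 increase from the smaller to the larger nested model."""
    df1 = k_large - k_small          # number of predictors added
    df2 = n - k_large - 1            # residual degrees of freedom, larger model
    f = ((r2_large - r2_small) / df1) / ((1 - r2_large) / df2)
    return f, df1, df2

# Hypothetical example: model 1 has 2 predictors, model 2 adds 1 more, n = 100.
f, df1, df2 = f_change(r2_small=0.20, r2_large=0.26, n=100, k_small=2, k_large=3)
print(f"F({df1}, {df2}) = {f:.2f}")  # F(1, 96) = 7.78
```

A large F indicates that the added predictors explain additional variance beyond the smaller model.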
Exploration vs theory evaluation
With the frequentist approach the data analyst decides what goes in the model. The method
used is stepwise forward/backward selection: the best prediction model is determined based
on results in the sample. Bayesian model selection as implemented in JASP-base is also
somewhat exploratory. BAIN can evaluate informative hypotheses.
B-value & beta-value
The b-values are always used in regression equations. The beta-values are standardized, so
you can compare them with each other. A higher beta indicates a stronger relationship
between the independent variable and the dependent variable, and therefore a more important
predictor.
Note: the minus sign is disregarded, so you look at the absolute number when searching for
the highest beta value.
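The link between b and beta can be sketched with the standard conversion beta = b × SD(x) / SD(y). All data, variable names and coefficients below are hypothetical and only illustrate why b-values on different scales cannot be compared directly.

```python
from statistics import stdev

def standardized_beta(b, x, y):
    """Convert an unstandardized slope b into a standardized beta."""
    return b * stdev(x) / stdev(y)

# Hypothetical data: two predictors on very different scales.
income_eur = [20000, 30000, 40000, 50000, 60000]
age_years  = [25, 45, 30, 50, 40]
happiness  = [5.0, 6.5, 6.0, 8.0, 7.5]

# Hypothetical unstandardized coefficients from a fitted model.
b_income, b_age = 0.00006, 0.02

beta_income = standardized_beta(b_income, income_eur, happiness)
beta_age = standardized_beta(b_age, age_years, happiness)
print(f"beta income = {beta_income:.2f}, beta age = {beta_age:.2f}")
```

Here the b for age is much larger than the b for income, yet income has the larger beta, so income would be the more important predictor in this made-up example.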
Validity
Validity revision
Construct validity: The extent to which a conceptual variable is accurately measured or
manipulated.
Internal validity: The extent to which the research method can eliminate alternative
explanations for an effect/relationship.
External validity: The extent to which the research results generalize to, or represent,
people or contexts besides those in the original study.
Statistical validity: The extent to which the results of a statistical analysis are accurate
and well-founded.
Registered reports
As a scientist you should have no control over the results. However, results are what
matters most for advancing your career. This puts pressure on researchers to get “great
results”, which can lead to a lack of replication, low statistical power, and selective
reporting:
- P-hacking: leaving out some of the results (or otherwise adjusting the analysis) until the
p-value is significant.
- Publication bias: some papers will not get published by journals (typically those without
significant results).
- HARKing (Hypothesizing After the Results are Known): changing or creating the hypothesis
after seeing the results.