2.1 Predictive Regression
Explanation vs prediction
The goal of scientific psychology is to understand human behaviour. Historically, this has meant to
explain behaviour - that is, to accurately describe its causal underpinnings - and to predict
behaviour - that is, to accurately forecast behaviours that have not yet been observed.
In practice these two goals are rarely distinguished!
● It might seem that the best explanatory model is also the best predictive model
● But from a statistical point of view this is simply not true (see this lecture)
→ what makes the best explanation differs from what makes the best prediction
Regression
The regression model Y=f(X1,X2)=a+b1X1+b2X2 can be used for explanation or prediction.
● Explanation: how are the X’s related to Y?
We test the regression weights (betas) for significance to see which predictors
significantly explain variance in the outcome variable
● Prediction: if we have new X’s what will be the predicted value of Y and how accurate is the
prediction? → We try to be as accurate as possible in predicting, not too interested in
which variables are important
● In explanation you usually use the whole sample to fit the explanatory model, while in
prediction you usually split the data set, use one part to train the model, and use the
other part to check how well it predicts the values
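The train/test idea can be sketched in a few lines of Python. This is a minimal sketch, not the course's own procedure: the data are simulated, a single predictor is used so that the least-squares fit has a simple closed form, and the helper names `fit_simple_ols` and `mse` are illustrative.

```python
import random


def fit_simple_ols(xs, ys):
    """Least-squares intercept a and slope b for the model y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mse(a, b, xs, ys):
    """Mean squared difference between observed y and predicted a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(1)

# Simulated data (assumption): true model y = 2 + 0.5*x + noise with SD 1.
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 2 + 0.5 * x + rng.gauss(0, 1)))

# Split into a training part and a test part.
rng.shuffle(data)
train, test = data[:100], data[100:]

# Training phase: estimate the weights on the training part only.
a, b = fit_simple_ols(*zip(*train))

# Testing phase: evaluate the fitted model on data it has not seen.
train_mse = mse(a, b, *zip(*train))
test_mse = mse(a, b, *zip(*test))

print(f"intercept={a:.2f}, slope={b:.2f}")
print(f"in-sample MSE={train_mse:.2f}, out-of-sample MSE={test_mse:.2f}")
```

The out-of-sample MSE is the number of interest for prediction; the in-sample MSE is computed only for comparison.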
Explanatory Regression
● Explanatory regression starts with a theory about the data. The regression model is a
translation of the theory into mathematical form.
○ For example: gender and neuroticism have an effect on depression.
● Depression_i = 2 + 0.5*gender_i + 1.5*neuroticism_i
● The hypotheses generated from the theory can be examined in terms of statistical tests on
the regression weights
● In explanatory regression it is important that the regression weights are estimated
accurately, i.e. they should be unbiased.
Given the data that you have, you try to explain the outcome variable as well as possible.
● The regression model itself is the object of interest.
● Explanatory regression depends heavily on assumptions
E.g. normality, independence, etc. (for prediction they are usually not very important)
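As an illustration of testing a regression weight, the sketch below estimates a slope and its t statistic. It rests on several assumptions that are not in the notes: the data are simulated, only one predictor (neuroticism) is used so the formulas stay short, and `slope_t_statistic` is a hypothetical helper name.

```python
import math
import random


def slope_t_statistic(xs, ys):
    """Return (intercept, slope, standard error of slope, t statistic)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom (two estimated weights).
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, se_slope, slope / se_slope


rng = random.Random(0)

# Simulated data (assumption): depression = 2 + 1.5 * neuroticism + noise.
neuroticism = [rng.gauss(0, 1) for _ in range(100)]
depression = [2 + 1.5 * x + rng.gauss(0, 1) for x in neuroticism]

intercept, slope, se_slope, t_value = slope_t_statistic(neuroticism, depression)
print(f"slope={slope:.2f}, SE={se_slope:.2f}, t={t_value:.1f}")
```

A p-value would additionally require the t distribution with n - 2 degrees of freedom, which the Python standard library does not provide; a statistics package would normally be used for that step.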
Funny use of “prediction” in psychology
● In psychology we often see papers with titles like
1. Impulsivity predicts problem gambling ...
2. Trait rumination predicts onset of Post-traumatic stress disorder ...
3. Predicting reading and mathematics from neural activity …
● Often the words explanation and prediction are used interchangeably.
● In psychology (as compared to the weatherman) we try to predict certain variables as well
as possible, without particularly caring about which variables actually explain those
predictions (in contrast to explanation, where you look at what explains a certain score,
i.e. which variables have a significant beta value for the outcome variable)
Predictive Regression
● Usually we split a data set into two data sets: one is used to train the model (i.e. to
create a model by seeing which variables are good predictors) and the other to test the
model (does it predict the scores well enough?).
● Suppose we have data and obtain estimates. This is the training phase.
ŷ_i = 2 + 0.5*X1_i + 1.5*X2_i
● Further suppose we have a new observation with X1 = 2 and X2 = 3
ŷ = 2 + 0.5*2 + 1.5*3
● ŷ = 7.5
(so we focus on how close the predicted value 7.5 is to the observed value)
● Prediction focusses on the accuracy of the prediction. Therefore, we compare the predicted
value (ŷ) against the observed value (y). This is the testing phase.
● It is important that training and testing are performed on two different data sets. This provides
out-of-sample prediction accuracy
● When we fit only one explanatory regression and use it to “predict” values, the R2 value
usually overestimates how well the model predicts new data (overfitting), because the
predictions are evaluated on the same sample the model was built on. So you would need to
use an adjusted R2
● More generally, we have a population where the means of Y are given by a function of the
predictor variable(s) (X): Y = f(X) + ε
● Often we collect data for a sample of n persons. These data, $(x_1, y_1), \ldots, (x_n, y_n)$,
are used to train a model: $y_i = \hat{f}(x_i) + \varepsilon_i$
● Suppose we have new observations $(x_0, y_0)$ from the population.
● Based on the model $\hat{f}$ that we estimated on the training data, we can make predictions
$\hat{f}(x_0)$ for the newly observed data.
● We can compare the predictions against the observations using the mean squared
prediction error (PE): $\mathrm{PE}(\hat{f}(x_0)) = E[(y_0 - \hat{f}(x_0))^2]$
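As a small worked example, the mean squared prediction error can be estimated by averaging squared differences over test observations. The sketch below uses the trained model from the notes (y = 2 + 0.5*X1 + 1.5*X2); the four test observations are invented purely for illustration and `predict` is a hypothetical helper name.

```python
def predict(x1, x2, a=2.0, b1=0.5, b2=1.5):
    """Prediction from the trained model y = 2 + 0.5*X1 + 1.5*X2."""
    return a + b1 * x1 + b2 * x2


# Hypothetical test observations (x1, x2, observed y); values invented for illustration.
test_data = [(2, 3, 8.0), (1, 1, 3.5), (4, 2, 7.5), (0, 5, 9.0)]

# Squared difference between each observed y and its prediction.
squared_errors = [(y - predict(x1, x2)) ** 2 for x1, x2, y in test_data]

# Sample estimate of the mean squared prediction error.
mean_squared_pe = sum(squared_errors) / len(squared_errors)

print(predict(2, 3))     # 7.5, the worked example from the notes
print(mean_squared_pe)   # 0.25 for these invented observations
```

The average over test observations is only a sample estimate of the expectation in the PE formula; more test data gives a more stable estimate.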
Prediction error
● The prediction error decomposes into (important!)
○ bias: the difference between the estimated $\hat{f}$ (averaged over repeated samples) and the true $f$
○ variance: the variability of the estimated $\hat{f}$ across samples
(you can’t measure this from one fitted model. But if you repeatedly sample data and
fit the model each time, the outcomes will differ, and they differ more for a more
complex model. So more complex models have larger variance)
○ irreducible term: the variance of Y at a specific value of X (which you cannot reduce)
○ So the prediction error can be decomposed into those three components:
$\mathrm{PE}(\hat{f}(x_0)) = [\mathrm{Bias}(\hat{f}(x_0))]^2 + \mathrm{Var}(\hat{f}(x_0)) + \sigma^2$
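The decomposition can be checked with a small simulation that repeatedly draws training samples and refits a model each time. Everything specific here is an assumption made for illustration: the true function f(x) = x^2, the noise SD of 1, the test point x0 = 1.5, and the two models compared (a mean-only model as the simple one, a 1-nearest-neighbour rule as the flexible one).

```python
import random
import statistics

SIGMA = 1.0   # assumed SD of the irreducible noise
X0 = 1.5      # assumed test point at which predictions are evaluated


def true_f(x):
    """Assumed true function; unknown in real applications."""
    return x ** 2


def sample_training_set(rng, n=30):
    xs = [rng.uniform(0, 2) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, SIGMA) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    """Simple model: ignore x and always predict the mean of y."""
    m = statistics.fmean(ys)
    return lambda x: m


def fit_nearest(xs, ys):
    """Flexible model: predict the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def decompose(fit, rng, reps=2000):
    """Estimate squared bias, variance and prediction error at X0."""
    preds, sq_errors = [], []
    for _ in range(reps):
        f_hat = fit(*sample_training_set(rng))
        pred = f_hat(X0)
        y0 = true_f(X0) + rng.gauss(0, SIGMA)  # fresh observation at X0
        preds.append(pred)
        sq_errors.append((y0 - pred) ** 2)
    bias_sq = (statistics.fmean(preds) - true_f(X0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance, statistics.fmean(sq_errors)


rng = random.Random(42)
results = {
    "mean only": decompose(fit_mean, rng),
    "1-nearest neighbour": decompose(fit_nearest, rng),
}
for name, (bias_sq, variance, pe) in results.items():
    total = bias_sq + variance + SIGMA ** 2
    print(f"{name}: bias^2={bias_sq:.2f} var={variance:.2f} "
          f"bias^2+var+sigma^2={total:.2f} simulated PE={pe:.2f}")
```

In this setup the simple model should show the larger squared bias and the flexible model the larger variance, while for both the simulated PE should be close to the sum of the three components.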