This model uses an unstandardised measure of the relationship (b1), and consequently we include
a parameter b0 that tells us the value of the outcome when the predictor is zero.
Any straight line can be defined by two things:
the slope of the line (usually denoted by b1)
the point at which the line crosses the vertical axis of the graph (the intercept of the
line, b0)
These parameters are regression coefficients.
The linear model with several predictors
The linear model expands to include as many predictor variables as you like.
An additional predictor can be placed in the model given a b to estimate its relationship to the
outcome:
Yi = (b0 + b1X1i + b2X2i + … + bnXni) + Ɛi
bn is the coefficient of the nth predictor (Xni)
Regression analysis is a term for fitting a linear model to data and using it to predict values of an
outcome variable from one or more predictor variables.
Simple regression: with one predictor variable
Multiple regression: with several predictors
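The parameters b0, b1, …, bn can be estimated numerically. As a minimal sketch (using made-up data and NumPy's least-squares routine, not any specific textbook dataset), a simple regression reduces to solving for an intercept and a slope:

```python
import numpy as np

# Hypothetical data for illustration: one predictor (simple regression).
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=50)  # true b0 = 2.0, b1 = 1.5

# Design matrix with a column of ones so the intercept b0 is estimated too.
design = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(design, y, rcond=None)

print(b)  # b[0] estimates the intercept (b0), b[1] the slope (b1)
```

Adding further predictors just means adding further columns to the design matrix, one per predictor, each getting its own b.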
Estimating the model
No matter how many predictors there are, the model can be described entirely by a constant (b0)
and by parameters associated with each predictor (bs).
To estimate these parameters we use the method of least squares.
We could assess the fit of a model by looking at the deviations between the model and the data
collected.
Residuals: the differences between what the model predicts and the observed values.
To calculate the total error in a model we square the differences between the observed values of
the outcome, and the predicted values that come from the model:
total error = Σ (observedi − modeli)², summed over i = 1 to n
Because we call these errors residuals, this is called the residual sum of squares (SSR).
It is a gauge of how well a linear model fits the data.
if the SSR is large, the model is not representative of the data
if the SSR is small, the model is representative of the data
The smallest SSR gives us the best model.
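The claim that least squares yields the smallest SSR can be checked directly: nudging either parameter away from the least-squares solution should only increase the total squared error. A small sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 40)
y = 3.0 + 0.8 * x + rng.normal(scale=1.0, size=40)

design = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(design, y, rcond=None)[0]

def ssr(b0, b1):
    """Residual sum of squares: sum of squared (observed - model)."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

best = ssr(b[0], b[1])
# Any other choice of parameters gives an SSR at least as large.
print(best <= ssr(b[0] + 0.5, b[1]), best <= ssr(b[0], b[1] - 0.2))
```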
Assessing the goodness of fit: sums of squares, R and R2
Goodness of fit: how well the model fits the observed data
Total sum of squares (SST): how good the mean is as a model of the observed outcome scores.
We can use the values of SST and SSR to calculate how much better the linear model is than the
baseline model of ‘no relationship’.
The improvement in prediction resulting from using the linear model rather than the mean is
calculated as the difference between SST and SSR.
This improvement is the model sum of squares SSM
if SSM is large, the linear model is very different from using the mean to predict the
outcome variable. It is a big improvement.
R2 = SSM/ SST
R2 is the improvement due to the model
To express this value as a percentage, multiply it by 100.
R2 represents the amount of variance in the outcome explained by the model relative to
how much variation there was to explain in the first place.
we can take the square root of this value to obtain Pearson’s correlation coefficient for the
relationship between values of the outcome predicted by the model and the observed
values of the outcome.
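These relationships (SSM = SST − SSR, R2 = SSM/SST, and the link between R2 and the correlation of predicted and observed values) can be sketched numerically, again with made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=60)
y = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=60)

design = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(design, y, rcond=None)[0]
y_hat = design @ b

sst = np.sum((y - y.mean()) ** 2)    # error of the baseline (mean) model
ss_resid = np.sum((y - y_hat) ** 2)  # error of the linear model (SSR)
ssm = sst - ss_resid                 # improvement: model sum of squares

r2 = ssm / sst
r = np.corrcoef(y, y_hat)[0, 1]      # correlation of predicted vs. observed
print(round(r2, 4), round(r ** 2, 4))  # these two agree: sqrt(R2) = r
```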
Another use of the sums of squares is in assessing the F-test.
F is based upon the ratio of the improvement due to the model and the error in the model.
Mean squares (MS): the sum of squares divided by the associated degrees of freedom.
MSM = SSM/k
MSR = SSR/ (N – k – 1)
F = MSM/MSR
F has an associated probability distribution from which a p-value can be derived to tell us the
probability of getting an F at least as big as the one we have if the null hypothesis were true.
The F-statistic can also be used to test the significance of R2:
F = ((N – k – 1)R2) / (k(1 – R2))
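The two routes to F (via the mean squares and via R2) are algebraically the same, which a short sketch can confirm; the data and coefficients here are invented for illustration, and the p-value comes from SciPy's F distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k = 80, 2  # N cases, k predictors
X = rng.normal(size=(n, k))
y = 0.5 + X @ np.array([1.0, -0.7]) + rng.normal(scale=1.0, size=n)

design = np.column_stack([np.ones(n), X])
b = np.linalg.lstsq(design, y, rcond=None)[0]
y_hat = design @ b

sst = np.sum((y - y.mean()) ** 2)
ss_resid = np.sum((y - y_hat) ** 2)
ssm = sst - ss_resid

ms_m = ssm / k                 # MSM = SSM / k
ms_r = ss_resid / (n - k - 1)  # MSR = SSR / (N - k - 1)
f_from_ms = ms_m / ms_r

r2 = ssm / sst
f_from_r2 = ((n - k - 1) * r2) / (k * (1 - r2))  # same F via R2

p = stats.f.sf(f_from_ms, k, n - k - 1)  # p-value from the F distribution
print(np.isclose(f_from_ms, f_from_r2), p < 0.05)
```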
Assessing individual predictors
Any predictor in a linear model has a coefficient (bi). The value of b represents the change in the
outcome resulting from a unit change in a predictor.
The t-statistic is based on the ratio of explained variance to unexplained variance (error):
t = (bobserved – bexpected)/ SEb
The statistic t has a probability distribution that differs according to the degrees of freedom for
the test.
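For simple regression the pieces of this t-test can be computed by hand and cross-checked against SciPy's `linregress`; the data is made up, and b_expected is taken as 0 (the null of no relationship):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=30)
y = 2.0 + 1.2 * x + rng.normal(scale=1.0, size=30)

n = len(x)
design = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(design, y, rcond=None)[0]
resid = y - design @ b

# Standard error of the slope in simple regression:
# SE_b = sqrt(MSR / sum((x - mean(x))^2)), with MSR = SSR / (n - 2).
ms_r = np.sum(resid ** 2) / (n - 2)
se_b = np.sqrt(ms_r / np.sum((x - x.mean()) ** 2))

# t = (b_observed - b_expected) / SE_b, with b_expected = 0 under the null.
t = (b[1] - 0) / se_b
p = 2 * stats.t.sf(abs(t), df=n - 2)

# Cross-check against scipy's own simple regression.
check = stats.linregress(x, y)
print(np.isclose(t, check.slope / check.stderr), p < 0.05)
```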
Bias in linear models?
Outliers
An outlier: a case that differs substantially from the main trend in the data.
Outliers can affect the estimates of the regression coefficients.
Standardized residuals: the residuals converted to z-scores and so are expressed in standard
deviation units.
Regardless of the variables of the model, standardized residuals are distributed around a mean
of 0 with a standard deviation of 1.
Standardized residuals with an absolute value greater than 3.29 are cause for concern,
because in an average sample a value this high is unlikely to occur
if more than 1% of our sample cases have standardized residuals with an absolute value
greater than 2.58, there is evidence that the level of error within our model may be
unacceptable
if more than 5% of cases have standardized residuals with an absolute value greater than
1.96, then the model may be a poor representation of the data
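These rules of thumb amount to z-scoring the residuals and counting how many exceed each cut-off. A sketch with simulated, well-behaved data (so the proportions should stay near their nominal 0%, 1% and 5% levels):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=200)
y = 1.0 + 0.5 * x + rng.normal(size=200)

design = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(design, y, rcond=None)[0]
resid = y - design @ b

# Standardize: convert residuals to z-scores (mean 0, SD 1).
z = (resid - resid.mean()) / resid.std(ddof=1)

flags = {
    "over 3.29": np.mean(np.abs(z) > 3.29),  # any at all is a concern
    "over 2.58": np.mean(np.abs(z) > 2.58),  # should be at most ~1%
    "over 1.96": np.mean(np.abs(z) > 1.96),  # should be at most ~5%
}
for rule, prop in flags.items():
    print(rule, round(prop, 3))
```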
Influential cases
There are several statistics used to assess the influence of a case.
adjusted predicted value
the predicted value of the outcome for that case from a model in which the case is
excluded.
If the model was stable, then the predicted value of a case should be the same regardless
of whether that case was used to estimate the model
Deleted residual
the difference between the adjusted predicted value and the original observed value.
studentized deleted residual
the deleted residual divided by the standard error
Cook’s distance
a measure of the overall influence of a case on the model
the leverage
gauges the influence of the observed value of the outcome variable over the predicted
values
Mahalanobis distances
measure the distance of cases from the mean(s) of the predictor variable(s)
to look at how the estimates b in a model change as a result of excluding a case
DFBeta: the difference between a parameter estimated using all cases and estimated when one
case is excluded.
DFFit: the difference between the predicted values for a case when the model is estimated
including or excluding that case.
Covariance ratio (CVR): quantifies the degree to which a case influences the variance of the
regression parameters.
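The leave-one-out diagnostics above (adjusted predicted value, deleted residual, DFBeta) can be computed directly from their definitions by refitting the model without each case in turn. A sketch with invented data into which one deliberately influential case is planted:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=40)
y = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=40)
# Plant one hypothetical influential case far from the main trend.
x = np.append(x, 4.0)
y = np.append(y, -3.0)

n = len(x)
X = np.column_stack([np.ones(n), x])
b_all = np.linalg.lstsq(X, y, rcond=None)[0]

dfbeta_slope = np.empty(n)
deleted_resid = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    # DFBeta: parameter with all cases minus parameter with case i excluded.
    dfbeta_slope[i] = b_all[1] - b_i[1]
    # Deleted residual: observed value minus the adjusted predicted value
    # (the prediction for case i from the model fitted without case i).
    deleted_resid[i] = y[i] - X[i] @ b_i

# The planted case (last index) should dominate both diagnostics.
print(int(np.argmax(np.abs(dfbeta_slope))) == n - 1,
      int(np.argmax(np.abs(deleted_resid))) == n - 1)
```

In practice these diagnostics (plus Cook's distance, leverage and the covariance ratio) are usually read off from statistics software rather than computed by hand.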