Statistical Methods for the Social Sciences
Chapter 9: Linear Regression and Correlation
Regression analysis Methods for analyzing the association between a quantitative
response variable and a quantitative explanatory variable.
We present three different, but related, aspects of regression analysis:
1. We investigate whether an association exists between the variables by testing the hypothesis
of statistical independence.
2. We study the strength of their association using the correlation measure of association.
3. We estimate a regression equation that predicts the value of the response variable from the
value of the explanatory variable.
9.1: Linear Relationships
Linear function The formula y= α + βx expresses observations on y as a linear function
of observations on x. The formula has a straight-line graph with slope
β (beta) and y-intercept α (alpha).
y = response variable and x = explanatory variable
We analyze how values of y tend to change from one subset of the population to another, as defined
by values of x.
At x=0, the equation y= α + βx simplifies to y= α + βx = α + β (0) = α.
The slope β equals the change in y for a one-unit increase in x. The larger the absolute value of β, the
steeper the line.
- When β is positive, y increases as x increases; the straight line goes upward. When a
relationship between two variables follows a straight line with β > 0, the relationship is said
to be positive.
- When β is negative, y decreases as x increases. The straight line then goes downward, and
the relationship is said to be negative.
When β = 0, the graph is a horizontal line. The value of y is constant and does not vary as x varies;
the two variables are then statistically independent.
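As a quick numerical sketch of the slope interpretation (the values α = 3 and β = 0.5 below are made up for illustration):

```python
# Hypothetical linear function y = alpha + beta * x
# (alpha = 3 and beta = 0.5 are made-up illustrative values)
alpha, beta = 3.0, 0.5

def line(x):
    return alpha + beta * x

print(line(0))            # at x = 0, y equals the intercept alpha
print(line(1) - line(0))  # a one-unit increase in x changes y by beta
print(line(9) - line(8))  # the change per unit is the same anywhere on the line
```

The last two prints give the same value: for a linear function, the change in y per one-unit increase in x is the constant β, regardless of where on the line you start.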
An association does not imply causation.
9.2: Least Squares Prediction Equation
Scatterplot A plot of the n observations as n points. The scatterplot provides a
visual check of whether a relationship is approximately linear.
Regression outlier A point that falls quite far from the trend that the rest of the data
follow. The line can be pulled toward such a point and away
from the center of the general trend of points. An observation is
called influential if removing it results in a large change in the
prediction equation. Unless the sample size is large, an observation
can have a strong influence on the slope if its x-value is low or high
compared to the rest of the data and it is a regression outlier.
Residual For an observation, the difference between the observed value and the
predicted value of the response variable, y − ŷ, is called the residual.
The prediction errors are called residuals.
Least squares estimate The least squares estimates a and b are the values that provide the
prediction equation ŷ = a + bx for which the residual sum of squares,
SSE = Σ(y − ŷ)², is a minimum.
To estimate the line y = α + βx we use ŷ = a + bx. This formula is called the prediction equation.
b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
a = ȳ − b x̄
- A positive residual results when the observed value y is larger than the predicted value ŷ, so
y − ŷ > 0.
- A negative residual results when the observed value is smaller than the predicted value. The
smaller the absolute value of the residual, the better is the prediction, since the predicted
value is closer to the observed value.
In a scatterplot, the residual for an observation is the vertical distance between its point and the
prediction line.
We summarize the size of the residuals by the sum of their squared values. This quantity, denoted by
SSE (sum of squared errors), is SSE = Σ(y − ŷ)². The better the prediction equation, the smaller the
residuals tend to be and, hence, the smaller SSE tends to be.
The prediction line ŷ = a + bx is called the least squares line, because it is the one with the smallest
sum of squared residuals.
The least squares line:
- Has some positive residuals and some negative residuals, but the sum (and mean) of the
residuals equals 0
- Passes through the point (x̄, ȳ)
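The formulas for b and a, and the two properties just listed, can be checked numerically. The following sketch uses a small made-up dataset (the x and y values are illustrative, not from the text):

```python
# Least squares estimates for a small made-up dataset (illustrative values only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# b = sum (x - x_bar)(y - y_bar) / sum (x - x_bar)^2,  a = y_bar - b * x_bar
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar

y_hat = [a + b * xi for xi in x]                    # predicted values
residuals = [yi - yh for yi, yh in zip(y, y_hat)]   # prediction errors y - y_hat
sse = sum(e ** 2 for e in residuals)                # residual sum of squares

print(b, a)
print(abs(sum(residuals)) < 1e-9)           # residuals sum to 0
print(abs((a + b * x_bar) - y_bar) < 1e-9)  # line passes through (x_bar, y_bar)
```

For these data, b = 0.9 and a = 1.3, the residuals sum to 0 (up to rounding), and substituting x̄ into the prediction equation returns ȳ, as the two bullet points state.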
9.3: The Linear Regression Model
Deterministic For the linear model y = α + βx, each value of x corresponds to a
single value of y. Such a model is said to be deterministic. It is
unrealistic in social science research, because we do not expect all
subjects who have the same x-value to have the same y-value;
instead, the y-values vary.
Conditional distribution The distribution of y-values for all subjects having a given x-value,
such as the conditional distribution of y at x = 12. A separate
conditional distribution applies for those with x = 13.
Probabilistic model A probabilistic model for the relationship allows for variability in y at
each value of x.
Expected Value of y Let E(y) denote the mean of a conditional distribution of y. The symbol
E represents expected value.
Regression function A regression function is a mathematical function that describes how
the mean of the response variable changes according to the value of
an explanatory variable.
Conditional standard The linear regression model has an additional parameter σ describing
deviation the standard deviation of each conditional distribution. That is, σ
measures the variability of the y-values for all subjects having the
same x-value.
An equation of the form E(y) = α + βx that relates values of x to the mean of the conditional
distribution of y is called a regression function.
The function E(y) = α + βx is called a linear regression function, because it uses a straight line to
relate the mean of y to the values of x.
The estimate of σ uses SSE = Σ(y − ŷ)², which measures sample variability about the least squares
line. The estimate is s = √(SSE / (n − 2)) = √(Σ(y − ŷ)² / (n − 2)).
The term (n − 2) in the denominator of s is the degrees of freedom (df) for the estimate. When a
regression equation has p unknown parameters, then df = n − p. The equation E(y) = α + βx has two
parameters (α and β), so df = n − 2.
The estimate of the population standard deviation of a variable y is s_y = √(Σ(y − ȳ)² / (n − 1)). It
differs from the standard deviation of the conditional distribution of y, for a fixed value of x.
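The distinction between the conditional estimate s (df = n − 2) and the marginal estimate s_y (df = n − 1) can be made concrete with a small sketch. The dataset below is made up for illustration:

```python
import math

# Conditional standard deviation estimate s = sqrt(SSE / (n - 2)) versus the
# marginal standard deviation of y, for a small made-up dataset
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least squares fit
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
    sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))   # df = n - 2: two estimated parameters (a and b)

# Marginal standard deviation of y: df = n - 1, ignores x entirely
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

print(s, s_y)
```

Here s is noticeably smaller than s_y: variability about the regression line is less than variability about the overall mean ȳ, because x helps predict y.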
9.4: Measuring Linear Association: The Correlation
Correlation The correlation between variables x and y, denoted by r, is
r = Σ(x − x̄)(y − ȳ) / √( (Σ(x − x̄)²)(Σ(y − ȳ)²) )
Correlation is a The correlation relates to the slope b of the prediction equation
standardized slope ŷ = a + bx by r = (s_x / s_y) b.
The slope b of the prediction equation tells us the direction of the association. The slope does not
directly tell us the strength of the association. The slope is useful for comparing effects of two
predictors having the same units.
s_x = √(Σ(x − x̄)² / (n − 1))
s_y = √(Σ(y − ȳ)² / (n − 1))
If the sample spreads are equal (s_x = s_y), then r = b. Because of the relationship between r and b,
the correlation is also called the standardized regression coefficient for the model E(y) = α + βx.
- Correlation is valid only when a straight-line model is sensible for the relationship between x
and y. Since r is proportional to the slope of a linear prediction equation, it measures the
strength of the linear association.
- −1 ≤ r ≤ 1. The correlation, unlike the slope b, must fall between −1 and +1.
- r has the same sign as the slope b. This holds because their formulas have the same
numerator, relating to the covariation of x and y, and positive denominators. Thus, r > 0
when the variables are positively related, and r < 0 when they are negatively related.
- r = 0 for horizontal lines, which have b = 0. When r = 0, there is no linear increasing or
linear decreasing trend in the relationship.
- r = ±1 when all the sample points fall exactly on the prediction line; these values correspond
to perfect positive and negative linear associations.
- The larger the absolute value of r, the stronger the linear association.
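These properties can be verified numerically. The sketch below uses made-up illustrative data; the spreads of x and y happen to be equal in it, so it also shows the special case r = b:

```python
import math

# Correlation r and its relation to the slope b (made-up illustrative data)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

# r = covariation / sqrt( (sum sq x) * (sum sq y) )
r = sxy / math.sqrt(sxx * syy)

b = sxy / sxx                    # least squares slope
s_x = math.sqrt(sxx / (n - 1))   # sample standard deviation of x
s_y = math.sqrt(syy / (n - 1))   # sample standard deviation of y

print(r)                                # falls in [-1, 1], same sign as b
print(abs(r - (s_x / s_y) * b) < 1e-9)  # r is the standardized slope
```

For these data r = 0.9, and multiplying b by s_x/s_y reproduces r exactly, as the standardized-slope formula states.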