BBS2007 STATISTICS
Summary of the year 1 topics, the Andy Field chapters indicated in the course book,
the lectures, and additional information from the videos




17 MAY 2018
SAMMY HERMANS
FHML

Contents

Review basics from year 1
   Central tendency (average)
   Measures of spread
   Types of variables
   Discrete and continuous variables
   Histograms
   Interquartile range IQR, the middle 50%
   Normal distributions, skewness and kurtosis
   Standard deviation
   Normal distribution / Gaussian distribution
   Z scores
   Skewness
   Kurtosis
   Box plots
   Correlation
   Simple linear regression
   Least squares regression line
   Interpreting parameters
   R²
   Central limit theorem
   Sampling distribution of the sample mean
   Confidence intervals
   Pearson's chi-square test
   Standard error
   Hypothesis testing
   The Z test for one mean
   Confidence interval
   The t test for one mean
   The P value
   Type I error: rejecting H0 while it is actually true
   Type II error: failing to reject H0 while in reality it is false
   Power (1 − β)
   Relative risk and odds ratio
   Statistical significance vs practical significance
   Confidence intervals and P values
   Inference for two means
   Pooled-variance t-test and confidence interval
   Unpooled t-test and confidence interval
   Paired-difference t procedure
   Inference on the slope
   Anscombe's quartet
   Simple linear regression: checking assumptions with residual plots
   Assumptions of the linear model
   Effect size and Bonferroni correction
   Levene's test
Lecture 1
Lecture 2: logistic regression analysis
Summaries theory year 2
   Odds ratio
   Logistic regression
Chapter 8, Andy Field: regression
   Simple linear regression
   Bias in regression models
   Influential cases
   Generalizing the model
   Cross-validation of the model
   Sample size in regression
   Multiple regression
   Comparing models
   Multicollinearity
   Regression in SPSS
Chapter 10, Andy Field: moderation
   Recode function
Comparing several means: ANOVA (general linear model 1)
   Post hoc procedures
   ANOVA in SPSS (one-way ANOVA)
   Interaction in SPSS
Chapter 12, Andy Field: ANCOVA, analysis of covariance (GLM 2)
   ANCOVA in SPSS
Chapter 13, Andy Field: factorial ANOVA (GLM 3)
   F ratios: each effect in two-way ANOVA has its own F ratio
   Assumptions for factorial ANOVA
   Contrasts
   Post hoc tests, Tukey and Bonferroni & simple effects analysis
   Factorial ANOVA in SPSS
   Interaction graphs & bar charts
Chapter 19, Andy Field: logistic regression
   Assessing the fit of the model
   Odds ratio = e^B or Exp(B)
   Parsimony
   Assumptions
   Incomplete information from the predictors
   Complete separation
   Overdispersion → SE too small
   Building a model
   SPSS
   Output
   Predicting several categories: multinomial logistic regression
Chapter 20, Andy Field: multilevel linear models
   Random/fixed intercept and random/fixed slope models
   Assessing the fit & comparing multivariate models
   Covariance structure types
   Assumptions
   Sample and power
   Centering predictors
   Growth models
Mixed models in SPSS
   Syntax codes
Shortly: formulas + most important topics

Review basics from year 1
Central tendency (average)
A single number that represents the centre of the distribution of a data set.
• Arithmetic mean: the sum of all values divided by the number of values.
• Median: the middle value when the data are arranged from smallest to largest. Example: 2, 2, 3, 4, 4, 1, 0 → ordered 0, 1, 2, 2, 3, 4, 4 → the median is 2. For 2, 3, 4, 5 → (3 + 4)/2 = 3.5 is the median.
The median is more representative when there are outliers in the data, since it is less sensitive to one or a few extreme values (the arithmetic mean is sensitive to this).
• Mode: the most frequent value in the data set. If no value occurs more often than the others, the mode loses its meaning.
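The course material itself works in SPSS; purely as an illustration, here is a minimal Python sketch (standard library only, with a small made-up data set) of the three measures:

```python
from statistics import mean, median, multimode

data = [2, 2, 3, 4, 4, 1, 0]

print(mean(data))       # arithmetic mean: 16 / 7 ≈ 2.29
print(median(data))     # middle value of the sorted data: 2
print(multimode(data))  # most frequent value(s): [2, 4] here, since both occur twice
```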

Measures of spread
The more dispersed the data, the further the values lie from the mean.
Dispersion: how far away from the centre?
• Range: the spread between the largest and the smallest value: largest − smallest. The larger the range, the more dispersed the data set.
• Variance σ²: the sum of the squared deviations from the mean, divided by the number of data points: σ² = Σ(x − mean)² / n.
So, first you calculate the mean of the data set, subtract it from each data point, square each difference, add them up and divide by the number of data points.
• Standard deviation σ = √variance; its units are the same as those of the data, which makes it easier to interpret than the variance.
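A small numpy sketch (assuming numpy is available; the same made-up data as above), showing the population and the sample version side by side:

```python
import numpy as np

data = np.array([2, 2, 3, 4, 4, 1, 0])

print(data.max() - data.min())        # range
print(np.var(data, ddof=0))           # population variance: divide by n
print(np.sqrt(np.var(data, ddof=0)))  # population standard deviation
print(np.std(data, ddof=1))           # sample standard deviation: divide by n - 1
```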

Types of variables
The higher the level of measurement, the more statistical tests can be performed on the data. From low
to high:
• Nominal: discrete categories; each category has a criterion that an observation either meets or does not meet. Any numbers assigned have no meaning and are simply labels. Examples are hair color, type of religion, type of car. This is categorical or qualitative data, not numerical data, and the categories have NO ORDER.
o Dichotomous: the data fall into one of two categories, e.g. male or female, yes or no.
o Categorical: more than two possible values, e.g. group membership.
Possible statistics: mode, modal percentage and frequency distribution.
• Ordinal: ordered categories, used to establish a ranking or order. There is NO absolute zero, the distance between categories is unknown, and the intervals between categories are not necessarily equal. E.g. ranking of favorite sport, socioeconomic status, size.
Possible statistics: all nominal-level statistics, median, percentile, semi-interquartile range and rank-order correlation coefficients (the mean is not meaningful since the distances between values have no meaning).
• Interval: ordered, with equal distances between values (the difference is known and meaningful); the zero is arbitrary (zero does not mean "nothing", it is just another measurement point, and there are values below and above zero: zero degrees Celsius does not mean there is no temperature). E.g. school test grade, temperature, time of day. Possible statistics: all ordinal-level statistics, mean, standard deviation, addition and subtraction.
• Ratio: ordered, with an absolute zero and known intervals between values; 0 means absence of the characteristic being measured. E.g. weight, height, pulse, blood pressure, elapsed time, degrees Kelvin. Since there is an absolute 0, values can be compared by division (ratios). Any statistical test can be performed.


Quick decision rule for the level of measurement:
• Not ordered → nominal
• Ordered, but the categories are not equally spaced → ordinal
• Equally spaced, but zero does not mean "none" → interval
• Equally spaced and zero means "none" → ratio

                         Nominal    Ordinal    Interval    Ratio
Mode                     Yes        Yes        Yes         Yes
Median                   No         Yes        Yes         Yes
Mean                     No         No         Yes         Yes
Frequency distribution   Yes        Yes        Yes         Yes
Range                    No         Yes        Yes         Yes
Add & subtract           No         No         Yes         Yes
Multiply & divide        No         No         No          Yes
Standard deviation       No         No         Yes         Yes
Money: interval or ratio? You can have a negative balance, but zero also means no money.

Discrete and continuous variables
Discrete variables: distinct or separate values; you can count or literally list the possible values. E.g. the year you were born in, heads = 0 or tails = 1, the number of ants born tomorrow.
Continuous variables: any value within an interval; you cannot list all the values. E.g. the exact mass of an animal, or the exact winning time: this can take any value, e.g. 9.571 or 9.571547 → continuous, because any value in between is possible.

Histograms
Quantitative, numeric data (at least interval level) can be visualized in a histogram. There are no gaps between the bars, the bar width corresponds to the bin (class) size, and the y-axis corresponds to the frequency. Put the data in order and count the frequency within each bin.
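As an illustration of the binning behind a histogram (the document itself shows SPSS-style charts), a numpy sketch with made-up data:

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 8, 8, 9])

# np.histogram bins the data into classes of equal width and counts the frequencies
counts, edges = np.histogram(data, bins=4)
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {count}")
```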

Interquartile range IQR, the middle 50%
Quartiles: the values that divide an ordered list of numbers into quarters. The lower quartile is the median of the lower half of the data set, the upper quartile is the median of the upper half of the data set. Interquartile range: the range of the middle 50% of the data, the difference between the upper and lower quartile.
Finding the interquartile range (see the numpy sketch below):
1) Order the data set from smallest to largest.
2) Find the median of the ordered set (Q2).
3) Split the data into two halves at Q2 (when the data set has an odd number of values, do not include Q2 in either half).
4) Find the median of the lower half of the data set (lower quartile, Q1).
5) Find the median of the upper half (upper quartile, Q3).
6) IQR = upper quartile − lower quartile = Q3 − Q1.
(Figure: Q1, Q2 and Q3 split the ordered data into four groups of 25% each.)
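A numpy sketch (made-up data); be aware that software packages use quartile definitions that can differ slightly from the hand method above:

```python
import numpy as np

data = np.array([1, 3, 4, 4, 5, 6, 7, 8, 9, 11, 15])

q1, q2, q3 = np.percentile(data, [25, 50, 75])
print(q1, q2, q3, q3 - q1)  # lower quartile, median, upper quartile, IQR
# Note: numpy's default quartile method (linear interpolation) can give
# slightly different values than the split-at-the-median hand method.
```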

Normal distributions, skewness and kurtosis
When data are not normal, be careful which statistical tests you use. When data are normally distributed, the measures of central tendency (mean, median and mode) fall at the same midline point: they are all equal. A normal distribution is:
• Unimodal (one peak)
• Symmetrical, with a bell-shaped curve

Standard deviation: a measure of variation in a distribution. A low standard deviation means that the values lie close to the mean; a high standard deviation means that the values are spread out over a larger range.

SD = √( Σ(x − M)² / (n − 1) )

• About 34% of the values fall between the mean and 1 standard deviation above the mean.
• About 68% of the scores fall within 1 standard deviation of the mean (above and below).
• About 95% of the scores fall within 2 standard deviations above and 2 standard deviations below the mean.
• About 99.7% of the scores fall within 3 standard deviations above and below the mean.
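A scipy sketch (assuming scipy is installed) that reproduces the 68-95-99.7 rule from the standard normal distribution:

```python
from scipy.stats import norm

# Area under the standard normal curve within k standard deviations of the mean
for k in (1, 2, 3):
    p = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} SD: {p:.3f}")  # ≈ 0.683, 0.954, 0.997
```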

Normal distribution / Gaussian distribution
A continuous probability distribution.
• µ = the mean of the distribution, which can take any value (from −∞ to ∞) and around which the distribution is symmetric.
• σ = the standard deviation, always greater than 0; the greater the SD, the lower the peak (and the more area in the tails).
• σ² = the variance.
• If X is a random variable that has a normal distribution with mean µ and variance σ², we write this as X ~ N(µ, σ²).
• The area under the curve corresponds to probability.

Z scores: measure how many standard deviations above or below the mean a particular score lies.
z = (score − M) / SD
(the score minus the mean, divided by the standard deviation). The shape of a z-score distribution does not change; only the mean becomes 0 and the SD becomes 1: this is the "standard normal distribution", Z ~ N(0, 1).
Z = 0.34 means the score is 0.34 SD above the mean; Z = −2.4 means it is 2.4 SD below the mean. A z-score < −1.96 or > 1.96 is significant at the 5% level (two-sided).
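An illustrative Python sketch (hypothetical scores, not course data) of standardising values and the ±1.96 cut-off:

```python
import numpy as np
from scipy.stats import norm

scores = np.array([12, 15, 18, 20, 25])             # made-up scores
z = (scores - scores.mean()) / scores.std(ddof=1)   # z = (score - M) / SD
print(z.round(2))

# Two-sided p-value for a given z value; |z| > 1.96 corresponds to p < 0.05
p = 2 * (1 - norm.cdf(abs(-2.4)))
print(round(p, 4))  # ≈ 0.0164
```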
Modality of a distribution: depends on the number of peaks.
Skewness: (Mean − Median) / SD. If skewness ≠ 0, the peak of the distribution is off-centre and one tail is longer than the other. If skewness = 0, then mean = median = mode and the distribution is perfectly symmetric, as in a normal distribution.
Positive skewness: mode < median < mean. The data pile up on the left, with a tail on the right side.
Negative skewness: the data pile up on the right, with a tail on the left; mean < median < mode (the mean lies to the left of the median). The tail points in the direction of the skew.

Kurtosis: a measure of the shape of the curve: normal, flat or peaked.
- Mesokurtic: normal distribution, coefficient is 0.
- Leptokurtic: a sharp peak with heavy tails (a tall, skinny curve); the coefficient is a large positive number.
- Platykurtic: flattened and highly dispersed, with a shorter peak; the coefficient is negative (k < 0).
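A scipy sketch (simulated data) illustrating the sign conventions: a right tail gives positive skewness, and a normal curve has (excess) kurtosis of about 0:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(1)
right_skewed = rng.exponential(scale=2.0, size=10_000)
normal = rng.normal(size=10_000)

print(round(skew(right_skewed), 2))      # positive: tail on the right
print(round(kurtosis(right_skewed), 2))  # scipy reports excess kurtosis (0 for a normal curve)
print(round(skew(normal), 2), round(kurtosis(normal), 2))  # both close to 0
```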

Box plots
• You can read the maximum and the minimum value of the data set from a box plot.
• Statements such as "there is only one seven-year-old at the party" or "exactly half of the students are older than 13" cannot be confirmed from a box plot: you do not know the individual values, nor whether the data set contains an even or odd number of observations.
A box plot can also visualize skewness.




Correlation
A correlation coefficient summarizes a scatterplot in one single number between −1 and 1. Its sign shows whether the fitted line points upwards or downwards. A correlation coefficient of 0 means there is no linear relationship (the fitted line is flat).
The higher the correlation (in absolute value), the better the line fits the data. The correlation coefficient is determined by comparing the scatter of the y-values around their mean with the scatter around the fitted line.
• The more scatter there is around the mean of y compared to the scatter around the fitted line, the higher/stronger the correlation coefficient.
• The larger the scatter around the fitted line compared to the scatter around the mean of y, the smaller the correlation coefficient.
• When the scatter around the fitted line and around the mean of y is equal, the correlation coefficient is 0.
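A numpy sketch with made-up x and y values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # made-up explanatory values
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])   # made-up response values

r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
print(round(r, 3))           # close to +1: a strong upward linear relationship
```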

Simple linear regression
Regression analysis explores the relationship between a quantitative response variable and one or more explanatory variables. Simple linear regression uses only one explanatory variable; otherwise it is called multiple regression. Y is the dependent variable, the variable we are predicting, and X is the independent variable, the variable we use to predict Y.
The expectation of Y for a given value of X: E(y|x) = β0 + β1·x. But since the observed values vary around the line, we add an error component: y = β0 + β1·x + ε.
Parameters: β0 and β1 need to be estimated. ŷ is the estimated/predicted value of y; β̂0 and β̂1 are the parameters estimated from the sample data. This is done via the method of least squares.





Least squares regression line
Residual: observed value − predicted value: eᵢ = yᵢ − ŷᵢ. When the point lies below the line the residual is negative; when the point lies above the line the residual is positive. The best-fitting (least squares) line is the one that minimizes the sum of the squared residuals, i.e. the total squared distance between the points and the line.
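A minimal numpy sketch of the least squares estimates, reusing the made-up x and y from the correlation example:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

# Least squares estimates of the slope (b1) and intercept (b0)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x    # predicted values
e = y - y_hat          # residuals: observed minus predicted
print(round(b0, 3), round(b1, 3), round(np.sum(e ** 2), 4))  # sum of squared residuals is minimized
```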



Interpreting parameters
• β0 = the intercept: the predicted value of y when x = 0.
• β1 = the slope: if x increases by 1, the predicted y changes by β1.
• σ² = the true variance of y at any given value of x; the variance of y is assumed to be the same at every value of x. s² is the estimated variance of y at any value of x. (In general, a population variance is computed as σ² = Σ(x − µ)² / n.)
• Sample mean: x̄
• Population mean: µ

R²
R² tells us how well a regression line predicts or estimates the actual values: it tells how much of the variance in y is accounted for by the model (x). When R² = 0.32, 32% of the variance in y is accounted for by x.
R² = 1 means a perfect fit. When there is a large distance between the actual and estimated values, R² gets smaller (approaches 0); when the actual and estimated values are very close together, R² approaches 1.
R² = Σ(ŷ − ȳ)² / Σ(y − ȳ)², where ŷ = the predicted y, y = the actual y and ȳ = the mean of y.
Standard error of the estimate: a measure of the typical difference between the actual and the estimated (predicted) values.
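Continuing the same toy example, R² computed directly from this formula:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

# R^2 = sum((y_hat - y_bar)^2) / sum((y - y_bar)^2)
r_squared = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r_squared, 3))  # for simple regression this equals the squared correlation
```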

Central limit theorem
When you repeatedly take samples from an original distribution and plot the frequencies of the sample means, the resulting distribution approaches a normal distribution.
Sample 1: [1, 1, 3, 6] → mean is 2.75. Sample 2: [1, 2, 6, 6] → mean is 3.75. Sample 3: [3, 4, 3, 1] → mean is 2.75. If you plot many of these sample means, you get an approximately normal distribution.
• The bigger the sample size, the closer together the sample means will be, the smaller their SD, and the better the distribution approaches a normal distribution.
• The central limit theorem states that the larger the sample size, the more normal the distribution of the sample means becomes; with an infinite sample size it would be a perfect normal distribution. The larger n, the better the approximation.
• If we are sampling from a distribution that is not normal, the sample mean will still be approximately normally distributed, provided the sample size is large enough. Since µ is usually unknown, this helps us to estimate it. (A small simulation follows below.)
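A simulation sketch (simulated data, assuming numpy): sample means drawn from a skewed distribution pile up symmetrically around the population mean, with a much smaller spread:

```python
import numpy as np

rng = np.random.default_rng(0)

# A clearly non-normal 'population' (exponential, right-skewed)
population = rng.exponential(scale=2.0, size=100_000)

# Draw 5000 samples of size 30 and keep each sample mean
sample_size = 30
idx = rng.integers(0, population.size, size=(5_000, sample_size))
means = population[idx].mean(axis=1)

print(round(population.mean(), 2), round(means.mean(), 2))  # both ≈ 2: same mean
print(round(means.std(), 2), round(population.std() / np.sqrt(sample_size), 2))  # SD of means ≈ σ/√n
```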

Sampling distribution of the sample mean
It is derived from an original distribution. The original distribution and the sampling distribution have the same mean. When the sample size is larger, the skewness and kurtosis are lower and the SD is smaller (a tighter fit around the mean).
• The mean of the sampling distribution of x̄ is always equal to the population mean µ, but its SD is smaller. The larger the sample size, the more accurately we can estimate a parameter.
• SD of the sampling distribution: σx̄ = σ / √n (the standard deviation of the population divided by the square root of the sample size).
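A tiny sketch of the σ/√n relationship (with an assumed, made-up σ):

```python
import numpy as np

sigma = 2.0  # population standard deviation (assumed known for this illustration)
for n in (10, 30, 100):
    se = sigma / np.sqrt(n)  # SD of the sampling distribution of the mean
    print(n, round(se, 3))   # shrinks as the sample size grows
```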

