Summary of Applied Statistics - Rebecca M. Warner
Chapter 2
Name of statistic                    Sample statistic      Population parameter
Mean                                 M (or X-bar)          µ (Greek letter mu)
Standard deviation                   s or SD               σ (Greek letter sigma)
Variance                             s²                    σ²
Distance of an individual
  score from the mean                z = (X − X-bar)/SD    z = (X − µ)/σ
Standard error of the sample mean    SE of X-bar           σ of X-bar
We use the sample mean, X-bar, from a small random sample to estimate the population mean, µ, for
a larger population.
Sampling error: variation in values of the sample mean, X-bar, across different batches of data that
are randomly sampled from the same population. Therefore, X-bar for a single sample is not likely to
be exactly correct as an estimate of µ, the unknown population mean.
Confidence interval: includes info about the magnitude of sampling error.
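Sampling error can be illustrated with a short simulation. This is a sketch only: the population of 130 HR scores and all numbers below are made up for illustration.

```python
import random
import statistics

random.seed(42)

# Hypothetical population of 130 heart-rate (HR) scores centered near 73.
population = [random.gauss(73, 8) for _ in range(130)]
mu = statistics.mean(population)

# Draw several random samples of N = 9: each X-bar differs from the
# population mean mu because of sampling error.
sample_means = [statistics.mean(random.sample(population, 9)) for _ in range(5)]

print(f"population mean mu = {mu:.2f}")
for m in sample_means:
    print(f"sample mean X-bar = {m:.2f}, sampling error = {m - mu:+.2f}")
```

Each run of five samples gives five different values of X-bar, which is exactly the variation that a confidence interval is meant to describe.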
2.2 Research Example: Description of a Sample of HR Scores
SPSS has a procedure that allows the data analyst to select a random sample of cases from a data
file: the data analyst can specify either the percentage of cases to be included in the sample or the
number of cases (N) for the sample.
Variable View: the names of the variables are listed in the first column. Other cells provide info
about the nature of each variable.
To create a histogram in SPSS, move the variable with the arrow to the small window on the
right-hand side under the heading Variable.
Ruler icon: indicates that scores on this variable are at the scale level of measurement.
To select a random sample of size N = 9 from the entire population of 130 scores, click
Data → Select Cases. Here you can click the radio button for Random sample of cases.
Then click the Sample button: this opens the Select Cases: Random Sample dialog; choose ‘Exactly’.
This gives the command: ‘Randomly select exactly 9 cases from the first 130 cases’.
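Outside SPSS, the same "randomly select exactly 9 of 130 cases" idea can be sketched in a few lines of Python (illustrative only; the case numbering is hypothetical):

```python
import random

# Analogue of SPSS's "Randomly select exactly 9 cases from the first
# 130 cases" (Data > Select Cases > Random sample of cases > Exactly).
case_ids = list(range(1, 131))          # 130 cases in the data file

random.seed(1)                          # fixed seed for a reproducible selection
selected = random.sample(case_ids, 9)   # exactly 9 cases, without replacement

print(sorted(selected))
```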
2.3 Sample Mean (M or X-bar)
Sample mean: provides information about the size of a ‘typical’ score in a sample.
Interpretation sample mean:
- Corresponds to the center of a distribution of scores in a sample
- Provides one kind of info about the size of a typical X score
Scores in a sample can be represented as X1, X2, X3, …, XN, where N is the number of observations or
participants and Xi is the score for participant number i (the i subscript is used only when omitting
subscripts would create ambiguity about which scores are included in a computation). The sample
mean, M or X-bar, is obtained by: M = ƩX/N
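A minimal sketch of the formula M = ƩX/N, using a made-up sample of nine scores:

```python
# Sample mean M = sum of X / N, computed directly from the definition.
scores = [98, 84, 75, 90, 68, 73, 81, 79, 72]   # hypothetical HR sample, N = 9

N = len(scores)
M = sum(scores) / N
print(M)  # 80.0
```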
Size of ƩX depends on 2 things:
1. Magnitudes (relative size of an object) of the individual X scores
2. And N, the number of scores
ƩX increases when the values of individual X scores are increased and positive and N is held constant,
and when all the X scores are positive and N gets larger.
This formula tells us the following:
1. What info is the sample statistic M based on? It is based on the sum of the Xs and the N of
cases in the sample.
2. Under what circumstances will the statistic M turn out to have a large or small value? M is
large when the individual X scores are large and positive. Because we divide by N when
computing M to correct for sample size, the magnitude of M is independent of N.
The SPSS Descriptive Statistics: Frequencies procedure was used to see a distribution of
frequencies and also to obtain simple descriptive statistics such as the sample mean, M, for the
set of scores.
The deviation of each score from the sample mean (X-M) is the prediction error that arises if M is
used to estimate that person’s score; the magnitude of error is given by the difference X-M.
How can we summarize info about the magnitude of prediction errors across persons in the sample?
1. Summing the X − M deviations across all the persons in the sample: the sample mean, M, is
the value for which the sum of the deviations across all the scores in a sample equals 0. So,
using M to estimate X for each person in the sample results in the smallest possible sum of
prediction errors.
a. To avoid the problem that the sum of the deviations always equals 0: square the
prediction errors or deviations and then sum these squared deviations; the resulting
term Ʃ(X − M)² is a number that gets larger as the magnitudes of the deviations of
individual X values from M increase.
2. M is the value for which the sum of squared deviations (SS), Ʃ(X-M)2 is minimized. The
sample mean is the best predictor of any randomly selected person’s score because it is the
estimate for which prediction errors sum to 0, and it is also the estimate that has the
smallest sum of squared prediction errors. Ordinary least squares: statistic meets the
criterion for best OLS estimator when it minimizes the sum of squared prediction errors. This
is the best estimate!
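The two properties above (deviations from M sum to 0, and M minimizes the sum of squared prediction errors) can be checked numerically; the scores below are hypothetical:

```python
# M is the OLS estimate: deviations from M sum to 0, and the sum of
# squared prediction errors sum((x - c)**2) is smallest when c equals M.
scores = [98, 84, 75, 90, 68, 73, 81, 79, 72]
M = sum(scores) / len(scores)

def ss(c, xs):
    """Sum of squared prediction errors when c is used to predict every score."""
    return sum((x - c) ** 2 for x in xs)

deviation_sum = sum(x - M for x in scores)
print(round(deviation_sum, 10))   # 0.0

# SS at M is lower than SS at any other candidate value.
for c in (M - 5, M - 1, M + 1, M + 5):
    assert ss(M, scores) < ss(c, scores)
```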
Mode: the score value that occurs most often
Median: rank order the scores in the sample from lowest to highest and count up to the middle
score. The score that has half the scores above it and half the scores below it is the median.
The presence of a high extreme score has little effect on the size of the median, but it makes a
substantial difference in the size of the sample mean, M. The mean is less robust to extreme scores
or outliers than the median.
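A quick check of this robustness claim, using hypothetical scores:

```python
import statistics

# One extreme score barely moves the median but pulls the mean upward.
scores = [68, 72, 73, 75, 79, 81, 84, 90, 98]
with_outlier = scores[:-1] + [250]     # replace the top score with an outlier

print(statistics.median(scores), statistics.mean(scores))
print(statistics.median(with_outlier), statistics.mean(with_outlier))
```

The median stays at the middle score in both cases, while the mean is dragged toward the outlier.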
Analysis of Variance (ANOVA): use group means and deviations from group means as the basic
building blocks for computations. ANOVA assumes that the scores on the quantitative outcome
variable are normally distributed.
2.4 Sum of Squared Deviations (SS) and Sample Variance (s²)
The sample variance provides summary info about the distance of individual X scores from the mean
of the sample. The magnitude of the deviation tells us whether a score is relatively close or far from
the sample mean.
1. Distance of each individual X score from the sample mean: Xi − M
a. The deviation is above M (positive) or below M (negative)
2. Obtain a numerical index of variability, summarizing info about distance from the mean across
subjects: Ʃ(Xi − M)
a. Summing squared deviations avoids the problem of the uninformative sum (the raw
deviations always sum to 0)
3. SS = Ʃ(Xi − M)²; SS has a minimum possible value of 0 in situations where all the X scores in a
sample are equal to each other and therefore also equal to M. SS has no upper limit.
Other factors being equal, SS tends to increase when:
1. The number of squared deviations included in the sum increases
2. The individual Xi – M deviations get larger in absolute value.
An alternative computational formula is SS = ƩX² − [(ƩX)²/N], which produces less rounding error.
The minimum possible value of SS occurs when all the X scores are equal to each other and,
therefore, equal to M. SS values tend to be larger when they are based on a large number of squared
deviations and when the individual X scores have large deviations from the mean, M.
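Both formulas for SS can be verified to agree on a small hypothetical sample:

```python
# Definitional formula SS = sum((x - M)**2) versus computational formula
# SS = sum(x**2) - (sum(x))**2 / N; the latter avoids rounding the mean first.
scores = [98, 84, 75, 90, 68, 73, 81, 79, 72]
N = len(scores)
M = sum(scores) / N

ss_definitional = sum((x - M) ** 2 for x in scores)
ss_computational = sum(x ** 2 for x in scores) - (sum(scores) ** 2) / N

print(ss_definitional, ss_computational)  # both 724.0
```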
2.5 Degrees of Freedom (df) for a Sample Variance
READ SUMMARIES!!!!!!!
Chapter 6 One-way between-subjects analysis of variance
6.1 Research situations where one-way between-subjects analysis of variance (ANOVA) is used
A one-way between-subjects (between-S) analysis of variance (ANOVA) is used in research situations
where the researcher wants to compare means on a quantitative y outcome variable across two or
more groups.
ANOVA is a generalization of the t test. The t test provides info about the distance between the
means on a quantitative outcome variable for just two groups, whereas a one-way ANOVA compares
means on a quantitative variable across any number of groups.
The design is non-experimental when the means of naturally occurring groups are compared.
The design is experimental when the groups are formed by the researcher, who administers a different
type or amount of treatment to each group while controlling extraneous variables.
Between-S: each participant is a member of only one group, and the members of the samples are not
matched or paired.
When a study consists of repeated measures or paired or matched samples, a repeated measures
ANOVA is required.
Factorial ANOVA: more than one categorical variable or factor is included in the study.
In ANOVA the categorical predictor variable is called a factor, the groups are called the levels of the
factor.
Doing a large number of significance tests leads to an inflated risk of Type I error. If a study includes
k groups, there are k(k − 1)/2 pairs of means.
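The k(k − 1)/2 count can be tabulated directly, showing how fast the number of pairwise tests grows:

```python
# Number of distinct pairs of group means among k groups: k(k - 1)/2.
def n_pairs(k):
    return k * (k - 1) // 2

for k in (2, 3, 4, 5):
    print(k, n_pairs(k))  # 2->1, 3->3, 4->6, 5->10
```

With 5 groups there are already 10 pairwise comparisons, which is why a single omnibus test is preferred.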
Omnibus test: examines all the comparisons in the study as a set; this limits the risk of Type I error.
F-test in a one-way ANOVA provides a single omnibus test of the hypothesis that the means of all k
populations are equal in place of many t tests for all possible pairs of groups.
F test: used to assess differences among a set of more than two group means; follow-up comparisons
among pairs of group means are then made with more specific procedures.
The null hypothesis for a one-way ANOVA is that the means of the k populations that correspond to
the groups in the study are all equal: H0: µ1 = µ2 = … = µk
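The omnibus F ratio for this null hypothesis can be computed by hand in a short pure-Python sketch. The three groups of scores below are hypothetical; a package routine such as scipy.stats.f_oneway would produce the same F.

```python
import statistics

# Hand-computed one-way between-S ANOVA F ratio for k = 3 groups.
groups = [
    [65, 70, 72, 68],    # group 1 scores (hypothetical)
    [74, 78, 80, 76],    # group 2
    [82, 85, 88, 81],    # group 3
]

all_scores = [x for g in groups for x in g]
grand_mean = statistics.mean(all_scores)
k = len(groups)
N = len(all_scores)

# Between-groups SS: n_j * (M_j - grand mean)^2 summed over groups.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
# Within-groups SS: squared deviations from each group's own mean.
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

df_between, df_within = k - 1, N - k
F = (ss_between / df_between) / (ss_within / df_within)
print(round(F, 2))
```

A large F means the group means differ far more than would be expected from within-group variability alone, so H0 would be rejected.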