Quantitative Data Analysis 2 (QDA2) Book Summary - GRADE 9,0
Course: Quantitative Data Analysis 2 (6012B0423Y)
Institution: Universiteit Van Amsterdam (UvA)
Book: Discovering Statistics Using IBM SPSS
Quantitative Data Analysis 2 (QDA2) summary of Andy Field's "Discovering Statistics Using IBM SPSS Statistics" (2017). The summary is 94 pages and includes all of the material covered in the course 6012B0423Y at the UvA.
Quantitative Data Analysis 2: Discovering Statistics Using IBM SPSS Statistics (Andy Field)
Chapter 12: GLM 1: Comparing Several Independent Means
Using a Linear Model to Compare Several Means
• If we include a predictor variable containing two categories in the linear model, then the resulting b
for that predictor represents the difference between the mean scores of the two categories.
• If we want to include a categorical predictor that contains more than two categories, this can be
achieved by recoding that variable into several categorical predictors each of which has only two
categories (dummy coding).
o When we do, the bs for predictors represent differences between means.
o Therefore, if we’re interested in comparing more than two means we can use the linear
model to do this.
• Just as we test the overall fit of any linear model with an F-statistic, we can do the same here: we first use F
to test whether we significantly predict the outcome variable by using group means (which tells us
whether, overall, the group means are significantly different) and then use the specific model
parameters (the bs) to tell us which means differ from which.
• The ‘ANOVA’ to which some people allude is simply the F-statistic that we encountered as a test of
the fit of a linear model; it’s just that the linear model consists of group means.
• In a simple randomized controlled trial (as used in pharmacological, medical and psychological
intervention trials), people are randomized into a control group or into groups that receive the active
intervention.
• We’ve seen that with two groups we can use a linear model, by replacing the ‘model’
in equation (12.1) of the book with one dummy variable that codes two groups (0 for one group and 1 for the
other) and an associated b-value that represents the difference between the group means.
• We have three groups here, but we’ve also seen that this situation is easily incorporated into the
linear model by including two dummy variables (each assigned a b-value), and that any number of
groups can be included by extending the number of dummy variables to one less than the number of
groups.
• When we use dummy variables we assign one group as the baseline category and assign that group a
zero code on all dummy variables.
o The baseline category should be the condition against which you intend to compare the
other groups.
o In most well-designed experiments there will be a group of participants who act as a control
for the other groups and, other things being equal, this will be your baseline category –
although the group you choose will depend upon the particular hypotheses you want to test.
o In designs in which the group sizes are unequal it is important that the baseline category
contains a large number of cases to ensure that the estimates of the b-values are reliable.
• The dummy variables can be coded in several ways, but the simplest way is to use dummy coding.
o The baseline category is coded as 0 for all dummy variables.
o Using this coding scheme, each group is uniquely expressed by the combined values for the
two dummy variables.
• When we are predicting an outcome from group membership, predicted values from the model are
the group means.
o If the group means are meaningfully different, then using the group means should be an
effective way to predict scores.
• The F-test is an overall test that doesn’t identify differences between specific means.
o However, the model parameters (the b-values) do.
§ As we just discovered, the constant (b0) is equal to the mean of the base category
(the control group).
§ The b-value for the first dummy variable (b1) is equal to the difference between the
means of the first group and the control group.
§ Finally, the b-value for the second dummy variable (b2) is equal to the difference
between the means of the second group and the control group.
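The relationship between the dummy codes, the b-values and the group means can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores and group names are made up, and the b-values are computed directly from the group means (which is what a least-squares fit of the dummy-coded model yields) rather than by fitting a regression.

```python
from statistics import mean

# Hypothetical scores for three groups (illustrative values, not from the text).
scores = {
    "control": [3, 2, 1, 1, 4],  # baseline category
    "group1": [5, 2, 4, 2, 3],
    "group2": [7, 4, 5, 3, 6],
}

# Dummy coding: the baseline group is coded 0 on both dummy variables.
dummy_codes = {
    "control": (0, 0),
    "group1": (1, 0),
    "group2": (0, 1),
}

means = {name: mean(values) for name, values in scores.items()}

# With dummy coding the parameters follow from the group means.
b0 = means["control"]                    # mean of the baseline group
b1 = means["group1"] - means["control"]  # group 1 minus baseline
b2 = means["group2"] - means["control"]  # group 2 minus baseline


def predict(group):
    """Predicted score for a group: b0 + b1*dummy1 + b2*dummy2."""
    d1, d2 = dummy_codes[group]
    return b0 + b1 * d1 + b2 * d2


for name in scores:
    # The prediction and the group mean should agree for every group.
    print(name, dummy_codes[name], round(predict(name), 2), round(means[name], 2))
```

Each printed row shows that the model's prediction for a group is that group's mean, which is why the b-values can be read as differences from the baseline.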
• We can extend this three-group scenario to four groups.
o As before, we specify one category as a base category (a control group) and assign this
category a code of 0 for all dummy variables.
o The remaining three conditions will have a code of 1 for the dummy variable that describes
that condition and a code of 0 for the other dummy variables.
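The coding scheme for any number of groups can be sketched as a small helper. The function name `dummy_code` and the group labels are hypothetical, chosen only for this illustration; the helper builds one dummy variable per non-baseline group, so k groups give k − 1 dummy variables.

```python
def dummy_code(groups, baseline):
    """Return {group: tuple of 0/1 codes}, one dummy variable per non-baseline group."""
    others = [g for g in groups if g != baseline]
    # A group scores 1 only on the dummy variable that describes it.
    return {g: tuple(int(g == other) for other in others) for g in groups}


# Hypothetical labels: one control group plus three other conditions.
codes = dummy_code(["control", "A", "B", "C"], baseline="control")

for group, code in codes.items():
    print(group, code)
```

The baseline row contains only zeros, and every other row contains a single 1, so each group is uniquely identified by its combination of codes.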
• Logic of the F-Statistic
o The F-statistic (or F-ratio) tests the overall fit of a linear model to a set of observed data.
o F is the ratio of how good the model is compared to how bad it is (its error).
o When the model is based on group means, our predictions from the model are those means.
o If the group means are the same then our ability to predict the observed data will be poor (F
will be small), but if the means differ we will be able to better discriminate between cases
from different groups (F will be large).
§ So, in this context F basically tells us whether the group means are significantly
different.
o We can apply the same logic as for any linear model:
§ The model that represents ‘no effect’ or ‘no relationship between the predictor
variable and the outcome’ is one where the predicted value of the outcome is
always the grand mean (the mean of the outcome variable).
§ We can fit a different model to the data that represents our alternative hypotheses.
• We compare the fit of this model to the fit of the null model (i.e., using the
grand mean).
§ The intercept and one or more parameters (b) describe the model.
§ The parameters determine the shape of the model that we have fitted; therefore,
the bigger the coefficients, the greater the deviation between the model and the
null model (grand mean).
§ In experimental research the parameters (b) represent the differences between
group means.
• The bigger the differences between group means, the greater the
difference between the model and the null model (grand mean).
§ If the differences between group means are large enough, then the resulting model
will be a better fit to the data than the null model (grand mean).
• If this is the case we can infer that our model (i.e., predicting scores from
the group means) is better than not using a model (i.e., predicting scores
from the grand mean).
• Put another way, our group means are significantly different from the null
(that all means are the same).
o We use the F-statistic to compare the improvement in fit due to using the model (rather than
the null, or grand mean, model) to the error that still remains.
§ In other words, the F-statistic is the ratio of the explained to the unexplained
variation.
§ We calculate this variation using sums of squares.
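The comparison between the null model and the group-means model can be made concrete with a short sketch. The scores are made up for illustration, and the variable names are assumptions of this example. The sketch only measures how much squared prediction error each model leaves; it does not compute F itself.

```python
from statistics import mean

# Hypothetical scores for three groups (illustrative values, not from the text).
scores = {
    "control": [3, 2, 1, 1, 4],
    "group1": [5, 2, 4, 2, 3],
    "group2": [7, 4, 5, 3, 6],
}

all_scores = [x for values in scores.values() for x in values]
grand_mean = mean(all_scores)

# Null model: predict the grand mean for every case.
error_null = sum((x - grand_mean) ** 2 for x in all_scores)

# Group-means model: predict each case's own group mean.
error_model = sum(
    (x - mean(values)) ** 2 for values in scores.values() for x in values
)

# Improvement in fit from using the group means instead of the grand mean.
improvement = error_null - error_model

print(round(error_null, 2), round(error_model, 2), round(improvement, 2))
```

The larger the improvement is relative to the error that remains, the stronger the evidence that the group means differ.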
• Total Sum of Squares
o To find the total amount of variation within our data we calculate the difference between
each observed data point and the grand mean.
§ We square these differences and add them to give us the total sum of
squares (SST): $SS_T = \sum_{i=1}^{N} (x_i - \bar{x}_{\text{grand}})^2$
o The variance and the sums of squares are related such that variance, s² = SS/(N
− 1), where N is the number of observations.
§ Therefore, we can calculate the total sum of squares from the variance of all
observations (the grand variance) by rearranging the relationship (SS = s²(N − 1)).
§ The grand variance is the variation between all scores, regardless of the group from
which the scores come.
o Graphically, the total sum of squares is the sum of the squared distances between each point and a
horizontal line that represents the mean of all scores (the grand mean).
o When we estimate population values, the degrees of freedom are typically one less than the
number of scores used to calculate the estimate.
§ This is because to get the estimates we hold something constant in the population
(e.g., to get the variance we hold the mean constant), which leaves all but one of
the scores free to vary.
§ For SST, we used the entire sample to calculate the sums of squares and so the total
degrees of freedom (dfT) are one less than the total sample size (N − 1).
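Both routes to the total sum of squares can be checked against each other in a short sketch. The scores are made up for illustration; `statistics.variance` from the Python standard library computes the sample variance with N − 1 in the denominator, which matches the relationship above.

```python
from statistics import mean, variance

# Hypothetical scores for three groups (illustrative values, not from the text).
scores = {
    "control": [3, 2, 1, 1, 4],
    "group1": [5, 2, 4, 2, 3],
    "group2": [7, 4, 5, 3, 6],
}

# The grouping is ignored here: SST uses every score and the grand mean.
all_scores = [x for values in scores.values() for x in values]
n_total = len(all_scores)
grand_mean = mean(all_scores)

# Route 1: sum of squared differences from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

# Route 2: rearranged variance formula, SS = s^2 * (N - 1).
ss_total_from_variance = variance(all_scores) * (n_total - 1)

# Total degrees of freedom: one less than the total sample size.
df_total = n_total - 1

print(round(ss_total, 2), round(ss_total_from_variance, 2), df_total)
```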
• Model Sum of Squares
o Because our model predicts the outcome from the means of our treatment groups, the
model sums of squares tell us how much of the total variation in the outcome can be
explained by the fact that different scores come from entities in different treatment
conditions.
o The model sum of squares is calculated by taking the difference between the values
predicted by the model and the grand mean.
§ When making predictions from group membership, the values predicted by the
model are the group means.
o Graphically, the model sum of squares is the sum of the squared distances between what the
model predicts for each data point and the overall mean of the outcome.
o The model sum of squares requires us to calculate the differences between each
participant’s predicted value and the grand mean.
§ These differences are squared and added together.
§ Given that the predicted value for participants in a group is the same
value (the group mean), the easiest way to calculate SSM is by using
$SS_M = \sum_{g=1}^{k} n_g (\bar{x}_g - \bar{x}_{\text{grand}})^2$, which means:
• Calculate the difference between the mean of each group
and the grand mean.
• Square each of these differences.
• Multiply each result by the number of participants within that group (ng).
• Add the values for each group together.
o For SSM, the degrees of freedom (dfM) are one less than the number of ‘things’ used to
calculate the SS.
§ We used the three group means, so dfM is the number of groups minus one (which
you’ll see denoted as k − 1).
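The four calculation steps for the model sum of squares translate directly into a loop. This is a minimal sketch with made-up scores; the variable names are assumptions of this example.

```python
from statistics import mean

# Hypothetical scores for three groups (illustrative values, not from the text).
scores = {
    "control": [3, 2, 1, 1, 4],
    "group1": [5, 2, 4, 2, 3],
    "group2": [7, 4, 5, 3, 6],
}

all_scores = [x for values in scores.values() for x in values]
grand_mean = mean(all_scores)

ss_model = 0.0
for values in scores.values():
    difference = mean(values) - grand_mean  # step 1: group mean minus grand mean
    squared = difference ** 2               # step 2: square the difference
    weighted = len(values) * squared        # step 3: multiply by the group size
    ss_model += weighted                    # step 4: add up across groups

# Model degrees of freedom: number of groups minus one (k - 1).
df_model = len(scores) - 1

print(round(ss_model, 2), df_model)
```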
• Residual Sum of Squares
o The residual sum of squares (SSR) tells us how much of the variation cannot be explained by
the model.
§ This value is the amount of variation created by things that we haven’t measured,
such as measurement error and individual differences in things that might affect
the outcome (happiness, in the book’s example).
o The simplest way to calculate SSR is to subtract SSM from SST (SSR = SST − SSM), but this
provides little insight into what SSR represents and, of course, if you’ve messed up the
calculations of either SSM or SST (or both!) then SSR will be incorrect also.
o The residual sum of squares is the difference between what the model predicts and what
was observed.
§ When using group membership to predict an outcome the values predicted by the
model are the group means.
o Graphically, the residual sum of squares is the sum of the squared distances between each point
and the horizontal line marking the mean of the group to which the score belongs.
• We already know that for a given participant, the model predicts the mean of the group to which that
person belongs.
o Therefore, SSR is calculated by looking at the difference between the score obtained by a
person and the mean of the group to which the person belongs.
o These distances between each data point and the group mean are squared
and added together to give the residual sum of squares, SSR:
$SS_R = \sum_{g=1}^{k} \sum_{i=1}^{n_g} (x_{ig} - \bar{x}_g)^2$
§ The sum of squares for each group is the sum of the squared differences
between each participant’s score in a group and the group mean, and the two sigma
signs mean that we repeat this calculation for the first participant (i = 1) through to
the last (n_g), in the first group (g = 1) through to the last (k).
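The double summation corresponds to two nested loops: the outer one over groups and the inner one over the participants within each group. The sketch below uses made-up scores and also checks the shortcut SSR = SST − SSM against the direct calculation; all variable names are assumptions of this example.

```python
from statistics import mean

# Hypothetical scores for three groups (illustrative values, not from the text).
scores = {
    "control": [3, 2, 1, 1, 4],
    "group1": [5, 2, 4, 2, 3],
    "group2": [7, 4, 5, 3, 6],
}

# Direct calculation of the residual sum of squares.
ss_residual = 0.0
for values in scores.values():  # outer sum: groups g = 1..k
    group_mean = mean(values)
    for x in values:            # inner sum: participants i = 1..n_g
        ss_residual += (x - group_mean) ** 2

# Shortcut: total sum of squares minus model sum of squares.
all_scores = [x for values in scores.values() for x in values]
grand_mean = mean(all_scores)
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
ss_model = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in scores.values())
ss_residual_shortcut = ss_total - ss_model

print(round(ss_residual, 2), round(ss_residual_shortcut, 2))
```

Computing SSR both ways is a useful check: if the two values disagree, one of the three sums of squares has been miscalculated.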