Summary Using Multivariate Statistics + Lectures Repeated Measures
Course
Repeated Measures (PSMM2)
Institution
Rijksuniversiteit Groningen (RuG)
Book
Using Multivariate Statistics
Comprehensive summary of the relevant chapters of the book 'Using Multivariate Statistics' and notes + summary of the lectures given during the Repeated Measures course.
Summary Repeated Measures
Using Multivariate Statistics
By Barbara G. Tabachnick, Linda S. Fidell, Sixth Edition, Pearson New International Edition
Chapter 1: Introduction
1. Multivariate Statistics: Why?
Provide analysis when there are many independent variables (IVs) and/or many dependent variables
(DVs), all correlated with one another to varying degrees.
1.1 The Domain of Multivariate Statistics: Numbers of IVs and DVs
Multivariate statistical methods: extension of univariate and bivariate statistics. Multivariate
statistics: complete or general case, univariate/bivariate statistics: special cases of the multivariate
model. Two major types of variables:
1. Independent variables (IVs): differing conditions (e.g. treatment vs. placebo) to which you
expose your subjects, or the characteristics (e.g. tall or short) that the subjects themselves
bring into the research situation. Predictor variables: predict the DVs.
2. Dependent variables (DVs): response or outcome variables.
Univariate statistics= analyses in which there is a single DV, although there may be more than one IV (e.g. the
amount of social behavior of graduate students (the DV) is studied as a function of course load (one
IV) and type of training in social skills to which students are exposed (another IV)). Analysis of
variance is a commonly used univariate statistic.
Bivariate statistics= analysis of two variables, where neither is an experimental IV and the desire is to
study the relationship between the variables (e.g. the relationship between income and amount of
education). Usually not applied in experimental studies.
Multivariate statistics= simultaneously analyze multiple dependent and multiple independent
variables. Nonexperimental (correlational or survey) and experimental research.
1.2 Experimental and Nonexperimental Research
- Experimental research: researcher has control over the levels (or conditions) of at least one
IV to which a subject is exposed by determining what the levels are, how they are
implemented, and how and when cases are assigned and exposed to them. Experimenter
randomly assigns subjects to levels of the IV and controls all other influential factors by
holding them constant, counterbalancing, or randomizing their influence. Scores on the DV
are expected to be the same, within random variation, except for the influence of the IV.
Systematic differences in the DV associated with levels of the IV: differences attributed to
the IV (e.g. if groups of undergraduates are randomly assigned to the same material but
different types of teaching techniques, and afterward some groups of undergraduates
perform better than others, the difference in performance is said, with some degree of
confidence, to be caused by the difference in teaching technique). The value of the DV
depends on the manipulated level of the IV. The IV is manipulated by the experimenter and
the score on the DV depends on the level of the IV.
▪ Multiple IVs: research usually designed so that the IVs are independent of each other
and a straightforward correction for numerous statistical tests is available.
▪ Multiple DVs: problem of inflated error rate if each DV is tested separately. At least
some of the DVs are likely to be correlated with each other: separate tests of each
DV reanalyze some of the same variance.
- Nonexperimental (correlational or survey) research: levels of the IV(s) are not manipulated
by the researcher. Researcher can define the IV, but has no control over the assignment of
subjects to levels of it (e.g. groups of people may be categorized into geographic area of
residence (Northeast, Midwest, etc.), but only the definition of the variable is under
researcher control. Except for the military or prison, place of residence is rarely subject to
manipulation by a researcher). A naturally occurring difference like this is often considered
an IV and is used to predict some other nonexperimental (dependent) variable (e.g. income).
Distinction between IVs and DVs is usually arbitrary. IVs= predictors, DVs= criterion variables.
Very difficult to attribute causality to an IV: if there is a systematic difference in a DV
associated with levels of an IV, the two variables are said (with some degree of confidence)
to be related, but the cause of the relationship is unclear (e.g. income as a DV might be
related to geographic area, but no causal association is implied). Survey: many people are
surveyed, each respondent provides answers to many questions → producing a large
number of variables → variables usually interrelated in highly complex ways.
▪ Distinguish among subgroups in a sample (e.g. between Catholics and Protestants)
on the basis of a variety of attitudinal variables: several univariate t tests (or analyses
of variance) could examine group differences on each variable separately, but if the
variables are related, the results of many separate tests are misleading.
▪ Variables are related: multivariate statistical techniques to reveal and assess complex
interrelationships among variables. Possible to keep the overall Type I error rate at,
say, 5%, no matter how many variables are tested.
Multivariate statistics help the experimenter design more efficient and more realistic experiments by
allowing measurement of multiple DVs without violation of acceptable levels of Type I error. Whether
the data are experimental or correlational is not relevant to the choice of statistical technique.
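The inflated error rate from testing each DV separately can be illustrated with a small calculation. This is a minimal sketch that assumes the separate tests are independent, which correlated DVs are not, so the exact figure differs in practice; the function name is illustrative and not from the book.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n_tests
    independent tests, each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Error rate for 1, 5, and 10 separate tests at alpha = .05
for k in (1, 5, 10):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

Under this independence assumption, ten separate tests at the 5% level carry roughly a 40% chance of at least one false positive, which is the problem a single multivariate test is meant to avoid.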
1.3 Computers and Multivariate Statistics
IBM SPSS or SAS.
1.4 Garbage In, Roses Out?
Benefits:
- Increased flexibility in research design
- Provide insights into relationships among variables that may more closely resemble the
complexity of the “real” world
- Sometimes you get at least partial answers to questions that could not be asked at all in the
univariate framework
Downsides:
- Increased ambiguity in interpretation of results
- Results can be quite sensitive to which analytic strategy is chosen
- Results do not always provide better protection against statistical errors than their univariate
counterparts
- Occasionally you still cannot get a firm statistical answer to your research questions
2. Some Useful Definitions
2.1 Continuous, Discrete, and Dichotomous Data
- Continuous (interval, quantitative): measured on a scale that changes values smoothly
rather than in steps. Take on any values within the range of the scale. Size of the number
reflects the amount of the variable. Precision is limited by the measuring instrument, not by
the nature of the scale itself (e.g. time as measured on an old-fashioned analog clock face,
annual income, age, temperature, distance, and grade point average (GPA)).
- Discrete (nominal, categorical): take on a finite and usually small number of values. No
smooth transition from one value or category to the next (e.g. time as displayed by a digital
clock, continents, categories of religious affiliation, and type of community (rural or urban)).
Sometimes discrete variables are used in multivariate analyses as if continuous: there are
numerous categories and the categories represent a quantitative attribute (e.g. a variable
that represents age categories (1 stands for 0 to 4 years, 2 stands for 5 to 9 years, 3 stands
for 10 to 14 years, and so on up through the normal age span) can be used because there are
a lot of categories and the numbers designate a quantitative attribute (increasing age)).
Discrete variables composed of qualitatively different categories are sometimes analyzed
after being changed into a number of dichotomous or two-level variables (e.g. Catholic vs.
non-Catholic, Protestant vs. non-Protestant, Jewish vs. non-Jewish, and so on until the
degrees of freedom are used) =dummy variable coding. Goal: limit the relationship between
the dichotomous variables and others to linear relationships. A discrete variable with more
than two categories can have a relationship of any shape with another variable, and the
relationship is changed arbitrarily if the assignment of numbers to categories is changed.
- Dichotomous (nominal, categorical): only two points. Can have only linear relationships with
other variables, so they can be analyzed by methods using correlation, in which only linear relationships are analyzed.
Any continuous measurement may be rendered discrete (or dichotomous) with some loss of
information, by specifying cutoffs on the continuous scale.
Non-normally distributed continuous variables and dichotomous variables with very uneven splits
between the categories present problems to several of the multivariate analyses.
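The two recodings described above (numbered age categories and dummy variable coding) can be sketched as follows. This is an illustration only: the function names, the sample data, and the choice of "Other" as reference category are assumptions, not taken from the book.

```python
def age_category(age_years: int) -> int:
    """Map an age in whole years to the 5-year categories from the text:
    1 = 0 to 4 years, 2 = 5 to 9 years, 3 = 10 to 14 years, and so on."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    return age_years // 5 + 1


def dummy_code(values, reference):
    """Turn a qualitative variable into 0/1 indicator variables.

    The reference category gets no indicator of its own, so k categories
    yield k - 1 dichotomous variables (one per degree of freedom).
    """
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical sample of five respondents
religion = ["Catholic", "Protestant", "Jewish", "Catholic", "Other"]
print(age_category(12))
print(dummy_code(religion, reference="Other"))
```

Each resulting indicator has only two values, so its relationship with any other variable can only be linear, which is the goal stated above.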
Rank order (ordinal) scale= assigns a number to each subject to indicate the subject’s position vis-à-
vis other subjects along some dimension (e.g. ranks are assigned to contestants (first place, second
place, third place, etc.) to provide an indication of who is the best—but not by how much). Problem
with ordinal measures: their distributions are rectangular (one frequency per number) instead of
normal, unless tied ranks are permitted and they pile up in the middle of the distribution. Variable
with ambiguous measurement (e.g. the number of correct items on an objective test is technically
not continuous because fractional values are not possible, but it is thought to measure some
underlying continuous variable such as course mastery).
Likert-type scale= consumers rate their attitudes toward a product as “strongly like,” “moderately
like,” “mildly like,” “neither like nor dislike,” “mildly dislike,” “moderately dislike,” or “strongly
dislike.” Dichotomous variables may be treated as if continuous under some conditions.
2.2 Samples and Populations
Samples= measured to make generalizations about populations. Selected by some random process
so that they represent the population of interest. Population= group from which you were able to
randomly sample. Sampling:
- Nonexperimental research: investigate relationships among variables in some predefined
population. Elaborate precautions to ensure that you have achieved a representative sample
of that population → define population → do best to randomly sample from it.
- Experimental research: attempt to create different populations by treating subgroups from
an originally homogeneous group differently. Sampling objective: ensure that all subjects
come from the same population before you treat them differently. Random sampling here consists of
randomly assigning subjects to treatment groups (levels of the IV) to ensure that, before
differential treatment, all subsamples come from the same population. Statistical tests
provide evidence as to whether, after treatment, all samples still come from the same
population. Generalizations about treatment effectiveness are made to the type of subjects
who participated in the experiment.
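Random assignment of subjects to the levels of an IV can be sketched as below. This is one simple scheme (shuffle, then deal subjects out in turn), chosen for illustration; the function name, the subject IDs, and the two level labels are assumptions.

```python
import random


def randomly_assign(subjects, levels, seed=None):
    """Shuffle the subjects, then deal them out to the IV levels in turn,
    so that group sizes differ by at most one."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {level: [] for level in levels}
    for i, subject in enumerate(shuffled):
        groups[levels[i % len(levels)]].append(subject)
    return groups


# 20 hypothetical subject IDs assigned to two levels of the IV
groups = randomly_assign(range(1, 21), ["treatment", "placebo"], seed=1)
print(groups)
```

Because chance alone decides who lands in which group, the groups can be treated as samples from the same population before the differential treatment begins.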
2.3 Descriptive and Inferential Statistics
- Descriptive statistics: describe samples of subjects in terms of variables or combinations of
variables.
- Inferential statistical techniques: test hypotheses about differences in populations on the
basis of measurements made on samples of subjects. If reliable differences are found, descriptive
statistics are then used to provide estimations of central tendency, and the like, in the population
=parameter estimates.
2.4 Orthogonality: Standard and Sequential Analyses
Orthogonality= a perfect nonassociation between variables. The value of one variable gives no clue
as to the value of the other: the correlation between them is zero. Often desirable in statistical
applications. Tests of hypotheses about main effects and interactions are independent of each other:
the outcome of each test gives no hint as to the outcome of the others. In orthogonal experimental
designs with random assignment of subjects, manipulation of the levels of the IV, and good controls,
changes in value of the DV can be unambiguously attributed to various main effects and interactions.
Advantage if sets of IVs or DVs are orthogonal:
- If all pairs of IVs in a set are orthogonal, each IV adds to prediction of the DV (e.g. income as
a DV with education and occupational prestige as IVs. If education and occupational prestige
are orthogonal, and if 35% of the variability in income may be predicted from education and
a different 45% is predicted from occupational prestige, then 80% of the variance in income
is predicted from education and occupational prestige together).
Venn diagrams= represent shared variance (or correlation) as overlapping areas between two (or more)
circles (e.g. the total variance for income is one circle. The section with horizontal stripes
represents the part of income predictable from education, and the section with vertical stripes
represents the part predictable from occupational prestige; the circle for education overlaps the
circle for income 35% and the circle for occupational prestige overlaps 45%. Together, they account
for 80% of the variability in income because education and occupational prestige are orthogonal and
do not themselves overlap).
- Set of DVs is orthogonal: overall effect of an IV can be partitioned into effects on each DV in
an additive fashion.
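The additivity in the income example (35% + 45% = 80%) can be checked numerically. The sketch below uses the standard formula for the squared multiple correlation with two predictors; the correlation of 0.5 between education and occupational prestige in the second call is a made-up value for contrast, and the function name is illustrative.

```python
import math


def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """Squared multiple correlation of Y with X1 and X2, computed from the
    three pairwise correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


r_income_edu = math.sqrt(0.35)       # education alone predicts 35% of income
r_income_prestige = math.sqrt(0.45)  # prestige alone predicts 45% of income

# Orthogonal IVs (r_12 = 0): the two contributions simply add up
orthogonal = r_squared_two_predictors(r_income_edu, r_income_prestige, 0.0)
# Correlated IVs (hypothetical r_12 = 0.5): the joint contribution shrinks
correlated = r_squared_two_predictors(r_income_edu, r_income_prestige, 0.5)
print(round(orthogonal, 4), round(correlated, 4))
```

With orthogonal IVs the result is 0.80. With the hypothetical correlation of 0.5 it drops to about 0.54, because part of what each IV predicts is the same variance.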
Variables are correlated with each other =nonorthogonal. IVs in nonexperimental designs are often
correlated naturally; in experimental designs, IVs become correlated when unequal numbers of subjects
are measured in different cells of the design. DVs are usually correlated: individual differences
among subjects tend to be consistent over many attributes. When variables are correlated, they have
shared or overlapping variance (e.g. education and occupational prestige correlate with each other.
Although the independent contribution made by education is still 35% and that by occupational
prestige is 45%, their joint contribution to prediction of income is not 80%, but rather something
smaller due to the overlapping area shown by the arrow in Figure 2(a)).
Standard analysis= the overlapping variance contributes to the size of summary statistics of the
overall relationship but is not assigned to either variable. Overlapping variance is disregarded in
assessing the contribution of each variable to the solution (e.g. Figure 2(a) is a Venn diagram of a
standard analysis in which overlapping variance is shown as overlapping areas in circles; the unique
contributions of X1 and X2 to prediction of Y are shown as horizontal and vertical areas, respectively,
and the total relationship between Y and the combination of X1 and X2 is those two areas plus the
area with the arrow. If X1 is education and X2 is occupational prestige, then in standard analysis, X1 is
“credited with” the area marked by the horizontal lines and X2 by the area marked by vertical lines.
Neither of the IVs is assigned the area designated with the arrow. When X1 and X2 substantially
overlap each other, very little horizontal or vertical area may be left for either of them, despite the
fact that they are both related to Y. They have essentially knocked each other out of the solution).
Sequential analyses= researcher assigns priority for entry of variables into equations. First one to
enter is assigned both unique variance and any overlapping variance it has with other variables.
Lower-priority variables are then assigned on entry their unique and any remaining overlapping
variance (e.g. Figure 2(b), where X1 (education) is given priority over X2 (occupational prestige). The
total variance explained is the same as in Figure 2(a), but the relative contributions of X1 and X2 have
changed. Education now shows a stronger relationship with income than in the standard analysis,
whereas the relation between occupational prestige and income remains the same).
If variables are correlated, the overall relationship remains the same, but the apparent importance of
variables to the solution changes depending on whether a standard or a sequential strategy is used.
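The difference between the two strategies can be made concrete by partitioning the explained variance both ways. This is a sketch for two IVs only, using made-up correlations (35% and 45% of income predicted by each IV alone, and a hypothetical correlation of 0.5 between the IVs); the function names are illustrative.

```python
import math


def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Squared multiple correlation of Y with X1 and X2."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def partition_variance(r_y1, r_y2, r_12):
    """Split the explained variance under a standard and a sequential
    strategy, with X1 given priority in the sequential one."""
    total = r_squared_two_predictors(r_y1, r_y2, r_12)
    unique_x1 = total - r_y2**2  # what X1 adds beyond X2 alone
    unique_x2 = total - r_y1**2  # what X2 adds beyond X1 alone
    overlap = total - unique_x1 - unique_x2
    # Standard: the overlap counts toward the total but is credited to neither IV
    standard = {"X1": unique_x1, "X2": unique_x2, "unassigned": overlap}
    # Sequential: X1 enters first and is also credited with the overlap
    sequential = {"X1": unique_x1 + overlap, "X2": unique_x2, "unassigned": 0.0}
    return total, standard, sequential


total, standard, sequential = partition_variance(
    math.sqrt(0.35), math.sqrt(0.45), 0.5
)
print(round(total, 4))
print({k: round(v, 4) for k, v in standard.items()})
print({k: round(v, 4) for k, v in sequential.items()})
```

The total is identical under both strategies. Only the credit given to X1 changes, while X2 keeps the same share, which matches the description of Figure 2.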
3. Linear Combinations of Variables
Multivariate analyses combine variables to do useful work. The combination is mostly linear: each
variable is assigned a weight (e.g. W1) → products of weights and the variable scores are summed to
predict a score on a combined variable. Y’ (the predicted DV) is predicted by a linear combination of
X1 and X2 (the IVs).
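For two IVs the linear combination is Y’ = W1·X1 + W2·X2. A minimal sketch of that weighted sum follows; the weights and scores are arbitrary illustrative numbers, not estimates from any data set, and the function name is an assumption.

```python
def linear_combination(weights, scores):
    """Sum of weight * score products: W1*X1 + W2*X2 + ..."""
    if len(weights) != len(scores):
        raise ValueError("need exactly one weight per variable")
    return sum(w * x for w, x in zip(weights, scores))


# Hypothetical weights W1 = 0.5, W2 = 2.0 and scores X1 = 4.0, X2 = 3.0
y_pred = linear_combination([0.5, 2.0], [4.0, 3.0])
print(y_pred)  # 0.5*4.0 + 2.0*3.0 = 8.0
```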
Who am I buying this summary from?
Stuvia is a marketplace, so you are not buying this document from us, but from seller romyvandijk. Stuvia facilitates payment to the seller.