Summary Multivariate Data Analysis (7th Edition) – Chapters 1-4 + 7 (2010 Edition)
Chapter 1 – Overview of Multivariate Methods
List of Key Terms
Measurement scales
Measurement error and multivariate measurements
Statistical significance versus statistical power
Classification of multivariate techniques
Chapter 2 – Examining Your Data
List of Key Terms
Graphical examination of the data
Missing data
Outliers
Testing the assumptions of Multivariate Analysis
Data transformations
Incorporating nonmetric data with dummy variables
Chapter 3 – Exploratory Factor Analysis
List of Key Terms
Stage 1: Objectives of Factor Analysis
Stage 2: Designing a Factor Analysis
Stage 3: Assumptions in Factor Analysis
Stage 4: Deriving Factors and assessing overall fit
Stage 5: Interpreting the Factors
Stage 6: Validation of Factor Analysis
Stage 7: Additional uses of Factor Analysis results
Schematic overview of each stage in Exploratory Factor Analysis
Chapter 4 – Regression Analysis
List of Key Terms
Stage 1: Objectives of Multiple Regression
Stage 2: Research design of a Multiple Regression Analysis
Stage 3: Assumptions in Multiple Regression Analysis
Stage 4: Estimating the Regression Model and assessing overall model fit
Stage 5: Interpreting the Regression Variate
Stage 6: Validation of the results
Schematic overview of each stage in Multiple Regression Analysis
Chapter 7 – Logistic Regression (page 314)
List of Key Terms
Stage 1: Objectives of Logistic Regression
Stage 2: Research design of Logistic Regression
Stage 3: Assumptions of Logistic Regression
Stage 4: Estimation of the Logistic Regression model and assessing overall fit
Stage 5: Interpretation of the results
Stage 6: Validation of the results
Chapter 1 – Overview of Multivariate Methods
List of Key Terms
Alpha (α) See Type I error
Beta (β) See Type II error
Bivariate partial correlation Simple (two-variable) correlation between two sets of residuals
(unexplained variances) that remain after the association of other
independent variables is removed.
Bootstrapping An approach to validating a multivariate model by drawing a large
number of subsamples and estimating models for each subsample.
Estimates from all the subsamples are then combined, providing not
only the “best” estimated coefficients, but also their expected variability
and thus their likelihood of differing from zero; that is, are the
estimated coefficients statistically different from zero or not? This
approach does not rely on statistical assumptions about the
population to assess statistical significance, but instead makes its
assessment based solely on the sample data.
Composite measure See summated scales
Dependence technique Classification of statistical techniques distinguished by having a
variable or set of variables identified as the dependent variable(s)
and the remaining variables as independent. The objective is
prediction of the dependent variable(s) by the independent
variable(s) (e.g. regression analysis).
Dependent variable Presumed effect of, or response to, a change in the independent
variable(s).
Dummy variable Nonmetrically measured variable transformed into a metric variable
by assigning a 1 or 0 to a subject, depending on whether it possesses
a particular characteristic.
Effect size Estimate of the degree to which the phenomenon being studied
exists in the population.
Independent variable Presumed cause of any change in the dependent variable.
Indicator Single variable used in conjunction with one or more other variables
to form a composite measure.
Interdependence technique Classification of statistical techniques in which the variables are not
divided into dependent and independent sets; rather, all variables are
analyzed as a single set (e.g. factor analysis).
Measurement error Inaccuracies of measuring the “true” variable values due to the
fallibility of the measurement instrument, data entry errors, or
respondent errors.
Metric data Also called quantitative data, interval data, or ratio data, these
measurements identify or describe subjects (or objects) not only on
the possession of an attribute but also by the amount or degree to
which the subject may be characterized by the attribute.
Multicollinearity Extent to which a variable can be explained by the other variables in
the analysis. As multicollinearity increases, it complicates the
interpretation of the variate because it is more difficult to ascertain
the effect of any single variable, owing to their interrelationships.
Multivariate analysis Analysis of multiple variables in a single relationship or set of
relationships.
Multivariate measurement Use of two or more variables as indicators of a single composite
measure.
Nonmetric data Also called qualitative data, these are attributes, characteristics, or
categorical properties that identify or describe a subject or object.
They differ from metric data by indicating the presence of an
attribute, but not the amount. Also called nominal data or ordinal
data.
Power Probability of correctly rejecting the null hypothesis when it is false,
that is, of correctly finding a relationship when it exists.
Determined as a function of (1) the statistical significance level set
by the researcher for a Type 1 error (α), (2) the sample size used in
the analysis, and (3) the effect size being examined.
Practical significance Means of assessing multivariate analysis results based on their
substantive findings rather than their statistical significance.
Whereas statistical significance determines whether the result is
attributable to chance, practical significance assesses whether the
result is useful in achieving the research objectives.
Reliability Extent to which a variable or set of variables is consistent in what it
is intended to measure. If multiple measurements are taken, the
reliable measures will all be consistent in their values. It differs
from validity in that it relates not to what should be measured, but
instead to how it is measured.
Specification error Omitting a key variable from the analysis, thus affecting the
estimated effects of included variables.
Summated scales Method of combining several variables that measure the same
concept into a single variable in an attempt to increase the
reliability of the measurement through multivariate measurement.
In most instances, the separate variables are summed and then their
total or average score is used in the analysis.
Treatment Independent variable the researcher manipulates to see the effect on
the dependent variable(s), such as in an experiment.
Type I error Probability of incorrectly rejecting the null hypothesis – in most
cases, it means saying a difference or correlation exists when it
actually does not. Typical levels are .05 or .01.
Type II error Probability of incorrectly failing to reject the null hypothesis – in
simple terms, the chance of not finding a correlation or mean
difference when it does exist. The value of (1-β) is defined as
power.
Univariate analysis of variance (ANOVA) Statistical technique used to determine, on the basis of one
dependent measure, whether samples are from populations with
equal means.
Validity Extent to which a measure or set of measures correctly represents
the concept of study – the degree to which it is free from any
systematic or nonrandom error. Validity is concerned with how well
the concept is defined by the measures, whereas reliability relates to
the consistency of the measure(s).
Variate Linear combination of variables formed in the multivariate
technique by deriving empirical weights applied to a set of variables
specified by the researcher.
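The Power, Type I error, and Type II error entries above say that power (1-β) depends on the significance level α, the sample size, and the effect size. A minimal sketch of that dependence follows. It assumes a two-sided, two-sample z-test with equal group sizes and a standardized mean difference as the effect size. This normal approximation is chosen here for illustration and is not taken from the book; the function name approx_power and its parameters are hypothetical.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size  -- standardized mean difference (assumed known)
    n_per_group  -- number of observations in each of the two groups
    alpha        -- Type I error level set by the researcher
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Expected value of the test statistic when the effect really exists.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Power rises with sample size, effect size, and a more lenient alpha.
for n in (20, 50, 100):
    print(n, round(approx_power(0.5, n), 3))
```

With an effect size of zero the function returns α itself, which matches the definition of a Type I error: the null hypothesis is true and is rejected only by chance.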
Multivariate analysis refers to all statistical techniques that simultaneously analyze multiple
measurements on individuals or objects under investigation, and is used to transform data into
usable knowledge.
The building block of multivariate analysis is the variate, a linear combination of variables
with empirically determined weights. The variables are specified by the researcher, whereas
the weights are determined by the multivariate technique to meet a specific objective. The
mathematical equation is as follows:
$$\text{Variate value} = w_1 X_1 + w_2 X_2 + w_3 X_3 + \dots + w_n X_n$$
$X_n$ is the observed variable and $w_n$ is the weight determined by the multivariate technique.
The result is a single value representing a combination of the entire set of variables that best
achieves the objective of the specific multivariate analysis. The variate captures the
multivariate character of the analysis: it is the focal point of the analysis. We must understand
not only its collective impact in meeting the technique’s objective, but also each separate
variable’s contribution to the overall variate effect.
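The variate equation above can be sketched in a few lines of code. The weights and observed values below are hypothetical numbers chosen only to make the arithmetic easy to follow. In a real analysis the weights would be estimated by the multivariate technique, not set by hand.

```python
def variate_value(weights, observations):
    """Return the weighted sum w_1*X_1 + ... + w_n*X_n for one subject."""
    if len(weights) != len(observations):
        raise ValueError("each observed variable needs exactly one weight")
    return sum(w * x for w, x in zip(weights, observations))


# Hypothetical weights for three variables and one respondent's scores.
weights = [0.5, 0.25, 0.25]
respondent = [4.0, 2.0, 6.0]

# 0.5*4.0 + 0.25*2.0 + 0.25*6.0 = 2.0 + 0.5 + 1.5
print(variate_value(weights, respondent))  # 4.0
```

Each term in the sum is one variable's contribution to the variate, which is why both the single combined value and the individual weights matter for interpretation.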
Measurement scales
Data can be classified into two categories: nonmetric (qualitative) and metric (quantitative).
Nonmetric data describe differences in type or kind by indicating the presence or absence of
an attribute. Nonmetric measurements can be made with two scales. The first are nominal
scales, which assign numbers to objects to indicate the presence or absence of an attribute.
The second are ordinal scales, with which variables can be ordered in relation to the amount
of the attribute possessed. Ordinal scales provide no measure of the actual amount or
magnitude in absolute terms, only the order of the values: the researcher knows the order, but
not the size of the difference between the values. This means that nonmetric data are quite
limited in their use in estimating model coefficients. In contrast to nonmetric data, metric
data are used when subjects differ in amount or degree on a particular attribute. The two
types are interval scales and ratio scales, which have constant units of measurement, so
differences between any two adjacent points on any part of the scale are equal, allowing
nearly any mathematical operation to be performed. The only real difference is that interval
scales use an arbitrary zero point, whereas ratio scales include an absolute zero point. It is
important to identify the type of measurement scale correctly, both to prevent nonmetric data
from being used incorrectly as metric data and to determine which multivariate techniques are
most applicable to the data.
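The practical consequence of an arbitrary versus an absolute zero point can be shown with a small example. The sketch below uses temperature in Celsius and Fahrenheit as interval scales and mass in kilograms and grams as a ratio scale. These examples and the helper names are assumptions chosen for illustration, not taken from the book.

```python
def celsius_to_fahrenheit(celsius):
    """Convert between two interval scales with different zero points."""
    return celsius * 9 / 5 + 32


def kilograms_to_grams(kilograms):
    """Convert between two units of a ratio scale (shared absolute zero)."""
    return kilograms * 1000


# Interval scale: the ratio of two values depends on the arbitrary zero.
low_c, high_c = 10.0, 20.0
ratio_celsius = high_c / low_c  # 2.0
ratio_fahrenheit = celsius_to_fahrenheit(high_c) / celsius_to_fahrenheit(low_c)  # 68 / 50

# Ratio scale: the ratio of two values survives a change of unit.
light_kg, heavy_kg = 2.0, 4.0
ratio_kg = heavy_kg / light_kg  # 2.0
ratio_g = kilograms_to_grams(heavy_kg) / kilograms_to_grams(light_kg)  # 4000 / 2000

print(ratio_celsius, ratio_fahrenheit)  # 2.0 1.36
print(ratio_kg, ratio_g)                # 2.0 2.0
```

On the interval scale, "twice as warm" holds in one unit and not in the other, so only differences are meaningful. On the ratio scale, "twice as heavy" holds in any unit.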