Multivariate Data Analysis: Pearson International Edition
Summary of the book Multivariate Data Analysis (Hair et al.), chapters 1, 3, 4, 6, 11 and 12. These chapters are mandatory for the course 'Advanced Research Methods, part B' at Radboud University, Master Business Administration. The summary is quite large, but it contains a list of all definitions per...
Written for
Radboud Universiteit Nijmegen (RU)
Master Business Administration
Advanced Research Methods
Seller: Romygerritsen
Summary book Multivariate Data Analysis
Chapters 1, 3, 4, 6, 11, 12.
Chapter 1: Overview of Multivariate Methods
This chapter is an overview of multivariate analysis, starting with the key terms.
Alpha (α) See Type I error.
Beta (β) See Type II error.
Bivariate partial correlation Simple (two-variable) correlation between two sets of residuals
(unexplained variances) that remain after the association of other
independent variables is removed.
Bootstrapping An approach to validating a multivariate model by drawing a large
number of subsamples and estimating models for each subsample.
Estimates from all the subsamples are then combined, providing not
only the “best” estimated coefficients (e.g., means of each estimated
coefficient across all the subsample models), but their expected
variability and thus their likelihood of differing from zero; that is, are
the estimated coefficients statistically different from zero or not?
This approach does not rely on statistical assumptions about the
population to assess statistical significance, but instead makes its
assessment based solely on the sample data.
Composite measure See summated scales.
Dependence technique Classification of statistical techniques distinguished by having a
variable or set of variables identified as the dependent variable(s) and
the remaining variables as independent. The objective is prediction of
the dependent variable(s) by the independent variable(s). An
example is regression analysis.
Dependent variable Presumed effect of, or response to, a change in the independent
variable(s).
Dummy variable Nonmetrically measured variable transformed into a metric variable
by assigning a 1 or a 0 to a subject, depending on whether it
possesses a particular characteristic.
Effect size Estimate of the degree to which the phenomenon being studied (e.g.,
correlation or difference in means) exists in the population.
Independent variable Presumed cause of any change in the dependent variable.
Indicator Single variable used in conjunction with one or more other variables
to form a composite measure.
Interdependence technique Classification of statistical techniques in which the variables are
not divided into dependent and independent sets; rather, all variables
are analyzed as a single set (e.g., factor analysis).
Measurement error Inaccuracies of measuring the “true” variable values due to the
fallibility of the measurement instrument (e.g., inappropriate
response scales), data entry errors, or respondent errors.
Metric data Also called quantitative data, interval data, or ratio data, these
measurements identify or describe subjects (or objects) not only on
the possession of an attribute but also by the amount or degree to
which the subject may be characterized by the attribute. For
example, a person’s age and weight are metric data.
Multicollinearity Extent to which a variable can be explained by the other variables in
the analysis. As multicollinearity increases, it complicates the
interpretation of the variate because it is more difficult to ascertain
the effect of any single variable, owing to their interrelationships.
Multivariate analysis Analysis of multiple variables in a single relationship or set of
relationships.
Multivariate measurement Use of two or more variables as indicators of a single composite
measure. For example, a personality test may provide the answers to
a series of individual questions (indicators), which are then combined
to form a single score (summated scale) representing the personality
trait.
Nonmetric data Also called qualitative data, these are attributes, characteristics, or
categorical properties that identify or describe a subject or object.
They differ from metric data by indicating the presence of an
attribute, but not the amount. Examples are occupation (physician,
attorney, professor) or buyer status (buyer, nonbuyer). Also called
nominal data or ordinal data.
Power Probability of correctly rejecting the null hypothesis when it is false;
that is, correctly finding a hypothesized relationship when it exists.
Determined as a function of (1) the statistical significance level set by
the researcher for a Type I error (α), (2) the sample size used in the
analysis, and (3) the effect size being examined.
Practical significance Means of assessing multivariate analysis results based on their
substantive findings rather than their statistical significance. Whereas
statistical significance determines whether the result is attributable
to chance, practical significance assesses whether the result is useful
(i.e., substantial enough to warrant action) in achieving the research
objectives.
Reliability Extent to which a variable or set of variables is consistent in what it is
intended to measure. If multiple measurements are taken, the
reliable measures will all be consistent in their values. It differs from
validity in that it relates not to what should be measured, but instead
to how it is measured.
Specification error Omitting a key variable from the analysis, thus affecting the
estimated effects of included variables.
Summated scales Method of combining several variables that measure the same
concept into a single variable in an attempt to increase the reliability
of the measurement through multivariate measurement. In most
instances, the separate variables are summed and then their total or
average score is used in the analysis.
Treatment Independent variable the researcher manipulates to see the effect (if
any) on the dependent variable(s), such as in an experiment (e.g.,
testing the appeal of color versus black-and-white advertisements).
Type I error Probability of incorrectly rejecting the null hypothesis—in most cases,
it means saying a difference or correlation exists when it actually
does not. Also termed alpha (α). Typical levels are 5 or 1 percent,
termed the .05 or .01 level, respectively.
Type II error Probability of incorrectly failing to reject the null hypothesis—in
simple terms, the chance of not finding a correlation or mean
difference when it does exist. Also termed beta (β), it is inversely
related to Type I error. The value of 1 minus the Type II error (1 - β) is
defined as power.
Univariate analysis of variance Statistical technique used to determine, on the basis
of one dependent measure, whether samples are from populations
with equal means. Also called ANOVA.
Validity Extent to which a measure or set of measures correctly represents
the concept of study— the degree to which it is free from any
systematic or nonrandom error. Validity is concerned with how well
the concept is defined by the measure(s), whereas reliability relates
to the consistency of the measure(s).
Variate Linear combination of variables formed in the multivariate technique
by deriving empirical weights applied to a set of variables specified by
the researcher.
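The bootstrapping entry in the key terms above can be illustrated with a small sketch. It assumes the common form of the bootstrap, in which each subsample is drawn with replacement at the original sample size, and it uses a simple regression slope as the estimated coefficient. The data, the function name bootstrap_slope, and the use of a 95% percentile interval to express variability are all illustrative assumptions, not taken from the book.

```python
import random
import statistics


def bootstrap_slope(x, y, n_resamples=2000, seed=0):
    """Re-estimate a simple regression slope on many resamples of (x, y).

    Hypothetical helper: returns the mean slope across resamples and a
    95% percentile interval describing its variability.
    """
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_resamples):
        # Draw n cases with replacement (assumed resampling scheme).
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        sxx = sum((a - mx) ** 2 for a in xs)
        if sxx == 0:
            continue  # all x values identical: slope is undefined
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        slopes.append(sxy / sxx)
    slopes.sort()
    lower = slopes[int(0.025 * len(slopes))]
    upper = slopes[int(0.975 * len(slopes)) - 1]
    return statistics.fmean(slopes), (lower, upper)


# Made-up sample: y depends on x with a true slope of 2 plus random noise.
data_rng = random.Random(1)
x = [float(i) for i in range(30)]
y = [2.0 * xi + data_rng.gauss(0, 3) for xi in x]

mean_slope, (lower, upper) = bootstrap_slope(x, y)
# If the interval excludes zero, the coefficient is judged different from zero.
differs_from_zero = not (lower <= 0.0 <= upper)
```

The judgment about significance here rests only on the spread of the resampled estimates, which matches the entry's point that the assessment is based on the sample data rather than on distributional assumptions.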
What is multivariate analysis?
Nowadays we are drowning in information and starved for knowledge. The information available for
decision making exploded in recent years and will continue to do so in the future. Some of that
information can be analyzed and understood with simple statistics, but much of it requires more
complex, multivariate statistical techniques to convert these data into knowledge.
Multivariate analysis = all statistical techniques that simultaneously analyze multiple measurements
on individuals or objects under investigation: any simultaneous analysis of more than two variables.
Many multivariate techniques are extensions of univariate or bivariate analyses, like correlation,
simple regression and analysis of variance. Simple regression can for example be extended in the
multivariate case to include several predictor variables. The analysis of variance can be extended to
include multiple dependent variables. Thus, some multivariate techniques (e.g., multiple regression
and multivariate analysis of variance) provide a means of performing in a single analysis what once
took multiple univariate analyses to accomplish. Other multivariate techniques are uniquely designed
to deal with multivariate issues, such as factor analysis, which identifies the structure underlying a
set of variables, or discriminant analysis, which differentiates among groups based on a set of
variables. The definition in this book for multivariate analysis will include both multivariable
techniques (number of variables) and truly multivariate techniques.
The building block of multivariate analysis is the variate, a linear combination of variables with
empirically determined weights. The variables are specified by the researcher and the weights are
determined by the multivariate technique to meet a specific objective. Thus, the variate is the focal
point of the analysis.
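As a minimal sketch of what a variate is, the value for one respondent is the weighted sum w1*X1 + w2*X2 + ... + wn*Xn. The weights below are invented for illustration; in a real analysis they would be estimated by the multivariate technique, not chosen by hand.

```python
def variate_value(values, weights):
    """Return the linear combination w1*X1 + w2*X2 + ... + wn*Xn."""
    if len(values) != len(weights):
        raise ValueError("each variable needs exactly one weight")
    return sum(w * x for w, x in zip(weights, values))


# Hypothetical weights (normally determined by the technique).
weights = [0.5, 0.3, 0.2]
# Hypothetical scores of one respondent on three researcher-specified variables.
respondent = [4.0, 2.0, 5.0]

score = variate_value(respondent, weights)  # 0.5*4 + 0.3*2 + 0.2*5
```

The single number in score is what the technique then works with, which is why the variate, rather than any individual variable, is the focal point.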
Two categories of data:
1. Nonmetric (qualitative): describes differences in type or kind by indicating the presence or
absence of a characteristic or property. Possessing one feature excludes all the others.
For example, gender: if you are male, you cannot be female.
a. Nominal scales: assign numbers to label subjects. These numbers have no
quantitative meaning beyond indicating the presence or absence of the attribute.
Also known as categorical scales. Nominal data only represent categories or classes
and do not imply amounts of an attribute or characteristic. Examples: demographic
attributes (gender, religion, political party), forms of behavior (voting or purchase
behavior), or any other action that is discrete (it happens or it doesn’t happen).
b. Ordinal scales: variables can be ordered or ranked in relation to the amount of the
attribute possessed. Every subject can be compared to another in terms of ‘greater
than’ or ‘less than’. They are nonquantitative because they indicate only relative
positions, not the actual amount or magnitude in absolute terms. You
only know the order, but not the amount of difference between the values. This type
of measurement looks attractive, but it supports little analysis: no
sums, no averages, no multiplication or division.
2. Metric (quantitative): used when subjects differ in amount, magnitude or degree on a
particular attribute, like the level of job satisfaction or commitment.
a. Interval scales: have constant units of measurement, so differences between points
on the scale are equal. They have no absolute zero point, so they use an arbitrary one.
Examples are the Fahrenheit and Celsius temperature scales, where zero doesn’t mean
zero amount or lack of temperature; you can even go below zero. Therefore, it is not
possible to say that any value on an interval scale is a multiple of some other point on
the scale. You can’t say that 40 degrees is twice as hot as 20 degrees.
b. Ratio scales: also have constant units of measurement, but do have an absolute
zero point. This is the highest form of measurement precision. All mathematical
operations are permissible with ratio-scale measurements. For example, weighing
machines: 100 pounds is twice as heavy as 50 pounds.
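The difference between interval and ratio scales can be checked with a little arithmetic. The sketch below uses the temperature and weight examples from the text and adds a unit conversion as the test: a statement like "twice as much" is only meaningful if it survives a change of units. The function names are illustrative.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


def pounds_to_kilograms(lb):
    return lb * 0.45359237


# Interval scale: the ratio depends on the arbitrary zero point.
ratio_celsius = 40 / 20  # 2.0
ratio_fahrenheit = celsius_to_fahrenheit(40) / celsius_to_fahrenheit(20)  # 104 / 68

# Equal differences stay equal on an interval scale, whatever the units.
diff_a = celsius_to_fahrenheit(40) - celsius_to_fahrenheit(20)
diff_b = celsius_to_fahrenheit(30) - celsius_to_fahrenheit(10)

# Ratio scale: the absolute zero keeps the ratio intact after conversion.
ratio_pounds = 100 / 50
ratio_kilograms = pounds_to_kilograms(100) / pounds_to_kilograms(50)
```

The same two temperatures give a ratio of 2 in Celsius but about 1.53 in Fahrenheit, so "twice as hot" is an artifact of the scale. The weight ratio is 2 in both pounds and kilograms.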
Understanding these different types of measurement scales is important for two reasons:
1. So nonmetric data are not incorrectly used as metric data and vice versa. For example, you
can’t have an average gender.
2. To determine which multivariate technique is most applicable to the data.
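When a nonmetric variable is needed in a metric technique, the dummy variable from the key terms is the usual route: each subject gets a 1 or a 0 depending on whether it has a particular characteristic. A minimal sketch follows, using the occupation example from the glossary. The function name and the choice to leave one category out as a reference category are assumptions based on common practice, not something stated in this summary.

```python
def dummy_code(values, reference):
    """Turn a nominal variable into 0/1 dummy variables.

    One dummy is created per category except the reference category
    (an assumed convention), which is represented by all zeros.
    """
    categories = sorted(set(values) - {reference})
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


# Made-up sample of four subjects.
occupations = ["physician", "attorney", "professor", "attorney"]
coded = dummy_code(occupations, reference="professor")

first_subject = coded[0]   # the physician
third_subject = coded[2]   # the professor (reference category)
```

The 0/1 values only flag membership of a category. They do not place occupations on a scale, so the point about not averaging nonmetric data still applies to the original labels.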
Measurement error = the degree to which the observed values are not representative of the true
values. Sources include data entry errors, imprecise measurement, and the inability of respondents to provide accurate information.
Thus, all variables used in multivariate techniques must be assumed to have some degree of error.
This error adds noise to the observed variables. Reducing measurement error, although it takes
effort, time, and additional resources, may improve weak or marginal results and strengthen proven
results as well.
Reducing this measurement error can be done in different ways. To assess the degree of the error,
the researcher must address two important characteristics of a measure:
1. Validity: the degree to which a measure accurately represents what it is supposed to. If you
want to measure discretionary income, you should not ask for household income. You should
understand what is to be measured and make sure that the correct question is being asked.
2. Reliability: the degree to which the observed variable measures the true value and is error
free, so the opposite of measurement error. Researchers should use more reliable measures
and choose variables with the highest reliability.
In addition to reducing measurement error by improving individual variables, the researcher may
also choose to develop multivariate measurements, also known as summated scales, in which
several variables are joined in a composite measure to represent a concept. The objective of this is
to avoid the use of only a single variable to represent a concept and instead to use several variables
as indicators, all representing different facets of the concept. Multiple responses reflect the true
response more accurately than does a single response. The researcher should assess reliability and
incorporate scales into the analysis.
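The claim that multiple responses reflect the true response more accurately than a single one can be illustrated with a small simulation. This is a sketch under a strong assumption: every indicator equals the subject's true score plus independent random noise of the same size. The sample size, number of items and noise level are invented for illustration.

```python
import random
import statistics

rng = random.Random(42)
n_subjects, n_items, noise_sd = 2000, 5, 1.0  # hypothetical settings

single_errors = []
composite_errors = []
for _ in range(n_subjects):
    true_score = rng.gauss(0, 1)
    # Assumed model: each indicator = true score + independent noise.
    items = [true_score + rng.gauss(0, noise_sd) for _ in range(n_items)]
    # Summated scale, here taken as the average of the indicators.
    composite = statistics.fmean(items)
    single_errors.append(abs(items[0] - true_score))
    composite_errors.append(abs(composite - true_score))

mean_single_error = statistics.fmean(single_errors)
mean_composite_error = statistics.fmean(composite_errors)
```

Under these assumptions the averaged scale lands closer to the true score than the first indicator alone, because the random errors of the separate indicators partly cancel. If the indicators shared a systematic bias, averaging would not remove it, which is where validity comes in.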
All the multivariate techniques, except for cluster analysis and perceptual mapping, are based on the
statistical inference of a population’s values or relationships among variables from a randomly drawn
sample of that population. Interpreting statistical inferences requires the researcher to specify the
acceptable levels of statistical error that result from using a sample (known as sampling error). The
most common approach is to specify the level of Type I error, also known as alpha (α). Type I error is
the probability of rejecting the null hypothesis when it is actually true—generally referred to as a
false positive. By specifying an alpha level, the researcher sets the acceptable limits for error and
indicates the probability of concluding that significance exists when it really does not. When