Written for Radboud Universiteit Nijmegen (RU), Bedrijfskunde, Statistics (MANMOR004)
Seller: lauraroetgerink
Summary Textbook Statistics
CH1 Why is my evil lecturer forcing me to learn statistics?
The research process
A theory consists of a set of principles that explains a general broad phenomenon. There is also a
critical mass of evidence to support the idea.
A hypothesis is NOT a prediction. Predictions emerge from a hypothesis, and transform it from
something unobservable into something that is observable (and therefore testable).
A good theory should allow us to make statements about the state of the world. Scientific
statements are ones that can be verified with reference to empirical evidence, whereas non-scientific
statements are ones that cannot be empirically tested. Non-scientific statements can sometimes be
altered to become scientific statements.
Falsification is the act of disproving a hypothesis or theory.
Most hypotheses can be expressed in terms of two variables: a proposed cause and a proposed
outcome. A variable that we think is a cause is known as an independent variable. A variable that we
think is an effect is called a dependent variable. Field prefers to use the terms predictor variable and
outcome variable in place of dependent and independent variable. In experimental work the cause is
a predictor, and the effect is an outcome, and in correlational work we can talk of one or more
variables predicting one or more outcome variables.
1.6.2 Levels of measurement
Variables can be categorical or continuous, and can have different levels of measurement. A
categorical variable is made up of categories: it is one that names distinct entities. In its simplest
form a categorical variable has just two categories (e.g. dead or alive); this is known as a binary
variable. When two things that are equivalent in some sense are given the same name, but there are
more than two possibilities, the variable is said to be a nominal variable. The only way that nominal
data can be used is to consider frequencies.
A continuous variable is one that gives us a score for each person and can take on any value on the
measurement scale that we are using. Continuous variables can be discrete. A truly continuous
variable can be measured to any level of precision, whereas a discrete variable can take on only
certain values (usually whole numbers) on the scale.
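As a minimal illustration of why nominal data only supports frequencies (the category labels below are invented for the example), counting is the one meaningful summary:

```python
from collections import Counter

# Hypothetical nominal variable: each participant's favourite lecturer.
# The names are categories, not quantities: averaging them is meaningless.
favourites = ["Andy", "Zoe", "Andy", "Andy", "Zoe", "Sam"]

# The only sensible summary of nominal data is how often each category occurs.
counts = Counter(favourites)
print(counts.most_common())  # [('Andy', 3), ('Zoe', 2), ('Sam', 1)]
```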
1.6.3. Measurement error
There will often be a discrepancy between the numbers we use to represent the thing we're
measuring and the actual value of the thing we're measuring. This discrepancy is known as
measurement error. Self-report measures will produce larger measurement error because factors
other than the one you're trying to measure will influence how people respond to them.
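A quick sketch of this idea (the true value and the error distribution are invented): each observation misses the true value by a random amount, but across many measurements the errors largely cancel out.

```python
import random

random.seed(42)  # reproducible illustration

# Invented example: the true quantity is 10.0, but every observation
# picks up random measurement error (here: Gaussian noise, sd = 1).
true_value = 10.0
observations = [true_value + random.gauss(0, 1) for _ in range(1000)]

one_reading = observations[0]             # a single noisy measurement
mean_reading = sum(observations) / 1000   # errors mostly cancel out here

print(one_reading, mean_reading)
```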
1.6.4 Validity and reliability
Criterion validity is whether you can establish that an instrument measures what it claims to measure
through comparison to objective criteria. When data are recorded simultaneously using the new
instrument and existing criteria, this is said to assess concurrent validity; when data from the new
instrument are used to predict observations at a later point in time, this is said to assess predictive
validity.
Assessing criterion validity is often impractical because objective criteria that can be measured easily
may not exist. Content validity refers to the degree the measure covers the full range of the
construct.
To be valid the instrument must first be reliable. The easiest way to assess reliability is to test the
same group of people twice: a reliable instrument will produce similar scores at both points in time
(test-retest reliability).
1.7 Collecting data: research design
There are two ways to test a hypothesis: either by observing what naturally happens, or by
manipulating some aspect of the environment and observing the effect it has on the variable that
interests us. In correlational or cross-sectional research we observe what naturally goes on in the
world without directly interfering with it, whereas in experimental research we manipulate one
variable to see its effect on another.
1.7.1 Correlational research methods
Correlational research provides a very natural view of the question we're researching because we're
not influencing what happens and the measures of the variables should not be biased by the
researcher being there (this is an important aspect of ecological validity). However, correlational
research tells us nothing about the causal influence of variables.
1.7.2 Experimental research methods
Even when the cause-effect relationship is not explicitly stated, most research questions can be
broken down into a proposed cause and a proposed outcome.
David Hume defined a cause as 'An object precedent and contiguous to another, where all the
objects resembling the former are placed in like relations of precedency and contiguity to those
objects that resemble the latter'. This definition implies that the cause needs to precede the effect,
and that causality is equated to high degrees of correlation between contiguous events.
In correlational research variables are often measured simultaneously. The first problem with doing
this is that it provides no information about the contiguity between different variables. Longitudinal
research addresses this issue to some extent, but there is still a problem with Hume's idea that
causality can be inferred from corroborating evidence, which is that it doesn't distinguish between
what you might call an 'accidental' conjunction and a causal one. A third person or thing 'of
determinate character' that influences the variables so that they co-occur is known as the tertium
quid. These extraneous factors are sometimes called confounding variables, or confounds.
The shortcomings of Hume's definition led John Stuart Mill to suggest that, in addition to a correlation
between events, all other explanations of the cause-effect relationship must be ruled out. To rule out
confounding variables, Mill proposed that an effect should be present when the cause is present and
that when the cause is absent, the effect should be absent also. This is what experimental methods
strive to do.
1.7.3 Two methods of data collection
When we use an experiment to collect data, there are two ways to manipulate the independent
variable. The first is to test different entities (between-groups, between-subjects, or independent
design). An alternative is to manipulate the independent variable using the same entities (within-
subject or repeated-measures design). The way in which the data are collected determines the type of
test that is used to analyse the data.
1.7.4 Two types of variation
Performance won't be identical, there will be small differences in performance created by unknown
factors. This variation in performance is known as unsystematic variation.
Differences in performance created by a specific experimental manipulation are known as systematic
variation.
When we use different participants, it is an independent design. Unsystematic variation will be
bigger in an independent design than in a repeated-measures design.
In both the repeated-measures design and the independent design there are always two sources of
variation:
• Systematic variation: This variation is due to the experimenter doing something in one
condition but not in the other condition.
• Unsystematic variation: This variation results from random factors that exist between the
experimental conditions.
Statistical tests are often based on the idea of estimating how much variation there is in
performance, and comparing how much of this is systematic to how much is unsystematic.
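As a sketch of that comparison (the scores are invented, and real tests such as the t-test formalise it more carefully), we can contrast the spread within conditions with the difference between condition means:

```python
import statistics

# Invented scores from two conditions of an independent design.
control      = [14, 12, 15, 13, 16]
experimental = [19, 17, 20, 18, 21]

# Unsystematic variation: the spread of scores *within* each condition,
# created by unknown random factors (here: the average sample variance).
within = (statistics.variance(control) + statistics.variance(experimental)) / 2

# Systematic variation: the difference *between* the condition means,
# created (we hope) by the experimental manipulation.
between = statistics.mean(experimental) - statistics.mean(control)

print(within, between)
```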
In a repeated-measures design, differences between two conditions can be caused by only two
things:
• The manipulation that was carried out on the participants, or
• Any other factor that might affect the way in which an entity performs from one time to the
next.
In an independent design differences between the two conditions can be caused by one of two
things:
• The manipulation that was carried out on the participants, or
• Differences between the characteristics of the entities allocated to each of the groups.
The latter factor is likely to create considerable random variation both within each condition and
between them.
In a repeated-measures design there is less noise than in an independent design and so the effect of
the experiment is more likely to show up.
1.7.5 Randomization
By keeping unsystematic variation as small as possible we get a more sensitive measure of the
experimental manipulation. Generally, scientists use the randomization of entities to treatment
conditions to achieve this goal. Many statistical tests work by identifying the systematic and
unsystematic sources of variation and then comparing them. This comparison allows us to see
whether the experiment has generated considerably more variation than we would have got had we
just tested participants without the experimental manipulation. Randomization is important because
it eliminates most other sources of systematic variation, which allows us to be sure that any
systematic variation between experimental conditions is due to the manipulation of the independent
variable.
The two most important sources of systematic variation in the repeated-measures design are:
• Practice effects: Participants may perform differently in the second condition because of
familiarity with the experimental situation and/or the measures being used.
• Boredom effects: Participants may perform differently in the second condition because they
are tired or bored from having completed the first condition.
We can ensure that these effects produce no systematic variation between our conditions by
counterbalancing the order in which a person participates in a condition.
In independent design the best way to reduce the systematic variation is to randomly allocate
participants to conditions.
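Both ideas can be sketched in a few lines (the participant labels and conditions are invented for the example):

```python
import random

random.seed(1)  # reproducible illustration

participants = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]

# Independent design: shuffle, then split, so that participant
# characteristics cannot systematically differ between the two groups.
random.shuffle(participants)
group_a, group_b = participants[:4], participants[4:]

# Repeated-measures design: counterbalance by giving half the
# participants condition A first and half condition B first, so practice
# and boredom effects do not pile up in one condition.
orders = [("A", "B") if i % 2 == 0 else ("B", "A")
          for i in range(len(participants))]

print(group_a, group_b, orders[:2])
```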
1.8 Analysing data
1.8.1. Frequency distributions
Once you've collected some data a very useful thing to do is to plot a graph of how many times each
score occurs. This is known as a frequency distribution, or histogram.
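A frequency distribution can be sketched without any plotting library (the exam scores below are invented):

```python
from collections import Counter

# Hypothetical exam scores (out of 10).
scores = [5, 7, 7, 8, 6, 7, 9, 8, 7, 6, 8, 7]

# Count how often each score occurs, then draw a crude text histogram.
freq = Counter(scores)
for score in sorted(freq):
    print(score, "#" * freq[score])
```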
There are two main ways in which a distribution can deviate from normal:
• Lack of symmetry (called skew)
• Pointyness (called kurtosis)
Kurtosis refers to the degree to which scores cluster at the ends of the distribution (known as the
tails), and this tends to express itself in how pointy a distribution is. A distribution with positive
kurtosis (heavy tails, pointy) is leptokurtic; one with negative kurtosis (thin tails, relatively flat, with
frequencies close to each other) is platykurtic.
In a normal distribution the values of skew and kurtosis are 0.
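A sketch of the moment-based definitions (note: statistical packages apply small-sample corrections on top of these population formulas, so their reported values will differ slightly):

```python
def skew_and_kurtosis(xs):
    """Moment-based skew and excess kurtosis (population formulas)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3   # "- 3" makes a normal distribution score 0
    return skew, kurtosis

# A perfectly symmetric sample has zero skew; this flat-ish sample
# also comes out platykurtic (negative kurtosis).
s, k = skew_and_kurtosis([1, 2, 3, 4, 5])
print(s, k)
```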
1.8.2 The mode
We can calculate where the center of a frequency distribution lies (known as the central tendency)
using three commonly used measures: the mode, the median and the mean.
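All three measures are in Python's standard library; the invented scores below include an outlier to show why they can disagree:

```python
import statistics

# Hypothetical scores with one extreme outlier (30).
scores = [2, 3, 3, 4, 5, 6, 30]

print(statistics.mean(scores))    # dragged upwards by the outlier
print(statistics.median(scores))  # the middle score: 4
print(statistics.mode(scores))    # the most frequent score: 3
```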