Statistics I: Description and Inference Notes on Readings - GRADE 8,5
Chapter 1 Why is my evil lecturer forcing me to learn statistics?
Chapter 1.3 The research process
The research process can be broadly summarized as in the figure below.
Figure: Research process
Chapter 1.5 Generating and testing theories and hypotheses
Theory: an explanation or set of principles that is well substantiated by repeated testing
and explains a broad phenomenon.
Hypothesis: a proposed explanation for a fairly narrow phenomenon or set of
observations (informed, theory-driven attempt).
Falsification: the act of disproving a hypothesis or theory.
Chapter 1.6 Collecting data: measurement
Variables
- Independent variable: a variable thought to be the cause of some effect (the
variable that has been manipulated).
- Dependent variable: a variable thought to be affected by changes in an
independent variable.
- Predictor variable: a variable thought to predict an outcome variable.
- Outcome variable: a variable thought to change as a function of changes in a
predictor variable.
Most hypotheses can be expressed in terms of two variables: a proposed cause and a
proposed outcome.
Levels of measurement
- Categorical variable: a variable made up of categories.
o Binary variable: a categorical variable that names just two distinct types of
things (two categories).
o Nominal variable: same as a binary variable, but with more than two categories.
o Ordinal variable: a categorical variable whose categories are ordered.
- Continuous variable: a variable that gives us a score for each person and can take
on any value on the measurement scale that we are using.
o Interval variable: equal intervals on the scale represent equal differences
in the property being measured.
o Ratio variable: has the same requirements as the interval variable, but as
an additional requirement the ratios of values along the scale should be
meaningful (the scale must have a true and meaningful zero point).
A continuous variable can be measured to any level of precision, whereas a discrete
variable can take on only certain values (usually whole numbers) on the scale.
Measurement error: the discrepancy between the numbers we use to represent the thing
we’re measuring and the actual value of the thing we’re measuring.
Validity: whether an instrument measures what it sets out to measure.
- Criterion validity: whether you can establish that an instrument measures what it
claims to measure through comparison to objective criteria (e.g. by relating scores
on your measure to real-world observations).
- Concurrent validity: when data are recorded simultaneously using the new
instrument and existing criteria.
- Predictive validity: when data from the new instrument are used to predict
observations at a later point in time.
- Content validity: the degree to which individual items represent the construct
being measured, and cover the full range of the construct.
Validity is a necessary but not sufficient condition of a measure: to be valid, the
instrument must first be reliable.
Reliability: whether an instrument can be interpreted consistently across different
situations (the ability of the measure to produce the same results under the same
circumstances).
- Test-retest reliability: a reliable instrument will produce similar scores at both
points in time.
Chapter 1.7 Collecting data: research design
In correlational or cross-sectional research, we observe what naturally goes on in the
world without directly interfering with it, whereas in experimental research we
manipulate one variable to see its effect on another.
- This type of research provides a very natural view of the question we’re
researching because we’re not influencing what happens and the measures of the
variables should not be biased by the researcher being there (this is an important
aspect of ecological validity).
- Correlational research tells us nothing about the causal influence of variables.
- In correlational research variables are often measured simultaneously (it provides
no information about the contiguity between different variables).
- A limitation of correlational research is the tertium quid.
Tertium quid: ‘a third person or thing of indeterminate character’.
Confounding variables (confounds): extraneous factors which can influence a correlation.
Longitudinal research: measuring variables repeatedly at different time points.
- Cause: ‘an object precedent and contiguous to another, and where all the objects
resembling the former are placed in like relations of precedency and contiguity to
those objects that resemble the latter’ (Hume).
- The only way to infer causality is through comparing two controlled situations:
one in which the cause is present and one in which the cause is absent (this is the
function of experimental methods; to provide a comparison of situations).
- In experiments there are two ways to manipulate the independent variable:
o By testing different entities (a between-groups, between-subjects, or
independent design).
o By using the same entities (a within-subject or repeated-measures design).
Systematic variation: due to the experimenter doing something in one condition but not
in the other condition.
Unsystematic variation: due to random factors that exist between the experimental
conditions (e.g. the time of day).
- By keeping the unsystematic variation as small as possible we get a more
sensitive measure of the experimental situation (in this case, randomization is
used).
Randomization is important because it eliminates most other sources of systematic
variation, which allows us to be sure that any systematic variation between experimental
conditions is due to the manipulation of the independent variable. Randomization can be
used in two ways:
- In the repeated measures design
o Practice effects: participants may perform differently in the second
condition because of familiarity with the experimental situation and/or the
measures being used.
o Boredom effects: participants may perform differently in the second
condition because they are tired or bored from having completed the first
condition.
By counterbalancing the order in which people complete the conditions, we ensure
that practice and boredom effects produce no systematic variation between our
conditions.
- The independent design
o To randomly allocate participants to conditions: by doing so you minimize
the risk that groups differ on variables other than the one you want to
manipulate.
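Random allocation in an independent design can be sketched in a few lines of Python; the participant labels and group names below are invented purely for illustration:

```python
import random

# Hypothetical participant pool (labels are made up for illustration).
participants = ["P01", "P02", "P03", "P04", "P05", "P06", "P07", "P08"]

random.seed(42)              # fixed seed so the example is reproducible
random.shuffle(participants) # random order = random allocation

# Split the shuffled pool into two equally sized conditions.
half = len(participants) // 2
experimental_group = participants[:half]
control_group = participants[half:]

print(experimental_group)
print(control_group)
```

Because allocation is random, any pre-existing differences between participants are spread across both conditions rather than varying systematically with the manipulation.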
Chapter 1.8 Analysing data
Frequency distribution (or: histogram): plotting a graph of how many times each score in
a set of data occurs.
Normal distribution: is characterized by the bell-shaped curve. This shape implies that
the majority of scores lie around the centre of the distribution (the largest bars on the
histogram are around the central value). Also, as we get further away from the centre,
the bars get smaller, implying that as scores start to deviate from the centre their
frequency is decreasing.
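A frequency distribution is just a count of how often each score occurs; a minimal text histogram in Python (the scores are invented for illustration):

```python
from collections import Counter

# Invented scores, purely for illustration.
scores = [3, 5, 4, 5, 6, 5, 4, 6, 5, 7, 4, 5]

frequencies = Counter(scores)  # maps each score to how many times it occurs

# Print one '*' per occurrence, lowest score first: the tallest
# bar sits at the central value, as in a roughly normal distribution.
for score in sorted(frequencies):
    print(f"{score}: {'*' * frequencies[score]}")
```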
A distribution can deviate from normal in two ways:
- Lack of symmetry (skew): the most frequent scores (the tall bars on the
graph) are clustered at one end of the scale.
o Positively skewed: the frequent scores are clustered at the lower end.
o Negatively skewed: the frequent scores are clustered at the higher end.
- Pointiness (kurtosis). Refers to the degree to which scores cluster at the ends of
the distribution (or: tails) and this tends to express itself in how pointy a
distribution is.
o Positive kurtosis: many scores in the tails (heavy-tailed distribution;
leptokurtic).
o Negative kurtosis: relatively thin in the tails and tends to be flatter than
normal (platykurtic).
In a normal distribution the values of skew and kurtosis are 0 (i.e. the tails of the
distribution are as they should be).
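Both quantities can be computed from the moments of the data. A sketch using the standard (Fisher) definitions, under which a normal distribution scores 0 on both:

```python
def moment(data, k):
    """k-th moment of the data about its mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def skewness(data):
    # Third standardized moment: 0 for perfectly symmetric data.
    return moment(data, 3) / moment(data, 2) ** 1.5

def excess_kurtosis(data):
    # Fourth standardized moment minus 3, so a normal distribution scores 0.
    return moment(data, 4) / moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
print(skewness(symmetric))         # 0.0: no skew
print(excess_kurtosis(symmetric))  # negative: flatter than normal (platykurtic)
```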
We can calculate where the centre of a frequency distribution lies (or: central tendency).
There are three measures:
- Mode: the score that occurs most frequently in the data set.
o Bimodal: data sets with two modes.
o Multimodal: data sets with more than two modes.
- Median: the middle score, when scores are ranked in order of magnitude.
- Mean: the average score (sum of all scores divided by the number of scores).
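Python's standard statistics module implements all three measures directly (the scores below are invented for illustration):

```python
import statistics

scores = [2, 3, 3, 4, 5, 5, 5, 6]  # invented scores for illustration

print(statistics.mode(scores))    # 5: the score that occurs most frequently
print(statistics.median(scores))  # 4.5: middle of the ranked scores
print(statistics.mean(scores))    # 4.125: sum of scores / number of scores
```

For bimodal or multimodal data sets, statistics.multimode returns every mode rather than just one.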
Range: a measure of the spread (dispersion) of scores; take the largest score and
subtract from it the smallest score.
- Since it uses only the highest and lowest score, it is affected by extreme scores.
- A way around this is to use the interquartile range (IQR).
Quartiles are the values that split the data into four equal parts of 25% each. The
second quartile (or: median) splits the data into two equal parts (50% of measurements).
The lower quartile is the median of the lower half; the upper quartile is the median of the
upper half.
- Rule of thumb: the median is not included when the two halves are split, which is
convenient when there is an odd number of values. It is, however, possible to
include it.
- Like the median, if each half of the data has an even number of values in it, then
the upper and lower quartiles are the average of two values in the data set.
Therefore, the upper and lower quartiles need not be values that actually
appear in the data.
- The interquartile range is the difference between the upper and the lower
quartile.
Quantiles: values that split a data set into equal portions; quartiles are quantiles
that split the data into four equal parts.
- Percentiles: points that split the data into 100 equal parts.
- Noniles: points that split the data into nine equal parts.
Figure: Interquartile range (normal distribution)
If we want to use all the data rather than half of it, we can calculate the spread of scores
by looking at how different each score is from the centre of the distribution. If we use the
mean as measure of the centre of distribution, we can calculate the difference between
each score and the mean, known as the deviance.
- Deviance: the distance of each score from the mean: deviance_i = x_i − x̄.
- Total deviance: all deviances added up; this always equals zero, because the
negative and positive deviances cancel out: Σ(x_i − x̄) = 0. To overcome this, we
square the deviances (a negative times a negative gives a positive).
Measures of dispersion or spread of data around the mean:
- Sum of squared errors (SS): the squared deviances added up, often called the
sum of squares: SS = Σ(x_i − x̄)². This can be used as an indicator of the total
dispersion, or total deviance, of scores from the mean.
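The deviances, their cancelling sum, and the sum of squares can be checked with a few lines of Python (the scores are invented for illustration):

```python
scores = [1, 3, 5, 7, 9]           # invented scores for illustration
mean = sum(scores) / len(scores)   # 5.0

deviances = [x - mean for x in scores]           # [-4, -2, 0, 2, 4]
total_deviance = sum(deviances)                  # 0: positives and negatives cancel
sum_of_squares = sum(d ** 2 for d in deviances)  # 16 + 4 + 0 + 4 + 16 = 40

print(total_deviance)   # 0.0
print(sum_of_squares)   # 40.0
```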