Statistics
The research process
Find something that needs explaining:
- Observe the real world
- Read other research to see whether someone else has made the same observation you
have, and whether there is a theory that might explain what you observe.
Generating theories and hypotheses
Theory: a hypothesized general principle or set of principles that explains known findings
about a topic and from which new hypotheses can be generated.
Hypothesis: a proposed explanation for a fairly narrow, as yet untested phenomenon or set of
observations.
Prediction: the hypothesis translated into something observable and therefore measurable.
Testing hypotheses through falsification
Falsification: the act of disproving a theory or hypothesis. You can only examine whether a
theory/hypothesis is credible if it is possible, in principle, to disprove it.
The principle of hypothesis testing
Example claim: women are more intelligent than men.
Point of departure = assumption that there is no difference
- This gives a point of comparison
- If no difference, then IQ (women) - IQ (men) = 0
- We can decide in advance: if I measure IQ in 1000 people, and the mean difference between
men and women is larger than 5 IQ points, then it is very unlikely that this difference is
due to chance alone.
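The reasoning above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the 1000 people are split into two groups of 500, and both groups are drawn from the same population with mean 100 and standard deviation 15 (the conventional IQ scale, which the notes do not specify). It counts how often a mean difference larger than 5 points appears when there is truly no difference.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

N_PER_GROUP = 500          # assumption: 1000 people split evenly
IQ_MEAN, IQ_SD = 100, 15   # assumption: conventional IQ scale
THRESHOLD = 5              # the 5 IQ-point cut-off from the notes
N_SIMULATIONS = 1000       # number of simulated studies


def simulated_difference():
    """Mean IQ difference between two groups drawn from the SAME population."""
    women = [random.gauss(IQ_MEAN, IQ_SD) for _ in range(N_PER_GROUP)]
    men = [random.gauss(IQ_MEAN, IQ_SD) for _ in range(N_PER_GROUP)]
    return statistics.fmean(women) - statistics.fmean(men)


differences = [simulated_difference() for _ in range(N_SIMULATIONS)]

# Share of simulated studies where chance alone produced a difference above 5.
share_above_threshold = (
    sum(abs(d) > THRESHOLD for d in differences) / N_SIMULATIONS
)
print(f"Share with |difference| > {THRESHOLD}: {share_above_threshold}")
```

Under these assumptions the share is very small, which is what makes a 5-point difference hard to attribute to chance.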
Types of hypotheses
Null hypothesis, H0: this is the one we try to reject.
- There is no effect (most of the time)
- Example: women are equally likely as men to wear a skirt or dress; there is no
relationship between age and the number of wrinkles you have.
The alternative hypothesis, H1: if we can reject H0, this one is SUPPORTED by the data,
but not proven.
- Women are more likely to wear a skirt or dress than men
- There is a positive relationship between age and the number of wrinkles you have: the
older people are, the more wrinkles they have.
Why do we need statistics?
- Statistics offer us a means to determine exactly how (un)likely it is that we would
observe a set of data if the null hypothesis were true.
- If it is unlikely (chances are smaller than 5%) we may conclude that there is support for
our alternative hypothesis.
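- In other words: we examine the probability of obtaining data like ours if the null
hypothesis were true. This is not the same as the probability that the null hypothesis is true.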
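One way to see what "how likely is this data if the null hypothesis were true" means is a permutation test. The sketch below uses two small made-up groups of scores (hypothetical data, not from the notes) and shuffles the group labels many times. The proportion of shuffles that produce a difference at least as large as the observed one serves as an approximate p-value. The permutation test is used here for illustration only; the notes do not name a specific test.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical scores for two groups (illustration only).
group_a = [12, 15, 14, 16, 13, 17, 15, 14]
group_b = [10, 11, 9, 12, 10, 11, 13, 9]

observed = statistics.fmean(group_a) - statistics.fmean(group_b)

# Under H0 the group labels are arbitrary, so we shuffle them repeatedly.
pooled = group_a + group_b
n_a = len(group_a)
n_permutations = 5000
at_least_as_extreme = 0

for _ in range(n_permutations):
    rng.shuffle(pooled)
    diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
    if abs(diff) >= abs(observed):
        at_least_as_extreme += 1

# Approximate probability of a difference this large if H0 were true.
p_value = at_least_as_extreme / n_permutations
print(f"Observed difference: {observed}, approximate p-value: {p_value}")
```

If the resulting p-value is below 0.05, the 5% rule from the notes says the data support the alternative hypothesis.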
Variables: independent vs. dependent variable
What to measure: variables
- A variable ‘varies’: it has different values.
Independent variable
- If experiment: the proposed cause, which is manipulated
- If survey: a predictor variable
Dependent variable
- If experiment: the proposed effect
- If survey: an outcome variable
- Measured, not manipulated
Two important questions you need to be able to answer:
1. What is the dependent variable, what is the independent variable?
2. What is the measurement level of my variables?
Research Designs
Two frequently used research designs:
- Experimental designs
- Correlational designs
Correlational designs:
- You measure/observe (perceived) reality
1. Examine associations
2. Predictor > outcome variable
Variables: measurement levels
Two main categories:
Categorical variables:
- Entities are divided into distinct categories
Continuous variables
- Entities get a distinct score
This distinction is important because it determines which analyses you can do.
Categorical (entities divided into distinct categories):
1. Binary or dichotomous variable: there are only two categories
2. Nominal variable: there are more than two categories
Binary and nominal variables only allow you to say whether something equals something or not
(equality).
3. Ordinal variable: the same as a nominal variable but the categories have a logical order.
Ordinal variables allow you to say something about the order of things (order).
Continuous (entities get a distinct score):
1. Interval variable: equal intervals on the variable represent equal differences in the
property being measured.
They allow you to say something about the distance between units (distance), as well as order
and equality.
2. Ratio variable: the same as an interval variable, but the ratios of scores on the scales
must also make sense.
They allow you to say something about the ratio between measurements (ratio).
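The difference between interval and ratio variables can be shown with temperature, an example chosen here for illustration (it is not in the notes). Celsius is an interval scale because its zero point is arbitrary, while Kelvin is a ratio scale because it has a true zero. The sketch below shows that differences agree across both scales, but ratios do not.

```python
# Two temperatures on the Celsius scale (interval level).
celsius_a, celsius_b = 10.0, 20.0

# The same temperatures on the Kelvin scale (ratio level).
kelvin_a = celsius_a + 273.15
kelvin_b = celsius_b + 273.15

# Distances are meaningful on both scales and agree with each other.
difference_celsius = celsius_b - celsius_a
difference_kelvin = kelvin_b - kelvin_a

# Ratios only make sense on the ratio scale: 20 degrees Celsius is
# not "twice as hot" as 10 degrees Celsius.
ratio_celsius = celsius_b / celsius_a
ratio_kelvin = kelvin_b / kelvin_a

print(f"Difference: {difference_celsius} C, {difference_kelvin:.2f} K")
print(f"Ratio on Celsius: {ratio_celsius}, ratio on Kelvin: {ratio_kelvin:.4f}")
```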
There are five key concepts to wrap your head around:
Standard error
Parameters
Interval estimates (confidence intervals)
Null hypothesis significance testing
Error
Scientists tend to use linear models, which are based on a straight line.
Parameters are not measured and are (usually) constants believed to represent the
fundamental truth about the relations between variables in the model.
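As an illustration of a linear model and its parameters, the sketch below fits a straight line to a handful of made-up points (hypothetical data) using ordinary least squares. The intercept and slope are the parameters: they are not measured directly but estimated from the data. The names `intercept` and `slope` are chosen here for the example.

```python
import statistics

# Hypothetical predictor (x) and outcome (y) scores.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mean_x = statistics.fmean(x)
mean_y = statistics.fmean(y)

# Least-squares estimates for the line: outcome = intercept + slope * predictor.
sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sum_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = sum_xy / sum_xx
intercept = mean_y - slope * mean_x

# The part of each score the line does not explain is the error.
errors = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```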
Graphs
They should do the following:
- Show the data.
- Induce the reader to think about the data being presented (rather than some other
aspect of the graph, like how pink it is).
- Avoid distorting the data.
- Present many numbers with minimum ink.
- Make large data sets (assuming you have one) coherent.
- Encourage the reader to compare different pieces of data.
- Reveal the underlying message of the data.
Properties of Frequency distributions
Skew
- The symmetry of the distribution.
- Positive skew (scores bunched at low values with the tail pointing to high values: or tail
to right)
- Negative skew (scores bunched at high values with the tail pointing to low values: or tail
to left)
Kurtosis
- The heaviness of the tails.
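Skew and kurtosis can be put into numbers. The sketch below uses the simple moment-based formulas (one common convention among several) on made-up scores. A positive skew value indicates a tail to the right, a negative value a tail to the left, and a positive excess kurtosis indicates heavier tails than a normal distribution.

```python
import statistics


def central_moment(scores, k):
    """Average of the k-th power of the deviations from the mean."""
    mean = statistics.fmean(scores)
    return sum((x - mean) ** k for x in scores) / len(scores)


def skewness(scores):
    """Moment-based skewness: third moment scaled by the variance."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5


def excess_kurtosis(scores):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3


# Hypothetical scores bunched at low values with a tail to the right.
low_bunched = [1, 1, 2, 2, 2, 3, 3, 4, 10]
# The mirror image: bunched at high values with a tail to the left.
high_bunched = [11 - x for x in low_bunched]

print(f"Positive skew example: {skewness(low_bunched):.2f}")
print(f"Negative skew example: {skewness(high_bunched):.2f}")
print(f"Excess kurtosis: {excess_kurtosis(low_bunched):.2f}")
```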
Central tendency and dispersion
Important features of a distribution:
Mode: the most frequent score.
It is most useful for non-numerical data. It can be problematic because a distribution can
have more than one mode.
- Bimodal: having two modes
- Multimodal: having several modes.
Mean: the average score.
Median: the middle score when the scores are arranged in order.
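The three measures of central tendency can be computed with Python's standard `statistics` module. The scores below are made up for illustration and were chosen to be bimodal.

```python
import statistics

# Hypothetical scores with two modes (3 and 7 both appear twice).
scores = [2, 3, 3, 5, 7, 7, 9]

modes = statistics.multimode(scores)  # all most-frequent scores
mean = statistics.fmean(scores)       # the average score
median = statistics.median(scores)    # the middle score once sorted

print(f"modes = {modes}, mean = {mean:.2f}, median = {median}")
```

Because there are two modes, reporting a single mode for these scores would be misleading, which is the problem the notes describe.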
Standard deviation: an estimate of the average variability of a set of data.
A deviation is the difference between the mean and an actual data point.
Deviations can be calculated by taking each score and subtracting the mean from it.
Variance
The sum of squares (the sum of the squared deviations) is a good measure of overall
variability, but it is dependent on the number of scores.
- We calculate the average variability by dividing the sum of squares by the number of
scores minus one (n - 1).
The variance has one problem: it is measured in units squared.
This isn't a very meaningful metric so we take the square root value. This is the standard
deviation.
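The steps from deviations to the standard deviation can be worked through on a small set of made-up scores. The sketch follows the notes: deviations from the mean, the sum of squares, the variance (dividing by n - 1), and the standard deviation as its square root. The result is cross-checked against `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

# Hypothetical scores (illustration only).
scores = [1, 2, 3, 3, 4]

mean = statistics.fmean(scores)

# Deviation: each score minus the mean.
deviations = [x - mean for x in scores]

# Sum of squares: add up the squared deviations.
sum_of_squares = sum(d ** 2 for d in deviations)

# Variance: average squared deviation, dividing by n - 1.
variance = sum_of_squares / (len(scores) - 1)

# Standard deviation: square root, back in the original units.
standard_deviation = math.sqrt(variance)

print(f"mean = {mean}, SS = {sum_of_squares:.2f}")
print(f"variance = {variance:.2f}, SD = {standard_deviation:.2f}")
```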
Things to remember:
- The mean is a simple statistical model
- The standard deviation/variance indicates how well the mean represents the data as a
whole.
- The sum of squares, variance, and standard deviation essentially represent the same
thing: the “fit” of the mean to the data, the variability in the data, how well the mean
represents the observed data, or the error.
Then why use both the variance and the standard deviation?
- The SD is better suited to describing the spread of the data, because it is in the same
units as the scores.
- The variance has importance in the more “advanced” statistics, such as Analysis of
Variance.
                     Sample       Population
Mean                 x̅            μ or “mu”
Standard deviation   s            σ or “sigma”
Type of value        Statistic    Parameter