Statistics
The research process
Find something that needs explaining:
- Observe the real world
- Read other research to see whether someone else has made the same observation, and
whether there is a theory that might explain what you observe.
Generating theories and hypotheses
Theory: a general principle or set of principles that explains known findings about a topic
and from which new hypotheses can be generated.
Hypothesis: a proposed explanation for a fairly narrow phenomenon or set of observations
that has not yet been tested.
Prediction: the hypothesis translated into something observable, and thus measurable.
Testing hypotheses through falsification
Falsification: the act of disproving a theory or hypothesis. You can only examine whether a
theory or hypothesis is credible if it is possible to disprove it.
The principle of hypothesis testing
Example hypothesis: women are more intelligent than men
Point of departure = assumption that there is no difference
- This gives a point of comparison
- If there is no difference, then IQ (women) - IQ (men) = 0
- We can decide in advance: if I measure IQ in 1000 people, and the mean difference between
men and women is larger than 5 IQ points, then it is very unlikely that this difference is
a coincidence.
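The reasoning above can be sketched numerically. The snippet below is only an illustration and is not part of the original notes: it assumes 500 women and 500 men, normally distributed IQ scores, and a standard deviation of 15 in both groups (all three are assumptions chosen for the example). It asks how likely a mean difference of more than 5 points would be if the true difference were 0.

```python
from math import sqrt
from statistics import NormalDist

# Assumptions for illustration only (not from the notes):
n_women = 500      # assumed group size
n_men = 500        # assumed group size
sd_iq = 15.0       # assumed standard deviation of IQ in both groups
threshold = 5.0    # the 5 IQ-point difference mentioned in the notes

# Standard error of the difference between two independent group means.
se_diff = sqrt(sd_iq**2 / n_women + sd_iq**2 / n_men)

# If H0 is true, the observed mean difference is centred on 0
# and varies from sample to sample with spread se_diff.
null_distribution = NormalDist(mu=0.0, sigma=se_diff)

# Probability of a difference larger than the threshold in either direction.
p_beyond = 2 * (1 - null_distribution.cdf(threshold))

print(f"standard error of the difference: {se_diff:.3f}")
print(f"P(|difference| > {threshold}) if H0 is true: {p_beyond:.2e}")
```

With these assumed numbers the standard error is just under 1 IQ point, so a 5-point difference lies more than five standard errors away from zero.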
Types of hypotheses
Null hypothesis, H0: this is the one we try to reject.
- There is no effect (most of the time)
- Example: women are as likely as men to wear a skirt or dress; there is no
relationship between age and the number of wrinkles you have.
Alternative hypothesis, H1: if we can reject H0, this one is SUPPORTED by the data,
but not proven.
- Women are more likely to wear a skirt or dress than men.
- There is a positive relationship between age and the number of wrinkles you have: the
older people are, the more wrinkles they have.
Why do we need statistics?
- Statistics offer us a means to determine exactly how (un)likely it is that we would
observe a set of data if the null hypothesis were true.
- If it is unlikely (chances are smaller than 5%) we may conclude that there is support for
our alternative hypothesis.
- In other words: we examine how probable the data are if the null hypothesis is true. This
is not the same as the probability that the null hypothesis itself is true.
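To make "how unlikely the data would be if the null hypothesis were true" concrete, here is a small sketch with made-up data: a coin is flipped 20 times and lands heads 16 times, and H0 says the coin is fair. The coin example, the function name binomial_upper_tail and the numbers are assumptions for illustration, not part of the notes.

```python
from math import comb

def binomial_upper_tail(successes: int, trials: int, p_null: float) -> float:
    """Probability of at least `successes` successes in `trials` trials
    when each trial succeeds with probability `p_null` (i.e. under H0)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical data: 16 heads in 20 flips; H0: the coin is fair (p = 0.5).
p_value = binomial_upper_tail(16, 20, 0.5)
alpha = 0.05  # the 5% criterion from the notes

print(f"p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

The p-value here is the chance of seeing 16 or more heads if the coin were fair. It says nothing directly about the chance that the coin is fair.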
Variables: independent vs. dependent variable
What to measure: variables
- A variable ‘varies’: it has different values.
Independent variable
- If experiment: the proposed cause, which is manipulated
- If survey: a predictor variable
Dependent variable
- If experiment: the proposed effect
- If survey: an outcome variable
- Measured, not manipulated
Two important questions you need to be able to answer:
1. What is the dependent variable, what is the independent variable?
2. What is the measurement level of my variables?
Research Designs
Two frequently used research designs:
- Experimental designs
- Correlational designs
Correlational designs:
- You measure/observe (perceived) reality
1. Examine associations
2. Predictor > outcome variable
Variables: measurement levels
Two main categories:
Categorical variables:
- Entities are divided into distinct categories
Continuous variables
- Entities get a distinct score
This distinction is important because it determines which analysis you can do.
Categorical (entities divided into distinct categories):
1. Binary or dichotomous variable: there are only two categories
2. Nominal variable: there are more than two categories
Binary and nominal variables only allow you to say whether something equals something or not
(equality).
3. Ordinal variable: the same as a nominal variable, but the categories have a logical order.
Ordinal variables allow you to say something about the order of things (order), as well as
equality.
Continuous (entities get a distinct score):
1. Interval variable: equal intervals on the variable represent equal differences in the
property being measured.
They allow you to say something about the distance between units (distance), as well as order
and equality.
2. Ratio variable: the same as an interval variable, but the ratios of scores on the scales
must also make sense.
They allow you to say something about the ratio between measurements (ratio).
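The difference between interval and ratio level can be illustrated with temperature, an example that is not in the notes and is added here as an assumption: degrees Celsius have an arbitrary zero point (interval), while Kelvin has a true zero (ratio). Differences mean the same thing on both scales, but ratios only make sense in Kelvin.

```python
# Two temperatures in degrees Celsius (illustrative values).
celsius_a = 10.0
celsius_b = 20.0

def to_kelvin(celsius: float) -> float:
    """Convert degrees Celsius to Kelvin."""
    return celsius + 273.15

# Distances are meaningful at interval level: the difference is the same
# whether we express the temperatures in Celsius or in Kelvin.
diff_celsius = celsius_b - celsius_a
diff_kelvin = to_kelvin(celsius_b) - to_kelvin(celsius_a)

# Ratios are only meaningful at ratio level: "20 is twice 10" holds for the
# Celsius numbers, but not for the same temperatures expressed in Kelvin.
ratio_celsius = celsius_b / celsius_a
ratio_kelvin = to_kelvin(celsius_b) / to_kelvin(celsius_a)

print(f"difference: {diff_celsius:.2f} (Celsius) vs {diff_kelvin:.2f} (Kelvin)")
print(f"ratio:      {ratio_celsius:.3f} (Celsius) vs {ratio_kelvin:.3f} (Kelvin)")
```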
There are five key concepts to wrap your head around:
Standard error
Parameters
Interval estimates (confidence intervals)
Null hypothesis significance testing
Error
Scientists tend to use linear models, which are based on a straight line.
Parameters are not measured directly. They are (usually) constants, estimated from the data,
that are believed to represent the fundamental truth about the relations between variables
in the model.
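As a minimal sketch of what "parameters of a linear model" means, the snippet below fits a straight line, outcome = b0 + b1 * predictor, by least squares. The data and the names b0 (intercept) and b1 (slope) are assumptions made up for this illustration.

```python
from statistics import mean

# Hypothetical data, made up for illustration.
predictor = [1, 2, 3, 4, 5]
outcome = [2, 4, 5, 4, 5]

mean_x = mean(predictor)
mean_y = mean(outcome)

# Least-squares estimates of the two parameters of the straight line.
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, outcome))
sum_xx = sum((x - mean_x) ** 2 for x in predictor)
b1 = sum_xy / sum_xx          # slope: change in outcome per unit of predictor
b0 = mean_y - b1 * mean_x     # intercept: predicted outcome when predictor is 0

# The model's predictions and the error (residual) for each observation.
predicted = [b0 + b1 * x for x in predictor]
residuals = [y - y_hat for y, y_hat in zip(outcome, predicted)]

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

The parameters b0 and b1 are not observed in the data themselves. They are estimated from the scores, and the residuals show how far each observation lies from the fitted line.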
Graphs
They should do the following:
- Show the data.
- Induce the reader to think about the data being presented (rather than some other
aspect of the graph, like how pink it is).
- Avoid distorting the data.
- Present many numbers with minimum ink.
- Make large data sets (assuming you have one) coherent.
- Encourage the reader to compare different pieces of data.
- Reveal the underlying message of the data.
Properties of Frequency distributions
Skew
- The asymmetry (lack of symmetry) of the distribution.
- Positive skew: scores bunched at low values with the tail pointing to high values (tail
to the right)
- Negative skew: scores bunched at high values with the tail pointing to low values (tail
to the left)
Kurtosis
- The heaviness of the tails.
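Skew and kurtosis can be computed from the deviations around the mean. The sketch below uses the simple moment-based formulas; statistical software often applies slightly different, sample-adjusted versions, so treat the exact numbers as illustrative. The data and the function names are assumptions for this example.

```python
from statistics import fmean

def central_moment(data, order):
    """Average of the deviations from the mean raised to the given power."""
    m = fmean(data)
    return fmean([(x - m) ** order for x in data])

def skewness(data):
    """Moment-based skew: positive = tail to the right, negative = tail to the left."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

# Hypothetical scores bunched at low values with a tail to the right.
right_tailed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 10]
# The mirror image: bunched at high values with a tail to the left.
left_tailed = [11 - x for x in right_tailed]

print(f"skew, tail to right: {skewness(right_tailed):.2f}")
print(f"skew, tail to left:  {skewness(left_tailed):.2f}")
print(f"excess kurtosis:     {excess_kurtosis(right_tailed):.2f}")
```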
Central tendency and dispersion
Important features of a distribution:
Mode: the most frequent score.
It is most useful for non-numerical (categorical) data. It can be problematic because a
distribution can have more than one mode:
- Bimodal: having two modes
- Multimodal: having several modes.
Mean: the average score.
Median: the middle score when the scores are placed in order.
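The three measures of central tendency can be compared on one small data set. The scores below are made up for illustration. They contain two modes and one extreme score, which shows that the mean is pulled towards extreme scores while the median is not.

```python
from statistics import mean, median, multimode

# Hypothetical scores with two modes (3 and 5) and one extreme value (24).
scores = [2, 3, 3, 5, 5, 7, 24]

modes = multimode(scores)       # all most-frequent scores (bimodal here)
average = mean(scores)          # sum of the scores divided by their number
middle = median(scores)         # middle score once the scores are sorted

print("modes: ", modes)
print("mean:  ", average)
print("median:", middle)
```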
Standard deviation: an estimate of the average variability of a set of data.
A deviation is the difference between the mean and an actual data point.
Deviations can be calculated by taking each score and subtracting the mean from it.
Variance
The sum of squares (the sum of the squared deviations) is a good measure of overall
variability, but it depends on the number of scores.
- We calculate the average variability by dividing the sum of squares by the number of
scores minus 1 (n - 1).
The variance has one problem: it is measured in units squared.
This isn't a very meaningful metric, so we take the square root of the variance. This is the
standard deviation.
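The steps from deviations to standard deviation can be followed in a short worked example. The five scores are made up for illustration. The results are compared with Python's built-in statistics.variance and statistics.stdev, which also divide by n - 1.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical scores, made up for illustration.
scores = [1, 2, 3, 3, 4]

m = mean(scores)

# Deviation: each score minus the mean. The deviations always sum to
# (almost exactly) zero, which is why they are squared before adding up.
deviations = [x - m for x in scores]

# Sum of squares: total squared deviation; grows with the number of scores.
sum_of_squares = sum(d**2 for d in deviations)

# Variance: average squared deviation, dividing by n - 1.
sample_variance = sum_of_squares / (len(scores) - 1)

# Standard deviation: square root of the variance, back in the original units.
sample_sd = sqrt(sample_variance)

print(f"mean = {m}")
print(f"sum of squares = {sum_of_squares:.2f}")
print(f"variance = {sample_variance:.2f} (library: {variance(scores):.2f})")
print(f"standard deviation = {sample_sd:.2f} (library: {stdev(scores):.2f})")
```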
Things to remember:
- The mean is a simple statistical model
- The standard deviation/variance tells us how representative the mean is of the data as a
whole.
- The sum of squares, variance, and standard deviation essentially represent the same
thing: the “fit” of the mean to the data, the variability in the data, how well the mean
represents the observed data, or the error.
Then why use both the variance and the standard deviation?
- The SD is better suited to describing the spread of the data, because it is in the same
units as the scores.
- The variance has importance in the more “advanced” statistics, such as Analysis of
Variance.
                     Sample (statistic)    Population (parameter)
Mean                 x̅                     μ or “mu”
Standard deviation   s                     σ or “sigma”