All subjects discussed in the Statistics part of the Onderzoeksmethoden voor Informatica (INFOB3OMI) course, clearly summarized in a structured way. Based on the lectures and the slides.
Definition
Data basics
Sampling strategies
Experiments
Summarizing data
Probability
General inference
Point estimates and sampling variability
Confidence Interval
Hypothesis Testing
Inference for categorical data
Inference for a single proportion
Inference for comparing proportions
Inference for numerical data
t-distribution
Difference in two means
Multiple comparisons
Correlation
Definition
Statistics tries to answer questions related to uncertainty and uses numerical evidence to draw
valid conclusions. Uncertainty arises in two situations:
- An event has not occurred yet
- An event has occurred, but we do not know the outcome for sure
Statistics is the science concerned with developing and studying methods for collecting,
analyzing, interpreting and presenting empirical data.
Data basics
Data matrix Matrix showing the data for various variables
- Rows are observations, columns are variables
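As a small illustration of the rows/columns layout, here is a minimal sketch of a data matrix in plain Python. The variable names and values are made up for this example and do not come from the course material.

```python
# Hypothetical data matrix: each row is one observation, each column one variable.
columns = ["name", "age", "temperature_c", "education"]
rows = [
    ["Anna", 21, 36.8, "bachelor"],
    ["Bram", 24, 37.1, "master"],
    ["Chris", 19, 36.5, "high school"],
]

def column(name):
    """Return all values of one variable (one column of the matrix)."""
    idx = columns.index(name)
    return [row[idx] for row in rows]

n_observations = len(rows)    # number of rows
n_variables = len(columns)    # number of columns
ages = column("age")          # all observed values of the variable "age"
```

In this made-up matrix, `age` would be a discrete numerical variable, `temperature_c` a continuous one, `education` an ordinal categorical variable and `name` a regular categorical variable.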
Types of variables
- Numerical: takes a numeric value
  - Continuous: for each value, there are other values that are arbitrarily close (e.g.
    temperature, infinitely many values)
  - Discrete: there is some distance between any two values; there is a jump from
    one value to the next with nothing in between (e.g. number of people, integers)
- Categorical: everything that is not numerical (numbers can be used to indicate a
  category, but then they serve only as labels and have no numerical meaning)
  - Ordinal: the categories have a natural ordering (e.g. education level)
  - Regular (also called nominal): everything else (e.g. names)
Positive relationship If one variable increases, the other also tends to increase
Associated vs independent
- Associated/dependent variables: two variables that show a connection with one another
  - Two variables that are associated with each other do not need to have a causal
    relation
- Independent variables: two variables that are not associated, no evident connection
Association strength The closer together the data points are plotted (around a common trend), the stronger the association
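One way to put a number on "how close together" the points lie is the Pearson correlation coefficient. The sketch below assumes that measure (the notes above do not name one) and uses two made-up data sets with the same upward trend but different amounts of scatter.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: both responses tend to increase with x (positive relationship).
x = [1, 2, 3, 4, 5, 6]
tight = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]   # points lie close to a straight line
loose = [5.0, 1.0, 8.0, 3.0, 12.0, 7.0]   # same direction, much more scatter

r_tight = pearson_r(x, tight)   # close to 1: strong association
r_loose = pearson_r(x, loose)   # clearly lower: weaker association
```

Both coefficients are positive, but the tightly clustered data gives a value much closer to 1 than the scattered data.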
Sampling strategies
Population of interest Group of people that are relevant to a study
Sample Subset of the population of interest
Census Data collected from the entire population instead of a subset
- Problems with this:
  - Hard-to-reach individuals are ... well, hard to reach
  - The population is changing constantly and measuring takes time
  - Taking a census is more complex than taking a sample
Descriptive statistics Describe the sample itself, without generalizing
- Taste a spoonful of soup and decide that the spoonful is not salty enough
Inferential statistics Generalize the result of a sample to infer an overall conclusion
- For an inference to be valid, the sample needs to be representative of the entire
population
- Stir the soup very well before tasting the spoonful
Sampling bias
- Non-response: if only a small subset of the sampled people choose to respond, the
sample may no longer be representative of the population
- Voluntary response: when the sample consists of people who volunteer to respond, it
may not be representative as these people might have very strong opinions on the issue
- Convenience sample: individuals who are most easily accessible may not be
representative of the population
  - For example, sampling random people in a city that is not easily accessible to
    disabled people will not be representative, as you are almost only sampling
    non-disabled people
- A large sample size does not guarantee that the sample is representative
(a huge sample can still be biased)
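The point that a huge sample can still be biased can be checked with a small simulation. This is only a sketch: the population, the true share of 30% and the response rates of 80% versus 20% are all made-up assumptions.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 1 = supports a proposal, 0 = does not (true share 30%).
population = [1] * 30_000 + [0] * 70_000
true_share = sum(population) / len(population)

# Small simple random sample: every person is equally likely to be selected.
small_random = random.sample(population, 500)

# Huge voluntary-response sample: supporters are ASSUMED to respond far more
# often (80%) than non-supporters (20%).
huge_voluntary = [
    person for person in population
    if random.random() < (0.8 if person == 1 else 0.2)
]

share_random = sum(small_random) / len(small_random)
share_voluntary = sum(huge_voluntary) / len(huge_voluntary)
```

Under these assumptions the small random sample lands near the true share, while the voluntary-response sample, although many times larger, strongly overestimates it.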
Observational studies Data is collected in a way that does not directly interfere with how the
data arises
- Results of these studies can generally be used to establish an association between
explanatory and response variables, but not to make causal statements
Sampling methods
- Simple random sampling: randomly selects cases from the population, without any
implied connection between the selected cases
- Stratified sampling: divide the population into strata, then take a simple random
sample from each stratum separately
  - Stratum: group made up of similar observations (e.g. distinguishing between
    male/female or grouping based on income level)
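The two sampling methods above can be sketched with Python's standard `random` module. The population, the income levels and the helper names `simple_random_sample` and `stratified_sample` are hypothetical and only serve as an illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of (person_id, income_level) pairs.
population = (
    [(i, "low") for i in range(0, 600)]
    + [(i, "middle") for i in range(600, 900)]
    + [(i, "high") for i in range(900, 1000)]
)

def simple_random_sample(cases, n):
    """Select n cases at random, without replacement, from the whole population."""
    return random.sample(cases, n)

def stratified_sample(cases, stratum_of, n_per_stratum):
    """Group cases into strata, then take a simple random sample from each stratum."""
    strata = {}
    for case in cases:
        strata.setdefault(stratum_of(case), []).append(case)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, n_per_stratum))
    return sample

srs = simple_random_sample(population, 30)
strat = stratified_sample(population, lambda case: case[1], 10)
```

With simple random sampling the small "high" group may end up with very few cases by chance, whereas the stratified sample is guaranteed to contain 10 cases from every income level.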