All subjects discussed in the Statistics part of the Onderzoeksmethoden voor Informatica (INFOB3OMI) course, clearly summarized in a structured way. Based on the lectures and the slides.
Definition
Data basics
Sampling strategies
Experiments
Summarizing data
Probability
General inference
Point estimates and sampling variability
Confidence Interval
Hypothesis Testing
Inference for categorical data
Inference for a single proportion
Inference for comparing proportions
Inference for numerical data
t-distribution
Difference in two means
Multiple comparisons
Correlation
Definition
Statistics tries to answer questions related to uncertainty and uses numerical evidence to draw
valid conclusions. Uncertainty arises in two situations:
- The event has not occurred yet
- The event has occurred, but we do not know the answer for sure
Statistics is the science concerned with developing and studying methods for collecting,
analyzing, interpreting and presenting empirical data.
Data basics
Data matrix Matrix showing the data for various variables
- Rows are observations, columns are variables
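As a small illustration, a data matrix can be sketched in Python as a list of rows, one per observation. The variable names and values below are invented for the example and do not come from the course material.

```python
# Hypothetical data matrix: each row is one observation (a student),
# each column is one variable. All names and values are made up.
columns = ["name", "age", "education_level", "height_cm"]
rows = [
    ["Anna", 21, "bachelor", 168.5],
    ["Bram", 24, "master", 181.0],
    ["Chen", 19, "high school", 175.2],
]

def column(name):
    """Return all values of one variable (one column of the matrix)."""
    index = columns.index(name)
    return [row[index] for row in rows]

n_observations = len(rows)    # number of rows
n_variables = len(columns)    # number of columns
ages = column("age")          # one variable across all observations
print(n_observations, n_variables, ages)
```

Reading along a row gives everything known about one observation. Reading down a column gives one variable across all observations.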
Types of variables
- Numerical: takes numeric values
  - Continuous: for each value, there are other values that are arbitrarily close (e.g.
    temperature, infinitely many values)
  - Discrete: there is some distance between any two values; the variable jumps from
    one value to the next with nothing in between (e.g. number of people, integers)
- Categorical: everything that is not numerical (numbers can be used to indicate a
  category, but they then serve only as labels and have no numerical meaning)
  - Ordinal: the categories have a natural ordering (e.g. education level)
  - Regular (nominal): everything else (e.g. names)
Positive relationship If one variable increases, the other also tends to increase
Associated vs independent
- Associated/dependent variables: two variables that show a connection with one another
  - Two variables that are associated with each other do not have to have a causal
    relation
- Independent variables: two variables are not associated, no evident connection
Association strength The closer the data points lie together in a plot, the stronger the association
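To make the idea of association strength concrete, the sketch below simulates two scatterplots with the same underlying trend, one tightly clustered and one widely scattered. It uses Pearson's correlation coefficient as one possible numerical measure of strength. That choice, the simulated data, and the noise levels are assumptions made for illustration; the notes at this point only describe strength visually.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(1)  # fixed seed so the run is repeatable
xs = [rng.uniform(0, 10) for _ in range(200)]

# Same positive trend (y increases with x), different amounts of scatter.
tight = [x + rng.gauss(0, 0.2) for x in xs]   # points close to the trend
loose = [x + rng.gauss(0, 3.0) for x in xs]   # points far from the trend

r_tight = pearson_r(xs, tight)
r_loose = pearson_r(xs, loose)
print(f"tight scatter: r = {r_tight:.2f}")
print(f"loose scatter: r = {r_loose:.2f}")
```

Both relationships are positive, but the tightly clustered data yields a coefficient closer to 1, which matches the visual notion of a stronger association.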
Sampling strategies
Population of interest Group of people that are relevant to a study
Sample Subset of the population of interest
Census Sample of the entire population
- Problems with this:
  - Hard-to-reach individuals are, well, hard to reach
  - The population changes constantly and measuring takes time
  - Taking a census is more complex than taking a sample
Descriptive statistics Describe the sample itself, without generalizing beyond it
- Taste a spoonful of soup and decide that the spoonful is not salty enough
Inferential statistics Generalize the result of a sample to infer an overall conclusion
- For an inference to be valid, the sample needs to be representative of the entire
population
- Stir the soup very well before tasting the spoonful
Sampling bias
- Non-response: if only a small subset of the sampled people choose to respond, the
sample may no longer be representative of the population
- Voluntary response: when the sample consists of people who volunteer to respond, it
may not be representative as these people might have very strong opinions on the issue
- Convenience sample: individuals who are most easily accessible may not be
representative of the population
  - For example, sampling random people in a city that is not easily accessible to
    disabled people will not be representative, as you are mostly sampling non-disabled
    people
- Sample size alone says nothing about whether a sample is representative
  (a huge sample can still be biased)
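The point that a large sample can still be biased can be checked with a small simulation. The sketch below compares a large voluntary-response sample with a small simple random sample from the same population. The population, the satisfaction scores, and the response probabilities are all hypothetical assumptions chosen for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: satisfaction scores from 1 (low) to 10 (high).
population = [rng.randint(1, 10) for _ in range(100_000)]
true_mean = mean(population)

# Voluntary response (assumed behaviour): dissatisfied people (score <= 3)
# always respond, everyone else responds with probability 0.10.
voluntary = [s for s in population if s <= 3 or rng.random() < 0.10]

# Simple random sample: every individual is equally likely to be chosen.
srs = rng.sample(population, 500)

error_voluntary = abs(mean(voluntary) - true_mean)
error_srs = abs(mean(srs) - true_mean)

print(f"population mean:           {true_mean:.2f}")
print(f"voluntary (n={len(voluntary)}): {mean(voluntary):.2f}")
print(f"random    (n={len(srs)}):   {mean(srs):.2f}")
```

Under these assumptions the voluntary sample is many times larger, yet its mean lands far from the population mean, while the much smaller random sample comes close.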
Observational studies Data is collected in a way that does not directly interfere with how the
data arises
- Results of these studies can generally be used to establish an association between
explanatory and response variables, but not to make causal statements
Sampling methods
- Simple random sampling: randomly selects cases from the population, without any
implied connection between the selected cases
- Stratified sampling: the population is divided into strata, and a simple random
  sample is taken from each stratum separately
  - Stratum (plural: strata): a group made up of similar observations (e.g. distinguishing
    between male/female or grouping based on income level)
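The two sampling methods above can be sketched in a few lines of Python. The population, the income levels, and the sample sizes (30 in total, 10 per stratum) are hypothetical choices made for illustration.

```python
import random
from collections import Counter, defaultdict

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of 300 people: (person_id, income_level).
levels = ["low", "middle", "high"]
population = [(i, levels[i % 3]) for i in range(300)]

# Simple random sampling: draw cases from the whole population at once,
# without replacement, each case equally likely.
simple_sample = rng.sample(population, 30)

# Stratified sampling: first group the population into strata ...
strata = defaultdict(list)
for person in population:
    strata[person[1]].append(person)

# ... then take a simple random sample from each stratum separately.
stratified_sample = []
for level, members in strata.items():
    stratified_sample.extend(rng.sample(members, 10))

print("simple:    ", Counter(level for _, level in simple_sample))
print("stratified:", Counter(level for _, level in stratified_sample))
```

In the simple random sample the number of people per income level varies by chance. In the stratified sample every stratum is guaranteed to contribute exactly the chosen number of cases.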