Statistics 1 summary

Course
Statistics 1 (GESTAT1)

Institution
Rijksuniversiteit Groningen (RuG)

Summary of all the lectures, computer practicals and seminars of Statistics 1 in the year 2022/2023. This includes personal notes and examples + figures from the PowerPoint presentations.

Uploaded on 24 January 2023
Number of pages 51
Written in 2022/2023
Type Summary

Uploaded by Enya96

STATISTICS 1
Week 1 - Lecture 1 & 2

Statistics is a guessing game.
You never know the parameter (the truth about the population); you can only hope that your estimate is close.

Population = The group that you wish to describe (The entire set of elements)
Sample = The group for which you have data (A subset of elements from the population,
taken with the intention of making inferences about the population)

Why take a Sample?
› Describing the whole population is:
• Too expensive
• Impossible
• Destructive (measuring an element may destroy it)
• Impractical
• Unnecessary

Parameter = Numerical property of the population (based on the entire population/ the truth)
Statistic = Numerical property of a sample (based on the sample only; used to estimate the parameter)

Sampling error
› A difference between the value of a parameter and the statistic computed to estimate that
parameter
› Result of:
• Variability
• Sampling Bias
• Nonsampling Error

Reducing Sampling Error
› Variability (this Lecture)
- Increase n
› Sampling Bias (this Lecture)
- Design of sampling procedure
› Nonsampling Error
- Validity, Accuracy, Precision of variables
- Prevent coding errors
- Prevent interpretation errors
- Also: good labelling, metadata

➔ You do have control over variability, sampling bias and nonsampling error; you want to
minimize them.

Variability = The phenomenon whereby repeated sampling from the same population results in
different values for the statistic.

Example: ask 5 students in a course group their age and compute the average. Ask again with 5
different students. How different are the two average ages?
That difference = variability (sample size and diversity of the population are important).
Statistically you want it to be as low as possible, to increase confidence in the result. The
solution is to increase the sample size.
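
The age example can be tried out with a small simulation. This is only a sketch: the course group of 200 students and their ages are made up, and the helper name `spread_of_sample_means` is hypothetical. It repeats the sampling many times and measures how much the average age differs between samples, for n = 5 and n = 50.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical course group: 200 students aged 18 to 30 (made-up data).
ages = [random.randint(18, 30) for _ in range(200)]


def spread_of_sample_means(population, n, repeats=2000):
    """Draw `repeats` samples of size n (without replacement) and return
    the standard deviation of their means as a measure of variability."""
    means = [statistics.mean(random.sample(population, n))
             for _ in range(repeats)]
    return statistics.stdev(means)


spread_n5 = spread_of_sample_means(ages, 5)
spread_n50 = spread_of_sample_means(ages, 50)

print(f"spread of the average age, n=5:  {spread_n5:.2f}")
print(f"spread of the average age, n=50: {spread_n50:.2f}")
```

The spread for n = 50 should come out clearly smaller than for n = 5, which is the "increase n" advice in practice.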

Sampling distribution = Describes how the statistic varies when sampling is repeated.
- In other words: describes (extent of) variability
- This is the basis for inference

Central Limit Theorem
Even if a variable X is not normally distributed in the population …
› … we may assume that, under certain conditions (such as a large number of cases n and a fixed,
finite standard deviation σ) …
› … the sampling distribution of the mean is approximately normal with standard error:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
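
A quick way to see the Central Limit Theorem at work is to simulate it. The sketch below assumes a strongly skewed population (exponential, with mean 1 and σ = 1) and a sample size of n = 40; both choices are illustrative, not from the lecture. It compares the observed spread of the sample means with the usual standard error σ/√n.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

n = 40          # sample size (assumed to be "large enough")
repeats = 5000  # number of repeated samples
sigma = 1.0     # standard deviation of an exponential population with rate 1
mu = 1.0        # mean of that same population

# Repeat the sampling and store the mean of every sample.
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(repeats)
]

observed_se = statistics.stdev(sample_means)
theoretical_se = sigma / math.sqrt(n)

# For a normal distribution about 68% of the values lie within one
# standard deviation of the mean; check this for the sample means.
within_one_se = sum(abs(m - mu) <= theoretical_se
                    for m in sample_means) / repeats

print(f"observed standard error:    {observed_se:.3f}")
print(f"theoretical sigma/sqrt(n):  {theoretical_se:.3f}")
print(f"share within one SE of mu:  {within_one_se:.2f}")
```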

Sampling Bias = Result of procedures which favour the inclusion, in your sample, of elements from
the population with certain characteristics. (make sure you have the right people in your sample)

› Sources of Sampling Bias: (a combination of) the
- population
- researcher
- research design
- research topic
- respondent
› May result in:
- incomplete coverage: relevant elements not in sampling frame
- nonresponse: refusal or missing data

➔ Increasing the sample size does not reduce sampling bias; a larger biased sample only increases the problem.

Population: reluctant to participate, does not trust science.
Researcher: are we able to see (reach) the whole population?

Difference between a probability and a non-probability sample: who makes the decisions (chance or the researcher).

Probability samples: driven by chance, which reduces sampling bias.
Non-probability samples: the researcher is in charge, which brings a risk of bias.
Judgemental: the researcher handpicks who is researched, based on suitability.
Volunteer: respondents put themselves forward ("hey, I want to be in your research").
Convenience: laziness; only ask people who happen to be there (e.g. people queuing), because it is easy and they have nowhere else to go.
Cluster (random): assumes that the population consists of groups that are similar to each other. Then it does not really matter which group you pick.
Stratified: opposite of cluster; the groups differ from each other. Maybe a different approach per group.
Systematic (random): the population is already ordered (example: student numbers). Select every 5th person, etc.
Simple random: the ideal case. Clear population + perfect list + random selection, so everyone has the same probability of being selected.
Independent: a trick for small populations. Keep the probability of being selected the same for every draw: take people out, ask the questions, put them back in the group (sampling with replacement).
Quota: targets ("find me 100 people of this kind") without the intention of being representative. It is just about getting the numbers.
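
The probability designs in this list can be sketched in a few lines. The sampling frame below (100 students spread over four course groups A to D) is made up for illustration, and the sample sizes are arbitrary.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 100 students, each in one of 4 course groups.
frame = [{"id": i, "group": "ABCD"[i % 4]} for i in range(100)]

# Simple random: every element has the same chance; drawn without replacement.
simple_random = random.sample(frame, 10)

# Independent: drawn with replacement, so the selection probability stays
# the same for every draw (an element can be selected more than once).
with_replacement = random.choices(frame, k=10)

# Systematic (random): random start, then every k-th element of the ordered list.
k = 10
start = random.randrange(k)
systematic = frame[start::k]

# Stratified: draw separately within every group (stratum).
strata = {}
for person in frame:
    strata.setdefault(person["group"], []).append(person)
stratified = [p for members in strata.values()
              for p in random.sample(members, 3)]

# Cluster (random): pick whole groups at random and include all their members.
chosen_groups = random.sample(sorted(strata), 2)
cluster = [p for g in chosen_groups for p in strata[g]]

print(len(simple_random), len(systematic), len(stratified), len(cluster))
```

Note the difference between the last two: the stratified sample contains a few people from every group, while the cluster sample contains everybody from a few groups.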

Difference between simple random and convenience: convenience takes the most convenient way and
disregards the population you would like to cover. Simple random takes a different approach: work
hard to cover the population and choose from that at random. If you are lucky, a convenience sample
can still be representative.

Example Public Transport Bureau = stratification: different groups of commuters. A clustered design
within a stratified group is possible. Not systematic, because you leave out all the people without passes.
➔ Exam: which groups do you want to research? Define the population and the sample: are they
different? Work your way up to the strategy you would choose and cover each group.
+ Definitions from the book. You do not need to memorize formulas: pick the right formula and apply it.

Geographic sampling:
- Traverse samples; lines
- Quadrat samples; squares
- Point samples; dots
You want it to be random.
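
A minimal sketch of how the three geographic sample types could be placed at random. The study area of 100 by 60 metres, the quadrat size and the numbers of samples are all assumptions made for the illustration.

```python
import random

random.seed(4)  # fixed seed so the sketch is reproducible

WIDTH, HEIGHT = 100.0, 60.0  # hypothetical study area in metres

# Point samples: random (x, y) locations inside the study area.
points = [(random.uniform(0, WIDTH), random.uniform(0, HEIGHT))
          for _ in range(5)]

# Quadrat samples: squares with a random lower-left corner, placed so that
# the whole square stays inside the study area. Stored as (x, y, side).
SIDE = 10.0
quadrats = [(random.uniform(0, WIDTH - SIDE),
             random.uniform(0, HEIGHT - SIDE),
             SIDE)
            for _ in range(3)]

# Traverse samples: straight lines from a random point on the left edge
# to a random point on the right edge. Stored as (start, end).
traverses = [((0.0, random.uniform(0, HEIGHT)),
              (WIDTH, random.uniform(0, HEIGHT)))
             for _ in range(2)]

print(points[0], quadrats[0], traverses[0])
```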

Processing of data
› How to deal with nonresponse
Distinguish:
• Choice of respondent
- Can still be regarded as a value
- “no opinion” still informs about the respondent's opinion
- “don’t know” still informs about the reason for nonresponse
• Other causes
- “no answer” does not inform about the position of the respondent
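
When processing the data, this distinction could look as follows. The answers are invented, and coding "no answer" as `None` is just one possible convention.

```python
from collections import Counter

# Hypothetical raw answers to one opinion question; None marks "no answer".
raw = ["agree", "no opinion", "disagree", None, "don't know", "agree", None]

# Answers chosen by the respondent stay in the data as regular values,
# including "no opinion" and "don't know".
valid = [a for a in raw if a is not None]
counts = Counter(valid)

# "No answer" says nothing about the position of the respondent,
# so it is reported separately as nonresponse.
nonresponse_rate = raw.count(None) / len(raw)

print(counts)
print(f"nonresponse: {nonresponse_rate:.0%}")
```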

Types of data
Qualitative (Non-numerical values)
› Categories
Quantitative (Numerical values: counts, measurements)
› Discrete; Range of possible values is limited (how many cars do you have? No decimals)
› Continuous; Intermediate values are also possible (height, which can be very specific. Also averages:
inhabitants have an average of 0.5 cars; the variable is then the number of cars per household, not
specifically about cars or inhabitants anymore.)

Measurement levels
› Nominal
- Categorical, no ranking
› Ordinal
- Categorical, ranked (low-high, bad-good etc.)
- Degrees of a certain phenomenon
- Width of intervals unknown
› Ratio (& Interval) = scale in SPSS
- Width of intervals known (= equidistance)
- We can compute differences
Difference between interval and ratio: ratio has a natural/absolute/true zero point.
Example: Celsius = interval (0 °C does not mean absence of temperature) and Kelvin = ratio (0 K is a true zero).
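
The Celsius and Kelvin example can be worked out with two arbitrary temperatures (10 °C and 20 °C, chosen for illustration). Differences are the same on both scales, but only the Kelvin ratio is meaningful.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to Kelvin."""
    return celsius + 273.15


cold_c, warm_c = 10.0, 20.0

# Differences can be computed on interval and on ratio level.
difference_c = warm_c - cold_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios only make sense with a true zero point.
ratio_c = warm_c / cold_c  # 2.0, but 20 °C is not "twice as warm" as 10 °C
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(f"difference: {difference_c:.1f} °C = {difference_k:.1f} K")
print(f"ratio in Celsius: {ratio_c:.2f}, ratio in Kelvin: {ratio_k:.3f}")
```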

Example grey colours: ordinal.
Example countries: nominal.
Example German political parties: nominal. A more specific variable (number of seats, degree of
conservativeness) changes the measurement level.
Example satisfaction: ordinal. Opinion, width unknown.

Binary variables (a.k.a. dummy or Boolean variables; measurement level = nominal)
› Two possible values: true or not true, yes or no, 1 or 0, agree or disagree.
› Special case of a nominal variable: mean = proportion of “1”. This makes it possible to calculate a
useful average!
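
A tiny check of "mean = proportion of 1", using ten invented answers coded 1 = agree and 0 = disagree:

```python
import statistics

# Hypothetical answers to "Do you agree?", coded 1 = agree, 0 = disagree.
answers = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

mean_value = statistics.mean(answers)
proportion_agree = answers.count(1) / len(answers)

print(f"mean of the binary variable: {mean_value}")
print(f"proportion that agrees:      {proportion_agree}")
```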

Choose suitable variables and measurement levels.

Exploratory Data Analysis
› Study data in order to describe key properties
- What do you see?
› For each variable
- Diagrams and / or tables
- Numerical summaries of distributions
› No single best way of doing EDA
- BUT: the starting point of any decent quantitative analysis!

Distributions (quality control: does the variable do what it is supposed to do?)
› Shape
› Center
› Spread
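
Numerical summaries for center, spread and shape can be computed with Python's `statistics` module. The variable below (commuting time in minutes for 11 respondents) is made up, and comparing mean and median is only a rough indication of shape, not a formal test.

```python
import statistics

# Hypothetical variable: commuting time in minutes for 11 respondents.
times = [12, 15, 15, 18, 20, 22, 25, 25, 30, 45, 90]

# Center
mean_time = statistics.mean(times)
median_time = statistics.median(times)

# Spread
stdev_time = statistics.stdev(times)
q1, q2, q3 = statistics.quantiles(times, n=4)  # quartiles
iqr = q3 - q1                                  # interquartile range

# Shape: a mean well above the median hints at a right-skewed distribution
# (a few very long commutes pull the mean upwards).
right_skewed = mean_time > median_time

print(f"mean {mean_time:.1f}, median {median_time}")
print(f"standard deviation {stdev_time:.1f}, IQR {iqr}")
print(f"right-skewed: {right_skewed}")
```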
