INTRODUCTION
What is statistics?
● A tool that enables you to test beliefs, expectations, and
predictions about the state of affairs in the world or the relationships
between them. The aim is to update your beliefs, models, or
theories.
● The discipline that concerns the collection, organization, analysis,
interpretation, and presentation of data.
(https://en.wikipedia.org/wiki/Statistics)
○ Data = a collection of recordings about things that occurred; it
can consist of observable units and manipulated variables
Inferential vs. Descriptive Statistics
Descriptive Statistics
● Summarize your data in a clear and correct way
Inferential Statistics
1. Make inferences about the population based on a sample
2. Allows you to generalize from your sample
Statistics is about estimates AND the uncertainty of these estimates
Example 1
● Experiment: You play heads or tails with a coin three times
● Result: The coin lands heads all three times
● Estimate: The coin is biased and will always land heads
● Coincidence plays a role!
● Uncertainty: You are not so certain because you only threw the coin 3 times
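To see how much coincidence can explain in Example 1, the probability that a fair coin gives three heads in three throws can be computed directly. This is a minimal sketch, assuming independent throws and a fair coin (probability of heads 0.5); the function name `prob_all_heads` is illustrative only.

```python
def prob_all_heads(n_throws, p_heads=0.5):
    """Probability that n independent throws all land heads."""
    return p_heads ** n_throws

# A fair coin still gives three heads in a row fairly often.
p_three = prob_all_heads(3)
print(p_three)  # 0.125

# With more throws, an all-heads result is much harder to explain by chance.
p_ten = prob_all_heads(10)
print(p_ten)
```

Three heads in three throws happens in 1 out of 8 experiments with a perfectly fair coin, which is why the estimate "the coin is biased" is still very uncertain.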
Example 2
● Estimate: Based on this class, the average IQ of the university is about 120
● Coincidence plays a role! Another class may lead to another result. If this class has an average IQ of 120, what is the probability
that the average IQ of the whole university is 80? Or 110? And how large is the class?
● Uncertainty:
○ We are 95% sure the average IQ of the university is between 100 and 130.
○ We are 95% sure the average IQ of the university is between 80 and 150.
● Uncertainty is as important as the estimate
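The link between sample size and uncertainty in Example 2 can be sketched with an approximate 95% confidence interval for a mean. This is an illustration under stated assumptions, not necessarily the method used later in the course: it uses the normal approximation (mean ± 1.96 standard errors), assumes a standard deviation of 15 IQ points, treats the class as a random sample of the university (which a single class usually is not), and the class sizes 4 and 100 are made-up numbers.

```python
import math

def approx_ci95(sample_mean, sd, n):
    """Approximate 95% confidence interval for a mean (normal approximation)."""
    standard_error = sd / math.sqrt(n)
    margin = 1.96 * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical values: class mean of 120, assumed standard deviation of 15.
small_class = approx_ci95(120, 15, n=4)
large_class = approx_ci95(120, 15, n=100)

print(small_class)  # wide interval: few students, a lot of uncertainty
print(large_class)  # narrow interval: many students, less uncertainty
```

The estimate (120) is the same in both cases; only the uncertainty around it changes with the size of the class.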
Statistics is Everywhere!
● How many people have COVID-19 in Belgium right now?
● Do vaccines help?
● Who won the presidential election?
● Who won the presidential election among black voters?
● Is CO2 output related to climate change?
● How many students pass this course each year?
● If you failed last year, what is the probability that you will succeed this year?
DISTRIBUTION OF DATA (ONE VARIABLE)
1. Data
● Put the data in a table and link the data together using the subject number
● Make sure all data use the same measurement scale
● Missing data can be represented (in SPSS) by either a dot (.) or a number (e.g.
-999) that you choose. Make sure the number cannot occur in your data.
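Why the missing-value code has to be handled explicitly can be shown outside SPSS as well. This is a minimal Python sketch with made-up exam scores and -999 as a hypothetical missing-value code: the code has to be filtered out before any calculation.

```python
from statistics import mean

MISSING = -999  # hypothetical missing-value code; must not be a possible score

# Made-up exam scores for five subjects; the third subject has no score.
scores = [12, 15, MISSING, 9, 14]

# Wrong: the missing-value code is treated as a real score.
naive_mean = mean(scores)

# Right: drop the missing values before summarizing.
valid_scores = [s for s in scores if s != MISSING]
clean_mean = mean(valid_scores)

print(naive_mean)
print(clean_mean)
```

If -999 could also be a real observation, the filter would silently remove valid data, which is why the code must be a value that cannot exist in the data.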
2. Types of Data
1. Quantitative vs Qualitative
2. Measurement level
a. Nominal
b. Ordinal
c. Interval
d. Ratio
3. Discrete vs Continuous
4. Dependent vs Independent
2.1. Quantitative vs. Qualitative
● The difference between quantitative and qualitative variables is very important because it determines how a variable can be used in statistics
Quantitative
● Always a number
● You can do math with it
● E.g.: Age, exam score, virus load, length
● You CAN ask a question like ‘what is the average age in this class’
Qualitative (or categorical)
● No math possible, one value is not bigger or better than the other
● You can only say whether two observational units (subjects) are the same or not (called equivalence)
● E.g.: Gender, race, country of origin, favorite movie
● You cannot ask a question like ‘what is the average sex in this class’
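The difference shows up in which summaries are meaningful. This is a minimal sketch with made-up data: an average works for a quantitative variable, while for a qualitative variable you can only count how often each category occurs.

```python
from collections import Counter
from statistics import mean

# Made-up data for five students.
ages = [19, 21, 20, 22, 19]  # quantitative
countries = ["Belgium", "France", "Belgium", "Spain", "Belgium"]  # qualitative

# Quantitative: arithmetic is meaningful.
average_age = mean(ages)

# Qualitative: only equivalence is meaningful, so count matching categories.
country_counts = Counter(countries)
most_common_country, frequency = country_counts.most_common(1)[0]

print(average_age)
print(most_common_country, frequency)
```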
2.2. Measurement Level
Nominal
● Same as a qualitative variable
Equivalence
● = the condition of being equivalent or essentially equal
● Example of equivalence for a nominal variable:
○ “Stephanie and Marc were both born in Country 1 (Belgium). So, Stephanie has the same country of origin as Marc.”
Ordinal
● Is the same as a nominal variable, except that the categories have a logical order
● Numbers denote an order relation
● Smaller / larger values can reflect better, bigger / worse
● Typical examples:
○ Letters to denote grades (grading system in the USA: A, B, C, D)
○ Rating scales: never, rarely, sometimes, often
○ Ranking in a competition (first, second, third)
● Somewhere between qualitative and quantitative
○ Some researchers use statistics specifically for ordinal scales (best option)
○ Some researchers use them as categorical variables without order (lose information)
○ Some researchers use them as quantitative variables
● Numbers represent a logical order
● You can say whether two observational units (subjects) are the same or not (called equivalence)
● You can say whether one of two observational units (subjects) is better or not (called order)
● The direction of the logical order can change: smaller numbers can denote better categories, or vice versa, larger numbers can denote better categories
Ordering
● = the different levels of a variable can be structured in a logical order
● Example of ordering for an ordinal variable:
○ “Alicia got an A and Britney got a B on last week’s test. Alicia performed better than Britney.”
Interval
● The variable has a standard unit, of which an increase with 1 (point) has the same meaning across the whole range of values
○ The difference between 14 and 15 degrees or between 30 and 31 degrees is the same
● Equal intervals on the variable represent equal differences in the property being measured
● Typical examples:
○ Degrees Celsius
○ Calendar year
● There is no natural zero point: 0 does not mean anything special
● Multiplications or divisions (ratios) do not make any theoretical sense.
○ For example, the difference between 68 degrees F and 58 degrees F is exactly the same as the difference between 101 degrees F and 91 degrees F. However, you cannot say that 98 degrees F is double the temperature, in terms of “heat” or “cold”, of 49 degrees F. This is because there is no absolute zero on the Fahrenheit scale: 0 degrees F does not mean that there is no temperature.
Ratio
● Fixed unit: an increase with 1 (point) has the same meaning across the whole range of values
● Absolute zero: a point where none of the quality being measured exists
● Ratios of the scores on the scale make theoretical sense.
○ “The screen of this TV is 43 inches. The screen of the other TV is 86 inches. The diagonal of the second screen is twice as big as the one of the first screen.”
● Typical examples:
○ Length
○ Response times
○ Age
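Why ratios only make sense on a ratio scale can be checked by converting units. This is a minimal sketch using the standard Fahrenheit-to-Celsius and inch-to-centimeter conversions, applied to the temperatures and screen sizes from the examples in this section; the function and variable names are illustrative.

```python
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

def inch_to_cm(inch):
    return inch * 2.54

# Interval scale: equal differences stay equal after a change of unit...
diff_low = fahrenheit_to_celsius(68) - fahrenheit_to_celsius(58)
diff_high = fahrenheit_to_celsius(101) - fahrenheit_to_celsius(91)

# ...but ratios do not survive the change of unit.
ratio_f = 98 / 49
ratio_c = fahrenheit_to_celsius(98) / fahrenheit_to_celsius(49)

# Ratio scale: the ratio is the same in every unit.
ratio_inch = 86 / 43
ratio_cm = inch_to_cm(86) / inch_to_cm(43)

print(diff_low, diff_high)
print(ratio_f, ratio_c)
print(ratio_inch, ratio_cm)
```

The "double" in Fahrenheit turns into a factor of almost 4 in Celsius, so the ratio says something about the unit and nothing about the temperature. The screen diagonal stays twice as large in inches and in centimeters.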
2.3. Discrete vs. Continuous Variables
Discrete
● Variable can only take certain values
● Limited number of potential values
● Examples: number of children in a family, religion
Continuous
● Variable can take any value on the scale (e.g., length)
● Can always take a value between two values (in principle it can take on any value)
● Examples: Reaction times, Length
In practice, consider a variable to be continuous if:
● Large number of possible values (e.g., test score)
● The underlying variable can be considered to be continuous
(e.g., math ability)
2.4. Dependent vs. Independent Variable
● The distinction mainly reflects how a variable is used in the model; in regression analysis it does not imply a causal relation
○ Gender → exam results: You can predict exam results with gender
○ Exam results → gender: You can predict gender with exam results (even though exam results will have no impact on
gender)
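That the roles are a modelling choice can be illustrated by fitting the same made-up data in both directions. This is a minimal sketch, assuming gender is coded 0/1 and using a hand-written least-squares slope; the helper `ols_slope` and all numbers are hypothetical, and predicting a 0/1 variable with a straight line is a simplification used here only to show the swap.

```python
from statistics import mean

def ols_slope(x, y):
    """Least-squares slope for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Made-up data: gender coded 0/1, exam results out of 20.
gender = [0, 0, 0, 1, 1, 1]
exam = [10, 12, 11, 14, 13, 15]

# Gender as independent variable, exam result as dependent variable.
slope_exam_on_gender = ols_slope(gender, exam)

# Roles swapped: exam result as independent variable, gender as dependent.
slope_gender_on_exam = ols_slope(exam, gender)

print(slope_exam_on_gender)
print(slope_gender_on_exam)
```

Both models can be estimated from the same data. The calculation itself says nothing about which variable influences the other.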