Chapter 14:
o Statistical methods can be used to summarize, simplify or condense data, but they can also be
used to help us draw conclusions based on the data that we have collected.
o 2 ways to use statistics and statistical techniques:
Descriptive statistics: Numbers or graphs that summarize or condense a set of sample
data. There is still a degree of uncertainty, because the sample may not be a true
reflection of the population.
Inferential statistics: A set of methods used to reach conclusions about populations based
on samples and probability. How likely is it that the sample represents the whole population?
o Levels of measurement
Nominal: Categories with no numeric scale. Ex. Females/Males
Ordinal: Rank ordering; the numeric values have limited meaning and only indicate relative
quantity. Ex. 2-star hotel
Interval: Numeric properties are literal; equal intervals between values are assumed, so it
shows how much the values differ. Ex. Temperature, intelligence
Ratio: Zero indicates absence of the variable measured; shows how much of the quantity
exists. Ex. Reaction time, age, weight.
o Descriptive statistics
Central tendency: It tells us about the score that the data tend to centre around. How an
entire group scores as a whole, or on average. Summarizes middle of the data. Measures:
Mean: The arithmetic average, the sum of the scores divided by the
number of scores. Suitable for interval or ratio data.
Median: The point that divides a set of scores into equal halves: half the
scores are above the median and half are below it. Order the scores from
lowest to highest and find the middle score (with an even number of
scores, take the average of the two middle scores). Suitable for
ordinal, interval or ratio data.
Mode: The score that occurs most frequently. When two different scores
tie for the highest frequency, there are two modes (=bimodal). When every
score is different, there is no single mode. Suitable for all levels of
measurement, including nominal data.
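As a minimal sketch of the three measures, the following uses Python's standard `statistics` module on a small, made-up list of scores (the data are purely illustrative):

```python
from statistics import mean, median, multimode

# Hypothetical scores, chosen so that two values tie for most frequent.
scores = [2, 3, 3, 5, 7, 8, 8]

average = mean(scores)      # sum of the scores divided by the number of scores (36 / 7)
middle = median(scores)     # middle score of the ordered scores
modes = multimode(scores)   # every score that ties for the highest frequency

print(f"mean   = {average:.2f}")   # mean   = 5.14
print(f"median = {middle}")        # median = 5
print(f"modes  = {modes}")         # modes  = [3, 8]
```

Here 3 and 8 both occur twice, so this made-up data set is bimodal.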
Which measure of central tendency should I use?
The mean is used most often, because it is the most sensitive measure: its
value is affected by the magnitude of each score. That sensitivity becomes a
problem with extreme scores (=scores that lie towards the extremes of a
distribution, sometimes defined as lying more than 3x the interquartile range
from the other scores). With extreme scores, it is better to use the median.
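To illustrate why the median holds up better than the mean when there are extreme scores, here is a small sketch with hypothetical reaction times in milliseconds (the numbers are invented for the example):

```python
from statistics import mean, median

# Hypothetical reaction times (ms); the last list adds one extreme score.
reaction_times = [310, 325, 330, 340, 355]
with_extreme = reaction_times + [2000]

mean_before, median_before = mean(reaction_times), median(reaction_times)
mean_after, median_after = mean(with_extreme), median(with_extreme)

print(mean_before, median_before)   # 332 330
print(mean_after, median_after)     # 610 335.0
```

One extreme score nearly doubles the mean, while the median moves by only 5 ms.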
Spread: The degree of dispersion or variability of scores in a data-set. To give a good
summary of a data-set, it is not enough to just give a measure of central tendency.
Range: The highest score minus the lowest score. It is the least informative
measure of spread, because it is based on only 2 scores and is very
sensitive to extreme scores/outliers.
Interquartile range: A range of scores that captures the middle 50 % of a
distribution, between the 25th and the 75th percentile. How to calculate?
First order the data from low to high, then split the data into 4 equal parts.
Position of Qlower = (N+1) / 4. Position of Qupper = 3(N+1) / 4. The
interquartile range is the score at Qupper minus the score at Qlower. The
median is also one of the cut points (the second quartile).
Variance: The average squared deviation from the mean. Uses all the
information from the scores: look at each score and how far it is from
the mean. Because the deviations are both negative and positive (and sum
to zero), you have to square them. Add all the squared deviations, then
divide the total by N-1.
Standard deviation: A measure of the spread of scores about the mean. It is
the square root of the variance, roughly the average deviation from the mean.
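The measures of spread above can be sketched in a few lines of Python. This is only an illustration on made-up scores. It assumes the lower quartile sits at position (N + 1) / 4, mirroring the 3(N + 1) / 4 position of the upper quartile, and the data are chosen so that both positions are whole numbers (otherwise you would have to interpolate between neighbouring scores).

```python
import math

# Hypothetical scores, ordered from low to high first.
scores = sorted([5, 2, 14, 4, 7, 4, 6])   # [2, 4, 4, 5, 6, 7, 14]
n = len(scores)

# Range: highest score minus lowest score.
score_range = scores[-1] - scores[0]

# Quartile positions (1-based); whole numbers here because N = 7.
lower_pos = (n + 1) // 4          # 2nd score
upper_pos = 3 * (n + 1) // 4      # 6th score
q_lower = scores[lower_pos - 1]
q_upper = scores[upper_pos - 1]
iqr = q_upper - q_lower

# Variance: sum of squared deviations from the mean, divided by N - 1.
mean_score = sum(scores) / n
squared_deviations = [(x - mean_score) ** 2 for x in scores]
variance = sum(squared_deviations) / (n - 1)

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

print(score_range, iqr, variance, round(sd, 2))   # 12 3 15.0 3.87
```

Note how the single high score (14) stretches the range to 12, while the interquartile range stays at 3.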
Summarizing data in tables and graphs
Frequency distributions: Arrangement of scores or categories with the
frequency of each score or category shown.
Histograms: A graph of the frequencies of a continuous variable,
constructed with contiguous vertical bars. Only for quantitative
variables.
Frequency polygon: A frequency distribution graph of a continuous
variable with frequency points connected by lines.
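A frequency distribution is easy to build by counting how often each score occurs. The sketch below uses `collections.Counter` on invented scores and prints a rough text version of the bars. It is a stand-in for a real histogram, not a plotting recipe.

```python
from collections import Counter

# Hypothetical scores on a 1-5 scale.
scores = [3, 1, 2, 3, 3, 2, 5, 4, 3, 2]

# Frequency distribution: each score paired with how often it occurs.
frequencies = Counter(scores)

# Print the scores in order, with one '#' per occurrence.
for score in sorted(frequencies):
    count = frequencies[score]
    print(f"{score} | {'#' * count} ({count})")
```

For these scores, 3 is the most frequent value (4 times), so it is also the mode of the distribution.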