Introduction to Statistics – Summary
Block 3, Week 1
Chapter One: Introduction to Statistics
- Statistics:
➢ Science of collecting, analysing, presenting, and interpreting data
➢ Branch of mathematics dealing with the collection, analysis, interpretation, and presentation
of masses of numerical data
- Statistics (statistical procedures): methods for organising, summarising, and interpreting data
- Purposes of statistics
➔ Organise and summarise information
➔ Answer the questions that initiated the research by determining general conclusions
- Population: entire set of individuals one wishes to study
- Sample: set of individuals selected from population, intended to represent population in
research study
- Variable: characteristic or condition that changes or has different values for different
individuals
- Data (plural): measurements or observations
- Data set: collection of measurements or observations
- Datum (singular): single measurement or observation, commonly called score / raw score
- Statistic: characteristic describing a sample, usually numerical value, derived from
measurements of individuals in sample
- Parameter: characteristic describing a population, usually a numerical value, derived from
measurements of individuals in population
- Sampling error: naturally occurring difference between a statistic and the corresponding
parameter (sample statistics are usually representative of parameters, but with a small discrepancy)
➔ fundamental error of inferential statistics
➔ “Margin of error”
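Sampling error can be made concrete with a small simulation. The sketch below assumes a made-up population of scores and uses the mean as both the statistic and the parameter; the function name `sampling_error` is hypothetical. Each random sample gives a slightly different statistic, and the difference from the parameter is the sampling error.

```python
import random
from statistics import mean

# Hypothetical population of scores (made-up values for illustration only).
population = [2, 4, 4, 5, 6, 7, 7, 8, 9, 10, 11, 12]

def sampling_error(pop, n, seed=None):
    """Draw a random sample of size n; return (statistic, parameter, error)."""
    rng = random.Random(seed)
    sample = rng.sample(pop, n)      # random sample without replacement
    statistic = mean(sample)         # numerical characteristic of the sample
    parameter = mean(pop)            # numerical characteristic of the population
    return statistic, parameter, statistic - parameter

# Different samples from the same population give different statistics.
for seed in range(3):
    stat, param, err = sampling_error(population, n=4, seed=seed)
    print(f"sample mean = {stat:.2f}, population mean = {param:.2f}, error = {err:+.2f}")
```

If the "sample" is the whole population, the error is zero, which is why the discrepancy is a property of sampling rather than of the measurements themselves.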
- Two broad categories of statistical methods: descriptive and inferential
- Descriptive statistics: statistical procedures used to organise, summarise, and simplify data
- Inferential statistics: use sample data to draw inferences about populations
- Descriptive research: research studies conducted to describe individual variables as they exist
naturally
- Correlational method: examining relationships between variables by measuring two different
variables for each individual
➔ Measure and describe relationships, but no cause-and-effect explanation
➔ Numerical values: scatter plot
➔ Non-numerical values (for both scores): table, chi-square test
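For non-numerical scores, the table mentioned above simply counts how many individuals fall into each combination of categories. The sketch below uses hypothetical data (preferred hand and a yes/no answer per individual); the variable and function names are illustrative assumptions, and the chi-square test itself is not shown.

```python
from collections import Counter

# Hypothetical data: two non-numerical scores per individual
# (preferred hand, answer to a yes/no question). Made up for illustration.
people = [
    ("left", "yes"), ("right", "no"), ("right", "yes"), ("right", "no"),
    ("left", "no"), ("right", "yes"), ("left", "yes"), ("right", "no"),
]

def cross_tab(pairs):
    """Count individuals in each combination of the two variables' categories."""
    counts = Counter(pairs)
    rows = sorted({a for a, _ in pairs})
    cols = sorted({b for _, b in pairs})
    return rows, cols, counts

rows, cols, counts = cross_tab(people)
print("".ljust(8) + "".join(c.ljust(6) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(counts[(r, c)]).ljust(6) for c in cols))
```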
- Experimental method / experimental research strategy: examining relationships between
variables by manipulating an independent variable to create different treatment conditions
and then measuring a dependent variable to obtain a group of scores in each condition; the
groups of scores are then compared
➔ Systematic difference between groups: evidence that changing IV causes change in DV
➔ All other variables controlled (avoid confounds):
- Participant variables: characteristics such as age or gender that vary from one individual
to another
- Environmental variables: characteristics of the environment
➔ Control of variables through random assignment, matching, holding variables constant
➔ Intent: demonstrate cause-and-effect relationship
➔ Numerical scores: comparing averages
➔ Non-numerical categories: computing proportions for each group and comparing them
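The two comparison strategies above can be sketched in a few lines. The data are hypothetical: numerical scores and pass/fail outcomes for a control and an experimental condition, with illustrative function names. Numerical scores are compared via group averages, and non-numerical categories via the proportion of each group in a category.

```python
from statistics import mean

# Hypothetical numerical scores per treatment condition (made-up data).
numerical = {
    "control": [10, 12, 9, 11, 8],
    "experimental": [14, 13, 15, 12, 16],
}
# Hypothetical non-numerical outcomes per treatment condition (made-up data).
categorical = {
    "control": ["pass", "fail", "fail", "pass", "fail"],
    "experimental": ["pass", "pass", "fail", "pass", "pass"],
}

def group_means(groups):
    """Numerical scores: compute the average (mean) score of each group."""
    return {name: mean(scores) for name, scores in groups.items()}

def group_proportions(groups, category):
    """Non-numerical categories: proportion of each group falling in `category`."""
    return {name: obs.count(category) / len(obs) for name, obs in groups.items()}

print(group_means(numerical))
print(group_proportions(categorical, "pass"))
```

A systematic difference between groups (here, a higher mean and a higher pass proportion in the experimental condition) is what the experimental method treats as evidence of an effect, provided other variables were controlled.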
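- IV: variable manipulated by the researcher, consisting of two or more treatment conditions;
antecedent conditions manipulated prior to observing the DV
- DV: variable that is observed / measured to assess the effect of the treatment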
- Control condition: no / placebo treatment, serving as baseline for comparison
- Experimental condition: experimental treatment
- Nonexperimental studies: examining relationships between variables by comparing groups of
scores, but without the rigour of true experiments
➔ No cause-and-effect explanations
➔ Use of pre-existing participant characteristics or passage of time to create groups instead
of manipulation of IV
➔ Non-equivalent groups or pre-post studies
➔ Quasi-independent variable
- Constructs: internal attributes or characteristics that cannot be directly observed but are
useful for describing and explaining behaviour
- Operational definition: identifies a measurement procedure (a set of operations) for measuring an
external behaviour and uses the resulting measurements as both a definition and a measurement of a
hypothetical construct
➔ Two components: describing set of operations for measuring construct, and defining
construct in terms of resulting measurements
- Measurement scale: consists of set of categories used to classify individuals
- Nominal scale: categories that differ only in name and are not differentiated in terms of
magnitude or direction; categorisation and labelling of observations but no quantitative
distinction, no zero
➔ Use of numerical values possible, but only as codes / names
- Ordinal scale: categories are differentiated in terms of direction, forming an ordered series,
ranking observations in terms of size or magnitude, but differences between ranks are not
necessarily equal and there is no zero
- Interval scale: ordered series of categories that are all equal-sized intervals
➔ Possible to differentiate direction and magnitude or distance between categories
➔ Equal differences between numbers on scale = equal differences in magnitude, i.e., each
unit has same size
➔ Zero point = arbitrary
- Ratio scale: interval scale for which the zero point indicates none of the variable being
measured (absolute zero), with equally sized intervals
➔ Ratios of measurements = ratios of magnitude
➔ Possible to compare absolute amount of variable, measurements in terms of ratios
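The difference between interval and ratio scales shows up when a unit conversion is applied. The sketch below uses made-up measurements and two standard conversions (Celsius to Fahrenheit for an interval scale with an arbitrary zero, kilograms to pounds for a ratio scale with an absolute zero). A ratio that changes under a mere change of units carries no real meaning.

```python
# Made-up measurements; the conversion formulas are standard.
def celsius_to_fahrenheit(c):
    """Interval -> interval: changes the unit AND moves the arbitrary zero."""
    return c * 9 / 5 + 32

def kg_to_lb(kg):
    """Ratio -> ratio: changes only the unit; zero still means 'none'."""
    return kg * 2.20462

temps_c = (10.0, 20.0)     # interval scale (zero point is arbitrary)
weights_kg = (10.0, 20.0)  # ratio scale (zero means none of the variable)

ratio_c = temps_c[1] / temps_c[0]
ratio_f = celsius_to_fahrenheit(temps_c[1]) / celsius_to_fahrenheit(temps_c[0])
ratio_kg = weights_kg[1] / weights_kg[0]
ratio_lb = kg_to_lb(weights_kg[1]) / kg_to_lb(weights_kg[0])

# The temperature ratio depends on the unit; the weight ratio does not.
print(f"temperature ratio: {ratio_c:.2f} in C, {ratio_f:.2f} in F")
print(f"weight ratio: {ratio_kg:.2f} in kg, {ratio_lb:.2f} in lb")
```

So 20 kg really is twice 10 kg, whereas 20 °C is not "twice as hot" as 10 °C.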
- The statistical technique used depends on the measurement scale
- Discrete variable: consisting of indivisible categories and only limited number of categories,
often whole numbers that vary in countable steps; no values can exist between neighbouring
categories
- Continuous variable: consisting of categories that are infinitely divisible, so the variable can
take any value between two points; each score corresponds to an interval on the scale, and an
infinite number of possible values fall between any two observed values
➔ Almost impossible for two people to have exact same score
➔ Real limits = Boundaries separating intervals represented on continuous number line,
located exactly halfway between adjacent scores (.5); upper real limit at top of interval
and lower real limit at bottom
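For whole-number scores, the real limits lie 0.5 below and 0.5 above each score. A minimal sketch, assuming whole-number measurement (the function name is illustrative):

```python
def real_limits(score):
    """Lower and upper real limits of a whole-number score (halfway to each neighbour)."""
    return score - 0.5, score + 0.5

for x in (7, 8, 9):
    lower, upper = real_limits(x)
    print(f"X = {x}: lower real limit = {lower}, upper real limit = {upper}")
```

Adjacent intervals share a boundary (the upper real limit of 8 is the lower real limit of 9), so the intervals cover the continuous number line without gaps.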
- X = used to represent scores for a variable
- Y = representing scores if second variable used
- N = used as symbol for number of scores in population
- n = symbol for number of scores in a sample
- Σ = used to stand for summation
- ΣX = read as “the sum of the scores”
- Summation sign always followed by symbol or mathematical expression
➔ If ΣX: find sum of all values of X
➔ If Σ(X – 1)²: first calculate all values of (X – 1)², then add the results
- Summation: mathematical operation (like addition or multiplication) that must be performed
in its proper place in the order of operations
➔ Occurs after 1) parentheses, 2) squaring / exponents, and 3) multiplying/dividing have
been completed (from left to right); then summation using Σ notation; finally any other
addition / subtraction
- Demonstrations (scores X = 3, 1, 7, 4):
1) ΣX = 3 + 1 + 7 + 4 = 15
2) ΣX² = 9 + 1 + 49 + 16 = 75
→ first step: square each score (use of computational table for calculations with several steps,
adding column for squared scores)
→ second step: find sum of squared values
3) (ΣX)² = (15)² = 225
→ first step: find ΣX (calculation in parentheses)
→ second step: square the sum
4) Σ(X – 1) = 2 + 0 + 6 + 3 = 11
→ first calculate (X – 1) for each score, then add up the values
5) Σ(X – 1)² = 4 + 0 + 36 + 9 = 49
→ same procedure as above, but with another column for squaring the (X – 1) values BEFORE
adding them up
6) ΣX – 1 = 15 – 1 = 14
→ extra addition / subtraction after summation
7) ΣX = 3 + 1 + 7 + 4 = 15
ΣY = 5 + 3 + 4 + 2 = 14
ΣXY = 15 + 3 + 28 + 8 = 54
➔ If each individual has two scores (X and Y), ΣX and ΣY are calculated as usual
➔ Calculation of ΣXY by adding third column: multiplying X times Y for each individual; then
adding products to find sum
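The seven demonstrations can be checked with a short Python sketch. The lists X and Y hold the scores used in the demonstrations; the variable names are only illustrative. Each generator expression plays the role of an extra column in the computational table, and the order of operations (square or subtract first, then sum) matches the rules above.

```python
# Scores used in the demonstrations.
X = [3, 1, 7, 4]
Y = [5, 3, 4, 2]

sum_x = sum(X)                                   # 1) ΣX
sum_x_sq = sum(x ** 2 for x in X)                # 2) ΣX²: square each score, then sum
sq_sum_x = sum(X) ** 2                           # 3) (ΣX)²: sum first, then square
sum_x_minus_1 = sum(x - 1 for x in X)            # 4) Σ(X – 1)
sum_x_minus_1_sq = sum((x - 1) ** 2 for x in X)  # 5) Σ(X – 1)²: square before summing
sum_x_then_minus_1 = sum(X) - 1                  # 6) ΣX – 1: subtract after summation
sum_y = sum(Y)                                   # 7) ΣY
sum_xy = sum(x * y for x, y in zip(X, Y))        # 7) ΣXY: multiply each pair, then sum

print(sum_x, sum_x_sq, sq_sum_x, sum_x_minus_1,
      sum_x_minus_1_sq, sum_x_then_minus_1, sum_y, sum_xy)
```

Comparing lines 2) and 3) makes the key point visible: ΣX² and (ΣX)² differ because the squaring happens at a different step.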
Chapter Two: Frequency Distributions
- Descriptive statistics: simplify organisation and presentation of data
- Frequency distribution: an organised tabulation of the number of individuals located in each
category on the scale of measurement; descriptive technique in which data are placed in a
frequency distribution table or graph showing exactly how many individuals (or scores) are
located in each category
- Frequency distribution table: lists categories that make up the scale of measurement (X
values) in one column, and frequency / number of individuals in category in second column
- Listing of measurement categories (X values) from highest to lowest
- Beside each X value: frequency (f), i.e., the number of times that value occurs in the data
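A frequency distribution table can be built by counting each value and listing the categories from highest to lowest. The sketch below assumes made-up whole-number scores and keeps categories inside the observed range with f = 0, so the full scale of measurement is listed; the function name is illustrative.

```python
from collections import Counter

# Hypothetical whole-number scores (made-up data for illustration).
scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8, 7, 8, 10, 9, 8, 6, 9, 7, 8, 8]

def frequency_table(data):
    """Return (X, f) pairs for every whole-number category, highest to lowest.

    Assumes whole-number scores; values in the range that never occur
    are listed with f = 0.
    """
    counts = Counter(data)  # Counter returns 0 for values that never occur
    return [(x, counts[x]) for x in range(max(data), min(data) - 1, -1)]

table = frequency_table(scores)
print("X   f")
for x, f in table:
    print(f"{x:<3} {f}")
print("Σf =", sum(f for _, f in table))  # Σf equals N, the total number of scores
```

Summing the f column gives N, which is a quick check that every score was counted exactly once.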