Statistics, central tendency, variability, z-scores, probability, hypothesis testing, t-statistic, correlation
Data Analysis in Psychology
Psychology 251
Suné-Mari Koekemoer
, CHAPTER 1: Introduction to Statistics 1
Contents
CHAPTER 1: INTRODUCTION TO STATISTICS
STATISTICS, SCIENCE AND OBSERVATION
POPULATION AND SAMPLES
VARIABLES AND DATA
PARAMETERS AND STATISTICS
DESCRIPTIVE AND INFERENTIAL STATISTICS
STATISTICS IN THE CONTEXT OF RESEARCH
VARIABLES AND MEASUREMENT
CONSTRUCTS AND OPERATIONAL DEFINITIONS
DISCRETE AND CONTINUOUS VARIABLES
SCALE OF MEASUREMENT
DATA STRUCTURES, RESEARCH METHODS, STATISTICS
RELATIONSHIPS BETWEEN VARIABLES
1. DESCRIPTIVE RESEARCH: ONE GROUP W/ ONE OR MORE SEPARATE VARIABLES MEASURED FOR EACH INDIVIDUAL
2. CORRELATION METHOD: ONE GROUP W/ TWO VARIABLES MEASURED FOR EACH INDIVIDUAL
3. EXPERIMENTAL AND NONEXPERIMENTAL METHODS: COMPARING TWO/MORE GROUPS OF SCORES
EXPERIMENTAL AND NON-EXPERIMENTAL METHODS
THE EXPERIMENTAL METHOD
NON-EXPERIMENTAL METHODS
STATISTICAL NOTATION
SCORES
SUMMATION NOTATION
CHAPTER 2: FREQUENCY DISTRIBUTIONS
FREQUENCY DISTRIBUTIONS AND FREQUENCY DISTRIBUTION TABLES
FREQUENCY DISTRIBUTION TABLES
PROPORTIONS AND PERCENTAGES
GROUPED FREQUENCY DISTRIBUTION TABLES
REAL LIMITS AND FREQUENCY DISTRIBUTIONS
FREQUENCY DISTRIBUTION GRAPHS
GRAPHS FOR INTERVAL OR RATIO DATA
GRAPHS FOR NOMINAL OR ORDINAL DATA
BAR GRAPHS
GRAPHS FOR POPULATION DISTRIBUTIONS
RELATIVE FREQUENCIES
SMOOTH CURVES
THE SHAPE OF FREQUENCY DISTRIBUTION
CHAPTER 3: CENTRAL TENDENCY
THE MEAN
ALTERNATIVE DEFINITIONS FOR THE MEAN
THE WEIGHTED MEAN
COMPUTING THE MEAN FROM A FREQUENCY DISTRIBUTION TABLE
CHARACTERISTICS OF THE MEAN
CHANGING A SCORE
INTRODUCING A NEW SCORE OR REMOVING A SCORE
ADDING OR SUBTRACTING A CONSTANT FROM EACH SCORE
MULTIPLYING OR DIVIDING EACH SCORE BY A CONSTANT
THE MEDIAN
FINDING THE MEDIAN FOR MOST DISTRIBUTIONS
FINDING THE PRECISE MEDIAN FOR A CONTINUOUS VARIABLE
THE MEDIAN, THE MEAN, AND THE MIDDLE
THE MODE
CENTRAL TENDENCY AND THE SHAPE OF DISTRIBUTION
SYMMETRICAL DISTRIBUTIONS
SKEWED DISTRIBUTIONS
SELECTING A MEASURE OF CENTRAL TENDENCY
WHEN TO USE THE MEAN
WHEN TO USE THE MODE
PRESENTING MEANS AND MEDIANS IN GRAPHS
CHAPTER 4: VARIABILITY
THE RANGE
SCORES AS MEASUREMENTS OF A CONTINUOUS VARIABLE
SCORES AS WHOLE NUMBERS
DEFINING VARIANCE AND STANDARD DEVIATION
MEASURING VARIANCE AND STANDARD DEVIATION FOR A POPULATION
THE SUM OF SQUARED DEVIATIONS (SS)
2. THE COMPUTATIONAL METHOD
FINAL FORMULAS AND NOTATION
MEASURING VARIANCE AND STANDARD DEVIATION FOR A SAMPLE
PROBLEM WITH SAMPLE VARIABILITY
FORMULAS FOR SAMPLE VARIANCE AND STANDARD DEVIATION
SAMPLE VARIABILITY AND DEGREES OF FREEDOM
CHAPTER 5: Z-SCORES – LOCATION OF SCORES AND STANDARDIZED DISTRIBUTIONS
Z-SCORES AND LOCATIONS IN A DISTRIBUTION
THE Z-SCORE FORMULA FOR A POPULATION
DETERMINING A RAW SCORE (X) FROM A Z-SCORE
COMPUTING Z-SCORES FOR SAMPLES
USING Z-SCORES TO STANDARDIZE A DISTRIBUTION
POPULATION DISTRIBUTIONS
USING Z-SCORES FOR MAKING COMPARISONS
CHAPTER 6: PROBABILITY
THE UNIT NORMAL TABLE
PROBABILITY/PROPORTION AND Z-SCORES
CALCULATING THE X VALUE CORRESPONDING TO PROPORTIONS OR PROBABILITY
CHAPTER 7: PROBABILITY AND SAMPLES – THE DISTRIBUTION OF SAMPLE MEANS
CENTRAL LIMIT THEOREM
THE SHAPE AND DISTRIBUTION OF SAMPLE MEANS
THE MEAN OF DISTRIBUTION OF SAMPLE MEANS: THE EXPECTED VALUE OF M
STANDARD ERROR OF M
1. THE SAMPLE SIZE
2. THE POPULATION STANDARD DEVIATION
DEFINING STANDARD ERROR IN TERMS OF VARIANCE
A Z-SCORE FOR SAMPLE MEANS
CHAPTER 8: INTRODUCTION TO HYPOTHESIS TESTING
THE LOGIC OF HYPOTHESIS TESTING
THE FOUR STEPS OF A HYPOTHESIS TEST
UNCERTAINTY AND ERRORS IN HYPOTHESIS TESTING
TYPE I ERRORS
TYPE II ERRORS
SELECTING AN ALPHA LEVEL
DIRECTIONAL (ONE-TAILED) HYPOTHESIS TESTS
HYPOTHESES FOR A DIRECTIONAL TEST
CRITICAL REGION FOR DIRECTIONAL TESTS
COMPARISON OF ONE-TAILED VS TWO-TAILED TESTS
CHAPTER 9: INTRODUCTION TO THE T-STATISTIC
THE T-STATISTIC: AN ALTERNATIVE TO Z
THE PROBLEM WITH Z-SCORES
INTRODUCING THE T STATISTIC
DEGREES OF FREEDOM AND THE T-STATISTIC
THE T-DISTRIBUTION
HYPOTHESIS TESTS W/ THE T STATISTIC
HYPOTHESIS TESTING
ASSUMPTIONS OF THE T TEST
INFLUENCE OF SAMPLE SIZE AND SAMPLE VARIANCE
CHAPTER 10: INDEPENDENT-MEASURES DESIGN
THE HYPOTHESES AND THE INDEPENDENT-MEASURES T-STATISTIC
THE T TEST FOR TWO INDEPENDENT SAMPLES
ESTIMATED STANDARD ERROR
POOLED VARIANCE
ESTIMATED STANDARD ERROR
RESULTING FORMULA AND DEGREES OF FREEDOM
CHAPTER 11: THE T-TEST FOR TWO RELATED SAMPLES
INTRODUCTION TO REPEATED-MEASURES DESIGN
THE T-STATISTIC FOR A REPEATED-MEASURES DESIGN
HYPOTHESES FOR A REPEATED-MEASURES DESIGN
T-STATISTIC FOR RELATED SAMPLES
THE T-TEST FOR TWO RELATED SAMPLES
DIRECTIONAL HYPOTHESES AND ONE-TAILED TESTS
RELATED-SAMPLES T TEST ASSUMPTIONS
CHAPTER 14: CORRELATION
INTRODUCTION
SCATTERPLOT FOR CORRELATIONAL DATA
POSITIVE AND NEGATIVE RELATIONSHIPS
DIFFERENT LINEAR RELATIONSHIP VALUES
THE PEARSON CORRELATION (r)
SUM OF PRODUCTS (SP)
PEARSON CORRELATION CALCULATION
PEARSON CORRELATION AND Z-SCORES
CORRELATION AND OUTLIERS
CORRELATION AND CAUSATION
HYPOTHESIS TESTS W/ THE PEARSON CORRELATION
CORRELATION HYPOTHESIS TESTS
ALTERNATIVES TO THE T STATISTIC
CHAPTER 1: INTRODUCTION TO STATISTICS
• The term statistics refers to a set of mathematical procedures for organizing, summarizing,
and interpreting information.
• Statistical procedures help ensure that the information or observations are presented and
interpreted in an accurate and informative way.
PURPOSE OF STATISTICS
1. Statistics are used to organize and summarize the information so that the researcher can see
what happened in the research study and can communicate the results to others.
2. Statistics help the researcher to answer the questions that initiated the research by
determining exactly what general conclusions are justified based on the specific results that
were obtained.
STATISTICS, SCIENCE AND OBSERVATION
POPULATION AND SAMPLES
• A population is the set of all the individuals of interest in a particular study.
o The population being studied should always be identified by the researcher.
• A sample is a set of individuals selected from a population, usually intended to represent the
population in a research study.
o A sample should be representative of its population.
o A sample should always be identified in terms of the population from which it was selected.
VARIABLES AND DATA
• A variable is a characteristic
or condition that changes or
has different values for
different individuals.
o Typically,
researchers are
interested in specific
characteristics of the
individuals in the
population (or in the
sample), or they are
interested in the factors that may influence individuals or their behaviours.
o Variables can be…
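▪ Characteristics that differ from one individual to another (e.g., height, weight, personality)
▪ Environmental conditions that change (e.g., temperature, time of day)
• To demonstrate changes in variables, it is necessary to make measurements of the variables
being examined.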
• Data (plural) are measurements or observations.
• A data set is a collection of measurements or observations.
• A datum (singular) is a single measurement or observation and is commonly called a score or
raw score.
• Because research typically involves measuring each individual to obtain a score, every
sample (or population) of individuals produces a corresponding sample (or population) of
scores.
PARAMETERS AND STATISTICS
• Typically, the research process begins with a question about a population parameter.
• However, the actual data come from a sample and are used to compute sample statistics.
• A parameter is a value (usually a numerical value) that describes a population.
o A parameter is usually derived from measurements of the individuals in the
population.
• A statistic is a value (usually a numerical value) that describes a sample.
o A statistic is usually derived from measurements of the individuals in the sample.
• Every population parameter has a corresponding sample statistic.
DESCRIPTIVE AND INFERENTIAL STATISTICS
• Descriptive statistics are statistical procedures used to summarize, organize, and simplify
data.
o Descriptive statistics are techniques that take raw scores and organize or summarize
them in a form that is more manageable.
o Organised via…
▪ Graph
▪ Table
▪ Computing an average
• Inferential statistics consist of techniques that allow us to study samples and then make
generalizations about the populations from which they were selected.
o Make general statements about the population
o Basis for drawing conclusions about population parameters
o Sample = only limited information about the population
• Sampling error is the naturally occurring discrepancy, or error, that exists between a sample
statistic and the corresponding population parameter.
o Because the characteristics of each sample depend on the specific people in the
sample, statistics will vary from one sample to another
o It is also very unlikely that the statistics obtained for a sample will be identical to the
parameters for the entire population
o Sample statistics vary from one sample to another and typically are different from the
corresponding population parameters.
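o A common example of sampling error is the error associated w/ a sample proportion, such
as the percentage of respondents in a poll who support a particular candidate
o NB: the margin of error reported with such results is a statement about sampling error.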
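The relationship between a parameter, a statistic, and sampling error can be sketched in a few lines of Python. This is a minimal illustration, assuming a tiny invented population of ten scores and using the mean as the value being compared; the function name `sampling_errors` and every score are hypothetical. In a real study the whole population is rarely measured, so the parameter would be unknown.

```python
from statistics import mean


def sampling_errors(population, samples):
    """Return the population mean (parameter) and, for each sample,
    its mean (statistic) and sampling error (statistic - parameter)."""
    mu = mean(population)  # parameter: describes the whole population
    results = []
    for sample in samples:
        m = mean(sample)  # statistic: describes one sample
        results.append((m, m - mu))  # discrepancy between statistic and parameter
    return mu, results


# Hypothetical population of N = 10 scores (invented for illustration).
population = [3, 7, 5, 9, 4, 6, 8, 2, 6, 5]
# Three hypothetical samples of n = 4, taken as fixed slices so the result is reproducible.
samples = [population[0:4], population[3:7], population[6:10]]

mu, results = sampling_errors(population, samples)
print("population mean:", mu)
for sample, (m, err) in zip(samples, results):
    print(sample, "sample mean:", m, "sampling error:", err)
```

With these invented scores the population mean is 5.5, while the three samples give means of 6, 6.75 and 5.25, so each sample misses the parameter by a different amount (0.5, 1.25 and -0.25).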
STATISTICS IN THE CONTEXT OF RESEARCH
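1. Role of descriptive and inferential statistics
a. Descriptive statistics are used to simplify the pages of data (e.g., by organizing them in a
table or summarizing them with an average).
b. Inferential statistics are used to interpret the outcome.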
VARIABLES AND MEASUREMENT
CONSTRUCTS AND OPERATIONAL DEFINITIONS
1. The behavioural sciences often study internal characteristics (e.g., intelligence, anxiety).
2. Constructs are internal attributes or characteristics that cannot be directly observed but are
useful for describing and explaining behaviour.
a. Although a construct cannot be observed directly, it is possible to observe and measure
behaviours that represent the construct.
3. An operational definition identifies a measurement procedure (a set of operations) for
measuring an external behaviour and uses the resulting measurements as a definition and a
measurement of a hypothetical construct. Note that an operational definition has two
components:
a. First, it describes a set of operations for measuring a construct.
b. Second, it defines the construct in terms of the resulting measurements.
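The two components of an operational definition can be illustrated with a hypothetical questionnaire. In this sketch the construct (anxiety), the five items and the 1 to 5 rating range are all invented for illustration: the function carries out the set of operations (collect five ratings and add them up), and the total it returns is treated as the measurement that defines the construct.

```python
def anxiety_score(responses):
    """Hypothetical operational definition of anxiety.

    Operations: collect ratings on five questionnaire items, each rated 1-5, and sum them.
    Definition: the participant's anxiety *is* this total (possible range 5 to 25).
    """
    if len(responses) != 5:
        raise ValueError("expected exactly 5 item responses")
    if any(r not in range(1, 6) for r in responses):
        raise ValueError("each response must be a whole number from 1 to 5")
    return sum(responses)


# One hypothetical participant's ratings on the five items.
print(anxiety_score([4, 3, 5, 2, 4]))  # 18
```

A different research team could operationally define the same construct differently (e.g., with a physiological measure), which is why the measurement procedure should always be reported alongside the results.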
DISCRETE AND CONTINUOUS VARIABLES
• Variables may be characterized by the type of values that can be assigned to them
• A discrete variable consists of separate, indivisible categories. No values can exist between
two neighbouring categories.
o Typically restricted to whole, countable numbers (e.g., the number of children in a family)
o May also consist of observations that differ qualitatively (e.g., occupation)
• For a continuous variable, there are an infinite number of possible values that fall between
any two observed values. A continuous variable is divisible into an infinite number of fractional
parts.
o When measuring a continuous variable, it should be very rare to obtain identical
measurements for two different individuals
▪ If the data show a substantial number of tied scores, then you should suspect
that the measurement procedure is very crude or that the variable is not
continuous
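o Researchers must first identify a series of measurement categories on the scale of
measurement (e.g., height measured to the nearest centimetre produces categories of 150,
151, 152 cm, and so on).
▪ Although the categories may look like whole numbers, each measurement category
is actually an interval that must be defined by boundaries.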
▪ These boundaries are called real limits and are positioned exactly halfway
between adjacent scores
• Real limits are the boundaries of intervals for scores that are represented on a continuous
number line. The real limit separating two adjacent scores is located exactly halfway between
the scores. Each score has two real limits. The upper real limit is at the top of the interval, and
the lower real limit is at the bottom.
o The concept of real limits applies to any measurement of a continuous variable, even
when the score categories are not whole numbers. For example, if you were
measuring time to the nearest tenth of a second, the measurement categories would
be 31.0, 31.1, 31.2, and so on. Each of these categories represents an interval on the
scale that is bounded by real limits. For example, a score of 31.1 seconds indicates
that the actual measurement is in an interval bounded by a lower real limit of 31.05
and an upper real limit of 31.15. Remember that the real limits are always halfway
between adjacent categories.
• The terms continuous and discrete apply to the variables that are being measured and not to
the scores that are obtained from the measurement.
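• Although the scores may appear to be discrete numbers, the underlying variable is
continuous. For example, ages recorded to the nearest year look like whole numbers, but age
itself is a continuous variable.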
• One key to determining whether a variable is continuous or discrete is that a continuous
variable can be divided into any number of fractional parts.
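Real limits follow directly from the measurement unit: each boundary sits half a unit below or above the recorded score. A minimal Python sketch, assuming scores are recorded to the nearest `unit` (the function name `real_limits` is hypothetical), reproduces the 31.1-second example in the real-limits bullet:

```python
def real_limits(score, unit):
    """Return (lower, upper) real limits for a score measured to the nearest `unit`.

    The boundaries sit half a unit below and above the score, i.e. exactly
    halfway to the neighbouring measurement categories.
    """
    half = unit / 2
    return score - half, score + half


lower, upper = real_limits(31.1, 0.1)    # time to the nearest tenth of a second
print(round(lower, 2), round(upper, 2))  # 31.05 31.15
print(real_limits(8, 1))                 # whole-number scores: (7.5, 8.5)
```

Floating-point arithmetic can leave tiny rounding residue (e.g., 31.150000000000002), which is why the sketch rounds before printing.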
SCALE OF MEASUREMENT
• Measurement involves assigning individuals or events to categories.
• The categories used to measure a variable make up a scale of measurement, and the
relationships between the categories determine different types of scales.
• The distinctions among the scales are important because they identify the limitations of
certain types of measurements and because certain statistical procedures are appropriate for
scores that have been measured on some scales but not on others.
THE NOMINAL SCALE
• A nominal scale consists of a set of categories that have different names. Measurements on a
nominal scale label and categorize observations, but do not make any quantitative distinctions
between observations.
• Measurements from a nominal scale allow us to determine whether two individuals are
different, but they do not identify either the direction or the size of the difference.
• Categories are occasionally represented by numbers (e.g., room numbers), but these numbers
serve only as labels.
ORDINAL SCALE
• An ordinal scale consists of a set of categories that are organized in an ordered sequence.
Measurements on an ordinal scale rank observations in terms of size or magnitude.
• Often a series of ranks (first, second, third, and so on) or ordered categories identified by
verbal labels (e.g., small, medium, and large)
• With measurements from an ordinal scale, you can determine whether two individuals are
different, and you can determine the direction of difference.
• However, ordinal measurements do not allow you to determine the size of the difference
between two individuals.
• Ordinal scales are often used to measure variables for which it is difficult to assign numerical
scores (e.g., ranking food preferences).
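To see why ranks show direction but not size, the sketch below converts measured finishing times into ordinal ranks. The runners and times are invented for illustration:

```python
# Hypothetical finishing times (seconds) for four runners in a race.
times = {"Ana": 50.2, "Ben": 50.4, "Cai": 58.9, "Dee": 59.0}

# Convert the measured times into ordinal ranks (1 = fastest).
ordered = sorted(times, key=times.get)
ranks = {name: position for position, name in enumerate(ordered, start=1)}
print(ranks)  # {'Ana': 1, 'Ben': 2, 'Cai': 3, 'Dee': 4}

# Adjacent ranks always differ by 1, but the time gaps they stand for do not.
for faster, slower in zip(ordered, ordered[1:]):
    gap = round(times[slower] - times[faster], 1)
    print(f"{faster} -> {slower}: rank difference 1, time difference {gap} s")
```

Adjacent ranks always differ by exactly 1, yet the underlying time gaps here are 0.2 s, 8.5 s and 0.1 s; once the data are reduced to ranks, that size information is gone.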
INTERVAL AND RATIO SCALE
• Both an interval scale and a ratio scale consist of a series of ordered categories (like an
ordinal scale) with the additional requirement that the categories form a series of intervals that
are all the same size.
• Thus, the scale of measurement consists of a series of equal intervals.
• The fact that the intervals are all the same size makes it possible to determine both the
direction and the size of the difference between two measurements.
• The factor that differentiates an interval scale from a ratio scale is the nature of the zero point.
o An interval scale has an arbitrary zero point. That is, the value 0 is assigned to a
particular location on the scale simply as a matter of convenience or reference. A
value of zero does not indicate a total absence of the variable being measured.
o A ratio scale is anchored by a zero point that is not arbitrary but rather a meaningful
value representing none (a complete absence) of the variable being measured. A
non-arbitrary zero point means that we can measure the absolute amount of the
variable; that is, we can measure the distance from 0. This makes it possible to
compare measurements in terms of ratios.
• An interval scale consists of ordered categories that are all intervals of the same size. Equal
differences between numbers on scale reflect equal differences in magnitude. However, the
zero point on an interval scale is arbitrary and does not indicate a zero amount of the variable
being measured.
• A ratio scale is an interval scale with the additional feature that a score of zero indicates none
of the variable being measured. With a ratio scale, ratios of numbers do reflect ratios of
magnitude.
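Temperature is a common way to see the role of the zero point, although it is not an example used in these notes: the Celsius scale places 0 at the freezing point of water (an arbitrary zero, so it behaves as an interval scale), whereas the Kelvin scale starts at absolute zero (a meaningful zero, so it is a ratio scale). A minimal sketch with two hypothetical temperatures:

```python
def celsius_to_kelvin(c):
    """Shift a Celsius temperature onto the Kelvin scale, whose 0 is absolute zero."""
    return c + 273.15


a_c, b_c = 10.0, 20.0  # two hypothetical temperatures in degrees Celsius

# Differences are meaningful on both scales (equal intervals).
print(b_c - a_c)                                                   # 10.0
print(round(celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c), 2))  # 10.0

# Ratios are only meaningful when zero means "none" of the variable.
print(b_c / a_c)                                                   # 2.0 (misleading)
print(round(celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c), 3))  # 1.035
```

The difference between the two temperatures is 10 units on either scale, but only the Kelvin ratio (about 1.035) reflects the actual ratio of magnitudes; the Celsius ratio of 2.0 is an artefact of where 0 happens to be placed.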
STATISTICS AND SCALE MEASUREMENTS
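• Scales of measurement help determine the statistics that are used to evaluate the data: some
procedures are used with numerical scores from interval or ratio scales, and others with
non-numerical scores from nominal or ordinal scales.
• The distinction is based on the fact that numerical scores are compatible with basic arithmetic
operations (adding, multiplying, and so on) but non-numerical scores are not.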
• For most statistical applications, the distinction between an interval scale and a ratio scale is
not important because both scales produce numerical values that permit us to compute
differences between scores, to add scores, and to calculate mean scores.
• On the other hand, measurements from nominal or ordinal scales are typically not numerical
values, do not measure distance, and are not compatible with many basic arithmetic
operations. Therefore, alternative statistical techniques are necessary for data from nominal
or ordinal scales of measurement.
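The sketch below shows what "alternative statistical techniques" can look like in practice, using Python's standard `statistics` module. Choosing the mode for nominal data and the median for ordinal data is an assumption made here for illustration (both are measures of central tendency, covered in Chapter 3); all of the data are invented.

```python
from statistics import mean, median_low, mode

# Hypothetical data on three scales (all values invented).
majors = ["psych", "law", "psych", "econ", "psych"]      # nominal
sizes = ["small", "large", "medium", "medium", "small"]  # ordinal
temps_c = [18, 21, 19, 23, 20]                           # interval (degrees Celsius)

# Nominal: categories can only be counted, so report the most frequent one.
print(mode(majors))  # psych

# Ordinal: order is meaningful but distances are not, so find the middle category.
ORDER = ["small", "medium", "large"]  # assumed ordering of the labels
middle_rank = median_low(ORDER.index(s) for s in sizes)
print(ORDER[middle_rank])  # medium

# Interval/ratio: arithmetic is valid, so the mean can be computed.
print(mean(temps_c))  # 20.2
```

`median_low` is used rather than `median` so that, with an even number of scores, the result is still one of the actual ordinal categories instead of an average of two ranks.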