Psychology 253 Exam Notes
[The exam structure will be as follows: 100 marks, 2 hours. Short questions (multiple choice, short calculations such as z-scores) and long
questions (long calculations covering mixed work). We will be given a formula sheet and the necessary tables.]
CHAPTER 1:
Introduction to Statistics and Basic Terms
Statistics = the study of data. This includes all aspects related to data (i.e.: collection,
tabulation/organising, analysis/interpretation, presentation, and drawing conclusions). It is
ultimately concerned with trying to draw meaning from data.
Data = Values collected and recorded from a specific source on a topic of interest
Data Set = A collection of data
Population = The entire group/environment you're interested in studying. (The total number of units
in a population, the population size, is denoted by N.)
Sample = A subset drawn from the population, which should be a smaller representative of the
population. (The sample size is denoted by n.)
E.g.: You wish to study the anxiety levels in Psychology 253 students, so you use 100 (sample) of the
total 800 (population) students in the study.
Population Parameter = A summarising measure of a specific aspect of an entire population
Sample Statistic = A summarising measure of a specific aspect of a sample
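A minimal Python sketch of the anxiety example, to show the difference between a population parameter and a sample statistic. It assumes simple random sampling without replacement (the notes do not name a sampling method), and the anxiety scores are randomly generated, not real data. The names mu and x_bar follow common textbook notation for the population mean and sample mean; they are not defined in these notes.

```python
import random
import statistics

# Hypothetical population: anxiety scores (0-100) for all N = 800 students.
# These scores are randomly generated for illustration only.
rng = random.Random(253)
population = [rng.randint(0, 100) for _ in range(800)]
N = len(population)

# Draw a simple random sample of n = 100 students, without replacement.
sample = rng.sample(population, 100)
n = len(sample)

# Population parameter: summarises the whole population (usually unknown in practice).
mu = statistics.mean(population)
# Sample statistic: summarises the sample and is used to estimate the parameter.
x_bar = statistics.mean(sample)

print(f"N = {N}, n = {n}")
print(f"Population mean (parameter) = {mu:.2f}")
print(f"Sample mean (statistic)     = {x_bar:.2f}")
```

In a real study only the sample would be measured, so mu would be unknown and x_bar would be our estimate of it.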
The method used to gather the data can vary from study to study. What is important is to determine
how far the findings from the sample can be generalised to the population.
Sampling Unit = basic unit drawn from a population and from which a measurement is
made. It can be a single item from the population or a group of items that produce a single
measurement.
Sampling Frame = The list of all sampling units from which a sample is drawn.
Sampling Frame Bias = Practical problems result in certain sections of the population being
under-represented in the study
E.g.: You are conducting a study to analyse how many students in Stellenbosch prefer going out on a
Wednesday night compared to a Friday night. The method used is a telephonic survey via cellphone.
The sampling unit is each individual student, the sampling frame is the list of all students' cellphone
numbers from which the students to be called were drawn, and the sampling frame bias could be that
not every student has a cellphone, or that some students will not answer an unknown number and
therefore cannot take part in the survey.
Measurement Instrument = device used to obtain measurement values from a sampling unit.
(E.g.: scales, questionnaires, online surveys, etc.)
Census = Studies that collect data from the entire population. These can be problematic for
two main reasons: (a) it's not easy to access every single item in the population, because
there can be too many items or they can be hard to get hold of, and (b) the costs involved in obtaining and
recording each item in a population become far too high, especially if the study involves
complicated measurement instruments. (See "Statistical Inference" below to find out how
we solve these problems.)
Who Makes Use of Statistics?
-Every field that collects data makes use of stats.
-Data is collected with the expectation that it will be analysed to produce information that we can
use to draw conclusions. These conclusions in turn can then be shared with the wider population.
Statistics is thus the tool we use to get the necessary information to make these conclusions.
-Examples of fields that use stats: psychology, sociology, anthropology, social work, etc.
Statistical Inference
-The generalisation of results from a sample to the population from which the sample was drawn.
-Because the sample is unable to contain all the information that the population has, the results are
generalised and therefore have a degree of uncertainty about them.
-The uncertainty of the result is expressed using probability theory.
Sample information → Statistical inference (probability theory) → Population information
-Because we are now able to make inferences about the population, we have solved the problems
that arise from using a census.
-The goal of statistical inference is to say something meaningful about a population based on the
small amount of info provided by the sample
-The Target Population is the population that you are interested in studying
-Whilst the Actual Sampled Population is the population from which the data is actually obtained.
This population is often more restricted than the target population and might not represent the
entire target population. Therefore, any inferences made in the study must be based on the actual
sampled population from which the sample was obtained, not on the target population.
(E.g.: If your target population is men aged between 18 and 21 in the Western Cape but your study
only looks at men aged between 18 and 21 in Cape Town, this means that your actual sampled
population was restricted to men in Cape Town only and did not look at the rest of the Western
Cape. We must thus be very careful when making generalisations.)
Sampling Errors
-A calculated statistical imprecision
-It occurs due to the inability of a sample to convey the same information as the entire population.
-This is inherent in all samples in every study, because a sample never contains perfect information
about the population.
-Besides sampling error, there are other types of error that can affect a survey:
Coverage Error = associated with the inability to contact portions of the population (This is
common with telephonic surveys due to some people not having phones or not being home
when the phone rings, and so on.)
Measurement Error = occurs when surveys do not focus on what they are intended to
measure. This error is thus caused by the instrument itself. (E.g.: the wording of a question,
interviewer mistakes, timing, etc.)
Non-Response Error = this results from being unable to interview people who would have been
eligible to participate in the study. Either they cannot be contacted or they do not want to
answer the questions. (E.g.: When you are emailed a questionnaire for a Master's student's
study but do not complete it.)
Non-response bias is the difference in responses between those who participated and those
who did not.
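Sampling error, as defined under "Sampling Errors", can be made visible with a short simulation: draw many random samples from one population, compare each sample mean with the population mean, and see how the error behaves as n grows. This is a sketch under stated assumptions: the population of 800 scores is simulated from a normal distribution (not real data), and the helper sample_means is a hypothetical name. It only shows sampling error; coverage, measurement and non-response error cannot be reproduced by random sampling from a complete list.

```python
import random
import statistics

# Hypothetical population of N = 800 anxiety scores (simulated, not real data).
rng = random.Random(1)
population = [rng.gauss(50, 10) for _ in range(800)]
mu = statistics.fmean(population)  # the population parameter

def sample_means(n, repeats=1000):
    """Draw `repeats` simple random samples of size n; return each sample's mean."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]

spread = {}
for n in (10, 100, 400):
    means = sample_means(n)
    # Sampling error of one sample = its sample mean minus the population mean.
    avg_abs_error = statistics.fmean([abs(m - mu) for m in means])
    # Standard deviation of the sample means: how much they vary from sample to sample.
    spread[n] = statistics.stdev(means)
    print(f"n = {n:3d}: average |sampling error| = {avg_abs_error:.2f}, "
          f"SD of sample means = {spread[n]:.2f}")
```

Every sample mean misses the population mean by some amount, but the typical miss shrinks as the sample size increases.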
The Research Cycle
-Research is conducted in a circular process that results in a continuous cycle
Planning → Data Collection → Data Analysis → Conclusion → (back to Planning)
-There are 4 main phases:
1) Planning Phase = The most important aspect of this phase is developing the research
question. Once the research question is defined, we can determine the target
population, the method of data collection, the sample size, the appropriate measuring
instrument, the type of data that will be obtained, and the method of analysis.
2) Data Collection Phase = This involves gathering the data from the sample using the method
and measuring instrument outlined in stage 1. This data is measured and then captured and
edited for use in stage 3.
3) Data Analysis Phase = The data collected and captured in stage 2 is now analysed to draw
out relevant information. Analysing the data involves (a) descriptive statistics, which
summarise sample data using numerical measures, tables and graphs to gain an intuitive
perspective on the data, and (b) statistical inference, which was discussed previously.
4) Conclusion Phase = Attempt to answer the research question. If all 3 previous stages were
correctly followed then the conclusion should be meaningful in answering the research
question.
-The result of the research question often leads to new questions and thus a new research cycle
begins.
CHAPTER 2:
Types of Variables
-A variable is a measurable aspect/characteristic of a sampling unit typically denoted using symbols
(E.g.: height, age, weight, etc.)
-The type of variable used produces a particular type of data, which is then represented by particular
tables and graphs
-Independent Variables = Manipulated by the researcher
-Dependent Variables = change according to changes in the independent variable. These give us the
data that we are observing and gathering.
-Variables can be broadly classified as either Qualitative/Categorical (measure descriptive
characteristics of a subject) or Quantitative (measure characteristics numerically)
-We can then further subdivide these two classes into the following measurement scales:
Nominal-Scaled Variable = yields labelled/categorical information that can group
people/items. Measures qualitative variables. (E.g.: kitchen utensils, different rugby teams,
stationery items, etc.)
Ordinal-Scaled Variable = values can be ordered/ranked, with higher numbers
representing higher values; however, the distances between these values are not necessarily
equal and carry no meaning. Can measure quantitative or qualitative variables. (E.g.: ranking
your pain on a scale of 1 – 5 with 1 being "not in pain" and 5 being "immensely in pain".)
Interval-Scaled Variable = the distance between any two adjacent units of measurement
(intervals) is the same. It does not have an absolute zero, so it is not possible to make
statements about how many times higher one score is than another. Measures quantitative
variables. (E.g.: temperature in degrees Celsius, IQ scores, etc.)
Ratio-Scaled Variable = has the properties of all the other scales, plus an absolute zero
(a true absence of the quantity). This allows us to say how many times bigger x is than z.
Measures quantitative variables. (E.g.: Weight: someone who weighs 60 kg is twice as heavy as
someone who weighs 30 kg. Time: the exam will take twice as long to complete as the semester
test.)
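The practical difference between the interval and ratio scales is whether "how many times bigger" statements are meaningful. A minimal Python sketch, assuming temperature in degrees Celsius (interval) versus kelvin (which has an absolute zero, so it is a ratio scale), alongside the weight example from the notes. The MEANINGFUL_OPERATIONS table is a simplified, hypothetical summary of the four scales, not an exhaustive rule set.

```python
# Simplified summary of which comparisons are meaningful on each scale.
MEANINGFUL_OPERATIONS = {
    "nominal": ["equal / not equal (count per category)"],
    "ordinal": ["equal / not equal", "greater / less than (ranking)"],
    "interval": ["equal / not equal", "greater / less than", "differences"],
    "ratio": ["equal / not equal", "greater / less than", "differences", "ratios"],
}

def celsius_to_kelvin(c):
    """Kelvin starts at absolute zero, so ratios of kelvin values are meaningful."""
    return c + 273.15

# Interval scale: 20 deg C is NOT "twice as hot" as 10 deg C, since 0 deg C is not absolute zero.
naive_ratio = 20 / 10                                        # 2.0, but meaningless
true_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)   # roughly 1.035

# Ratio scale: 60 kg really is twice 30 kg, because 0 kg means "no weight at all".
weight_ratio = 60 / 30

print(f"20 deg C / 10 deg C (misleading): {naive_ratio:.3f}")
print(f"Same temperatures in kelvin:      {true_ratio:.3f}")
print(f"60 kg / 30 kg:                    {weight_ratio:.3f}")
for scale, ops in MEANINGFUL_OPERATIONS.items():
    print(f"{scale:>8}: {', '.join(ops)}")
```

Each scale keeps every operation of the scale before it and adds one, which mirrors the notes' point that the ratio scale "has the properties of all other scales".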
Types of Data
-Data can exist in two forms:
1) Discrete Data = takes separate, countable values, typically integers/whole numbers (E.g.: 1, 2,
3, 4, 5...). Tends to be generated by qualitative variables (e.g. counts of how many people fall
into each category).
2) Continuous Data = occurs on a continuum and can therefore take any value within a range,
including decimals (E.g.: 1, 2.3, 5.8, 7, 9.3...). Tends to be generated by quantitative variables.
-We can now judge data based on three qualities: (a) whether it is quantitative or qualitative, (b)
which measurement scale is used, and (c) whether it is discrete or continuous.
-E.g.: Do men or women prefer the taste of Colgate toothpaste more than Aquafresh?
Here, gender is a nominal-scaled variable, as participants can be categorised as men or women,
and the results would be discrete data because each result is a count of people (you cannot have
a fraction of a man).
Frequency Tables
-Summarises values in a data set by displaying the number of times unique values occur
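Building a frequency table amounts to counting how often each unique value occurs. A minimal Python sketch using the standard library's collections.Counter; the toothpaste responses are made-up data loosely based on the Colgate/Aquafresh example, and the relative-frequency column (frequency divided by the total) is an extra not defined in these notes. The function name frequency_table is hypothetical.

```python
from collections import Counter

# Hypothetical responses to "Which toothpaste do you prefer?" (made-up data).
responses = ["Colgate", "Aquafresh", "Colgate", "Colgate", "Aquafresh",
             "Colgate", "Aquafresh", "Colgate", "Colgate", "Aquafresh"]

def frequency_table(values):
    """Return (value, frequency, relative frequency) rows, most common first."""
    counts = Counter(values)
    total = len(values)
    return [(value, freq, freq / total) for value, freq in counts.most_common()]

table = frequency_table(responses)
print(f"{'Value':<10}{'Frequency':>10}{'Relative':>10}")
for value, freq, rel in table:
    print(f"{value:<10}{freq:>10}{rel:>10.2f}")
```

For this made-up data the table shows Colgate with a frequency of 6 (0.60) and Aquafresh with 4 (0.40).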