Chapter 1: Introduction (Page 13-22)
(1.1)
Data: The observations gathered on the characteristics of interest. Existing archived
collections of data are called databases. Statistics consists of a body of methods for
obtaining and analyzing data.
Statistical science provides methods for:
1. Design: Planning how to gather data for a research study to investigate
questions of interest to us.
2. Description: Summarizing the data obtained in the study.
3. Inference: Making predictions based on the data, to help us deal with
uncertainty in an objective manner.
Graphs, tables, and numerical summaries such as averages and percentages are called
descriptive statistics. They are used to reduce the data to a simpler and more
understandable form. Predictions made using data are called statistical inferences.
Description and inference are two ways of analyzing data.
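As a minimal sketch of descriptive statistics, the snippet below reduces a small data set to an average and a percentage. The sibling counts are hypothetical values made up for illustration, not data from the book.

```python
from statistics import mean

# Hypothetical data: number of siblings reported by eight subjects.
siblings = [0, 1, 1, 2, 2, 2, 3, 5]

# Numerical summary: the average number of siblings.
average = mean(siblings)

# Percentage summary: share of subjects with at least two siblings.
pct_two_or_more = 100 * sum(s >= 2 for s in siblings) / len(siblings)

print(f"average = {average}, two or more siblings = {pct_two_or_more}%")
```

Both numbers describe only the eight observed subjects; using them to say something about a larger group would be inference.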
(1.2)
A statistical analysis is classified as descriptive or inferential, according to whether its
main purpose is to describe the data or make predictions.
The entities on which a study makes observations are called the sample subjects.
The population is the total set of subjects of interest in a study. A sample is the subset of
the population on which the study collects data. Descriptive statistics summarize the
information in a collection of data.
Inferential statistics provide predictions about a population based on data from a sample
of that population.
A descriptive statistic is a numerical summary of the sample data. The corresponding
numerical summary for the population is called a parameter.
Usually, the population to which inferences apply is an actual set of subjects. Sometimes,
though, the generalizations refer to a conceptual population: one that does not actually
exist but is hypothetical.
(1.3/1.4)
Statistical software analyzes data organized in the spreadsheet form of a data file. A data
file has a separate row of data for each subject and a separate column for each
characteristic. Software applies statistical methods to data files.
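The row-per-subject, column-per-characteristic layout can be sketched with Python's standard csv module. The file contents and column names (subject, gender, siblings, income) are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical data file: one row per subject, one column per characteristic.
raw = """subject,gender,siblings,income
1,female,2,41000
2,male,0,38500
3,female,1,52000
"""

# DictReader uses the header line as column names and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(raw)))

# Each row is one subject; each key is one characteristic (variable).
n_subjects = len(rows)
variables = list(rows[0].keys())

# A column is the list of values one variable takes across all subjects.
siblings = [int(row["siblings"]) for row in rows]

print(n_subjects, variables, siblings)
```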
The field of statistical science includes methods for
● Designing research studies
● Describing the data (= descriptive statistics)
● Making predictions using the data (= inferential statistics)
Statistical methods apply to observations in a sample taken from a population.
Statistics summarize sample data, while parameters summarize entire populations.
● Descriptive statistics summarize sample or population data with numbers,
tables, and graphs.
● Inferential statistics use sample data to make predictions about population
parameters.
Chapter 2: Sampling and measurement (Page 23-40)
(2.1)
A variable is a characteristic that can vary in value among subjects in a sample or
population. The values the variable can take form the measurement scale. For
gender, the scale consists of two labels (female/male); for the number of siblings, it
consists of the values 0, 1, 2, 3, and so on.
The valid statistical methods for a variable depend on its measurement scale.
A variable is called quantitative when the measurement scale has numerical values
that represent different magnitudes of the variable. (Eg. annual income, number of
siblings, age, etc.).
A variable is called categorical when the measurement scale is a set of categories.
(Eg. marital status with categories such as single/married/divorced). Distinct
categories differ in quality, not in numerical magnitude. Categorical variables are
often called qualitative.
For a quantitative variable, the possible numerical values are said to form an interval
scale because they have a numerical distance between each pair of levels.
Categorical variables have two types of scales: nominal and ordinal scales.
Each possible value of a quantitative variable is greater than or less than any other
possible value.
Nominal scale (qualitative): The scale doesn't have a "high" or "low" end. The categories are
unordered; no category is greater or smaller than any other category.
Eg: mode of transportation (bus/car/train).
Ordinal scale: The categories have a natural ordering. They are not interval because there is
no defined distance between levels.
Eg: social class, political philosophy, frequency of religious activity.
Levels of interval scales are quantitative, varying in magnitude.
A variable is discrete if its possible values form a set of separate numbers (such as
0, 1, 2, 3, ...) (Eg: siblings). Discrete variables have a basic unit of measurement that
cannot be subdivided. Any variable with a finite number of possible values is discrete.
It is continuous if it can take an infinite continuum of possible real number values.
(Eg: height/weight). For a continuous variable between any two values, there is
always another possible value.
Categorical variables: discrete
Nominal variables: discrete
Ordinal variables: discrete
Quantitative variables: discrete or continuous
● Variables are either quantitative (numerical-valued) or categorical. Quantitative
variables are measured on an interval scale. Categorical variables with
unordered categories have a nominal scale, and categorical variables with
ordered categories have an ordinal scale.
● Categorical variables (nominal or ordinal) are discrete. Quantitative variables
can be either discrete or continuous. In practice, quantitative variables that can
take lots of values are treated as continuous.
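To illustrate why valid methods depend on the measurement scale, the sketch below applies one common summary per scale: the most frequent category for a nominal variable, the middle category for an ordinal variable, and the average for a quantitative variable. The data and the numeric coding of social class are hypothetical, and this pairing of summaries with scales is an illustration rather than a rule stated in these notes.

```python
from statistics import mean, median, mode

# Hypothetical observations on three variables for the same five subjects.
transport = ["bus", "car", "car", "train", "car"]  # nominal
social_class = [1, 2, 2, 3, 2]  # ordinal: 1 = lower, 2 = middle, 3 = upper
siblings = [0, 1, 1, 2, 4]      # quantitative (interval scale), discrete

# Nominal: categories are unordered, so only counting categories makes sense.
most_common_transport = mode(transport)

# Ordinal: the ordering is meaningful, so a middle category can be found.
median_class_code = median(social_class)

# Interval: distances between values are meaningful, so averaging is valid.
mean_siblings = mean(siblings)

print(most_common_transport, median_class_code, mean_siblings)
```

Averaging the transport labels would be meaningless, and averaging the class codes would quietly assume equal distances between classes, which an ordinal scale does not define.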
(2.2)
Randomization is the mechanism for achieving good sample representation. The number of
subjects in the sample, n, is called the sample size.
Simple random sampling is a method of sampling for which every possible sample of
size n has the same chance of selection. A simple random sample is often just called a
random sample.
Sampling frame: a list of all subjects in the population. Sample survey: selecting a
sample of people from a population and interviewing them. Many telephone interviews
obtain the sample with random digit dialing.
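A simple random sample can be sketched by drawing from a sampling frame with Python's random.sample, which selects distinct items without replacement so that every possible sample of size n is equally likely. The frame of 60 ID numbers, the sample size, and the seed are assumptions made for illustration.

```python
import random

# Hypothetical sampling frame: ID numbers for a population of 60 subjects.
sampling_frame = list(range(1, 61))
n = 5  # sample size

# A fixed seed is used only so the sketch gives the same sample on each run.
rng = random.Random(2024)

# Draw n distinct subjects without replacement.
sample = rng.sample(sampling_frame, n)

print(sample)
```

In a real study the seed would not be fixed in advance; it is fixed here only to make the example reproducible.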
In some studies, data result from a planned experiment, whose purpose is to compare
responses of subjects on some outcome measure under different conditions. Those
conditions are levels of a variable that can influence the outcome. The conditions in an
experiment are called treatments.
A study is called an observational study when researchers merely observe the outcomes for
available subjects on the variables, without any experimental manipulation of the
subjects. An observational study always has the possibility that some unmeasured
variable could be responsible for patterns observed in the data.
(2.3)
The sampling error of a statistic is the error that occurs when we use a statistic based
on a sample to predict the value of a population parameter.
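The definition can be made concrete with a small simulation, assuming a hypothetical population whose parameter is known (in practice it usually is not). The ages, population size, sample size, and seed below are all made up for illustration.

```python
import random
from statistics import mean

# A fixed seed is used only so the sketch is repeatable.
rng = random.Random(7)

# Hypothetical population: ages of 1,000 subjects.
population = [rng.randint(18, 90) for _ in range(1000)]
parameter = mean(population)  # population mean (the parameter)

# Draw a simple random sample and compute the same summary (the statistic).
sample = rng.sample(population, 50)
statistic = mean(sample)

# Sampling error: how far the sample-based prediction is from the parameter.
sampling_error = statistic - parameter

print(f"parameter={parameter:.2f} statistic={statistic:.2f} "
      f"error={sampling_error:.2f}")
```

Drawing a different random sample would give a different statistic and therefore a different sampling error.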
Sampling bias: Inferences using nonprobability sampling have unknown reliability, and
such sampling results in sampling bias. It isn't possible to determine the probabilities
of the possible samples.
Volunteer sampling (selection bias): Respondents are unlikely to be a representative
cross-section.
Random sampling (undercoverage): The sample lacks representation of some groups and
overrepresents other groups.
Response bias: Poorly worded or confusing questions result in response bias.
Nonresponse bias: Some subjects who are selected for the sample may refuse to
participate, or they may be impossible to reach. A problem in many studies is missing data.
Summary of types of bias
● Sampling bias: occurs from using nonprobability samples, such as selection
bias inherent in volunteer samples.
● Response bias: occurs when the subject gives an incorrect response, or the
question wording or the way the interviewer asks the questions is confusing or
misleading.
● Nonresponse bias: occurs when some sampled subjects can’t be reached or
refuse to participate or fail to answer some questions.
(2.5)