Introduction to Statistics
Semester 1, block 3
2021 – 2022
Summary in English
Statistical Methods for the Social Sciences – Agresti
Chapter 1 – Introduction
Chapter 2 – Sampling and measurement
Chapter 3 – Descriptive statistics
Chapter 4 – Probability distributions
Chapter 5 – Statistical inference: estimation
Chapter 6 – Statistical inference: significance tests
Chapter 7 – Comparison of two groups
Chapter 8 – Analyzing the relationship between categorical variables
Chapter 9 – Linear regression and correlation
Lectures
Lecture 0 – Introduction to Statistics [Thijs Bol]
Lecture 1 – GOD ; On probability, z-scores and distributions
Lecture 2
Lecture 3
Lecture 4
Statistical Methods for the Social Sciences – Agresti
Chapter 1 – Introduction
Introduction to statistical methodology
This chapter introduces "statistics" as a science that deals with describing data and making
predictions that have a much broader scope than simply summarizing the data collected.
More and more jobs for social scientists require knowledge of statistical methods as a basic
working tool. As the joke goes, "What did the sociologist who passed statistics say to the sociologist
who failed? I want a Big Mac, fries and coke."
Data
Information gathering is at the heart of all sciences and provides the observations used in statistical
analysis. The collected observations about the characteristics of interest are collectively called
data.
To collect data, the social sciences use a wide variety of methods, including questionnaire
surveys, experiments, and direct observation of behavior in a natural environment. Existing
archived data collections are called databases. Many databases are now available on the internet.
What is statistics?
In this book, the term statistics is used in a broad sense: as a science that gives us ways to
obtain and analyze data.
In particular, statistical science provides methods for:
Design: Planning how to collect data for a research study to explore questions that are of
interest to us.
Description: Summarizing the data obtained in the study to help understand what
information the data provided. For example, an analysis of the number of close friends based on
the GSS data could start with a list of the number reported for each person surveyed. The raw data
is then a complete list of observations, person by person. For the presentation of the results,
instead of listing all the observations, we could summarize the data with a graph or table that shows
the percentages of 1 close friend, 2 close friends, 3 close friends, and so on. Graphs, tables, and
numerical summaries such as averages and percentages are called descriptive statistics.
Inference: Making predictions based on the data, to help us deal with uncertainty in an
objective way. Data-based predictions are called statistical inferences.
Description and inference are two ways to analyze the data. Social scientists use
descriptive and inferential statistics to answer questions about social phenomena.
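The descriptive step in the close-friends example can be sketched in a few lines of Python. This is only an illustration: the ten responses below are invented, not GSS data, and the helper name percentage_table is an assumption made for this sketch.

```python
from collections import Counter
from statistics import mean

# Hypothetical raw data: number of close friends reported by 10 respondents.
# These values are invented for illustration; they are not GSS data.
close_friends = [1, 2, 2, 3, 1, 0, 2, 5, 3, 1]


def percentage_table(values):
    """Return {value: percentage of observations that have that value}."""
    counts = Counter(values)
    n = len(values)
    return {value: 100 * count / n for value, count in sorted(counts.items())}


# Descriptive statistics: a percentage table and an average
# replace the full person-by-person list of observations.
table = percentage_table(close_friends)
average = mean(close_friends)

for value, pct in table.items():
    print(f"{value} close friends: {pct:.0f}%")
print(f"Mean number of close friends: {average}")
```

The raw list and the table carry the same information about the distribution, but the table and the mean are far easier to read once the number of respondents grows.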
Descriptive and inferential statistics
A statistical analysis is classified as descriptive or inferential, depending on whether its
main purpose is to describe the data or to make predictions. To explain this distinction in more detail,
we define the population and the sample below.
Populations and samples
The entities over which a study makes observations are called the subjects of the study. Although
we obtain data on the subjects, our interest is ultimately in the population that the sample
represents.
Population and sample: The population is the total collection of subjects of interest in a study.
A sample is the subset of the population for which data is collected in the study.
Descriptive statistics: Summarize the information in a data set.
While data is usually only available for a sample, descriptive statistics are also useful when
data is available for the entire population, such as in a census.
Inferential statistics: Provide predictions about a population, based on data from a sample of that
population.
Parameters and statistics
A descriptive statistic is a numerical summary of the sample data. The corresponding numerical
summary for the population is called a parameter.
In practice, we are primarily interested in the values of the parameters, not merely in
the values of the statistics for the particular sample that was taken. For example, when looking
at the results of an election poll, we are more interested in the percentage of the
population that favors each candidate than in the percentage of the people surveyed
who do.
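The difference between a parameter and a statistic can be illustrated with a small simulation. This is a sketch under stated assumptions: the population of 10,000 voters, the 52% support level, and the sample size of 1,000 are all invented for the example.

```python
import random

# Hypothetical population of 10,000 voters: 1 = supports candidate A, 0 = does not.
# The 52% support level is an assumption made for this illustration.
population = [1] * 5200 + [0] * 4800

# Parameter: the numerical summary for the whole population.
population_proportion = sum(population) / len(population)

# Statistic: the same summary computed from a random sample of that population.
rng = random.Random(1)  # fixed seed so the run is repeatable
sample = rng.sample(population, 1000)
sample_proportion = sum(sample) / len(sample)

print(f"Parameter (population proportion): {population_proportion:.3f}")
print(f"Statistic (sample proportion):     {sample_proportion:.3f}")
```

In a real poll only the sample proportion is observed; the population proportion is the unknown value that the poll is trying to predict.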
Defining populations: factual and conceptual
Usually, the population to which conclusions apply is an actual group of individuals, like all adult
residents of the United States. Sometimes, however, the generalizations relate to a conceptual
population: one that does not actually exist but is hypothetical.
The role of computers in statistics
Over time, powerful and easy-to-use software has been developed for the application of statistical
methods. The availability of this software has greatly encouraged the use of statistics.
Statistical software
Statistical software packages include R, SPSS, SAS, and Stata. Appendix A explains how to
use them for each chapter.
One of the objectives of this textbook is to teach you what to look for in output and how to
interpret it. Knowledge of computer programming is not necessary for the use of statistical software.
Databases
Statistical software analyzes data organized in the spreadsheet form of a data file.
Each row contains the observations for a particular subject in the sample.
Each column contains the observations for a particular attribute.
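The row-and-column layout of a data file can be sketched as follows. The three subjects, the column names (subject, age, close_friends), and the values are all hypothetical and chosen only to show the structure.

```python
import csv
import io

# Hypothetical data file: each row is one subject, each column one attribute.
# The subjects and values are invented for illustration.
data_file = io.StringIO(
    "subject,age,close_friends\n"
    "1,34,2\n"
    "2,51,0\n"
    "3,27,5\n"
)

rows = list(csv.DictReader(data_file))

# One row: all observations for a single subject.
first_subject = rows[0]

# One column: the observations of a single attribute across all subjects.
ages = [int(row["age"]) for row in rows]

print(first_subject)
print(ages)
```

Statistical packages such as those listed above read files with this same layout, so a data set prepared this way can be moved between them.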
Use and misuse of statistical software
A note of caution: The easy access to statistical methods using software has both dangers and
advantages. It is easy to apply unsuitable methods. A computer performs the requested analysis
regardless of whether the assumptions required for its proper use are met.
It is vital to understand the method before using it.