Statistical Methods for the Social Sciences, Global Edition
This is a summary for the course Statistics II in the second year of psychology. The summary is based on the book Statistical Methods for the Social Sciences by Alan Agresti and on the lectures. By studying from this summary I obtained an 8.7 on the exam.
Summary statistics 1
Chapter 1: introduction
Data are the collectively gathered observations on the characteristics of interest. Databases
are existing archived collections of data. A data file has a separate row of data for each
subject and a separate column for each characteristic. Software can apply statistical methods
to data files. Statistics consists of a body of methods for obtaining and analysing data.
Statistical science provides methods for:
1. Design. Planning how to gather data for a research study to investigate questions of
interest to us.
2. Description. Summarising the data obtained in the study to help understand the
information that the data provide.
3. Inference. Making predictions based on the data, to help us deal with uncertainty in
an objective manner.
Descriptive statistics are graphs, tables, and numerical summaries like averages and
percentages that are used to simply describe data and make them understandable. Statistical
inferences are predictions made about a population using data from a sample of that
population.
The entities on which a study makes observations are called the subjects; these are usually
people. A population is the total set of subjects of interest in a study. A sample is the subset
of the population on which the study collects data.
A descriptive statistic is a numerical summary of the sample data. A parameter is the
corresponding numerical summary of the population. The value of a parameter is usually
unknown, so it is estimated from a sample statistic, and that estimate comes with a margin
of error.
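To illustrate the difference between a parameter and a statistic, here is a minimal Python sketch. The population of 1,000 values and the sample size of 50 are made up for illustration; in real research the whole population is rarely available.

```python
import random
from statistics import mean

# Hypothetical population: the values 1..1000 (illustrative only).
population = list(range(1, 1001))

# Parameter: numerical summary of the whole population.
population_mean = mean(population)

# Statistic: the same summary computed on a random sample of 50 subjects.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 50)
sample_mean = mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

The sample mean will generally be close to, but not equal to, the population mean; the gap between the two is the sampling error.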
Chapter 2: sampling and measurement
A measure should have validity and reliability. Validity means the measure should measure
what it is intended to measure. Reliability means that the measure should be consistent in the
sense that a subject will give the same response when asked again.
A variable is a characteristic that can vary among subjects in a sample or population. A
measurement scale describes the values a variable can take.
A variable is quantitative if the measurement scale has numerical values that represent
different magnitudes of that variable (annual income, number of siblings, age). The possible
numerical values are said to form an interval scale because they have a numerical distance or
interval between each pair of levels.
A variable is qualitative/categorical if the measurement scale is a set of categories (marital
status: single, married, divorced). The categories can form a nominal scale, in which the
values are unordered and none is greater or smaller in magnitude than another. The values of
a qualitative variable can also form an ordinal scale, in which the categories are ordered or
ranked.
A variable is discrete if its possible values form a set of separate numbers with gaps in
between. A variable is continuous if it can take an infinite continuum of possible real number
values.
Randomisation is the mechanism for achieving good sample representation so that
inferences can be made about population parameters. A simple random sample of n subjects
from a population is one in which each possible sample of that size has the same
probability of being selected; it is the most basic form of probability sampling. Simple
random sampling reduces the chance that the sample is biased and unrepresentative of the
population.
A sampling frame is a list of all subjects in the population. The most common method for
selecting a random sample is:
1. Number the subjects in the sampling frame.
2. Generate a set of these numbers randomly (with a computer for example).
3. Sample the subjects whose numbers were generated.
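The three steps above can be sketched in Python as follows. The function name simple_random_sample and the sampling frame of 60 students are hypothetical and only serve as an illustration.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n subjects so that every possible sample of size n is equally likely."""
    rng = random.Random(seed)
    # Step 1: number the subjects in the sampling frame (0 .. N-1).
    numbers = range(len(sampling_frame))
    # Step 2: generate n distinct numbers at random.
    chosen = rng.sample(numbers, n)
    # Step 3: sample the subjects whose numbers were generated.
    return [sampling_frame[i] for i in chosen]

# Hypothetical sampling frame of 60 subjects.
frame = [f"student_{i}" for i in range(1, 61)]
srs = simple_random_sample(frame, n=5, seed=1)
print(srs)
```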
Data often result from planned experiments. The different conditions measured in an
experiment are called treatments. Randomised clinical trials are experiments in medicine,
using randomisation. Observational studies are studies in which the researcher measures
subjects’ responses to the variables of interest but has no experimental control over the
subjects.
A sampling error of a statistic is the error that can occur when we use a statistic based on a
sample to predict the value of a population parameter.
Besides sampling error, there are three types of bias that can cause results to vary from
sample to sample:
• Sampling bias. In nonprobability sampling, it is not possible to determine the
probabilities of the possible samples. Nonprobability sampling leads to sampling bias.
Sampling bias arises in three ways:
o Volunteer sampling. The sample consists only of subjects who volunteer.
o Selection bias. The way subjects are selected favours one type of subject.
o Undercoverage. The sample lacks representation from some groups within
the population.
• Response bias. Poorly worded or confusing questions (or other external influences)
cause people to answer incorrectly.
• Nonresponse bias. This occurs when some of the sampled subjects cannot be reached
or refuse to participate resulting in missing data.
Systematic random sampling is a type of probability sampling. The method takes three
steps:
1. Compute the skip number k = N / n, where N is the population size and n is the sample size.
2. Select a subject at random from the first k names in the sampling frame.
3. Select every kth subject listed after that one.
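A minimal Python sketch of these three steps follows. The function name systematic_sample is hypothetical, and the skip number is rounded down when N is not a multiple of n, which is a simplifying assumption.

```python
import random

def systematic_sample(sampling_frame, n, seed=None):
    """Systematic random sample of n subjects from an ordered sampling frame."""
    N = len(sampling_frame)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    rng = random.Random(seed)
    # Step 1: skip number k = N / n (rounded down if N is not a multiple of n).
    k = N // n
    # Step 2: select a starting subject at random from the first k entries.
    start = rng.randrange(k)
    # Step 3: select every kth subject listed after that one.
    return [sampling_frame[start + i * k] for i in range(n)]

# Hypothetical frame of 100 numbered subjects, sample of 10, so k = 10.
numbered_frame = list(range(1, 101))
systematic = systematic_sample(numbered_frame, n=10, seed=3)
print(systematic)
```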
Stratified random sampling is another type of probability sampling that divides the
population into separate groups, called strata, and then selects a simple random sample from
each stratum. Stratified random sampling is proportional if the sampled strata proportions
are the same as those in the entire population. Stratified random sampling is disproportional
if the sampled strata proportions differ from the population proportions; this is used when a
group is such a small part of the population that it may not have enough representation in a
simple random sample to allow for precise inferences.
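Proportional stratified sampling can be sketched in Python as below. The function name, the three strata, and their sizes are hypothetical. The sketch rounds each stratum's share, so in general the stratum sample sizes may not add up exactly to n; in this example they do.

```python
import random

def proportional_stratified_sample(strata, n, seed=None):
    """Take a simple random sample from each stratum, sized by its population share."""
    rng = random.Random(seed)
    N = sum(len(subjects) for subjects in strata.values())
    sample = {}
    for name, subjects in strata.items():
        # Sample size in this stratum is proportional to the stratum's share of N.
        n_stratum = round(n * len(subjects) / N)
        sample[name] = rng.sample(subjects, n_stratum)
    return sample

# Hypothetical population of 1000 subjects in three strata (60%, 30%, 10%).
strata = {
    "group_a": [f"a{i}" for i in range(600)],
    "group_b": [f"b{i}" for i in range(300)],
    "group_c": [f"c{i}" for i in range(100)],
}
stratified = proportional_stratified_sample(strata, n=50, seed=7)
print({name: len(subjects) for name, subjects in stratified.items()})
```

For a disproportional design, the stratum sample sizes would be chosen by the researcher instead of being derived from the population shares.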
Cluster sampling is also a type of probability sampling; it divides the population into a
large number of clusters, such as city blocks, and selects a random sample of clusters. All
people in the selected clusters are used as subjects in the sample.
The last type of probability sampling is multistage sampling; clusters are randomly
selected and people within those clusters are randomly selected to be subjects.
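The difference between cluster sampling and multistage sampling can be sketched in Python as follows. The function names and the ten city blocks of eight residents each are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select clusters and keep everyone in the selected clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for c in chosen for person in clusters[c]]

def multistage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly select clusters, then randomly select people within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [p for c in chosen for p in rng.sample(clusters[c], n_per_cluster)]

# Hypothetical population: 10 city blocks with 8 residents each.
blocks = {
    f"block_{b}": [f"block_{b}_person_{p}" for p in range(8)]
    for b in range(10)
}
cluster_subjects = cluster_sample(blocks, n_clusters=3, seed=5)
multistage_subjects = multistage_sample(blocks, n_clusters=3, n_per_cluster=2, seed=5)
print(len(cluster_subjects), len(multistage_subjects))
```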
Chapter 3: descriptive statistics
Relative frequencies are used to report proportions and percentages within different
categories to compare them to each other. The proportion equals the number of observations
in a category divided by the total number of observations (the outcome is a number between 0
and 1). The percentage is the proportion multiplied by 100.
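As a small worked example, the Python sketch below computes proportions and percentages for a made-up list of ten marital-status responses.

```python
from collections import Counter

# Hypothetical responses of 10 subjects (illustrative data only).
responses = ["single", "married", "married", "divorced", "married",
             "single", "married", "single", "married", "divorced"]

counts = Counter(responses)
total = len(responses)

# Proportion = observations in the category / total number of observations.
proportions = {category: count / total for category, count in counts.items()}
# Percentage = proportion multiplied by 100.
percentages = {category: 100 * count / total for category, count in counts.items()}

for category in counts:
    print(category, counts[category], proportions[category], percentages[category])
```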
A frequency distribution is a listing of possible values for a variable, together with the
number of observations at each value (this can be used for categorical and quantitative data).
In a relative frequency distribution, proportions or percentages are shown instead of the
number of observations. In frequency distributions for quantitative data, the intervals of
values are usually of equal width. They should include all possible values of the variable, and
they should be mutually exclusive (any possible value must fit into only one interval).
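A frequency distribution with equal-width intervals can be sketched in Python as below. The function name and the ten ages are hypothetical. Each interval includes its lower bound and excludes its upper bound, which is one way to keep the intervals mutually exclusive.

```python
def frequency_distribution(values, lower, width, n_intervals):
    """Count observations in equal-width intervals [low, high)."""
    counts = [0] * n_intervals
    for v in values:
        index = int((v - lower) // width)
        if not 0 <= index < n_intervals:
            # The intervals must cover every possible value of the variable.
            raise ValueError(f"value {v} falls outside the intervals")
        counts[index] += 1
    total = len(values)
    # Each row: interval start, interval end, frequency, relative frequency.
    return [(lower + i * width, lower + (i + 1) * width, c, c / total)
            for i, c in enumerate(counts)]

# Hypothetical ages of 10 subjects, grouped into intervals of width 10.
ages = [19, 22, 25, 31, 34, 38, 40, 47, 52, 58]
table = frequency_distribution(ages, lower=10, width=10, n_intervals=5)
for low, high, freq, rel in table:
    print(f"{low}-{high}: frequency {freq}, relative frequency {rel}")
```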
A bar graph has a rectangular bar drawn over each category that shows the frequency or
relative frequency in that category. The bars are separated to emphasise that the variable is
categorical rather than quantitative.
A histogram is a bar graph used for quantitative data; each interval has a bar over it, with
the height representing the (relative) number of observations in that interval. In histograms,
the bars are not separated. If there are just a few possible values of the variable, the values do
not need to be divided into intervals.
Stem-and-leaf plots represent each observation by its leading
digit(s) (the stem) and by its final digit (the leaf). Each stem is a
number to the left of the vertical bar and a leaf is a number to the
right of it. Stem-and-leaf plots are useful for quick portrayals of
small data sets. When turned on its side, a stem-and-leaf plot has
the same shape as a histogram.
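A stem-and-leaf plot for a small data set can be sketched in Python as follows. The function name and the seven observations are hypothetical, and the sketch assumes non-negative whole numbers with the final digit used as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf plot for non-negative integers."""
    stems = defaultdict(list)
    for v in sorted(values):
        # Leading digit(s) form the stem, the final digit is the leaf.
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        # Stems without observations are still listed, with no leaves.
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Hypothetical small data set.
observations = [12, 15, 21, 23, 23, 38, 40]
plot_lines = stem_and_leaf(observations)
print("\n".join(plot_lines))
```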
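A sample data distribution is usually displayed as a histogram or bar graph. A population
distribution of a continuous variable is usually drawn as a smooth curve. This is because the
larger the sample and the narrower the intervals, the more closely the histogram resembles a
smooth curve.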