In-depth, complete overview of the entire SMCR course from the UvA Communication Science Bachelor. The grade obtained in the exam was a 7.5; these notes include all the content needed to get a great grade.
Chapter 1, Sampling Distribution: How Different Could My Sample Have Been?
Sampling distribution
Describes the characteristics of the samples that we could have drawn
Inferential statistics
Offers techniques for making statements about a larger set of observations from data collected for a
smaller set of observations
Population - the large set of observations about which we want to make a statement
Sample - the smaller set of observations
A discrete random variable: how many yellow candies are in my bag?
What number of yellow candies do you expect in a random sample of 10 candies from this
population? The colours are equally distributed in the population, so one out of five candies in the
population is yellow and we expect 10 × 1/5 = 2 yellow candies in the sample.
Sampling space
The collection of all possible outcome scores
Sample statistic
A number describing a characteristic of the sample (such as the mean). Because its value differs from sample to sample, it is a random variable.
Sampling distribution
The distribution of the outcome scores of very many samples
For example, suppose we count the yellow candies in bag 1: our sample is bag 1, our sampling
space is the set of possible numbers of yellow candies in a bag, and our sampling distribution is
the distribution of the number of yellow candies across all the samples we collect (more than one).
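The idea of drawing very many samples can be sketched with a small simulation. This is only an illustration: it assumes that each candy is yellow with probability 0.2 (one out of five) and that a bag holds 10 candies, and the function name, the number of samples and the seed are made up for the example.

```python
import random
from collections import Counter


def simulate_sampling_distribution(n_samples=10000, sample_size=10,
                                   p_yellow=0.2, seed=1):
    """Count the yellow candies in each of many simulated sample bags."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = Counter()
    for _ in range(n_samples):
        # Assumption: each candy is yellow with probability p_yellow.
        yellow = sum(rng.random() < p_yellow for _ in range(sample_size))
        counts[yellow] += 1
    return counts


sampling_distribution = simulate_sampling_distribution()
for outcome in sorted(sampling_distribution):
    # outcome = number of yellow candies, value = how many bags had that number
    print(outcome, sampling_distribution[outcome])
```

The outcomes (0 to 10 yellow candies) form the sampling space; the counts per outcome form the simulated sampling distribution.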
Probability & probability distribution
If we change the (absolute) frequencies in the sampling distribution into proportions (relative
frequencies) we obtain the probability distribution of the sample statistic: A sample space with a
probability (between 0 and 1) for each outcome of the sample statistic.
A probability distribution is discrete when the sample statistic can only take a limited number
of outcomes
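Turning absolute frequencies into relative frequencies is a single division per outcome. The sketch below assumes the same candy setting (10 candies per sample, 20% yellow) and simulates the frequencies first; all names and the seed are hypothetical.

```python
import random
from collections import Counter


def count_yellow_in_samples(n_samples=5000, sample_size=10, p_yellow=0.2, seed=7):
    """Absolute frequencies: how often each number of yellow candies occurs."""
    rng = random.Random(seed)
    return Counter(
        sum(rng.random() < p_yellow for _ in range(sample_size))
        for _ in range(n_samples)
    )


def to_probability_distribution(frequencies):
    """Divide each absolute frequency by the total number of samples."""
    total = sum(frequencies.values())
    return {outcome: frequencies[outcome] / total for outcome in sorted(frequencies)}


frequencies = count_yellow_in_samples()
probabilities = to_probability_distribution(frequencies)
for outcome, probability in probabilities.items():
    print(f"{outcome} yellow candies: {probability:.3f}")
```

Every probability lies between 0 and 1 and together they add up to 1, which is what makes this a probability distribution.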
Expected value or expectation
• The mean of the sampling distribution is equal to the expected value of the sample statistic
• The mean of the sampling distribution of the sample proportion is equal to the population
proportion
• The expected value (mean of sampling distribution) only equals the population value if the
sample statistic is an unbiased estimate of the population value (parameters)
, Mónica Smienk Arnedo
• The expected value is the average of the sampling distribution of a random sample
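The link between the expected value and the population proportion can be checked numerically. The sketch assumes the candy example (samples of 10, population proportion 0.2) and uses the binomial formula for the probability of each outcome; the function names are made up for the illustration.

```python
from math import comb


def binomial_probability(k, n, p):
    """Probability of exactly k yellow candies in a sample of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_value(n, p):
    """Mean of the sampling distribution: each outcome weighted by its probability."""
    return sum(k * binomial_probability(k, n, p) for k in range(n + 1))


n, p = 10, 0.2
expected_count = expected_value(n, p)
expected_proportion = expected_count / n
print(round(expected_count, 6), round(expected_proportion, 6))
```

Up to rounding, the expected number of yellow candies comes out at 2 and the expected sample proportion at 0.2, the population proportion.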
Unbiased estimation
A sample statistic is an unbiased estimator of the population parameter if its expected value (the
mean of the sampling distribution) is equal to that parameter
Representative sample
A sample is representative of a population if variables in the sample are distributed in the same way
as in the population
Chapter 2, Probability Models: How Do I Get a Sampling Distribution?
How do we create a sampling distribution, if we only collect data for a single sample? This chapter
presents three different ways: bootstrapping, exact approaches, and theoretical approximations.
The bootstrap approximation of the sampling distribution (Method 1)
This is still based on the idea of drawing a large number of samples. However, we only draw one
sample from the population, for which we collect the data. As a next step, we draw a large number
of samples from our initial sample. The samples drawn in the second step are called bootstrap samples.
For each bootstrap sample, we calculate the sample statistic of interest and we collect these as our
sampling distribution. We usually want about 5000 bootstrap samples for our sampling distribution.
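The procedure can be sketched in a few lines. The example assumes an initial sample of 25 candies with five of each colour; the colour names other than yellow and red, the function name and the seed are invented for the illustration. It already draws with replacement and keeps the bootstrap samples as large as the initial sample, two points these notes explain next.

```python
import random

# Hypothetical initial sample: 25 candies, five of each colour.
COLOURS = ["yellow", "red", "green", "blue", "orange"]
initial_sample = [colour for colour in COLOURS for _ in range(5)]


def bootstrap_proportions(sample, target="yellow", n_bootstrap=5000, seed=2024):
    """Proportion of `target` in each bootstrap sample drawn from `sample`."""
    rng = random.Random(seed)
    size = len(sample)  # bootstrap samples are as large as the initial sample
    proportions = []
    for _ in range(n_bootstrap):
        bootstrap_sample = rng.choices(sample, k=size)  # drawn with replacement
        proportions.append(bootstrap_sample.count(target) / size)
    return proportions


proportions = bootstrap_proportions(initial_sample)
# The collected proportions are the bootstrapped sampling distribution.
print(sum(proportions) / len(proportions))
```

The printed mean of the bootstrapped proportions should lie close to 0.2, the proportion of yellow candies in the initial sample.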
Sampling with and without replacement
• The size of a sample is very important to the shape of the sampling distribution.
• To construct a sampling distribution from bootstrap samples, the bootstrap samples must be
exactly as large as the original sample.
What are the differences between sampling with and without replacement?
I. If we draw a sample without replacement from our initial sample of the same size as the initial
sample, the new sample must contain all observations from the initial sample. As a result, the
new sample is identical to the initial sample. All samples that we draw are identical. This does
not provide an interesting sampling distribution.
II. Drawing with replacement, an observation can be drawn more than once. As a result, the same
candy number may appear more than once in the new sample. Otherwise, we could never have
more candies of a particular colour in the bootstrapped sample than in the original sample (five
candies of each colour). Each new sample drawn with replacement from the original sample can
be different, so the proportion of yellow candies varies across these bootstrap samples. We can
create a meaningful sampling distribution from these varying proportions of yellow candies.
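The difference between the two ways of drawing can be made visible with a short sketch. It assumes the same hypothetical initial sample of 25 candies (five per colour); the helper name, the number of draws and the seed are made up.

```python
import random

# Hypothetical initial sample: 25 candies, five of each colour.
COLOURS = ["yellow", "red", "green", "blue", "orange"]
initial_sample = [colour for colour in COLOURS for _ in range(5)]


def yellow_proportions(sample, with_replacement, n_draws=1000, seed=3):
    """Set of distinct yellow proportions seen across n_draws new samples."""
    rng = random.Random(seed)
    size = len(sample)
    proportions = set()
    for _ in range(n_draws):
        if with_replacement:
            new_sample = rng.choices(sample, k=size)  # a candy can recur
        else:
            new_sample = rng.sample(sample, k=size)  # every candy exactly once
        proportions.add(new_sample.count("yellow") / size)
    return proportions


without = yellow_proportions(initial_sample, with_replacement=False)
with_repl = yellow_proportions(initial_sample, with_replacement=True)
print(sorted(without))
print(sorted(with_repl))
```

Without replacement the only proportion that ever occurs is 0.2, because each new sample is just the initial sample in a different order. With replacement the proportion varies from one bootstrap sample to the next.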
Calculating probabilities with replacement
• It is okay to sample with replacement when we are bootstrapping. We usually calculate the
probabilities as if we sampled with replacement.
• We assume that the probability of drawing a yellow candy remains the same while we are
sampling. This assumption is very convenient because it simplifies the calculation of probabilities.
We act as if the proportion of yellow candies in the population remains the same after we have
sampled the first yellow candy.
• This can only be true if the yellow candy that we have sampled is immediately replaced by a new
yellow candy from the population. Otherwise, the proportion of yellow candies in the population
would decrease when we sample the first yellow candy.
Calculating probabilities without replacement
• In practice, we never want the same respondent to participate twice in our research because this
would not yield new information. In actual research, then, we sample without replacement. A
respondent does not risk being sampled more than once.
• How do we calculate probabilities if we sample without replacement? Since we do not put the
yellow candy back in the population, the number of yellow candies is reduced by one after we
have drawn the first yellow candy. Therefore, the probability of drawing a second yellow candy
should be less than 20% (the original probability for the first draw)
• If the population is large, the decrease in the probability is too small to be in any way relevant.
• Calculating probabilities becomes complicated if we sample without replacement because each
new draw has a new probability for drawing a yellow candy.
• In an empirical research project, we always sample respondents without replacement but our
statistical software calculates probabilities as if we sampled with replacement. This is perfectly
fine as long as the population is much larger than the sample. If the sample contains a large share
of the population then we should not trust the probabilities our software reports.
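How quickly the difference disappears in a large population can be shown with a small calculation. The population sizes below are made-up examples, each with 20% yellow candies, and the function name is hypothetical.

```python
def probability_second_yellow(population_size, yellow_in_population):
    """P(second candy is yellow | first candy was yellow), without replacement."""
    # One yellow candy is gone, so both counts are reduced by one.
    return (yellow_in_population - 1) / (population_size - 1)


# Hypothetical populations of different sizes, all with 20% yellow candies.
for population_size in (25, 1_000, 1_000_000):
    yellow = population_size // 5
    p_without = probability_second_yellow(population_size, yellow)
    print(population_size, round(p_without, 6), "versus 0.2 with replacement")
```

In the population of 25 candies the probability for the second draw drops noticeably below 0.2; in the population of a million it is practically 0.2, which is why calculating as if we sampled with replacement is fine for large populations.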
Limitations to bootstrapping: Does the bootstrapped sampling distribution always reflect the true
sampling distribution?
• We can create a sampling distribution by sampling from our original sample with replacement.
• The original sample that we have drawn from the population must be more or less representative
of the population.
• A sample is more likely to be representative of the population if the sample is drawn randomly
and is large. But we can never be sure.
Any sample statistic can be bootstrapped
Every statistic that we can calculate for our original sample can also be calculated for each
bootstrap sample. The sampling distribution is just the collection of the sample statistic calculated
for all bootstrap samples.
Bootstrapping is more or less the only way to get a sampling distribution for the sample median,
e.g. the median weight of candies in a sample bag.
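Because the statistic is simply recalculated for every bootstrap sample, one function can handle any statistic. The sketch below passes in the median; the candy weights are made-up numbers purely for illustration, and the simple percentile limits at the end are a rough summary, not the bias-corrected interval that SPSS can report.

```python
import random
import statistics


def bootstrap_statistic(sample, statistic, n_bootstrap=5000, seed=11):
    """Calculate `statistic` for each bootstrap sample drawn from `sample`."""
    rng = random.Random(seed)
    size = len(sample)
    return [statistic(rng.choices(sample, k=size)) for _ in range(n_bootstrap)]


# Made-up candy weights in grams, purely for illustration.
weights = [2.6, 2.8, 2.9, 3.0, 3.0, 3.1, 3.2, 3.3, 3.5, 3.9]

medians = sorted(bootstrap_statistic(weights, statistics.median))
lower = medians[int(0.025 * len(medians))]      # roughly the 2.5th percentile
upper = medians[int(0.975 * len(medians)) - 1]  # roughly the 97.5th percentile
print(f"middle 95% of bootstrapped medians: {lower} to {upper}")
```

Swapping `statistics.median` for `statistics.mean` or any other function of a sample gives the bootstrapped sampling distribution of that statistic instead.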
Bootstrapping in SPSS
Micro lecture 1 - Bootstrapping How To
1. Number of samples = 5000 (always)
2. Set seed (so we always get the same results within the data set since bootstrapping is random)
3. C.I. 95% > choose bias corrected accelerated (BCa)
4. Paste syntax > run
Micro lecture 2 - Bootstrapping Interpretation of Results
1. Specification of bootstrap: the choices that we made
2. T-test: tells us the statistics of our sample, e.g. the weight of yellow and red candies
3. Output of the test: interpretation of the chosen test according to APA 6th edition.
4. Bootstrap for the test: check the C.I. according to the bootstrap. The differences can be positive
or negative; what we need to check here is whether the groups defined by the independent
variable differ significantly, i.e. whether the C.I. for the difference excludes zero.
Exact Approaches to the Sampling Distribution (Method 2)
Calculating the probabilities of all possible sample statistic outcomes gives us an exact approach to
the sampling distribution. We say approach instead of approximation because this distribution
is the true sampling distribution itself.
Exact approaches for categorical data
• An exact approach lists and counts all the possible combinations. This can only be done with
discrete or categorical variables. For an unlimited number of categories, we cannot list all
possible combinations.
• A proportion is based on frequencies (which are discrete), so we can use an exact approach to
create a sampling distribution for one proportion, e.g. the proportion of yellow candies. The exact
approach uses the binomial probability formula to calculate probabilities.
• Exact approaches are also available for the association between two categorical (nominal or
ordinal) variables in a contingency table, e.g. are yellow candies often more sticky than red
candies? We can create an exact probability distribution for the combination of colour and
stickiness.
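For one proportion, listing and counting all combinations is still manageable. The sketch below does this for the candy example, assuming samples of 10 candies and a constant probability of 0.2 for yellow (as if we sampled with replacement), and compares the result with the binomial probability formula. The function names are made up.

```python
from itertools import product
from math import comb


def exact_distribution_by_enumeration(n, p):
    """List every yellow/not-yellow sequence of n candies and add up the
    probabilities of the sequences with the same number of yellow candies."""
    distribution = [0.0] * (n + 1)
    for sequence in product([True, False], repeat=n):  # True = yellow
        k = sum(sequence)  # number of yellow candies in this sequence
        distribution[k] += p**k * (1 - p) ** (n - k)
    return distribution


def binomial_probability(k, n, p):
    """Binomial formula: the same probability without listing sequences."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.2
enumerated = exact_distribution_by_enumeration(n, p)
for k, probability in enumerate(enumerated):
    print(k, round(probability, 6), round(binomial_probability(k, n, p), 6))
```

Both columns agree, because the binomial formula just counts the sequences for us. With 10 candies there are already 1024 sequences to list, which hints at why exact approaches become computer-intensive.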
Computer-intensive
The exact approach is used for discrete variables, which are usually nominal or ordinal. If the number
of categories becomes large, a lot of computing time can be needed to calculate the probabilities of
all possible sample statistic outcomes. These exact approaches are said to be computer-intensive.
It is usually wise to set a limit on the time you allow your computer to work on an exact sampling
distribution.
Exact Approaches in SPSS
Micro lecture 1 - Exact Test
1. Only for categorical variables, e.g. colour & stickiness of candies
2. For categorical variables we use Nonparametric Tests > Legacy Dialogs > Chi-Square (not
necessarily this test)
   1. Exact > exact option > time limit: 5 minutes
3. Another option is Descriptive Statistics > Crosstabs
   1. Exact > exact option > time limit: 5 minutes
   2. Statistics > Chi-Square (or any other test but make sure to click one!)