Week 1: Chapters 1-3
Chapter 1: Asking and Answering Research Questions
A Thumbnail Sketch of Research
1. Ask a research question
2. Design a study to collect data that can answer the question
3. Carry out the study and collect the data
4. Apply statistical analysis to picture and describe the data, and provide a basis for
drawing conclusions
5. Draw conclusions about what the data tell us in answer to our original question
6. Interpret the results, give critical discussion of the whole study, and prepare a report.
Think about the next study.
For example, you can have a question about voting. The question is about the whole
population, but you can’t ask everybody. Researchers can take a sample from the
population. If the sample was chosen in a fair and unbiased way, it’s probably
representative, so the sample result is a reasonable estimate of the population value.
In addition, researchers calculate the margin of error (MoE), the likely greatest error in
the point estimate.
If the point estimate is 53% and the margin of error is 2%, the population value most likely
lies in the range [51%, 55%].
This is also the 95% confidence interval; it’s a range of values calculated from the data that,
most likely, includes the true value of what we’re estimating about the population.
The 95% means we are not guaranteed that the CI includes the true value. The longer the
CI, the lower the precision of the estimate (and the further the point estimate may be from
the true value).
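As a rough illustration of how a point estimate and MoE give a 95% CI, here is a minimal Python sketch. It assumes the usual normal approximation for a proportion, and the sample size of 2,400 is hypothetical (the notes give only the 53% and the 2%); the function name proportion_ci is likewise made up for this example.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Return (lower, upper, moe) for a sample proportion.

    Uses the normal approximation; z = 1.96 corresponds to a 95% CI.
    """
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe, moe

# Hypothetical poll: 53% support in a sample of 2,400 voters (n is assumed).
lower, upper, moe = proportion_ci(0.53, 2400)
print(f"MoE = {moe:.1%}, 95% CI = [{lower:.1%}, {upper:.1%}]")
# prints: MoE = 2.0%, 95% CI = [51.0%, 55.0%]
```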
Research studies rarely, if ever, give definitive answers to our questions, so we must be
willing to think about uncertainty.
The CI is an interval estimate because it’s an interval containing the most plausible values
for the population value.
Using estimation, we should always express research questions in quantitative terms (“To what
extent…?” “How much…?”) rather than yes-or-no terms.
Suppose researchers ran the voters poll again, with a much larger sample. What happens to
the margin of error? And with a much smaller sample? Which result is more useful?
A much larger sample is likely to give a result that’s closer to the true value in the population,
meaning its CI will be shorter, its estimate more precise.
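The effect of sample size on the margin of error can be sketched in Python. This again assumes the normal approximation for a proportion; the three sample sizes are hypothetical and chosen only to show the pattern.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """95% margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Same point estimate (53%), three hypothetical sample sizes.
for n in (600, 2400, 9600):
    print(f"n = {n:>5}: MoE = {margin_of_error(0.53, n):.1%}")
```

Under this approximation the MoE shrinks with the square root of the sample size, so quadrupling the sample only halves the MoE.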
Beyond sampling variability, there’s always additional uncertainty, which is much harder to
pin down. It can have different causes in different situations, and usually there’s no statistical
formula to quantify it.
The replicability crisis and open science
Open Science is a central idea that we’ll meet often in this book. Open Science has two
requirements:
1. Avoid misleading selection of what’s reported
2. Report in full detail
It has largely been prompted by the replicability crisis - an ‘alarming discovery’ that a number
of widely known and accepted research findings cannot be replicated; when researchers
repeat the earlier studies, they get different results. It seems that some well-accepted
research findings are simply wrong.
Open Science addresses the crisis by aiming to reduce the chance that incorrect research
results are obtained and reported.
‘Open’ science refers to the idea that, as much as possible, full information about every
stage of research should be openly available, so other researchers can repeat the original
data analysis as a check, analyse the data in a different way, or conduct a replication study.
Meta-analysis
If we have results from two or more similar studies, we can use meta-analysis to combine
the results. It gives an overall CI that’s shorter than the CI for any of the single studies. It’s
shorter because adding further information from additional studies should reduce our
uncertainty about where the population average lies.
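To see why the combined CI is shorter, here is a minimal Python sketch of one common way to combine studies: a fixed-effect, inverse-variance weighted average. The notes do not say which meta-analysis model the book uses, so this choice is an assumption, and the three study results are hypothetical.

```python
import math

def fixed_effect_meta(estimates, moes, z=1.96):
    """Combine study estimates by inverse-variance weighting.

    Each 95% MoE is converted back to a standard error (MoE / z),
    and each study is weighted by 1 / SE^2.
    """
    weights = [(z / m) ** 2 for m in moes]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_moe = z / math.sqrt(sum(weights))
    return pooled, pooled_moe

# Hypothetical results from three similar polls (point estimate, MoE).
estimates = [0.53, 0.51, 0.55]
moes = [0.02, 0.03, 0.04]

pooled, pooled_moe = fixed_effect_meta(estimates, moes)
print(f"Combined estimate = {pooled:.3f}, MoE = {pooled_moe:.3f}")
```

In this sketch the combined MoE is smaller than the MoE of even the most precise single study, which is the pattern a forest plot shows with its diamond.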
All these results can be shown in a forest plot. A forest plot shows point and interval
estimates for individual studies, and displays the meta-analysis result as a diamond.
Well-conducted replications make a vital contribution to building a research literature that
includes fewer wrong findings, and therefore deserves our trust.
Replication and meta-analysis are so important that we should adopt meta-analytic
thinking: the consideration of any study in the context of similar studies already
conducted, or to be conducted in the future.
A step-by-step plan for estimation
1. State the research question. Express it as a ‘how much’ or ‘to what extent’ question
2. Identify the measure that’s most appropriate for answering that question
3. Design a study that uses that measure and gives us good point and interval
estimates to answer our question
4. After running the study, examine the data, calculate point and interval estimates, and
make a figure
5. Interpret these, using judgement in the research context
6. Report the study, making sure to state there was no selective reporting of just some
of the results, and giving full details of every aspect of the study
7. Adopt meta-analytic thinking throughout. Seek other similar studies and, if
appropriate, conduct a meta-analysis. Consider conducting a replication.
Chapter 2: Research Fundamentals: Don’t Fool Yourself
Several reasons why it is necessary to understand research:
1. Research has become important in many, many fields. You could easily be involved
in planning, conducting, and interpreting research in your future career.
2. As a citizen and customer, you will often encounter issues and choices for which
understanding research will be invaluable, even essential.
Inference from sample to population
In chapter 1, the book used the results from a sample to make an inference (conclusion)
about the population of all intending voters. The main goal of this book is to explain
techniques of statistical inference that justify doing that - using a sample to reach a
conclusion.
Estimation is the main statistical inference technique that will be discussed.
Definition list before this chapter starts:
- Population: a usually very large set of people about which we are interested in drawing
conclusions
- Sample: a set of people selected from the population
- Descriptive statistic: a summary number, such as the sample mean, that tells us
about a set of data
- Inferential statistic: a statistic, such as a CI, that is calculated from sample data and
tells us about the underlying population
Random Sampling
A sample is useful to the extent that it tells us about the population, which means that it
needs to be representative. How should we choose the sample so it’s representative?
The best strategy is usually to seek a random sample. Random sampling requires that every
member of the population has an equal probability of being chosen, and that all members of
the sample are chosen independently, meaning separately: the choice of one sample member
has no influence on whether another is chosen.
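Drawing a simple random sample can be sketched in Python with the standard library. The population of 10,000 labelled voters and the function name draw_random_sample are hypothetical. Note that random.sample draws without replacement, so the independence requirement holds only approximately, which is usually acceptable when the population is much larger than the sample.

```python
import random

def draw_random_sample(population, k, seed=None):
    """Draw k distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, k)

# Hypothetical population of 10,000 intending voters.
population = [f"voter_{i}" for i in range(10_000)]
sample = draw_random_sample(population, k=100, seed=42)
print(len(sample), "voters sampled")
```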
It’s rarely possible to meet the two requirements of random sampling, and you are likely to
have to use a convenience sample: a practically achievable sample from the population.
If you use a convenience sample, you have to judge whether it’s reasonable to use the CI
calculated from it to draw conclusions about a particular population.
Making comparisons
To make a comparison, a study needs at least two variables. The first is the independent
variable (IV), which defines the conditions we wish to compare. It can take two values, or
levels (also called conditions or treatments).
Sometimes one of the conditions is chosen to provide a baseline for the comparison, in
which case we can refer to it as the control condition.
The second variable is the dependent variable (DV). It is the variable that’s measured in the
study and provides the data to be analysed.
Experimental and Non-Experimental Research
There are several ways to set up a study. One approach is non-experimental: for example,
finding students who already choose to write with either a pen or a laptop and comparing
the two groups on a memory test.
Such studies are often easy and convenient, but the trouble is that the groups almost
certainly differ in other ways as well. Researchers can’t conclude that the observed
differences are caused by the choice of pen or laptop; they could have been caused by any of
the other differences between the two groups. Such a difference is called a confound: an
unwanted difference between groups, which is likely to limit the conclusions we can draw
from a study.
Another approach is to manipulate the IV (here, pen or laptop) to create two separate,
independent groups or conditions. This is the experimental approach.
This approach uses random assignment to form the groups or conditions to be compared.
Random assignment tends to even out all the other differences between groups, meaning
that it helps avoid confounds. This matters because research often aims to investigate
cause and effect.
On the other hand, this approach is often not possible, and when it is used, random
assignment must be done strictly.
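Random assignment can be sketched in Python as a shuffle followed by a split. The 20 student labels and the function name randomly_assign are hypothetical; the point is only that chance, not the researcher or the participant, decides who ends up in which condition.

```python
import random

def randomly_assign(participants, seed=None):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(participants)  # copy, so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participants for the pen versus laptop comparison.
students = [f"student_{i}" for i in range(20)]
pen_group, laptop_group = randomly_assign(students, seed=1)
print(len(pen_group), "in the pen group,", len(laptop_group), "in the laptop group")
```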
Both approaches can be valuable, and often their data are analysed in the same way. The
big difference is the care needed when stating conclusions. An experimental study can
justify a causal conclusion: that the IV caused the difference in the DV. For a
non-experimental study, we can only say that the IV ‘tends to go with’ or ‘is associated
with’ that difference.
Measurement
Measurement is an important issue across science, but it is not always easy: consider
measuring someone’s anxiety, or the amount someone has learned from a lecture. In the
first case, anxiety is the construct of interest: the underlying psychological
characteristic we wish to study. To measure it, we need to operationalize it, that is,
choose a practical way of measuring it.
In many cases there are well-established tests or instruments to choose from. A good
measure has two basic features: reliability and validity.
Reliability refers to the repeatability or consistency of a measure: if you measure again,
are you likely to get the same result?