Summary TOE
TOE correlational
Research cycle and ways to generate data/errors
Correlational research question based on the PAC acronym:
P = population.
A = association: what kind of relationship is expected?
C = constructs: the characteristics between which a relationship is expected.
Research cycle from KOM: concept/theory -> research question -> research design ->
hypotheses -> data collection -> data analysis -> supporting data strengthens the theory, or
non-supporting data leads to revisions of the theory and research design.
KOM review: to measure a (theoretical) construct, researchers must follow these steps: construct ->
conceptual definition -> operational definition -> variable.
Different types of correlational data: customer satisfaction, political polls, governmental
statistics.
Digital data have grown enormously over the last 20 years: at first almost nobody had a
digital device, now there are 50 billion smart objects/devices. Meanwhile, the world
population stayed fairly stable over these years.
(quantitative) data are generated in different ways:
Incidentally, aka organic (ready made):
o Aspirational: Facebook, Twitter, Instagram.
o Transactional: Albert Heijn (AH) Bonuskaart loyalty card, debit card, Mastercard.
Purposively, aka designed (custom made):
o Experiment.
o Survey.
o Administrative: IRS, Belastingdienst (the Dutch tax authority).
Correlational data (designed):
We design a study and collect data to:
o Describe the social reality.
o Study (causal) relationships.
o Generalize to the target population.
Inferential goals are:
o Description,
o Causation,
o And prediction.
Measuring:
What are you interested in?
o Facts: use secondary data, ask a specialist.
o Behaviours: observe/ask the person.
o Opinions: ask the person.
We often ask people questions through surveys.
How to ask?
Survey modes:
o Face-to-face (CAPI/computer-assisted personal interviewing).
o Mail.
o Telephone (CATI/Computer Assisted Telephone Interviewing).
o Internet.
o Mixed-mode.
Differences between the modes:
o Degree of interviewer involvement.
o Degree of interaction with the respondent.
o Degree of privacy.
o Channels of communication (visual or auditive).
o Technology use.
Survey modes in NL:
Telephone surveys (random digit dialing) not widely used.
Mixed modes:
o Postal mail with invitation to internet surveys.
o Telephone component if numbers are known.
Market researchers use both probability (Dutch: aselect) and nonprobability (Dutch: select)
online panels.
Mixed-device surveys:
o Computer/smartphone.
Cross-sectional and panel surveys:
Panel surveys interview respondents over time (content usually the same, but may
differ).
Advantages:
o We can assess within-person change and causality.
o We can disentangle age, period, and cohort effects.
Potential errors:
o Attrition (= drop-out in consecutive waves, i.e. wave nonresponse).
o Panel conditioning (AKA learning effects).
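As a minimal sketch of how attrition could be quantified, the Python snippet below computes, per wave, the share of wave-1 respondents who no longer take part. The respondent IDs and the function name `attrition_rates` are hypothetical, and measuring attrition relative to the first wave is an assumption; other definitions are possible.

```python
def attrition_rates(waves):
    """Return, per wave, the share of wave-1 respondents who did not respond."""
    first = waves[0]
    rates = []
    for wave in waves:
        retained = len(first & wave)  # wave-1 respondents still taking part
        rates.append((len(first) - retained) / len(first))
    return rates


# Hypothetical respondent IDs for three waves of a panel survey.
waves = [
    {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    {1, 2, 3, 4, 5, 6, 7, 8},
    {1, 2, 3, 5, 6, 8},
]
rates = attrition_rates(waves)
print(rates)  # [0.0, 0.2, 0.4]
```

In this made-up panel, 20% of the original respondents are missing in wave 2 and 40% in wave 3.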
Types of mixed-mode design:
One mode for some respondents, another mode for others:
o E.g. online survey with mail component for those without internet.
One mode for recruitment, another for survey administration:
o E.g. mail invitation for an online survey.
One mode for data collection, another for reminders, follow-up:
o E.g. telephone reminders for an online survey.
One mode for main part of the interview, another for some subset of answers (e.g.
sensitive items):
o E.g. telephone combined with ACASI (audio computer-assisted
self-interviewing), which allows respondents to listen to
prerecorded survey questions through headphones and record responses
using a touch screen or keypad.
One mode for one wave of the panel survey, another for others:
o E.g. first wave face-to-face, following waves online to save costs.
Survey modes in comparison: Overview
Survey lifecycle + errors that can occur:
For measurement: construct -> measurement -> [measurement error] -> response ->
[processing error] -> edited response -> survey statistics.
For representation: target population -> [coverage error, Dutch: dekkingsfout] -> sampling
frame -> [sampling error, Dutch: steekproeffout] -> sample -> [nonresponse error, Dutch:
non-responsfout] -> respondents -> [adjustment error] -> postsurvey adjustments -> survey
statistics.
Sampling: with probability sampling (Dutch: aselect) you can generalize to the population. With
non-probability sampling (Dutch: select) you cannot generalize to the population.
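The difference can be sketched in Python. The example below is only an illustration under stated assumptions: the sampling frame is simulated, the function name `simple_random_sample` is hypothetical, and the "convenience" sample (the 500 youngest people) is an artificial stand-in for a non-probability sample.

```python
import random
import statistics


def simple_random_sample(frame, n, seed=0):
    """Draw n units from the frame, each with the same selection probability."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sampling frame: the ages of 10,000 people (simulated).
frame_rng = random.Random(42)
frame = [frame_rng.randint(18, 90) for _ in range(10_000)]

# Probability sample: every unit in the frame has an equal chance of selection.
prob_sample = simple_random_sample(frame, 500, seed=1)

# Non-probability sample (assumed for illustration): only the 500 youngest people.
conv_sample = sorted(frame)[:500]

print("frame mean:      ", statistics.mean(frame))
print("probability mean:", statistics.mean(prob_sample))
print("convenience mean:", statistics.mean(conv_sample))
```

The mean of the probability sample should land close to the frame mean, while the convenience sample is far off because young people are heavily overrepresented.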
Literature lecture 1
Bivariate correlation/association = an association that involves exactly 2 variables. The 3 types of
association are positive, negative and zero. An analysis of bivariate correlations looks at
only two variables at a time. If there are more than 2 variables, the associations are
presented for different pairs of variables. Steps: collecting data -> testing the association (using
scatterplots and the correlation coefficient r).
r has 2 qualities: direction and strength. It is always between -1 and 1. Guidelines for
evaluating the strength of an association based on r:
r               Effect size
.10 or -.10     Small or weak
.30 or -.30     Medium or moderate
.50 or -.50     Large or strong
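A minimal Python sketch of computing r and reading it against the guidelines above. The data are made up, the function names are hypothetical, and treating the benchmarks as cut-off points (with "negligible" below .10) is an assumption; the guidelines themselves only give reference values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def effect_size_label(r):
    """Map r to a label; using the benchmarks as cut-offs is an assumption."""
    size = abs(r)  # strength ignores the direction (sign) of r
    if size >= 0.50:
        return "large or strong"
    if size >= 0.30:
        return "medium or moderate"
    if size >= 0.10:
        return "small or weak"
    return "negligible"  # assumed label for values below the smallest benchmark


# Made-up example data: hours studied and test score for five people.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r = pearson_r(hours, score)
print(round(r, 2), effect_size_label(r))  # 0.77 large or strong
```

The sign of r gives the direction (here positive) and its absolute value gives the strength.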
A scatterplot is used when both variables are quantitative. When
one of the variables is categorical, it is better to use a bar graph. With a bar graph you
usually examine the difference between the group averages (means) to look for an
association. r can be used in these situations, but it is more common to use the t-test (which compares group averages).
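To illustrate how r and the t-test relate when one variable is categorical, here is a hedged Python sketch with made-up scores for two groups. It assumes an independent-samples t-test with pooled variance (the notes do not say which variant is meant) and codes group membership as 0/1 to compute r. The function names are hypothetical.

```python
import math
import statistics


def pooled_t(group_a, group_b):
    """Independent-samples t statistic with pooled variance (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / se


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Made-up scores for two groups (the categorical variable).
group_a = [4.0, 5.0, 6.0, 5.0]
group_b = [7.0, 8.0, 9.0, 8.0]

t = pooled_t(group_a, group_b)

# Code group membership as 0/1 and correlate it with the scores.
membership = [0.0] * len(group_a) + [1.0] * len(group_b)
scores = group_a + group_b
r = pearson_r(membership, scores)

# For two groups, r and the pooled t carry the same information.
n = len(scores)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(t, 3), round(r, 3), round(t_from_r, 3))
```

Both routes lead to the same t value, which is why the choice between r and the t-test here is mostly a matter of convention.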
With an association claim the most important validities to interrogate are construct validity
(= how well was each variable measured?) and statistical validity (= how well do the data
support the conclusion?). Statistical validity questions:
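1. What is the effect size?
2. Is the correlation statistically significant?
a. This is a matter of statistical inference (= judging how likely it is that the correlation
seen in the sample would also be seen if the whole population was researched). The
probability estimate is the p-value: the probability of finding a correlation at least this
strong in the sample if there were no association in the population.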
3. Could there be outliers affecting the association? A single outlier can have a noticeable effect on
r. Outliers matter most when the sample is small.
4. Is there restriction of range? If there is not a full range of
scores on one variable, the correlation can appear weaker than it really is.
How to fix: obtain more data covering a wider range, or use a correction for restriction of
range. This question is usually asked when the correlation is weak.
5. Is the association curvilinear? Curvilinear association = the relationship between 2
variables is not a straight line, e.g. it is positive up to a point and then becomes negative (or vice versa). This is
rare. r is designed to describe the slope of the best-fitting straight line through the
scatterplot, so when the scatterplot goes up and then down (or vice versa), r does not
describe the pattern well.
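Questions 3 to 5 can be illustrated with a small Python sketch. All data are made up or simulated (the noise level, sample size and cut-off range are assumptions chosen for illustration), and `pearson_r` is a hypothetical helper.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Question 3, outliers: one extreme point in a small sample changes r a lot.
x_small = [1, 2, 3, 4, 5]
y_small = [3, 1, 4, 2, 3]
r_without = pearson_r(x_small, y_small)
r_with = pearson_r(x_small + [15], y_small + [15])

# Question 4, restriction of range: simulated linear relation plus noise.
rng = random.Random(0)
x_full = [rng.uniform(0, 10) for _ in range(2000)]
y_full = [x + rng.gauss(0, 2) for x in x_full]
r_full = pearson_r(x_full, y_full)

# Keep only cases with x between 4 and 6 (a restricted range).
restricted = [(x, y) for x, y in zip(x_full, y_full) if 4 <= x <= 6]
x_res, y_res = zip(*restricted)
r_restricted = pearson_r(x_res, y_res)

# Question 5, curvilinear: y goes up and then down, yet r is about zero.
x_curve = list(range(7))
y_curve = [-(x - 3) ** 2 for x in x_curve]
r_curve = pearson_r(x_curve, y_curve)

print("outlier:   ", round(r_without, 2), "->", round(r_with, 2))
print("range:     ", round(r_full, 2), "->", round(r_restricted, 2))
print("curvilinear", round(r_curve, 2))
```

In this sketch the single outlier turns a weak correlation into a very strong one, restricting the range makes a strong correlation look much weaker, and the perfectly regular up-then-down pattern yields an r of about zero.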
It is tempting to make a causal claim from a correlational result. First you need to apply the 3 criteria
for causation:
1. Covariance of cause and effect: results must show correlation/association.
2. Temporal precedence: the cause variable must precede the effect variable. This criterion is
often called the directionality problem (we don’t know which variable came first).
3. Internal validity: there must be no other plausible explanations. This criterion is often called
the third-variable problem (if there is a plausible third variable, causation cannot be
inferred). The third variable must correlate logically with both of the measured