An Introduction to Statistical Methods & Data Analysis
Chapter 1
The four-step process
1. Defining the problem
2. Collecting the data
3. Summarizing the data
4. Analyzing the data, interpreting the analyses, and communicating the results
Population: the set of all measurements of interest to the sample collector (the entire group).
Sample: any subset of measurements selected from the population (a part of the population).
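A minimal sketch of the distinction between a population and a sample. The 1,000 simulated measurements, the seed and the sample size of 50 are illustrative assumptions, not taken from the text.

```python
import random

# Hypothetical population: 1,000 measurements (illustrative values only).
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(1000)]

# A sample is any subset of the population; here 50 measurements are
# drawn at random without replacement.
sample = rng.sample(population, 50)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"population size: {len(population)}, mean: {population_mean:.2f}")
print(f"sample size:     {len(sample)}, mean: {sample_mean:.2f}")
```

The sample mean will usually be close to, but not equal to, the population mean.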
Chapter 2
2.1 Introduction and abstract of research study
Designing the data collection process. Example: researching the public’s perception of the bus system
in order to increase bus use.
1. Specifying the objective of the study, survey or experiment
What aspects of the bus system determine whether or not a person will ride the bus?
Objective – identify factors that the transportation department can alter to increase the
number of people using the bus system.
2. Identifying the variable(s) of interest
Examine the objective of the study.
Review studies conducted in other cities, brainstorm with the employees.
Safety, cost, cleanliness of the bus etc.
Measurements obtained in the study: importance rating (very important, important etc.),
demographic information, how frequently a person rides the bus.
3. Choosing an appropriate design for the survey or experimental study
Surveys, experiments, examination of existing data, censuses,
government records and previous studies.
Goal – to gather data on existing conditions, attitudes or behaviors.
4. Collecting the data
Construct a questionnaire and then sample current riders of the buses and persons who
use other forms of transportation within the city.
Experimental studies
More active – varying the experimental conditions to study the effect of the
conditions on the outcome of the experiment.
As many as possible of the factors that affect the measurements are under the control of
the experimenter.
2.2 Observational studies
Observational study – the researcher records information concerning the subjects under study
without any interference with the process that is generating the information – passive observer.
Experimental study – the researcher actively manipulates certain variables associated with the study
(explanatory variables) and then records their effects on the response variables associated with the
experimental subjects.
Explanatory variables: the variables the researcher manipulates (independent variables; the
cause).
Response variables: the variables recorded on the experimental subjects (dependent variables; the
effect).
A severe limitation of observational studies is that recorded values of the response variables may be
affected by variables other than the explanatory variables. These are:
Confounding variables: variables not under the control of the researcher that influence both the
explanatory and response variables. The effect of the confounding variables and the
explanatory variables on the response variable cannot be separated due to the lack of control
the researcher has over the physical setting in which the observations are made.
In an experimental study, by contrast, the researcher tries to maintain control over all variables that
may have an effect on the response variables.
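A small simulation can show why a confounding variable is a problem. This is a sketch under stated assumptions: the variable names `x`, `y`, `z`, the effect sizes, the seed and the helper `pearson` are all hypothetical and chosen for illustration. In the simulated data the explanatory variable `x` has no effect on the response `y`, yet the two are clearly correlated because the confounder `z` shifts both.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
n = 10_000

# z is the confounder: it shifts both x and y.
z = [rng.randint(0, 1) for _ in range(n)]
# x (explanatory) has NO direct effect on y (response) in this simulation.
x = [2 * zi + rng.gauss(0, 1) for zi in z]
y = [2 * zi + rng.gauss(0, 1) for zi in z]

overall = pearson(x, y)

# Holding the confounder fixed (looking only at z == 0) removes the association.
x0 = [xi for xi, zi in zip(x, z) if zi == 0]
y0 = [yi for yi, zi in zip(y, z) if zi == 0]
within = pearson(x0, y0)

print(f"correlation ignoring z: {overall:.2f}")
print(f"correlation within z=0: {within:.2f}")
```

In an observational study the confounder is often unmeasured, so the second comparison is not available and the two effects cannot be separated.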
Observational studies are dichotomized (divided) into:
Comparative study – two or more methods of achieving a result are
compared for effectiveness.
Descriptive study – the goal is to characterize a population or process based on
certain attributes in that population or process.
In an observational study, the factors of interest are not manipulated while making measurements or
observations. Surveys are often used, but there are biases and sampling problems:
Cause-and-effect relationships – it is tempting to assign these relationships to spurious associations
between factors, i.e. to draw a false link. (Example: you cannot put people on a high-fat diet to
see whether they run a higher risk of cardiovascular disease.) Even when a cause-and-effect
relationship seems plausible, the study results should be reported as an association and not a causal
relationship.
Observational studies are of three basic types:
A sample survey (cross-sectional or prevalence study): a study that provides information
about a population at a particular point in time (all data are collected at the same time).
A prospective study: a study that observes a population in the present using a sample survey
and proceeds to follow the subjects in the sample forward in time in order to record the
occurrence of specific outcomes (data first, outcome later).
Subjects can keep careful records of their daily activities.
Subjects can be instructed to avoid certain activities that may bias the study.
Confounding variables may not be completely controlled – restrict the study to matched
subgroups of subjects.
A retrospective study: a study that observes a population in the present using a sample
survey and also collects information about the subjects in the sample regarding the
occurrence of specific outcomes that have already taken place (starting from the outcome, one
examines whether earlier behavior differed: take a group with and a group without the disease and
look for differences in their past).
Retrospective studies are generally cheaper and faster than prospective studies.
Retrospective studies have problems due to inaccuracies in data
caused by recall errors (errors in reporting something from the past).
Retrospective studies have no control over variables that may affect disease occurrence.
Cohort study (prospective): a group of subjects is followed forward in time to observe the differences
in characteristics between subjects who develop a disease and those who do not.
Case-control study (retrospective): two groups of subjects are identified, one with the disease and
one without the disease. Next, information is gathered about the subjects from their past concerning
risk factors that are associated with the disease. Distinctions are then drawn between the groups
based on these characteristics.
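The case-control procedure described above can be sketched in a few lines. The eight subject records below are hypothetical and exist only to show the mechanics; the helper `proportion_exposed` is likewise an illustrative name, not something from the text.

```python
# Hypothetical subjects: (has_disease, was_exposed_to_risk_factor_in_past).
subjects = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False), (False, False),
]

def proportion_exposed(records):
    """Fraction of records whose past-exposure flag is True."""
    return sum(1 for _, exposed in records if exposed) / len(records)

# Step 1: identify the two groups by current disease status.
cases = [s for s in subjects if s[0]]
controls = [s for s in subjects if not s[0]]

# Step 2: look back at a past risk factor and compare the groups.
cases_exposed = proportion_exposed(cases)
controls_exposed = proportion_exposed(controls)

print(f"exposed among cases:    {cases_exposed:.2f}")
print(f"exposed among controls: {controls_exposed:.2f}")
```

A difference between the two proportions is an association only; as noted earlier for observational studies, it does not by itself establish a causal relationship.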
2.3 Sampling designs for surveys
Crucial element – how the sample is selected from the population (e.g. with foreknowledge, or by
coercing people into participating).
The components that are necessary for a sample to be effective:
Target population: the complete collection of objects whose description is the major goal of the
study;
All persons paying taxes, all registered voters etc.
Sample: a subset of the target population.
Sampled population: the complete collection of objects that have the potential of being
selected in the sample; the population from which the sample is actually selected;
The persons from the target population you were able to reach, who reacted.
Nonresponse: a subset of the target population that refuses to fill out the survey. It is
important to characterize the nonresponders.
Observation unit: the object about which data are collected;
A specific individual or a water stream.
Sampling unit: the object that is actually sampled;
Households = sampling units; individuals in the sampled households = observation units.
Sampling frame: the list of sampling units;
A list of addresses of the households.
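The relationship between sampling frame, sampling units and observation units can be sketched as follows. The addresses, the two occupants per household, the seed and the sample size of 10 are hypothetical, and drawing households uniformly at random is an assumption made for illustration; the notes above do not prescribe a selection method.

```python
import random

# Hypothetical sampling frame: a list of household addresses (sampling units).
sampling_frame = [f"{number} Main Street" for number in range(1, 101)]

# Hypothetical occupants of each household (observation units).
occupants = {
    address: [f"{address}, person {k}" for k in (1, 2)]
    for address in sampling_frame
}

rng = random.Random(1)
# Draw 10 sampling units from the frame, without replacement.
sampled_households = rng.sample(sampling_frame, 10)

# Data are collected on every individual in the sampled households.
observation_units = [
    person
    for address in sampled_households
    for person in occupants[address]
]

print(len(sampled_households), "households,", len(observation_units), "individuals")
```

Households that are missing from the frame can never be selected, which is why the sampled population may differ from the target population.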