Recap Introduction to Biomedical Sciences
Methodology
The research process starts with the formulation of the research question. Next, a hypothesis is
formulated: a prediction about the expected effects. Then a study is designed to investigate the
research question, and data are collected. This step is called methodology. The data obtained can be
summarized using descriptive statistics, which describe what is going on in the sample. However, the
research question does not concern the sample alone: the aim is to know whether the effect holds in
general, for the whole population. For that, inferential statistics are needed. These are statistics
built on probability models to assess to what extent the observations in the sample apply to the
population of interest.
A variable is an observable or hypothetical event that can change and whose changes can be measured in
some way. Examples are age, height, pain, temperature, cardiac output, etc. Different types of variables can
be distinguished, such as independent, dependent, extraneous and confounding variables. In science, the
interest is mostly in causal effects. These are directional effects, in which one variable causes or leads to
a change in another variable. The outcome variable, in which the effect is observed, is the dependent
variable. The variable that is controlled or manipulated in the experiment is the independent variable. In
essence, what is measured in a research design is the dependent variable at different levels of the
independent variable. For example, in a study of the effect of a drug on pain, the drug treatment is the
independent variable and pain is the dependent variable.
An extraneous variable is a variable that is not of interest to the researcher, but that might influence the
variables of interest if it is not controlled. It might therefore provide an alternative explanation for the
results. If this variable is controlled there is no problem, but if it is not, it becomes a confounding
variable, since it confounds the relationship between the independent and dependent variable.
There are different levels to measure variables:
Nominal: the variable represents categories without a logical order. With only two categories, it is dichotomous
o Sex: female vs. male
o Colour: blue/red/green/…
o Type of respiratory disease: COPD/asthma/emphysema/…
Ordinal: the variable represents categories with a specific order or rank position
o Position in the race: 1st/2nd/3rd/…
o Position in height ranking: tallest/shortest/…
o Goodness of lung function: horrible/bad/reasonable/good/excellent
Discrete: counts – finite values
o Number of shopping trips per week
o Coughs per hour
Continuous: scale variable with infinite values
o Weight of a package
o Distance between home and store
o Lung volume in litres
Nominal and ordinal variables are categorical: the differences between the categories have no meaningful
numerical interpretation. Discrete and continuous variables are quantitative, and the differences between
their values can be interpreted meaningfully.
The first question to ask when defining the research question is whether to look for a causal effect or an
association. Next, it must be decided how the dependent variable will be measured and what its level of
measurement will be. The same goes for the independent variable. For the manipulation of the independent
variable, it must additionally be decided which groups or conditions will be compared, how many there will
be, and whether the measurements/manipulations will be dependent or independent. For example, in a
dependent research design each participant is exposed to both the hot room and the normal room, whereas
in an independent research design some participants are exposed to the hot room and others to the normal one.
In biomedical research, there are different types of study designs:
Observational study designs are used for merely observing and not manipulating
o Cross-sectional: all measurements happen at the same time
o Case-control: measure the outcome at a current timepoint and look back in time to find
possible predictors
o Cohort: measure variables at one point, follow the sample and assess them again at a second
timepoint
Experimental study designs are used for not only observing, but also manipulating
o Randomized controlled design: participants are randomly assigned to groups (undergo only
one condition)
o Cross-over design: participants are randomly assigned to an order (undergo all conditions)
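The difference between the two experimental designs can be illustrated with a minimal Python sketch. The function names, the participant labels and the fixed seed are hypothetical and only serve the illustration; the two conditions are borrowed from the hot room / normal room example above. Real trials typically use more elaborate randomization schemes.

```python
import random


def assign_parallel_groups(participants, conditions, seed=0):
    """Randomized controlled design: each participant gets exactly one condition."""
    rng = random.Random(seed)  # fixed seed only to make the illustration reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal the shuffled participants out in turn, so group sizes differ by at most one.
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}


def assign_crossover_orders(participants, conditions, seed=0):
    """Cross-over design: each participant undergoes all conditions in a random order."""
    rng = random.Random(seed)
    orders = {}
    for p in participants:
        order = list(conditions)
        rng.shuffle(order)
        orders[p] = order
    return orders


participants = [f"P{i}" for i in range(1, 9)]  # hypothetical participant labels
conditions = ["hot room", "normal room"]

groups = assign_parallel_groups(participants, conditions)
orders = assign_crossover_orders(participants, conditions)

for p in participants:
    print(p, "| group:", groups[p], "| cross-over order:", orders[p])
```

In the first function every participant ends up in one group only (independent measurements); in the second, every participant appears under every condition (dependent measurements), and only the order is randomized.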
An essential element of a research design is deciding on a representative sample. The primary goal is to study
a specific population: the group to which the conclusions are meant to be generalised. Observing the whole
population is usually not feasible, so a sample is used. This is a subset of the population, the
limited group in which the data are actually observed.
Descriptive statistics
The goal of descriptive statistics is to present, organize and summarize the data that is observed in the
sample. For this, different measures can be used:
Measures of frequency
o Frequency and proportion
Measures of central tendency: the most central or typical value of a data set
o Mean
o Median
o Mode
Measures of dispersion/variability: the extent to which all the values in a data set vary around the
central or typical value
o Range and interquartile range
o Variance and standard deviation
The frequency shows how often each value in the data set occurs, and the proportion shows how often each
value occurs relative to the total number of values. The mean is the average value. In the data set of
the example, it can be calculated: (16+18+14+17+18+18+19+16+20+17+19+15+15+17+18+19+21+17+15+18)
/ 20 = 347 / 20 = 17.35. The median is the value at the median location, which means the value at the middle
of the sample when the scores are ranked from lowest to highest. If there is an odd number of data points,
the median is the value at the median location. If there is an even number of data points, the median is the
mean of the two values adjacent to the median location. In the example, the median is (17 + 18) / 2 = 17.5.
The mode is the most frequently occurring value in a data set. In some situations there are 2 modes; this is
called a bi-modal situation. In the example, 18 is the mode. The mean, median and mode are 3 types of
central tendency measures. The question is when to use which one. The mean is used with quantitative data.
It takes account of the exact distances between values in the data set. It is a powerful statistic used in
estimating population parameters and in inferential statistics. However, the mean is sensitive to outliers
(extreme values in the data set). The median is used with ordinal data. It takes account only of the position
of ranked values in the data set and is unaffected by outliers. The mode is typically used with nominal data.
It takes account of neither the exact distances between values in the data set nor their rank order. It is
unaffected by outliers, but uninformative in small data sets.
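The frequency, proportion and central tendency calculations for the example data set can be checked with a short Python sketch. It uses only the standard library; the variable names are illustrative and not part of the course material.

```python
from collections import Counter
from statistics import mean, median, multimode

# The 20 scores of the example data set
scores = [16, 18, 14, 17, 18, 18, 19, 16, 20, 17,
          19, 15, 15, 17, 18, 19, 21, 17, 15, 18]

# Frequency: how often each value occurs
frequency = Counter(scores)
# Proportion: frequency relative to the total number of values
proportion = {value: count / len(scores) for value, count in frequency.items()}

sample_mean = mean(scores)        # 347 / 20 = 17.35
sample_median = median(scores)    # mean of the 10th and 11th ranked values: 17.5
sample_modes = multimode(scores)  # list of all most frequent values: [18]

print("frequency of 18:", frequency[18], "proportion:", proportion[18])
print("mean:", sample_mean, "median:", sample_median, "mode(s):", sample_modes)
```

`multimode` returns a list, so a bi-modal data set would simply yield two values.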
The first measure of dispersion/variability is the range. This is the difference between the highest and the
lowest scores of the sample. In the example, it is 21 – 14 = 7. A more specific measure of dispersion is the
interquartile range (IQR). It is the distance between the two values that cut off the bottom 25% of values
(Q1) and the top 25% of values (Q3). Q1 is the 25th percentile: the median of the values below the median.
Q3 is the 75th percentile: the median of the values above the median. The IQR is Q3 – Q1. In the example, Q1 is
16 and Q3 is 18.5, which results in an IQR of 18.5 – 16 = 2.5. A third measure of dispersion is the variance.
This is an estimate of the average squared amount by which the scores in the sample deviate from the mean
score. This is shown in the graph, but can also be calculated manually. For a sample, the variance is:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where $x_i$ are the scores, $\bar{x}$ is the sample mean and $n$ is the number of scores. The standard
deviation is simply the square root of the variance. Conversely, the variance is found by squaring the
standard deviation.
What are the advantages and disadvantages of the 4 measures of dispersion? The range is the simplest,
crudest measure. It is sensitive to outliers and unrepresentative of any features of the distribution of
values between the extremes. The IQR is unaffected by outliers and is the most useful measure for
ordinal-level data. The standard deviation and variance take account of all values in the data set. They are
the most sensitive measures of dispersion, but they are also sensitive to outliers. Specifically, they are
measures of dispersion around the mean (for the SD: on the scale of the variable itself) and are the most
useful measures for quantitative data. They are powerful statistics used in estimating
population parameters and in inferential statistics.
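The four measures of dispersion can be computed for the example data set with the sketch below. Two assumptions are made here: the quartiles follow the "median of each half" definition used above (other quartile conventions exist and can give slightly different values), and the variance is the sample variance with n − 1 in the denominator, which is what `statistics.variance` computes.

```python
from statistics import median, stdev, variance

# The 20 scores of the example data set
scores = [16, 18, 14, 17, 18, 18, 19, 16, 20, 17,
          19, 15, 15, 17, 18, 19, 21, 17, 15, 18]

ordered = sorted(scores)
half = len(ordered) // 2
lower_half = ordered[:half]   # values below the median location
upper_half = ordered[-half:]  # values above the median location

value_range = max(scores) - min(scores)  # 21 - 14 = 7
q1 = median(lower_half)                  # 16
q3 = median(upper_half)                  # 18.5
iqr = q3 - q1                            # 2.5

sample_variance = variance(scores)  # sum of squared deviations / (n - 1)
sample_sd = stdev(scores)           # square root of the sample variance

print("range:", value_range, "Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("variance:", round(sample_variance, 2), "SD:", round(sample_sd, 2))
```

For these scores the sum of squared deviations from the mean is 62.55, so the sample variance is 62.55 / 19, roughly 3.29, and the standard deviation is roughly 1.81.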
Probability theory
The goal of research is to draw a conclusion about a population based on data observed in a sample, by using
statistical tests based on probability distributions and measures of uncertainty. Probability is how likely
an event is to occur: the proportion of times an outcome would occur in the long run. This is stated as
Pr(event) = …, where 0 means an impossible event and 1 means a certain event. Example: Pr(man develops
prostate cancer in his lifetime) = 0.11.
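The idea of probability as a long-run proportion can be illustrated with a small simulation. This is only a sketch: the function name, the number of trials and the seed are arbitrary choices, and the value 0.11 is taken from the example above.

```python
import random


def estimate_probability(p_event, n_trials=100_000, seed=42):
    """Simulate n_trials independent trials and return the proportion in which the event occurs."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    # rng.random() is uniform on [0, 1), so it falls below p_event with probability p_event
    hits = sum(rng.random() < p_event for _ in range(n_trials))
    return hits / n_trials


p = 0.11  # Pr(event) from the example
estimate = estimate_probability(p)
print("observed proportion:", estimate)
```

With a large number of trials, the observed proportion should lie close to 0.11; with only a handful of trials it can deviate considerably.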
The first and most basic rule of probability theory is the complement rule. It states that the probability of
an event occurring and the probability of that event not occurring sum to 1. This means that the probability of