Chapter 13: Ensuring the quality of data
Quantitative data aims to give an explanation, identify
relationships between variables, express social laws and
generalise findings. Qualitative data aims to understand
phenomena in their natural context and to extrapolate (or transfer)
those findings to other, similar situations. As these different research
styles have different aims, they also have different criteria that
determine whether the research is of a high quality or not.
1. Quantitative data is evaluated based on the reliability and
validity of the measurements. There are various types of reliability
and validity.
Reliability: This is the extent to which the observable (or
empirical) measures that represent a theoretical concept remain
consistent when the experiment is repeated (or observed over and
over). Put differently, reliability is the degree to which a measuring
instrument produces equivalent results for repeated trials.
In other words, something is reliable if it gives the same results
each time you test it.
Variance is the variability between scores, and will occur even in
highly reliable studies. Results will never be identical, but can be
very similar (and thus, highly reliable). However, if a study is truly
reliable, this variance will be due to systematic factors, rather than
chance factors. Thus, researchers must establish whether the
factors or variables show regular patterns, e.g. if measuring people's
opinions about their working conditions, can these opinions vary from
day to day? Various techniques are used to assess reliability.
Test-retest reliability: Here, the same measurement procedure
is applied to the same group of people on two or more
occasions. If a study has high reliability it will produce very
similar results each time. However, several problems can
affect this. The first is history: something in the
environment causes the participants' answers to change between
tests. Maturation is when the participants themselves change, and
as a result give different answers when retested. Reactivity is when
participants change their answers because they have been
exposed to the test before – e.g. boredom, practising their
answers and trying to “do better” may all be ways in which the
participants' second set of answers differs from their first.
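To make this concrete, test-retest reliability is commonly reported as the correlation between the two sets of scores. The following is a minimal sketch in Python; the scores and variable names are invented for illustration, and numpy and scipy are assumed to be available:

import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores for the same five participants,
# measured on two separate occasions.
scores_time1 = np.array([12, 18, 9, 15, 21])
scores_time2 = np.array([13, 17, 10, 14, 20])

# Test-retest reliability is commonly expressed as the Pearson
# correlation between the two administrations: values near 1 mean
# participants kept their relative positions across occasions.
r, p_value = pearsonr(scores_time1, scores_time2)
print(f"Test-retest reliability: r = {r:.2f}")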
Equivalent-form reliability: (also called parallel-form reliability)
This is when participants are re-tested, just as in the previous
case, but the researchers ask the same questions in a different
way so that the effects of reactivity can be cancelled out, e.g. “Do
you have a good sense of humour?” and “Can you usually find
something to laugh about?” Although this makes reactivity less
of a problem, history and maturation remain problems.
Another difficulty with this technique is that it is hard to
establish whether the two “identical” questions are, in fact,
asking the same thing. Furthermore, this technique is
time-consuming, as you have to design two sets of questions
instead of one.
Inter-rater reliability: This is when more than one “rater” (the
person who assesses the participants) is used to assess each
participant. All the corresponding scores, or the average
score for each participant, are then used. This ensures that the
ratings will not be determined by a single rater's judgement
or biases.
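As a rough sketch, agreement between two raters scoring on a numeric scale can be checked with a simple correlation, and the averaged scores then carried into the analysis. The ratings below are invented; categorical ratings would call for a statistic such as Cohen's kappa instead:

import numpy as np
from scipy.stats import pearsonr

# Hypothetical ratings given by two raters to the same six participants.
rater_a = np.array([4, 3, 5, 2, 4, 3])
rater_b = np.array([4, 2, 5, 3, 4, 3])

# A high correlation suggests the raters rank participants similarly.
r, _ = pearsonr(rater_a, rater_b)
print(f"Agreement between raters: r = {r:.2f}")

# Averaging the raters' scores means no single rater's judgement
# or biases determine a participant's final score.
averaged = (rater_a + rater_b) / 2
print("Averaged scores:", averaged)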
Internal consistency: This is a measure of the homogeneity of
the sections or answers within a measuring item (such as a
questionnaire). This means that if a depressed person takes a
questionnaire, all their answers should be fairly consistent,
i.e. they will not give “depressed” answers for everything and
then answer “I love life and am excited to wake up every
morning”. That would not be internally consistent, as their
answers are not homogeneous (similar).
Split-halves reliability: This is when researchers split the test
into two halves, and then compare the results to check that the
two halves produce equivalent readings. This is more concerned with
testing the internal consistency of the instrument. Thus,
history, maturation, reactivity and the problems of equivalent
forms do not arise here.
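A sketch of how this might be computed: the items are divided into two halves (here odd- versus even-numbered items, a common convention the text does not specify), the two half-scores are correlated, and the correlation is usually stepped up with the Spearman-Brown formula, since each half is only half the length of the full test. The data below are invented:

import numpy as np
from scipy.stats import pearsonr

# Hypothetical item responses: rows are participants, columns are items.
responses = np.array([
    [4, 3, 4, 5, 3, 4],
    [2, 2, 3, 2, 1, 2],
    [5, 4, 5, 4, 5, 5],
    [3, 3, 2, 3, 3, 2],
])

# Split into odd- and even-numbered items and total each half.
half1 = responses[:, 0::2].sum(axis=1)
half2 = responses[:, 1::2].sum(axis=1)

# Correlate the two half-scores ...
r, _ = pearsonr(half1, half2)

# ... then apply the Spearman-Brown correction to estimate the
# reliability of the full-length test.
r_full = 2 * r / (1 + r)
print(f"Split-half reliability (Spearman-Brown): {r_full:.2f}")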
Item analysis: This is when researchers compare each item (i.e.
the results of a specific question) to all the other
items, and to the test as a whole. In this way, researchers can
quickly pick up which items are distorting the data, and
exclude them to improve internal consistency.
Often, researchers will also test the overall internal
consistency using a statistic called the coefficient of
reliability. An example of this is Cronbach's alpha, where
a score of 0 represents no reliability and a score of 1
represents perfect reliability. Most social scientists aim
for 0.7 or higher.
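Cronbach's alpha and a basic item analysis can both be computed directly from a participants-by-items score matrix. A minimal sketch follows; the helper function and the data are invented for illustration:

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a participants-by-items score matrix."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical questionnaire data: rows are participants, columns are items.
data = np.array([
    [4, 5, 4, 4],
    [2, 1, 2, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
])

# Item analysis: correlate each item with the total of the remaining
# items; a low correlation flags an item that may distort the data.
for i in range(data.shape[1]):
    rest = np.delete(data, i, axis=1).sum(axis=1)
    r = np.corrcoef(data[:, i], rest)[0, 1]
    print(f"Item {i + 1} item-rest correlation: {r:.2f}")

alpha = cronbach_alpha(data)
print(f"Cronbach's alpha: {alpha:.2f}")  # aim for 0.7 or higher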
Validity: This is how accurately the measures actually represent
the concept, or whether they in fact represent something else. For
example, if you are measuring someone's wealth from their income,
that has high validity. If you measure someone's wealth by how often they