Introduction to statistics
Statistics plays a vital role in research. Much scientific information is explained in
statistical terms, and many decisions in the health sciences are based on statistical
studies.
Statistics enables you:
o to read and evaluate reports and other literature
o to undertake independent research investigations
o to describe data in meaningful terms
Definitions
Statistics: the study of how to collect, organize, analyze, and interpret data.
Data: the values recorded in an experiment or observation.
Population: any collection of individual items or units that are the subject of
investigation.
Sample: a small, representative subset of a population.
Observation: the record, such as a measurement, provided by each unit in the sample.
Sampling: the process of selecting a sample from a population.
Variable: a characteristic of an item or individual whose value can differ from one unit to
another.
Raw Data: Data collected in original form.
Frequency: The number of times a certain value or class of values occurs.
Tabulation: the logical and systematic arrangement of statistical data in rows
and columns.
Frequency Distribution: The organization of raw data in table form with classes and frequencies.
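Class Limits: Separate one class in a grouped frequency distribution from another. The limits
can actually appear in the data, and there are gaps between the upper limit of one class and the
lower limit of the next.
Class Boundaries: Separate one class in a grouped frequency distribution from another without
gaps. Each boundary lies midway between the upper limit of one class and the lower limit of the
next, so the boundaries do not normally appear in the data.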
Cumulative Frequency: The number of values less than the upper class boundary for the
current class. This is a running total of the frequencies.
Histogram: A graph which displays the data by using vertical bars of various heights to represent
frequencies.
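The definitions of frequency, class limits, class boundaries and cumulative frequency can be tied together in a short Python sketch. The ages and the three classes below are made up for illustration, and the boundaries are placed half a unit outside the limits, which assumes whole-number data.

```python
# Hypothetical raw data: ages of 12 patients in whole years.
ages = [21, 34, 25, 47, 38, 29, 41, 33, 26, 45, 39, 31]

# Class limits chosen for illustration only: 20-29, 30-39, 40-49.
class_limits = [(20, 29), (30, 39), (40, 49)]


def frequency_table(data, limits):
    """Build one row per class with limits, boundaries, frequency and cumulative frequency."""
    rows = []
    running_total = 0
    for lower, upper in limits:
        # Frequency: how many raw values fall inside this class.
        freq = sum(1 for x in data if lower <= x <= upper)
        # Cumulative frequency: running total of the frequencies so far.
        running_total += freq
        rows.append({
            "limits": (lower, upper),
            # Assumption: whole-number data, so boundaries sit 0.5 outside the limits.
            "boundaries": (lower - 0.5, upper + 0.5),
            "frequency": freq,
            "cumulative": running_total,
        })
    return rows


table = frequency_table(ages, class_limits)
for row in table:
    print(row)
```

The last cumulative frequency always equals the number of observations, which is a quick check that no value fell outside the classes.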
Variables
• A variable is a characteristic of an item or individual that can take different values.
• Variables are of two types:
o Quantitative: a variable with a numeric value. E.g. age, weight.
o Qualitative: a variable with a category or group value. E.g. Gender (M/F),
Religion (H/M/C), Qualification (degree/PG)
• Quantitative variables are of two types:
o Discrete variables (countable values, e.g. number of children)
o Continuous variables (any value within a range, e.g. weight)
• Variables can be
o Independent
Are not influenced by other variables.
Are not influenced by the event, but could influence the event.
o Dependent
A variable that is influenced by other variables is often referred to as
the dependent variable.
SBL 321: Biostatistics J. C. Korir
E.g. in an experimental study of a relaxation intervention for reducing hypertension, blood
pressure is the dependent variable, and relaxation training, age and gender are independent
variables.
Sampling
• Sampling is the process of getting a representative fraction of a population.
• Analysis of the sample gives an idea of the population.
Methods of sampling
1. Random Sampling or Probability sampling
Simple random sampling
Stratified random Sampling
Systematic sampling
Cluster sampling
Proportionate sampling
Multistage sampling
2. Non-random sampling
Haphazard Sampling
Convenience Sampling
Purposive Sampling
Quota Sampling
Simple Random sampling
Each individual of the population has an equal chance of being included in the sample. Two
methods are used in simple random sampling:
• Random Numbers method
• Lottery method
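As a minimal sketch of the random numbers method, Python's standard `random` module can draw a sample without replacement so that every unit has the same chance of selection. The sampling frame of 100 numbered units and the fixed seed are assumptions made for illustration.

```python
import random

# Hypothetical sampling frame: units numbered 1 to 100.
population = list(range(1, 101))

# A fixed seed is used only so that the sketch is reproducible.
rng = random.Random(42)

# Draw 10 units without replacement; each unit is equally likely to be chosen.
sample = rng.sample(population, k=10)
print(sorted(sample))
```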
Stratified random sampling
Stratified random sampling is used when we have subgroups in our population that are likely to
differ substantially in their responses or behavior. This sampling technique treats the population
as though it were two or more separate populations and then randomly samples within each.
For example, suppose you are interested in visual-spatial reasoning and previous research
suggests that men and women will perform differently on these types of task. You divide your
sampling frame into male and female members and randomly select equal numbers within each
subgroup (or "stratum"). With this technique, you are guaranteed to have enough of each
subgroup for meaningful analysis.
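The idea of sampling equal numbers within each stratum can be sketched in Python as follows. The frame of 60 people, the helper name `stratified_sample` and the stratum size of 5 are all hypothetical.

```python
import random

# Hypothetical sampling frame of (id, sex) pairs: 20 women and 40 men.
frame = [(i, "F" if i % 3 == 0 else "M") for i in range(1, 61)]


def stratified_sample(frame, key, n_per_stratum, rng):
    """Split the frame into strata, then draw a simple random sample in each."""
    strata = {}
    for unit in frame:
        strata.setdefault(key(unit), []).append(unit)
    # Each stratum is treated as its own small population.
    return {label: rng.sample(units, n_per_stratum) for label, units in strata.items()}


rng = random.Random(0)  # fixed seed for reproducibility only
sample = stratified_sample(frame, key=lambda unit: unit[1], n_per_stratum=5, rng=rng)
for label, units in sample.items():
    print(label, units)
```

Although women make up only a third of this frame, both strata contribute the same number of participants.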
Systematic sampling
Systematic sampling yields a probability sample, but it is not simple random sampling: once
the starting point has been chosen, every other selection is fixed. Systematic sampling
strategies take every nth person from the sampling frame. For example, you choose a random
start page and take every 45th name in the directory until you have the desired sample size.
Its major advantage is that it is much less cumbersome to use than the procedures outlined
for simple random sampling.
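A rough Python sketch of the "every nth name" procedure is given below. The directory of 450 names and the function name `systematic_sample` are assumptions; the interval of 45 matches the example above only because 450 names and a sample of 10 were chosen to produce it.

```python
import random


def systematic_sample(frame, n, rng):
    """Pick a random start in the first interval, then take every k-th unit."""
    if n <= 0 or n > len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    k = len(frame) // n        # sampling interval
    start = rng.randrange(k)   # random start within the first interval
    return frame[start::k][:n]


# Hypothetical directory of 450 names, so the interval is 450 // 10 = 45.
directory = [f"name_{i:03d}" for i in range(450)]
sample = systematic_sample(directory, n=10, rng=random.Random(1))
print(sample)
```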
Cluster sampling
Cluster sampling is useful when it would be impossible or impractical to identify every person
in the population. Suppose a college does not print a student directory. It would be most
practical in this instance to sample students from classes. Randomly sampling 10% of the
students from each class would be a difficult task; sampling every student in a randomly
selected 10% of the classes would be easier.
Sampling every student in a class is not a random procedure. However, by randomly selecting
the classes, you have a greater probability of capturing a representative sample of the
population. Many students believe that it is not possible to gather a representative sample for a
class project or a thesis. However, this type of cluster sampling is easily done, especially since all
colleges publish lists of classes for registration.
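The class-based procedure can be sketched in Python as below. The 20 classes of 5 students each and the helper name `cluster_sample` are hypothetical, and the sketch assumes that no student appears in more than one class.

```python
import random

# Hypothetical list of 20 classes, each with 5 students.
classes = {
    f"class_{c:02d}": [f"student_{c:02d}_{s}" for s in range(5)]
    for c in range(20)
}


def cluster_sample(clusters, fraction, rng):
    """Randomly select a fraction of the clusters and keep every member of each."""
    n_clusters = max(1, round(len(clusters) * fraction))
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [member for name in chosen for member in clusters[name]]
    return chosen, members


rng = random.Random(7)  # fixed seed for reproducibility only
chosen, students = cluster_sample(classes, fraction=0.10, rng=rng)
print(chosen)
print(students)
```

Only the choice of classes is random; within a chosen class, every student is included.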
Proportionate sampling
Proportionate sampling is a variation of stratified random sampling. We use this technique when
our subgroups vary dramatically in size in our population. For example, we are interested in risk
taking among college students and suspect that risk taking might differ between smokers and
nonsmokers. Given increasing societal pressures against smoking, there are many fewer
smokers on campus than nonsmokers. Rather than take equal numbers of smokers and
nonsmokers, we want each group represented in proportion to its share of the population.
Proportionate sampling strategies begin by stratifying the population into relevant subgroups
and then random sampling within each subgroup. The number of participants that we recruit
from each subgroup is proportional to that subgroup's share of the population.
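The allocation step can be illustrated with a small Python sketch. The campus counts (150 smokers, 850 nonsmokers) and the total sample of 100 are made-up figures, and `proportional_allocation` is a hypothetical helper name.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Give each stratum a share of the sample equal to its share of the population."""
    population = sum(stratum_sizes.values())
    # Rounding can make the shares add up to slightly more or less than
    # total_sample when the proportions are not whole numbers.
    return {
        label: round(total_sample * size / population)
        for label, size in stratum_sizes.items()
    }


# Hypothetical campus: 150 smokers and 850 nonsmokers.
sizes = {"smokers": 150, "nonsmokers": 850}
allocation = proportional_allocation(sizes, total_sample=100)
print(allocation)
```

Once the allocation is known, a simple random sample of that size is drawn within each subgroup.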
Multistage sampling
This is the most sophisticated sampling strategy and it is often used in large epidemiological
studies. To obtain a representative national sample, researchers may select zip codes at random
from each state. Within these zip codes, streets are randomly selected. Within each street,
addresses are randomly selected. Because each zip code constitutes a cluster, the result may
not be as accurate as that of other probability sampling strategies, but it can still be very
accurate.
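The nested selection can be sketched in Python as follows. The frame of 6 zip codes, 4 streets per zip code and 10 addresses per street is invented, the first stage (states) is left out to keep the sketch short, and `multistage_sample` is a hypothetical helper name.

```python
import random

# Hypothetical nested frame: zip code -> street -> list of addresses.
frame = {
    f"zip_{z}": {
        f"street_{z}_{s}": [f"addr_{z}_{s}_{a}" for a in range(10)]
        for s in range(4)
    }
    for z in range(6)
}


def multistage_sample(frame, n_zips, n_streets, n_addresses, rng):
    """Randomly select zip codes, then streets within them, then addresses."""
    selected = []
    for zip_code in rng.sample(sorted(frame), n_zips):
        streets = frame[zip_code]
        for street in rng.sample(sorted(streets), n_streets):
            selected.extend(rng.sample(streets[street], n_addresses))
    return selected


rng = random.Random(3)  # fixed seed for reproducibility only
sample = multistage_sample(frame, n_zips=2, n_streets=2, n_addresses=3, rng=rng)
print(sample)
```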
Non-random sampling
Non-probability sampling strategies are used when it is practically impossible to use probability
sampling strategies. This typically occurs because of time and expense constraints and the lack
of an adequate sampling frame. Non-probability sampling is also used when the frequency of the
behavior or characteristic of interest is so low in the population that a more targeted strategy is
needed to find sufficient numbers of participants for the research.
Haphazard Sampling
Haphazard sampling is a strategy that is almost guaranteed to introduce bias into your study. It
should be avoided at all costs. A typical haphazard strategy uses a "man-on-the-street"
technique to recruit those who wander by or selects a sampling frame that does not accurately
reflect the population.
Convenience sampling
Convenience sampling is a type of non-probability sampling in which the sample is drawn from
the part of the population that is readily available and convenient to reach.
Purposive sampling
Purposive sampling targets a particular group of people. When the desired population for the
study is rare or very difficult to locate and recruit for a study, purposive sampling may be the
only option. For example, you are interested in studying cognitive processing speed of young
adults who have suffered closed head brain injuries in automobile accidents. This would be a
difficult population to find.
Quota sampling
In quota sampling, the population is first segmented into mutually exclusive sub-groups, just as
in stratified sampling. Then judgment is used to select the subjects or units from each segment
based on a specified proportion. For example, an interviewer may be told to sample 200 females
and 300 males between the ages of 45 and 60. This means that researchers can specify in advance
whom they want to sample (targeting).
It is this second step which makes the technique one of non-probability sampling. In quota
sampling, the selection of the sample is non-random, unlike in probability sampling, and the
results can often be unreliable. For example, interviewers might be tempted to interview those
people in the street who look most helpful, or may choose to use accidental sampling to question
those who
are closest to them, to save time. The problem is that these samples may be biased
because not everyone gets a chance of selection. This non-random element is its greatest
weakness, and quota versus probability sampling has been a matter of controversy for many years.
Quota sampling is useful when time is limited, a sampling frame is not available, the research
budget is very tight, or detailed accuracy is not important. You can also choose how many
of each category are selected.
Scales of measurement
Five measurement scales are used:
• Nominal Data
• Ordinal Data
• Rank Data
• Discrete Data
• Continuous Data
Nominal data
Nominal variables name categories of people, events, and other phenomena. Often we do not
need the full power of numbers for every application. To make this point clear, we classify our
use of numbers into different classes. One kind of data is what we call nominal data: when we
label males as 0 and females as 1, that's nominal data. Another example of nominal data is
using 0 to denote people who are alive and 1 to denote people who are dead. In both these
examples the labels are numbers in name only, just 0 or 1.
The only property of the number system we are using here is that 0 is different from 1.
We're not saying 1 is bigger than 0. We're not saying that 1 is one unit away from 0. Simply that
0 and 1 are different. This is the simplest example we have of nominal data. It is sometimes
called binary data or dichotomous data, depending upon whether you prefer the Latin or the
Greek root for two.
But it doesn't just have to have two values. For example, if we're looking at blood groups, here
we would need four values: one each for blood groups A, B, AB and O.
Nominal categories are exhaustive and mutually exclusive. They are discrete and
non-continuous. The statistical operations permissible are: counting of frequency, percentage,
proportion, mode, and coefficient of contingency.
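Most of the operations listed above can be shown with a short Python sketch using the standard `collections.Counter`. The ten blood-group records are invented, and the coefficient of contingency is left out because it needs a second variable.

```python
from collections import Counter

# Hypothetical blood groups recorded for 10 people.
blood_groups = ["A", "O", "B", "O", "AB", "A", "O", "O", "A", "B"]

counts = Counter(blood_groups)   # frequency of each category
n = len(blood_groups)

proportions = {group: count / n for group, count in counts.items()}
percentages = {group: 100 * count / n for group, count in counts.items()}
mode = counts.most_common(1)[0][0]   # the most frequent category

print(counts)
print(proportions)
print(percentages)
print("mode:", mode)
```

No mean is computed, because the categories carry no order or distance.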
Ordinal data
The ordinal scale is second in terms of its refinement as a means of classifying information.
It incorporates the functions of the nominal scale and, in addition, is used to arrange (or
rank) individuals into a sequence ranging from the highest to the lowest. For example, we might
classify a disease as mild, moderate, or severe, labelling mild as 1, moderate as 2, and severe
as 3. Here we use the order of the data: 2 is more severe than 1, and 3 is more severe than 2,
although the labels do not say by how much. So the order is important.
Rank data
Rank data are like the results of an Olympic race: the person who finishes first gets the
gold medal, and the person who finishes second gets the silver. It doesn't matter how far
behind the first the second finisher is. The gap could be a fraction of a second, or it could
be a few minutes. It doesn't matter. All that is recorded is the rank, the order in which the
data fall.
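The point that ranks discard the size of the gaps can be seen in a small Python sketch. The runners and finishing times are hypothetical, and the sketch assumes there are no ties.

```python
# Hypothetical finishing times in seconds.
finish_times = {"Runner A": 9.81, "Runner B": 9.89, "Runner C": 12.40}

# Order the runners from fastest to slowest, then number them 1, 2, 3, ...
ordered = sorted(finish_times, key=finish_times.get)
ranks = {name: position for position, name in enumerate(ordered, start=1)}

print(ranks)
```

Runner B is 0.08 seconds behind the winner and Runner C is more than 2.5 seconds behind, but the ranks record only second and third.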
Interval data
The interval scale is the third level of measurement in terms of the complexity of the
statistical techniques used to analyze the data. It is quantitative in nature. The units are
equidistant: the difference between any two adjacent points on the scale is the same. Interval
data do not have an absolute zero, e.g. temperature measured in Celsius or Fahrenheit.
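One consequence of having no absolute zero is that differences are meaningful on an interval scale but ratios are not. The Python sketch below illustrates this with two made-up temperatures and the standard Celsius-to-Fahrenheit conversion.

```python
def celsius_to_fahrenheit(celsius):
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


# Two hypothetical temperature readings in degrees Celsius.
c1, c2 = 10.0, 20.0
f1, f2 = celsius_to_fahrenheit(c1), celsius_to_fahrenheit(c2)

# Differences survive the change of scale (up to the fixed factor 9/5) ...
diff_c = c2 - c1
diff_f = f2 - f1

# ... but ratios do not, because the zero point is arbitrary.
ratio_c = c2 / c1
ratio_f = f2 / f1

print(diff_c, diff_f)
print(ratio_c, ratio_f)
```

So 20 degrees Celsius cannot be described as "twice as hot" as 10 degrees Celsius: the same two temperatures give a different ratio in Fahrenheit.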