Revision Pack
Questions and Answers
Study unit 1 Introduction and Data Collection
Introduction to statistics
Definition: The branch of mathematics that deals with the collection, organization, analysis, and interpretation
of numerical data in order to make decisions.
Statistics involves planning an experiment, obtaining the relevant data, analysing the data obtained, and
interpreting and drawing conclusions (inferences) from the data.
All decisions we make involve uncertainty about the outcome. Things could turn out in many different ways,
but we don't know how probable each outcome is. To help us cope with this uncertainty or risk, or the 'chance'
that something might happen, we study and use probability theory, where we look at all the different
outcomes and try to determine the probability of each one occurring.
We often hear statements such as: half of the students taking a test score less than the average mark; or, in a
large population of animals, about half of the adult animals are heavier than the average adult weight. All these
arguments, however, are only unsupported theories until they can be tested. Data has to be collected in
order to test the validity of such theories. A variety of statistical techniques (methods used during analysis)
may be used to determine whether or not the data supports these arguments.
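As a minimal sketch of testing such a statement against data, the Python snippet below counts how many marks fall below the average. The marks are made up for illustration (an assumption, not data from these notes).

```python
from statistics import mean

# Hypothetical test marks for ten students (invented for illustration).
marks = [20, 25, 70, 72, 75, 78, 80, 82, 85, 88]

average = mean(marks)

# Keep only the marks that are strictly below the average.
below = [m for m in marks if m < average]
share_below = len(below) / len(marks)

print(f"average mark: {average}")
print(f"share of students below the average: {share_below:.0%}")
```

For this particular data set the average is 67.5 and only 20% of the marks lie below it, so the "half score below average" statement is not supported here. A different data set could give a different answer, which is exactly why the claim has to be tested.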
Sample vs population
A sample is a portion or a sub-set of a population. Sometimes a population is large and it is difficult to gather
information from each element in the population. In such cases we gather information about only a part of
the population (a sample) and use this information to draw conclusions about the entire population.
A statistic is a summary measure for a sample (the result from analysing the data in the sample) and a
parameter is a summary measure for a population. Statistics are used to make inferences about parameters.
A population does not necessarily mean a collection of people. It can also be a collection of any kind of items
such as books, cars or cell phones. If data are collected on all the elements of the population, it is referred to
as a census. Thus a population refers to ALL elements within a specific category, e.g. all students
registered for STA1510; all females in South Africa.
Measures (results/conclusions) for a sample are called statistics.
Measures (results/conclusions) for a population are called parameters.
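The difference between a parameter and a statistic can be sketched in Python as follows. The population of 500 marks, the sample size of 50 and the fixed random seed are all assumptions chosen for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: marks of all 500 students in a course (invented data).
population = [random.randint(0, 100) for _ in range(500)]

# Parameter: a measure computed from ALL elements of the population (a census).
population_mean = mean(population)

# Statistic: the same measure computed from a sample of 50 elements.
sample = random.sample(population, 50)
sample_mean = mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two values will usually be close but not identical. The sample mean is used as an estimate of the population mean when a census is not practical.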
Types of variables
Types of data: quantitative data (either discrete data or continuous data) and qualitative data.
Quantitative data are numerical values (you can count them or attach a numerical measurement to them, such as
km or kg).
Non-numerical data is known as qualitative data. Data such as age, distance and money are all quantitative in
nature while type of transport, the make of your shirt, gender, political affiliation and the different subjects you
are enrolled for are all examples of qualitative data.
Quantitative data can be further classified as being discrete or continuous.
Discrete data can take on only integer values (whole numbers) that you can count, e.g. how many people are
there in your class; how many cars do you own?
Continuous data, on the other hand, can take on any value (including decimals), e.g. how far do you travel to
work; what is your height?
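The classification above can be sketched as a rough rule of thumb in Python. The function name `classify` and the survey answers are hypothetical, and the way a value is stored is only a hint about its type, not a definition.

```python
def classify(value):
    """Rule of thumb: text is qualitative, a whole number is discrete, anything else is continuous."""
    if isinstance(value, str):
        return "qualitative"
    if isinstance(value, int):
        return "quantitative (discrete)"
    return "quantitative (continuous)"


# Hypothetical survey answers (invented for illustration).
answers = {
    "type of transport": "bus",
    "cars owned": 2,
    "distance to work (km)": 12.5,
}

for question, value in answers.items():
    print(f"{question}: {classify(value)}")
```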
Levels of measurement
Nominal-scaled data
A categorical variable, also called a nominal variable, is for mutually exclusive, but not ordered, categories.
The data is divided into various categories of equal importance, e.g. colour of your eyes, make/model of cars,
people's names. To calculate the average would be meaningless.
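Since an average is meaningless for nominal data, a sensible summary is the count per category and the most frequent category (the mode). A minimal Python sketch, using made-up eye colours as an assumption:

```python
from collections import Counter

# Hypothetical eye colours recorded for eight people (invented data).
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue", "hazel", "brown"]

# Count how often each category occurs.
counts = Counter(eye_colours)

# most_common(1) returns a list with the single most frequent (category, count) pair.
mode, frequency = counts.most_common(1)[0]

print(counts)
print(f"most common category: {mode} ({frequency} of {len(eye_colours)})")
```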
Ordinal-scaled data
An ordinal scale shows data simply in order of magnitude, since there is no standard of measurement of
differences. For instance, a squash ladder is an ordinal scale since one can say only that one person is
better than another, but not by how much.
Preferences are ordered (ranked), but do not represent specific measurements (i.e. the categories are not
equally important). An ordinal variable is one where the order matters but not the difference between values.
E.g. you might ask patients to express the amount of pain they are feeling on a scale of 1 to 10. A score of 7
means more pain than a score of 5, and that is more than a score of 3. But the difference between the 7 and
the 5 may not be the same as that between 5 and 3. The values simply express an order. Another example
would be movie ratings.
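Because only the order of ordinal values is meaningful, summaries based on ranking, such as the median, suit this kind of data. A small Python sketch with made-up pain scores (an assumption for illustration):

```python
from statistics import median

# Hypothetical pain scores on a 1-to-10 scale from seven patients (invented data).
pain_scores = [3, 5, 7, 5, 8, 2, 5]

# Ranking only uses the order of the scores, not the size of the gaps between them.
ranked = sorted(pain_scores)
middle = median(pain_scores)

print(f"ranked scores: {ranked}")
print(f"median score:  {middle}")
```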
Interval-scaled data
Interval-scaled data can be ranked like ordinal-scaled data, but in addition the 'difference/distance' between
measurements is constant. An interval variable is a measurement where the difference between two values is
meaningful. For example, the difference between a temperature of 100 degrees and 90 degrees is the same as
the difference between 90 degrees and 80 degrees.
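The temperature example can be checked with a few lines of Python. This is only an illustrative sketch, and the temperature unit is left unspecified, as in the text.

```python
# Temperatures from the example above; the unit is not stated in the notes.
temperatures = [100, 90, 80]

# Differences between neighbouring readings.
differences = [a - b for a, b in zip(temperatures, temperatures[1:])]

print(differences)                        # [10, 10]
print(differences[0] == differences[1])   # True
```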