Statistics
• Population is the whole set of items that are of interest.
• A census observes or measures every member of a population.
• A sample is some subset of the population intended to represent the population.

CENSUS
Advantages:
↳ gives a completely accurate result
Disadvantages:
↳ time consuming & expensive
↳ cannot be used if the testing process destroys the item
↳ hard to process large quantity of data

SAMPLE
Advantages:
↳ less time consuming & expensive than census
↳ fewer people have to respond
↳ less data to process than in a census
Disadvantages:
↳ data not as accurate
↳ sample may not be large enough to give information about small sub-groups of the population

• Sampling units are individual units of a population.
• Sampling frame is a numbered list of sampling units of a population.

Random number generator using calculator:
e.g. Menu 8 → OPTION → 2 → Value: RanInt(1,100) → Range: A1:A20
↳ Value: RanInt(min, max)
↳ Range: how many to be chosen

TYPES OF SAMPLING

• RANDOM SAMPLING

1) SIMPLE RANDOM SAMPLING
- where every item in a sampling frame has an equal chance of being chosen.
• each item has an identifying number.
• Use random number generator / 'lottery sampling'.
Adv: bias free
   : easy & cheap
Dis: not suitable for large population
   : sampling frame needed

2) SYSTEMATIC SAMPLING
- required elements are chosen at regular intervals from an ordered list.
• Take every Kth item, K = (no. in population) / (sample size), starting at random item between 1 and K.
Adv: simple and quick
   : suitable for large populations
Dis: sampling frame needed
   : can introduce bias if sampling frame not random

3) STRATIFIED SAMPLING
- population divided into strata (groups) and simple random sampling is carried out in each group.
• same proportion sampled from each stratum:
  no. sampled in stratum = (no. in stratum) / (no. in population) × overall sample size
Adv: reflects population structure
   : guarantees proportional representation of groups within population
Dis: population must be clearly classified into distinct strata
   : within each stratum, same dis. (+ adv.) as simple random sampling
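
A minimal Python sketch of the three random sampling methods above, assuming the sampling frame is a plain list of identifying numbers. The function names and the example strata are made up for illustration; only the standard `random` module is used.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every item in the sampling frame has an equal chance of being chosen."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Take every k-th item from an ordered list, starting between 1 and k."""
    k = len(frame) // n        # k = no. in population / sample size
    start = rng.randint(1, k)  # random starting item between 1 and k (1-based)
    picked = [frame[i - 1] for i in range(start, len(frame) + 1, k)]
    return picked[:n]          # trim if the list length is not a multiple of n

def stratified_sample(strata, n, rng):
    """Simple random sample inside each stratum, in proportion to its size."""
    total = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        # no. in stratum / no. in population x overall sample size
        # (rounded sizes may not always add up to exactly n)
        size = round(len(items) / total * n)
        sample[name] = rng.sample(items, size)
    return sample

rng = random.Random(1)             # seeded only so runs are repeatable
frame = list(range(1, 101))        # identifying numbers 1 to 100
print(simple_random_sample(frame, 10, rng))
print(systematic_sample(frame, 10, rng))

# Hypothetical strata: 60 items in one group, 40 in the other.
strata = {"group A": list(range(1, 61)), "group B": list(range(61, 101))}
print({name: len(s) for name, s in stratified_sample(strata, 10, rng).items()})
```

With 60 and 40 items and an overall sample size of 10, the stratified sample takes 6 and 4 items respectively.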

• NON-RANDOM SAMPLING

1) QUOTA SAMPLING
• Population divided into groups according to characteristic.
• A quota of items in each group is set to try and reflect the group's proportion in the whole population.
• Interviewer selects the actual sampling units.
Adv: allows small sample to still be representative of population
   : no sampling frame required
   : quick, easy, inexpensive
   : easy comparison between groups
Dis: non-random sampling can introduce bias
   : population must be divided into groups; can be costly / inaccurate
   : increasing scope of study increases number of groups, adding time & expense

2) OPPORTUNITY SAMPLING
• sample taken from people who are present at time of study, who meet criteria.
Adv: easy to carry out
   : inexpensive
Dis: unlikely to provide a representative sample
   : highly dependent on individual researcher
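
One way to picture quota sampling in Python, as a rough sketch only: the interviewer works through people in the order they are met and accepts each one while that group's quota is still unfilled. The function name, the passers-by, and the quotas are all hypothetical.

```python
from collections import Counter

def quota_sample(people, quotas, group_of):
    """Fill each group's quota with the first suitable people met (non-random)."""
    counts = Counter()
    chosen = []
    for person in people:                     # order the interviewer meets them
        group = group_of(person)
        if counts[group] < quotas.get(group, 0):
            chosen.append(person)             # quota for this group not yet full
            counts[group] += 1
        if sum(counts.values()) == sum(quotas.values()):
            break                             # every quota is filled
    return chosen

# Hypothetical passers-by as (name, age group) pairs.
passers_by = [("A", "under 30"), ("B", "under 30"), ("C", "30+"),
              ("D", "under 30"), ("E", "30+"), ("F", "30+")]
quotas = {"under 30": 2, "30+": 2}

sample = quota_sample(passers_by, quotas, group_of=lambda p: p[1])
print([name for name, _ in sample])  # ['A', 'B', 'C', 'E']
```

D is turned away because the "under 30" quota is already full, which is where the bias of a non-random method can creep in.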

Northern hemisphere: Jacksonville, Beijing
Southern hemisphere: Perth

Typical ranges:
• Mean temperature (°C)
• Mean rainfall (mm)
  * tr / trace: < 0.05 mm, treat as 0
• Total sunshine (hrs)
• Maximum gust (kn)
• Mean cloud cover (oktas)
• Daily mean visibility (Dm)
• Daily mean pressure (hPa), hectopascals
• Daily mean wind speed (knot / Beaufort scale)
  ↳ 0 = calm
  ↳ 1 - 3 = light
  ↳ 4 = moderate
  ↳ 5 = fresh
  * 1 kn = 1.15 mph
• Daily maximum relative humidity (%)
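
A small sketch of how these conventions might be applied when reading the data, assuming rainfall entries arrive as text or numbers. The helper names are hypothetical; the 1.15 factor and the Beaufort labels are the ones listed above, so only forces 0 to 5 are handled.

```python
KNOTS_TO_MPH = 1.15  # from the notes: 1 kn = 1.15 mph

def rainfall_mm(value):
    """Convert a rainfall entry to a number, treating 'tr' (trace) as 0."""
    if isinstance(value, str) and value.strip().lower() in ("tr", "trace"):
        return 0.0
    return float(value)

def knots_to_mph(knots):
    """Convert a speed in knots to miles per hour."""
    return knots * KNOTS_TO_MPH

def beaufort_description(force):
    """Describe a Beaufort value, covering only the values in the notes."""
    if force == 0:
        return "calm"
    if 1 <= force <= 3:
        return "light"
    if force == 4:
        return "moderate"
    if force == 5:
        return "fresh"
    raise ValueError("only Beaufort values 0 to 5 are listed in the notes")

print(rainfall_mm("tr"))        # 0.0
print(beaufort_description(2))  # light
```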

MEASURES OF LOCATION AND SPREAD

• Mean: x̄ = Σx / n (discrete set)
        x̄ = Σfx / Σf (grouped)
• Lower quartile Q1: n/4
• Upper quartile Q3: 3n/4
  - if whole number, take halfway between this data point and the one above.
  - if not whole number, round up and take this data point.

CUMULATIVE FREQUENCY
[Figure: cumulative frequency curve; x-axis: Height (mm); y-axis: cumulative frequency; median marked]

Using interpolation:
↳ to estimate median, quartiles, percentiles
* Assumption: data values are evenly distributed within each class.
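
The quartile rule and the interpolation idea above can be sketched in Python. This is only an illustration: the function names and all the numbers are made up, and the interpolation relies on the stated assumption that values are spread evenly within the class.

```python
import math

def data_point(sorted_data, position):
    """Value at a 1-based position, using the whole number / round up rule."""
    if position == int(position):
        p = int(position)
        # whole number: halfway between this data point and the one above
        return (sorted_data[p - 1] + sorted_data[p]) / 2
    # not a whole number: round up and take that data point
    return sorted_data[math.ceil(position) - 1]

def quartiles(data):
    """Return (Q1, Q3) using positions n/4 and 3n/4."""
    xs = sorted(data)
    n = len(xs)
    return data_point(xs, n / 4), data_point(xs, 3 * n / 4)

def interpolate(lower, upper, cf_lower, cf_upper, position):
    """Estimate the value at a cumulative frequency position inside a class."""
    fraction = (position - cf_lower) / (cf_upper - cf_lower)
    return lower + fraction * (upper - lower)

# Hypothetical raw data, n = 8: positions 2 and 6 are whole numbers.
print(quartiles([1, 3, 5, 7, 9, 11, 13, 15]))  # (4.0, 12.0)

# Hypothetical class 10 to 20 with cumulative frequency 12 at 10 and 30 at 20;
# the 21st value sits halfway through the class.
print(interpolate(10, 20, 12, 30, 21))         # 15.0
```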
↳ e.g. median falls in the class 3 kg ≤ x < 7 kg

   3 kg       Q2       7 kg
    7         17        21    ← cumulative frequency

   Q2 = ((17 − 7) / (21 − 7)) × (7 kg − 3 kg) + 3 kg = 5.86 kg

CODING
If the data is coded using the formula y = (x − a) / b:
• the mean of the coded data is ȳ = (x̄ − a) / b
• the standard deviation of coded data is σy = σx / b, where σx is the standard deviation of the original data
  (not affected by the 'a')
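
A quick numerical check of the coding rules, as a sketch with made-up data and made-up constants a and b. It uses `statistics.pstdev`, the population standard deviation, which divides by n.

```python
from statistics import mean, pstdev

x = [110, 120, 130, 140, 150]   # hypothetical original data
a, b = 100, 10                  # hypothetical coding constants
y = [(xi - a) / b for xi in x]  # coded data: 1.0, 2.0, 3.0, 4.0, 5.0

coded_mean = mean(y)
predicted_mean = (mean(x) - a) / b   # mean of coded data = (x̄ - a) / b
coded_sd = pstdev(y)
predicted_sd = pstdev(x) / b         # sd of coded data = σx / b, 'a' plays no part

print(coded_mean, predicted_mean)    # 3.0 3.0
print(abs(coded_sd - predicted_sd) < 1e-12)
```

Changing `a` shifts every value by the same amount, so the spread, and therefore the standard deviation, stays the same.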

* remember: for continuous data, values might have been rounded up / down.
  (3 m ≤ x ≤ 5 m may include values from 2.5 m to 5.5 m)

OUTLIERS
↳ an extreme value that lies outside the overall pattern of the data
Cleaning the data is the process of removing anomalies from the data.

Variance (σ²)
σ² = Σ(x − x̄)² / n = Σx² / n − (Σx / n)² = Sxx / n
Sxx = Σ(x − x̄)² = Σx² − (Σx)² / n
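
The three forms of the variance can be checked against each other in Python. A minimal sketch with a made-up data set; the function name is hypothetical.

```python
def variance_three_ways(xs):
    """Return the variance from the definition, the shortcut, and Sxx / n."""
    n = len(xs)
    mean = sum(xs) / n
    sum_x2 = sum(x * x for x in xs)

    by_definition = sum((x - mean) ** 2 for x in xs) / n  # Σ(x − x̄)² / n
    by_shortcut = sum_x2 / n - (sum(xs) / n) ** 2         # Σx²/n − (Σx/n)²
    sxx = sum_x2 - sum(xs) ** 2 / n                       # Sxx = Σx² − (Σx)²/n
    return by_definition, by_shortcut, sxx / n

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data: mean 5
print(variance_three_ways(data))  # (4.0, 4.0, 4.0)
```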

Standard Deviation
σ = √(Σ(x − x̄)² / n) = √(Σx² / n − (Σx / n)²) = √(Sxx / n)

BOX PLOTS
[Figure: box plot on a scale from 0 to 80, labelled: lowest value that is not an outlier, Q1, median (Q2), Q3, highest value that is not an outlier, outlier marked ×]

For grouped frequency, calculate estimates of σ² and σ using midpoints of each class interval.
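
A sketch of the midpoint estimate in Python, assuming each class is given as (lower bound, upper bound, frequency). The table is made up, and the frequency-weighted formula Σfx²/Σf − (Σfx/Σf)² is the usual grouped version of the variance formula, stated here as an assumption since the notes only give the ungrouped form.

```python
import math

def grouped_estimates(classes):
    """Estimate mean, variance and standard deviation from class midpoints."""
    n = sum(f for _, _, f in classes)                        # Σf
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)

    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2
    return mean, variance, math.sqrt(variance)

# Hypothetical grouped frequency table: midpoints 5, 15, 25.
table = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]
print(grouped_estimates(table))  # (16.0, 49.0, 7.0)
```

These are estimates only, because the midpoint stands in for every value in its class.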

Using the calculator:
Menu → 6 → 1-Variable → input values → OPTION → 1-Variable Calc
Frequency: setup → down → statistics → on