INFERENTIAL STATISTICS
➔ Used to test hypotheses
➔ Used to estimate population parameters (e.g. disease burden)
➔ Each time the study is done, the answer might vary depending on the sample you
choose from the whole population, THUS the estimate from a SINGLE study may not be
the true answer
➔ How do we ensure this is the true result?
➔ If you repeat the study 100 times and graph the estimates, they will form an
approximately normal, bell-shaped curve
➔ About 95% of the estimates will lie in the central 95% of the area under that curve
(roughly within 2 standard errors of the true value)
➔ Confidence interval (CI): the plausible range within which we are confident (usually
95%) that the true value lies
➔ If the CI for a ratio (e.g. relative risk or odds ratio) includes 1, we cannot be
confident that one group is more effective than the other
➔ How do we reduce this uncertainty? INCREASE THE SAMPLE SIZE - this gives a
narrower confidence interval
➔ The bigger the sample, the better it represents the population, so the results can be
generalised
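The points above (repeat the study many times, about 95% of intervals contain the truth, bigger samples give narrower CIs) can be checked with a small simulation. This is only a sketch under stated assumptions: the true prevalence of 20%, the sample sizes and the function names are made up for illustration, and the interval is the simple normal-approximation CI for a proportion.

```python
import random
import statistics


def simulate_study(true_prevalence, n, rng):
    """Run one hypothetical study of n people; return (estimate, ci_low, ci_high)."""
    cases = sum(rng.random() < true_prevalence for _ in range(n))
    p_hat = cases / n
    # Standard error of a proportion (normal approximation).
    se = (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat, p_hat - 1.96 * se, p_hat + 1.96 * se


def coverage_and_width(true_prevalence, n, repeats, seed=0):
    """Repeat the study many times; return (share of CIs containing the truth, mean CI width)."""
    rng = random.Random(seed)
    results = [simulate_study(true_prevalence, n, rng) for _ in range(repeats)]
    covered = sum(low <= true_prevalence <= high for _, low, high in results)
    mean_width = statistics.mean(high - low for _, low, high in results)
    return covered / repeats, mean_width


# Assumed true prevalence of 20%; each sample size is simulated 2000 times.
for n in (100, 400, 1600):
    coverage, width = coverage_and_width(0.20, n, repeats=2000)
    print(f"n={n:5d}  coverage={coverage:.3f}  mean CI width={width:.3f}")
```

Coverage should stay close to 95% for every sample size, while the mean CI width roughly halves each time the sample size is quadrupled.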
Sampling
➔ Selection bias - the sample won’t represent the entire population
➔ The larger and more random the sample, the stronger the inference (applying the
estimates back to the population)
➔ Sampling frame - the list of everyone in the population you want to study, from which
the sample is drawn
➔ In quantitative sampling techniques, the calculations are done beforehand to estimate
the number of people you may need to achieve significance - like a pilot
calculation to estimate the significance of your results
➔ Two types of sampling: probability and non-probability sampling
Probability Sampling Types
➔ Simple random sampling - using software to pick out people at random - like a
lottery-style ball machine, after the population is numbered
➔ Stratified sampling - first divide people into strata based on e.g. age/gender and then
conduct simple random sampling within each stratum
➔ Cluster sampling - based on geographical location - people are divided into
clusters/groups, then a few clusters are chosen at random and sampled - helps when
subjects are spread out geographically
➔ Multi-stage sampling - combining sampling methods - you may do cluster sampling,
followed by stratified sampling based on the groups you want to study (age, ethnic
group, gender), then simple random sampling
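A rough sketch of the four probability sampling methods above, using only Python's standard `random` module. The numbered population, its `age_group` and `village` fields and all function names are hypothetical, invented here for illustration.

```python
import random

rng = random.Random(42)

# Hypothetical numbered population: id, age group (stratum) and village (cluster).
population = [
    {"id": i,
     "age_group": rng.choice(["<40", "40+"]),
     "village": rng.choice(["A", "B", "C", "D", "E", "F"])}
    for i in range(1, 1201)
]


def group_by(people, key):
    """Split people into groups sharing the same value of `key`."""
    groups = {}
    for person in people:
        groups.setdefault(person[key], []).append(person)
    return groups


def simple_random_sample(people, k, rng):
    """Lottery-style draw of k people without replacement."""
    return rng.sample(people, k)


def stratified_sample(people, stratum_key, k_per_stratum, rng):
    """Simple random sample of k people inside every stratum."""
    sample = []
    for members in group_by(people, stratum_key).values():
        sample.extend(rng.sample(members, min(k_per_stratum, len(members))))
    return sample


def cluster_sample(people, cluster_key, n_clusters, k_per_cluster, rng):
    """Pick a few clusters at random, then sample people within them."""
    clusters = group_by(people, cluster_key)
    sample = []
    for name in rng.sample(sorted(clusters), n_clusters):
        members = clusters[name]
        sample.extend(rng.sample(members, min(k_per_cluster, len(members))))
    return sample


def multistage_sample(people, cluster_key, n_clusters, stratum_key, k_per_stratum, rng):
    """Cluster sampling first, then stratified random sampling inside each chosen cluster."""
    clusters = group_by(people, cluster_key)
    sample = []
    for name in rng.sample(sorted(clusters), n_clusters):
        sample.extend(stratified_sample(clusters[name], stratum_key, k_per_stratum, rng))
    return sample


print(len(simple_random_sample(population, 50, rng)))
print(len(stratified_sample(population, "age_group", 30, rng)))
print(len(cluster_sample(population, "village", 2, 40, rng)))
print(len(multistage_sample(population, "village", 2, "age_group", 10, rng)))
```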
◆ Non-probability sampling - does not use random sampling methods, so the results
CANNOT BE GENERALISED
◆ Convenience sampling - recruit the study population by asking for volunteers or
people on the road - used for small-scale pilot studies
◆ Quota sampling - there is a quota of e.g. 100 and you just recruit people who say
yes until 100 is reached - might also use proportional quota sampling
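For proportional quota sampling, the quota is split across groups in proportion to their share of the population, and recruitment still stops once each quota is full. A minimal sketch, assuming made-up group counts and a made-up stream of volunteers (the function names are hypothetical too):

```python
def proportional_quotas(population_counts, total_quota):
    """Split total_quota across groups in proportion to their population counts."""
    total_people = sum(population_counts.values())
    quotas = {group: count * total_quota // total_people
              for group, count in population_counts.items()}
    # Integer division rounds down; hand any leftover places to the largest groups.
    leftover = total_quota - sum(quotas.values())
    largest_first = sorted(population_counts, key=population_counts.get, reverse=True)
    for group in largest_first[:leftover]:
        quotas[group] += 1
    return quotas


def recruit(volunteers, quotas):
    """Accept volunteers in arrival order until every group's quota is full."""
    remaining = dict(quotas)
    recruited = []
    for person_id, group in volunteers:
        if remaining.get(group, 0) > 0:
            recruited.append(person_id)
            remaining[group] -= 1
        if not any(remaining.values()):
            break
    return recruited


# Hypothetical population of 600 women and 400 men, overall quota of 100.
quotas = proportional_quotas({"women": 600, "men": 400}, 100)
# Hypothetical arrival order: whoever says yes, not a random draw.
volunteers = [(i, "women" if i % 3 else "men") for i in range(1, 1001)]
print(quotas, len(recruit(volunteers, quotas)))
```

Note that nothing here is random: people are taken in the order they turn up, which is why the results cannot be generalised.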
Types of errors
➔ The sample size calculation should give a sample large enough for adequate
statistical power
➔ Type I error: the researcher says there is a significant effect but there is none: false
positive
➔ Probability of a type I error occurring: alpha (the significance level, i.e. the cut-off
the p value is compared against)
➔ Cut-off of p<0.05 - accepting a 5% probability of a type I error when there is truly
no effect
➔ Null hypothesis: there is no effect or difference, i.e. the opposite of your hypothesis
➔ Type II error: there is a genuine effect but the researcher claims there isn't: false
negative
➔ Probability of a type II error: beta
➔ More likely to occur when the sample is too small
➔ Power of a study is 1 - beta, or 100 - beta (if percent)
➔ Power of a study needs to be at least 80%, meaning accepting a 20% probability of a
type II error
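The link between sample size, type I error and power can be sketched with a simulation. Everything specific here is an assumption for illustration: a two-group comparison of means with a known SD of 1, a true difference of 0.5, a two-sided z test, and the standard normal-approximation sample size formula n per group = 2 x ((z for alpha + z for power) x SD / difference)^2. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

Z = NormalDist()  # standard normal distribution


def sample_size_per_group(difference, sd, alpha=0.05, power=0.80):
    """Normal-approximation sample size for comparing two means (two-sided test)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / difference) ** 2)


def two_sided_p(group_a, group_b, sd=1.0):
    """Two-sample z test p value, assuming a known common SD (a simplification)."""
    se = sd * (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / se
    return 2 * (1 - Z.cdf(abs(z)))


def rejection_rate(true_difference, n_per_group, alpha=0.05, trials=2000, seed=1):
    """Share of simulated studies that reject the null hypothesis at the given alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(true_difference, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if two_sided_p(a, b) < alpha:
            rejections += 1
    return rejections / trials


n_needed = sample_size_per_group(difference=0.5, sd=1.0)

# No true effect: every rejection is a false positive (type I error).
type_1_rate = rejection_rate(true_difference=0.0, n_per_group=n_needed)
# True effect present: the rejection rate is the power; 1 - power is the type II error rate.
power_at_n = rejection_rate(true_difference=0.5, n_per_group=n_needed)
power_small = rejection_rate(true_difference=0.5, n_per_group=20)

print(f"n per group for 80% power: {n_needed}")
print(f"type I error rate: {type_1_rate:.3f}")
print(f"power at n={n_needed}: {power_at_n:.3f}, power at n=20: {power_small:.3f}")
```

With no true effect the rejection rate should sit near alpha (5%); with a true effect the calculated sample size should give power near 80%, and the smaller sample should give clearly less.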
In hypothesis testing we always test the null hypothesis and try to reject it; rejecting it
supports (but does not strictly prove) the alternative hypothesis.
➔ Statistical software: StatsDirect, SPSS or Stata
P value
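● The p value is the probability of getting a result at least as extreme as yours by
chance alone, i.e. if there were no real effect
● P<0.05 - less than a 5% probability of seeing such a result by chance alone
● P<0.01 - less than a 1% probability of seeing such a result by chance alone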
● Relates to the confidence interval - 95% we are sure, 5% we think it may be from
chance (why 5%? - it is an arbitrary convention; you can choose another value, for ex:
p<0.02 can be your basis for a significant study)
● The greater the sample size, the narrower your CI - the more precise the study
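To see how the p value, the CI and the sample size fit together, here is a sketch that computes a relative risk with its 95% CI and p value for two hypothetical studies with the same event rates but different sizes. The counts are invented, the function name is hypothetical, and the method assumed is the usual normal approximation on the log relative risk.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk(events_exposed, n_exposed, events_control, n_control, level=0.95):
    """Relative risk with a normal-approximation CI on the log scale and a two-sided p value."""
    rr = (events_exposed / n_exposed) / (events_control / n_control)
    se_log_rr = sqrt(1 / events_exposed - 1 / n_exposed
                     + 1 / events_control - 1 / n_control)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    low = exp(log(rr) - z_crit * se_log_rr)
    high = exp(log(rr) + z_crit * se_log_rr)
    p_value = 2 * (1 - NormalDist().cdf(abs(log(rr)) / se_log_rr))
    return rr, low, high, p_value


# Hypothetical counts: same risks (30% vs 20%) in a small and a large study.
small = relative_risk(15, 50, 10, 50)
large = relative_risk(150, 500, 100, 500)

for label, (rr, low, high, p) in (("small", small), ("large", large)):
    verdict = "includes 1" if low <= 1 <= high else "excludes 1"
    print(f"{label}: RR={rr:.2f}  95% CI {low:.2f} to {high:.2f} ({verdict})  p={p:.4f}")
```

Both studies estimate the same relative risk, but only the larger one has a CI that excludes 1 and a p value below 0.05.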