Stat 301 Final Exam Review (Purdue University)
Population – the entire group we want information about (ALL)
Sample – data is collected from a subset of elements in the population
Parameter – a value calculated from the population
Statistic – a value calculated from the sample
Variability – the spread or range of the data
o Variability from sample to sample can be reduced by using a larger sample
Inference – the use of sample information to make conclusions about a population
Types of Samples:
o Voluntary Sample – participants select themselves for the sample
Ex: survey respondents
o Random Sample – participants are randomly selected for the sample
Ex: deer are captured and weighed
Types of Random Samples
Simple Random Sample (SRS) – consists of n individuals/objects from the population chosen in such a way that each individual has an equal chance of being selected
Stratified Random Sample
1. Divide the population into strata
Strata – groups of similar individuals
2. Choose a separate SRS within each stratum
Multistage Random Sample
1. Divide the population into sub-populations
2. Randomly select some of the sub-populations
3. Within each selected sub-population, randomly select individuals
Capture-Recapture Sample
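The first three random sampling schemes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 40 labeled units, the two strata, the four sub-populations, and all sample sizes are made up for the example.

```python
import random

rng = random.Random(301)  # fixed seed so the sketch is repeatable

# Hypothetical population of 40 labeled units (an assumption for illustration).
population = [f"unit{i:02d}" for i in range(1, 41)]

# Simple random sample: every unit has an equal chance of being selected.
srs = rng.sample(population, k=8)

# Stratified random sample: divide into strata, then take a separate SRS in each.
strata = {
    "stratum_A": population[:10],
    "stratum_B": population[10:],
}
stratified = {name: rng.sample(units, k=4) for name, units in strata.items()}

# Multistage random sample: randomly select sub-populations,
# then randomly select individuals within each selected sub-population.
sub_populations = [population[i:i + 10] for i in range(0, 40, 10)]
chosen = rng.sample(sub_populations, k=2)
multistage = [unit for group in chosen for unit in rng.sample(group, k=3)]

print(srs)
print(stratified)
print(multistage)
```

Because `random.sample` draws without replacement, no unit can appear twice within any one draw.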
Types of Bias:
o Undercoverage – relevant groups are left out of the sampling process and have no chance of selection
o Nonresponse – not everyone who is chosen participates
o Response Bias – the behavior of the respondent or interviewer, or the wording of the question, fails to elicit an accurate response
Types of Studies:
o Anecdotal Evidence – drawing conclusions from our own experience – not scientific
o Observational Study – gathering data without interfering with the sample
o Experimental Study – the conditions of the study are manipulated by the researcher
Experimental Unit – an individual participating in the experiment (called a subject if human); state it in the singular (“a/an ...”)
Factors – the explanatory (independent) variables
Factor Levels – the specific values of the explanatory variables
Treatments – all combinations of the factor levels across all factors (the specific experimental conditions applied to the units)
Response Variable – what is being measured for each unit
Types of Experimental Design:
o Completely Randomized Design – randomly assign subjects to the different treatments
o Randomized Block Design – divide subjects into blocks of similar individuals, then randomly assign treatments within each block
o Matched Pairs Design – subjects are grouped into pairs of similar individuals, and each member of a pair receives one of the two treatments
o Principles of Experimental Design (things to watch out for):
Lurking Variables – variables that are not taken into consideration that can influence the final results
Placebo Effect – the response of a subject to a treatment that they believe/hope will assist them
Bias – systematically favoring certain outcomes
Lack of Realism – the subjects or treatments do not realistically duplicate the conditions to be studied
Reduce these by:
»Control Group (Compare) – a group that does not receive the treatment, used for comparison
»Randomization – assign subjects to treatments at random
»Replication – use enough subjects to reduce chance variation
Causation – this exists when changes in the explanatory variables are the only cause of changes in the response variable
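As a rough illustration of the completely randomized and randomized block designs described above, the sketch below shuffles subjects and deals them into treatment groups. The subject labels, treatment names, and blocks are hypothetical, and the helper functions assume the number of subjects divides evenly among the treatments.

```python
import random

rng = random.Random(301)  # fixed seed so the sketch is repeatable

# Hypothetical subjects and treatments (assumptions for illustration).
subjects = [f"S{i}" for i in range(1, 13)]
treatments = ["placebo", "low dose", "high dose"]

def completely_randomized(units, treatments, rng):
    """Randomly assign units to equal-sized treatment groups."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)  # assumes an even split
    return {t: shuffled[i * size:(i + 1) * size] for i, t in enumerate(treatments)}

def randomized_block(blocks, treatments, rng):
    """Randomize separately within each block of similar units."""
    return {name: completely_randomized(units, treatments, rng)
            for name, units in blocks.items()}

crd = completely_randomized(subjects, treatments, rng)

# Hypothetical blocks: the first six subjects versus the last six.
blocks = {"block_1": subjects[:6], "block_2": subjects[6:]}
rbd = randomized_block(blocks, treatments, rng)

print(crd)
print(rbd)
```

Here the "placebo" group plays the role of the control group, and each treatment receives several subjects, which is the replication principle.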
Ethical Principles:
1. Review Board – reviews all planned studies in advance in order to protect the subjects from possible harm
2. Informed Consent – ALL subjects must be informed of the project and provide their consent
3. Confidentiality – individual subjects cannot be identified when the study is published – only statistical summaries can be used
Ethical Principles for Animals (4 “R’s”):
1. Review Board
2. Replacement – using non-animal models such as microorganisms or cell culture techniques, computer simulations, or species lower on the phylogenetic scale
3. Reduction – reduce the number of animals needed by implementing careful experimental design
4. Refinement – eliminate or reduce unnecessary pain and distress
Types of Variables:
o Categorical Variables:
Records a thought, observation, opinion, or words
Data expressed in words
Graphical Representations:
Bar Graph:
»Horizontal Axis: labelled with the value of the variable
»Vertical Axis: labelled with frequency (# or %)
»Bars do not need to touch each other
»Values of variable do not need to appear in any given order
»Each unit can give one answer, multiple answers, or no answer
Pie Chart:
»Full circle
»Respondents may only give one answer
»Slices correspond to the frequency of each possible response (%)
»Sizes of the slices are proportional to the frequency
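Both the bar graph and the pie chart start from a frequency table of the categorical responses. A minimal sketch, using made-up responses and assuming each respondent gives exactly one answer, counts each category and converts the counts to percentages:

```python
from collections import Counter

# Made-up categorical responses, one answer per respondent.
responses = ["red", "blue", "red", "green", "blue", "red", "red", "green"]

counts = Counter(responses)       # frequency (#) of each category
total = sum(counts.values())
percents = {category: 100 * count / total for category, count in counts.items()}

# Text version of a bar graph: one bar per category, bars do not touch.
for category, count in counts.most_common():
    print(f"{category:<6} {'#' * count} {count} ({percents[category]:.1f}%)")
```

The percentages add up to 100 only because every respondent gave a single answer, which is the condition a pie chart requires.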
o Quantitative Variables:
Information presented in the form of numbers
Answers the questions: “How often”, “How many”
Graphical Representations:
Stem Plots:
»Used for smaller data sets
Histograms:
»Horizontal Axis: continuous range of values for variables
»Vertical Axis: frequency (number or %) corresponding to different bins
»Vertical bars for each bin touch each other
»Values on x-axis have a continuous order
»Used for larger data sets
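The histogram description above amounts to counting how many values fall in each bin of a continuous range. The sketch below is one way to do that by hand; the data, the starting value, and the bin width are assumptions chosen for illustration, and values outside the binned range are simply ignored.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in bins [start, start+width), [start+width, start+2*width), ..."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:  # skip values outside the binned range
            counts[idx] += 1
    return counts

# Made-up quantitative data (heights in inches).
heights = [61, 63, 64, 66, 66, 67, 68, 68, 69, 70, 71, 74]
bin_counts = histogram_counts(heights, start=60, width=5, n_bins=3)

# Text histogram: adjacent bins share an edge, so the bars touch.
for i, count in enumerate(bin_counts):
    low = 60 + 5 * i
    print(f"[{low}, {low + 5}) {'#' * count}")
```

With these values the three bins hold 3, 6, and 3 observations, giving a unimodal, roughly symmetric shape.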
Distribution – the distribution of a variable gives the values it takes and how often it takes these values
o Shape: Number of peaks
Unimodal = 1 peak
Bimodal = 2 peaks
Multimodal = 3+ peaks
o Skewness: Level of symmetry
Symmetric – approximately equal tails / no longer tail
Right Skew – positively skewed = tail to the right
Left Skew – negatively skewed = tail to the left
o Center (Middle):
Mean (average) – the sum of the data values divided by the number of values (x̄)
Not a resistant measure (its value is influenced by the presence of extreme observations)
Mode – the value that appears most frequently in the dataset
Median – the middle value of the sorted data
Located at position (n+1)/2 in the sorted list; for even n, average the two middle values
o Spread:
Range = max value – min value
Variance = s²
Standard Deviation = s = the positive square root of the variance
Represents the typical distance of a data point from the mean
Not a resistant measure
Quartiles and Percentiles:
Pth percentile – value of the variable such that p% of the observations fall at or below it
»25th percentile = First Quartile = Q1
»50th percentile = Median = M
»75th percentile = Third Quartile = Q3
Interquartile Range (IQR) = Q3 – Q1
5-Number Summary: min, Q1, M, Q3, max
»Good at describing non-symmetric distributions
Outliers – these are points which are either much larger or smaller than the other points in the dataset
High Outliers > Q3 + 1.5 * IQR
Low Outliers < Q1 – 1.5 * IQR
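The 5-number summary and the 1.5 * IQR rule can be worked through on a small made-up data set. The sketch below assumes one common textbook convention for quartiles (Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the overall median when n is odd); software packages may use slightly different rules and give slightly different quartiles.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, M, Q3, max) using the median-of-halves convention."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]             # values below the median position
    upper = xs[half + (n % 2):]   # values above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def outlier_fences(q1, q3):
    """Return the low and high cutoffs from the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Made-up data with one unusually large value.
weights = [2, 4, 4, 5, 6, 7, 8, 9, 30]

minimum, q1, m, q3, maximum = five_number_summary(weights)
low_fence, high_fence = outlier_fences(q1, q3)
outliers = [x for x in weights if x < low_fence or x > high_fence]

print(minimum, q1, m, q3, maximum)  # 2 4.0 6 8.5 30
print(low_fence, high_fence)        # -2.75 15.25
print(outliers)                     # [30]
```

Here IQR = 8.5 – 4 = 4.5, so any value above 8.5 + 6.75 = 15.25 is flagged as a high outlier, which picks out 30.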
o What to use when describing a distribution:
Use 5-Number Summary if you have:
Skewed Distribution
Outliers
The median and quartiles are resistant measures
Use Mean and Standard Deviation if you have:
Symmetric Distribution
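A quick way to see why the choice of summary depends on outliers is to add one extreme value to a small data set and compare the measures. The scores below are made up for illustration; `statistics.stdev` computes the sample standard deviation, which divides by n – 1.

```python
from statistics import mean, median, stdev

# Made-up, roughly symmetric exam scores.
scores = [70, 72, 75, 78, 80]
# The same scores with one extreme observation added.
with_outlier = scores + [200]

for label, data in [("original", scores), ("with outlier", with_outlier)]:
    print(f"{label:<13} mean={mean(data):.2f} "
          f"median={median(data):.2f} sd={stdev(data):.2f}")
```

The mean moves from 75 to about 95.8 and the standard deviation grows sharply, while the median only shifts from 75 to 76.5. That is what it means for the median to be resistant and for the mean and standard deviation not to be.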