Data Preparation
➔ think about what kind of analyses will be conducted and what type of data is needed
➔ identify/create and label the variables
◆ use a codebook: a log of how the data were prepared and how the analyses were conducted
➔ ensure correct values are inputted
➔ screen the data set for errors, missing values, etc.
◆ using graphs [e.g. histograms, scatterplots, etc.] would be beneficial
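The screening step above can be sketched in a few lines. This is only a minimal illustration: the `screen` helper, the sample `scores`, and the valid range of 0 to 100 are all hypothetical and not part of the original notes.

```python
def screen(values, low, high):
    """Return the positions of missing and out-of-range entries.

    `low` and `high` are an assumed valid range for the variable;
    None is used here to mark a missing value.
    """
    missing = [i for i, v in enumerate(values) if v is None]
    out_of_range = [
        i for i, v in enumerate(values)
        if v is not None and not (low <= v <= high)
    ]
    return {"missing": missing, "out_of_range": out_of_range}


# Hypothetical test scores: one missing entry and one likely typo (640).
scores = [72, 85, None, 91, 640, 78]
report = screen(scores, low=0, high=100)
print(report)
```

Flagged positions would then be checked against the codebook or the raw records before any analysis is run.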
Distributions
*the kinds of distributions are hyperlinked to google images of what they should look like*
➔ Gaussian
◆ normal distribution
➔ Lognormal
◆ log normal distribution
➔ Skewed
◆ positively/negatively skewed
➔ Lepto
◆ leptokurtic distribution
● very skinny bell curve: a sharper peak and heavier tails than the normal distribution
➔ Platy
◆ platykurtic distribution
● very flat bell curve: a broader peak and lighter tails than the normal distribution
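The shapes listed above can also be described numerically. The sketch below assumes the common moment-based definitions of skewness and excess kurtosis (population form, dividing by n); the function name `shape_stats` and the sample data are hypothetical.

```python
def shape_stats(data):
    """Return (skewness, excess kurtosis) using population moments.

    skewness > 0: positively skewed; skewness < 0: negatively skewed.
    excess kurtosis > 0: leptokurtic; < 0: platykurtic; near 0: normal-like.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis


symmetric = [1, 2, 3, 4, 5]       # evenly spread, no skew
right_skewed = [1, 1, 1, 2, 10]   # one large value pulls the tail right

print(shape_stats(symmetric))
print(shape_stats(right_skewed))
```

With very small samples these statistics are unstable, so they are best read alongside a graph rather than instead of one.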
Types of Statistics
➔ descriptive stats: organize and describe the data
◆ can’t make conclusions/generalizations based on these stats
◆ look for trends, but these aren’t conclusive
◆ numerical summaries of data
➔ inferential stats: make predictions about the population through observations and analysis of
a sample, using statistical tests
◆ use descriptive stats to explore the data first, then run the inferential stats afterwards
◆ must have a good representative sample to make the predictions
● assess if the sample represents the general pop. by testing assumptions
◆ consider sampling error before inference
● the SE should be small relative to the end results
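One common way to put a number on sampling error is the standard error of the mean, the sample standard deviation divided by the square root of n. The sketch below assumes that is an acceptable stand-in for the "SE" in these notes; the function name and the sample values are hypothetical.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Hypothetical sample of five observations.
sample = [4, 8, 6, 5, 7]
mean = statistics.mean(sample)
se = standard_error(sample)
print(f"mean = {mean}, SE = {se:.4f}")
```

A smaller standard error relative to the mean suggests the sample mean is a more stable estimate of the population mean; larger samples shrink it.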
Histograms
➔ a visual summary of univariate data, w/ minimal loss of info
➔ usually used w/ dependent variable, but can also be used with independent variable
➔ identify the anomalies (factors that could skew the data) that violate assumptions
◆ e.g. outliers, non-normality
➔ histograms vs. bar graphs
◆ Histograms
● bars are touching, depicting that the variable is continuous
● univariate graph: shows the distribution of one continuous variable [the y-axis is a frequency while the x-axis is the continuous variable, often the DV]
● bars can’t be reordered; ascending order only
◆ Bar Graphs
● bars don’t touch, depicting that the variable is discrete
● compares variables
● bars can be reordered; any order is fine
➔ constructing bins (must be careful in order to avoid creating misleading information)
◆ bins are equal-sized; the range per bin needs to be equal
◆ the size/number of bins can change the shape of the graph
◆ formula for # of bins: 2^k ≥ n
● k: the number of bins
● n: the number of data points
● essentially, you’re taking the log base 2 of n (isolating for k) and rounding up; to check whether a given number of bins is enough, solve 2^k and see if it’s greater than or equal to n
◆ formula for bin width: (Max - Min)/k
● divide the range by the number of bins
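The two bin formulas above can be worked through in code. This sketch reads the bin rule as "the smallest k with 2^k ≥ n" (an assumption about the intended formula) and uses (Max - Min)/k for the width; the function names and the data set are hypothetical, and the data are assumed to contain at least two distinct values.

```python
def number_of_bins(n):
    """Smallest k such that 2**k >= n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


def bin_width(data, k):
    """Equal bin width: the range divided by the number of bins."""
    return (max(data) - min(data)) / k


def bin_counts(data, k):
    """Count how many data points fall into each of k equal-width bins."""
    low = min(data)
    width = bin_width(data, k)
    counts = [0] * k
    for x in data:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        index = min(int((x - low) / width), k - 1)
        counts[index] += 1
    return counts


data = list(range(20))          # hypothetical data: 20 points from 0 to 19
k = number_of_bins(len(data))   # 2**4 = 16 < 20 <= 32 = 2**5
width = bin_width(data, k)
counts = bin_counts(data, k)
print(k, width, counts)
```

Rerunning `bin_counts` with a different k shows how the number of bins changes the apparent shape of the histogram.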
Descriptive Stats
➔ two kinds
◆ measures of central tendency