Full course notes for Introduction to Applied Statistics (STAT2080)

Full course notes from lectures and textbook readings for all units.

  • December 30, 2021
  • 21
  • 2021/2022
  • Class notes
  • Faisal khamis
  • All classes
emilygiles-duhamel
STAT 2080 - EXAM REVIEW

Chapter 1 - Looking at Data Distributions
1.1 : Data
- Cases: the objects described by a set of data. Cases may be customers, companies, subjects
in a study, units in an experiment, or other objects.
- Label: a special variable used in some data sets to distinguish the different cases.
- Variable: a characteristic of a case.
- Values: different cases can have different values of a variable.
- Qualitative (categorical) variable: places each case into one of several groups (categories).
Examples: marital status, political party, eye colour, telephone number.
- Quantitative variable: takes numerical values for which arithmetic operations, such as adding
and averaging, make sense.
o Discrete (whole numbers): number of children, defects per hour, counted items.
o Continuous: weight, voltage, temperature, measured characteristics.
- Who?: What cases do the data describe? How many cases does the data set have?
- What?: How many variables does the data set have? What are the exact definitions of these
variables? What are the units of measurement for each quantitative variable?
- Why?: What purpose do the data have? Do the data contain the information needed to answer
the questions of interest?

1.2 : Graphs
- Exploratory data analysis: Begin by examining each variable by itself. Then move on to study the
relationships among the variables. Begin with a graph, then add numerical summaries of specific
aspects of the data.
- Variables: We construct a set of data by first deciding which cases or units we want to study. For
each case, we record information about characteristics that we call variables.
o Individual: An object described by data
o Variable: Characteristic of the individual
- Categorical variables: place each individual into one of several groups or categories. The distribution
of a categorical variable lists the categories and gives the count or percent of individuals who fall
into each category.
o Bar graphs: represent categories as bars whose heights show the category counts or
percents.
o Pie charts: show the distribution of a categorical variable as a "pie" whose slices are
sized by the counts or percents for the categories.
- Quantitative variables: take numerical values for which arithmetic operations make sense. The
distribution of a quantitative variable tells us what values the variable takes on and how often it
takes those values.
o Histograms: show the distribution of a quantitative variable by using bars. Divide the
possible values into classes (intervals) of equal width, then count how many observations
fall into each class. The height of a bar represents the number of individuals whose values
fall within the corresponding class. Instead of counts, one may also use percents.
o Stem plots: separate each observation into a stem and a leaf that are then plotted to
display the distribution while maintaining the original values of the variable. Separate
each observation into a stem (all but the rightmost digit) and a leaf (the remaining digit).
Write the stems in a vertical column; draw a vertical line to the right of the stems. Write
each leaf in the row to the right of its stem; order leaves if desired.
- Distribution of a variable: To examine a single variable, we graphically display its distribution.
The distribution of a variable tells us what values it takes and how often it takes these values.
Distributions can be displayed using a variety of graphical tools. The proper choice of graph
depends on the nature of the variable.
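
The histogram and stem plot steps described above can be sketched in a few lines of Python. This is only an illustration: the helper names `class_counts` and `stemplot` and the data are made up for this example, and the stem plot assumes non-negative whole numbers.

```python
from collections import Counter, defaultdict

def class_counts(values, start, width):
    """Count observations in equal-width classes [lo, lo + width).

    Classes with no observations are simply left out of the result.
    """
    counts = Counter((v - start) // width for v in values)
    return {(start + k * width, start + (k + 1) * width): counts[k]
            for k in sorted(counts)}

def stemplot(values):
    """Build stem plot rows: stem = all but the last digit, leaf = last digit."""
    rows = defaultdict(list)
    for v in sorted(values):          # sorting puts the leaves in order
        rows[v // 10].append(v % 10)
    lines = []
    for stem in range(min(rows), max(rows) + 1):   # keep stems with no leaves
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

scores = [12, 15, 21, 23, 23, 28, 34, 35, 47]   # made-up data

print(class_counts(scores, start=10, width=10))
print("\n".join(stemplot(scores)))
```

Both displays show the same shape, but the stem plot keeps the original values visible while the class counts do not.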

- Outliers: observations that lie outside the overall pattern of a distribution. Always look for
outliers and try to explain them. For example, in the distribution of the percent of elderly
residents by state, the overall pattern is fairly symmetric except for two states that clearly do
not belong to the main pattern: Alaska and Florida have unusually small and large percents,
respectively, of elderly residents in their populations. A large gap in the distribution is
typically a sign of an outlier.
- A distribution is symmetric: if the right and left sides of the graph are approximately mirror images
of each other.
- A distribution is skewed to the right: (right-skewed) if the right side of the graph (containing the
half of the observations with larger values) is much longer than the left side.
- A distribution is skewed to the left: (left-skewed) if the left side of the graph is much longer than
the right side.
- A time plot: (line plot) shows behavior over time. Time is always on the horizontal axis, and the
variable being measured is on the vertical axis. Look for an overall pattern (trend) and deviations
from this trend. Connecting the data points by lines may emphasize this trend. Look for patterns
that repeat at known regular intervals (seasonal variations).

1.3 : Describing Distributions with Numbers
- The mean: The most common measure of center is the arithmetic average, or mean. To find it, add
the values of the observations and divide by the number of observations:
x̄ = (x1 + x2 + ... + xn) / n
The mean is sensitive to outliers, so it is not always a representative measure of center.
- The median: Another common measure of center is the median, which is not affected by outliers.
The median is the midpoint of a distribution, the number such that half of the observations are
smaller and the other half are larger.
1. Arrange all observations from smallest to largest.
2. If the number of observations n is odd, the median M is the center observation in the
ordered list.
3. If the number of observations n is even, the median M is the average of the two center
observations in the ordered list.
- The mode: the data value that occurs most often in the list of data points. It is possible to have no
mode, one mode, or more than one mode.
- Mean vs. median: The mean and median measure center in different ways, and both are useful.
If the distribution is exactly symmetric, the mean and median are exactly the same. The mean
and median of a roughly symmetric distribution are close together. In a skewed distribution, the
mean is usually farther out in the long tail than is the median.
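
A small Python sketch, using the standard `statistics` module and made-up numbers, shows how one outlier pulls the mean toward the long tail while the median stays put:

```python
import statistics

incomes = [30, 32, 35, 35, 38]       # made-up values, roughly symmetric
with_outlier = incomes + [200]       # the same data plus one extreme value

print(statistics.mean(incomes))          # 34
print(statistics.median(incomes))        # 35
print(statistics.multimode(incomes))     # [35]

# The mean jumps to about 61.7; the median is still 35.
print(statistics.mean(with_outlier))
print(statistics.median(with_outlier))
```

`statistics.multimode` returns every value tied for most frequent, which matches the note that a data set can have more than one mode.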
- The Quartiles: A useful numerical description of a distribution requires both a measure of center
and a measure of spread.
1. Arrange the observations in increasing order and locate the median M.
2. The first quartile Q1 is the median of the observations located to the left of the median
in the ordered list.
3. The third quartile Q3 is the median of the observations located to the right of the
median in the ordered list.
- The five-number summary: of a distribution consists of the smallest observation, the first quartile,
the median, the third quartile, and the largest observation, written in order from smallest to largest
- Boxplots: The median and quartiles divide the distribution roughly into quarters. This leads to a
new way to display quantitative data, the boxplot.
1. Draw and label a number line that includes the range of the distribution.
2. Draw a central box from Q1 to Q3.
3. Mark the median M inside the box.
4. Extend lines (whiskers) from the box out to the minimum and maximum values that are
not outliers.
- Suspected outliers: The 1.5 x IQR rule is based on the interquartile range, defined as IQR = Q3 - Q1.
o The 1.5 x IQR Rule for Outliers: Call an observation an outlier if it falls more than
1.5 x IQR above the third quartile or below the first quartile.
- The standard deviation: The most common measure of spread looks at how far each observation
is from the mean. This measure is called the standard deviation s. It measures the average
distance of the observations from their mean. It is calculated by finding an average of the
squared distances and then taking the square root. This average squared distance is called the
variance. For a sample of n observations, with the usual n - 1 divisor:
s² = [(x1 - x̄)² + (x2 - x̄)² + ... + (xn - x̄)²] / (n - 1), and s = √s²
o s measures spread about the mean and should be used only when the mean is an
appropriate measure of center.
o s = 0 only when all observations have the same value and there is no spread.
Otherwise, s > 0.
o s is not resistant to outliers.
o s has the same units of measurement as the original observations.
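
As a rough sketch, the calculation can be written out step by step in Python. This assumes the sample version of the formula, which divides by n - 1 (the formula itself did not survive in these notes, so that divisor is an assumption); the data and the function name `sample_std` are made up.

```python
import math
import statistics

def sample_std(data):
    """Sample standard deviation, assuming the n - 1 divisor."""
    n = len(data)
    mean = sum(data) / n
    # Variance: the "average" squared distance from the mean, dividing by n - 1
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data with mean 5

print(sample_std(data))            # about 2.14
print(statistics.stdev(data))      # the standard library agrees
print(sample_std([3, 3, 3]))       # 0.0: no spread when all values are equal
```

The squared deviations sum to 32, so the variance is 32 / 7 and s is its square root, in the same units as the data.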
- Choosing measures of center and spread: The two usual pairings are the mean with the standard
deviation and the median with the interquartile range. The median and IQR are usually better
than the mean and standard deviation for describing a skewed distribution or a distribution with
outliers. Use the mean and standard deviation only for reasonably symmetric distributions that
do not have outliers.
- Changing the unit of measurement: Variables can be recorded in different units of
measurement. Most often, one measurement unit is a linear transformation of another
measurement unit. Linear transformations do not change the basic shape of a distribution
(skew, symmetry). But they do change the measures of center and spread.
o Multiplying each observation by a positive number b multiplies both measures of center
(mean, median) and measures of spread (IQR, s) by b.
o Adding the same number a (positive or negative) to each observation adds a to
measures of center and to quartiles, but does not change measures of spread (IQR, s).
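
A quick numerical check of these two rules, sketched in Python with made-up temperatures. Converting Celsius to Fahrenheit is a linear transformation with b = 1.8 and a = 32:

```python
import math
import statistics

celsius = [10.0, 15.0, 20.0, 25.0, 30.0]         # made-up readings
fahrenheit = [1.8 * c + 32 for c in celsius]     # multiply by b = 1.8, then add a = 32
shifted = [c + 5 for c in celsius]               # add a = 5 only

# Center: multiplied by b and shifted by a
print(statistics.mean(celsius), statistics.mean(fahrenheit))

# Spread: multiplied by b, unaffected by a
print(statistics.stdev(celsius), statistics.stdev(fahrenheit))

# Adding a constant leaves the spread unchanged
print(math.isclose(statistics.stdev(shifted), statistics.stdev(celsius)))
```

The Fahrenheit mean is 1.8 times the Celsius mean plus 32, while the Fahrenheit standard deviation is just 1.8 times the Celsius one.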

1.4: Density Curves and Normal Distributions
- A density curve: is a curve that is always on or above the horizontal axis and has an area of
exactly 1 underneath it. It describes the overall pattern of a distribution. The area under the
curve and above any range of values on the horizontal axis is the proportion of all observations
that fall in that range.
- Distinguishing the Median and Mean of a Density Curve: The median of a density curve is the
"equal-areas" point, the point that divides the area under the curve in half. The mean of a
density curve is the balance point, that is, the point at which the curve would balance if made of
solid material. The median and the mean are the same for a symmetric density curve. They both
lie at the center of the curve. The mean of a skewed curve is pulled away from the median in the
direction of the long tail. The mean and standard deviation computed from actual observations
(data) are denoted by x̄ and s, respectively. The mean and standard deviation of the idealized
distribution represented by the density curve are denoted by µ ("mu") and σ ("sigma"),
respectively.
- A Normal distribution: is described by a Normal density curve. Any particular Normal
distribution is completely specified by two numbers: its mean µ and standard
deviation σ.
o The mean of a Normal distribution is the center of the symmetric Normal curve.
o The standard deviation is the distance from the center to the change-of-curvature
points on either side.
o We abbreviate the Normal distribution with mean µ and standard deviation σ as N(µ,σ).
o All Normal curves are symmetric, single-peaked, and bell-shaped.
- The 68-95-99.7 Rule: In the Normal distribution with mean µ and standard deviation σ:
o approximately 68% of the observations fall within σ of µ.
o approximately 95% of the observations fall within 2σ of µ.
o approximately 99.7% of the observations fall within 3σ of µ.
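
The percentages in the rule's name are areas under the Normal density curve, so they can be checked numerically. A minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `within_k_sigma` and the choice of N(100, 15) are just for illustration:

```python
from statistics import NormalDist

def within_k_sigma(mu, sigma, k):
    """Proportion of a N(mu, sigma) distribution within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    # Area under the density curve between mu - k*sigma and mu + k*sigma
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(within_k_sigma(100, 15, k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The same proportions come out for any µ and σ, which is why the rule applies to every Normal distribution.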
