Beschrijvende en inferentiële statistiek (S_PMBIS)
Chapter 1
1.1 Using data to answer statistical questions
Statistics: the art and science of designing studies and analyzing the data that those studies
produce. Its ultimate goal is translating data into knowledge and understanding of the world
around us. In short: the art and science of learning from data.
Statistical methods help us investigate questions in an objective manner. Statistical problem
solving is an investigative process that involves four components:
1. Formulate a statistical question
2. Collect data
3. Analyze data
4. Interpret results
Reasons for using statistical methods
There are three main components of statistics for answering a statistical question:
1. Design: stating the goal and/or statistical question of interest and planning how to
obtain data that will address it
2. Description: summarizing and analyzing the data that are obtained
a. Exploring and summarizing patterns in the data
3. Inference: Making decisions and predictions based on the data for answering the
statistical question
a. Usually, the decision or prediction refers to a larger group of people
b. Example of description: stating the percentages for the sample of voters
c. Example of inference: predicting the outcome for all voters
Probability: a framework for quantifying how likely various possible outcomes are.
Variable: the characteristic being measured, such as the number of hours per day that you watch
TV
1.2 Sample versus population
Subjects: the entities that we measure in a study, such as people or countries
Population: the set of all subjects in which we are interested
Sample: the data we have from the subjects who belong to the population (we don't always
have data from every subject)
Descriptive statistics and inferential statistics
Descriptive statistics: methods for summarizing the collected data (where data constitutes
either a sample or a population). The summaries usually consist of graphs and numbers such
as averages and percentages.
Inferential statistics: methods of making decisions or predictions about a population, based
on data obtained from a sample of that population
In most surveys, we have data for a sample, not for the entire population. We use descriptive
statistics to summarize the sample data and inferential statistics to make predictions about
the population.
An important aspect of statistical inference involves reporting the likely precision of a
prediction. How close is the sample value likely to be to the true percentage of the
population?
Sample statistics and population parameters
Sample statistic: the percentage of the sample
Parameter: numerical summary of the population.
Statistic: numerical summary of a sample taken from the population.
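A minimal Python sketch can make the distinction concrete. The population of 10,000 voters, the 52% support figure, and the sample size of 500 are all invented for illustration; in a real survey the parameter is unknown and only the statistic can be computed.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 voters, 1 = supports the candidate, 0 = does not.
population = [1] * 5200 + [0] * 4800

# Parameter: numerical summary of the whole population.
parameter = sum(population) / len(population)

# Statistic: numerical summary of a random sample of n subjects.
n = 500
sample = random.sample(population, n)
statistic = sum(sample) / n

print(f"population proportion (parameter): {parameter:.3f}")
print(f"sample proportion (statistic):     {statistic:.3f}")
```

The statistic will typically land near the parameter without matching it exactly, which is why inference also reports how precise the estimate is likely to be.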
Randomness and variability
Random sampling: designed to make the sample representative of the population.
Estimation from surveys with random sampling
Margin of error: a measure of the expected variability from one random sample to the next
random sample.
In statistics, we let n denote the number of subjects in the sample.
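The idea of variability from one random sample to the next can be illustrated with a small simulation. This is only a sketch: the population, the sample sizes, and the helper `sample_proportions` are hypothetical, and the code does not compute a margin of error itself; it only shows the sample-to-sample variability that the margin of error measures.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1 = yes, 0 = no; the true proportion is 0.52.
population = [1] * 5200 + [0] * 4800

def sample_proportions(n, repeats=200):
    """Draw `repeats` random samples of size n and return each sample proportion."""
    return [sum(random.sample(population, n)) / n for _ in range(repeats)]

# Standard deviation of the sample proportions, one entry per sample size.
spreads = {n: statistics.stdev(sample_proportions(n)) for n in (100, 1000)}

for n, spread in spreads.items():
    print(f"n = {n:4d}: sample proportions vary by about {spread:.3f}")
```

The spread for n = 1000 comes out clearly smaller than for n = 100: larger samples give results that vary less from sample to sample.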
Testing and statistical significance
In a randomized experiment, the variation that could be expected to occur by chance
alone is roughly like the margin of error with simple random sampling. The difference
expected due to ordinary variation is smaller with larger samples. When the difference
between the results for the two treatments is so large that it would be rare to see such a
difference by ordinary random variation, we say that the results are statistically significant.
--> the larger the sample size, the better
1.3 Using calculators and computers
Chapter 2: Exploring Data with Graphs and Numerical Summaries
2.1 Different types of data
Variables: any characteristic observed in a study
Observations: the data values that we observe for a variable
Number > a variable is quantitative if observations on it take numerical values that
represent different magnitudes of the variable
Category > a variable is called categorical if each observation belongs to one of a set
of distinct categories
Quantitative variables
Key features: the center and the variability (spread) of the data
Categorical variables
Key feature: the relative number of observations in the various categories
Quantitative variables are discrete or continuous
Discrete: its possible values form a set of separate numbers (number of
pets in a household)
Continuous: its possible values form an interval (height, weight)
Distribution of a variable
The first step in analyzing data collected on a variable is to look at the observed values by using
graphs and numerical summaries.
Distribution: describes how the observations fall (are distributed) across the range of
possible values.
Features to look for in the distribution of a categorical variable:
The category with the largest frequency (modal category)
How frequently each category was observed
Features to look for in the distribution of a quantitative variable:
Shape (do observations cluster in certain intervals?)
Center (where does a typical observation fall?)
Variability (how tightly are the observations clustering around a center?)
Frequency table
Frequency table: a listing of the possible values for a variable, together with the number of
observations for each value.
Proportion: the number of observations in that category divided by the total number of
observations.
Percentage: the proportion multiplied by 100.
These are also called relative frequencies.
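As a small illustration, the following Python sketch builds a frequency table with proportions and percentages. The commuting data are made up for the example.

```python
from collections import Counter

# Hypothetical observations of a categorical variable (how people commute).
observations = ["bus", "car", "bike", "car", "car", "bus", "bike", "car", "walk", "car"]

counts = Counter(observations)  # frequency of each category
total = len(observations)       # total number of observations

print(f"{'category':<10}{'frequency':>10}{'proportion':>12}{'percentage':>12}")
for category, frequency in counts.most_common():
    proportion = frequency / total  # frequency divided by the total
    percentage = proportion * 100   # proportion multiplied by 100
    print(f"{category:<10}{frequency:>10}{proportion:>12.2f}{percentage:>11.0f}%")

# The modal category is the one with the largest frequency.
modal_category = counts.most_common(1)[0][0]
print("modal category:", modal_category)
```

The proportions add up to 1 and the percentages to 100, which is a quick check on any frequency table.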
To show the distribution for a discrete quantitative variable, we would list the distinct values
and the frequency of each one occurring.
For a continuous quantitative variable, we divide the numerical scale into intervals and count
the number of observations falling in each interval.
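For the continuous case, the counting can be sketched in Python as follows. The heights, the starting value of 150 cm, and the interval width of 10 cm are assumptions chosen for the example; other widths are equally valid and give a coarser or finer picture.

```python
# Hypothetical heights in centimetres (a continuous quantitative variable).
heights = [158.2, 162.5, 165.0, 167.3, 168.8, 170.1, 171.4, 173.9, 176.0, 181.7, 184.3]

start = 150  # lower end of the first interval (assumed)
width = 10   # width of every interval (assumed)

# Map the lower bound of each interval to the number of observations in it.
bins = {}
for h in heights:
    lower = start + width * int((h - start) // width)
    bins[lower] = bins.get(lower, 0) + 1

# Each interval includes its lower bound and excludes its upper bound.
for lower in sorted(bins):
    print(f"[{lower}, {lower + width}): {bins[lower]}")
```

These interval counts are exactly what the bars of a histogram display.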
2.2 Graphical summaries of data
Graphs for categorical variables
Pie chart
Bar graph
o A bar graph with categories ordered by their frequency is called a Pareto chart
Pareto principle: a small subset of categories often contains most of
the observations
Graphs for quantitative variables
Dot plot: shows a dot for each observation
Stem-and-leaf plot: each observation is represented by a stem and a leaf. The stem
consists of all the digits except for the final one, which is the leaf. (see the example on p. 64)
Histogram: a graph that uses bars to portray the frequencies or the relative
frequencies of the possible outcomes for a quantitative variable (note: for
quantitative variables the graph is called a histogram, for categorical
variables a bar graph)
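The stem-and-leaf construction described above can be sketched in a few lines of Python. The scores are invented, and the sketch assumes two-digit whole numbers, so the stem is the tens digit and the leaf is the units digit.

```python
from collections import defaultdict

# Hypothetical two-digit observations (for example, exam scores).
scores = [52, 58, 61, 63, 63, 67, 70, 74, 75, 75, 79, 82, 88, 91]

# Split each value into stem (all digits but the last) and leaf (last digit).
stems = defaultdict(list)
for value in sorted(scores):
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

# Print every stem in order, including stems that have no leaves.
for stem in range(min(stems), max(stems) + 1):
    leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
    print(f"{stem} | {leaves}")
```

Read sideways, the rows of leaves have the same outline as a histogram with intervals of width 10, while still showing every individual value.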
The shape of a distribution
Unimodal: single mound or peak
o Most often the peak is at the mode
Bimodal: two distinct mounds
Shape: symmetric or skewed (to the right or left)
Gap: is there a gap, with one or more observations deviating noticeably from the rest?
Tails: the parts of the curve for the lowest and for the highest values.
Time plots: displaying data over time
Time series: a data set collected over time
Time plot: a way to display a time series graphically. It charts each observation on the
vertical scale against the time it was measured on the horizontal scale.
Trend: tendency of the data to rise or fall
2.3 Measuring the center of quantitative data
Describing the center: the mean and median
Mean: the sum of the observations divided by the number of observations (the average)
Median: the middle value of the observations when the observations are ordered from the
smallest to the largest (or from the largest to the smallest)
Sample size: n
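Both definitions translate directly into code. The sketch below uses made-up data; for an even number of observations it takes the average of the two middle values, which is the usual convention but is an assumption here, since the definition above only mentions "the middle value".

```python
import statistics

# Hypothetical sample of n = 5 observations.
observations = [2, 3, 5, 7, 13]
n = len(observations)

# Mean: sum of the observations divided by the number of observations.
mean = sum(observations) / n

# Median: middle value of the ordered observations.
ordered = sorted(observations)
mid = n // 2
if n % 2 == 1:
    median = ordered[mid]
else:
    # Even n: average the two middle values (assumed convention).
    median = (ordered[mid - 1] + ordered[mid]) / 2

print("mean:", mean)
print("median:", median)

# The standard library gives the same answers.
assert mean == statistics.mean(observations)
assert median == statistics.median(observations)
```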
Basic properties of the mean:
Balance point of data
Usually, the mean is not equal to any value that was observed in the sample
For a skewed distribution, the mean is pulled in the direction of the longer tail
The mean can be highly influenced by an outlier
o Outlier: an observation that falls well above or well below the overall bulk of
the data
Comparing the mean and median
If the shape is:
Symmetric, the mean equals the median
Skewed to the left, the mean is smaller than the median
Skewed to the right, the mean is larger than the median.
The median is resistant to the effect of extreme observations: a numerical summary of the
observations is called resistant if extreme observations have little, if any, influence on its value.
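A short Python sketch shows the difference in resistance. The incomes (in thousands) and the outlier of 400 are invented for illustration.

```python
import statistics

# Hypothetical incomes in thousands; the second list adds one extreme observation.
incomes = [30, 32, 35, 36, 40]
with_outlier = incomes + [400]

for label, data in (("without outlier", incomes), ("with outlier", with_outlier)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label:>16}: mean = {mean:.1f}, median = {median:.1f}")
```

The single outlier moves the mean from 34.6 to 95.5, while the median only moves from 35 to 35.5. The outlier creates a long right tail, and the mean ends up well above the median, as expected for a distribution skewed to the right.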
The mode
The mode: the value that occurs most frequently. For continuous observations, it is usually
not meaningful to look for a mode because there can be multiple modes or no mode at all.
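To see why the mode suits discrete data better than continuous data, consider this sketch. Both data sets are hypothetical, and `statistics.multimode` (available from Python 3.8) is used because it returns every value that ties for most frequent.

```python
import statistics

# Hypothetical discrete data: number of pets in eight households.
pets = [0, 1, 1, 2, 1, 0, 3, 1]

# Hypothetical continuous data: heights in centimetres, all distinct.
heights = [171.2, 168.4, 180.1, 175.9]

pet_modes = statistics.multimode(pets)        # values occurring most often
height_modes = statistics.multimode(heights)  # every value occurs exactly once

print("modes of pets:", pet_modes)
print("modes of heights:", height_modes)
```

For the pets there is one clear mode, but for the heights every observation ties, so the mode carries no information.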
2.4 Measuring the variability of quantitative data
Measuring variability: the range
Range: the difference between the largest and the smallest observations
The range is not a resistant statistic. It shares the worst property of the mean, not being
resistant, and the worst property of the median, ignoring the numerical values of nearly all
the data.
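The lack of resistance is easy to demonstrate. In this sketch the data and the helper name `value_range` are made up for the example.

```python
def value_range(values):
    """Range: difference between the largest and the smallest observation."""
    return max(values) - min(values)

# Hypothetical data, without and with one extreme observation.
data = [30, 32, 35, 36, 40]
with_outlier = data + [400]

print("range without outlier:", value_range(data))
print("range with outlier:   ", value_range(with_outlier))
```

One extreme observation changes the range from 10 to 370, and the three middle values play no part in either result.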
Measuring variability: the standard deviation
A much better numerical summary of variability uses all the data, and it describes a typical
distance of how far the data falls from the mean. It does this by summarizing deviations from
the mean.
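The notes give no formula at this point, so the following sketch rests on an assumption: it uses the usual sample standard deviation, which squares the deviations from the mean, sums them, divides by n - 1, and takes the square root. The data are invented.

```python
import math
import statistics

# Hypothetical sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Deviation of each observation from the mean; these always sum to zero.
deviations = [x - mean for x in data]

# Squaring keeps positive and negative deviations from cancelling out.
sum_of_squares = sum(d ** 2 for d in deviations)

# Sample standard deviation: divide by n - 1, then take the square root (assumed formula).
s = math.sqrt(sum_of_squares / (n - 1))

print(f"mean = {mean}, sum of squared deviations = {sum_of_squares}, s = {s:.3f}")

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(s, statistics.stdev(data))
```

Because the deviations themselves sum to zero, they cannot be averaged directly; squaring them first is what lets the summary use every observation.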