Summary exam QM 21
Numerical (quantitative) data:
Real-valued or whole numbers: 85.3 degrees Celsius, 2.12 mm (dimension), 14 (counts).
- Interval data: order, addition and subtraction have a well-defined meaning. Example:
temperature in Celsius.
- Ratio data: order, addition and subtraction, but also multiplication and division have meaning. There
is a meaningful zero point. Example: temperature in Kelvin.
Categorical (qualitative) data:
Subtraction and addition have no meaning.
- Ordinal data (order): there is a well-defined order among categories. Example: good, acceptable,
critical, reject.
- Nominal data: unordered categories; merely labels. Example: type XO, MM, SW, LS.
- Binary data: Example: pass/fail, positive/negative, ok/nok.
Numerical data contain more information than categorical data.
Operational definitions
14% of the trains are delayed. This percentage has to be reduced. How did NS measure this
percentage?
Before we can study the phenomenon “trains are delayed”, we need to give this term a precise
definition. We call this: the operational definition of the problem. An operational definition specifies
which actions (measurements) should be done to determine whether a train is delayed or not.
Operational definitions are always arbitrary to some extent: the correct definition does not exist.
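One way to make an operational definition concrete is to write it as a rule that can be applied mechanically. The sketch below assumes a hypothetical threshold of 5 minutes and made-up arrival times; neither comes from NS, and the function names are illustrative only.

```python
# Hypothetical operational definition: a train counts as "delayed" when it
# arrives at least `threshold` minutes after the scheduled arrival time.
THRESHOLD_MINUTES = 5  # assumed value, chosen for illustration only


def is_delayed(scheduled_min, actual_min, threshold=THRESHOLD_MINUTES):
    """Return True if the arrival is late by at least `threshold` minutes."""
    return (actual_min - scheduled_min) >= threshold


def percentage_delayed(arrivals, threshold=THRESHOLD_MINUTES):
    """arrivals: list of (scheduled, actual) times in minutes after midnight."""
    delayed = sum(is_delayed(s, a, threshold) for s, a in arrivals)
    return 100.0 * delayed / len(arrivals)


# Made-up arrivals (scheduled, actual); the delays are 1, 6, 4 and 10 minutes.
arrivals = [(600, 601), (615, 621), (630, 634), (645, 655)]
pct_5 = percentage_delayed(arrivals, threshold=5)
pct_3 = percentage_delayed(arrivals, threshold=3)
print(pct_5, pct_3)
```

With the same data, moving the threshold from 5 to 3 minutes changes the reported percentage, which is why the definition has to be fixed before measuring.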
Describing data
Numerical (quantitative) data:
Descriptive statistics: mean, standard deviation, skewness
Histogram, boxplot, scatterplot
Percentiles such as median and quartiles
Distributions
Skew = measure of asymmetry. A skewness of 0 implies a symmetrical distribution.
Positive skew: scores bunched at low values with the tail pointing to high values (a right tail).
Negative skew: scores bunched at high values with the tail pointing to low values (a left tail).
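The sign of the skewness can be checked numerically. The sketch below uses the plain moment-based coefficient (third central moment divided by the 1.5th power of the second); this choice is an assumption, and statistical packages often apply a sample-size correction, so their values can differ slightly. The data are made up.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (no sample-size correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


right_tailed = [1, 1, 2, 2, 3, 10]        # bunched low, tail to high values
left_tailed = [-x for x in right_tailed]  # mirror image: tail to low values
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)  # positive skew
print(skewness(left_tailed) < 0)   # negative skew
print(skewness(symmetric))         # zero for symmetric data
```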
Boxplot (Box-Whisker Diagram)
A boxplot is composed of:
the median of the data
a box: indicates the ‘bulk’ of the data. Ranges from Q1 to Q3.
‘whiskers’: indicate the tails of the distribution. Each extends from the
box to the last value within 1.5 times the length of the box (Q3 minus Q1)
outliers: values that lie beyond the whiskers
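The ingredients listed above can be computed directly. This is a minimal sketch using Python's `statistics.quantiles` with its default quartile method; quartile conventions differ between packages, so Minitab may report slightly different Q1 and Q3. The function name and the data are illustrative assumptions.

```python
import statistics


def boxplot_summary(data):
    """Return the ingredients of a boxplot for a list of numbers."""
    q1, median, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    box_length = q3 - q1                              # length of the box
    lower_fence = q1 - 1.5 * box_length
    upper_fence = q3 + 1.5 * box_length
    # Whiskers end at the last data values that are still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = sorted(x for x in data if x < lower_fence or x > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


data = [2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up values with one extreme point
summary = boxplot_summary(data)
print(summary)
```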
Categorical (qualitative) data:
Frequency tables, mode (most frequent outcome), median (only for ordinal data; middle
outcome)
Bar charts or pie charts
Bimodal: having two modes. Multimodal: having several modes.
Sample vs. Population
With data from a sample, we can estimate the underlying population
parameters: location and spread. The blue histogram gives the
distribution of the values in the sample, with sample mean and sample
standard deviation. The red curve is an estimated (“fitted”) population
distribution, with two parameters: population mean and population
standard deviation.
Confidence interval
The result from a sample is an estimate, which is subject to some
inaccuracy. We can calculate two boundaries between
which the sought characteristic lies with 95% confidence.
These boundaries mark the confidence interval. The larger
the sample, the more accurate our estimate and the narrower our
confidence interval.
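The effect of sample size on the interval width can be illustrated with a short sketch. It assumes a large-sample approximation that uses the normal quantile (about 1.96) instead of a t-quantile, so it is only a rough illustration for small samples. The data are made up, and repeating them eight times is just a device to get a larger sample with nearly the same spread.

```python
import math
import statistics

# 97.5th percentile of the standard normal distribution, about 1.96.
Z_95 = statistics.NormalDist().inv_cdf(0.975)


def approx_ci_mean(sample, z=Z_95):
    """Approximate 95% interval for the mean: mean +/- z * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation
    half_width = z * s / math.sqrt(n)
    return mean - half_width, mean + half_width


small = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.0]
large = small * 8  # same values repeated: illustrative only

ci_small = approx_ci_mean(small)
ci_large = approx_ci_mean(large)
print(ci_small, ci_large)  # the second interval is narrower
```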
Formulas for confidence interval on the mean (assuming a
normal distribution):
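A sketch of the standard textbook form, assuming the population standard deviation is unknown and estimated from the sample. The symbols are the conventional ones and are assumptions here: $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $t_{\alpha/2,\,n-1}$ the upper $\alpha/2$ critical value of the t-distribution with $n-1$ degrees of freedom ($\alpha = 0.05$ for 95% confidence).

```latex
\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
\;\le\; \mu \;\le\;
\bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}
```

The half-width is proportional to $1/\sqrt{n}$, so quadrupling the sample size roughly halves the interval.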
Formulas for confidence interval on the variance (assuming a normal distribution):
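A sketch of the standard textbook form, with conventional symbols that are assumptions here: $s^2$ is the sample variance, $n$ the sample size, and $\chi^2_{p,\,n-1}$ the value of the chi-square distribution with $n-1$ degrees of freedom that is exceeded with probability $p$ ($\alpha = 0.05$ for 95% confidence).

```latex
\frac{(n-1)\,s^2}{\chi^2_{\alpha/2,\,n-1}}
\;\le\; \sigma^2 \;\le\;
\frac{(n-1)\,s^2}{\chi^2_{1-\alpha/2,\,n-1}}
```

Taking square roots of both boundaries gives an interval for the standard deviation $\sigma$.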
Measurement validity
Strategies to assess measurement validity.
Beforehand:
Devil’s advocate brainstorming
Check definitions and calculations
Do a test measurement
During data collection: Monitor the measurement process
After data collection:
Face validity of the data: Judge the dataset based on common sense.
Autopsies: One could investigate strange or unexpected data patterns or observations.
Sampling error
Difference between the sample statistic and the population parameter that exists only because we
consider a part of the population (the sample) and not the whole population.
The larger the sample size, the smaller the sampling error.
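A small simulation can make this visible. The sketch below builds a hypothetical population, draws repeated samples of two sizes, and compares the average distance between the sample mean and the population mean. The population parameters, seed and repeat count are arbitrary assumptions.

```python
import random
import statistics

random.seed(21)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 values.
population = [random.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.fmean(population)


def average_sampling_error(n, repeats=200):
    """Mean absolute difference between sample mean and population mean."""
    errors = []
    for _ in range(repeats):
        sample = random.sample(population, n)  # without replacement
        errors.append(abs(statistics.fmean(sample) - population_mean))
    return statistics.fmean(errors)


error_small_n = average_sampling_error(10)
error_large_n = average_sampling_error(1000)
print(error_small_n, error_large_n)  # the second value is much smaller
```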
Measurement error (nonsampling error)
The discrepancy between the actual value we’re trying to measure and the number we use to
represent that value. It does not shrink as the sample size increases.
Systematic measurement error
Difference between the average measurement result and the true value.
Examples:
NMI calibrates pumps at gas stations on a yearly basis.
Clocks on mobile phones are regularly synchronized with online
time servers.
Random measurement error
Unsystematic deviations due to imprecision of the measurement system. Example: for ice skating at
the winter Olympics, multiple time measurement systems are used.
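The two kinds of measurement error can be separated when the same known reference is measured repeatedly. In this sketch the systematic error is estimated as the average result minus the true value, and the random error as the standard deviation of the results. The reference value and the readings are made up.

```python
import statistics


def measurement_errors(measurements, true_value):
    """Return (bias, spread) for repeated measurements of a known value."""
    bias = statistics.fmean(measurements) - true_value  # systematic error
    spread = statistics.stdev(measurements)             # random error
    return bias, spread


# Hypothetical repeated readings of a reference weight of 100.0 g.
true_value = 100.0
measurements = [100.4, 100.6, 100.5, 100.3, 100.7]
bias, spread = measurement_errors(measurements, true_value)
print(bias, spread)
```

Calibration aims to remove the bias; averaging several readings reduces the influence of the spread.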
Collecting data
Why sampling? “Better precise information on a part of the population than vague information on
the whole population”.
The sample has to be representative. To this end, the sample should be based on a mechanism
that is independent of the question. Always use a computer to draw random numbers.
Simple random sampling
Use a chance mechanism to select items. Each item in the population has equal chance of being
selected. Number all items in the population 1, 2, …, N. Use Minitab to draw a random sample of n
items out of N. Without replacement: each item can be selected only once.
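The notes use Minitab for this step. As an alternative illustration, the same draw can be sketched with Python's standard `random.sample`, which selects without replacement. The function name, population size and seed are assumptions.

```python
import random


def simple_random_sample(N, n, seed=None):
    """Draw n distinct item numbers from 1..N, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(range(1, N + 1), n)  # without replacement


selected = simple_random_sample(N=500, n=20, seed=21)
print(sorted(selected))
```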
Systematic sampling
Use a rule to select a sample. For example:
Order all items and select every 100th item.
Select all persons whose name starts with an S.
Collect an item from a production process every 30 minutes.
Be careful that the rule is not correlated with the question of the study!
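The first rule above (every 100th item) can be sketched with list slicing. The function name and the item list are illustrative assumptions.

```python
def systematic_sample(items, step, start=0):
    """Select every `step`-th item, beginning at index `start`."""
    return items[start::step]


items = list(range(1, 1001))  # hypothetical ordered items numbered 1..1000
every_100th = systematic_sample(items, step=100, start=99)
print(every_100th)
```

If the ordering of the items has a period that matches the step (for example a machine fault that recurs every 100 items), this rule would be correlated with the outcome, which is the risk the warning above refers to.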
Stratified sampling
The population can be divided into subpopulations (strata)
that differ from each other in mean and/or variance.
Does an inhomogeneous population require a large sample size?
Solution: stratified sample
1. Divide the population into homogeneous parts
(strata)
2. Take a sample from every stratum
3. Combine the results of the strata into a result for the whole population
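The three steps can be sketched as follows. The sketch assumes the strata are already defined (step 1) and combines the stratum means with weights proportional to stratum size; that weighting is one common choice and is an assumption here. The strata and their values are made up and deliberately perfectly homogeneous.

```python
import random
import statistics


def stratified_mean(strata, n_per_stratum, seed=None):
    """strata: dict mapping stratum name to a list of values.

    Returns a size-weighted estimate of the population mean.
    """
    rng = random.Random(seed)
    total = sum(len(values) for values in strata.values())
    estimate = 0.0
    for values in strata.values():
        sample = rng.sample(values, n_per_stratum)     # step 2: sample per stratum
        weight = len(values) / total                   # stratum share of population
        estimate += weight * statistics.fmean(sample)  # step 3: combine
    return estimate


# Hypothetical population: two homogeneous strata of different size and level.
strata = {"line_A": [10.0] * 300, "line_B": [20.0] * 100}
estimate = stratified_mean(strata, n_per_stratum=5, seed=21)
print(estimate)
```

Because each stratum is homogeneous, a handful of items per stratum already gives the population mean here; a simple random sample of the same total size would vary with how many items happened to come from each stratum.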
Normal distribution
If a random variable X has a normal distribution with mean $\mu$ and
standard deviation $\sigma$, then we write $X \sim N(\mu, \sigma^2)$. A normal
distribution with mean 0 and standard deviation 1 is called the
standard normal distribution: $Z \sim N(0, 1)$.
Normal (Gaussian) distributions are symmetrical.
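The link between a general normal distribution and the standard normal distribution, and the symmetry property, can be checked with `statistics.NormalDist` from the Python standard library. The mean of 100 and standard deviation of 15 are hypothetical values, and `z_score` is an illustrative helper.

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=15)  # hypothetical mean and standard deviation
Z = NormalDist()                  # standard normal: mean 0, standard deviation 1


def z_score(x, dist):
    """Standardize x: the number of standard deviations from the mean."""
    return (x - dist.mean) / dist.stdev


# P(X <= 130) computed directly and via the standard normal distribution.
p_direct = X.cdf(130)
p_via_z = Z.cdf(z_score(130, X))

# Symmetry: the area below mean - sigma equals the area above mean + sigma.
p_below = X.cdf(100 - 15)
p_above = 1 - X.cdf(100 + 15)
print(p_direct, p_via_z, p_below, p_above)
```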
Weibull distribution: skewed.
Shape parameter: $\beta$ and scale parameter: $\eta$
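For reference, a sketch of the usual two-parameter Weibull density, written with shape parameter $\beta$ and scale parameter $\eta$. The parameterization shown is the common one and is assumed here; it may differ from the form used in the course material.

```latex
f(x) = \frac{\beta}{\eta}\left(\frac{x}{\eta}\right)^{\beta-1}
       \exp\!\left[-\left(\frac{x}{\eta}\right)^{\beta}\right],
\qquad x \ge 0,\ \beta > 0,\ \eta > 0
```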
Properties of the normal distribution: