Management Research Methods 1 – Rob Goedhart
Week 1 video 1
Learning objectives
- The basic terminology, concepts, principles, techniques of statistical analysis
- Using the software SPSS
- Preparation for MRM2
The slides and the theory take precedence
Week 1 video 2: Data
Data has a fixed structure
It consists of a number of properties (variables)
Each column represents one variable
Measurements are taken on a set of things, people, etc. (units)
Each row represents one unit
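As a minimal sketch of this structure, assuming a plain Python list of rows (this is not how SPSS stores data), each dict below is one unit (row) and each key is one variable (column). The dataset and the helper `column` are hypothetical, made up for illustration.

```python
# Hypothetical example dataset: each dict is one unit (row),
# each key is one variable (column).
dataset = [
    {"student": "A", "diet": "vegan", "grade": 7.5, "passed": True},
    {"student": "B", "diet": "omnivore", "grade": 5.0, "passed": False},
    {"student": "C", "diet": "vegetarian", "grade": 8.0, "passed": True},
]


def column(rows, variable):
    """Return all values of one variable (a column) across all units (rows)."""
    return [row[variable] for row in rows]


grades = column(dataset, "grade")
print(grades)  # [7.5, 5.0, 8.0]
```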
Levels of measurement
- Categorical (entities are divided into distinct categories)
o Binary variable (two outcomes), e.g. pass or fail
o Nominal variable (more than two categories, without an ordering), e.g. whether someone is an omnivore, vegetarian or vegan
o Ordinal variable (more than two categories, with an ordering), e.g. bad, intermediate, good
- Numerical:
o Discrete data (counts), e.g. the number of defects or the number of people who pass a course; a count cannot be 10.5
o Continuous data (entities get a distinct score), e.g. temperature, body length
The amount of information increases from top to bottom.
Variables can be converted to a lower level of measurement. This implies a loss of information, and the conversion cannot be reversed: you cannot go from less detailed data back to more detailed data.
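A small sketch of why the conversion cannot be reversed, assuming hypothetical cut-offs (below 10 °C is "cold", below 20 °C is "mild", otherwise "warm") that turn a continuous temperature into an ordinal category. The function `to_category` and its thresholds are made up for illustration.

```python
def to_category(temperature):
    """Hypothetical cut-offs turning a continuous temperature (°C) into an ordinal category."""
    if temperature < 10:
        return "cold"
    if temperature < 20:
        return "mild"
    return "warm"


temperatures = [12.3, 17.8, 25.1, 8.4]
categories = [to_category(t) for t in temperatures]
print(categories)  # ['mild', 'mild', 'warm', 'cold']

# 12.3 and 17.8 both become "mild": from the category alone
# there is no way to recover the original temperatures.
```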
Why is this relevant?
- Over the next year, you’ll study a broad range of statistical techniques
- For different types of data, there are different techniques
- The lower the amount of information in your data, the larger your sample needs to be
Data collection
In quantitative research, you need to motivate and document the way you collect data
- Is the sample representative? -> The point of statistics is to generalize findings in a sample to an entire population
- Is the data valid? -> Do the data reflect what they should reflect?
- Is there measurement error? -> The discrepancy between the actual value we are trying to measure and the number we use to represent that value.
o Systematic error
o Random error
Systematic measurement error
The difference between the average measurement result and the true value
Systematic measurement error can be corrected by calibrating the measurement system
Random measurement error
The unsystematic deviations due to imprecision of the measurement system
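The two types of error can be illustrated with a small simulation. This is a sketch under made-up assumptions: a hypothetical instrument that always reads 0.5 too high (systematic error) plus normally distributed noise with SD 0.2 (random error). Calibration is modelled here as measuring a reference object with a known value, estimating the bias from it and subtracting that bias; the random error remains afterwards.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

TRUE_VALUE = 20.0  # hypothetical true value of the quantity being measured
BIAS = 0.5         # hypothetical systematic error: the instrument reads 0.5 too high
NOISE_SD = 0.2     # hypothetical random error: imprecision of the instrument


def measure(true_value):
    """Simulate one measurement = true value + systematic error + random error."""
    return true_value + BIAS + random.gauss(0, NOISE_SD)


measurements = [measure(TRUE_VALUE) for _ in range(10_000)]

# Systematic error: average measurement result minus the true value (close to BIAS).
systematic_error = statistics.mean(measurements) - TRUE_VALUE

# Calibration: measure a reference with a known value, estimate the bias,
# and subtract that estimate from every measurement.
REFERENCE = 0.0
reference_readings = [measure(REFERENCE) for _ in range(10_000)]
estimated_bias = statistics.mean(reference_readings) - REFERENCE
calibrated = [m - estimated_bias for m in measurements]

print(f"systematic error before calibration: {systematic_error:.3f}")
print(f"mean error after calibration: {statistics.mean(calibrated) - TRUE_VALUE:.3f}")
# The spread (random error) is unchanged by calibration.
print(f"SD after calibration: {statistics.stdev(calibrated):.3f}")
```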
Week 1 video 3: Data analysis
Describing data
You usually don’t want to recite an entire dataset when someone asks you what is in it.
You summarize it in a few numbers.
Location:
- Median: the middle score when data are ordered.
- Mean: the average (sum of the data / number of observations)
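A quick sketch of both location measures on a small hypothetical dataset with one outlier (38). The outlier pulls the mean up, while the median stays at the middle score.

```python
import statistics

data = [2, 3, 4, 7, 7, 9, 38]  # hypothetical data with one outlier (38)

mean = sum(data) / len(data)      # sum of the data / number of observations
median = statistics.median(data)  # middle score after sorting

print(mean)    # 10.0
print(median)  # 7
```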
Dispersion
- Range: the largest value minus the smallest value -> sensitive to outliers
- Interquartile range: the range of the middle 50% of the data (third quartile (75%) minus first quartile (25%))
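- Variance: the average squared distance between each data point and the mean of the data (for a sample, the sum of squared distances is divided by n - 1 instead of n)
- Standard deviation: the square root of the variance. It is easier to interpret than the variance because it is expressed in the same units as the data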
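A sketch of the dispersion measures with Python's standard `statistics` module, on the same hypothetical data. `statistics.variance` and `statistics.stdev` use the sample formula (divide by n - 1). Quartiles have several competing definitions; `statistics.quantiles` uses its default `method="exclusive"` here, and other software, SPSS included, may use a different definition and report slightly different values.

```python
import math
import statistics

data = [2, 3, 4, 7, 7, 9, 38]  # hypothetical data with one outlier (38)

data_range = max(data) - min(data)            # largest minus smallest: 36
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                                 # middle 50% of the data: 6.0

variance = statistics.variance(data)  # sample variance (divides by n - 1)
sd = statistics.stdev(data)           # square root of the sample variance

print(data_range, iqr)
print(round(variance, 2), round(sd, 2))
```

The single outlier makes the range (36) six times larger than the IQR (6.0), which is why the range is called sensitive to outliers.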
Confidence interval:
- When we estimate something (mean, standard deviation, correlation, etc.), we make
sampling error (a different sample will contain different estimates)
- Sample average: X̄
- Population average: μ (the Greek letter mu)
- The sample average is not the same as the population average, but the two will be close to each other.
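- Specifically, in 95% of cases we will find an X̄ such that
  $$\mu - 1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + 1.96\,\frac{\sigma}{\sqrt{n}}$$
  where σ is the population standard deviation and n the sample size
- More variance -> less certainty; bigger sample -> more certainty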
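A minimal sketch of a 95% confidence interval for the mean, assuming the normal approximation with critical value 1.96 and the sample standard deviation s as an estimate of σ. For small samples, software such as SPSS uses a critical value from the t-distribution instead, which is somewhat larger than 1.96. The data are hypothetical; the larger sample simply repeats the same values four times to show that a bigger sample gives a narrower interval.

```python
import math
import statistics


def confidence_interval_95(sample):
    """Approximate 95% confidence interval for the population mean (normal approximation)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)                # sample SD, used as an estimate of sigma
    z = statistics.NormalDist().inv_cdf(0.975)  # critical value, about 1.96
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


small_sample = [6.1, 7.4, 5.8, 6.9, 7.2, 6.5, 5.9, 7.0]  # hypothetical grades
large_sample = small_sample * 4                          # same values, four times as many units

for sample in (small_sample, large_sample):
    low, high = confidence_interval_95(sample)
    print(f"n={len(sample)}: [{low:.2f}, {high:.2f}], width {high - low:.2f}")
```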
Skew (skewness)
- The asymmetry of the distribution
- Positive skew: scores bunched at low values, with the tail pointing to high values (also called right-skewed)
- Negative skew: scores bunched at high values, with the tail pointing to low values (also called left-skewed)
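One common way to put a number on skewness is the moment coefficient g1 = m3 / m2^(3/2), where m2 and m3 are the average squared and cubed deviations from the mean. This is a sketch of that simple version only; packages such as SPSS typically report a small-sample adjusted variant, so their numbers can differ slightly. The data are hypothetical.

```python
import statistics


def skewness(data):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


right_tail = [1, 2, 2, 3, 3, 3, 4, 10]  # bunched at low values, tail to high values
left_tail = [-x for x in right_tail]    # mirror image: tail to low values

print(skewness(right_tail) > 0)  # True: positive skew
print(skewness(left_tail) < 0)   # True: negative skew
```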
Mode: the most frequent score in the data
Bimodal: Having two modes
Multimodal: having several modes
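Python's `statistics.multimode` returns every value that shares the highest frequency (in order of first appearance), so it shows directly whether data are unimodal, bimodal or multimodal. The mode also works for categorical data. The example values are hypothetical.

```python
import statistics

print(statistics.multimode([1, 2, 2, 3, 4]))  # [2]: one mode
print(statistics.multimode([1, 1, 2, 3, 3]))  # [1, 3]: bimodal

diets = ["vegan", "omnivore", "omnivore", "vegetarian", "vegetarian"]
print(statistics.multimode(diets))  # ['omnivore', 'vegetarian']
```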
Plotting data
Like descriptive statistics, plots are a popular way to concisely display an entire dataset.
The following plots are considered best practice for displaying data. Which plot to use
Depends on whether you have
- 1 or 2 variables to display
- Categorical or numerical data
Categorical: bar chart, pie chart
Numerical: histogram, boxplot
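The real plots are made in SPSS or a plotting library, but both chart types rest on simple frequency counts. This sketch (hypothetical data, text output instead of a graphic) counts each category for a bar chart and groups numerical values into equal-width bins, here of width 2, for a histogram.

```python
from collections import Counter

# Bar chart: one bar per category, bar length = frequency.
diets = ["omnivore", "vegan", "omnivore", "vegetarian", "omnivore", "vegan"]
counts = Counter(diets)
for category, count in counts.most_common():
    print(f"{category:<11}{'#' * count}")

# Histogram: group numerical values into equal-width bins [4, 6), [6, 8), [8, 10).
grades = [4.5, 5.5, 6.0, 6.5, 7.0, 7.0, 7.5, 8.5, 9.0]
bin_width = 2
bins = Counter(int(g // bin_width) * bin_width for g in grades)
for start in sorted(bins):
    print(f"[{start}, {start + bin_width}) {'#' * bins[start]}")
```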
The boxplot indicates the quartiles of the data. More specifically, from top to bottom: the maximum, third quartile (75%), median (50%), first quartile (25%) and minimum.
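The five values a basic boxplot draws can be computed directly. A sketch on hypothetical data, using `statistics.quantiles` with its default quartile definition (other software may differ slightly). Note that many programs draw values far outside the box as separate outlier points, so the whiskers do not always reach the minimum and maximum.

```python
import statistics


def five_number_summary(data):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)


print(five_number_summary([2, 3, 4, 7, 7, 9, 38]))  # (2, 3.0, 7.0, 9.0, 38)
```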
Displaying two categorical variables: multiple bar chart
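A multiple bar chart draws the counts of every combination of the two categories. This sketch builds that cross-table from hypothetical (diet, course result) pairs; combinations that never occur simply get a count of 0.

```python
from collections import Counter

# Hypothetical units with two categorical variables: (diet, course result)
units = [("vegan", "pass"), ("omnivore", "fail"), ("omnivore", "pass"),
         ("vegetarian", "pass"), ("omnivore", "pass"), ("vegan", "fail")]

table = Counter(units)  # frequency of each combination
diets = sorted({d for d, _ in units})
outcomes = sorted({o for _, o in units})

# Print a cross-table: one row per diet, one column per outcome.
print("diet".ljust(11) + "".join(o.ljust(6) for o in outcomes))
for d in diets:
    print(d.ljust(11) + "".join(str(table[(d, o)]).ljust(6) for o in outcomes))
```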
Displaying two numerical variables: Scatterplot (see correlation)
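A scatterplot is often summarized by a correlation coefficient. Assuming the Pearson correlation is the one meant here, this sketch computes it as the covariance of x and y divided by the product of their standard deviations (the n - 1 factors cancel, so they are left out). The study hours and grades are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)


hours = [2, 4, 6, 8, 10]             # hypothetical study hours
grades = [5.0, 6.0, 6.5, 7.5, 8.0]   # hypothetical grades
print(round(pearson_r(hours, grades), 3))  # 0.993: strong positive linear relation
```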