This course reviews and explains the basic statistical concepts and techniques that are used in the area of Management and Business Administration and MSc theses, and emphasizes the practical application of the various techniques using SPSS software.
What is “data”?
- Various properties (variables) measured on a set of things or people (units)
- Data has a fixed structure
o Each column = one property (variable) of the units
o Each row = one unit, the thing we’re studying
What is a “case” or “unit”?
- Experimental or observational entity being measured (students, cats etc.)
- Each case or unit has variables
Types of measurement
- Categorical measurements: shown with distinct categories
o Binary variable
Two categories (dead/alive, black/white).
Offers the least amount of information
Needs a minimum of 300 units for a good sample:
The less information each measurement carries, the larger the sample needs to be
o Nominal variable
Has several categories (omnivore, vegetarian or vegan)
o Ordinal variable
Assesses value on a scale (bad, ok, good, great)
- Numerical measurements: shown with numbers
o Discrete data:
Whole-number counts (number of defects, delayed flights)
Cannot be negative, must be count-able
o Continuous data:
A numerical value on a continuous scale (body temperature, height, weight)
Offers the largest amount of information
Needs a minimum of 30 units for a good sample
Information amounts:
- We can always downshift the amount of information (e.g. from discrete to nominal), which discards information
- We cannot shift upwards: we cannot add information
- Downshifting information is irreversible. Example with body length:
o Body length less than 160 cm is converted to the category “small”
o Body length between 160 cm and 180 cm is converted to the category “med”
o Body length greater than 180 cm is converted to the category “tall”
The lower the amount of information, the larger the sample needs to be
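The body-length conversion above can be sketched in Python. This is only an illustration and not part of the course's SPSS material: the function name and the sample heights are made up, and treating exactly 160 cm and 180 cm as “med” is an assumption, since the notes do not say where the boundary values belong.

```python
def height_category(height_cm: float) -> str:
    """Downshift a continuous height (cm) to the ordinal categories from the notes."""
    if height_cm < 160:
        return "small"
    if height_cm <= 180:  # assumption: both boundary values count as "med"
        return "med"
    return "tall"


heights_cm = [152.5, 160.0, 171.3, 180.0, 193.8]  # made-up example values
categories = [height_category(h) for h in heights_cm]
print(categories)  # ['small', 'med', 'med', 'med', 'tall']

# The step cannot be undone: 160.0, 171.3 and 180.0 all became "med",
# so the original numbers cannot be recovered from the categories.
```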
Complementary research methods:
- Research involving numbers = quantitative methods
- Research analyzing language = qualitative methods
Research:
- Start with a question you want to answer, based on observation, anecdotal evidence, etc.
- Generate theory, generate hypothesis
- Identify variables
- Gather data, measure variables
- Analyze data: graph the data, fit a model
- Theory is supported or not
Falsification: proving a theory wrong
Variables: things that change based on circumstances
- Independent variable: the “causing” variable (predictor), shown as “x”
- Dependent variable: the “effect” variable (outcome)
- Categorical variable: things that belong to various categories
- Binary variables: things that fall into two categories
- Nominal variable: a categorical variable with more than two named categories
(cat/brown, cat/black, cat/white, cat/orange); numbers can be used as labels
- Ordinal variable: categorical variables on a value scale (awful, bad, ok, good, great)
- Continuous variable: gives a score for each thing being measured, can take on any
value (negative, decimal, etc.)
- Interval variable: equal intervals on the scale represent equal differences in the
property being measured
- Ratio variable: an interval variable with a true “zero”, so ratios of values along the
scale have meaning: “4” is twice as much as “2”
- Discrete variable: whole numbers only, no negative numbers
Parameters:
- Different from variables
- Constants believed to be fundamental truths
- Examples: the mean and the median, which represent the center of the distribution
Level of measurement:
- The relationship between what is being measured and the numbers that represent
it
Validity: whether an instrument is measuring what it is supposed to measure
- Criterion validity: how well one measure predicts an outcome for another measure
(a high GRE score predicts how well someone does in school)
- Concurrent validity: data are recorded with the new instrument and with existing
criteria at the same time, and the two are compared
- Predictive validity: data from a new instrument are used to predict observations at a
later point in time
- Content validity: the degree to which individual items represent the construct being
measured
Reliability:
- Whether an instrument measures consistently across different situations
- Test-retest reliability: test the same group twice; a reliable instrument produces
similar results at various points in time
Correlational research:
- We observe what naturally goes on without directly interfering with it
- Example: cross-sectional research, where several variables are measured at a single point in time
Experimental research: we manipulate one variable to see its effect on another
Assessing data
Is the sample representative?
- Does the sample represent the total population?
- Can the sample findings be generalized to an entire population?
- E.g. sampling only students in Amsterdam for a country-wide study gives a sample that is not representative
Is the data valid?
- Does data reflect what it should reflect?
- Can it be used to answer the research question?
- “Face validity check”: checking data for obvious errors and mistakes
- Were there other problems / irregularities during measurement?
Is there a measurement error?
- Discrepancy between the actual (real life) value we are trying to measure and the
number we use to represent that value
- Example: you (in reality) weigh 80 kg. According to your bathroom scale, you weigh
83 kg. The measurement error is 3 kg.
- Two types of measurement error:
o Systematic: problem with the system; the results are consistent but “off” in the
same direction, e.g. the measuring tool isn’t calibrated
o Random: problems unrelated to the system; the results are all over the place,
e.g. the measuring tool is non-functional or imprecise
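The difference between the two error types can be illustrated with a small simulation of the bathroom-scale example. This is a sketch under assumptions: the 3 kg offset comes from the example above, while the number of readings, the 2 kg standard deviation of the random error and the fixed seed are made up for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

TRUE_WEIGHT_KG = 80.0

# Systematic error: an uncalibrated scale always reads 3 kg too high.
systematic_readings = [TRUE_WEIGHT_KG + 3.0 for _ in range(10)]

# Random error: an imprecise scale scatters around the true value
# (the standard deviation of 2 kg is an assumption for illustration).
random_readings = [random.gauss(TRUE_WEIGHT_KG, 2.0) for _ in range(10)]

systematic_bias = statistics.mean(systematic_readings) - TRUE_WEIGHT_KG
systematic_spread = statistics.pstdev(systematic_readings)
random_spread = statistics.pstdev(random_readings)

print(systematic_bias)    # 3.0: every reading is off by the same amount
print(systematic_spread)  # 0.0: no scatter between readings
print(random_spread > 0)  # True: readings differ from one another
```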
Median:
- The “middle score” when data is ordered
- In an ordered lineup of 11 datapoints, the 6th is the median
- For an even number of results, use the mean of the two central numbers
Mean:
- The sum of the data (Σ, sigma) divided by the number of datapoints (n)
- Symbolized by “x bar”
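As a quick check of the two definitions, the median and mean can be computed by hand and compared with Python's standard `statistics` module. The scores are made-up values; the course itself uses SPSS, so this is only an illustrative sketch.

```python
import statistics

scores = [3, 7, 8, 5, 12, 14, 21, 13, 18, 9, 0]  # 11 made-up datapoints

ordered = sorted(scores)
middle = ordered[len(ordered) // 2]  # index 5 is the 6th of 11 ordered values
mean = sum(scores) / len(scores)     # sum of the data divided by n

print(middle)                     # 9
print(statistics.median(scores))  # 9
print(mean)                       # 10.0

# Even number of datapoints: the median is the mean of the two central values.
even_scores = [4, 8, 6, 10]
print(statistics.median(even_scores))  # 7.0
```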
If the median is lower than the mean…
- You have major outliers in the high end of the distribution
- E.g. one Bill Gates among ordinary people when measuring personal net worth
- Graph is positively skewed
If the median is higher than mean …
- You have major outliers in the low end of the distribution
- E.g. one person with a large gambling debt among ordinary people when measuring personal net worth
- Graph is negatively skewed.
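The effect of a single extreme value on the mean, and its much smaller effect on the median, can be shown with a short sketch. All net-worth figures below are invented for illustration.

```python
import statistics

net_worth = [40_000, 55_000, 60_000, 75_000, 90_000]  # made-up values

# One extremely rich person pulls the mean up (positive skew).
with_billionaire = net_worth + [100_000_000_000]
# One person with a large debt pulls the mean down (negative skew).
with_debtor = net_worth + [-2_000_000]

median_high = statistics.median(with_billionaire)
mean_high = statistics.mean(with_billionaire)
median_low = statistics.median(with_debtor)
mean_low = statistics.mean(with_debtor)

print(median_high < mean_high)  # True: median below mean, positive skew
print(median_low > mean_low)    # True: median above mean, negative skew
```

In both cases the median stays close to the typical values, while the mean moves far towards the outlier.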
Describing data: distribution
Range:
- The smallest value subtracted from the largest
- E.g. highest value = 100, smallest = 20: range = 100 − 20 = 80
- Very sensitive to outliers
Interquartile range:
- The range of the middle 50% of the data
- The difference between the upper and lower quartiles
- Upper quartile: the median of the upper half of the data
- Lower quartile: the median of the lower half of the data
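The range and the interquartile range can be sketched as follows, using the "median of each half" definition of the quartiles given above. The helper names and data are made up, and leaving the middle value out of both halves when the number of datapoints is odd is an assumption; statistical packages, including SPSS, may compute quartiles with slightly different rules.

```python
import statistics


def quartiles_by_halves(data):
    """Return (lower quartile, upper quartile) as the medians of the two halves."""
    ordered = sorted(data)
    if len(ordered) < 2:
        raise ValueError("need at least two datapoints")
    half = len(ordered) // 2
    lower_half = ordered[:half]   # for odd n the middle value is left out (assumption)
    upper_half = ordered[-half:]
    return statistics.median(lower_half), statistics.median(upper_half)


def data_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)


values = [20, 35, 40, 45, 50, 55, 60, 70, 80, 90, 100]  # made-up values
q1, q3 = quartiles_by_halves(values)
print(data_range(values))  # 80
print(q3 - q1)             # 40

# Replace the largest value with an outlier: the range explodes, the IQR does not move.
with_outlier = values[:-1] + [1000]
q1_out, q3_out = quartiles_by_halves(with_outlier)
print(data_range(with_outlier))  # 980
print(q3_out - q1_out)           # 40
```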