Introduction to Research in Marketing
Lecture 1
Introduction
Important factors that brand managers need to consider are:
- Segmentation – who is out there?
- Targeting – who do we want to reach?
- Positioning – how do we appeal to that segment?
Managers often make decisions relying on intuition alone. Marketing research is a set of
techniques for systematically collecting, recording, analysing and interpreting data. This is
especially important in the age of big data! It helps managers make better decisions and
avoid the pitfalls of decision errors! Researchers do not want the results to be influenced by,
e.g., the way a question is asked. Independent of the industry or the functional area you plan
to work in, you should learn the techniques available to understand your customers and to
profitably satisfy their needs.
There are three goals of a statistical model: summarise, predict and what if?
Summarising: e.g. customer segmentation
An example of summarising (shown as a figure in the slides):
“How close are we (Heineken) to our
competitors, in the eyes of our consumers?”
Prediction: forecast what will happen, e.g. how customers are going to react to a certain
promotion.
e.g. Who will respond to our direct marketing? Who are our most valuable
customers (customer analysis)? Who will default on his credit card bill? Who
will experience an insurable loss?
What if: what will happen when you adjust one of the variables → causality
e.g. What if we change the organization and signage of our aisles: what will
happen to the amount of time customers spend shopping and to the products
they buy?
Use ANOVA when analysing the results of an experiment
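Not from the lecture itself, but a minimal Python sketch of such an ANOVA (the three ad groups and their scores are invented for illustration):

```python
# One-way ANOVA on a made-up experiment: three ad versions, each shown
# to a different group; outcome = purchase intention on a 1-7 scale.
from scipy import stats

ad_a = [5, 6, 5, 7, 6, 5]
ad_b = [4, 4, 5, 3, 4, 4]
ad_c = [6, 7, 6, 6, 7, 5]

# H0: all group means are equal; the F-test checks this in one shot.
f_stat, p_value = stats.f_oneway(ad_a, ad_b, ad_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```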
Measurement scales
Non-metric scales:
- nominal (categorical) – brand A vs brand B, categories of response
A number serves only as a label or tag for identifying or classifying objects in mutually
exclusive and collectively exhaustive categories
- ordinal – ranking
Numbers are assigned to objects to indicate the relative positions of some
characteristic of objects, but not the magnitude of difference between them.
These outcomes can be directional or categorical (labels) – they measure only the direction of
the response (e.g., yes/no).
Metric scales:
- interval – scales (1-5 or 1-7), words or numbers; artificial scale
Numbers are assigned to objects to indicate the relative positions of some
characteristic, with differences between objects being comparable; the zero
point is arbitrary
- ratio – actual zero point (e.g. age, income); no artificial scale
This is the most precise scale: it has an absolute zero point and all the advantages of
the other scales
In contrast, when scales are continuous they not only measure direction or classification, but
intensity as well (e.g., strongly agree vs. somewhat agree). Going from nominal to ratio means
an increase in the information you get from the measurement.
The right statistical technique depends on which scale is used, e.g., metric vs. non-metric.
Statistical programs (like SPSS!) make a big deal of asking you what type each variable is.
Entering variables in the wrong form causes you to run a bad / incorrect analysis.
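The course uses SPSS, but the same idea can be sketched in Python with pandas: declare explicitly whether each variable is nominal, ordinal or metric, so the analysis treats it correctly (the column names and values below are made up):

```python
# Declaring measurement scales explicitly (pandas analogue of SPSS
# variable types): nominal, ordinal, interval and ratio columns.
import pandas as pd

df = pd.DataFrame({
    "brand":        ["A", "B", "A", "C"],            # nominal
    "satisfaction": ["low", "high", "mid", "high"],  # ordinal
    "rating":       [3, 7, 5, 6],                    # interval (1-7 scale)
    "income":       [2400, 3100, 2800, 5200],        # ratio (true zero)
})

df["brand"] = df["brand"].astype("category")  # unordered categories
df["satisfaction"] = pd.Categorical(          # ordered categories
    df["satisfaction"], categories=["low", "mid", "high"], ordered=True
)
print(df.dtypes)
```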
Measurement error
To what extent is what you observe ≠ true value?
Validity: Does it measure what it’s supposed to?
e.g. asking something different from what you are researching
Reliability: Is it stable? Error-free?
Did the answers stay the same after some time? Are they consistent?
e.g. At home you weigh yourself: 80, 81, 80.
Yesterday I weighed myself at the doctor’s office: 75.
How does this affect validity and reliability? Validity is lower, reliability is higher.
In practice:
Validity: do these coefficients make sense? (i.e., do the effect sizes and signs give
plausible model results?)
Reliability: how much do these results change if:
We add additional variables to the model
We take away some observations (e.g., outliers, influential)
We estimate the same model on a new dataset
→ wild changes when adjusting your variables indicate that your research is not
reliable
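To make these reliability checks concrete, here is a sketch (simulated data, hypothetical variable names) that estimates the same regression twice, once on the full sample and once without some observations, and compares the coefficients:

```python
# Stability check: do the coefficients change wildly when we drop
# observations? Simulated sales-vs-price data, invented names.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
price = rng.normal(10, 2, n)
sales = 50 - 2.0 * price + rng.normal(0, 5, n)

X = sm.add_constant(price)
full = sm.OLS(sales, X).fit()             # full sample
sub = sm.OLS(sales[:-20], X[:-20]).fit()  # drop the last 20 observations

print("full sample:", full.params.round(2))
print("subset:     ", sub.params.round(2))  # similar values -> reliable
```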
Sampling
Keep in mind that a sample should represent your population. Sampling is needed because
asking the complete population is time-consuming and expensive.
Statistics (characteristics of the sample) estimate parameters (characteristics of the
population).
Sampling error
The process of selecting a sufficient number of elements from the population, so that results
from analysing the sample are generalizable to the population. No sample statistic exactly
reflects the population characteristic. Sampling is the selection of a fraction of the
population for the ultimate purpose of being able to draw general conclusions about the
population.
Lecture 2
Sampling error
If you are sampling randomly, your bias should be approximately 0. This is related to the
“wisdom of the crowds” phenomenon: the error goes down when there are more people in the sample.
Candy experiment
What went wrong?
- Three types of candy, which were not equal in amount
When picking candy out of the bag with your eyes closed, you more easily pick the
bigger ones (they are oversampled) → size bias
How would you fix it?
- Stratified sampling: know the proportion of each candy type in the bag; pick randomly
from each category
- Number the candies and pick five numbers randomly → create a sampling frame (see
the code sketch below)
Random sampling is not as easy as you think:
- Write down the sampling frame
- Number the objects
- Pick a certain number of objects at random
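A small numpy sketch of both approaches (all quantities invented): simple random sampling from a numbered frame, and stratified sampling that draws from each candy type in proportion:

```python
# Simple random vs. stratified sampling from a candy bag.
import numpy as np

rng = np.random.default_rng(42)

# Sampling frame: number every candy; the types are deliberately unequal.
bag = np.array(["big"] * 10 + ["medium"] * 30 + ["small"] * 60)

# Simple random sampling: draw 10 candies without replacement.
simple = rng.choice(bag, size=10, replace=False)
print("simple random:", simple)

# Stratified sampling: draw from each type in proportion (1/3/6 of 10).
stratified = []
for candy_type, n_draw in [("big", 1), ("medium", 3), ("small", 6)]:
    idx = np.flatnonzero(bag == candy_type)  # frame for this stratum
    stratified.extend(bag[rng.choice(idx, size=n_draw, replace=False)])
print("stratified:   ", stratified)
```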
When the amount of variation in one set of the sample is smaller than in the other, the
standard error is lower.
Confidence interval
We know that the sample estimate is not the same as the population value. One constructs an
interval around the sample estimate to increase confidence. The wider the interval, the more
confident we are that it covers the true population value.
Our best guess, given our sample, is that the population value lies in the following interval:
$\bar{X} \pm z \cdot s_{\bar{X}}$

$\bar{X}$ = estimate from the sample
$z$ = number of standard errors a point is away from the estimate
o if confidence level is 68%: $z = 1.00$
o if confidence level is 90%: $z = 1.65$
o if confidence level is 95%: $z = 1.96$
o if confidence level is 99%: $z = 2.58$
$s_{\bar{X}}$ = standard error of the estimate
= “accuracy with which the sample represents the population”
= $\sigma / \sqrt{n}$
The standard deviation of the sample is divided by the square root of the number of
respondents, so the standard error shrinks as the sample grows; with random sampling the
estimate is, moreover, unbiased.
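Putting the formula into a short sketch (the 100 measurements are simulated):

```python
# 95% confidence interval around a sample mean.
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(loc=80, scale=5, size=100)  # e.g. weight measurements

x_bar = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))  # standard error = s / sqrt(n)
z = 1.96                                        # 95% confidence level

print(f"95% CI: [{x_bar - z * se:.2f}, {x_bar + z * se:.2f}]")
```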
Statistical tests
Test whether the confidence interval includes a certain point (usually zero)
- (We don’t usually believe that any effect is actually 0, but it’s sometimes useful.)
If the CI doesn’t include the point, then you can reject the null hypothesis.
Often we are interested in comparing groups. With hypothesis testing, testing whether two
groups are equal is equivalent to testing whether the difference between them equals zero.
When an interval goes from negative to positive, H0 cannot be rejected, since the confidence
interval includes the zero point.
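A minimal sketch of such a two-group comparison (made-up data): the t-test asks whether the difference in means is zero, and the CI of the difference shows whether zero is inside:

```python
# Is the difference between two groups zero?
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.normal(5.0, 1.0, 50)  # e.g. ratings under promotion A
group_b = rng.normal(4.5, 1.0, 50)  # e.g. ratings under promotion B

t_stat, p_value = stats.ttest_ind(group_a, group_b)

# 95% CI for the difference in means; if it excludes 0, reject H0.
diff = group_a.mean() - group_b.mean()
se = np.sqrt(group_a.var(ddof=1) / 50 + group_b.var(ddof=1) / 50)
print(f"p = {p_value:.3f}")
print(f"95% CI of difference: [{diff - 1.96 * se:.2f}, {diff + 1.96 * se:.2f}]")
```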
The possible outcomes of a hypothesis test are “reject” or “do not reject”. It is never possible
to “accept” a statistical hypothesis; there is only not enough data to reject it.
Hypothesis testing
Type I error: you rejected a hypothesis, but it is actually true.
Type II error: you did not reject a hypothesis, but it should have been
rejected → related to the power of the study.
Power
Power is the ability to detect an effect if there is one. Holding α constant, power is positively
related to effect size and sample size. A large effect (e.g., a price change of 100%) requires a
smaller sample in order to have sufficient power. A smaller effect (e.g., a subliminal ad)
requires a larger sample in order to detect it.
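A simulation sketch of this intuition (all parameters invented): power is approximated as the share of simulated experiments in which the test detects the true effect:

```python
# Power by simulation: share of experiments with p < .05
# when a true effect of the given size exists.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def power(effect, n, sims=2000):
    hits = 0
    for _ in range(sims):
        a = rng.normal(0, 1, n)       # control group
        b = rng.normal(effect, 1, n)  # treatment group with a true effect
        if stats.ttest_ind(a, b).pvalue < 0.05:
            hits += 1
    return hits / sims

print("large effect (d=1.0), n=20: ", power(1.0, 20))   # high power
print("small effect (d=0.2), n=20: ", power(0.2, 20))   # low power
print("small effect (d=0.2), n=400:", power(0.2, 400))  # power restored
```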
Problems with statistical significance
A common statistical error is to summarize comparisons by statistical significance
- p-value < .05 “SIGNIFICANT!!”
- p-value > .05 “not significant”
Should you always believe these results?
There can be some problems concerning statistical significance. Every time we say there is a
difference, there is a possibility that we are wrong (5% at a 95% confidence level). Every time
we do a test there is a chance of error.
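A quick simulation of that accumulating error (invented setup): even when H0 is true in every test, roughly 5% of tests still come out “significant”:

```python
# Run 1000 tests where H0 is TRUE; about 5% are still "significant".
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
false_positives = 0
for _ in range(1000):
    a = rng.normal(0, 1, 30)
    b = rng.normal(0, 1, 30)  # same distribution: no real difference
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1
print(false_positives / 1000)  # close to 0.05
```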
File drawer effect: hiding the studies with non-significant tests / results → causes problems
with reproducibility
There are some other problems with statistical significance, namely statistical significance ≠
practical significance.
Confidence intervals give more information than p-values, because a p-value does not include
the size of the effect. Keep non-significant variables in your model!
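A last sketch of statistical vs. practical significance (made-up numbers): with a huge sample even a trivially small effect gets p < .05, which is exactly why the effect size and the CI matter:

```python
# A tiny effect becomes "significant" with a huge sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
a = rng.normal(100.0, 15, 200_000)  # e.g. spending, control group
b = rng.normal(100.3, 15, 200_000)  # 0.3 more: practically negligible

t_stat, p_value = stats.ttest_ind(a, b)
print(f"p = {p_value:.2e}")                       # highly significant...
print(f"difference = {b.mean() - a.mean():.2f}")  # ...but tiny in size
```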