Lecture 1: Introduction
Data analytics for marketing
Data → Tools → Strategy
Know:
- when to use which tool?
- how to use a tool?
- what information is needed?
- what decision is to be made?
- what are the tool's limitations and strengths?
Lecture | Tool | Decision
2 | Linear regression | Market responses (e.g., pricing)
3 | Conjoint analysis | New product design
4 | Bass model | New product diffusion
5 | Cluster analysis | Segmentation
6 | Multi-dimensional scaling | Positioning
These tools are specifically designed for marketing strategies.
Principles of data-driven marketing
These principles are generic and apply to almost all data-driven marketing situations.
P1: Any statistical analysis aims to reduce/minimize information loss.
P2: Causation cannot be learnt directly from data.
P3: Prediction does not care about statistical significance.
P4: Practical usefulness trumps statistical criteria.
Lecture 2: Market Response Models
A model is only valid when its assumptions hold.
Data → prediction model → prediction.
Goal: to find a functional relationship between the input (independent variable, IV) and the output (dependent variable, DV); this relationship can take many forms.
Linear regression: y = a + bx.
a = intercept
b = slope (if x increases by 1 unit, y changes by b units).
b: the expected change in Y given a 1-unit increase in X. Strictly speaking, this interpretation is not fully correct, because the relationship is an association, not causation (P2).
Example: using price to predict sales. You can create a scatterplot to check the correlation between the two.
Objective: to fit a line through the relationship.
Price = IV = input
Sales = DV = output.
What is a good prediction?
Principle (P1): any statistical analysis aims to minimize information loss.
So, choose the line that minimizes the differences between actual and predicted values.
Residuals = differences: $e = \Delta y = y - \hat{y}$, where $\hat{y}$ is the predicted y.
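Choose a and b as follows:
1. Square each residual ∆y.
2. Sum the squares over all points.
3. Pick the a and b that make this sum as small as possible (the least-squares line).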
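A minimal sketch of these least-squares steps in R, assuming simulated price and sales data (the numbers are made up for illustration and are not the lecture's dataset). For a single IV, the minimizing values have a closed form, b = cov(x, y) / var(x) and a = mean(y) − b · mean(x), so `lm()` should return the same a and b.

```r
# Hypothetical simulated data (not the lecture's dataset): price and sales
set.seed(1)
price <- runif(50, min = 1, max = 5)
sales <- 200 - 20 * price + rnorm(50, sd = 10)

# Correlation: the strength of the linear association a scatterplot would show
r <- cor(price, sales)

# Steps 1-3 solved in closed form: the a and b that minimize the sum of squared residuals
b_hat <- cov(price, sales) / var(price)
a_hat <- mean(sales) - b_hat * mean(price)

# Residuals e = y - y_hat and the residual sum of squares at the minimum
y_hat <- a_hat + b_hat * price
rss   <- sum((sales - y_hat)^2)

# lm() applies the same least-squares criterion
fit <- lm(sales ~ price)
print(c(r = r, a_hat = a_hat, b_hat = b_hat, rss = rss))
print(coef(fit))
```

Moving b away from `b_hat` (for example to `b_hat + 0.1`) can only increase the sum of squared residuals, which is what "minimize the differences" means in practice.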
5-step framework for linear regression
1. Examine the data
Check for multicollinearity.
Multicollinearity = highly correlated IVs, i.e., multiple variables containing the same information.
This leads to unstable and misleading coefficient estimates (inflated standard errors) and to information redundancy.
We want IVs that are not highly correlated, which we check with the variance inflation factor (VIF). If VIF < 10, the IVs are not highly correlated.
If VIF > 10, we have a collinearity issue (e.g., age and income).
Remedies:
- use only one of the correlated variables in the regression;
- transform the correlated variables into a mutually independent set of predictors (e.g., factor analysis);
- collect more data.
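As a sketch of the VIF check, the snippet below computes VIF_j = 1 / (1 − R²_j) by hand in base R, where R²_j comes from regressing IV j on all other IVs. The age, income and promotion data are simulated (hypothetical), with income built almost entirely from age, and `vif_manual` is a hypothetical helper rather than a library function (packages such as car offer a `vif()` function, which is not used here).

```r
# Hypothetical simulated IVs: income is constructed from age, so the two are highly correlated
set.seed(2)
n <- 200
age       <- rnorm(n, mean = 40, sd = 10)
income    <- 1000 * age + rnorm(n, sd = 2000)
promotion <- rnorm(n)
ivs <- data.frame(age, income, promotion)

# VIF_j = 1 / (1 - R2_j), with R2_j from regressing IV j on the remaining IVs
vif_manual <- function(X) {
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(ivs)
print(round(vifs, 1))
# Rule of thumb from the notes: VIF > 10 signals a collinearity issue
print(names(vifs)[vifs > 10])
```

Here age and income should both be flagged, while promotion (generated independently) should have a VIF close to 1.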
2. Formulating the model
Decide which variables to use as input and translate this into a formula.
y = b0 + b1*X1 + … + bk*Xk + e.
Sales = β0 + β1 Advertising + β2 Promotion + β3 Price + β4 BrandEquity + e
In R: DV ~ IVs, i.e., a regression of the DV on the IVs.
E.g.: regression of Sales on Advertising, Promotion, Price, and Brand Equity.
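The `DV ~ IVs` formula can also be built from character vectors of variable names, which is convenient when the IV list changes between models. A small sketch using base R's `stats::reformulate()`; the variable names mirror the Sales equation, and `dv` and `ivs` are illustrative names introduced here.

```r
# Illustrative variable names matching the Sales equation
dv  <- "Sales"
ivs <- c("Advertising", "Promotion", "Price", "Brand_Equity")

# reformulate() builds the formula DV ~ IV1 + IV2 + ... from the names
f <- reformulate(ivs, response = dv)
print(f)
```

The resulting `f` can be passed to `lm()` exactly like a formula typed by hand.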
3. Estimating the model
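Translate the equation into an R formula:
# train: data frame with columns Sales, Advertising, Promotion, Price, Brand_Equity
model_with_brand <- lm(Sales ~ Advertising + Promotion + Price + Brand_Equity,
                       data = train)
summary(model_with_brand)  # coefficients, t-tests, R-squared, F-statistic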
Principle (P1): any statistical analysis aims to minimize information loss.
Choose the coefficients so that the differences (= residuals) between actual Y and predicted Y are minimized.
Ordinary least squares (OLS) criterion: minimize the residual sum of squares, RSS = Σ(y − ŷ)².
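To see that `lm()` applies the least-squares criterion with several IVs, the sketch below simulates a hypothetical `train` data frame with the same column names as `model_with_brand` and then solves the OLS normal equations, β̂ = (XᵀX)⁻¹Xᵀy, directly. The simulated coefficients are assumptions for illustration only.

```r
# Hypothetical simulated training data with the column names used in the notes
set.seed(3)
n <- 100
train <- data.frame(
  Advertising  = runif(n, 0, 10),
  Promotion    = runif(n, 0, 5),
  Price        = runif(n, 1, 3),
  Brand_Equity = runif(n, 0, 10)
)
train$Sales <- 50 + 3 * train$Advertising + 2 * train$Promotion -
  10 * train$Price + 1.5 * train$Brand_Equity + rnorm(n, sd = 2)

model_with_brand <- lm(Sales ~ Advertising + Promotion + Price + Brand_Equity,
                       data = train)

# OLS in matrix form: beta_hat = (X'X)^(-1) X'y minimizes RSS = sum(e^2)
X <- model.matrix(model_with_brand)  # design matrix, including the intercept column
y <- train$Sales
beta_hat <- solve(crossprod(X), crossprod(X, y))

print(cbind(normal_equations = drop(beta_hat), lm = coef(model_with_brand)))
rss <- sum(residuals(model_with_brand)^2)
```

Both columns of the printed table should agree; `lm()` uses a QR decomposition internally, which is numerically more stable than solving the normal equations but targets the same RSS minimum.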
4. Validating the model
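Naïve prediction: no model is involved; the prediction is based only on the distribution of the DV (an intercept-only model with no IVs), typically its mean. In a normal distribution, the median equals the mean.
E.g., what is the height of a Dutch female? Without any IVs, the naïve guess is the average height.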
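A sketch of the naïve prediction in R: an intercept-only model, `lm(y ~ 1)`, has no IVs, and its single coefficient is the sample mean. The heights below are simulated, hypothetical values, not real data on Dutch women.

```r
# Hypothetical simulated heights in cm (illustrative, not real statistics)
set.seed(4)
height <- rnorm(500, mean = 170, sd = 7)

# Intercept-only ("null") model: no IVs, so every prediction is the same number
null_model <- lm(height ~ 1)

print(coef(null_model))  # the intercept ...
print(mean(height))      # ... equals the sample mean
print(median(height))    # close to the mean for (roughly) normal data
```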
The question is: is the model better than a naïve prediction? To answer this, test the overall model significance.
Overall model significance
H0: b1 = b2 = (…) = bk = 0.
We compare the model we run to a null model with no IVs.
We test the null hypothesis that the coefficients (βs) of all IVs are zero, in which case the model has no predictive value. Check the F-statistic and its p-value.
If p > 0.05, we cannot reject H0: there is no evidence that the IVs help predict the DV (this does not prove that the coefficients equal 0).
If p < 0.05, we reject H0: the model has predictive value.
For this model, the F-statistic is 322.1 with a p-value < 2.2e-16 < 0.05, so we reject H0.
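The overall F-test is a comparison with the naïve (intercept-only) model. A minimal sketch on simulated, hypothetical data (so the F-statistic will not match the 322.1 above): `anova(null_model, full_model)` and the F-statistic reported by `summary()` give the same test.

```r
# Hypothetical simulated data; not the lecture's dataset
set.seed(5)
n <- 100
advertising <- runif(n, 0, 10)
price       <- runif(n, 1, 3)
sales <- 20 + 4 * advertising - 6 * price + rnorm(n, sd = 3)

full_model <- lm(sales ~ advertising + price)
null_model <- lm(sales ~ 1)  # naive prediction: intercept only

# H0: all slopes are zero; anova() compares the null model with the full model
comparison <- anova(null_model, full_model)
print(comparison)

# The same F-test as reported by summary(): value, numerator df, denominator df
f_stat  <- summary(full_model)$fstatistic
p_value <- pf(f_stat["value"], f_stat["numdf"], f_stat["dendf"], lower.tail = FALSE)
if (p_value < 0.05) cat("Reject H0: the model beats the naive prediction\n")
```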
Model fit / Strength of association / How good is the model?
R²
Having established that the model is significant, the next question is: how good is the model for prediction? Test the model fit (R²) to validate the current model.
How well does the model fit the data?
R² = the % of variation in the DV explained by the IVs (i.e., by the model).
R² lies between 0 and 1; the higher the R², the better the model fits the data.
R² = explained variation (SSreg) / total variation (SSy), i.e., the share of the variation that is explained.
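A short sketch that computes R² = SSreg / SSy by hand on simulated, hypothetical data and compares it with the value `summary()` reports.

```r
# Hypothetical simulated data
set.seed(6)
n <- 80
price <- runif(n, 1, 5)
sales <- 200 - 20 * price + rnorm(n, sd = 15)
fit <- lm(sales ~ price)

ss_y   <- sum((sales - mean(sales))^2)        # total variation (SSy)
ss_reg <- sum((fitted(fit) - mean(sales))^2)  # variation explained by the model (SSreg)
r2 <- ss_reg / ss_y

print(c(by_hand = r2, summary = summary(fit)$r.squared))
```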
Here, R² = 0.9485: 94.85% of the variation in sales is captured by the model.
There is no clear cut-off value for R²; you must consider the setting. In sales prediction, a high R² is expected (e.g., 90%), because retailers often face a relatively stable environment in which consumers show persistent buying habits.
Adjusted R²
Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), with n observations and k IVs, penalizes the number of IVs. This makes it the measure to use when comparing models with different numbers of IVs.
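The sketch below, on simulated data, adds a hypothetical IV (`noise_iv`) that is unrelated to sales. R² can only stay the same or rise when an IV is added, while adjusted R² applies the penalty 1 − (1 − R²)(n − 1)/(n − k − 1) and can fall. `adj_r2` is a hypothetical helper written here to show the formula.

```r
# Hypothetical simulated data; noise_iv has no relationship with sales
set.seed(7)
n <- 60
price    <- runif(n, 1, 5)
noise_iv <- rnorm(n)
sales <- 200 - 20 * price + rnorm(n, sd = 15)

# Adjusted R-squared with n observations and k IVs
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

m1 <- lm(sales ~ price)
m2 <- lm(sales ~ price + noise_iv)
s1 <- summary(m1)
s2 <- summary(m2)

print(rbind(m1 = c(R2 = s1$r.squared, adjR2 = s1$adj.r.squared),
            m2 = c(R2 = s2$r.squared, adjR2 = s2$adj.r.squared)))
```

Adjusted R² increases only when the added IV's t-statistic exceeds 1 in absolute value, so with a pure-noise IV it usually (not always) goes down.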
Significance of coefficients
To test for the significance of individual coefficients, we test the following
hypothesis for a particular IV:
H0: βk = 0.
This is a parameter significance test using a t-test (t = estimate / standard error). Check whether the p-value < 0.05.
E.g., for the IV Brand Equity: β = 1.46, SE = 0.06, p < 2.2e-16 < 0.05.
We reject H0 and conclude that Brand Equity has predictive value.
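The t-test behind each row of the `summary()` coefficient table can be reproduced by hand: t = estimate / SE, and the two-sided p-value comes from a t-distribution with the model's residual degrees of freedom. A sketch on simulated data with hypothetical variables `brand_equity` and `price` (the estimate will not equal the 1.46 above).

```r
# Hypothetical simulated data
set.seed(8)
n <- 100
brand_equity <- runif(n, 0, 10)
price        <- runif(n, 1, 3)
sales <- 50 + 1.5 * brand_equity - 10 * price + rnorm(n, sd = 3)
fit <- lm(sales ~ brand_equity + price)

# Coefficient table columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients
est <- coefs["brand_equity", "Estimate"]
se  <- coefs["brand_equity", "Std. Error"]

# H0: beta_k = 0; two-sided test with n - k - 1 residual degrees of freedom
t_stat  <- est / se
p_value <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)
print(c(t = t_stat, p = p_value))
```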
Stuvia is a marketplace, so you are not buying this document from us, but from seller margot0408. Stuvia facilitates payment to the seller.