Lecture 1: Introduction
Data analytics for marketing
Data → Tools → Strategy
Know:
- when to use what tool?
- how to use a tool?
- what information is needed?
- what decision is to be made?
- limitations / strengths?
Lecture  Tool                        Decision
2        Linear regression           Market responses (e.g. pricing)
3        Conjoint analysis           New product design
4        Bass model                  New product diffusion
5        Cluster analysis            Segmentation
6        Multi-dimensional scaling   Positioning
These tools are specifically designed for marketing strategies.
Principles of data-driven marketing
Generic and applicable to almost all data-driven marketing situations.
P1: Any statistical analysis aims to reduce/minimize information loss.
P2: Causation cannot be learnt directly from data.
P3: Prediction does not care about statistical significance.
P4: Practical usefulness trumps statistical criteria.
Lecture 2: Market Response Models
A model is valid only as long as its assumptions hold.
Data → prediction model → prediction.
Goal: to find a functional relationship between the input (IV, independent variable) and the output (DV, dependent variable). This relationship can take many forms.
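Linear regression: y = a + bx.
a = intercept (the predicted y when x = 0).
b = slope (if x moves 1 unit, the predicted y moves b units).
b: the expected change in y, given a 1-unit increase in x. Strictly speaking, this describes an association, not causation: the data alone do not show that changing x causes y to change.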
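The slope interpretation can be checked with a small sketch. The values of a and b below are made up for illustration and do not come from the course data.

```r
# Hypothetical intercept and slope, chosen only for illustration
a <- 10
b <- -2

# Prediction function for the line y = a + b * x
predict_y <- function(x) a + b * x

# Raising x by one unit changes the predicted y by exactly b
diff_one_unit <- predict_y(6) - predict_y(5)
print(diff_one_unit)
```

The difference between the two predictions equals b, whatever starting value of x is used.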
Example: using price to predict sales. You can create a scatterplot to check the correlation between the two.
Objective: to fit a line to the relationship.
Price = IV = input
Sales = DV = output.
What is a good prediction?
Principle: any statistical analysis is to reduce information loss.
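→ Choose the line that minimizes the differences between actual and predicted y.
Residuals = differences = e = Δy = y − ŷ
ŷ = predicted y.
Choose a & b so that the following quantity is as small as possible:
1. Square Δy for each point
2. Sum up over all points
The result is the residual sum of squares (RSS).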
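A minimal sketch of the square-and-sum procedure, assuming a tiny invented price/sales data set. The helper name `rss` is hypothetical. It computes the sum of squared residuals for a candidate line and compares the least-squares line with a shifted alternative.

```r
# Toy data, invented for illustration only
price <- c(1, 2, 3, 4, 5)
sales <- c(9, 8, 6, 5, 2)

# Sum of squared residuals for a candidate line y = a + b * x
rss <- function(a, b) {
  residuals <- sales - (a + b * price)  # e = y - y_hat
  sum(residuals^2)                      # square, then sum over all points
}

# Closed-form least-squares solution for a single IV
b_hat <- cov(price, sales) / var(price)
a_hat <- mean(sales) - b_hat * mean(price)

# A line with a shifted intercept gives a larger sum of squares
rss_best  <- rss(a_hat, b_hat)
rss_other <- rss(a_hat + 1, b_hat)
print(c(a_hat, b_hat, rss_best, rss_other))
```

The closed-form values should match what `lm(sales ~ price)` returns for the same data.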
5-step framework for linear regression
1. Examine the data
Check for multicollinearity.
Multicollinearity = highly correlated IVs: multiple variables contain the same information.
This leads to unstable and misleading coefficient estimates (inflated standard errors), and to information redundancy.
We want IVs that are not highly correlated. Rule of thumb: if the variance inflation factor (VIF) < 10, the IVs are not highly correlated.
If VIF > 10, we have a collinearity issue (e.g. age and income). Possible remedies:
- use only one of the correlated variables in the regression;
- transform the correlated variables into a mutually independent set of predictors (e.g. factor analysis);
- collect more data.
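The VIF check can be sketched in base R, assuming simulated IVs in which income is constructed to move closely with age. The data and the name `vif_manual` are illustrative assumptions. The VIF of an IV is 1 / (1 − R2), where R2 comes from regressing that IV on all the other IVs.

```r
set.seed(1)
n <- 200
# Simulated IVs (assumption): income is built to move closely with age
age    <- rnorm(n, mean = 40, sd = 10)
income <- 1000 * age + rnorm(n, sd = 2000)
price  <- rnorm(n, mean = 5, sd = 1)
ivs <- data.frame(age, income, price)

# VIF_j = 1 / (1 - R2_j), with R2_j from regressing IV j on the other IVs
vif_manual <- sapply(names(ivs), function(v) {
  others <- setdiff(names(ivs), v)
  r2 <- summary(lm(reformulate(others, response = v), data = ivs))$r.squared
  1 / (1 - r2)
})
print(round(vif_manual, 2))
```

In this simulation age and income should show a VIF well above 10, while price stays close to 1. In practice, the `vif()` function from the `car` package reports the same quantity for a fitted model.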
2. Formulating the model
Decide which variables to use as input and translate this into a formula.
y = b0 + b1*X1 + … + bk*Xk + e.
Sales = β0 + β1 Advertising + β2 Promotion + β3 Price + β4 BrandEquity + e
In R: DV ~ IVs, read as "regression of the DV on the IVs".
E.g.: regression of Sales on Advertising, Promotion, Price, and Brand Equity.
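As a small sketch of the DV ~ IVs notation, the formula below mirrors the sales equation. The column names (including `Brand_Equity`) are assumed names for illustration. The intercept and the error term are not written out, because R adds the intercept automatically and treats the residual as implicit.

```r
# Hypothetical column names, mirroring the sales equation
sales_formula <- Sales ~ Advertising + Promotion + Price + Brand_Equity

# The left-hand side is the DV, the right-hand side lists the IVs
dv  <- all.vars(sales_formula)[1]
ivs <- all.vars(sales_formula)[-1]
print(dv)
print(ivs)
```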
3. Estimating the model
Translate the equation into an R formula:
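# Fit the regression of Sales on the four IVs, using the training data
model_with_brand <- lm(Sales ~ Advertising + Promotion + Price + Brand_Equity,
                       data = train)
# Show coefficients, standard errors, p-values, R2 and the F-statistic
summary(model_with_brand)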
Any statistical analysis is to reduce/minimize information loss.
Choose the coefficients in such a way that the difference (= residuals) between actual Y and predicted Y is minimized.
Ordinary least squares (OLS) criterion: minimize the residual sum of squares (RSS).
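A hedged sketch of what the least-squares criterion means in practice, using simulated data. The data frame `train` here is generated for illustration, with made-up true coefficients, and is not the course data set. The coefficients from `lm()` are compared with the solution of the normal equations, and a nudged coefficient vector is shown to have a larger RSS.

```r
set.seed(42)
n <- 100
# Simulated training data (assumption): the true coefficients are made up
train <- data.frame(
  Advertising = runif(n, 0, 10),
  Price       = runif(n, 1, 5)
)
train$Sales <- 20 + 3 * train$Advertising - 4 * train$Price + rnorm(n)

# lm() chooses the coefficients that minimize the RSS
fit <- lm(Sales ~ Advertising + Price, data = train)

# Same coefficients from the normal equations: solve (X'X) b = X'y
X <- model.matrix(fit)
y <- train$Sales
b_normal <- drop(solve(t(X) %*% X, t(X) %*% y))

rss_fit <- sum(residuals(fit)^2)
# Nudging one coefficient away from the OLS value increases the RSS
b_shifted   <- coef(fit) + c(0, 0.1, 0)
rss_shifted <- sum((y - X %*% b_shifted)^2)
print(c(rss_fit, rss_shifted))
```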
4. Validating the model
Naïve prediction: no model involved, only based on the distribution of the DV (only an intercept, no IVs). The naïve prediction is the mean of the DV; in a normal distribution, the median = mean.
E.g., what is the height of a Dutch female? Without further information, the best guess is the average height.
The question is: is the model better than a naïve prediction? → Test the overall model significance.
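The naïve prediction can be illustrated with an intercept-only model. The heights below are simulated, with an assumed mean and spread, and are not real survey data.

```r
set.seed(7)
# Simulated heights in cm (assumption, not real survey data)
height <- rnorm(500, mean = 170, sd = 6)

# Intercept-only model: no IVs, so every prediction is the same number
naive <- lm(height ~ 1)
naive_prediction <- unname(coef(naive)[1])

# That number is the sample mean of the DV
print(c(naive_prediction, mean(height)))
```

A model with IVs is worth using only if it predicts better than this single number.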
Overall model significance
H0: b1 = b2 = (…) = bk = 0.
We compare the model we run to a null model with no IVs.
We test the null hypothesis that the coefficients (βs) of all IVs are zero, in which case the model has no predictive value. Check the F-statistic and its p-value.
If p > 0.05 → we cannot reject H0: there is no evidence that the IVs impact the prediction of the DV.
If p < 0.05 → we reject H0: the model is of predictive value.
For this model, the F-stat is 322.1, with a p-value < 2.2e-16 < 0.05.
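A sketch of the overall significance test on simulated data. The variables `x1`, `x2` and `y` are assumptions for illustration. The F-test compares the intercept-only (null) model with the full model, and the same F-statistic appears in `summary()`.

```r
set.seed(3)
n <- 120
# Simulated data (assumption): x1 matters, x2 is pure noise
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 5 + 2 * d$x1 + rnorm(n)

full <- lm(y ~ x1 + x2, data = d)
null <- lm(y ~ 1, data = d)   # naive model without IVs

# F-test comparing the null model with the full model
f_test  <- anova(null, full)
f_stat  <- f_test[2, "F"]
p_value <- f_test[2, "Pr(>F)"]

# summary() reports the same overall F-statistic
f_summary <- unname(summary(full)$fstatistic["value"])
print(c(f_stat, f_summary, p_value))
```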
Model fit / Strength of association / How good is the model?
R2
We tested that the model is significant. Next question: how good is the model for prediction? → Test model fit (R2) to validate the current model.
How well does the model fit the data?
R2 = the % of variation in the DV explained by the IVs (by the model).
The higher the R2, the better the model fits the data. Value between 0 and 1.
R2 = the explained variation (SSreg) / total variation (SSy). So, the percentage of
variation that is explained.
Here: R2 = 0.9485 → 94.85% of the variation in sales is captured by the model.
There is no clear cut-off value for R2; you must consider the setting. In sales
prediction, a big R2 is expected (e.g., 90%), because retailers are often faced
with a relatively stable environment where consumers show persistent habits of
buying products.
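The definition R2 = SSreg / SSy can be verified by hand. This is a minimal sketch on simulated price and sales data, where the numbers are assumptions and not the course data.

```r
set.seed(11)
n <- 80
# Simulated data (assumption)
d <- data.frame(price = runif(n, 1, 10))
d$sales <- 100 - 6 * d$price + rnorm(n, sd = 5)

fit <- lm(sales ~ price, data = d)

ss_total <- sum((d$sales - mean(d$sales))^2)      # SSy: total variation
ss_reg   <- sum((fitted(fit) - mean(d$sales))^2)  # SSreg: explained variation
r2_manual <- ss_reg / ss_total

print(c(r2_manual, summary(fit)$r.squared))
```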
Adjusted R2
Adjusted R2 penalizes the number of IVs: adding an IV raises it only if the IV improves the fit enough. It is therefore used to compare models with different numbers of IVs.
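A sketch of the penalty, assuming the standard formula adjusted R2 = 1 − (1 − R2)(n − 1)/(n − k − 1), with n observations and k IVs. The simulated data and the helper name `adj_r2` are illustrative assumptions.

```r
set.seed(5)
n <- 60
# Simulated data (assumption): noise1 and noise2 are unrelated to y
d <- data.frame(x = rnorm(n), noise1 = rnorm(n), noise2 = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n)

small <- lm(y ~ x, data = d)
big   <- lm(y ~ x + noise1 + noise2, data = d)

# Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1), with k = number of IVs
adj_r2 <- function(fit) {
  r2 <- summary(fit)$r.squared
  k  <- length(coef(fit)) - 1
  n  <- nobs(fit)
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

print(c(R2_small = summary(small)$r.squared, R2_big = summary(big)$r.squared))
print(c(adj_small = adj_r2(small), adj_big = adj_r2(big)))
```

Plain R2 can never decrease when IVs are added, which is why it is not suitable for comparing models of different size.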
Significance of coefficients
To test for the significance of individual coefficients, we test the following
hypothesis for a particular IV:
H0: βk = 0.
Parameter significance is tested with a t-test. Check whether the p-value < 0.05.
E.g., for the IV Brand Equity: β = 1.46, SE = 0.06, p < 2.2e-16 < 0.05.
→ We reject H0 and conclude that Brand Equity is of predictive value.
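The coefficient test can be reproduced by hand: the t-statistic is the estimate divided by its standard error, and the p-value comes from a t distribution with the residual degrees of freedom. The sketch below uses simulated data with assumed names `brand_equity` and `sales`, not the course data.

```r
set.seed(9)
n <- 150
# Simulated data (assumption)
d <- data.frame(brand_equity = rnorm(n, 50, 10))
d$sales <- 30 + 1.5 * d$brand_equity + rnorm(n, sd = 8)

fit   <- lm(sales ~ brand_equity, data = d)
coefs <- coef(summary(fit))  # columns: Estimate, Std. Error, t value, Pr(>|t|)

# t = estimate / standard error, two-sided p-value from the t distribution
est <- coefs["brand_equity", "Estimate"]
se  <- coefs["brand_equity", "Std. Error"]
t_manual <- est / se
p_manual <- 2 * pt(-abs(t_manual), df = df.residual(fit))

print(c(t_manual, p_manual))
```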