Marketing Models
Chapter 1 Introduction to marketing models
Why go quant?
As we consider any proposed strategy or tactic, it is helpful to have information that suggests
whether or not it’s a good idea to engage in that action.
Strategic choices are more persuasive when they are ‘fact-based’, that is, based on
information and evidence. The strategies are more believable when communicated, and
therefore it is easier to get people on board, from employees to investors to customers.
Many business and marketing questions may be answered by statistics, numbers that
characterize the state of some market or the preferences of some customer segment.
What is a model?
A model is a simplified representation of the world built to help us understand the world and
make predictions about it.
In developing models, we might begin at a conceptual level, but we will strive to put good measures and data in the boxes to help us understand the relationships among them.
We might pose questions about the real world in terms of differences between customer
segments, which would suggest models to compare those groups.
We fit a little regression model to predict likely numbers of purchases of our brand as a
function of just two things: customers’ stated preferences and their past brand purchases.
We know the real world is far more complicated – there are contextual factors (state of the
economy, customer confidence), competitive actions (features or price promotions of
competitor’s brands or the retailer’s own), customer personality idiosyncrasies (biases about
a brand’s country of origin).
The question about whether a model is useful is often best answered in comparison to
another, competing model.
Sometimes we can watch and see whether a scenario unfolds and then we can evaluate the
goodness of the model.
It’s hip to B^2
We use data – empirical evidence – to build models.
Chapter 2 Segmentation and cluster analysis
Introduction
A market must be segmented before we can choose which segment(s) to target and how to
position our market offering.
If we can satisfy one customer in the segment, we stand a good chance of satisfying most of
the customers in that segment. We look for segments to be different from group to group.
A cluster analysis algorithm will take the input variables we feed it, compute a measure of
similarity between the entities, and group together the entities that are most similar, keeping
those that are more different in different clusters.
Input variables
The variables that marketers use can be indicators that are geographic, demographic, behavioural, and attitudinal.
If the customers being segmented are businesses, the variables can similarly be geographic, behavioural, and attitudinal.
Look around and see what data you already have in-house, but be sure not to limit your
cluster analysis to these variables – not all of them should go in, and there may be far more
interesting and important variables that you should include that you simply have no data on,
yet.
We might compile the following database:
o From internal sales data and the CRM database, take the three behavioural measures (how recently they bought, how much they spent, how many times they bought) and combine these with a demographic variable.
o Supplement our data with free secondary data online (median household income for
zip-code).
o Our interest might be in our customers’ media habits, so that we can make smarter use of our advertising budget; we could send a small sample of our customers a survey.
o We do a little ‘pre-processing’, beginning with checking the simple descriptive
statistics on each variable before tossing it into the mix. Useful variables have to exhibit some amount of variance.
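The variance check in the pre-processing step above can be sketched as follows, assuming numpy; the customer variables and their values are hypothetical.

```python
import numpy as np

# Toy customer-by-variable matrix (rows = customers); the column names
# are hypothetical examples, not from the text.
rng = np.random.default_rng(0)
data = np.column_stack([
    rng.normal(50, 15, 100),   # recency (days since last purchase)
    rng.normal(200, 60, 100),  # monetary (amount spent)
    np.full(100, 1.0),         # a constant flag -- zero variance
])
names = ["recency", "monetary", "constant_flag"]

# Keep only variables that exhibit some variance; a constant column
# cannot help distinguish one segment from another.
variances = data.var(axis=0)
keep = [n for n, v in zip(names, variances) if v > 1e-8]
print(keep)   # the zero-variance column is dropped
```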
Before contemplating the cluster analysis, most marketing analysts run a factor analysis to
look for those redundancies.
In cluster analysis, we look for similar groups of customers to form segments, whereas in
factor analysis, we look for groups of highly correlated variables that co-vary as if they were driven by a single factor.
o If the 5 recency variables captured the same information, they would all reflect one
factor. We could then go forward with an aggregate of the 5, or one of the 5
variables that we believe is most representative.
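The redundancy check described above can be sketched with a correlation matrix, assuming numpy; the five "recency" variables here are simulated from a single underlying factor, purely for illustration.

```python
import numpy as np

# Hypothetical example: 5 recency variables that all reflect one factor.
rng = np.random.default_rng(1)
factor = rng.normal(size=200)                       # the single underlying factor
recency = factor[:, None] + rng.normal(scale=0.1, size=(200, 5))

# 5 x 5 correlation matrix; off-diagonal entries near 1 signal redundancy.
corr = np.corrcoef(recency, rowvar=False)
print(corr.round(2))

# Carry forward a single aggregate instead of all 5 variables.
aggregate = recency.mean(axis=1)
```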
Measures of similarity
Correlations range from 1 (2 customers have identical patterns) to -1 (2 customers have the
exact opposite patterns).
A cluster analysis begins by looking for the customers who are most highly correlated. The model puts those customers together into a group to reflect their similarity. As it iterates, the clustering algorithm brings in customers whose data are a little less similar.
Yet while correlations reflect relative patterns, they don’t reflect mean differences.
For some purposes, the patterns are the most important information, such as if we were
segmenting consumers by what kinds of books they read.
When volume matters, we don’t want to subtract out the means (which is what the correlation coefficient does). Instead, we’d compute a Euclidean distance between each pair of customers.
The distance captures the elevation information (means) as well as the (relative) pattern
information in calculating the similarities and differences between the two customers’
purchases.
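The contrast between correlation and Euclidean distance can be seen in a two-customer sketch, assuming numpy; the purchase figures are made up for illustration.

```python
import numpy as np

# Two customers with the same *pattern* of purchases but different volumes.
light_buyer = np.array([1.0, 2.0, 3.0, 4.0])
heavy_buyer = np.array([11.0, 12.0, 13.0, 14.0])   # same pattern, +10 units

r = np.corrcoef(light_buyer, heavy_buyer)[0, 1]
d = np.linalg.norm(light_buyer - heavy_buyer)       # Euclidean distance

print(r)  # 1.0 -- correlation subtracts out the means, so the patterns match
print(d)  # 20.0 -- distance keeps the mean (volume) difference
```

Correlation sees these customers as identical; distance sees them as far apart. Which is right depends on whether volume matters for the segmentation.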
In practice, correlations are used the most, then distances.
Clustering algorithms
Hierarchical clusters
One class of algorithms produces ‘hierarchical clusters’, which means that once two customers are put into the same segment, they are always together.
o The clusters formed at one stage in the model are carried forward.
In agglomerative techniques, every customer starts in his or her own segment, and with
each iteration, the model puts together customers who are similar, either by forming a new
cluster with two similar customers, or by adding a customer to an already existing cluster
because he or she seems to be like the customers in that segment. The process ends when all customers are in the same segment.
Divisive techniques work in reverse: all customers begin in one segment, and each iteration breaks off the customer, or cluster of customers, that is the most different and should probably be in its own group. In the end, every customer is in his or her own cluster.
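The agglomerative process above can be sketched with scipy, assuming numpy and scipy are available; the two-variable customer data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two obvious groups of customers on two variables
# (hypothetical values, for illustration only).
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.8], [4.9, 5.2]])

# Agglomerative clustering: each customer starts alone, and the most
# similar customers/clusters merge at each iteration.
Z = linkage(X, method="average")        # average-link on Euclidean distances

# Cut the tree into 2 segments; the 1-cluster and 6-cluster solutions
# at the ends of the hierarchy are the trivial ones.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)   # the first three customers share one label, the last three another
```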
Everyone in the same cluster or everyone in a separate cluster are trivial solutions.
Single-link clustering is the name of the model or algorithm that puts a customer into a
segment if he or she is similar enough to at least one member in the existing cluster.
In complete-link clustering, a customer joins a cluster only if he or she is similar to all the other members.
Average-link clustering looks at the averages. A cluster centroid is just a fancy term for a multivariate mean, or a mean along more than one dimension.
To correct for the differences in market shares, we divide the frequency, fij, by the row and column sums, fi and fj respectively.
The resulting numbers are similarity-like, in that larger numbers mean the pairs of vehicles are more frequently co-owned and thus should be clustered together.
We examine the two columns (or rows) for sedans and luxury cars and merge the columns as
representing the pair.
A dendrogram is a tree-like structure that shows the steps during which each car class combined with the others.
o The numbers inserted at each step, 0.144, 0.110 are called fusion coefficients.
o Begin by covering most of the figure so that only the labels at the top and their initial
dots are showing. Then move the paper slowly down so that each linking or
clustering is revealed, step-by-step.
o If we stopped at the point where sedans and luxury cars are joined, and each of the remaining vehicles represents a distinct segment, then we have 7 clusters.
o We can dismiss the beginning point and end point as trivial solutions, and perhaps settle on a solution somewhere in between.
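The dendrogram and its fusion coefficients can be sketched with scipy, assuming numpy and scipy; note that scipy works with distances, so its fusion coefficients increase with each merge, whereas the similarity indices in the text (0.144, 0.110) decrease. The four labelled items here are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical data: four items on two variables.
X = np.array([[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.9]])
Z = linkage(X, method="average")

# The third column of Z holds the fusion coefficients: the level at
# which each pair of clusters merged.
print(Z[:, 2])

# Build the dendrogram structure without plotting it.
tree = dendrogram(Z, labels=["A", "B", "C", "D"], no_plot=True)
print(tree["ivl"])   # leaf labels in plotting order
```

Reading the fusion coefficients from merge to merge is the numeric version of sliding the paper down the figure step by step.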
In complete-link, we take the minimum similarity index to represent the pair. In single-link, we take the maximum similarity.
Average-link begins by comparing sedans and luxury vehicles, and the updating is done by simple averages – just take the two previous indices, average them, and continue.
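The three update rules can be shown side by side on a small similarity matrix, assuming numpy; the matrix entries and the three item classes are hypothetical.

```python
import numpy as np

# Hypothetical 3 x 3 similarity matrix for classes a, b, c
# (larger = more similar, as in the co-ownership table).
S = np.array([[1.00, 0.80, 0.30],
              [0.80, 1.00, 0.20],
              [0.30, 0.20, 1.00]])

# Merge the most similar pair (a, b); update its similarity to c:
single_link   = max(S[0, 2], S[1, 2])          # 0.30: most similar member
complete_link = min(S[0, 2], S[1, 2])          # 0.20: least similar member
average_link  = (S[0, 2] + S[1, 2]) / 2.0      # 0.25: simple average

print(single_link, complete_link, average_link)
```

Single-link lets one friendly member pull a customer in; complete-link demands everyone agree; average-link splits the difference.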
Ward’s method is a hierarchical clustering technique. It operationalizes the intuition that if
segments or clusters are indeed groups of similar customers, then the variability within a
group should be smaller than the variability across the groups.
o In a standard regression, R^2 is a measure of fit that tells us the amount of the total
variance that is explained by the regression model in proportion to the total
variance.
o R^2 = (SStotal – SSerror)/SStotal. The total sum of squares measures how close every customer is to the overall means on all the variables used in the analyses.
o The error sum of squares is computed by comparing each individual customer to the mean of his or her own cluster. SSerror is small when the individual customers are fairly close to their cluster means, implying that there is similarity within the cluster.
o When the SSerror is small, it tells us that the data are close to their cluster means, implying that we have a cluster of similar units, and that’s when R^2 is maximized.
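The R^2 computation described in the bullets above can be sketched directly, assuming numpy; the two-cluster data and labels are hypothetical.

```python
import numpy as np

# Hypothetical data: two tight, well-separated clusters of customers.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
labels = np.array([0, 0, 0, 1, 1, 1])

# SStotal: squared deviations of every customer from the overall means.
grand_mean = X.mean(axis=0)
ss_total = ((X - grand_mean) ** 2).sum()

# SSerror: squared deviations of each customer from his or her cluster mean.
ss_error = 0.0
for k in np.unique(labels):
    members = X[labels == k]
    ss_error += ((members - members.mean(axis=0)) ** 2).sum()

# R^2 = (SStotal - SSerror) / SStotal, as in the text.
r_squared = (ss_total - ss_error) / ss_total
print(round(r_squared, 3))   # near 1: tight clusters, well separated
```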