Marketing Analytics for Big Data
Lecture 1
Observing behaviour
Digital traces (or digital exhaust): records of some behaviour that are created and stored
Click on a website
Call or location on a phone
Buy with a credit card
Like or share
Watch
Internet of Things (sensor data)
Image
Text
Data and two notions of big
Data can be big in different ways
Big data
Early definition: data too large to be loaded onto a single machine, “distributed-data big”, the domain of
computer engineering
More our focus: “Big Data is a shorthand label that typically means applying the tools of Artificial
Intelligence, like machine learning, to vast new troves of data beyond that captured in standard
databases. The new data sources include web-browsing data trails, social network communications,
sensor data and surveillance data.”
“The term itself is vague, but it is getting at something that is real”
Primary vs. secondary data
Primary data (custommade): data collected to answer a specific research question
Secondary data (readymade): data collected for a non-research purpose (e.g., generating profits,
administering laws)
Bit by Bit: big data is a form of secondary data
While most big data applications are secondary or “readymade”, not all are (disagree with reading)
Big data for personalization
“Recommendation algorithms are at the core of the Netflix product. They provide our members with
personalized suggestions to reduce the amount of time and frustration to find great content to
watch.”
Big data for boosting engagement
“At any given point in time, there isn’t just one version of Facebook running, there are probably
10,000” - Mark Zuckerberg
Big data for new product development
Tastewise: We monitor social chatter, nearly 100% of online recipes and the country’s most
influential restaurants & menus in order to understand how food is prepared, loved and shared
Big data for reducing churn
Customer churn: a customer quits a service
Use past data to estimate a model that predicts churn
o Predictors: length of time as a customer (tenure), number of other services subscribed to,
demographics
Use that model to predict the probability of churn for current customers
Intervene on those most likely to churn
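The three steps above (estimate on past data, score current customers, intervene on the most likely churners) can be sketched as follows. This is a minimal illustration, not a recommended production method: the customer records are made up, the model is a plain logistic regression fitted by gradient descent, and the names `past_customers`, `current_customers` and `to_contact` are hypothetical.

```python
import math

# Hypothetical past customers: (tenure in months, other services subscribed to, churned 1/0).
past_customers = [
    (2, 0, 1), (3, 1, 1), (5, 0, 1), (8, 1, 1), (12, 0, 1),
    (10, 2, 0), (18, 1, 0), (24, 2, 0), (30, 3, 0), (36, 2, 0),
]

def features(tenure_months, n_services):
    # Express tenure in years so both inputs are on a similar scale.
    return (tenure_months / 12.0, float(n_services))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, learning_rate=0.1, epochs=5000):
    """Estimate logistic-regression weights by batch gradient descent."""
    w0, w1, b = 0.0, 0.0, 0.0
    n = len(rows)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for tenure, services, churned in rows:
            x0, x1 = features(tenure, services)
            error = sigmoid(b + w0 * x0 + w1 * x1) - churned
            g0 += error * x0
            g1 += error * x1
            gb += error
        w0 -= learning_rate * g0 / n
        w1 -= learning_rate * g1 / n
        b -= learning_rate * gb / n
    return w0, w1, b

def churn_probability(model, tenure_months, n_services):
    w0, w1, b = model
    x0, x1 = features(tenure_months, n_services)
    return sigmoid(b + w0 * x0 + w1 * x1)

# Step 1: estimate the model on past data.
model = fit_logistic(past_customers)

# Step 2: predict the probability of churn for (hypothetical) current customers.
current_customers = {"A": (2, 0), "B": (30, 3), "C": (6, 1), "D": (20, 2)}
scores = {cid: churn_probability(model, *x) for cid, x in current_customers.items()}

# Step 3: intervene on those most likely to churn (here: the top two).
ranking = sorted(scores, key=scores.get, reverse=True)
to_contact = ranking[:2]
print(ranking, to_contact)
```

In practice the model would also include demographics and be checked on held-out customers. The sketch only shows how the predicted probabilities turn into a ranked intervention list.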
Big data for public policy & economy
“Now the mobile phone has become a primary source of public data intelligence”
Are people going back to work?
Are people going back to restaurants?
Google mobility data
These reports are created with aggregated, anonymized sets of data from users who have turned on
the location history setting, which is off by default.
Using big data to learn about things
Advantages
1. Big
An advantage when:
o The event is rare or small
o There is heterogeneity in responses (= customers respond differently to the same thing)
o The relationship is complex
2. Always on
Collecting data in real time is important when we need to know and respond to the answers quickly.
Traditional data (e.g. from surveys) takes longer to process.
3. Nonreactive
People usually change their behaviour when they know they are being observed. With big data,
users are typically not aware that they are being recorded.
GDPR
Under the GDPR, cookies that are not strictly necessary for the basic function of your
website must only be activated after your end-users have given their explicit consent to
the specific purpose of their operation and collection of personal data.
Disadvantages
4. Incomplete
“Big data records what happened, but not why”
E.g.: predicting churn = predicting which customers quit
o Attracting new customers costs 5-6x more than retaining existing ones. Solution: use big data
to predict churn and intervene on those likely to churn before they churn.
o Big data records usage, charges and channel of acquisition. These may predict churn: customers
who use the service a lot are less likely to churn, but this isn’t a cause.
5. Inaccessible
From outside the organization: legal, business or ethical barriers keep outside researchers from
accessing the data
From inside: databases are not integrated, lacking variables to match, different coding schemes.
E.g.: we want to know all touchpoints a customer has interacted with (display, email, web, social)
6. Non-representative
If the sample is representative, you can make inferences about the population based on your sample
Statistics (characteristics of the sample) estimate parameters (characteristics of the population)
Consumers say that reviews are important, but we know that they are not always valid: how
representative is online opinion?
7. Drifting
“If you want to measure change, don’t change the measure”
Often, we’re interested in analysing something over time. Big data sources can suffer from
problems because:
o The users can change
o How they use it can change
o The platform itself changes
8. Algorithmically confounded
How the platform is designed can influence behaviour, introducing bias or noise into what you’re
trying to study
o FB encourages users to have at least 20 friends. But on average how many friends do people have?
o FB also encourages you to become friends with friends of friends. But how often are friends of
friends friends?
9. Dirty
Big data sources can be loaded with junk or spam (bots, fake reviews, trolls)
10. Sensitive
Some of the information that companies have is sensitive
E.g.: Strava fitness app reveals information on military sites
Conclusion
A lot depends on what research question you are trying to answer with big data
Using OpenTable reservation data to ask questions about
o Whether more consumers are going out to eat now that pandemic restrictions have been lifted?
Unrepresentative: the population is all consumers, and not all eaters use OpenTable
o Whether more OpenTable users are going out to eat now that pandemic restrictions have been lifted?
Representative: the population is OpenTable users
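The OpenTable example and the distinction between sample statistics and population parameters can be made concrete with a small sketch. All numbers below are invented for illustration, and the field names `uses_opentable` and `visits` are hypothetical. The sketch only shows how a statistic computed on a non-representative sample can miss the parameter for all consumers while being exactly right for the narrower population of OpenTable users.

```python
# Hypothetical population of ten consumers: whether they use OpenTable and
# how many restaurant visits they made last month. All values are made up.
population = [
    {"uses_opentable": True, "visits": 6},
    {"uses_opentable": True, "visits": 8},
    {"uses_opentable": True, "visits": 5},
    {"uses_opentable": True, "visits": 7},
    {"uses_opentable": False, "visits": 1},
    {"uses_opentable": False, "visits": 2},
    {"uses_opentable": False, "visits": 0},
    {"uses_opentable": False, "visits": 3},
    {"uses_opentable": False, "visits": 2},
    {"uses_opentable": False, "visits": 1},
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Parameter: mean visits in the full population (normally unknown to the analyst).
population_mean = mean(p["visits"] for p in population)

# Statistic: mean visits among OpenTable users (what the reservation data can show).
opentable_mean = mean(p["visits"] for p in population if p["uses_opentable"])

print(population_mean)  # 3.5
print(opentable_mean)   # 6.5
```

Here the OpenTable figure (6.5) overstates how often consumers in general eat out (3.5), but it is the correct answer to a question about OpenTable users. Which one counts as "representative" depends on the research question.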