Clip 1.1: What are big data?
Observing behavior
Digital traces (or exhaust): a record that is created and stored when some behavior occurs
(for example: a click on a website, a call or location on a phone, a credit card purchase,
a watch, like or share)
Data and two notions of big (meaning)
Data can be big in different ways
Big data
Early definition: too large to be loaded into one machine, “distributed-data big”, domain of
computer engineering
More our focus:
“Big data is a shorthand label that typically means applying the tools of artificial intelligence,
like machine learning, to vast new troves of data beyond that captured in standard
databases. The new data sources include Web-browsing data trails, social network
communications, sensor data and surveillance data.”
Primary vs secondary data
- Primary data (custom-made): data collected to answer a specific research question
- Secondary data (readymade): data collected for a non-research purpose
While most big data applications are secondary or “readymade”, not all are.
Clip 1.2: Uses of big data
Big data for personalization
“Recommendation algorithms are at the core of the Netflix product. They provide our
members with personalized suggestions to reduce the amount of time and frustration to find
great content to watch.”
Big data for boosting engagement
“At any given point in time, there isn’t just one version of Facebook running, there are
probably 10,000.” – Mark Zuckerberg
Big data for reducing churn
- Customer churn: a customer quits a service
- Use past data to estimate a model that predicts churn
- Predictors: length of time as a customer (tenure), number of other services subscribed to,
demographics
- Use that model to predict the probability of churn for current customers
- Intervene on those most likely to churn
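The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer records are made up, the helper names (`features`, `fit_logistic`, `churn_probability`) are hypothetical, and a small hand-written logistic regression stands in for whatever model a company would actually use.

```python
import math

# Hypothetical past customers: (tenure in months, other services, churned 1/0)
past = [
    (1, 0, 1), (2, 0, 1), (3, 1, 1), (4, 0, 1), (6, 1, 1),
    (12, 1, 0), (18, 2, 0), (24, 1, 0), (30, 3, 0), (36, 2, 0),
]

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def features(tenure_months, services):
    # Intercept, tenure in years, number of other services
    return (1.0, tenure_months / 12.0, float(services))

def fit_logistic(rows, lr=0.1, epochs=2000):
    # Step 1: estimate a churn model on past data (plain gradient descent)
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for tenure, services, churned in rows:
            x = features(tenure, services)
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - churned
            for j, xj in enumerate(x):
                grad[j] += error * xj / len(rows)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

def churn_probability(weights, tenure_months, services):
    x = features(tenure_months, services)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

weights = fit_logistic(past)

# Step 2: predict churn probability for (hypothetical) current customers
current = {"A": (2, 0), "B": (28, 2), "C": (14, 1)}
probs = {name: churn_probability(weights, *vals) for name, vals in current.items()}

# Step 3: intervene on the customer(s) most likely to churn
ranked = sorted(probs, key=probs.get, reverse=True)
to_contact = ranked[:1]
print(probs, to_contact)
```

In this toy data, short tenure goes together with churn, so the new customer with no other services ends up first in the ranking. How many customers to contact is a business choice that the model does not make for you.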
Big data for public policy & economy
- “Now the mobile phone has become a primary source of public data intelligence”
- Are people going back to work?
- Are people going back to restaurants?
Clip 1.3: 10 characteristics of big data
Using big data to learn about things
Advantages          Disadvantages
1. Big              4. Incomplete
2. Always-on        5. Inaccessible
3. Nonreactive      6. Non-representative
                    7. Drifting
                    8. Algorithmically confounded
                    9. Dirty
                    10. Sensitive
Advantages:
1. Big
When is big an advantage? When the event is rare or small:
- The average click-through rate (CTR) on a banner ad is 0.35%
- For every 10,000 observations, you have 9,965 no-clicks and 35 clicks
- Suppose you’re running an A/B test:
Average CTR for A = 0.35%, B = 0.40%
A 0.05 percentage point increase could mean a lot of extra revenue
But what is the confidence interval around 0.05 percentage points? How precisely is it estimated?
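To make the precision question concrete, here is a minimal sketch in Python of a 95% confidence interval for the difference between the two CTRs, using the normal (Wald) approximation. The CTRs are the ones from the A/B example; the sample sizes per variant and the helper name `diff_ci` are assumptions for illustration.

```python
import math

def diff_ci(p_a, p_b, n_a, n_b, z=1.96):
    """Approximate 95% CI for p_b - p_a (normal approximation)."""
    diff = p_b - p_a
    # Standard error of the difference between two independent proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff - z * se, diff + z * se

ctr_a, ctr_b = 0.0035, 0.0040  # CTRs from the A/B example

# Hypothetical sample sizes per variant
for n in (10_000, 100_000, 1_000_000):
    low, high = diff_ci(ctr_a, ctr_b, n, n)
    verdict = "includes 0" if low <= 0 <= high else "excludes 0"
    print(f"n = {n:>9,} per variant: [{low:+.5f}, {high:+.5f}] ({verdict})")
```

Under these assumptions, the interval with 10,000 observations per variant still contains zero, while with 1,000,000 per variant it does not. This is why size matters when the event is rare and the effect is small.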
Big is also an advantage when there is “heterogeneity” in responses:
customers respond differently to the same thing
No heterogeneity (homogeneity) = everyone has the same response
2. Always-on
Collecting data in real-time is important when we need to know and respond to the
answer quickly (for example: economic activity, public health, monitoring
competition, trend spotting)
3. Nonreactive
People usually change their behavior when they know they are being observed.
But with big data, users are typically not aware that they are being recorded.
Disadvantages:
4. Incomplete
“Big data record what happened, but not why”
Example:
Predicting churn = predicting that a customer will quit
- Attracting new customers costs 5-6x more than retaining existing ones
- Solution: use big data to predict churn, intervene on those likely to churn before
they churn
- Big data record usage, charges, channel of acquisition
- These may predict churn: customers who use the service a lot are less likely to
churn
- But this isn’t a cause: the data show who is likely to churn, not why
5. Inaccessible
- From outside the organization:
Legal, business or ethical barriers to giving outside researchers access to data
- From inside:
Databases are not integrated, lack variables to match on, or use different coding schemes
Example: wanting to know all touchpoints a customer has interacted with
6. Non-representative
If your sample is representative, you can make inferences about the population
based on your sample
- Consumers say that reviews are important but...
- We know that they are not always valid
Review Survey