Course grade: 9.0. Extensive summary for the course Analysis of Customer Data. The summary contains the content of all lectures and tutorials, including additional notes and explanations. The course is taught by dr. B. Cule and dr. G. Napoles, as part of the MSc Data Science & Society at Tilburg Un...
Analysis of Customer Data
MSc Data Science & Society
Tilburg University
Module 1. Introduction and Frequent Pattern Mining
1. Introduction
“People who buy diapers also buy beer.” How did we get there, and how far have we come since?
Types of Customer Data
- Purchase data
- Click logs: refer to the records of user interactions on a website, where each click or action
taken by a person during their visit is documented.
- Trajectories (GPS data)
- Opinions (ratings, reviews)
- Bank transactions
- Demographic data
Companies can capitalize on customer data in various ways, directly or indirectly. One approach
involves the collection of customer data which is then sold to other companies. Alternatively,
companies can indirectly benefit by tracking customer data to enhance their website's
performance and consequently boost sales. For instance, analyzing customer patterns through
click logs can aid in optimizing profits and improving overall services.
What Can We Do with Customer Data?
- Classification (supervised): e.g., should the bank give a loan to a particular customer
based on historical data on similar customers?
- Clustering (unsupervised): e.g., find sub-groups of customers to organize targeted
marketing campaigns.
- Recommender Systems: e.g., what product might the customer want to buy next?
- Next Event Prediction: e.g., pre-fetching web pages in expectation of a click. This entails
analyzing sequential data such as web clicks to forecast the customer's next action, for
example whether or not they will make a purchase. Some websites even customize
prices based on insights derived from customer behavior patterns.
The Focus of this Course: Pattern Mining (Building Block of Other Applications)
Patterns are interesting:
- People who buy diapers also buy beer (increase profit, complementary sales)
- People who like “Lord of the Rings” also like “Harry Potter” (recommender system)
- People read domestic news before international news
Patterns are also useful in many other applications:
- Classify/cluster customers based on common patterns in their data
- Recommend items to customers based on patterns in their purchase behavior (and
patterns in similar customers’ purchase behavior)
- Find anomalies in bank transactions (potential fraud): This differs slightly from
classification. Instances of "abnormal" behavior (fraud) are far less frequent than
instances of "normal" behavior (no fraud), so there are too few examples to learn directly
what fraud looks like. Instead, we learn the patterns that characterize "normal" behavior
and classify instances that exhibit these patterns as "normal". If an instance exhibits
none of these patterns, we can flag it as "abnormal".
- Place beer close to diapers on supermarket shelves.
2. Frequent Itemsets & Association Rules
Association Rule Mining
- Agrawal et al. introduced the model in 1993, which has become a significant focus of
study in the database and data mining community.
- The model is designed for data mining and operates on categorical data only; it has no
suitable algorithm for numerical data. Note that the products are items, never numbers.
- Its initial application was in Market Basket Analysis, seeking relationships between items
purchased by customers.
- For instance, a rule like {Bread} → {Milk} [sup = 5%, conf = 100%] indicates that 5% of
transactions contain both bread and milk, and whenever bread is purchased, milk is also
bought with 100% certainty. Note that this is a one-way relationship. This is not the same
as saying “people who buy milk, also buy bread”.
The Model: Data
- 𝐼 = {𝑖₁, 𝑖₂, …, 𝑖ₘ} represents the set of items: all possible items that we encounter in the
dataset.
- A transaction 𝑡 is a set of items, where 𝑡 is a subset of the set 𝐼 (𝑡 ⊆ 𝐼).
- The transaction database 𝑇 is a collection of transactions, 𝑇 = {𝑡₁, 𝑡₂, …, 𝑡ₙ}.
Transaction Data: Supermarket Data
- Market basket transactions are represented as 𝑡₁, 𝑡₂, ..., 𝑡ₙ, where each transaction
corresponds to a basket with a collection of items purchased.
o 𝑡₁: {bread, cheese, milk}
o 𝑡₂: {apple, eggs, salt, yoghurt}
o ...
o 𝑡ₙ: {biscuits, eggs, milk}
- Concepts:
o An item refers to an individual product or article found in a basket, such as bread,
cheese, milk, apple, eggs, salt, yoghurt, and biscuits.
o 𝑰 represents the set of all items available for sale in the store, including bread,
cheese, milk, apple, eggs, salt, yoghurt, and biscuits.
o A transaction refers to the items purchased in a basket; it may have a transaction
ID (TID)
o A transactional dataset is a set of all the transactions recorded, representing the
collective data of items purchased by customers.
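These concepts can be sketched in a few lines of Python. This is only an illustration: the variable names and the dictionary keys used as transaction IDs are assumptions, and the item set is derived here from the three example baskets rather than from a store's full catalogue.

```python
# Each transaction is a set of items; the keys are hypothetical transaction IDs (TIDs).
transactions = {
    "t1": {"bread", "cheese", "milk"},
    "t2": {"apple", "eggs", "salt", "yoghurt"},
    "tn": {"biscuits", "eggs", "milk"},
}

# I: here approximated as every item that occurs in at least one recorded basket.
items = set().union(*transactions.values())

# Every transaction must be a subset of I (t ⊆ I).
all_subsets = all(t <= items for t in transactions.values())

print(sorted(items))
print(len(transactions), all_subsets)
```

Using Python sets matches the model directly: a basket has no order and no duplicate items, and the subset test `<=` corresponds to 𝑡 ⊆ 𝐼.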
Transaction Data: A set of Documents
Text documents can be treated as transaction data in the same way as market baskets: the data
consists of a set of documents, where each document is a "bag" of keywords (the items).
Typically, we first remove stop-words such as ‘the’ and ‘a’. Examples:
- doc1: {Student, Teach, School}
- doc2: {Student, School}
- doc3: {Teach, School, City, Game}
The Model: Rules
The model used for mining association rules is based on the concept of "itemsets" and
"association rules":
- A transaction 𝑡 contains 𝑋, a set of items (an itemset) from 𝐼, if 𝑋 ⊆ 𝑡.
o {Coach, Game} is an itemset that appears in document 6.
- Association Rule: An association rule is an implication of the form X ⇒ Y, where X and Y
are subsets of the set of items (I), and X and Y have no intersection (items in common).
o 𝑋 ⇒ 𝑌, where 𝑋, 𝑌 ⊂ 𝐼, and 𝑋 ∩ 𝑌 = ∅
- Itemset: Again, an itemset is a set of items. For example, {milk, bread, cereal} is an itemset,
and a single item like {cheese} is an itemset of size 1.
- k-Itemset: A k-itemset is an itemset with k items. For instance, {milk, bread, cereal} is a
3-itemset.
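As a rough illustration of these definitions, the sketch below checks which of the three example documents contain a given itemset and whether a pair (X, Y) satisfies the disjointness condition of a rule. The helper names `containing` and `is_valid_rule` are hypothetical and chosen only for this example.

```python
docs = {
    "doc1": {"Student", "Teach", "School"},
    "doc2": {"Student", "School"},
    "doc3": {"Teach", "School", "City", "Game"},
}

def containing(itemset, transactions):
    """Return the IDs of all transactions t with itemset ⊆ t."""
    return [tid for tid, t in transactions.items() if itemset <= t]

def is_valid_rule(X, Y):
    """X ⇒ Y requires that X and Y share no items (X ∩ Y = ∅)."""
    return X.isdisjoint(Y)

X = {"Teach", "School"}           # a 2-itemset (k = len(X) = 2)
print(len(X))                     # 2
print(containing(X, docs))        # ['doc1', 'doc3']
print(is_valid_rule({"Teach"}, {"School"}))            # True
print(is_valid_rule({"Teach", "School"}, {"School"}))  # False
```

The check only covers the disjointness condition; that X and Y are subsets of 𝐼 is taken for granted here.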
Rule Strength Measures
The strength of association rules is measured using two metrics:
1. Support: Support measures the percentage of transactions containing both X and Y, i.e.,
the probability that X and Y occur together. Here X ∪ Y denotes the combined itemset, so
a transaction counts only if it contains every item of X and every item of Y. For instance,
a rule with sup = 0.5 means that 50% of transactions contain both X and Y.
o sup = Pr(X ∪ Y)
2. Confidence: Confidence indicates the percentage of transactions containing X that also
contain Y. It represents the conditional probability of Y given X. A rule
with conf = 0.8 means that 80% of transactions containing X also contain Y. In other
words, it is the probability of Y in transactions that already contain X.
o conf = Pr(Y | X)
Support and Confidence
Support count (X.count) refers to the number of occurrences of an itemset X in a dataset T. In
other words, it counts how many transactions in the dataset contain the specific itemset X.
Assuming the dataset T contains n transactions, the support of the rule X ⇒ Y is calculated
as the ratio of the count of transactions containing the combined itemset (X ∪ Y) to the total
number of transactions (n):
support = (X ∪ Y).count / n
Confidence, on the other hand, measures the strength of an association rule X ⇒ Y. It is calculated
as the ratio of the count of transactions containing the combined itemset (X ∪ Y) to the count of
transactions containing the itemset X (X.count):
confidence = (X ∪ Y).count / X.count
This ratio represents the likelihood that when X occurs, Y will also occur in a transaction.
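The two measures can be computed directly from support counts. The following is a minimal sketch, not a reference implementation: the function names and the four-transaction toy dataset are made up for illustration and do not come from the course material.

```python
def count(itemset, transactions):
    """Support count: number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(X, Y, transactions):
    """support(X ⇒ Y) = (X ∪ Y).count / n"""
    return count(X | Y, transactions) / len(transactions)

def confidence(X, Y, transactions):
    """confidence(X ⇒ Y) = (X ∪ Y).count / X.count (assumes X occurs at least once)."""
    return count(X | Y, transactions) / count(X, transactions)

# Hypothetical toy dataset with n = 4 transactions.
T = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"milk"},
]

print(support({"bread"}, {"milk"}, T))     # 0.5
print(confidence({"bread"}, {"milk"}, T))  # 1.0
print(confidence({"milk"}, {"bread"}, T))  # 0.5
```

In this toy data the rule {bread} ⇒ {milk} has confidence 1.0 while {milk} ⇒ {bread} has only 0.5, which shows that a rule is a one-way relationship: both rules share the same support but not the same confidence.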