
Summary Analysis of Customer Data (880655-M-6)

Pages: 65
Published: 03-10-2023
Written in: 2023/2024

Course grade: 9.0. Extensive summary for the course Analysis of Customer Data. The summary contains the content of all lectures and tutorials, including additional notes and explanations. The course is taught by dr. B. Cule and dr. G. Napoles, as part of the MSc Data Science & Society at Tilburg University.


Analysis of Customer Data
MSc Data Science & Society
Tilburg University




Module 1. Introduction and Frequent Pattern Mining

1. Introduction
“People who buy diapers also buy beer”. How did we get there? How far did we go from there?

Types of Customer Data
- Purchase data
- Click logs: records of user interactions on a website, documenting each click or action a
person takes during their visit.
- Trajectories (GPS data)
- Opinions (ratings, reviews)
- Bank transactions
- Demographic data

Companies can capitalize on customer data in various ways, directly or indirectly. One direct
approach is to collect customer data and sell it to other companies. Alternatively, companies
can benefit indirectly by tracking customer data to improve their website's performance and
thereby boost sales. For instance, analyzing customer patterns in click logs can help optimize
profits and improve overall services.

What Can We Do with Customer Data?
- Classification (supervised): e.g., should the bank give a loan to a particular customer
based on historical data on similar customers?
- Clustering (unsupervised): e.g., find sub-groups of customers to organize targeted
marketing campaigns.
- Recommender Systems: e.g., what product might the customer want to buy next?
- Next Event Prediction: e.g., pre-fetching web pages in expectation of a click. This entails
analyzing sequential data such as web clicks to forecast, for example, whether a customer
will make a purchase. Various methods can be employed; some websites even customize
prices based on insights derived from customer behavior patterns.

The Focus of this Course: Pattern Mining (Building Block of Other Applications)
Patterns are interesting:
- People who buy diapers also buy beer (increase profit, complementary sales)
- People who like “Lord of the Rings” also like “Harry Potter” (recommender system)
- People read domestic news before international news

Patterns are also useful in many other applications:
- Classify/cluster customers based on common patterns in their data
- Recommend items to customers based on patterns in their purchase behavior (and
patterns in similar customers’ purchase behavior)
- Find anomalies in bank transactions (potential fraud): this differs slightly from
classification. Behavior is separated into "normal" (no fraud) and "abnormal" (fraud) based
on patterns. Since "abnormal" behavior is much rarer than "normal" behavior, anomalies are
hard to learn directly. Instead, we learn the patterns of "normal" behavior, which lets us
recognize transactions that follow them as "normal". If a transaction does not match the
patterns of "normal" behavior, we can flag it as "abnormal".
- Place beer close to diapers on supermarket shelves.


2. Frequent Itemsets & Association Rules
Association Rule Mining
- Agrawal et al. introduced the model in 1993; it has since become a major focus of study
in the database and data mining community.
- The model operates on categorical data only and has no suitable algorithm for numerical
data: the items are products, never numbers.
- Its initial application was Market Basket Analysis: finding relationships between items
purchased by customers.
- For instance, the rule {Bread} → {Milk} [sup = 5%, conf = 100%] says that 5% of all
transactions contain both bread and milk, and that every transaction containing bread also
contains milk. Note that this is a one-way relationship: it is not the same as saying
"people who buy milk also buy bread".
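A minimal Python sketch of why such a rule is one-way. The four baskets are hypothetical (invented for illustration, not course data), and `confidence` is a helper named only for this example: it divides the number of baskets containing both itemsets by the number of baskets containing the left-hand side.

```python
# Hypothetical toy baskets (not course data): bread never appears without milk,
# but milk often appears without bread.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk"},
    {"milk", "cheese"},
]


def confidence(x: set, y: set, transactions: list) -> float:
    """Fraction of transactions containing X that also contain Y."""
    containing_x = [t for t in transactions if x <= t]
    containing_both = [t for t in containing_x if y <= t]
    return len(containing_both) / len(containing_x)


bread_to_milk = confidence({"bread"}, {"milk"}, baskets)
milk_to_bread = confidence({"milk"}, {"bread"}, baskets)
print(f"conf(Bread -> Milk) = {bread_to_milk:.2f}")  # 1.00
print(f"conf(Milk -> Bread) = {milk_to_bread:.2f}")  # 0.50
```

Both rules are backed by the same two baskets, but the denominators differ (two bread baskets versus four milk baskets), which is exactly why the direction of a rule matters.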

The Model: Data
- $I = \{i_1, i_2, \ldots, i_m\}$ is a set of items: all possible items that we encounter in
the dataset.
- A transaction $t$ is a set of items, where $t$ is a subset of $I$ ($t \subseteq I$).
- The transaction database $T$ is a collection of transactions $T = \{t_1, t_2, \ldots, t_n\}$.

Transaction Data: Supermarket Data
- Market basket transactions are represented as $t_1, t_2, \ldots, t_n$, where each
transaction corresponds to a basket with a collection of items purchased.
o $t_1$: {bread, cheese, milk}
o $t_2$: {apple, eggs, salt, yoghurt}
o ...
o $t_n$: {biscuits, eggs, milk}
- Concepts:
o An item refers to an individual product or article found in a basket, such as bread,
cheese, milk, apple, eggs, salt, yoghurt, and biscuits.
o $I$ represents the set of all items available for sale in the store, including bread,
cheese, milk, apple, eggs, salt, yoghurt, and biscuits.
o A transaction refers to the items purchased in a basket; it may have a transaction
ID (TID)
o A transactional dataset is a set of all the transactions recorded, representing the
collective data of items purchased by customers.
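One way to hold this model in code, sketched in Python with built-in sets. It assumes only the three transactions listed above (the elided ones in between are left out), with the dictionary keys playing the role of TIDs. Deriving I as the union of all transactions is also an assumption: a store's full catalogue may contain items that never appear in any basket.

```python
# Toy transaction database: only t_1, t_2 and t_n from the example above.
# Keys act as transaction IDs (TIDs); values are the baskets (sets of items).
T = {
    "t1": {"bread", "cheese", "milk"},
    "t2": {"apple", "eggs", "salt", "yoghurt"},
    "tn": {"biscuits", "eggs", "milk"},
}

# I: the set of all items, here derived as the union of every transaction.
I = set().union(*T.values())

# Every transaction is a subset of I (t ⊆ I).
assert all(t <= I for t in T.values())

print(sorted(I))
print(len(T), "transactions,", len(I), "distinct items")
```

Sets are a natural fit because a transaction records which items were bought, not how many or in what order.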

Transaction Data: A set of Documents
Transaction data does not have to come from shopping baskets. A collection of text documents
can also be treated as transaction data: each document is a "bag" of keywords, and each
keyword is an item. Typically, we would remove stop-words, such as 'the', 'a', etc. Examples:
- doc1: {Student, Teach, School}
- doc2: {Student, School}
- doc3: {Teach, School, City, Game}
- doc4: {Baseball, Basketball}
- doc5: {Basketball, Player, Spectator}
- doc6: {Baseball, Coach, Game, Team}
- doc7: {Basketball, Team, City, Game}
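A minimal Python sketch of turning raw text into such a keyword transaction. The stop-word list, the tokenizer regex and the example sentence are all hypothetical, and no stemming is done, so this sketch would not reduce "teaches" to "Teach" as in doc1.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real pipelines use a longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "at", "in", "to", "is"}


def document_to_transaction(text: str) -> frozenset:
    """Turn raw text into a 'bag' of keywords: lowercase, tokenize, drop stop-words.

    Repeated words collapse into one item because a transaction is a set.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return frozenset(tok for tok in tokens if tok not in STOP_WORDS)


# Hypothetical input sentence, not from the course material.
doc = "The coach and the team played a game at the city stadium"
print(sorted(document_to_transaction(doc)))
# ['city', 'coach', 'game', 'played', 'stadium', 'team']
```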

The Model: Rules
The model used for mining association rules is based on the concepts of "itemsets" and
"association rules":
- A transaction $t$ contains $X$, a set of items (itemset) in $I$, if $X \subseteq t$.
o {Coach, Game} is an itemset that appears in document 6.
- Association Rule: an association rule is an implication of the form $X \Rightarrow Y$, where
$X$ and $Y$ are subsets of the set of items $I$, and $X$ and $Y$ have no intersection (no
items in common).
o $X \Rightarrow Y$, where $X, Y \subset I$ and $X \cap Y = \emptyset$
- Itemset: Again, an itemset is a set of items. For example, {milk, bread, cereal} is an itemset,
and a single item like {cheese} is an itemset of size 1.
- k-Itemset: A k-itemset is an itemset with k items. For instance, {milk, bread, cereal} is a
3-itemset.
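These definitions map directly onto Python set operations: "t contains X" is the subset test `x <= t`, and the condition X ∩ Y = ∅ is an empty intersection. A sketch using the seven example documents; `contains` and `is_valid_rule` are hypothetical helper names chosen for this illustration.

```python
# The seven example documents, each treated as a transaction (a set of items).
docs = {
    "doc1": {"Student", "Teach", "School"},
    "doc2": {"Student", "School"},
    "doc3": {"Teach", "School", "City", "Game"},
    "doc4": {"Baseball", "Basketball"},
    "doc5": {"Basketball", "Player", "Spectator"},
    "doc6": {"Baseball", "Coach", "Game", "Team"},
    "doc7": {"Basketball", "Team", "City", "Game"},
}
I = set().union(*docs.values())  # all items seen in the collection


def contains(t: set, x: set) -> bool:
    """A transaction t contains itemset X if X is a subset of t."""
    return x <= t


def is_valid_rule(x: set, y: set, items: set) -> bool:
    """X => Y requires X and Y to be itemsets over I with no items in common."""
    return x <= items and y <= items and not (x & y)


x = {"Coach", "Game"}  # a 2-itemset
print(len(x), [name for name, t in docs.items() if contains(t, x)])  # 2 ['doc6']
print(is_valid_rule({"Game"}, {"Team"}, I))          # True
print(is_valid_rule({"Game"}, {"Game", "Team"}, I))  # False: Game is on both sides
```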

Rule Strength Measures
The strength of association rules is measured using two metrics:
1. Support: the percentage of transactions that contain both X and Y, i.e., the probability
that a transaction contains the combined itemset X ∪ Y. For instance, a rule with
sup = 0.5 means that 50% of transactions contain both X and Y.
o sup = Pr(X ∪ Y)
2. Confidence: the percentage of transactions containing X that also contain Y, i.e., the
conditional probability of Y given X. A rule with conf = 0.8 means that 80% of
transactions containing X also contain Y.
o conf = Pr(Y | X)

Support and Confidence
The support count of an itemset X, written X.count, is the number of transactions in the
dataset T that contain X. Assuming T contains n transactions, the support of a rule X ⇒ Y is
the support count of the combined itemset X ∪ Y divided by the total number of
transactions n.

Confidence, on the other hand, measures the strength of an association rule X ⇒ Y. It is the
support count of the combined itemset X ∪ Y divided by the support count of X (X.count),
i.e., the number of transactions that contain X, regardless of which other items they also
contain. This ratio represents the likelihood that, when X occurs in a transaction, Y also
occurs.

$$\text{Support} = \frac{\text{Number of transactions containing } (X \cup Y)}{\text{Total number of transactions in the dataset } T} = \frac{(X \cup Y).\text{count}}{n}$$

$$\text{Confidence} = \frac{\text{Number of transactions containing } (X \cup Y)}{\text{Number of transactions containing } X} = \frac{(X \cup Y).\text{count}}{X.\text{count}}$$
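The two formulas can be checked by hand on the seven example documents. A minimal Python sketch, where the helper `count` plays the role of X.count; the rule {Game} ⇒ {Team} is chosen only for illustration.

```python
# The seven example documents, used as the transaction dataset T (n = 7).
T = [
    {"Student", "Teach", "School"},
    {"Student", "School"},
    {"Teach", "School", "City", "Game"},
    {"Baseball", "Basketball"},
    {"Basketball", "Player", "Spectator"},
    {"Baseball", "Coach", "Game", "Team"},
    {"Basketball", "Team", "City", "Game"},
]


def count(itemset: set, transactions: list) -> int:
    """Support count: number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(x: set, y: set, transactions: list) -> float:
    """support(X => Y) = (X ∪ Y).count / n"""
    return count(x | y, transactions) / len(transactions)


def confidence(x: set, y: set, transactions: list) -> float:
    """confidence(X => Y) = (X ∪ Y).count / X.count"""
    return count(x | y, transactions) / count(x, transactions)


x, y = {"Game"}, {"Team"}
print(f"support    = {support(x, y, T):.3f}")     # 0.286 (= 2/7)
print(f"confidence = {confidence(x, y, T):.3f}")  # 0.667 (= 2/3)
```

Game occurs in doc3, doc6 and doc7, and Game together with Team in doc6 and doc7, hence 2/7 and 2/3. Swapping the rule to {Team} ⇒ {Game} leaves the support unchanged but raises the confidence to 2/2 = 1.0. The sketch does not guard against X.count = 0, for which confidence is undefined.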




File updated: 19 January 2024
Seller: tiu43862142, Tilburg University