Summary Analysis of Customer Data (880655-M-6)

Published on: 03-10-2023
Written in: 2023/2024

Course grade: 9.0. Extensive summary for the course Analysis of Customer Data. The summary contains the content of all lectures and tutorials, including additional notes and explanations. The course is taught by dr. B. Cule and dr. G. Napoles, as part of the MSc Data Science & Society at Tilburg University.

Analysis of Customer Data
MSc Data Science & Society
Tilburg University

Module 1. Introduction and Frequent Pattern Mining

1. Introduction
“People who buy diapers also buy beer”. How did we get there, and how far have we come since?

Types of Customer Data
- Purchase data
- Click logs: refer to the records of user interactions on a website, where each click or action
taken by a person during their visit is documented.
- Trajectories (GPS data)
- Opinions (ratings, reviews)
- Bank transactions
- Demographic data

Companies can capitalize on customer data in various ways, directly or indirectly. One approach
involves the collection of customer data which is then sold to other companies. Alternatively,
companies can indirectly benefit by tracking customer data to enhance their website's
performance and consequently boost sales. For instance, analyzing customer patterns through
click logs can aid in optimizing profits and improving overall services.

What Can We Do with Customer Data?
- Classification (supervised): e.g., should the bank give a loan to a particular customer
based on historical data on similar customers?
- Clustering (unsupervised): e.g., find sub-groups of customers to organize targeted
marketing campaigns.
- Recommender Systems: e.g., what product might the customer want to buy next?
- Next Event Prediction: e.g., pre-fetching web pages in anticipation of a click. This entails
analyzing sequential data such as web clicks, for example to forecast whether a customer will
make a purchase. Various methods can be employed; some websites even customize
prices based on insights derived from customer behavior patterns.

The Focus of this Course: Pattern Mining (Building Block of Other Applications)
Patterns are interesting:
- People who buy diapers also buy beer (increase profit, complementary sales)
- People who like “Lord of the Rings” also like “Harry Potter” (recommender system)
- People read domestic news before international news

Patterns are also useful in many other applications:
- Classify/cluster customers based on common patterns in their data
- Recommend items to customers based on patterns in their purchase behavior (and
patterns in similar customers’ purchase behavior)
- Find anomalies in bank transactions (potential fraud): This differs slightly from
classification. Behavior is divided into "normal" (no fraud) and "abnormal" (fraud), but
because instances of "abnormal" behavior are far less frequent than "normal" behavior, the
challenge lies in distinguishing the anomalies. Instead of learning what fraud looks like, we
learn the patterns that characterize "normal" behavior. Instances that exhibit these patterns
can be classified as "normal"; instances in which these patterns do not occur can be flagged
as "abnormal".
- Place beer close to diapers on supermarket shelves.
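The anomaly-detection idea from the list above can be sketched in a few lines. This is only one possible reading of it, under loud assumptions: the patterns, the event names, the session ids, and the helper `flag_abnormal` are all hypothetical, and the rule "flag a session when none of the normal patterns occur in it" is a simplification chosen for illustration.

```python
# Hypothetical patterns, assumed to have been mined from known-normal behaviour.
normal_patterns = [
    {"login", "view_balance"},
    {"login", "transfer_domestic"},
]

# Hypothetical sessions to screen; each one is a set of observed events.
sessions = {
    "s1": {"login", "view_balance", "logout"},
    "s2": {"login", "transfer_domestic", "view_balance"},
    "s3": {"password_reset", "transfer_foreign"},
}

def flag_abnormal(sessions, patterns):
    """Return the ids of sessions that contain none of the normal patterns."""
    return [
        sid
        for sid, events in sessions.items()
        if not any(pattern <= events for pattern in patterns)
    ]

flagged = flag_abnormal(sessions, normal_patterns)
print(flagged)
```

A practical system would probably score sessions by how many normal patterns they match rather than use a hard yes/no rule, but the principle is the same: only "normal" behavior needs to be modelled.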

2. Frequent Itemsets & Association Rules
Association Rule Mining
- Agrawal et al. introduced the model in 1993, which has become a significant focus of
study in the database and data mining community.
- The model is designed for data mining and operates on categorical data only, lacking a
suitable algorithm for numerical data. Note that the products are items, never numbers.
- Its initial application was in Market Basket Analysis, seeking relationships between items
purchased by customers.
- For instance, a rule like {Bread} → {Milk} [sup = 5%, conf = 100%] indicates that 5% of
transactions contain both bread and milk, and whenever bread is purchased, milk is also
bought with 100% certainty. Note that this is a one-way relationship. This is not the same
as saying “people who buy milk, also buy bread”.
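The one-way nature of a rule can be checked on a small example. The sketch below uses five made-up baskets (not taken from the course material) and a hypothetical helper `confidence`; it shows that {bread} → {milk} can hold in every basket with bread while {milk} → {bread} holds in only half of the baskets with milk.

```python
# Hypothetical toy baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"milk", "eggs"},
    {"milk"},
    {"bread", "milk", "cheese"},
    {"eggs"},
]

def confidence(antecedent, consequent, transactions):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    with_x = [t for t in transactions if antecedent <= t]
    if not with_x:
        raise ValueError("antecedent never occurs in the transactions")
    with_both = [t for t in with_x if consequent <= t]
    return len(with_both) / len(with_x)

bread_to_milk = confidence({"bread"}, {"milk"}, baskets)
milk_to_bread = confidence({"milk"}, {"bread"}, baskets)
print(bread_to_milk, milk_to_bread)
```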

The Model: Data
- 𝐼 = {𝑖₁, 𝑖₂, …, 𝑖ₘ} represents a set of items: all possible items that we encounter in the
dataset.
- A transaction 𝒕 refers to a set of items, where 𝑡 is a subset of the set 𝐼 (𝑡 ⊆ 𝐼).
- The transaction database 𝑻 consists of a collection of transactions 𝑇 = {𝑡₁, 𝑡₂, …, 𝑡ₙ}.

Transaction Data: Supermarket Data
- Market basket transactions are represented as 𝑡₁, 𝑡₂, ..., 𝑡ₙ, where each transaction
corresponds to a basket with a collection of items purchased.
o 𝑡₁: {bread, cheese, milk}
o 𝑡₂: {apple, eggs, salt, yoghurt}
o ...
o 𝑡ₙ: {biscuits, eggs, milk}
- Concepts:
o An item refers to an individual product or article found in a basket, such as bread,
cheese, milk, apple, eggs, salt, yoghurt, and biscuits.
o 𝑰 represents the set of all items available for sale in the store, including bread,
cheese, milk, apple, eggs, salt, yoghurt, and biscuits.
o A transaction refers to the items purchased in a basket; it may have a transaction
ID (TID)
o A transactional dataset is a set of all the transactions recorded, representing the
collective data of items purchased by customers.
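The data model above maps naturally onto sets. The following is a minimal sketch using the three example baskets from the text; the dictionary keys used as transaction IDs are hypothetical, and deriving I as the union of the baskets assumes that every item for sale appears in at least one basket.

```python
# The three example baskets from the text; the keys act as hypothetical TIDs.
transactions = {
    "t1": frozenset({"bread", "cheese", "milk"}),
    "t2": frozenset({"apple", "eggs", "salt", "yoghurt"}),
    "tn": frozenset({"biscuits", "eggs", "milk"}),
}

# I: the set of all items, derived here as the union of all baskets.
items = frozenset().union(*transactions.values())

# Every transaction t satisfies t ⊆ I by construction.
all_subsets = all(t <= items for t in transactions.values())

print(sorted(items))
print(all_subsets)
```

Sets are a reasonable choice because the model ignores both quantity and order: a basket either contains an item or it does not.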

Transaction Data: A set of Documents
Transaction data is not limited to market baskets. A text collection can be treated as a set of
documents, where each document is a "bag" of keywords that play the role of items. Typically,
we would first remove stop-words, such as ‘the’, ‘a’, etc. Examples:
- doc1: {Student, Teach, School}
- doc2: {Student, School}
- doc3: {Teach, School, City, Game}
- doc4: {Baseball, Basketball}
- doc5: {Basketball, Player, Spectator}
- doc6: {Baseball, Coach, Game, Team}
- doc7: {Basketball, Team, City, Game}
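Turning raw text into such a bag of keywords could look roughly like the sketch below. The stop-word list, the example sentence, and the helper `to_transaction` are assumptions made for illustration; the sketch does no stemming, so "teaches" is not reduced to "Teach" as in the examples above.

```python
# A tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "at", "in", "of"}

def to_transaction(text, stop_words=STOP_WORDS):
    """Turn raw text into a set of lowercase keywords with stop-words removed."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if w and w not in stop_words}

doc = to_transaction("The student teaches at the school.")
print(sorted(doc))
```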

The Model: Rules
The model used for mining association rules is based on the concept of "itemsets" and
"association rules":
- A transaction t contains an itemset X (a set of items from I) if 𝑋 ⊆ 𝑡.
o For example, doc6 contains the itemset {Coach, Game}.
- Association Rule: An association rule is an implication of the form X ⇒ Y, where X and Y
are subsets of the set of items (I), and X and Y have no intersection (items in common).
o 𝑋 ⇒ 𝑌, where 𝑋, 𝑌 ⊂ 𝐼, and 𝑋 ∩ 𝑌 = ∅
- Itemset: Again, an itemset is a set of items. For example, {milk, bread, cereal} is an itemset,
and a single item like {cheese} is an itemset of size 1.
- k-Itemset: A k-itemset is an itemset with k items. For instance, {milk, bread, cereal} is a
3-itemset.
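The containment test and the notion of a k-itemset can be tried out on the seven example documents. In this sketch the helper names `containing` and `k_itemsets` are hypothetical; the subset test uses Python's `<=` operator on sets.

```python
from itertools import combinations

# The seven example documents, each treated as a transaction.
docs = {
    "doc1": {"Student", "Teach", "School"},
    "doc2": {"Student", "School"},
    "doc3": {"Teach", "School", "City", "Game"},
    "doc4": {"Baseball", "Basketball"},
    "doc5": {"Basketball", "Player", "Spectator"},
    "doc6": {"Baseball", "Coach", "Game", "Team"},
    "doc7": {"Basketball", "Team", "City", "Game"},
}

def containing(itemset, transactions):
    """Names of the transactions t for which itemset ⊆ t."""
    return [name for name, t in transactions.items() if itemset <= t]

def k_itemsets(transaction, k):
    """All k-itemsets that are subsets of a single transaction."""
    return [set(c) for c in combinations(sorted(transaction), k)]

coach_game = containing({"Coach", "Game"}, docs)
pairs_doc1 = k_itemsets(docs["doc1"], 2)
print(coach_game)
print(pairs_doc1)
```

A transaction with n items contains n-choose-k different k-itemsets, which is why the number of candidate itemsets grows quickly with basket size.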

Rule Strength Measures
The strength of association rules is measured using two metrics:
1. Support: Support measures the percentage of transactions containing both X and Y, that is,
every item of the combined itemset X ∪ Y. It can be expressed as the probability that X and Y
occur together in a transaction. For instance, a rule with sup = 0.5 means that 50% of
transactions contain both X and Y.
o sup = Pr(X ∪ Y)
2. Confidence: Confidence indicates the percentage of transactions containing X that also
contain Y. It represents the conditional probability of Y given X: the probability of Y in
transactions that already contain X. A rule with conf = 0.8 means that 80% of transactions
containing X also contain Y.
o conf = Pr(Y | X)
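As a worked illustration, both measures can be computed by hand for the seven example documents (doc1 to doc7) listed earlier. The choice of rule is ours, not the course's. Student occurs in doc1 and doc2, and both also contain School; School itself occurs in doc1, doc2 and doc3.

```latex
\[
\text{sup}(\{\text{Student}\} \Rightarrow \{\text{School}\}) = \frac{2}{7} \approx 0.29,
\qquad
\text{conf}(\{\text{Student}\} \Rightarrow \{\text{School}\}) = \frac{2}{2} = 1
\]
\[
\text{conf}(\{\text{School}\} \Rightarrow \{\text{Student}\}) = \frac{2}{3} \approx 0.67
\]
```

Both directions have the same support, because support depends only on the combined itemset, but the confidence differs with the direction of the rule.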

Support and Confidence
The support count of an itemset X, written X.count, is the number of transactions in a dataset T
that contain X. Assuming the dataset T contains n transactions, the support of the rule X ⇒ Y is
calculated as the ratio of the count of transactions containing the combined itemset (X ∪ Y) to
the total number of transactions (n).

Confidence, on the other hand, measures the strength of an association rule X ⇒ Y. It is calculated
as the ratio of the count of transactions containing the combined itemset (X ∪ Y) to the count of
transactions containing the itemset X (X.count). This ratio represents the likelihood that
when X occurs, Y will also occur in a transaction.

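\[
\text{Support} = \frac{\text{Number of transactions containing } (X \cup Y)}{\text{Total number of transactions in the dataset } T} = \frac{(X \cup Y).\text{count}}{n}
\]

\[
\text{Confidence} = \frac{\text{Number of transactions containing } (X \cup Y)}{\text{Number of transactions containing } X} = \frac{(X \cup Y).\text{count}}{X.\text{count}}
\]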
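The two ratios translate directly into code. The sketch below applies them to the seven example documents for the rule {Game} ⇒ {Team}, a rule picked here purely for illustration; the function names `count`, `support` and `confidence` are hypothetical.

```python
# The seven example documents, each treated as a transaction.
docs = [
    {"Student", "Teach", "School"},
    {"Student", "School"},
    {"Teach", "School", "City", "Game"},
    {"Baseball", "Basketball"},
    {"Basketball", "Player", "Spectator"},
    {"Baseball", "Coach", "Game", "Team"},
    {"Basketball", "Team", "City", "Game"},
]

def count(itemset, transactions):
    """Support count: number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(x, y, transactions):
    """(X ∪ Y).count / n for the rule X ⇒ Y."""
    return count(x | y, transactions) / len(transactions)

def confidence(x, y, transactions):
    """(X ∪ Y).count / X.count for the rule X ⇒ Y."""
    x_count = count(x, transactions)
    if x_count == 0:
        raise ValueError("X never occurs, so confidence is undefined")
    return count(x | y, transactions) / x_count

sup = support({"Game"}, {"Team"}, docs)
conf = confidence({"Game"}, {"Team"}, docs)
print(round(sup, 3), round(conf, 3))
```

Game occurs in three documents and two of them also contain Team, so the rule has support 2/7 and confidence 2/3.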

Institution: Tilburg University (UVT)
Study: Data Science & Society
Course: Analysis for Customer Data (880655M6)

Document information

Published on: 3 October 2023
File last updated: 19 January 2024
Number of pages: 65
Written in: 2023/2024
Type: Summary
