Summary Analysis of Customer Data (880655-M-6)


Course grade: 9.0. Extensive summary for the course Analysis of Customer Data. The summary contains the content of all lectures and tutorials, including additional notes and explanations. The course is taught by dr. B. Cule and dr. G. Napoles, as part of the MSc Data Science & Society at Tilburg Un...

Last update of the document: 9 months ago
3 October 2023 · 19 January 2024 · 65 pages · 2023/2024 · Summary

Analysis of Customer Data
MSc Data Science & Society
Tilburg University




Module 1. Introduction and Frequent Pattern Mining

1. Introduction
“People who buy diapers also buy beer”. How did we get there? How far have we come since?

Types of Customer Data
- Purchase data
- Click logs: records of user interactions on a website, documenting each click or action
a person takes during a visit.
- Trajectories (GPS data)
- Opinions (ratings, reviews)
- Bank transactions
- Demographic data

Companies can capitalize on customer data directly or indirectly. Directly, they can collect
customer data and sell it to other companies. Indirectly, they can track customer data to
improve their website's performance and thereby boost sales. For instance, analyzing customer
behavior patterns in click logs can help optimize profits and improve overall services.

What Can We Do with Customer Data?
- Classification (supervised): e.g., should the bank give a loan to a particular customer
based on historical data on similar customers?
- Clustering (unsupervised): e.g., find sub-groups of customers to organize targeted
marketing campaigns.
- Recommender Systems: e.g., what product might the customer want to buy next?
- Next Event Prediction: e.g., pre-fetching web pages in expectation of a click. This involves
analyzing sequential data such as web clicks, for instance to forecast whether a customer
will make a purchase. Various methods can be used; some websites even adjust prices
based on insights derived from customer behavior patterns.

The Focus of this Course: Pattern Mining (Building Block of Other Applications)
Patterns are interesting:
- People who buy diapers also buy beer (increase profit, complementary sales)
- People who like “Lord of the Rings” also like “Harry Potter” (recommender system)
- People read domestic news before international news

Patterns are also useful in many other applications:
- Classify/cluster customers based on common patterns in their data
- Recommend items to customers based on patterns in their purchase behavior (and
patterns in similar customers’ purchase behavior)
- Find anomalies in bank transactions (potential fraud): This differs slightly from
classification. The goal is to separate "normal" behavior (no fraud) from "abnormal"
behavior (fraud) based on patterns. Because "abnormal" instances are far less frequent
than "normal" ones, they are hard to learn directly. Instead, we learn the patterns in
"normal" behavior; if an instance does not exhibit these patterns, we can flag it as
"abnormal".
- Place beer close to diapers in supermarket shelves.


2. Frequent Itemsets & Association Rules
Association Rule Mining
- Agrawal et al. introduced the model in 1993; it has since become a major topic of study
in the database and data mining communities.
- The model works on categorical data only and has no suitable mechanism for numerical
data. Items are products (categories), never numbers.
- Its initial application was in Market Basket Analysis, seeking relationships between items
purchased by customers.
- For instance, the rule {Bread} → {Milk} [sup = 5%, conf = 100%] states that 5% of all
transactions contain both bread and milk, and that every transaction containing bread
also contains milk. Note that this is a one-way relationship: it is not the same as saying
“people who buy milk also buy bread”.
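To see why the rule is one-directional, the sketch below computes support and confidence for {Bread} → {Milk} and for the reverse rule. The 20 baskets are hypothetical, built only so that the first rule reproduces the numbers in the text (sup = 5%, conf = 100%); the helper names are illustrative, not from a library.

```python
# Hypothetical data: 20 baskets chosen so that {Bread} -> {Milk}
# has sup = 5% and conf = 100%, as in the example rule.
baskets = (
    [frozenset({"bread", "milk"})]       # the only basket with bread
    + [frozenset({"milk", "eggs"})] * 3  # milk bought without bread
    + [frozenset({"apple"})] * 16        # unrelated baskets
)


def count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """Fraction of all transactions that contain X ∪ Y."""
    return count(x | y, transactions) / len(transactions)


def confidence(x, y, transactions):
    """Fraction of transactions containing X that also contain Y."""
    return count(x | y, transactions) / count(x, transactions)


bread, milk = frozenset({"bread"}), frozenset({"milk"})
print(support(bread, milk, baskets))     # 0.05
print(confidence(bread, milk, baskets))  # 1.0
print(confidence(milk, bread, baskets))  # 0.25
```

Both directions share the same support, because X ∪ Y is the same itemset, but the confidences differ: only one in four milk baskets also contains bread.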

The Model: Data
- $I = \{i_1, i_2, \dots, i_m\}$ is a set of items: all possible items that we encounter in the
dataset.
- A transaction $t$ is a set of items, where $t$ is a subset of $I$ ($t \subseteq I$).
- The transaction database $T$ is a collection of transactions $T = \{t_1, t_2, \dots, t_n\}$.

Transaction Data: Supermarket Data
- Market basket transactions are represented as $t_1, t_2, \dots, t_n$, where each transaction
corresponds to a basket with a collection of items purchased.
o $t_1$: {bread, cheese, milk}
o $t_2$: {apple, eggs, salt, yoghurt}
o ...
o $t_n$: {biscuits, eggs, milk}
- Concepts:
o An item refers to an individual product or article found in a basket, such as bread,
cheese, milk, apple, eggs, salt, yoghurt, and biscuits.
o 𝑰 represents the set of all items available for sale in the store, including bread,
cheese, milk, apple, eggs, salt, yoghurt, and biscuits.
o A transaction refers to the items purchased in a basket; it may have a transaction
ID (TID).
o A transactional dataset is a set of all the transactions recorded, representing the
collective data of items purchased by customers.
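These concepts map naturally onto Python sets. The sketch below is one possible representation, assuming items are plain strings, each transaction is a `frozenset` keyed by its TID, and I is taken to be the union of all transactions (a convenience; a store's full assortment could be larger). It uses the first two baskets and the last basket from the example, relabelled `t3` because this toy dataset has n = 3.

```python
# Toy transactional dataset: TID -> transaction (a set of items).
# Baskets from the supermarket example; the last one is labelled t3
# here because this small dataset has only n = 3 transactions.
T = {
    "t1": frozenset({"bread", "cheese", "milk"}),
    "t2": frozenset({"apple", "eggs", "salt", "yoghurt"}),
    "t3": frozenset({"biscuits", "eggs", "milk"}),
}

# I: here simply every item that appears in at least one transaction.
I = frozenset().union(*T.values())

# Every transaction is a subset of I (t ⊆ I).
assert all(t <= I for t in T.values())

print(len(T), "transactions over", len(I), "items")  # 3 transactions over 8 items
print(sorted(I))
```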

Transaction Data: A Set of Documents
Transaction data need not come from shopping baskets. A set of text documents can also be
treated as transaction data: each document is a "bag" of keywords, which play the role of
items. Typically, we would first remove stop-words, such as ‘the’, ‘a’, etc. Examples:
- doc1: {Student, Teach, School}
- doc2: {Student, School}
- doc3: {Teach, School, City, Game}
- doc4: {Baseball, Basketball}
- doc5: {Basketball, Player, Spectator}
- doc6: {Baseball, Coach, Game, Team}
- doc7: {Basketball, Team, City, Game}
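A sketch of turning raw text into such a keyword transaction. The stop-word list, the regex tokenisation and the two input sentences are all hypothetical, chosen for illustration; real pipelines use much longer stop-word lists and proper tokenisers (and often stemming, which is not done here).

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "at", "is"}


def to_transaction(text):
    """Turn a document into a set of keywords (one transaction)."""
    words = re.findall(r"[a-z]+", text.lower())  # crude tokenisation
    return frozenset(w for w in words if w not in STOP_WORDS)


docs = [
    "The student is at the school",  # hypothetical raw text
    "A student and a teacher",
]
transactions = [to_transaction(d) for d in docs]
print(sorted(transactions[0]))  # ['school', 'student']
print(sorted(transactions[1]))  # ['student', 'teacher']
```

Because each document becomes a set, a word that appears several times is kept only once, which matches the set-of-items model.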

The Model: Rules
The model used for mining association rules is based on the concept of "itemsets" and
"association rules":
- A transaction t contains X, a set of items (an itemset) in I, if 𝑋 ⊆ 𝑡.
o For example, doc6 contains the itemset {Coach, Game}.
- Association Rule: An association rule is an implication of the form X ⇒ Y, where X and Y
are subsets of the set of items (I), and X and Y have no intersection (items in common).
o 𝑋 ⇒ 𝑌, where 𝑋, 𝑌 ⊂ 𝐼, and 𝑋 ∩ 𝑌 = ∅
- Itemset: Again, an itemset is a set of items. For example, {milk, bread, cereal} is an itemset,
and a single item like {cheese} is an itemset of size 1.
- k-Itemset: A k-itemset is an itemset with k items. For instance, {milk, bread, cereal} is a
3-itemset.
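The definitions above can be checked directly on the seven documents. In this sketch `contains` and `is_valid_rule` are hypothetical helper names; the rule check only tests the stated conditions (X and Y drawn from I, no items in common).

```python
# The seven documents from the example, as transactions.
docs = {
    "doc1": {"Student", "Teach", "School"},
    "doc2": {"Student", "School"},
    "doc3": {"Teach", "School", "City", "Game"},
    "doc4": {"Baseball", "Basketball"},
    "doc5": {"Basketball", "Player", "Spectator"},
    "doc6": {"Baseball", "Coach", "Game", "Team"},
    "doc7": {"Basketball", "Team", "City", "Game"},
}
I = set().union(*docs.values())  # all items occurring in the documents


def contains(t, x):
    """A transaction t contains itemset X if X ⊆ t."""
    return x <= t


def is_valid_rule(x, y, items):
    """X => Y: X and Y are itemsets over I with no items in common."""
    return x <= items and y <= items and x.isdisjoint(y)


coach_game = {"Coach", "Game"}
print([d for d, t in docs.items() if contains(t, coach_game)])  # ['doc6']
print(len(coach_game))  # 2, so {Coach, Game} is a 2-itemset
print(is_valid_rule({"Game"}, {"Team"}, I))          # True
print(is_valid_rule({"Game"}, {"Game", "Team"}, I))  # False: X ∩ Y ≠ ∅
```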

Rule Strength Measures
The strength of association rules is measured using two metrics:
1. Support: Support measures the percentage of transactions containing both X and Y. It can
be expressed as the probability of X and Y occurring together (X ∪ Y). For instance, a rule
with sup = 0.5 means that 50% of transactions contain both X and Y.
o sup = Pr(X ∪ Y)
2. Confidence: Confidence indicates the percentage of transactions containing X that also
contain Y. It represents the conditional probability of Y given X (conf = Pr(Y | X)). A rule
with conf = 0.8 means that 80% of transactions containing X also contain Y. In other
words, it is the probability of Y in transactions that already contain X.
o conf = Pr(Y | X)
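The two measures are directly related. Reading Pr(·) as the fraction of the n transactions that contain the given itemset, and writing (X ∪ Y).count and X.count for the number of transactions containing X ∪ Y and X respectively, confidence is the support of the whole rule divided by the support of its left-hand side:

$$
\text{conf}(X \Rightarrow Y) = \Pr(Y \mid X) = \frac{\Pr(X \cup Y)}{\Pr(X)} = \frac{(X \cup Y).\text{count} / n}{X.\text{count} / n} = \frac{(X \cup Y).\text{count}}{X.\text{count}}
$$

For example, a rule with sup = 0.5 whose left-hand side X appears in 62.5% of transactions has conf = 0.5 / 0.625 = 0.8.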

Support and Confidence
Support count refers to the number of occurrences of an itemset X in a dataset T; in other words,
it counts how many transactions in the dataset contain X. Assuming the dataset T contains n
transactions, the support of a rule X ⇒ Y is the number of transactions containing the combined
itemset (X ∪ Y) divided by the total number of transactions (n).

Confidence, on the other hand, measures the strength of an association rule X ⇒ Y. It is the
number of transactions containing the combined itemset (X ∪ Y) divided by the number of
transactions containing X (X.count), regardless of which other items those transactions hold. This
ratio represents the likelihood that, when X occurs in a transaction, Y also occurs.

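$$
\text{Support} = \frac{\text{Number of transactions containing } (X \cup Y)}{\text{Total number of transactions in the dataset } T} = \frac{(X \cup Y).\text{count}}{n}
$$

$$
\text{Confidence} = \frac{\text{Number of transactions containing } (X \cup Y)}{\text{Number of transactions containing } X} = \frac{(X \cup Y).\text{count}}{X.\text{count}}
$$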
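A minimal sketch of both formulas in Python, applied to the seven documents from the earlier example and the rule {Game} ⇒ {Team}, which was picked for illustration. The helper names `support_count` and `rule_strength` are hypothetical, not from a library.

```python
# Documents doc1 to doc7 from the example, as transactions (n = 7).
T = [
    {"Student", "Teach", "School"},
    {"Student", "School"},
    {"Teach", "School", "City", "Game"},
    {"Baseball", "Basketball"},
    {"Basketball", "Player", "Spectator"},
    {"Baseball", "Coach", "Game", "Team"},
    {"Basketball", "Team", "City", "Game"},
]


def support_count(itemset, transactions):
    """X.count: number of transactions containing every item of X."""
    return sum(1 for t in transactions if itemset <= t)


def rule_strength(x, y, transactions):
    """Return (support, confidence) of the rule X => Y."""
    both = support_count(x | y, transactions)  # (X ∪ Y).count
    n = len(transactions)
    return both / n, both / support_count(x, transactions)


sup, conf = rule_strength({"Game"}, {"Team"}, T)
print(f"sup = {sup:.3f}, conf = {conf:.3f}")  # sup = 0.286, conf = 0.667
```

Game occurs in three documents (doc3, doc6, doc7) and Game together with Team in two (doc6, doc7), so support = 2/7 and confidence = 2/3. If X never occurs, X.count is 0 and confidence is undefined; this sketch would raise ZeroDivisionError in that case.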



