Summary: Analysis of Customer Data

Summary/lecture notes of the course Analysis of Customer Data.

  • 29 March 2023
  • 35 pages
  • 2022/2023
  • Seller: sophiedekkers54
Summary analysis of customer data
Association rule mining tries to find correlations between individual items or itemsets (sets
of items) in a database.
- A rule-based machine learning method for discovering interesting relations between
variables in large databases.
- Initially used for market basket analysis to find how items purchased by customers
are related.
- Assumes all data are categorical. There is no good algorithm for numerical data.

Terminology
- Set of items: I = {i1, i2, …, im}
  - Supermarket example: the list of all articles sold in a supermarket.
- Transaction: t (a smaller set of items)
  - t ⊆ I: t is a subset of the entire set of items I.
  - Supermarket example: the articles that someone buys in one visit.
- Transaction database: T = {t1, t2, …, tn}
  - A collection of transactions.
 Example transaction database:




Another example: you could analyze a set of documents to see which words, concepts,
or tags often co-occur. In that case, each document corresponds to a list of
keywords.




- The keywords are items.
- Each document is a transaction.
- All the documents put together make up a transaction dataset.
- All the words that appear in the entire transaction database together form I, the set of all
possible items.
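This terminology maps directly onto simple data structures. A minimal sketch in Python (the keyword sets below are invented for illustration):

```python
# Each transaction is a set of items; the database T is a list of transactions.
transactions = [
    {"data", "mining", "rules"},      # document 1
    {"data", "analysis"},             # document 2
    {"mining", "rules", "analysis"},  # document 3
]

# I, the set of all possible items, is the union of all transactions.
items = set().union(*transactions)
print(sorted(items))  # ['analysis', 'data', 'mining', 'rules']
```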

Association rules
Association rules are "if-then" statements that help to show the probability of relationships
between data items within large data sets.
An association rule is an implication of the form:
- X → Y, where X, Y ⊂ I, and X ∩ Y = ∅
A single item is an itemset of size 1.
A k-itemset is an itemset with k items.

- E.g., {milk, bread, cereal} is a 3-itemset.

How do you determine which patterns are interesting and which are not?
- Support measure: tells us how often a particular itemset occurs in the database, or
  how often a particular association rule holds in the dataset.
  - sup = Pr(X ∪ Y)
  - It measures the probability of encountering an association rule or an itemset in
    the dataset.
- Confidence measure: looks at the conditional probability. X implies Y, but that
  doesn't mean that Y also implies X; it's a one-directional association rule.
  - conf = Pr(Y | X)
  - The confidence tells us what proportion of transactions that contain X also
    contain Y.
When you’ve computed the support and confidence of a particular association rule, you can
tell how frequently it occurs and you can tell how confident you are in that rule. So, an
association rule tells us that when X occurs, Y occurs with a certain probability.

Support count = number of transactions in T that contain X.
- T = transaction dataset and X = itemset




support(X) = support count of X / n, where n = the number of transactions in T.
- Support and support count are often used as synonyms.
- Absolute support = support count (the number of transactions that contain the itemset).
  - A whole number.
- Relative support = support (the proportion of transactions that contain the itemset).
  - A real number between 0 and 1.
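The support and confidence definitions above can be computed by direct counting. A minimal sketch (the transaction data is invented for illustration):

```python
def support_count(itemset, transactions):
    """Absolute support: number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Relative support: proportion of transactions containing the itemset."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(X, Y, transactions):
    """Pr(Y | X): proportion of transactions containing X that also contain Y."""
    return support_count(X | Y, transactions) / support_count(X, transactions)

transactions = [
    {"milk", "bread"},
    {"milk", "bread", "cereal"},
    {"bread"},
    {"milk", "cereal"},
]
print(support({"milk", "bread"}, transactions))       # 0.5
print(confidence({"milk"}, {"bread"}, transactions))  # 2/3, i.e. 0.666...
```

Note that confidence({"bread"}, {"milk"}, ...) would give a different value (2/3 vs. 2/4 here), which is exactly the one-directionality described above.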

Goal = to find all rules that satisfy the user-specified minimum support (minsup) and
minimum confidence (minconf).
- Confidence measures how strong the rule is, support measures how reliable it is.

Example:
Transaction data:




Minsup = 30%
Minconf = 80%

An example frequent itemset:
- {Chicken, Clothes, Milk} [sup = 3/7]
Association rules from the itemset:
- {Clothes} → {Milk, Chicken} [sup = 3/7, conf = 3/3] → confident
- {Milk, Chicken} → {Clothes} [sup = 3/7, conf = 3/4] → not confident
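Checking these two rules against the thresholds is mechanical. A sketch using the counts quoted in the example (the full 7-transaction table is an image not reproduced here, so the supports and confidences are taken as given):

```python
minsup, minconf = 0.30, 0.80

rules = [
    # (rule label, support, confidence) as quoted in the example above
    ("{Clothes} -> {Milk, Chicken}", 3 / 7, 3 / 3),
    ("{Milk, Chicken} -> {Clothes}", 3 / 7, 3 / 4),
]
for label, sup, conf in rules:
    accepted = sup >= minsup and conf >= minconf
    print(label, "accepted" if accepted else "rejected")
# The first rule passes both thresholds; the second fails on confidence (0.75 < 0.80).
```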

Mining algorithms
There are many. They all use different strategies and data structures.
- Any algorithm should find the same set of rules although their computational
efficiencies and memory requirements may be different.
We’ll study only one:

The apriori algorithm
Has two steps:
1. Find all item sets that satisfy the minimum support threshold (frequent item sets).
2. Use frequent item sets to generate rules.

Step 1: mining all frequent itemsets
Frequent itemset = an itemset whose support ≥ minsup.
The apriori property (the key idea of the apriori algorithm): all subsets of a frequent itemset
are also frequent. Equivalently, any superset of an infrequent itemset cannot be frequent, so
it can be skipped.
Iterative algorithm (also called level-wise search):
- Find all frequent 1-itemsets, then all frequent 2-itemsets, etc.
- In each iteration k, only consider candidate k-itemsets all of whose (k-1)-subsets are
  frequent.
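The level-wise search can be sketched end to end. This is a simplified, unoptimized version (candidates are built as unions of frequent (k-1)-itemsets rather than via the join/prune bookkeeping described later):

```python
def apriori(transactions, minsup):
    """Return all frequent itemsets (as frozensets) with relative support >= minsup."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    # F1: frequent 1-itemsets
    current = [frozenset({i}) for i in items
               if sum(i in t for t in transactions) / n >= minsup]
    frequent = list(current)
    k = 2
    while current:
        # Candidate k-itemsets: unions of frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        current = [c for c in candidates
                   if sum(c <= t for t in transactions) / n >= minsup]
        frequent.extend(current)
        k += 1
    return frequent

transactions = [{1, 2, 3}, {1, 2}, {2, 3}, {2, 4}]
print(apriori(transactions, minsup=0.5))
```

With minsup = 0.5, item 4 is dropped at the first level (support 1/4), so no candidate containing 4 is ever counted — the apriori property at work.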
Example:




- Minsup = 0.5
You can see that when looking at the itemsets of a specific length, you only keep the ones
that meet the minsup value. That is why in the F1 step itemset {4} is dropped.

The items are sorted in lexicographical order.
- Also called dictionary order.
 For example, the permutations of {1,2,3} in lexicographic order are 123, 132, 213,
231, 312, and 321.
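The permutation example above can be reproduced with the standard library; `itertools.permutations` emits results in lexicographic order when its input is sorted:

```python
from itertools import permutations

# Permutations of a sorted input come out in lexicographic (dictionary) order.
perms = ["".join(p) for p in permutations("123")]
print(perms)  # ['123', '132', '213', '231', '312', '321']
```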




Apriori candidate generation

Join step: generate all possible candidate itemsets Ck of length k.
Prune step: remove those candidates in Ck that cannot be frequent.
Simpler explanation:
1. Generate a list of all candidate itemsets of length 1: each item in the dataset is a
   candidate itemset.
2. Prune infrequent 1-itemsets: any candidate 1-itemset that does not meet the
   minsup threshold is discarded.
3. Generate a list of candidate 2-itemsets: the algorithm generates itemsets of length
   2 by combining frequent 1-itemsets.
4. Prune infrequent 2-itemsets: any candidate 2-itemset that does not meet the
   minsup threshold is discarded.
These join and prune steps repeat for increasing k until no new candidates remain.
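The join and prune steps can be sketched as follows, assuming each itemset is kept as a sorted tuple (this is where the lexicographic ordering pays off: two frequent (k-1)-itemsets are joined only when they share their first k-2 items):

```python
from itertools import combinations

def generate_candidates(frequent_k_minus_1, k):
    """Join frequent (k-1)-itemsets sharing their first k-2 items,
    then prune candidates that have an infrequent (k-1)-subset."""
    prev = set(frequent_k_minus_1)
    candidates = set()
    # Join step
    for a in frequent_k_minus_1:
        for b in frequent_k_minus_1:
            if a[:k - 2] == b[:k - 2] and a[k - 2] < b[k - 2]:
                candidates.add(a + (b[k - 2],))
    # Prune step: every (k-1)-subset of a candidate must itself be frequent
    return {c for c in candidates
            if all(s in prev for s in combinations(c, k - 1))}

f2 = [(1, 2), (1, 3), (2, 3), (2, 4)]
print(sorted(generate_candidates(f2, 3)))  # [(1, 2, 3)]
```

Here (2, 3, 4) is generated by the join of (2, 3) and (2, 4) but pruned, because its subset (3, 4) is not frequent.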

Step 2: generating rules from frequent itemsets
One more step is needed to generate association rules.
The algorithm generates association rules from the frequent itemsets, where each rule
consists of an antecedent (a set of items) and a consequent (a set of items). The confidence
of each rule is calculated as the support of the combined antecedent and consequent divided
by the support of the antecedent alone.
- A → B is an association rule if: confidence(A → B) ≥ minconf
  - support(A → B) = support(A ∪ B)
  - confidence(A → B) = support(A ∪ B) / support(A)
Generating rules example:
Suppose {2, 3, 4} is frequent, with sup = 50%
- Proper nonempty subsets: {2,3}, {2,4}, {3,4}, {2}, {3}, {4}, with sup = 50%, 50%, 75%,
  75%, 75%, 75% respectively
These generate the following association rules (conf = 50% / sup(antecedent)):
- {2,3} → {4} [conf = 100%]
- {2,4} → {3} [conf = 100%]
- {3,4} → {2} [conf = 67%]
- {2} → {3,4} [conf = 67%]
- {3} → {2,4} [conf = 67%]
- {4} → {2,3} [conf = 67%]
The apriori algorithm is a very computationally expensive algorithm.
Clearly the space of all association rules is exponential, O(3^m), where m is the number of
items in I. The mining exploits the sparseness of the data and high minimum support and
minimum confidence values. Still, it typically produces a huge number of rules: thousands,
tens of thousands, millions, …

Itemsets: unordered collections of items, events, etc.
Sequential patterns: items, elements, events, etc. are ordered.



Transactional database vs. sequential database
