
ECB3ADAVE2 - Applied Data Analysis and Visualization II - Full Summary


A detailed summary of all the relevant unsupervised learning methods. Based on the book, lecture slides, exercises and assignments, and articles and videos I found through Google. Edit: I was told that the hyperlinks in the document don't work. Once you have bought the summary, please...

Document last updated: 3 years ago

Preview: 4 of 49 pages

  • 7 November 2021
  • 8 November 2021
  • 49 pages
  • 2021/2022
  • Summary
Applied Data Analysis and Visualization II
Universiteit Utrecht – ECB3ADAVE2

Written by Lisanne Louwerse


Summary

Table of contents
WEEK 1
  SUPERVISED VS. UNSUPERVISED LEARNING
  ASSOCIATION RULE ANALYSIS
WEEK 2
  WHAT IS CLUSTERING?
  K-MEANS CLUSTERING
  HIERARCHICAL CLUSTERING
WEEK 3
  DIMENSION REDUCTION
  PRINCIPAL COMPONENT ANALYSIS (PCA)
WEEK 4
  NON-NEGATIVE MATRIX FACTORIZATION (NMF)
  PROBABILISTIC LATENT SEMANTIC ANALYSIS (PLSA)
WEEK 5
  FACTOR ANALYSIS (FA)
  INDEPENDENT COMPONENT ANALYSIS (ICA)
WEEK 6
  MULTIDIMENSIONAL SCALING (MDS)
WEEK 7
  CONTINGENCY TABLES AND CORRESPONDENCE TABLES
  CORRESPONDENCE ANALYSIS (CA)
KEY TAKEAWAYS
  ASSOCIATION RULE ANALYSIS
  CLUSTER ANALYSIS
  PRINCIPAL COMPONENT ANALYSIS
  NON-NEGATIVE MATRIX FACTORIZATION
  PROBABILISTIC LATENT SEMANTIC ANALYSIS
  FACTOR ANALYSIS
  INDEPENDENT COMPONENT ANALYSIS
  MULTIDIMENSIONAL SCALING
  CORRESPONDENCE ANALYSIS





Week 1
Key Words
▪ Supervised / unsupervised learning
▪ Antecedent and consequent
▪ Support, confidence and lift
▪ Apriori algorithm and Apriori principle

Supervised vs. unsupervised learning

▪ Supervised learning
Building a statistical model for predicting / estimating an output (y) based on one or
more inputs (x).
o Classification: predict to which category an observation belongs (qualitative
outcomes).
o Regression: predict a quantitative outcome.

▪ Unsupervised learning
Inputs (x) but no outputs (y). Try to learn structure and relationships from data, like …
… discovering associations among variable values → association rule analysis
… discovering unknown subgroups of observations → clustering
… dimension reduction → principal components analysis


Association rule analysis
Goal: to find the joint values of the variables x1, …, xp that appear together most frequently in the
database.
In the case of binary valued data, association rule analysis is called ‘market basket’ analysis.
Transactions are represented in a binary incidence matrix:
\[
x_{ij} =
\begin{cases}
1, & \text{if the $j$th item is purchased as part of transaction $i$,} \\
0, & \text{if the $j$th item is not purchased as part of transaction $i$.}
\end{cases}
\]
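As a rough illustration of the incidence matrix, the sketch below builds one from a handful of baskets. The transactions and item names are made up for the example and are not taken from the course material.

```python
# Hypothetical toy data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

# Fix a column order for the items (sorted so the result is reproducible).
items = sorted(set().union(*transactions))

# incidence[i][j] is 1 if item j is part of transaction i, and 0 otherwise.
incidence = [[1 if item in t else 0 for item in items] for t in transactions]

print(items)
for row in incidence:
    print(row)
```

Each row is one transaction and each column one item, which matches the definition of xij above.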




This matrix can now be used to find association rules.
An association rule is the implication

A ⇒ B   (antecedent ⇒ consequent)
In market basket analysis, it can be seen as an if-then statement:
If you buy A, there is a chance that you buy B as well.

Properties of association rules
The support (or prevalence) of association rule A ⇒ B is the relative frequency of the rule.
It’s the probability of simultaneously observing A and B in a randomly selected market basket,
so Pr(A,B).
\[
\mathrm{supp}(A \Rightarrow B) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{total number of transactions}}
\]

Note that this is the support of an association rule. The support of a single item or itemset A is defined as:
\[
\mathrm{supp}(A) = \frac{\text{number of transactions containing } A}{\text{total number of transactions}}
\]
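The two support definitions can be sketched in a few lines. The toy transactions and the function names `support` and `rule_support` are assumptions made for this example only.

```python
# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def rule_support(antecedent, consequent, transactions):
    """Support of the rule A => B: fraction of transactions containing A and B."""
    return support(set(antecedent) | set(consequent), transactions)

print(support({"bread"}, transactions))                 # 3 of 4 transactions
print(rule_support({"bread"}, {"milk"}, transactions))  # 2 of 4 transactions
```

Because the rule support only counts transactions containing both sides, supp(A ⇒ B) and supp(B ⇒ A) are the same number.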




The confidence of association rule A ⇒ B is the conditional probability of B given A, so
Pr(B|A). It is the likelihood of item B being purchased when item A is purchased.
\[
\mathrm{conf}(A \Rightarrow B) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{number of transactions containing } A}
\]


▪ If conf = 1: B is always purchased when A is purchased.
▪ If conf = 0: B is never purchased when A is purchased.
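A minimal sketch of the confidence calculation, again on made-up transactions. The function name `confidence` is an assumption for the example, and the code assumes the antecedent occurs in at least one transaction (otherwise the division fails).

```python
# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

def confidence(antecedent, consequent, transactions):
    """Estimate Pr(B|A): share of transactions with A that also contain B."""
    a, b = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_ab = sum(1 for t in transactions if (a | b) <= t)
    return n_ab / n_a  # assumes A occurs at least once

print(confidence({"bread"}, {"milk"}, transactions))    # 2 of the 3 bread baskets
print(confidence({"butter"}, {"bread"}, transactions))  # both butter baskets
```

Unlike support, confidence is not symmetric: conf(A ⇒ B) and conf(B ⇒ A) divide by different counts.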


Drawback: The confidence of an association rule with a very frequent consequent (B) tends to
be high, even if the antecedent (A) is infrequent, simply because B appears in most transactions.
Because of this, a rule containing two items that actually have a weak association may still have a
high confidence value. To overcome this challenge, lift is introduced.
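The drawback can be seen with a small numeric example. The counts below are hypothetical and chosen only to make the point; they do not come from the course material.

```python
# Hypothetical counts, chosen only to illustrate the drawback.
n_total = 10  # all transactions
n_a = 5       # transactions containing A
n_b = 9       # transactions containing B (a very frequent item)
n_ab = 4      # transactions containing both A and B

confidence = n_ab / n_a    # Pr(B|A)
baseline = n_b / n_total   # Pr(B), ignoring A entirely

print(f"conf(A => B) = {confidence:.2f}")
print(f"Pr(B)        = {baseline:.2f}")
```

A confidence of 0.80 looks strong on its own, yet B is bought in 90% of all transactions, so knowing that A is in the basket actually makes B slightly less likely.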


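The lift of association rule A ⇒ B is the confidence of the rule (the conditional probability of
item B given A) divided by the support (relative frequency) of B, so that a frequent B is controlled for.
\[
\mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{supp}(B)} = \frac{\Pr(A,B)}{\Pr(A)\,\Pr(B)}
\]
Note that the denominator is the support of B (number of transactions containing B / total number of
transactions), not the raw count of transactions containing B.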

In other words, lift compares the probability of having B in the transaction given the knowledge that A
is present with the probability of having B in the transaction without any knowledge about the presence of A:
\[
\mathrm{lift}(A \Rightarrow B) = \frac{\Pr(B \mid A)}{\Pr(B)}
\]



▪ If lift = 1: A and B are independent.
▪ If lift > 1: A and B occur together more often than expected under independence.
▪ If lift < 1: A and B are substitutes for each other. The presence of one item has a
negative effect on the presence of the other item.

Lift can be seen as the “strength” of the rule.
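A minimal sketch of lift and its interpretation, taking lift as the confidence of the rule divided by the support of B, which equals Pr(A,B) / (Pr(A) Pr(B)). The toy transactions and the names `support`, `lift` and `interpret` are assumptions for the example, and the code assumes both A and B occur at least once.

```python
# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def lift(antecedent, consequent, transactions):
    """Pr(A,B) / (Pr(A) * Pr(B)); assumes A and B each occur at least once."""
    a, b = set(antecedent), set(consequent)
    joint = support(a | b, transactions)
    return joint / (support(a, transactions) * support(b, transactions))

def interpret(value, tol=1e-9):
    """Map a lift value onto the three cases listed above."""
    if abs(value - 1) <= tol:
        return "independent"
    if value > 1:
        return "occur together more often than expected"
    return "occur together less often than expected"

for a, b in [({"butter"}, {"bread"}), ({"bread"}, {"milk"})]:
    value = lift(a, b, transactions)
    print(sorted(a), "=>", sorted(b), round(value, 3), interpret(value))
```

In this toy data butter ⇒ bread has a lift above 1 and bread ⇒ milk a lift below 1. With so few transactions the values only illustrate the arithmetic and say nothing about real purchasing behaviour.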


