Summary

ECB3ADAVE2 - Applied Data Analysis and Visualization II - Full Summary

Pages: 49
Uploaded on: 07-11-2021
Written in: 2021/2022

A detailed summary of all the relevant unsupervised learning methods. Based on the book, articles, lecture slides, exercises and assignments, and additional articles and videos I found through Google. Edit: I was told that the hyperlinks in the document don't work. Once you have bought the summary, please send me a message and I'll send you the PDF with working hyperlinks :)

Document information

Uploaded on: 7 November 2021
File last updated on: 8 November 2021
Number of pages: 49
Written in: 2021/2022
Type: Summary


Applied Data Analysis and Visualization II
Universiteit Utrecht – ECB3ADAVE2

Written by Lisanne Louwerse


Summary

Table of contents
Week 1
  Supervised vs. unsupervised learning
  Association rule analysis
Week 2
  What is clustering?
  K-means clustering
  Hierarchical clustering
Week 3
  Dimension reduction
  Principal component analysis (PCA)
Week 4
  Non-negative matrix factorization (NMF)
  Probabilistic latent semantic analysis (PLSA)
Week 5
  Factor analysis (FA)
  Independent component analysis (ICA)
Week 6
  Multidimensional scaling (MDS)
Week 7
  Contingency tables and correspondence tables
  Correspondence analysis (CA)
Key takeaways
  Association rule analysis
  Cluster analysis
  Principal component analysis
  Non-negative matrix factorization
  Probabilistic latent semantic analysis
  Factor analysis
  Independent component analysis
  Multidimensional scaling
  Correspondence analysis





Week 1
Key Words
▪ Supervised / unsupervised learning
▪ Antecedent and consequent
▪ Support, confidence and lift
▪ Apriori algorithm and Apriori principle

Supervised vs. unsupervised learning

▪ Supervised learning
Building a statistical model for predicting / estimating an output (y) based on one or
more inputs (x).
o Classification: predict to which category an observation belongs (qualitative
outcomes).
o Regression: predict a quantitative outcome.

▪ Unsupervised learning
There are inputs (x) but no outputs (y). The aim is to learn structure and relationships from the data, such as …
… discovering associations among variable values → association rule analysis
… discovering unknown subgroups of observations → clustering
… dimension reduction → principal components analysis


Association rule analysis
Goal: to find joint values of the variables x1, …, xp that appear together most frequently in the
database.
In the case of binary-valued data, association rule analysis is called ‘market basket’ analysis.
Transactions are represented in a binary incidence matrix:
\[
x_{ij} =
\begin{cases}
1, & \text{if the $j$th item is purchased as part of transaction $i$,} \\
0, & \text{if the $j$th item is not purchased as part of transaction $i$.}
\end{cases}
\]

This matrix can now be used to find association rules.
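
To make the incidence matrix concrete, here is a minimal sketch in plain Python. The four transactions and the item names (bread, butter, milk) are made up for illustration and do not come from the course material.

```python
# Hypothetical toy transactions; each set holds the items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"bread", "butter"},
]

# Fix a column order so that column j always refers to the same item.
items = sorted(set().union(*transactions))

# x[i][j] = 1 if item j is part of transaction i, and 0 otherwise.
x = [[1 if item in t else 0 for item in items] for t in transactions]

print(items)
for row in x:
    print(row)
```

Each row is one transaction and each column is one item, so column sums give the number of transactions that contain each item.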
An association rule is the implication

A ⇒ B (antecedent ⇒ consequent)
In market basket analysis, it can be seen as an if-then statement:
If you buy A, there is a chance that you buy B as well.

Properties of association rules
The support (or prevalence) of association rule A ⇒ B is the relative frequency of the rule.
It’s the probability of simultaneously observing A and B in a randomly selected market basket,
so Pr(A,B).
\[
\operatorname{supp}(A \Rightarrow B) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{total number of transactions}}
\]

Note that this is the support of an association rule. The support of just an item (set) A is defined as:

\[
\operatorname{supp}(A) = \frac{\text{number of transactions containing } A}{\text{total number of transactions}}
\]
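
The two support definitions above can be computed with one small function, since the support of the rule A ⇒ B is simply the support of the combined itemset. This is a sketch only: the helper name `support` and the toy transactions are assumptions for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


# Hypothetical toy data, not taken from the course material.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"bread", "butter"},
]

supp_bread = support({"bread"}, transactions)         # 3 of 4 transactions
supp_rule = support({"bread", "milk"}, transactions)  # supp(bread => milk): 2 of 4

print(supp_bread, supp_rule)
```

Because the rule's support only counts transactions that contain both sides, supp(A ⇒ B) and supp(B ⇒ A) are always equal.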




The confidence of association rule A ⇒ B is the conditional probability of B given A, so
Pr(B|A). It is the likelihood of item B being purchased when item A is purchased.
\[
\operatorname{conf}(A \Rightarrow B) = \frac{\text{number of transactions containing } A \text{ and } B}{\text{number of transactions containing } A}
\]


▪ If conf = 1 : B is always purchased when A is purchased.
▪ If conf = 0 : B is never purchased when A is purchased.
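
As a small illustration of the confidence formula, the sketch below counts transactions directly. The helper names `count` and `confidence` and the toy transactions are assumptions made for this example.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """conf(A => B) = count(A and B) / count(A)."""
    n_a = count(antecedent, transactions)
    if n_a == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    n_ab = count(set(antecedent) | set(consequent), transactions)
    return n_ab / n_a


# Hypothetical toy data, not taken from the course material.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter"},
    {"bread", "butter"},
]

conf_bread_milk = confidence({"bread"}, {"milk"}, transactions)  # 2 of 3 bread baskets
conf_milk_bread = confidence({"milk"}, {"bread"}, transactions)  # 2 of 2 milk baskets

print(conf_bread_milk, conf_milk_bread)
```

Unlike support, confidence is not symmetric: here every basket with milk also has bread, but not every basket with bread has milk.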


Drawback: The confidence of an association rule with a very frequent consequent (B) tends to be
high regardless of the antecedent (A), simply because B appears in most transactions. Because of
this, a rule containing two items that actually have a weak association may still have a high
confidence value. To overcome this challenge, lift is introduced.


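The lift of association rule A ⇒ B is the conditional probability of item B given A,
divided by the support (relative frequency) of B, so it controls for how frequent B is.
\[
\operatorname{lift}(A \Rightarrow B) = \frac{\operatorname{conf}(A \Rightarrow B)}{\operatorname{supp}(B)} = \frac{(\text{number of transactions containing } A \text{ and } B) \,/\, (\text{number of transactions containing } A)}{(\text{number of transactions containing } B) \,/\, (\text{total number of transactions})}
\]

In other words:
\[
\operatorname{lift}(A \Rightarrow B) = \frac{\Pr(B \mid A)}{\Pr(B)}
\]
The numerator is the probability of having B in the transaction given the knowledge that A is present;
the denominator is the probability of having B in the transaction without any knowledge about the presence of A.
Lift therefore measures the rise in the probability of B because of the knowledge that A is present.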



▪ If lift = 1 : A and B are independent.
▪ If lift > 1 : A and B often occur together.
▪ If lift < 1 : A and B are substitutes for each other. The presence of one item has a
negative effect on the presence of the other item.

Lift can be seen as the “strength” of the rule.
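
The drawback of confidence and the way lift corrects for it can be checked on a small example. This is a sketch under stated assumptions: the ten transactions are invented so that B is very frequent, and the helper names `support`, `confidence` and `lift` are chosen for this illustration only.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """conf(A => B) = supp(A and B) / supp(A)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """lift(A => B) = conf(A => B) / supp(B)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Hypothetical data: B occurs in 8 of 10 transactions, A in 5, both in 4.
transactions = (
    [{"A", "B"}] * 4
    + [{"A"}] * 1
    + [{"B"}] * 4
    + [{"C"}] * 1
)

conf_ab = confidence({"A"}, {"B"}, transactions)  # high, because B is frequent
lift_ab = lift({"A"}, {"B"}, transactions)        # close to 1: A tells us nothing about B
lift_ac = lift({"A"}, {"C"}, transactions)        # below 1: A and C never occur together

print(conf_ab, lift_ab, lift_ac)
```

The rule A ⇒ B looks strong by confidence alone, yet its lift is about 1 because B is bought just as often whether or not A is in the basket.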


