Lecture notes

2024 Machine Learning Notes Highlights (second part)

I achieved a score of 18 out of 20 (the greatest distinction) in the 'Machine Learning' course in 2024, a success I attribute to the systematic study material I authored myself. This second part contains chapters 6 to 9, covering kNN, clustering, recommender systems, ANN, text mining, etc., with a meticulously made navigation pane.













Document information

Uploaded on
9 February 2024
Number of pages
39
Written in
2023/2024
Type
Lecture notes
Teacher(s)
David Martens
Covers
All lectures


Preview of the content

10.31 Lec6 Clustering & Association rules
Significant points of this lecture (SP):
• Revisit: supervised vs. unsupervised learning
• kNN
• Clustering
• Apriori & association rules
• Recommender systems


Highlight:
1 Revisit: supervised and unsupervised models
A supervised model (= predictive data mining) means you discover patterns in a training set to predict the value of the target variable for items in a test set (discrete target variable: classification; continuous target variable: regression), whereas an unsupervised model (= descriptive data mining) means you discover regularities in the data without any notion of a target variable.


Classification, regression, and causal modeling are generally solved with supervised methods. Similarity matching, link prediction, and data reduction could be either. Clustering, co-occurrence grouping, and profiling are generally unsupervised. The fundamental principles of data mining underlie all these types of techniques.
2 kNN

• GOAL = find the k instances that are most similar to a given data point
• Attention [the importance of standardization]: numeric attributes may have vastly different ranges, and unless they are scaled appropriately, the effect of one attribute with a wide range can swamp the effect of another with a much smaller range.
• Choice of k and weighted voting:




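The standardization point above can be sketched in a few lines of Python (an illustrative sketch, not taken from the course; the age/income attributes are hypothetical): without scaling, an attribute with a wide range such as income swamps one with a small range such as age in the Euclidean distance, while z-score standardization puts both on a comparable footing.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def zscore(rows):
    """Standardize each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(r, means, stds))
            for r in rows]

# Hypothetical (age, income) records: income's range dominates the raw distance.
raw = [(25, 20000), (60, 21000), (26, 80000)]
print(euclidean(raw[0], raw[1]))      # driven almost entirely by the income gap
scaled = zscore(raw)
print(euclidean(scaled[0], scaled[1]))  # age now contributes comparably
```

On the raw data the first two people look close only because their incomes are close, despite a 35-year age gap; after standardization the age difference dominates, which is usually the intended notion of similarity.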

2.1 Similarity measures and an example of cosine distance:





Another example: for two data points (2, 2) and (8, 8):
d = 1 − (2·8 + 2·8) / (√(2² + 2²) · √(8² + 8²))
d = 1 − 32/32
d = 0
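The worked example above can be checked with a short Python function (a sketch, not part of the notes): cosine distance is one minus the cosine similarity, so two points lying in the same direction from the origin, like (2, 2) and (8, 8), have distance 0.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity = 1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

print(cosine_distance((2, 2), (8, 8)))  # ≈ 0: same direction, maximal similarity
print(cosine_distance((1, 0), (0, 1)))  # ≈ 1: orthogonal directions
```

Note that cosine distance ignores the magnitude of the vectors and looks only at their direction, which is why (2, 2) and (8, 8) count as identical here even though their Euclidean distance is large.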


2.2 Issues, advantages, and disadvantages of kNN:
• It is comprehensible: a prediction can be justified by pointing to the model and the similar data instances.
• Computational efficiency: training time = 0. As a "lazy learner", it waits until a prediction is asked for.
• Curse of dimensionality: kNN always takes all features into account when calculating similarity. Therefore [selection of features]: too many attributes, or many that are irrelevant to the similarity judgment, hurt performance, which demands a data scientist's domain knowledge.
• Nature of attributes: 1) scaling of attributes; 2) dummy encoding




The advantages and disadvantages of kNN:





Advantages
1. Simplicity and Intuitiveness: kNN is incredibly straightforward and easy to
understand, making it a good starting point for algorithm learning and
application.
2. No Training Phase: kNN is a lazy learner, meaning it doesn't learn a
discriminative function from the training data but memorizes the training
dataset instead.
3. Versatility: It can be used for both classification and regression problems.



Disadvantages
1. Scalability: kNN can be computationally expensive, especially with large
datasets, as the distance needs to be calculated between each test sample and
all training samples.
2. Curse of Dimensionality: kNN suffers significantly as the dimensionality of the
data increases because it becomes difficult to compute distances in high-
dimensional space.
3. Optimal k Value: Selecting the optimal value of k is crucial for the
performance of the algorithm, and it can be computationally intensive to
find this value.
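The points above can be made concrete with a minimal kNN classifier (an illustrative sketch, not the implementation used in the course; the toy training set is hypothetical): "training" is only memorizing the data, which is the lazy-learner property, and prediction is a majority vote among the k nearest stored points.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs; returns the majority
    label among the k training points nearest to `query`."""
    # All the work happens at prediction time (lazy learner): we sort
    # the whole training set by distance to the query point.
    neighbors = sorted(train, key=lambda fl: math.dist(fl[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_predict(train, (2, 2), k=3))  # "A": all 3 nearest neighbors are A
print(knn_predict(train, (7, 7), k=3))  # "B": all 3 nearest neighbors are B
```

The sort over the full training set is exactly the scalability problem listed above: every prediction costs time proportional to the training-set size, which is why large datasets call for index structures or approximate neighbor search.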


3 Clustering


• Goal: dividing the data into clusters such that there is maximal similarity between items within a cluster and maximal dissimilarity between items of different clusters.
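One common algorithm that pursues this goal is k-means (the sketch below is illustrative and not taken from the notes; the naive first-k initialization and the toy points are assumptions): it alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points.

```python
import math

def kmeans(points, k, iters=20):
    """Minimal k-means: returns (centroids, clusters) after `iters` rounds."""
    centroids = points[:k]  # naive initialization: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))  # one centroid near each of the two groups
```

Each assignment step increases within-cluster similarity and each update step re-centers the clusters, which is exactly the maximal-similarity-within / maximal-dissimilarity-between objective stated above, optimized greedily.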




Seller: thaboty (Universiteit Antwerpen)