Data Science 11 - Clustering algorithms
k-Means and variants; Initialization:
• Random: choose k points from X to use as the initial means
• k-Means++: pick the initial means so that they are well spread out across the space. This leads to faster convergence
k-Means and variants; Representatives:
• k-Medoids or Partitioning Around Medoids (PAM): the cluster representatives are medoids (objects from X). Only the distances between objects are needed
Problems with k-Means:
• The clustering model, which assumes a Gaussian distribution, does not always fit the data
CURE algorithm:
• Assumes a Euclidean distance
• Allows clusters to have any shape
• Uses a collection of representative points to represent clusters
CURE algorithm; Pass 1:
• Pick a random sample of points that fits in main memory
• Initial clusters:
- Cluster these points hierarchically to create initial clusters
• Pick representative points:
- For each cluster, p
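The initialization and representative ideas listed above can be illustrated with a minimal sketch. It assumes the commonly described k-Means++ seeding rule (each further mean is drawn with probability proportional to its squared distance from the nearest mean already chosen), which the notes themselves do not spell out. The function names `sq_dist`, `kmeans_pp_init` and `medoid` are hypothetical and chosen only for this illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(X, k, rng):
    """Pick k initial means from X with k-Means++ style seeding (assumed rule)."""
    means = [rng.choice(X)]  # first mean: uniform at random from X
    while len(means) < k:
        # Weight each point by its squared distance to the nearest chosen mean,
        # so far-away points are more likely to become the next mean.
        weights = [min(sq_dist(p, m) for m in means) for p in X]
        if sum(weights) == 0:
            # Every point coincides with a chosen mean: fall back to uniform.
            means.append(rng.choice(X))
        else:
            means.append(rng.choices(X, weights=weights, k=1)[0])
    return means


def medoid(cluster, dist):
    """Return the object of the cluster with the smallest total distance to the rest.

    Only a pairwise distance function is needed, no coordinate averaging.
    """
    return min(cluster, key=lambda c: sum(dist(c, o) for o in cluster))


if __name__ == "__main__":
    rng = random.Random(0)
    X = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
    print(kmeans_pp_init(X, 3, rng))

    manhattan = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
    print(medoid([(0, 0), (1, 0), (10, 0)], manhattan))
```

The medoid is always one of the input objects, which is why k-Medoids works with any distance function, whereas a mean requires coordinates that can be averaged.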
Document information
- Published on
- March 20, 2024
- Number of pages
- 5
- Written in
- 2023/2024
- Type
- Exam
- Contains
- Questions and answers