MATH 425 exam 2 all answers correct
Unsupervised learning methods are needed when... ✅the data only contains features and no labels
What are some of the possible goals within the unsupervised learning framework? ✅One possible goal within the unsupervised learning framework is to discover interesting things about the data you are working with. This includes questions such as "Are there any subgroups among the observations or variables that we can discover?" and "Do you notice any hidden patterns or structures within the data?". To achieve these goals, we can use methods such as clustering and PCA.
Which of the following is not an unsupervised learning approach? ✅K-NN
What is the main challenge in unsupervised learning compared to supervised learning? ✅Because unsupervised learning is much more subjective than supervised learning, there is no clear and simple goal for the analysis. Instead, the analysis proceeds on a case-by-case basis depending on the data.
Clustering seeks a partition of the data into distinct groups so that the observations within each group are quite similar to each other. ✅True
Describe two distinct examples of clustering at play in our daily life. ✅In one of my classes, my professor assigned a partner project. He split us into two groups, the upper half of the class and the lower half (upper being the stronger students, lower being the weaker students), and then paired us up by matching one stronger student with one weaker student. The two groups he split the class into are an example of putting us into subgroups. Another example is that I work for the Professional Edge center on campus. One of the biggest things we keep track of is the number of appointments made throughout the entire center. We can then create subgroups from all the student data, usually based on things such as which coach the student met with, what their major is, what year they are, etc.
K-Means clustering involves ✅specifying the number of clusters
Centroid refers to ✅a point which is the average of all the points in the cluster
Describe the K-Means algorithm. ✅The first step of the K-Means algorithm is to randomly assign a number from 1 to K to each observation. Alternatively, you can select K distinct points that are as far from each other as possible and label each of them as the centroid of one cluster. The second step is to iterate until the cluster assignments stop changing; each iteration has two parts: first, for each of the K clusters, compute the cluster centroid; second, assign each observation to the cluster whose centroid is closest.
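For concreteness, here is a minimal NumPy sketch of the two-part iteration described above (Lloyd's algorithm). The function name, random seed, iteration cap, and toy usage are my own illustrative choices, not part of the course materials, and empty clusters are not handled.

import numpy as np

def k_means(X, K, n_iter=100, seed=0):
    # Minimal K-Means: alternate (a) centroid updates and (b) nearest-centroid
    # assignments until the cluster labels stop changing.
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, K, size=X.shape[0])       # step 1: random cluster label for each observation
    centroids = None
    for _ in range(n_iter):
        # (a) compute the centroid of each cluster (assumes no cluster is empty)
        centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        # (b) reassign each observation to the cluster whose centroid is closest
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):          # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids

Example use: labels, centers = k_means(np.random.default_rng(1).normal(size=(60, 2)), K=3)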
The main idea behind K-Means ✅is to have a small within-cluster variation
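Written out (in ISLR-style notation, which I am assuming matches the course slides), "small within-cluster variation" means choosing the clusters C_1, ..., C_K to minimize the total within-cluster sum of squared Euclidean distances:

\min_{C_1,\dots,C_K} \; \sum_{k=1}^{K} \frac{1}{|C_k|} \sum_{i,\, i' \in C_k} \sum_{j=1}^{p} \left( x_{ij} - x_{i'j} \right)^2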
Hierarchical Clustering has the following major advantage over K-Means ✅the number of clusters is not
specified at the start
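As a small illustration of this difference, the sketch below uses SciPy's hierarchical clustering (my own library choice and toy data, not from the course): the full dendrogram is built first, and the number of clusters is chosen only afterwards by cutting the tree at different levels.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))                        # toy data; any numeric matrix works

Z = linkage(X, method="complete")                   # build the entire dendrogram up front
labels_2 = fcluster(Z, t=2, criterion="maxclust")   # cut into 2 clusters afterwards
labels_4 = fcluster(Z, t=4, criterion="maxclust")   # or 4, without re-running the clustering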
Why do we need to scale features in certain cases? ✅Scaling features can be a very useful tool in certain cases. If you look at the "Importance of Feature Scales" example in the lecture slides, you can clearly see how much of a difference scaling makes. In the example with the computers and socks, many more socks are sold at the store, but the store makes far less on all the socks sold than on a single computer. Looking at the very last graph in the slides, you can see that selling just a few computers creates a much larger profit than the socks. This can help the company realize where it should focus its sales. The other two graphs are very misleading, and if a company did not scale, it might not focus on the right areas.
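A standard way to put features on a common footing is to standardize each column to mean 0 and standard deviation 1. The numbers below are made up to mimic the socks/computers example; they are not the values from the slides.

import numpy as np

# hypothetical store data: column 0 = number of items sold, column 1 = price in dollars
X = np.array([[800.0,    5.0],    # socks: many sold, cheap
              [  2.0, 1500.0]])   # computers: few sold, expensive

# standardize each feature so neither one dominates a distance- or variance-based
# method (like K-Means or PCA) simply because of its units
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)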
Describe Principal Component Analysis (PCA). ✅Principal Component Analysis, also known as PCA, is a very popular approach for producing a low-dimensional representation of a dataset. This helps when we are given a large set of correlated features: PCA allows us to summarize the set with a much smaller number of representative features that explain most of the variability in the original set. PCA can also serve as a tool for data visualization.
PCA transforms the original data (X1, X2, ..., Xp) into new features that are uncorrelated. ✅True
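The claim above can be checked directly. The sketch below (my own toy data and the scikit-learn PCA implementation, not code from the course) fits PCA and shows that the resulting scores Z1, Z2, ... are numerically uncorrelated.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = np.column_stack([X, X[:, 0] + 0.1 * rng.normal(size=100)])  # add a strongly correlated feature

pca = PCA()                        # PCA centers the data internally
scores = pca.fit_transform(X)      # the new features Z1, ..., Zp

corr = np.corrcoef(scores, rowvar=False)
print(np.round(corr, 3))           # off-diagonal entries are ~0: the scores are uncorrelated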
Explain the process of choosing the number of principal components for further analysis. ✅We look at the proportion of variance explained by each component (for example, with a scree plot or the cumulative proportion of variance explained) and keep enough components to explain most of the variability in the data.
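One way to make this concrete, using scikit-learn's explained_variance_ratio_ attribute; the 90% threshold below is only an illustrative rule of thumb, not a course rule:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))              # stand-in for the real data matrix

pca = PCA().fit(X)
pve = pca.explained_variance_ratio_        # proportion of variance explained by each PC
cum_pve = np.cumsum(pve)

# keep the smallest number of components whose cumulative PVE passes a chosen
# threshold, or look for an "elbow" in a scree plot of pve
n_keep = int(np.argmax(cum_pve >= 0.90)) + 1
print(np.round(pve, 3), np.round(cum_pve, 3), n_keep)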
Explain how PCA provides us with a low dimensional representation of the data. ✅Projecting the observations onto the first few loading vectors produces a small number of principal component scores for each observation, and these scores form the low-dimensional representation of the data.
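As a sketch of the mechanics (my own notation and toy data): center the data, take the first few loading vectors, and the projected scores are the low-dimensional representation.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))              # stand-in for the real data matrix

Xc = X - X.mean(axis=0)                    # center each column
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

M = 2                                      # number of loading vectors to keep
loadings = Vt[:M].T                        # p x M matrix whose columns are the loading vectors
Z = Xc @ loadings                          # n x M score matrix: the low-dimensional representation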