DSCI 4520 Exam 1 Section 2 Questions with Complete Solutions

Seller: lectknancy · November 13, 2024 · 2024/2025 · Questions & answers, 8 pages

In the k-means clustering technique, the desired number of clusters (k) is determined in the middle of the algorithm by calculating the model error.
True
False
Answer: False

We run two k-means clustering models on the same data with k=3 and k=5. The model with k=3 is necessarily better than the other one because a smaller value of k is always better for clustering.
True
False
Answer: False

The following chart shows the within-cluster sum of squared errors versus the number of clusters in a k-means clustering model. Based on the Elbow method, what value of k is optimal for clustering?
2
5
8
4
Answer: 4
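
The elbow is normally read off the chart by eye. As a rough sketch, assuming WSS values are available for consecutive values of k, one common heuristic (not necessarily the one taught in this course) picks the k where the curve bends most sharply, measured by the discrete second difference. The function name `elbow_k` and the WSS numbers are made up for illustration; they are not the chart from the question.

```python
def elbow_k(wss):
    """Return the k where a WSS curve bends most sharply.

    Assumes the keys of `wss` are consecutive k values. The bend at k is
    the second difference WSS(k-1) - 2*WSS(k) + WSS(k+1), which is large
    where the curve stops dropping steeply and starts to flatten.
    """
    ks = sorted(wss)
    best_k, best_bend = ks[1], float("-inf")
    # Only interior points have a neighbour on both sides.
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = wss[prev_k] - 2 * wss[k] + wss[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Hypothetical WSS values for k = 1..8.
wss_curve = {1: 1000, 2: 600, 3: 350, 4: 150, 5: 130, 6: 115, 7: 105, 8: 100}
print(elbow_k(wss_curve))  # → 4
```

Past the elbow, extra clusters buy only small WSS reductions, which is also why a smaller k is not automatically better or worse.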

Which statement is INCORRECT about clustering?
A. Clustering is useful for predicting association rules
B. Clustering is an unsupervised learning method
C. Clustering has many applications in marketing, insurance, logistics, and health care businesses
D. The quality of a clustering model depends on the similarity measure that is used
Answer: A. Clustering is useful for predicting association rules

Both numerical and categorical variables can be used in the Euclidean distance function in the k-means clustering algorithm.
True
False
Answer: False
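
Euclidean distance subtracts values, so a raw category label such as a region name has no numeric meaning in the formula. One common workaround, which the question itself does not mention, is to turn each categorical variable into 0/1 dummy variables first. A minimal sketch; the helper `one_hot`, the region levels, and the records are all hypothetical.

```python
import math


def one_hot(value, categories):
    """Replace one categorical value with 0/1 dummy variables, one per level."""
    return [1 if value == level else 0 for level in categories]


regions = ["North", "South", "West"]  # hypothetical category levels

# Hypothetical records: (income in $1000s, region). "North" - "West" would
# raise a TypeError, so the region is encoded before computing distance.
rec1 = [55.0] + one_hot("North", regions)
rec2 = [60.0] + one_hot("West", regions)

print(rec1)  # → [55.0, 1, 0, 0]
print(rec2)  # → [60.0, 0, 0, 1]
print(math.dist(rec1, rec2))  # Euclidean distance on the all-numeric records
```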

What is the Euclidean distance between the following two records WITHOUT normalization? Round your answer to 1 decimal place.
Euclidean distance formula: $d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2}$
Answer: 11.5
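
The two records from this question are not reproduced in these notes, so the sketch below applies the same formula to hypothetical records. It uses raw values, matching the WITHOUT normalization condition, and rounds to one decimal as the question asks. The function name `euclidean_distance` is illustrative.

```python
import math


def euclidean_distance(x, y):
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2), on raw (unnormalized) values."""
    if len(x) != len(y):
        raise ValueError("records must have the same number of variables")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Hypothetical records, not the ones from the question.
record_a = [1.0, 2.0, 3.0]
record_b = [4.0, 6.0, 3.0]
print(round(euclidean_distance(record_a, record_b), 1))  # → 5.0
```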

The k-means clustering algorithm can easily handle noisy data with outliers as well as non-convex data patterns.
True
False
Answer: False
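
One reason outliers hurt k-means: each centroid is the mean of the points assigned to it, and the mean is sensitive to extreme values. A tiny illustration with made-up one-variable data:

```python
from statistics import mean

# Hypothetical cluster: four typical points, then the same points plus one outlier.
cluster = [10.0, 11.0, 12.0, 13.0]
with_outlier = cluster + [100.0]

# A k-means centroid is the mean of its members, so a single extreme value
# drags it far from where most of the points actually sit.
print(mean(cluster))       # → 11.5
print(mean(with_outlier))  # → 29.2
```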

Before computing the distance between two data records, we should normalize the numerical variables to prevent variables with large scales from having an undue effect.
True
False
Answer: True
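
A minimal sketch of one common normalization, the z-score, assuming hypothetical records with age in years and income in dollars. It uses the population standard deviation; some tools use the sample version instead, which changes the scale slightly but not the idea. The function name `z_score_columns` is illustrative.

```python
import math
from statistics import mean, pstdev


def z_score_columns(rows):
    """Standardize each column: subtract its mean, divide by its population std."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        # A constant column (std 0) carries no information, so it maps to 0.
        [(v - m) / s if s else 0.0 for v, (m, s) in zip(row, stats)]
        for row in rows
    ]


# Hypothetical records: (age in years, annual income in dollars).
raw = [[25, 40_000], [35, 42_000], [45, 41_000]]
normalized = z_score_columns(raw)

print(math.dist(raw[0], raw[1]))                # about 2000: income swamps age
print(math.dist(normalized[0], normalized[1]))  # both variables now contribute
```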

Which statement is INCORRECT about the k-means clustering algorithm?
A. The algorithm starts with initial centroids that are determined by the distance function
B. The algorithm starts with random seeds as the initial centroids
C. Each data point is assigned to the cluster with the nearest centroid
D. The choice of distance function is arbitrary, and the Euclidean distance function is very popular
Answer: A. The algorithm starts with initial centroids that are determined by the distance function
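
The steps in options B and C (random seeds as initial centroids, then assigning each point to the nearest centroid) can be sketched in plain Python. This is a minimal illustration assuming Euclidean distance and numeric tuples, not the implementation used by any particular tool; the function name `k_means` and its parameters are hypothetical. Note that k is an input supplied before the run, not something the algorithm discovers.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    """Minimal k-means on a list of numeric tuples; k is chosen before the run."""
    rng = random.Random(seed)
    # Random seeds: k data points picked at random serve as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each point to the cluster whose centroid is nearest (Euclidean).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Move each centroid to the mean of the points now assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical two-group data.
data = [(1.0, 2.0), (1.5, 1.8), (1.0, 0.6), (8.0, 8.0), (9.0, 9.5), (9.0, 11.0)]
centroids, labels = k_means(data, k=2)
print(labels)
```

Because the starting centroids are random, different seeds can end in different local solutions; tools typically rerun from several seeds and keep the best.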

Which statement is INCORRECT about choosing the number of clusters in the k-means clustering method?
A. Maximizing the within-cluster sum of squared errors (WSS) is the goal when selecting k
B. Sometimes business considerations impose constraints on the value of k
C. The ability to do useful profiling based on the cluster centroids helps us select the right value of k
D. Similar analyses can be used to inform our decision about the right value of k
Answer: A. Maximizing the within-cluster sum of squared errors (WSS) is the goal when selecting k
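
WSS adds up the squared distance from every point to its own cluster's centroid, so lower values mean tighter clusters and maximizing it is never the goal. It also tends to fall as k grows, reaching 0 when every point is its own cluster, which is why it is read alongside the elbow method and business considerations rather than simply minimized. A sketch with hypothetical one-dimensional data; the function name `wss` is illustrative.

```python
def wss(points, labels, centroids):
    """Within-cluster sum of squared errors for a given assignment."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )


pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]  # hypothetical data
print(wss(pts, [0, 0, 0, 0], [(6.0, 0.0)]))               # k=1 → 104.0
print(wss(pts, [0, 0, 1, 1], [(1.0, 0.0), (11.0, 0.0)]))  # k=2 → 4.0
print(wss(pts, [0, 1, 2, 3], pts))                        # k=4 → 0.0
```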

k-nearest neighbor (k-NN) is a supervised method that can be used for predicting categorical or numerical targets.
True
False
Answer: True

In k-nearest neighbor models, increasing the value of k leads to overfitting.
True
False
Answer: False

With a k-NN model for a numerical target, after we have determined the k nearest neighbors of a new data record, how is the target value predicted?
A. Majority vote determines the predicted class
B. Average of the neighbors
C. Through a logistic regression between the neighbors
D. Through a linear combination of neighbors
Answer: B. Average of the neighbors
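
A minimal k-NN sketch covering both target types from the k-NN questions above: the average of the neighbors for a numerical target and a majority vote for a categorical one. It assumes numeric predictors on comparable scales and uses Euclidean distance; the function name `knn_predict` and the data are hypothetical.

```python
import math
from collections import Counter
from statistics import mean


def knn_predict(train_X, train_y, new_x, k, numeric_target=False):
    """Predict the target of one new record from its k nearest training records."""
    # Rank training records by Euclidean distance to the new record; keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], new_x))[:k]
    neighbor_targets = [train_y[i] for i in nearest]
    if numeric_target:
        return mean(neighbor_targets)  # numerical target: average of the neighbors
    # Categorical target: majority vote (ties are not handled specially here).
    return Counter(neighbor_targets).most_common(1)[0][0]


# Hypothetical training data with a single predictor.
X = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,)]
y_num = [100, 110, 120, 300, 310]
y_cat = ["low", "low", "low", "high", "high"]

print(knn_predict(X, y_num, (2.2,), k=3, numeric_target=True))  # → 110
print(knn_predict(X, y_cat, (2.2,), k=3))                       # → low
```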

Which statement is correct about the k-nearest neighbor (k-NN) method?
A. Underfitted k-NN models can be fixed by adding a dummy variable for accuracy
B. Logistic regression is a special case of k-NN
C. The value of k can control model overfitting and underfitting
D. Overfitted k-NN models can be fixed by decreasing k
Answer: C. The value of k can control model overfitting and underfitting
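
To see how k moves a k-NN classifier between the two extremes, the hypothetical sketch below scores it on its own training data. With k=1 every training record is its own nearest neighbor, so the model reproduces the training labels exactly (a sign of overfitting); with k equal to the number of records, every prediction collapses to the overall majority class (underfitting). Larger k smooths the model, which is why decreasing k does not fix overfitting. The names `knn_classify` and `training_accuracy` and the data are illustrative.

```python
import math
from collections import Counter


def knn_classify(train_X, train_y, new_x, k):
    """Majority vote among the k training records closest to new_x."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], new_x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def training_accuracy(train_X, train_y, k):
    """Share of training records whose own label the model reproduces."""
    preds = [knn_classify(train_X, train_y, x, k) for x in train_X]
    return sum(p == t for p, t in zip(preds, train_y)) / len(train_y)


# Hypothetical one-variable data with noisy labels.
X = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
y = ["A", "B", "A", "A", "B"]

print(training_accuracy(X, y, k=1))  # → 1.0 (memorizes the training data)
print(training_accuracy(X, y, k=5))  # → 0.6 (always predicts the majority class "A")
```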
