Decision Tree - correct answer-a non-parametric supervised learning
algorithm used for both classification and regression tasks. It has a
hierarchical tree structure consisting of a root node, branches, internal
nodes and leaf nodes.
K-Nearest Neighbors - correct answer-A data mining method that predicts
(classifies or estimates) an observation i's outcome value based on the k
observations most similar to observation i with respect to the input variables.
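The idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names, the toy data, and the choice of Euclidean distance with majority voting are assumptions made for the example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, query, k=3, distance=euclidean):
    """Predict the label of `query` by majority vote among its k nearest neighbors.

    `train` is a list of (features, label) pairs.
    """
    # Sort the training observations by similarity to the query, keep the k closest.
    neighbors = sorted(train, key=lambda pair: distance(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters labeled "a" and "b".
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]
prediction = knn_classify(train, (1.1, 1.0), k=3)
print(prediction)
```

For estimation (a numeric outcome) the vote would be replaced by an average of the neighbors' outcome values.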
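Naive Bayes Classifier - correct answer-an algorithm that predicts the
probability of a certain outcome based on prior occurrences of related events.
It applies Bayes' theorem and assumes the input variables are independent of
each other given the outcome.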
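As a rough sketch of how such probabilities can be estimated from counts of past events, the example below implements a small categorical Naive Bayes. The function names, the toy weather data, and the `values_per_feature=2` smoothing setting are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts

def predict_proba(model, row, values_per_feature=2):
    """Posterior P(class | row), treating features as independent given the class."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Add-one (Laplace) smoothing; values_per_feature is an assumed setting.
            seen = value_counts[(label, i)][value]
            score *= (seen + 1) / (count + values_per_feature)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical past observations: (outlook, temperature) -> decision
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
labels = ["play", "play", "stay", "stay"]
model = train_naive_bayes(rows, labels)
proba = predict_proba(model, ("sunny", "hot"))
print(proba)
```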
Support Vector Machine - correct answer-Supervised learning tool that seeks
a dividing hyperplane in any number of dimensions. It can be used for
regression or classification.
Neural Networks - correct answer-a method in artificial intelligence that
teaches computers to process data in a way that is inspired by the human
brain.
Decision Tree Hyperparameters - correct answer-Many, including
min_samples_leaf, min_samples_split, max_leaf_nodes, and
min_impurity_decrease
K-Nearest Neighbor Hyperparameters - correct answer-K-value and distance
function
Decision tree disadvantages - correct answer--Sensitive to outliers
-The tree can grow very complex when trained on complex datasets
K-Nearest Neighbor disadvantages - correct answer--K has to be chosen
carefully
-High computation cost at prediction time if the sample size is large
What are two variable selection criteria? - correct answer--Entropy and
Information Gain
-Gini Index
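To make the Gini index concrete, here is a minimal sketch that computes it as 1 minus the sum of squared class proportions. The function name and the toy labels are assumptions for illustration.

```python
from collections import Counter

def gini_index(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

pure = gini_index(["yes", "yes", "yes", "yes"])   # a single class
mixed = gini_index(["yes", "yes", "no", "no"])    # two classes, 50/50
print(pure, mixed)
```

A pure node scores 0, and a node split evenly between two classes scores 0.5, the highest value possible with two classes.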
Pure when Entropy = - correct answer-0
Impure when Entropy = - correct answer-1 (most impure case for two classes
in equal proportion, using log base 2)
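Entropy - correct answer-a measure of the disorder (impurity) of a system.
In decision trees it is computed from the class proportions p_i as
H = -sum(p_i * log2(p_i)). In thermodynamics the term refers to energy
unavailable to do work.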
Why the minus in the Entropy formula - correct answer-Probabilities are
always between 0 and 1.
log(x) is negative when 0 < x < 1 (and zero when x = 1).
Each term p * log(p) in the sum is therefore negative or zero, so the sum
itself is never positive. The leading minus makes the result non-negative.
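The sign argument above can be checked numerically. The sketch below assumes the usual base-2 form of entropy over class proportions; the helper names and toy labels are made up for the example.

```python
import math
from collections import Counter

def entropy_terms(labels):
    """Return the individual p * log2(p) terms, one per class."""
    n = len(labels)
    return [(count / n) * math.log2(count / n) for count in Counter(labels).values()]

def entropy(labels):
    """Entropy = minus the sum of the (non-positive) p * log2(p) terms."""
    return -sum(entropy_terms(labels))

pure_labels = ["yes", "yes", "yes", "yes"]
mixed_labels = ["yes", "yes", "no", "no"]

# Every term is negative or zero, so negating the sum gives a non-negative value.
terms = entropy_terms(mixed_labels)
pure_entropy = entropy(pure_labels)    # equals 0 for a pure set
mixed_entropy = entropy(mixed_labels)  # equals 1 for a 50/50 two-class set
```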
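Information Gain - correct answer-the amount of knowledge acquired from a
certain decision or action. In a decision tree, it is the reduction in
entropy achieved by a split: the entropy of the parent node minus the
weighted average entropy of the child nodes.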
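In decision trees, information gain is usually computed as the entropy of the parent node minus the size-weighted entropy of the child nodes. A minimal sketch under that assumption, with hypothetical helper names and toy splits:

```python
import math
from collections import Counter

def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the child subsets."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]   # each child is pure
useless_split = [["yes", "no"], ["yes", "no"]]   # children are as mixed as the parent

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
```

A tree-building algorithm would evaluate candidate splits this way and pick the variable with the largest gain.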
Random forests - correct answer--for supervised machine learning, where
there is a labeled target variable
-used for solving regression (numeric target variable) and classification
(categorical target variable) problems
-an ensemble method, meaning they combine predictions from other models
-Each of the smaller models in the random forest ensemble is a decision tree
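The "combine predictions" step can be sketched as follows. This shows only the aggregation part of a random forest (majority vote for classification, mean for regression). The trees here are hypothetical stand-in functions, not trained models, and all names and thresholds are invented for the example.

```python
from collections import Counter
from statistics import mean

def forest_classify(trees, x):
    """Classification: each tree votes, and the most common label wins."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

def forest_regress(trees, x):
    """Regression: average the numeric predictions of the trees."""
    return mean(tree(x) for tree in trees)

# Hypothetical stand-ins for trained decision trees: simple threshold rules.
class_trees = [
    lambda x: "spam" if x["links"] > 3 else "ham",
    lambda x: "spam" if x["caps"] > 0.5 else "ham",
    lambda x: "spam" if x["length"] < 20 else "ham",
]
email = {"links": 5, "caps": 0.7, "length": 120}
label = forest_classify(class_trees, email)  # votes: spam, spam, ham

# Hypothetical regression trees that each return a fixed estimate.
reg_trees = [lambda x: 10.0, lambda x: 12.0, lambda x: 14.0]
estimate = forest_regress(reg_trees, email)
```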
What is the best hyperplane? - correct answer-The one that maximizes the
distance from the hyperplane to the nearest data points of each class
Margin - correct answer-the distance between the hyperplane and the closest
data points
What is the name for the points closest to the hyperplane - correct
answer-Support Vectors
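The margin and support vectors can be illustrated with a small distance calculation. This sketch does not train an SVM: the hyperplane (`w`, `b`) is chosen by hand for the toy points, and all names and values are assumptions for the example.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w . x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / math.sqrt(sum(wi ** 2 for wi in w))

# Hand-picked 2-D hyperplane x1 = 2, written as w . x + b = 0.
w, b = (1.0, 0.0), -2.0

# Hypothetical points: the first two lie on one side, the last two on the other.
points = [(1.0, 2.0), (0.0, 1.0), (3.0, 2.0), (4.0, 4.0)]
distances = [distance_to_hyperplane(w, b, p) for p in points]

# The margin is the distance to the closest points; those points are the support vectors.
margin = min(distances)
support_vectors = [p for p, d in zip(points, distances) if math.isclose(d, margin)]
```

Here the two points nearest the hyperplane, one from each side, are the support vectors. Moving the other points would not change the margin.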