Summary: An Introduction to Statistical Learning - Machine Learning (F000942)
Summary of the Machine Learning course at Ghent University, taught by Dries Benoit. For Data Science for Business and Business Engineering, first Master's year.

MACHINE LEARNING
INTRODUCTION: RECAP OF DATA MINING
Machine Learning is the study of computer algorithms that can improve automatically through
experience and by the use of data. It is seen as a part of artificial intelligence (AI).

Data Mining is an interdisciplinary subfield of computer science and statistics with the overall goal of extracting information from a data set and transforming it into a comprehensible structure for further use.

Regression Models

Linear Regression

A Simple Linear Regression (SLR) is a statistical method that allows us to summarize and study the relationship between two continuous, quantitative variables. There is one dependent variable Y (output, response, outcome…) and one independent variable X (input, feature, predictor…).

Y = β0 + β1X

β1 is the slope of the line, representing the change in Y for a one-unit change in X. β0 is the Y-intercept, representing the value of Y when X is 0. These β parameters are estimated using the least squares method.

Residuals in SLR are the differences between the observed (actual) values of the dependent variable
and the values predicted by the regression model:

ei = yi − ŷi

Visually, the residuals represent the vertical distance between observation and regression line.

The Residual Sum of Squares (RSS) is a measure of the total error or total deviation of the observed values from the values predicted by the regression line.
RSS = ∑(i=1 to n) ei² = e1² + e2² + … + en²

Residuals are used to evaluate how well the regression model fits the data. A good model will have residuals close to zero. The regression model is typically fitted by minimizing the RSS. In other words, the β parameters are chosen in such a way that the sum of the squared residuals (RSS) is minimized. This method is known as the least squares method. A lower RSS indicates a better fit.
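As a minimal sketch (with made-up example data, not from the course), the least squares estimates of β0 and β1 can be computed directly from their closed-form expressions:

```python
import numpy as np

# Hypothetical example data for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least squares estimates for simple linear regression
beta1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
beta0 = Y.mean() - beta1 * X.mean()

# Residuals ei = yi - y_hat_i and the residual sum of squares (RSS)
Y_hat = beta0 + beta1 * X
residuals = Y - Y_hat
RSS = np.sum(residuals ** 2)

print(beta0, beta1, RSS)
```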

Assumptions:

- SLR assumes a linear relationship between the variables
- SLR assumes that the residuals are normally distributed and have constant variance.

Multiple Linear Regression

Multiple Linear Regression (MLR) is an extension of SLR that uses two or more independent variables to predict a single response variable. ε is the error term, representing the unobserved factors that affect Y but are not included in the model.

Y = β0 + β1X1 + β2X2 + … + βpXp + ε

The goal is to estimate the β coefficients that minimize the sum of squared differences between the observed and predicted values of Y (= RSS). The assumptions that held for SLR still apply here. Instead of fitting a regression line, as in SLR, MLR fits a hyperplane.
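A minimal sketch of fitting such a model by least squares, here using scikit-learn as an assumed tool (the course may use different software), on simulated data with two predictors:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simulated data with p = 2 predictors, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                                    # columns X1, X2
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Least squares fit: chooses beta0..betap so that the RSS is minimized
model = LinearRegression().fit(X, y)
print("intercept (beta0):", model.intercept_)
print("coefficients (beta1, beta2):", model.coef_)
```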




Overfitting might be a problem here. This happens when you include too many predictors without sufficient data: the model then fits the training data closely but fails to generalize to new data.
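One way to see this is to compare training and test error when many irrelevant predictors are added; a hedged sketch on simulated data (scikit-learn assumed):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
n = 40
X_true = rng.normal(size=(n, 1))                   # one informative predictor
noise = rng.normal(size=(n, 30))                   # 30 irrelevant predictors
y = 2.0 * X_true[:, 0] + rng.normal(scale=1.0, size=n)

for X in (X_true, np.hstack([X_true, noise])):     # small vs. bloated model
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
    fit = LinearRegression().fit(X_tr, y_tr)
    print(X.shape[1], "predictors:",
          "train MSE =", round(mean_squared_error(y_tr, fit.predict(X_tr)), 2),
          "test MSE =", round(mean_squared_error(y_te, fit.predict(X_te)), 2))
```

The bloated model typically drives the training error toward zero while the test error gets worse, which is exactly the overfitting pattern described above.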




Difference between confidence and prediction interval?

Confidence and prediction intervals are both statistical concepts used in regression analysis to
provide a range within which a parameter or a future observation is expected to fall.

A confidence interval is used to estimate the range in which we expect the population parameter
(regression coefficient) to fall with a certain level of confidence.

A prediction interval is used to estimate the range within which a future observation (new data
point) is expected to fall.

A confidence interval is usually narrower than a prediction interval: the confidence interval only captures the uncertainty about the estimated population parameter (the mean response), whereas the prediction interval must also account for the random variation of an individual new observation. The uncertainty is therefore larger for the prediction interval.
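As an illustrative sketch, assuming the statsmodels library, both intervals can be read off a fitted OLS model; the prediction-interval columns are the wider ones:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data, for illustration only
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=50)
y = 1.0 + 0.5 * X + rng.normal(scale=1.0, size=50)

ols = sm.OLS(y, sm.add_constant(X)).fit()

# Intervals at a new point x = 5
new = sm.add_constant(np.array([5.0]), has_constant='add')
frame = ols.get_prediction(new).summary_frame(alpha=0.05)
print(frame[["mean_ci_lower", "mean_ci_upper"]])   # confidence interval (mean response)
print(frame[["obs_ci_lower", "obs_ci_upper"]])     # prediction interval (new observation)
```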

Logistic Regression

Logistic regression is a statistical method used for modelling the probability of a binary outcome. Despite its name, it is commonly used for classification problems where the dependent variable has two possible outcomes (= dichotomous). It transforms values between −infinity and +infinity into values between 0 and 1.

The logistic function is used to model the probability that a given input belongs to a particular category: p(X) = e^(β0 + β1X) / (1 + e^(β0 + β1X)).




The outcome of this function is a probability between 0 and 1. To make a binary decision, a threshold is chosen (commonly 0.5): if the predicted probability is above this threshold, the observation is classified as the positive class (1), otherwise as the negative class (0).
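A minimal sketch of the logistic (sigmoid) function and the thresholding step, with made-up coefficient values:

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function: maps (-inf, +inf) to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted coefficients beta0, beta1 and some inputs x
beta0, beta1 = -3.0, 1.2
x = np.array([0.5, 2.0, 4.0])

p = logistic(beta0 + beta1 * x)        # predicted probabilities P(Y = 1 | x)
y_pred = (p >= 0.5).astype(int)        # classify with threshold 0.5
print(p, y_pred)
```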

The log of odds (= logit) function is often used to interpret the results. This equation linearly combines the input features, and the parameters represent the change in the log-odds for a one-unit change in the corresponding feature.

logit(p) = ln(p / (1 − p))

The logit is the natural logarithm of the odds: the probability of the event happening divided by the probability of the event not happening.
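A small sketch of this interpretation, reusing the hypothetical coefficients from the snippet above: the logit is linear in x, and exp(β1) is the factor by which the odds change per one-unit increase in x.

```python
import numpy as np

def logit(p):
    """Log-odds: ln(p / (1 - p)), the inverse of the logistic function."""
    return np.log(p / (1 - p))

beta0, beta1 = -3.0, 1.2               # same hypothetical coefficients as above
x = 2.0

p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
print(logit(p))                        # equals beta0 + beta1 * x (linear in x)
print(np.exp(beta1))                   # odds ratio for a one-unit increase in x
```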

Assumptions:

- Assumes the relationship between the independent variables and the logit of the dependent
variable is linear.

The β parameters are found by maximizing the likelihood function, which measures how likely the observed set of outcomes is for a specific set of parameter values (β). The goal is to find the set of parameter values that maximizes this likelihood. (SLIDE 25)
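A hedged sketch of this idea, maximizing the log-likelihood numerically with scipy on simulated data (the notes only state the principle; the optimizer and data are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data with known coefficients (-0.5, 2.0), for illustration only
rng = np.random.default_rng(3)
x = rng.normal(size=200)
p_true = 1 / (1 + np.exp(-(-0.5 + 2.0 * x)))
y = rng.binomial(1, p_true)

def neg_log_likelihood(beta):
    # Negative log-likelihood of logistic regression: -(sum of y*ln(p) + (1-y)*ln(1-p))
    p = 1 / (1 + np.exp(-(beta[0] + beta[1] * x)))
    p = np.clip(p, 1e-12, 1 - 1e-12)          # avoid log(0) during optimization
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=np.zeros(2))
print(result.x)    # estimated (beta0, beta1), close to the true values
```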




Linear Discriminant Analysis (LDA)

LDA is a dimensionality reduction and classification technique commonly used in the context of supervised learning. Its primary objective is to find a linear combination of features that characterizes or separates two or more classes in the data.




The figure shows two classes with one predictor: house-owners (pink) and non-house-owners (green). The predictor (x-axis) is the amount of money a person has saved in their bank account. When a new data point has to be classified, its class is determined by the nearest class mean.

Assumptions:

- LDA assumes the features are normally distributed within each class
- LDA assumes that the classes have the same covariance (with one predictor, the same standard deviation)

The main difference with logistic regression is that logistic regression does not make assumptions about the distribution of the x's. Also, logistic regression focuses on predicting probabilities while LDA focuses on maximizing the class separability.

Confusion Matrix

A table used in classification to evaluate the performance of a machine learning model. Provides a
summary of the predicted and actual classifications for a given set of data.

                     Actual Positive   Actual Negative
Predicted Positive         TP                FP
Predicted Negative         FN                TN

- TP: True Positive
- FP: False Positive
- FN: False Negative
- TN: True Negative

Metrics derived from the confusion matrix (with N = TP + FP + FN + TN, the total number of observations):

- Accuracy: (TP + TN) / N
- Error: (FP + FN) / N
- Specificity (=TNR): TN / (TN + FP)
- Precision: TP / (TP + FP)
- Sensitivity (=TPR or Recall): TP / (TP + FN)
- FPR: FP / (FP + TN)
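A hedged sketch computing these metrics from hypothetical counts:

```python
# Hypothetical confusion-matrix counts, for illustration only
TP, FP, FN, TN = 40, 10, 5, 45
N = TP + FP + FN + TN

accuracy    = (TP + TN) / N
error       = (FP + FN) / N
specificity = TN / (TN + FP)          # true negative rate
precision   = TP / (TP + FP)
sensitivity = TP / (TP + FN)          # true positive rate / recall
fpr         = FP / (FP + TN)

print(accuracy, error, specificity, precision, sensitivity, fpr)
```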

Receiver Operating Characteristic (ROC) Curve

The ROC curve is a graphical representation that illustrates the performance of a binary classification model at various classification thresholds. It is a tool for evaluating the trade-off between the True Positive Rate (TPR) and the False Positive Rate (FPR). The threshold is varied from 0 to 1; at each threshold the TPR and FPR are calculated, and these pairs are plotted to form the ROC curve.




This plot is summarized using the AUC-ROC metric. It is a single value that summarizes how well a binary classification model distinguishes between the two classes. A higher value indicates better performance: 1 means perfect discrimination, 0.5 suggests random performance.
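A minimal sketch of computing the ROC curve and AUC with scikit-learn (an assumed tool) on hypothetical labels and predicted probabilities:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and predicted probabilities from a classifier
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3, 0.7, 0.5])

fpr, tpr, thresholds = roc_curve(y_true, y_prob)   # (FPR, TPR) pairs over thresholds
auc = roc_auc_score(y_true, y_prob)                # area under the ROC curve

print(auc)   # 1.0 = perfect discrimination, 0.5 suggests random performance
```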
