INTRODUCTION TO MACHINE LEARNING
CONTENT
Introduction
  Terminology
Pre-processing
  Data scaling
  Missing value imputation
  Categorical variables
  Tuning parameters
K-Nearest Neighbours (KNN)
  KNN classifiers
  KNN regressions
  KNN strengths, weaknesses, parameters
  Nearest centroid
Linear regression
  OLS regression
  Solving OLS minimization
  Reducing the magnitude of coefficients
Logistic regression
  Logistic regression
  Multiclass classifications
Neural networks / multi-layer perceptrons
  The neural network (NN) approach
  Popular activation functions
Support vector machines
  Linear support vector machines (SVM)
  Kernel SVMs
  Similarity (kernel) functions
  SVM regression
Naïve Bayes and decision trees
  Naïve Bayes
  Decision trees
  Model complexity
  Decision tree regressions
Ensemble learning, boosting, random forests
  Ensemble learners
  Voting (heterogeneous)
  Bagging (homogeneous)
  Boosting (homogeneous)
  AdaBoost
  Gradient boosting
  Stacking (heterogeneous)
Forest models
  Random forest
Model evaluation and learning with imbalanced data
  Bias + variance + irreducible error
  Evaluate using training and test sets
  Evaluate using cross-validation
  Evaluation of binary classification
  Imbalanced data
Feature engineering
  Working with images
  Transforming text data
  Feature selection
Dimensionality reduction
  PCA (principal component analysis)
  Computing PCA
  PCA in higher dimensions
  Non-negative matrix factorization
  Manifold learning: for data visualization
  t-distributed Stochastic Neighbor Embedding
Clustering
  K-means clustering
  MiniBatchKMeans
  Feature extraction using K-means
  Hierarchical clustering
  Agglomerative hierarchical clustering techniques
  Pros and cons
  Density-based clustering methods
  Density-based spatial clustering of applications with noise (DBSCAN)
  Mixture models
  Gaussian mixture models
  Evaluating clustering results
  Silhouette coefficient
Comparing different models
  Choosing models
  What the models look like
  Examples





INTRODUCTION
Machine learning Extracting knowledge from data, at the intersection of statistics, AI, and computer science.
This is used when we need to make sense of unstructured data.
It is used to predict values or to learn something previously unknown.

SUPERVISED LEARNING
Algorithms that learn from input/output pairs. This is used to automate manual labor.
Given D = {Xᵢ, Yᵢ}, the model will learn F: Xₖ → Yₖ

UNSUPERVISED LEARNING
Only the input data is known, no output data (labels) is provided. It can be useful for outlier detection.
Given D = {Xᵢ}, group/cluster the data into F: Xᵢ → Yⱼ

REINFORCEMENT LEARNING
Reasoning under uncertainty for optimal decisions. How agents should take actions in an environment to maximize a reward.
Given D = {environment (e), action (a), reward (r)}, learn policy and utility functions:
policy F₁: {e, r} → a and utility F₂: {a, e} → r
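To make the supervised case concrete, here is a minimal scikit-learn sketch; the iris dataset and the KNN classifier are illustrative choices, not part of the notes:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# D = {X_i, Y_i}: inputs X with known labels y
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learn F: X -> Y from the input/output pairs...
model = KNeighborsClassifier().fit(X_train, y_train)
# ...and apply F to unseen inputs.
print(model.predict(X_test[:5]))
```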

SEMI-SUPERVISED
Combine supervised and unsupervised models. This is useful when only part of the data is labelled.
e.g. based on past information on spam emails, you can filter new incoming mails into Inbox and Spam.

ACTIVE
Combine supervised and reinforcement models. You get feedback from the model.
e.g. automated speech systems first train on your voice and then start working based on this training.


TERMINOLOGY

Label/class The target variable (𝒚) of an instance (datapoint)
Features/attributes The input data (𝑿). The attribute values are feature values, summarized in a feature vector
Model An equation that links the values of features to the predicted value of the target variable
Generalization When a model can make accurate predictions on unseen data, it can generalize from the
training set to the test set.
Score functions Also called fit statistics or score metrics. These measure how well the model fits the data
Feature selection Reduce the number of predictors by selecting the important ones (dimensionality reduction)
Feature extraction Reduce the number of predictors by means of mathematical operations (PCA)
Structured data Highly organized data, made up of mostly tables with rows and columns
Unstructured data Unorganized data, for example texts, images etc.


Classification Discrete output, it predicts a class label. Train to find decision boundaries to separate classes
Regression Continuous output, it predicts a value. Train to fit the data and describe relations
One vs rest An approach to use binary classification algorithms on multiclass datasets.
A separate model is learned for each class. For predictions, all classifiers are run on the test
point, and the one with the highest score wins (see the sketch after this table).
Pipelines Create a workflow that can execute a sequence of tasks at once
Parameters Variables that are learned during the training of the model
Hyperparameters Variables of which the value is set prior to training the model
Overfitting A model that is too complex for the available data. It is fit too closely to the training set, and
cannot generalize on new data. It also fits to the noise in the training dataset.
It can be avoided by evaluating with separate testing data
Underfitting A model that is too simple for the available data. It will underperform on both the training and
testing sets. Not all aspects and variability in the data are captured by the model.
Dataset size This is intimately tied to model complexity: more data can lead to more complex and
accurate models with a lower risk of overfitting.
Intuition derived from datasets with few features (low-dimensional datasets) might not hold up
in high-dimensional datasets
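
To make one vs rest concrete, here is a minimal scikit-learn sketch; the wine dataset and the logistic-regression base classifier are illustrative choices, not from the course notes:

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_wine(return_X_y=True)  # 3 classes

# One binary logistic regression is fitted per class; at prediction time
# every classifier scores the test point and the highest score wins.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=5000)).fit(X, y)
print(len(ovr.estimators_))  # 3 binary classifiers
print(ovr.predict(X[:5]))    # predicted class labels
```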





Manually crafting decision rules has two major disadvantages:
1. The required logic is specific to a single domain and task.
Slight changes in the task can require a rewrite of the whole system.
2. Designing rules requires a deep understanding of how a decision should be made by a human expert.

Computers and humans resolve problems differently, which can cause issues when designing the rules.
Presenting a lot of data to the computer, after which it can determine the rules by itself, can resolve this issue.



PRE-PROCESSING


DATA SCALING

Machine learning algorithms don’t perform well when the input numerical attributes vary widely in scale.

STANDARD SCALER
This is good for non-skewed data.
x = z = (x − μ) / σ

ROBUST SCALER
Less sensitive to skewed data and outliers. The median value is now indicated by 0.
x = (x − median) / IQR

MIN-MAX SCALER
Shift the data to an interval set by x_min and x_max, usually [0, 1].
x = (x − x_min) / (x_max − x_min)

NORMALIZER
Rows are rescaled such that the norm is 1. This is useful when only the direction of the data matters.
Compute the norm √(Σ elements²) and divide each element by this norm.
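
A minimal scikit-learn sketch of the four scalers above; the toy matrix is made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import (MinMaxScaler, Normalizer, RobustScaler,
                                   StandardScaler)

# Toy feature matrix: rows are instances, columns are features.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 10000.0]])  # the second column contains an outlier

print(StandardScaler().fit_transform(X))   # z = (x - mean) / std, per column
print(RobustScaler().fit_transform(X))     # (x - median) / IQR, per column
print(MinMaxScaler().fit_transform(X))     # maps each column to [0, 1]
print(Normalizer(norm="l2").fit_transform(X))  # rescales each ROW to unit norm
```

Note that the first three work per column (feature), while the Normalizer works per row (instance), which is why it only preserves the direction of each datapoint.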

Univariate transformations Most models perform best with Gaussian-distributed data.
Methods to transform data to Gaussian include Box-Cox and Yeo-Johnson.
Both estimate the best power transformation to get a Gaussian distribution.
Yeo-Johnson can work with negative numbers but is less interpretable.
Binning Separate the values into n categories. All values within one category are replaced
by e.g. the mean. This is effective for models with few parameters (e.g. regression),
but not for models with many parameters (e.g. decision trees)
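
A short sketch of both ideas with scikit-learn; the skewed toy data is illustrative:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer, PowerTransformer

rng = np.random.default_rng(0)
X = rng.exponential(size=(100, 1))  # skewed, strictly positive feature

# Power transformations towards a Gaussian shape.
# Box-Cox needs positive values; Yeo-Johnson also handles zero/negatives.
X_boxcox = PowerTransformer(method="box-cox").fit_transform(X)
X_yeojohnson = PowerTransformer(method="yeo-johnson").fit_transform(X)

# Binning: each value is replaced by the index of the bin it falls into.
binner = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy="quantile")
X_binned = binner.fit_transform(X)
```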


MISSING VALUE IMPUTATION

There are different methods for missing value imputation; the best fit depends on the situation and the data.
Missing value imputation Pre-processing focused on missing values. Missing data is common in the real world.
Imputation replaces the missing value with an estimate for that value.
Common ways are: mean/median, KNN, model-driven, or iterative
Mean imputation The mean value of the column is taken. This is not very precise
KNN imputation The mean of the K nearest neighbors in the remaining columns is taken (more flexible)
Model-driven imputation A regression model predicts which value is expected given the values that are known
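
A minimal sketch of mean and KNN imputation with scikit-learn (the tiny matrix is made up); for the model-driven/iterative variants, scikit-learn also provides IterativeImputer, which at the time of writing still sits behind an experimental import:

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

# Mean imputation: every NaN is replaced by its column mean.
print(SimpleImputer(strategy="mean").fit_transform(X))

# KNN imputation: every NaN is replaced by the mean of that feature over
# the k nearest rows, with distance computed on the observed columns.
print(KNNImputer(n_neighbors=2).fit_transform(X))
```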


CATEGORICAL VARIABLES

Data regularly has categorical/discrete features. It is often necessary to represent these as numbers.
There are two main methods:
1. One-hot encoding
Each category of the initial feature becomes its own dummy feature.
This is popular when the feature has few categories. It does not imply an order within the feature.
2. Count-based encoding
For high-cardinality categorical features (many distinct values). Each category label is replaced
by an aggregate of the variable, e.g. how often that category occurs.
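
A small sketch of both encodings using pandas; the city column is a made-up example:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Tilburg", "Utrecht", "Tilburg", "Breda"]})

# 1. One-hot encoding: one dummy column per category, no implied order.
print(pd.get_dummies(df, columns=["city"]))

# 2. Count-based encoding: replace each category by how often it occurs.
counts = df["city"].value_counts()
print(df["city"].map(counts))
```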


TUNING PARAMETERS

This is never done on the test set.
Use the training set to estimate the coefficients for different values of the hyperparameters.
Use the validation set to pick the best hyperparameter values (e.g. the degree of a polynomial) by evaluating the fit on this second set.
Use the test set to test how well the final model generalizes to unseen data.
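
A minimal sketch of this three-way split with scikit-learn; the iris dataset and a KNN classifier (tuning k instead of a polynomial degree) are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Split off a test set, then split the rest into training and validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval,
                                                  random_state=0)

# Fit on the training set for several hyperparameter values
# and evaluate each candidate on the validation set...
best_k, best_score = None, -1.0
for k in [1, 3, 5, 7]:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if score > best_score:
        best_k, best_score = k, score

# ...and touch the test set only once, at the very end.
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_trainval, y_trainval)
print(best_k, final.score(X_test, y_test))
```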



