Week 1, lecture 1
Video 1, what is machine learning?
● Humans learn things like writing digits; it is difficult to explain to someone step by step how to write the digit 2, for example.
○ From these we infer the general rules without making them explicit.
○ Machine learning studies this process to put it in a computer.
■ Instead of providing a description step by step, we want to have some
way to give the computer a large number of examples.
● So that the computer can figure out its own program without us explicitly stating what the program is.
What makes a suitable ML problem?
● We can’t solve it explicitly.
○ We do not know the program that solves it.
● Approximate solutions are fine.
● Limited reliability, predictability and interpretability are acceptable for the problem.
● Plenty of examples available to learn from.
● Examples of unsuitable problems:
○ For computing taxes, we know the program and rules that explicitly solve it.
■ We do not need to learn them.
○ Clinical decisions are too important, so limited reliability is not fine.
Where do we use ML?
● Inside other software.
○ Unlock your phone with your face for example.
● In analytics, data mining, data science.
○ Find typical clusters of users, predict spikes in web traffic.
● In science/statistics
○ If any model can predict A from B, there must be some relation.
Definitions of ML:
● Machine learning provides systems the ability to automatically learn and improve
from experience without being explicitly programmed.
○ If you take this literally, humans would also count as such a system.
Offline learning
● Separate learning, predicting and acting
○ Take a fixed dataset of examples (aka instances)
○ Train a model to learn from these examples
○ Test the model to see if it works by checking its predictions.
○ If the model works, put it into production.
■ For example, use its predictions to take actions (this workflow is sketched below).
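A minimal sketch of this offline workflow, assuming scikit-learn and one of its built-in toy datasets (neither is part of the lecture; they just stand in for "a fixed dataset of examples"):

```python
# Sketch of offline learning: learn from a fixed dataset, test, then use the model.
# Assumes scikit-learn; the iris toy dataset stands in for the real examples.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Take a fixed dataset of examples (instances with features and a class).
X, y = load_iris(return_X_y=True)

# 2. Train a model on one part of the data, keep the rest aside for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# 3. Test the model by checking its predictions on examples it has not seen.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 4. If it works, "put it into production": use its predictions to take actions.
print("predicted class for a new instance:", model.predict(X_test[:1]))
```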
Problems
● We don’t want to make a specific ML algorithm for every specific problem (like playing chess, driving a car, etc.)
○ We want generic solutions.
● Therefore we abstract the learning task.
○ These are called abstract tasks:
■ Classification
■ Regression
■ Clustering
■ Density estimation etc.
○ Then we develop algorithms for these abstract tasks (see the sketch after this list):
■ Linear models
■ kNN
■ Decision trees etc
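As a rough illustration (assuming scikit-learn and its iris toy dataset, neither mentioned in the lecture), the same abstract classification task can be handed to each of these generic algorithms:

```python
# Sketch: one abstract task (classification), several generic learning algorithms.
# Assumes scikit-learn and its iris toy dataset for illustration.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each algorithm plugs into the same abstract task; only the model changes.
for model in (LogisticRegression(max_iter=1000),   # linear model
              KNeighborsClassifier(),              # kNN
              DecisionTreeClassifier()):           # decision tree
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
```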
Abstract tasks
● First we can divide the abstract tasks into supervised and unsupervised ones.
● Supervised.
○ Explicit examples of input and output.
○ Learn to predict the output for an unseen input.
○ Supervised learning tasks:
■ Classification:
● Assign a class to each example.
■ Regression
● Assign a number to each example.
● Unsupervised
○ Only inputs provided.
○ Find any pattern that explains something about the data (the sketch below contrasts the two settings).
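A minimal sketch contrasting the two settings, again assuming scikit-learn and the iris toy data purely for illustration:

```python
# Sketch: supervised learning gets inputs and outputs, unsupervised only inputs.
# Assumes scikit-learn; the iris data is just a stand-in toy dataset.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised (classification): explicit examples of input X and output y.
classifier = KNeighborsClassifier().fit(X, y)
print("predicted class:", classifier.predict(X[:1]))

# Unsupervised (clustering): only the inputs X; look for structure in the data.
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)
print("cluster labels of the first ten instances:", clusters[:10])
```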
ML vs:
● Artificial Intelligence
○ AI, but not ML: automated reasoning, planning.
● Data science
○ Data science, but not ML: Gathering data, harmonising data, interpreting
data.
● Data Mining
○ More DM than ML:
■ Finding common clickstreams in web logs
■ Finding fraud in transaction networks
○ More ML than DM:
■ Spam classification
■ Predicting stock prices
■ Learning to control a robot
■ ML focuses more on the task than on the data itself.
● Information retrieval
○ The task of a search engine.
○ There is not much overlap, but a task like finding the documents that match a query can be framed as a classification problem.
● Statistics
○ Stats but not ML:
■ analyzing research results, experiment design, courtroom evidence.
○ More ML than stats:
■ Spam classification, movie recommendation.
■ We only care that the predictions are accurate.
● Deep learning
○ Deep learning is a subset of ML.
○ Particular ML techniques.
Video 2, Classification
● We start with some data.
○ The data can be thought of as a large table, containing examples of the sort
of things we want to learn.
■ The numbers in the table are the features: the things we measure about our instances (the ham or spam emails).
○ The dataset is fed to a learning algorithm, which outputs a model.
■ The model is called the classifier because we are doing classification.
■ The model is constructed so that if it sees a new instance, with the
same features as an instance in the data that we fed to the learner,
then the model assigns it to the same class we saw in the data.
● First you have data, which you put in a table.
○ You then pick the features.
○ Then every data snippet becomes an instance with those features.
○ Finally, the table gets a column with the class each instance represents (ham or spam), which is what we want to predict; a minimal sketch of this setup follows below.
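A minimal sketch of this setup, assuming scikit-learn; the two features and the tiny hand-made table below are invented for illustration, not the lecture's actual spam/ham data:

```python
# Sketch: a tiny table of instances (rows), features (columns) and classes.
# Assumes scikit-learn; the features (number of exclamation marks, number of
# times the word "free" appears) are hypothetical, chosen only for illustration.
from sklearn.neighbors import KNeighborsClassifier

# Each row is an instance; the numbers are the features we measured about it.
X = [
    [0, 0],  # calm email, no "free"
    [1, 0],
    [5, 2],  # many exclamation marks, "free" appears twice
    [7, 3],
]
# The extra column of the table: the class of each instance.
y = ["ham", "ham", "spam", "spam"]

# Feed the dataset to a learning algorithm; the output is the model (classifier).
classifier = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# A new instance with features close to ones we saw gets the class we saw.
print(classifier.predict([[6, 2]]))  # expected: ['spam']
```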