Extensive, clear summary of the course 'Introduction to Machine Learning', based on elaborated notes made during the lecture, including graphs and pictures, connected to the lecture slides. The document is divided in 2 parts, the first part concerning the midterm material, the second part concernin...
Why ML?
Evolution of digital data
- Structured Data
Highly organized, made up mostly of tables with rows and columns that define their
meaning.
Excel spreadsheets, relational databases
- Unstructured Data
Everything else
Text messages, audio files (music), video files, images, memes,
illustrations
Data explosion calls for ML → the explosion of unstructured data motivates the need for ML
Volume of data collected grows daily
Need knowledge discovery to make sense of unstructured data
ML → computers learn from data to aid knowledge discovery
1. Supervised Learning
Given a set of data and labels, learn a model which will predict a label for new data
Often used to automate manual labour
→ might annotate part of a dataset manually, learn an ML model from those
annotations, use the model to annotate the rest of your data
Allows you to collect data or produce a data output from previous experience
With a labelled dataset, the model can learn from it to provide the answer to the problem
Classification (assign input data to a particular class/group, e.g. train on images
labelled mango/apple, then classify new images into those categories)
Regression problems (prediction of a continuous value)
Given D = { (Xi, Yi) }, learn a model/function F: Xk -> Yk
X = features (input data)
Y = targets (labels)
Use a function such that X can be used to predict Y
A new person comes in: with a model learned from data on previous patients, we can
predict whether the person is diabetic, for example
Given satellite image, what is terrain in the image?
Xi = pixels in images, Yi = terrain type (water, river, mountain; multiple categories)
Given test results from a patient, will the patient have diabetes?
Xi = test results, Yi = diabetes / no diabetes (binary)
Annotated data = data with labels → needed for supervised ML
Teaching a model → it is trained on a labelled dataset, so it can predict the outcome
for out-of-sample data
Row → one example with its attributes
Needs lots of correctly classified training data, then it should be able to
correctly classify new data → learning instead of memorising
→ we’re trying to build a model to predict an answer or label provided by a teacher
GIVEN LABELS, LEARN TO PREDICT ON NEW DATA
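The diabetes example above can be sketched in a few lines of Python. This is only a minimal illustration, assuming a 1-nearest-neighbour rule as the model F (the notes do not name a specific algorithm); the feature values (glucose level, BMI) and the function names are made up for the sketch.

```python
# Hypothetical toy data: each X_i is (glucose level, BMI), each Y_i is the label.
train_X = [(85, 22.0), (90, 24.5), (160, 31.0), (175, 35.5)]
train_Y = ["no diabetes", "no diabetes", "diabetes", "diabetes"]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def predict(x_new):
    """F: X -> Y, here the label of the closest training example (1-NN)."""
    nearest = min(range(len(train_X)),
                  key=lambda i: squared_distance(train_X[i], x_new))
    return train_Y[nearest]


# New patients that were not part of the labelled data.
prediction_high = predict((170, 33.0))
prediction_low = predict((88, 23.0))
print(prediction_high, "/", prediction_low)
```

The labelled pairs play the role of the 'teacher': the model only copies the label of the most similar past patient, so its quality depends entirely on the annotated data.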
2. Unsupervised Learning
Find all kinds of unknown patterns of data
Discover patterns in data
Given D = {Xi}, group the data into classes Y using a model/function F: Xi -> Yj
Discover trending topics on Twitter or in the news
Grouping data into clusters for easier analysis
Outlier detection (fraud detection and security systems)
Let the model work on its own, on unlabelled data
Learning without training labels
Finding patterns in the world
Learning without ‘a teacher’
→ the world around us is basically providing training labels
TRIES TO FIND PATTERNS IN THE WORLD
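Grouping unlabelled data into clusters can be sketched as follows. This assumes k-means on one-dimensional data as the clustering method (one common choice, not necessarily the one used in the lecture); the data points and starting centres are invented for illustration.

```python
def kmeans_1d(points, centres, iterations=10):
    """Very small k-means: assign each point to its nearest centre, then move the centres."""
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda j: abs(p - centres[j]))
            clusters[nearest].append(p)
        # New centre = mean of its cluster (an empty cluster keeps its old centre).
        centres = [sum(c) / len(c) if c else centres[j]
                   for j, c in enumerate(clusters)]
    return centres, clusters


# Unlabelled data D = {Xi}: no Y is given anywhere.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centres, clusters = kmeans_1d(points, centres=[0.0, 5.0])
print(centres, clusters)
```

No labels are used: the two groups emerge only from how close the points are to each other.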
3. Reinforcement Learning
Reasoning under uncertainty to make optimal decisions
How agents should take actions in an environment to maximize some reward
Given D = {environment (e), actions (a), rewards (r)} learn a policy and utility
functions:
Policy: F1: {e,r} -> a
Utility: F2: {a, e} -> R
AlphaGo: computer first trained on human expert games, then improved by playing against itself
Pepper Robot: taught to interact with children and other humans
Learning through feedback from the machine’s behaviour
Trial-and-error
https://www.youtube.com/watch?v=xtOg44r6dsE
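The policy and utility functions above can be illustrated with a tiny trial-and-error learner. The sketch below assumes tabular Q-learning (a standard method, not named in the notes) on a made-up corridor environment with five cells, where reaching the last cell gives reward 1. All names and parameter values are hypothetical.

```python
import random

N_STATES = 5                     # corridor cells 0..4; cell 4 is the goal
ACTIONS = ["left", "right"]


def step(state, action):
    """Environment: returns (next_state, reward) for an action taken in a state."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=300, alpha=0.5, gamma=0.9, seed=0):
    """Learn the utility table F2: (state, action) -> R by trial and error."""
    rng = random.Random(seed)
    utility = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)              # explore randomly
            next_state, reward = step(state, action)
            best_next = max(utility[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move the estimate towards reward + discounted future value.
            utility[(state, action)] += alpha * (
                reward + gamma * best_next - utility[(state, action)])
            state = next_state
    return utility


def policy(utility, state):
    """Policy F1: pick the action with the highest learned utility in this state."""
    return max(ACTIONS, key=lambda a: utility[(state, a)])


utility = train()
learned_policy = [policy(utility, s) for s in range(N_STATES - 1)]
print(learned_policy)
```

There is no labelled dataset here: the agent collects its own data by acting, and the reward signal alone shapes the learned policy.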
ML = gives machines the ability to learn automatically and improve from experience without
being explicitly programmed
- Supervised learning
Machine learns under guidance: it is fed labelled data that explicitly tells it what the
input is and what the output should be.
Labelled data → the output is fed into the algorithm. It knows the output → either cat or dog.
The inputs are labelled as either cat or dog and have to be put in the category cat or
dog
Aim = forecast outcomes
- Unsupervised learning
Acts without supervision: data is not labelled, there is no guidance. Machine has to figure
out the given data set and find hidden patterns to make predictions about the output.
Association / clustering
Association = discovering patterns in data, finding co-occurrences
Clustering = e.g. targeted marketing: given a list of customers and some info about them,
cluster them based on similarities
Machine is only given input data; the algorithm is not told where to go
Has to find patterns in the data
Machine will be fed images of cats and dogs, and will form two groups, one with cats
and one with dogs
It will just learn what cats/dogs look like and divide them based on this
Aim = discover underlying patterns
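Association, described above as finding co-occurrences, can be sketched by counting which items appear together. The shopping baskets below are invented data, and simple pair counting is only an assumed stand-in for whatever association method the lecture covers.

```python
from collections import Counter
from itertools import combinations

# Hypothetical unlabelled data: each basket is the set of items bought together.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

# Count how often each pair of items occurs in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

most_common_pair, count = pair_counts.most_common(1)[0]
print(most_common_pair, count)
```

The pattern (bread and butter are bought together) is discovered from the data alone; nobody told the algorithm what to look for.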
- Reinforcement
Establish pattern of behaviour
After a while you adapt, learn from experience
Hit-and-trial (trial-and-error) concept → the only way to learn is experience
Agent interacts with the environment by producing actions and discovers errors and
rewards; once the agent is trained, it is ready to predict on new data presented to it
Input itself depends on the actions we take.
Actions are recorded in matrices → memory
Collects data when exploring
No predefined data
Learn series of actions
Other Kinds of Learning
- Semi-Supervised Learning
Supervised + unsupervised learning
Train on some labelled data, then use that model for data without labels
Combines a small amount of labelled data with a large amount of unlabelled data
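One way to combine a few labels with many unlabelled points is self-training: fit a model on the labelled data, label the point it is most sure about, and repeat. The sketch below assumes this scheme with a nearest-centroid model on invented one-dimensional data; it is an illustration, not necessarily the approach from the lecture.

```python
def centroids(examples):
    """Mean position of each class among the (x, label) examples."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def self_train(labelled, unlabelled):
    """Repeatedly label the unlabelled point closest to a class centroid."""
    examples = list(labelled)
    remaining = list(unlabelled)
    while remaining:
        cents = centroids(examples)
        # Most confident point = the one nearest to any current centroid.
        x = min(remaining, key=lambda p: min(abs(p - c) for c in cents.values()))
        label = min(cents, key=lambda y: abs(x - cents[y]))
        examples.append((x, label))        # treat the prediction as a label
        remaining.remove(x)
    return examples


labelled = [(1.0, "A"), (2.0, "A"), (9.0, "B"), (10.0, "B")]   # small labelled set
unlabelled = [1.5, 8.5, 3.0, 7.0]                              # hypothetical unlabelled pool
result = dict(self_train(labelled, unlabelled))
print(result)
```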
- Active Learning
Supervised + reinforcement learning → self-driving cars
Learn on the go
Algorithm can interactively query a user to label new data points with the desired
outputs
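The querying step can be sketched as follows, assuming uncertainty sampling: the algorithm asks for the label of the point it is least sure about, here the one closest to its current decision boundary. The `oracle` function is a hypothetical stand-in for the human user, and the data is invented.

```python
def oracle(x):
    """Stands in for the human annotator (hypothetical rule: true boundary at 5.0)."""
    return "high" if x >= 5.0 else "low"


def boundary(labelled):
    """Midpoint between the largest 'low' and the smallest 'high' example."""
    lows = [x for x, y in labelled if y == "low"]
    highs = [x for x, y in labelled if y == "high"]
    return (max(lows) + min(highs)) / 2


labelled = [(0.0, "low"), (10.0, "high")]      # tiny initial labelled set
pool = [1.0, 3.0, 4.5, 5.5, 7.0, 9.0]          # unlabelled points
queried = []

for _ in range(3):                             # budget: three label queries
    b = boundary(labelled)
    query = min(pool, key=lambda x: abs(x - b))    # most uncertain point
    pool.remove(query)
    queried.append(query)
    labelled.append((query, oracle(query)))    # ask the user for the label

final_boundary = boundary(labelled)
print(queried, final_boundary)
```

Only three labels are requested, all near the boundary; the points far from it are never shown to the user.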
Examples of Applications:
Recommender Systems
Based on past purchases, suggests other items to buy
- Supervised learning
Never-Ending Language Learning
Trying to learn language from a fixed database
Extracts text from web pages; trained on available labelled data → applied to unknown data
- Semi-supervised
Trained on labelled data, to get new
Driverless Cars
- Active learning, supervised and reinforcement
Lights (e.g. traffic lights) to detect → supervised
Have to adapt to environment → reinforcement
Biology
- Unsupervised learning
Cluster amino acids
Could be labelled
- Supervised
Labelled data is not a form of structured data; structured data is data that is already in tables (Excel sheets)
Structured data can be used for both supervised and unsupervised learning
Active
→ if you provide some feedback
Technique applied to supervised learning → uses labelled data
Reinforcement
Does not use labelled data
Provides a reinforcement signal that tells us how good the current outputs of the system being
trained are
Who am I buying these notes from?
Stuvia is a marketplace, so you are not buying this document from us, but from seller saskiakriege. Stuvia facilitates payment to the seller.