Summary Lecture Slides & notes Real-life Machine learning (300363-B-6)

This document contains all the lecture slides and notes of the course 'Real-life Machine learning (300363-B-6)', given at Tilburg University as a premaster for JADS. It is complete and contains everything needed for the exam. Good luck with the course!

December 17, 2023 · 71 pages · 2023/2024

Lecture 1
What are we going to learn today?
- What is machine learning?
- What are supervised and unsupervised machine learning?
- Which are the most common types of machine learning problems?
- What are the basic steps of the CRoss Industry Standard Process for Data Mining (CRISP-DM)?

Machine learning is the field of study that gives computers the ability to learn without being
explicitly programmed.

Machine learning
Assume that you repeat an exercise over and over again.
What should stay constant across the repetitions?
- Learning! Machine learning applies strategies and algorithms, combined with data
and statistics
- Improving! Machine learning applies statistical indices to measure the overlap
between the ML prediction and the expected result
When you do it, it is human learning.
When a machine does it, it is machine learning!
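
The "improving" point can be made concrete with accuracy, one common index of the overlap between predictions and expected results. This is a minimal sketch: the function name and the spam/ham labels are illustrative assumptions, not taken from the lecture.

```python
def accuracy(predicted, expected):
    """Fraction of predictions that match the expected results."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot score an empty set of examples")
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


# Hypothetical labels: 3 of the 4 predictions agree with the expected result.
expected = ["spam", "ham", "spam", "ham"]
predicted = ["spam", "ham", "ham", "ham"]
score = accuracy(predicted, expected)  # 0.75
```

If a model is retrained and this score rises on the same examples, the model has "improved" in the sense used above.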




An example of supervised learning
Supervised learning - classification
Given a labelled dataset, the model learns to
predict the labels of new examples
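
As an illustration of classification, here is a sketch of a 1-nearest-neighbour classifier, chosen only because it is short. The lecture does not name a specific algorithm here, and the animal data and function name are made up for the example.

```python
import math


def predict_1nn(train_points, train_labels, new_point):
    """Return the label of the training point closest to new_point."""
    distances = [math.dist(p, new_point) for p in train_points]
    nearest = distances.index(min(distances))
    return train_labels[nearest]


# Hypothetical labelled dataset: (height_cm, weight_kg) -> animal.
train_points = [(25, 4), (30, 5), (60, 30), (70, 35)]
train_labels = ["cat", "cat", "dog", "dog"]

# New, unlabelled examples are given the label of their nearest neighbour.
label_small = predict_1nn(train_points, train_labels, (28, 6))    # "cat"
label_large = predict_1nn(train_points, train_labels, (65, 28))   # "dog"
```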




An example of unsupervised learning
Unsupervised learning - clustering,
dimensionality reduction, anomaly detection
and novelty detection
Given a dataset without labels, the model
learns, for example, to cluster (group) similar data
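
Clustering can be sketched with a plain k-means loop. This is one possible clustering algorithm, used here as an assumption since the lecture does not name one. The data points and the choice of starting centroids are hypothetical.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its cluster.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids


# Hypothetical unlabelled data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
assignment, centroids = k_means(points, centroids=[points[0], points[3]])
```

No labels are passed in: the grouping in `assignment` comes only from the distances between the points.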

CRISP-DM process model




Business understanding in the CRISP-DM process




Determine business objectives and success criteria
Business objectives and measures to evaluate the results have to be established

Business objectives:
● What is the customer’s primary objective? For example:
● Increase the number of loyal customers
● Sell more of a certain product
● Run a successful marketing campaign

Business success criteria:
● Objective measure to establish success (e.g. return on investment)

Main steps in a data mining project
1. Define the goals:
Business and data mining experts together have to define the goals. For each goal, a
measure must be defined so that its success can be assessed
2. Obtain the models:
Pre-process the data, apply data mining algorithms
3. Evaluate results:
Use the pre-specified measures to evaluate the models
4. Deploy:
If the evaluation is successful, the model can be deployed
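
The four steps above can be sketched end to end on toy data. Everything concrete here is an assumption made for illustration: the churn goal, the 0.70 threshold, the labels, and the majority-class baseline standing in for a real data mining algorithm. Pre-processing is omitted.

```python
from collections import Counter

# 1. Define the goals: a hypothetical goal with a pre-specified success measure.
goal = {"description": "predict customer churn", "measure": "accuracy", "threshold": 0.70}

# Hypothetical labelled data, already split into training and evaluation parts.
train_labels = ["stay", "stay", "churn", "stay", "stay", "churn", "stay", "stay"]
test_labels = ["stay", "churn", "stay", "stay"]


# 2. Obtain the models: a trivial baseline that always predicts the majority class.
def fit_majority_baseline(labels):
    majority = Counter(labels).most_common(1)[0][0]
    return lambda example: majority


model = fit_majority_baseline(train_labels)

# 3. Evaluate results with the pre-specified measure.
predictions = [model(None) for _ in test_labels]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)

# 4. Deploy only if the evaluation is successful.
deploy = accuracy >= goal["threshold"]
```

The point of the sketch is the order of operations: the measure and its threshold are fixed in step 1, before any model exists, and step 4 only checks against them.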

Costs & benefits
Perform a cost-benefit analysis
Compute the benefits of the project (e.g. return on investment)
Compute the costs of the project - main factors:
● Data sources
● Data mining problem to be solved
● Available tools
● Expertise of the development team

Quantify the risk that the project fails:
● Knowledge not available
● Data not available
● Missing tools
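
A minimal sketch of the cost-benefit arithmetic. Return on investment is computed as net gain divided by cost. The risk adjustment assumes, as a simplification not stated in the lecture, that the cost is spent either way and the benefit only materialises if the project succeeds. All figures are hypothetical.

```python
def return_on_investment(benefit, cost):
    """ROI as a fraction: net gain divided by cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost


def expected_net_benefit(benefit, cost, failure_probability):
    """Expected net benefit when the benefit is only realised on success."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    return (1.0 - failure_probability) * benefit - cost


# Hypothetical project figures.
cost = 40_000
benefit = 100_000
roi = return_on_investment(benefit, cost)  # 1.5, i.e. 150%
risk_adjusted = expected_net_benefit(benefit, cost, failure_probability=0.25)  # 35000.0
```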

Quality data & feature engineering
What are we going to learn today?
- What kind of data exists?
- How to prepare data?
- What is data balancing?
- How to apply data cleaning and feature scaling?
- What is feature selection?

What kind of data exists?
- Structured data
- Unstructured data
- Semi-structured data

Structured data
Tabular data (rows and columns) which are very well defined
We know which columns there are and what kind of data they contain (the format is very
strict)
Often such data is stored in databases that represent the relationships between the data as
well. Questions about data can be answered by using a query language.
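
As a sketch of answering a question with a query language, the example below uses SQL through Python's built-in `sqlite3` module. The two tables, their columns, and the rows are hypothetical.

```python
import sqlite3

# In-memory database with two related tables (strict columns and types).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "amount REAL NOT NULL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Bob")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 20.0), (2, 1, 35.0), (3, 2, 10.0)],
)

# Question: how much has each customer spent in total?
rows = conn.execute(
    "SELECT c.name, SUM(o.amount) "
    "FROM customers AS c JOIN orders AS o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
conn.close()
# rows == [("Ada", 55.0), ("Bob", 10.0)]
```

The `customer_id` column is what represents the relationship between the two tables, and the `JOIN` uses it to combine them.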

Unstructured data
The rawest form of data; it can be any type of file.
Extracting value from this kind of data is hard, since you first need to extract structured
features from the data
For example, you might want to extract topics from movies.
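
A small sketch of turning unstructured text into structured features. It counts words in a made-up plot summary. This is a much simpler stand-in for real topic extraction, and the stop-word list and function name are assumptions for illustration.

```python
import re
from collections import Counter

# Tiny, hand-picked stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "his", "her", "is"}


def word_counts(text):
    """Turn free text into a structured feature: word -> count."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


# Hypothetical plot summary standing in for unstructured movie data.
summary = "A detective hunts a killer. The killer taunts the detective."
features = word_counts(summary)
top_terms = features.most_common(2)
```

The resulting counts have a fixed shape (one column per word), so they can be stored and analysed like structured data.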

Semi-structured data
This format is between structured and unstructured data
A consistent format is defined. However, the structure is not very strict. For example, it may
not be tabular, or parts of the data may be missing.
Semi-structured data are often stored as files. However, some kinds of semi-structured data
can be stored in document-oriented databases. Such databases allow you to query the
semi-structured data
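
JSON is a common example of such a format. The sketch below, with hypothetical movie records, shows a consistent format in which some fields are missing, and how to read it defensively into a tabular shape.

```python
import json

# Hypothetical records: same general format, but not every field is present.
raw = """
[
  {"title": "Movie A", "year": 1999, "cast": ["Actor X", "Actor Y"]},
  {"title": "Movie B", "cast": []},
  {"title": "Movie C", "year": 2014}
]
"""

records = json.loads(raw)

# Fields may be absent, so read them with a default instead of indexing directly.
table = [
    {
        "title": r["title"],
        "year": r.get("year"),                # None when the field is missing
        "cast_size": len(r.get("cast", [])),  # 0 when the field is missing
    }
    for r in records
]
missing_year = [row["title"] for row in table if row["year"] is None]
```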

These notes are sold on Stuvia by seller Dee25.
