Real-Life Machine Learning
Lecture 1: Introduction & EDA (Exploratory Data Analysis)
The deadline for the group project is 30 November.
You don’t have to present; just hand in the PowerPoint slides. If they have questions, they can ask the group.
Supervised learning: the examples have labels (e.g. a class).
Unsupervised learning: the examples have no labels.
Dependent variable: the label (what we want to predict).
Independent variables: the features, e.g. color, shape.
Regression always predicts a numerical value. In this example: predicting the price of a house.
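A minimal regression sketch, assuming a single hypothetical independent variable (house size in m²) and the price as the dependent variable. It fits a straight line with ordinary least squares via NumPy; the toy numbers and the helper `predict_price` are invented for illustration.

```python
import numpy as np

# Hypothetical toy data: independent variable (house size in m^2)
# and dependent variable / label (price in thousands of euros).
size = np.array([50.0, 70.0, 90.0, 110.0, 130.0])
price = np.array([150.0, 200.0, 250.0, 300.0, 350.0])

# Design matrix with a column of ones so the model has an intercept.
X = np.column_stack([np.ones_like(size), size])

# Ordinary least squares: find w that minimises ||X w - price||^2.
w, *_ = np.linalg.lstsq(X, price, rcond=None)
intercept, slope = w

def predict_price(m2: float) -> float:
    """Predict a numerical price for a house of the given size (hypothetical helper)."""
    return float(intercept + slope * m2)

print(round(predict_price(100.0), 1))  # 275.0
```

The output is a number, not a class, which is what makes this a regression rather than a classification problem.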
Because we have no labels, we have to cluster the examples.
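As an illustration of clustering without labels, here is a minimal k-means sketch in NumPy. The data points are hypothetical, and the deterministic farthest-first initialisation is an assumption made for reproducibility; common implementations use random or k-means++ initialisation instead.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    # Farthest-first initialisation: start from the first point, then
    # repeatedly add the point farthest from all chosen centroids.
    centroids = [points[0]]
    for _ in range(1, k):
        d = np.linalg.norm(points[:, None, :] - np.array(centroids)[None, :, :], axis=2)
        centroids.append(points[d.min(axis=1).argmax()])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it
        # (keep the old centroid if a cluster became empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical unlabeled data: two well-separated groups.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels, centroids = kmeans(points, k=2)
print(labels)  # [0 0 0 1 1 1]
```

The cluster numbers themselves carry no meaning; only which examples end up together does.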
Some real-world examples of unsupervised learning:
“How can we visualize data points with too many dimensions?”
PCA, ISOMAP, t-SNE.
“Is this credit card transaction fraudulent?”
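For the visualisation question, a minimal PCA sketch using NumPy's SVD is shown below; it projects hypothetical 10-dimensional data onto its two directions of largest variance so it can be plotted in 2D. The synthetic data is an assumption. ISOMAP and t-SNE are non-linear alternatives (available, for example, as `sklearn.manifold.Isomap` and `sklearn.manifold.TSNE` in scikit-learn) and are not implemented here.

```python
import numpy as np

def pca(X, n_components=2):
    # Centre each feature at zero mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = S**2 / (len(X) - 1)
    # Project the data onto the first n_components directions.
    return X_centered @ components.T, explained_variance[:n_components]

rng = np.random.default_rng(42)
# Hypothetical 10-dimensional data that mostly varies along 2 hidden directions.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

X_2d, var = pca(X, n_components=2)
print(X_2d.shape)  # (200, 2)
```

`X_2d` can then be drawn as a scatter plot, for example with matplotlib.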
Data understanding is crucial for understanding your problem. For example, if the features of a dataset have very different scales, you can’t apply clustering directly: the feature with the largest scale dominates the distances and therefore determines the clusters.
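A small sketch of the scale problem, assuming hypothetical customers described by age (years) and income (euros). With raw values, income decides which customer is "nearest"; after standardising each feature to z-scores (one common remedy, used here as an assumption), age counts as well. The helper `nearest` is hypothetical.

```python
import numpy as np

# Hypothetical customers: [age in years, income in euros].
X = np.array([[25.0, 30000.0],
              [60.0, 30500.0],
              [27.0, 32000.0],
              [40.0, 90000.0]])

def nearest(X, i):
    """Index of the point closest to point i (Euclidean distance)."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf  # ignore the point itself
    return int(d.argmin())

# Raw features: income dominates, so customer 0 looks closest to customer 1
# despite the 35-year age gap.
print(nearest(X, 0))  # 1

# Standardise each feature to zero mean and unit variance (z-scores).
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(nearest(X_std, 0))  # 2
```

Since clustering algorithms such as k-means group points by distance, the same effect changes which clusters they find.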
Main steps in a machine learning project:
1. Define the goals:
Business and data mining experts have to define the goals together. For each goal, a measure must be defined to assess its success.
2. Obtain the models:
Pre-process the data and apply mining algorithms.
3. Evaluate the results:
Use the pre-specified measures to evaluate the models.
4. Deploy:
If the evaluation is successful, the model can be deployed.
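The four steps can be sketched end to end as below. Everything here is an assumption for illustration: the goal (mean absolute error below a hypothetical threshold `MAX_MAE`), the synthetic house-price data, and the choice of a linear least-squares model.

```python
import numpy as np

# Step 1 - define the goal and its success measure (hypothetical threshold).
MAX_MAE = 20.0  # goal: mean absolute error below 20 (thousand euros)

def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

# Hypothetical data: house size (m^2) -> price (thousand euros), with noise.
rng = np.random.default_rng(0)
size = rng.uniform(40, 200, size=100)
price = 25 + 2.5 * size + rng.normal(0, 10, size=100)

# Step 2 - obtain the model: split, pre-process, apply a learning algorithm.
# The samples are generated independently, so a simple first-80/last-20 split is fine.
train, test = np.arange(80), np.arange(80, 100)
mean, std = size[train].mean(), size[train].std()

def preprocess(x):
    # Standardise with training statistics only, then add an intercept column.
    z = (x - mean) / std
    return np.column_stack([np.ones_like(z), z])

w, *_ = np.linalg.lstsq(preprocess(size[train]), price[train], rcond=None)

# Step 3 - evaluate with the pre-specified measure on held-out data.
mae = mean_absolute_error(price[test], preprocess(size[test]) @ w)

# Step 4 - deploy only if the evaluation is successful.
deploy = mae < MAX_MAE
print(f"MAE = {mae:.1f}, deploy = {deploy}")
```

Fixing the measure before training (step 1) keeps the evaluation in step 3 honest: the success criterion cannot be adjusted to fit the results.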
AI and ML rely strongly on data, which we will create.
Criteria for the quality of a dataset:
- Missing values: a high number means low quality.
- Number of features: some are useful, but some are not. Are the features correlated?
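Both criteria can be checked quickly with pandas. The sketch below uses a hypothetical house dataset in which `size_ft2` is simply `size_m2` converted to square feet, so it adds no information; a correlation close to 1 exposes this redundancy.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with some missing values and a redundant feature.
df = pd.DataFrame({
    "size_m2":  [50, 70, np.nan, 110, 130],
    "rooms":    [2, 3, 4, np.nan, 5],
    "size_ft2": [538, 753, 969, 1184, 1399],
    "price":    [150, 200, 250, 300, 350],
})

# Criterion 1: fraction of missing values per column (higher = lower quality).
missing = df.isna().mean()
print(missing)

# Criterion 2: pairwise correlations between the features; values close
# to +1 or -1 suggest redundant features.
corr = df.drop(columns="price").corr()
print(corr.round(2))
```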
Main steps of data preparation (transformations)
How do we implement these steps?
We will use Python with pandas, NumPy, seaborn and matplotlib.
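A minimal sketch of typical data preparation transformations in pandas, assuming a hypothetical dataset and a hypothetical helper `prepare`. The specific choices (median imputation, z-score scaling, one-hot encoding) are common options rather than the steps prescribed in the lecture. It also separates the independent variables `X` from the dependent variable `y`.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one numeric feature with a gap, one categorical feature.
df = pd.DataFrame({
    "size_m2": [50.0, 70.0, np.nan, 110.0],
    "color":   ["red", "blue", "red", "green"],
    "price":   [150.0, 200.0, 250.0, 300.0],
})

def prepare(df, target):
    # Split into independent variables (X) and the dependent variable (y).
    X = df.drop(columns=target)
    y = df[target]
    numeric = X.select_dtypes(include="number").columns
    # Impute missing numeric values with the column median.
    X[numeric] = X[numeric].fillna(X[numeric].median())
    # Standardise numeric columns to zero mean and unit variance.
    X[numeric] = (X[numeric] - X[numeric].mean()) / X[numeric].std()
    # One-hot encode the remaining (categorical) columns.
    X = pd.get_dummies(X, dtype=float)
    return X, y

X, y = prepare(df, "price")
print(X.columns.tolist())  # ['size_m2', 'color_blue', 'color_green', 'color_red']
```

seaborn and matplotlib would come in afterwards, for example to plot distributions of the prepared features.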