Data Mining for Business and Governance


Lecture 1: Introduction to Data Mining.................................................................... 1
Pattern classification.................................................................................................................1
Missing values...........................................................................................................................2
Feature scaling.......................................................................................................................... 4
Feature interaction................................................................................................................... 5
Encoding strategies................................................................................................................... 8
Class imbalance.........................................................................................................................9
Practical session 1...................................................................................................10
Formative quiz 1..................................................................................................... 10

Lecture 2: Pattern Classification............................................................................. 15
Rule-based learning................................................................................................................ 15
Bayesian learning.................................................................................................................... 22
Lazy learning........................................................................................................................... 24
Ensemble learning.................................................................................................................. 29
Practical session 2...................................................................................................30
Formative quiz 2..................................................................................................... 30

Lecture 3: Evaluation and Model Selection............................................................ 35
Splitting the data.....................................................................................................................36
Hyperparameter tuning.......................................................................................................... 40
Evaluation measures............................................................................................................... 42
Theoretical concepts...............................................................................................................45
Practical session 3...................................................................................................48
Formative quiz 3..................................................................................................... 49

Lecture 4: Explainable Artificial Intelligence........................................................... 53
Terminology............................................................................................................................ 53
Intrinsically interpretable models: white-box......................................................................... 55
Post-hoc explanation methods - black-box............................................................................. 55
Model-agnostic post-hoc methods......................................................................................... 56
Model-specific post-hoc methods.......................................................................................... 60

Evaluation and measures........................................................................................ 62
Practical session 4...................................................................................................64
Formative quiz 4..................................................................................................... 64

Lecture 5: Dimensionality Reduction Methods.......................................................67
Visualization............................................................................................................................67
Role of dimensions..................................................................................................................69
Dimensionality reduction........................................................................................................71
Feature selection.................................................................................................................... 72
Feature extraction...................................................................................................................75
Principal component analysis................................................................................................. 75
Deep neural networks.............................................................................................................76
Practical session 5...................................................................................................78
Formative quiz 5..................................................................................................... 78

Lecture 6: Cluster Analysis for Data Mining............................................................ 81
Centroid-based clustering.......................................................................................................82
The k-means algorithm (hard clustering)................................................................................82
The fuzzy c-means algorithm (soft clustering)........................................................................ 84
Hierarchical clustering............................................................................................................ 86
Spectral clustering.................................................................................................................. 88
Evaluation measures............................................................................................................... 89
Practical session 6...................................................................................................90
Formative quiz 6..................................................................................................... 90

Lecture 7: Association Rule for Data Mining...........................................................93
Association rules..................................................................................................................... 93
Support and confidence..........................................................................................................94
Mining association rules......................................................................................................... 95
The apriori algorithm.............................................................................................................. 96
Itemset taxonomy................................................................................................................... 98
Practical session 7.................................................................................................100
Formative quiz 7................................................................................................... 100




Lecture 1: Introduction to Data Mining
Pattern classification
In this problem, we have three numerical variables (features) used to predict the outcome or target (decision class). The features are X1, X2, and X3, and the decision class is Y. Y is always a category rather than a number: the classes could be dogs and cats, different food types, etc.
This problem is multi-class since the decision class has three possible outcomes.
The goal in pattern classification is to build a model able to generalize well beyond the historical training data.
How many features can we have in a pattern classification model? Unlimited. Each classification problem can have a different number of features.
How many categories can we have in the decision class variable? Unlimited.
The rows of the table are referred to as instances or observations.


Suppose we need to build a classification model based on this table. The instance marked with ? is not part of the data we train on, and we do not know its target value. The goal is to create a classifier from the data we have that predicts the decision class given X1=0.6, X2=0.8, and X3=0.2.
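
As a minimal sketch of this workflow (the training matrix and class labels below are invented for illustration; only the query point X1=0.6, X2=0.8, X3=0.2 comes from the example), a classifier can be fitted on the labelled instances and then asked to predict the unknown one:

```python
# Minimal sketch: fit a classifier on labelled instances and predict the '?' instance.
# The training matrix and labels are invented; only the query point comes from the example.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X_train = np.array([
    [0.1, 0.9, 0.3],
    [0.7, 0.2, 0.8],
    [0.5, 0.6, 0.1],
    [0.9, 0.1, 0.7],
])
y_train = ["cat", "dog", "cat", "dog"]   # the decision class Y is always categorical

clf = DecisionTreeClassifier().fit(X_train, y_train)
print(clf.predict([[0.6, 0.8, 0.2]]))    # predicted class for the unknown instance
```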





Missing values
Sometimes, we have instances with missing values for some features. In this case, the first column X1 is complete and does not contain any missing values, while X2 and X3 have many missing values.
It is of paramount importance to deal with this situation before building any machine learning or data mining model, as most models cannot handle missing values by themselves.
Missing values might result from fields that are not always applicable, incomplete measurements, or lost values.




Imputation strategies for missing values
1. The simplest strategy would be to remove the feature containing missing values (here, that would mean removing the X2 and X3 columns, which would not solve the problem). This strategy is recommended when the majority of the instances (observations) have missing values for that feature. However, there are situations in which we have few features or the feature we want to remove is deemed relevant.
2. If we have scattered missing values and few features, we might want to remove the
instances having missing values. This is possible when we have large amounts of
instances. However, there are situations in which we have a limited number of instances.



3. The third strategy is the most popular. It consists of replacing the missing values for a given feature with a representative value such as the mean, the median or the mode of that feature (see the sketch after this list). However, we need to be aware that we are introducing noise.
4. Fancier strategies include estimating the missing values with a machine learning model
trained on the non-missing information.
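
A minimal sketch of strategy 3 using scikit-learn's SimpleImputer (the toy matrix below is invented for illustration):

```python
# Minimal sketch: mean/median/mode imputation with scikit-learn's SimpleImputer.
# The toy matrix is invented for illustration; np.nan marks the missing entries.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([
    [0.6, np.nan, 0.2],
    [0.1, 0.9,    np.nan],
    [0.5, 0.4,    0.7],
    [0.8, np.nan, 0.3],
])

# strategy can be "mean", "median" or "most_frequent" (the mode)
imputer = SimpleImputer(strategy="mean")
print(imputer.fit_transform(X))   # missing entries replaced by the column mean
```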


Autoencoders to impute missing values
Autoencoders are deep neural networks that involve two neural blocks, named the encoder and the decoder. The encoder reduces the problem dimensionality, while the decoder completes the pattern. They use unsupervised learning to adjust the weights that connect the neurons.
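
A minimal sketch of this idea, assuming PyTorch (the layer sizes, toy data and imputation step are illustrative, not the lecture's exact setup):

```python
# Minimal sketch of an autoencoder used for imputation (PyTorch assumed; sizes and data are toy).
import torch
import torch.nn as nn

n_features = 3
model = nn.Sequential(
    nn.Linear(n_features, 2), nn.ReLU(),   # encoder: compress to 2 dimensions
    nn.Linear(2, n_features),              # decoder: reconstruct all features
)

X = torch.rand(100, n_features)            # toy complete instances to learn from
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for _ in range(200):                       # unsupervised: the target is the input itself
    optimizer.zero_grad()
    loss = loss_fn(model(X), X)
    loss.backward()
    optimizer.step()

# To impute, fill the missing entry with a placeholder (e.g. the column mean),
# pass the instance through the autoencoder, and read off the reconstructed value.
x_incomplete = torch.tensor([[0.6, 0.5, 0.2]])   # 0.5 stands in for a missing X2
print(model(x_incomplete))
```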


Missing values and recommender systems
The input presents three possible states: the person likes the movie, dislikes the movie, or has not watched it / has not expressed interest (these are the missing values). The neural network takes the information we know (likes and dislikes) and is later able to provide a value for the missing entries, producing recommendations for viewers.





Feature scaling
Normalization
- Different features might encode different measurements and scales (e.g. the age and the height of a person).
- Normalization encodes all numeric features in the [0,1] scale.
- We subtract the minimum from the value to be transformed and divide the result by the feature range: x' = (x − min) / (max − min).

Standardization
- This transformation method is similar to normalization, but the transformed values might not lie in the [0,1] interval.
- We subtract the mean from the value to be transformed and divide the result by the standard deviation: z = (x − μ) / σ.
- Normalization and standardization might lead to different scaling results.


Similarities and differences
Both methods are applied to the whole column of a dataset.
Both methods can be used to put every feature on the same scale.
Both methods do not change the properties of the data, only the scale.


Normalization always maps to the [0,1] scale: in picture (b) the values are confined to the square from 0 to 1.
Standardization does not necessarily produce values in the [0,1] scale: in picture (c) the values range from -1 to +2.
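
A minimal sketch comparing the two scalers with scikit-learn (the toy age column is invented for illustration):

```python
# Minimal sketch: min-max normalization vs. z-score standardization of one column.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

ages = np.array([[18.0], [25.0], [40.0], [63.0]])    # invented toy data

print(MinMaxScaler().fit_transform(ages).ravel())     # values confined to [0, 1]
print(StandardScaler().fit_transform(ages).ravel())   # mean 0, std 1; may fall outside [0, 1]
```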



Feature interaction
Correlation between two numerical variables
Sometimes, we need to measure the correlation between numerical features describing a
certain problem domain. For example, what is the correlation between gender and income in
country x?




Pearson’s correlation
- It is used when we want to determine the correlation between two numerical variables given k observations:
  r = Σ(x_i − x̄)(y_i − ȳ) / √(Σ(x_i − x̄)² · Σ(y_i − ȳ)²)
- It is intended for numerical variables only, and its value lies in [-1,1].
- The order of the variables does not matter since the coefficient is symmetric.




Correlation between age and glucose levels




Example from the first column:
(x_i − x̄)(y_i − ȳ) = (43 − 41.16)(99 − 81) = 33
(x_i − x̄)² = (43 − 41.16)² = 3.36
(y_i − ȳ)² = (99 − 81)² = 324

Pearson’s correlation (R) is 0.53, which tells us it is a medium positive correlation (medium because it is not close to 1, which would be a perfect positive correlation).
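
A minimal sketch of this computation with scipy; the age and glucose vectors below are an assumed reconstruction of the table (they are consistent with x̄ ≈ 41.16, ȳ = 81 and r ≈ 0.53 quoted above, but are not copied from the document):

```python
# Minimal sketch: Pearson's correlation between age and glucose level.
# The vectors are an assumed reconstruction consistent with the means and r quoted above.
import numpy as np
from scipy.stats import pearsonr

age     = np.array([43, 21, 25, 42, 57, 59])
glucose = np.array([99, 65, 79, 75, 87, 81])

r, p = pearsonr(age, glucose)
print(round(r, 2))   # ~0.53; symmetric, so pearsonr(glucose, age) gives the same r
```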


Association between two categorical variables
Sometimes, we need to measure the association degree between two categorical (ordinal or
nominal) variables.


The χ² association measure (chi-square statistic)
- It is used when we want to measure the association between two categorical variables given k observations.
- We compare the frequencies of values appearing together with their individual frequencies.
- The first step in that regard is to create a contingency table.



- Let us assume that a categorical variable X involves m possible categories while Y involves n categories.
- The observed value O_ij gives how many times each combination of categories was found.
- The expected value E_ij is the product of the individual frequencies divided by the number of observations.
- The statistic then compares the two: χ² = Σ (O_ij − E_ij)² / E_ij.




Association between gender and eye color
If we have some data, the first step is to build a contingency table. It contains information concerning the two categorical features and their categories. The numbers inside the table are frequency values; the numbers outside the table are the sums of the columns and rows. There are two categorical variables such that the first one has n=2 categories and the second one has m=3 categories.




We have 26 males, of which 6 have blue eyes, 8 have green eyes and 12 have brown eyes. The number of people with blue, green and brown eyes is 15, 13 and 22, respectively.










We have 24 females, of which 9 have blue eyes, 5 have green eyes and 10 have brown eyes. The number of people with blue, green and brown eyes is 15, 13 and 22, respectively.
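
A minimal sketch of the χ² computation with scipy, using the contingency table from this example (rows: male, female; columns: blue, green, brown):

```python
# Minimal sketch: chi-square association between gender and eye colour,
# using the contingency table from the example above.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([
    [6, 8, 12],   # 26 males:   blue, green, brown
    [9, 5, 10],   # 24 females: blue, green, brown
])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(expected)        # e.g. E[male, blue] = 26 * 15 / 50 = 7.8
print(chi2, dof)
```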









Encoding strategies
Encoding categorical features
We have different types of features: numerical and categorical. In the case of numerical features, we can express them all on the same scale (with either normalization or standardization). In the case of categorical features, we cannot feed a machine learning algorithm with raw categories. Therefore, we need to encode these features as numerical quantities.


The first strategy is referred to as label encoding and consists of assigning an integer number to each category. It only makes sense if there is an ordinal relationship among the categories, for example weekdays, months, star-based hotel ratings, or income categories.


One-hot encoding
It is used to encode nominal features that lack an ordinal relationship.
Each category of the categorical feature is transformed into a binary feature, and a one marks the category the instance belongs to.
This strategy often increases the problem dimensionality notably, since each feature is encoded as a binary vector.
We have three instances of a problem aimed at classifying animals given a set of features (not shown for simplicity).
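
A minimal sketch of both strategies with pandas (the three animal instances are invented for illustration):

```python
# Minimal sketch: label encoding vs. one-hot encoding of a nominal feature.
import pandas as pd

df = pd.DataFrame({"animal": ["dog", "cat", "bird"]})   # invented three-instance example

# Label encoding: integer codes (only meaningful for ordinal categories)
df["animal_label"] = df["animal"].astype("category").cat.codes

# One-hot encoding: one binary column per category, a 1 marks the category
one_hot = pd.get_dummies(df["animal"], prefix="animal")
print(pd.concat([df, one_hot], axis=1))
```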



