Data Mining Summary
Module 1 - Introduction and preliminaries
Pattern Classification
There are three numerical variables (features) used to
predict the outcome (decision class). Y is always a
category, not a number: the features describe the problem
and the output is the target.
This problem is multi-class since we have three possible
outcomes.
The goal in pattern classification is to build a model that
generalizes well beyond the historical training data, so that
it can predict the outcome for new, unseen instances (such
as the last row of the example).
Missing values
Sometimes, we have instances that have missing values
for some features.
It is of paramount importance to deal with this situation
before building any machine learning or data mining model.
The more information / data we have, the easier the problem
is to solve; when we delete missing values, we lose information.
Missing values might result from fields that are not always
applicable, incomplete measurements, or lost values.
Imputation strategies for missing values
Remove features
The simplest strategy would be to remove the feature containing missing values. This
strategy is recommended when the majority of the instances (observations) have missing
values for that feature.
→ However: there are situations in which we have only a few features, or the feature we
want to remove is deemed relevant.
Remove instances
If we have scattered missing values and a few features, we might want to remove the
instances having missing values.
→ However: There are situations in which we have a limited number of instances.
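Both removal strategies can be sketched in a few lines of pandas; the toy dataset and column names below are made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy dataset with scattered missing values (NaN).
df = pd.DataFrame({
    "age":    [25, 30, np.nan, 40],
    "height": [1.80, np.nan, 1.65, 1.75],
    "income": [np.nan] * 4,          # almost entirely missing
})

# Strategy 1: remove a feature (column) that is mostly missing.
df_no_feature = df.drop(columns=["income"])

# Strategy 2: remove instances (rows) that still contain missing values.
df_no_instances = df_no_feature.dropna(axis=0)

print(df_no_instances.shape)  # (2, 2): only two complete rows remain
```

Note how aggressive this is: starting from four instances and three features, only two complete instances survive, which is exactly the drawback the text warns about.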
Replacing missing values
The third strategy is the most popular. It consists of replacing the missing values for a given
feature with a representative value such as the mean, the median or the mode of that
feature.
→ However: we need to be aware that we are introducing noise.
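A minimal sketch of this replacement strategy with pandas (made-up data): the mean for a numeric feature, the mode for a categorical one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":   [25.0, np.nan, 35.0, 40.0],
    "color": ["red", "blue", None, "blue"],
})

# Numeric feature: replace with the mean (or use .median()).
df["age"] = df["age"].fillna(df["age"].mean())

# Categorical feature: replace with the mode (most frequent value).
df["color"] = df["color"].fillna(df["color"].mode()[0])

print(df["age"].tolist())  # the NaN becomes the mean of 25, 35 and 40
```

Every imputed value is identical to the others, which is the "noise" the caveat above refers to: the variance of the feature is artificially reduced.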
Fancier estimation strategies
These include estimating the missing values with a machine learning model trained on the
non-missing information.
Autoencoders to impute missing values
Autoencoders are deep neural networks that involve two neural blocks named encoder and
decoder. The encoder reduces the problem dimensionality, while the decoder reconstructs
the original input from that compressed representation.
→ They use unsupervised learning to adjust the weights that connect the neurons.
Unsupervised learning is a type of machine learning paradigm where the model learns
from unlabeled data without specific guidance or supervision in the form of labeled outcomes
or target variables. In other words, the algorithm explores patterns, structures, or
relationships within the data without knowing the correct answers in advance.
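A rough sketch of the idea using scikit-learn's `MLPRegressor` as an undercomplete autoencoder (hidden layer smaller than the input, trained with the input as its own target); the synthetic data and the one-step imputation scheme are illustrative assumptions, not the exact method from the lecture.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)  # feature 3 correlated with feature 0

# Undercomplete autoencoder: 2 hidden units for 4 inputs,
# trained to reconstruct its own input (target = input, i.e. no labels).
ae = MLPRegressor(hidden_layer_sizes=(2,), max_iter=2000, random_state=0)
ae.fit(X, X)

# Impute: initialise the "missing" entry with the column mean,
# then replace it with the autoencoder's reconstruction.
row = X[0].copy()
row[3] = X[:, 3].mean()                    # pretend feature 3 was missing
reconstruction = ae.predict(row.reshape(1, -1))[0]
imputed = reconstruction[3]
```

Because the target is the input itself, no labeled outcome is needed, which is exactly the unsupervised setting described above.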
Missing values and recommender systems
In recommender systems, the user-item rating matrix consists mostly of missing values:
predicting (imputing) those missing ratings is the core task itself.
Feature scaling
Normalization
Different features might encode different measurements and
scales (e.g. the age and height of a person).
Normalization allows encoding all numeric features in the [0,1]
scale.
We subtract the minimum from the value to be transformed and
divide the result by the feature range.
This is applied to every number in the column.
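The steps above can be written out directly; the ages below are made-up example values.

```python
import numpy as np

ages = np.array([20.0, 30.0, 40.0, 60.0])

# x' = (x - min) / (max - min), applied to every value in the column
normalized = (ages - ages.min()) / (ages.max() - ages.min())

print(normalized)  # → [0.   0.25 0.5  1.  ]
```

The minimum always maps to 0 and the maximum to 1, so every feature ends up on the same [0, 1] scale.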
Standardization
This transformation method is similar to the normalization, but the
transformed values might not be in the [0,1] interval.
We subtract the mean from the value to be transformed and
divide the result by the standard deviation.
Normalization and standardization might lead to different scaling
results.
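The same made-up ages, standardized instead of normalized, show both points: the result is centred around 0 but is not confined to [0, 1].

```python
import numpy as np

ages = np.array([20.0, 30.0, 40.0, 60.0])

# z = (x - mean) / std
standardized = (ages - ages.mean()) / ages.std()

print(standardized.mean())  # ~0: standardized values are centred at zero
print(standardized.min())   # negative: clearly outside the [0, 1] interval
```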
Normalization versus standardization
Different scales can be put together in one scale with these procedures.
Differences and similarities:
- Difference: with standardization, the transformed values have no fixed boundaries, whereas normalization maps them into [0,1].
- Similarity: both methods change the scale of the data, but not the shape of its distribution.
Which one to use? It depends, but you have to pay attention to the scale.
Feature interaction
Correlation between two numerical variables
Sometimes, we need to measure the correlation between numerical features describing a
certain problem domain.
→ E.g. What is the correlation between gender and income in Sweden?
Correlation measures the strength of the linear relationship between two variables. With
zero correlation, no linear relation is visible.
Pearson’s correlation
It is used when we want to determine the correlation
between two numerical variables given k observations.
It is intended for numerical variables only and its value
lies in [-1,1].
The order of variables does not matter since the
coefficient is symmetric.
Example
A coefficient of 0.53 indicates a medium, positive
correlation; how meaningful that is
depends on the problem domain.
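A small numeric sketch (invented observations) showing both the coefficient and its symmetry:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Pearson's r: covariance divided by the product of standard deviations
r = np.corrcoef(x, y)[0, 1]
print(round(r, 2))  # → 0.8, a strong positive linear relationship

# Swapping the variables gives the same value: the coefficient is symmetric
r_sym = np.corrcoef(y, x)[0, 1]
```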