Summary Advanced Data Analysis class - For open book exam (content table with links to pages)

Summary of the course Advanced Data Analysis, made for the open-book exam. It contains a table of contents with clickable links that take you to the exact page, and it covers all theory classes plus notes taken during the classes.

  • 4 December 2022
  • 34 pages
  • 2021/2022
  • Summary

Seller: e18

Summary Advanced Data Analysis
Content table
1. Introduction
  Big data
  Data volume
  Data velocity
  Data variety
  Data veracity
  Data
  Attribute values
  Attribute types
  Properties of attributes
  Discrete vs. Continuous
  Dataset types
  Record data
  Graph
  Ordered data
  Data mining
  Definitions
  Statistics
  Data mining & Statistics
  Challenges in Data mining
  Tasks
  Supervised classification
  Applications
  Unsupervised classification
  Overview

2. Processing principles
  Common steps
  Feature extraction
  Attribute transformation
  Discretization
  Aggregation
  Noise removal
  Outlier removal
  Sampling
  Simple Random Sampling
  Stratified Sampling
  Handling duplicate data
  Handling missing values
  Dimensionality reduction
  PCA
  Feature subset selection
  Feature creation
  Processing steps for specific data types
  Image data
  Survey data
  Sequence data
  Text
  Category/Ontologies
  Bag of words
  Omics
  Genomics
  Transcriptomics
  Metagenomics
  Proteomics
  Metabolomics
  Conclusion

3. Unsupervised clustering
  Definitions
  Introduction
  Clustering
  Similarities
  Distance measures
  Measure similarity
  Dendrogram
  Hierarchical clustering
  Determination of distance
  Partitional clustering

4. Principal component analysis
  Data & basic variable statistics
  Multivariate data
  Basic variable statistics
  Data transformation
  Normalization
  Comparison between variables
  Covariance
  Correlation
  Data projection
  Principal component analysis (PCA)
  t-SNE

5. Supervised learning
  Linear classifier
  Binary classification
  Support vector machines (SVMs)
  Classification overview
  Predictive accuracy
  Class labels
  Thresholds and accuracy
  Linear threshold
  ROC curve
  PR curve
  ROC vs PR curves
  Nearest neighbour classifier
  K-nearest neighbour (KNN) algorithm

6. Regression
  Simple linear regression
  Multiple linear regression
  Best fit & objective function
  Non-linear regression
  Problems
  Overfitting
  Speed & scalability
  Interpretability
  Robustness
  Regularized regression
  Elastic net
  Common approach

7. Machine learning methods
  Classification
  Algorithms
  Decision tree
  Choosing features
  Gini impurity
  Advantages
  Disadvantages
  Example Decision Tree
  Random forest
  Bootstrapping
  Bagging
  Out-of-bag performance
  Gini importance
  Example Random Forest
  Neural networks & deep learning
  Neurons
  Neural network
  Perceptron
  Artificial Neural Networks
  Deep learning
  Performance
  Google DeepMind



