Data Science summary. I made this summary to study for the Data Science course taught in the third year, based on the course material from Universiteit Leiden. It consists of an extensive summary plus key concepts (an even more compact version of the same material).
Extensive summary per lecture + key concepts (a shorter summary of the summary)
Data Science lecture 1
Research Paradigms
Data Challenges
Application domain
Task definition questions
Supervised vs Unsupervised
Addressing data science problems:
Mean vs Median
Outliers
Regression
Simple linear regression
Multiple linear regression
Logistic Regression
Loss functions
Sigmoid
Lecture 6
Data collection
Using Existing labelled data
Create new labelled data
Inter-rater agreement
Interpretation of Cohen’s Kappa
Lecture 7
Data Preparation
Feature extraction
Dense vs Sparse data
Text Classification
Traditionally
Preprocessing: Raw text to features
Clean up and normalisation
Tokenization
Pre-processing with NLP tools
Feature creation
Image to matrix
Image feature extraction
Convolutional neural networks
Need to knows
Image preprocessing
Jesse de Gans
Lecture 8
Choosing models and methods
Choosing supervised vs Unsupervised:
Choosing between classification, clustering or regression:
Decide on features
Choosing the right estimator
Supervised Classification models
Transfer learning
Transfer learning for images
Transfer learning for text
Lecture 9
Feature normalisation
Scaling numerical features
Dimensionality reduction
PCA (Principal Component Analysis)
Significance testing
Which test to use
Lecture 10
Natural Language processing
Text data challenges
Zipf's law
Bag-of-words model: Text as classification object
Words (terms) as features
Computing term weights (real valued)
Term frequency (tf)
Inverse document frequency (idf)
Tf-idf (term frequency-inverse document frequency)
Term-document matrix
Words and polysemy
Word embeddings
Learning word embeddings
Neural language models
Application of transfer learning to image and text data
Lecture 11
Evaluation of classification
Evaluation for regression
Confusion matrices
Error analyses
Dimensionality reduction
Class imbalance
Machine learning
Hyperparameter optimization
Lecture 12
Big data
Responsible data science
Risks and opportunities
Explainable models
Key concepts