Summary

Summary of all Datacamp modules for Data Science Skills with import codes and steps to perform analysis (325243-M-6)

Author: jesmen12


Institution
Tilburg University (UVT)

This document contains a summary of all DataCamp modules with all important code, functions, methods and steps to perform certain analyses. Useful for practicing before the exam.


Uploaded on 23 June 2023
Number of pages: 42
Written in 2022/2023
Type: Summary

Total summary of Data Science skills
Contents
Chapter 1: Introduction
Chapter 2: Intermediate Python
Chapter 3: DataFrames
Chapter 4: Supply chain Analytics in Python
Chapter 5: Cleaning data in Python
Chapter 6: Cluster analysis
Chapter 7: Machine learning with scikit-learn (model testing)
Chapter 8: Linear classifiers


Chapter 1: Introduction

Python basics: Variables, types
Lists and sublists: Indexes, slicing, manipulating/changing, functions, methods
NumPy: 2D arrays, statistics, generating data

Chapter 2: Intermediate Python

Matplotlib: Basic plots, customization
Dictionaries & pandas: Create, indexes, slicing, manipulating/changing, loc, iloc
Logic, control flow & filter: Operators, Booleans, (logical_)and/or/not, if/else, filtering
Loops: While loop, for loop, iterrows

Chapter 3: DataFrames

Transforming: Methods, sorting and subsetting, manipulating/changing
Summary statistics: .agg method, counting, group by, pivot tables
Slicing and indexing: Indexing, outer/inner indexing, loc/iloc, sorting
Visualizing DataFrames: Basic plots, missing values, creating DataFrames

Chapter 4: Supply chain Analytics in Python (Linear optimizing)

Optimization & PuLP: Linear programming, decision variables, lpSum
PuLP modelling: List comprehension, logical constraints
Solve and evaluate model: Common mistakes, solve PuLP model, sanity checking
Sensitivity analysis: Shadow pricing, testing solution

Chapter 5: Cleaning data in Python

Data problems: Type constraints, numeric or categorical, range constraints, duplicates
Categorical problems: Inconsistency, cleaning text data
Advanced problems: Uniformity, date formatting, cross-field validation, missing data
Record linkage: Comparing strings, edit distance, generating pairs, linking DataFrames

Chapter 6: Cluster analysis

Introduction clustering: Unsupervised learning, labelled/unlabelled data, hierarchical clustering, k-means
Hierarchical clustering: Different methods, cluster labels, visualize clusters, limitations
K-means: Generate & cluster labels, elbow method, limitations, seeds
Real world clustering: Images, document clustering, clustering multiple features

Chapter 7: Machine learning with scikit-learn (model testing)

Classification: Supervised learning, exploratory data analysis, nearest neighbours, predicting, model performance (train/test split)
Regression: Linear regression, cross validation, regularization
Fine-tuning model: Class imbalance, confusion matrix, ROC curve, probabilities, hyperparameters
Pre-processing & pipelines: Dummy variables, missing data, pipeline, centring & scaling

Chapter 8: Linear classifiers

Logistic regression & SVM: Fitting and predicting, model evaluation, LinearSVC, boundaries
Loss functions: Linear classifiers, predictions, least squares, loss function diagrams
Logistic regression: L1 & L2 regularization, training accuracy, probabilities, multi-class regression
Support vector machines: Support vectors, decision boundaries, kernel SVMs, comparing


Chapter 1: Introduction
Variables and types
• Variables: variable = value
• Types: type(variable) returns the type (float, int, str, bool, list, dict, etc.)
> Operators behave differently depending on the type: + adds numbers but concatenates strings.
> When working with different types -> convert where necessary before using operators (e.g. with str(), int(), float()).
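
A minimal sketch of the points above. The variable names and values are hypothetical examples, not taken from the original summary.

```python
# Hypothetical example values, chosen only for illustration.
savings = 100        # int
growth = 1.1         # float
label = "savings"    # str

print(type(savings))   # <class 'int'>
print(type(growth))    # <class 'float'>
print(type(label))     # <class 'str'>

# The + operator adds numbers but concatenates strings.
total = savings + 50            # 150
text = label + " account"       # 'savings account'

# "I have " + savings would raise a TypeError (str + int),
# so convert explicitly before using the operator.
message = "I have " + str(savings) + " euros"
number = int("42") + 1          # 43
```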

Python lists
• Lists: store small amounts of one-dimensional data; the elements can have different types.

• Sublists: a list can contain other lists (sublists).

> But arithmetic operators do not work element-wise on lists: + concatenates, * repeats, and - and / raise an error.

Subsetting lists and slicing (indexes)
• Element: a value stored in the list.
• Index: the position of an element in the list; indexing starts at 0.

• Slicing: select multiple elements from a list, which creates a new list.
> [start:end] -> start is included, end is excluded!
> Omit start to begin at index 0, omit end to run to the end of the list: [:4] & [5:]
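
The indexing and slicing rules can be sketched as follows. The contents of fam are an assumption: the list is reconstructed so that it agrees with the indexes mentioned elsewhere in this summary ("emma" at index 2, "mom" at index 4, dad's height at index 7).

```python
# Assumed contents of fam (names and heights in metres).
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

first = fam[0]       # 'liz'  (indexing starts at 0)
last = fam[-1]       # 1.89   (negative indexes count from the end)

middle = fam[3:5]    # [1.68, 'mom']  (index 3 included, index 5 excluded)
head = fam[:4]       # ['liz', 1.73, 'emma', 1.68]  (start omitted -> from index 0)
tail = fam[5:]       # [1.71, 'dad', 1.89]          (end omitted -> to the end)
```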

Subsetting lists of lists
x[row][column] -> e.g. x[2][:2] returns ['g', 'h']
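
A short sketch of subsetting a list of lists. The definition of x is an assumption, chosen so that x[2][:2] gives the result stated above.

```python
# Assumed 3x3 list of lists.
x = [["a", "b", "c"],
     ["d", "e", "f"],
     ["g", "h", "i"]]

row = x[2]           # ['g', 'h', 'i']  (first index selects the sublist)
item = x[2][0]       # 'g'              (second index selects within that sublist)
part = x[2][:2]      # ['g', 'h']       (slicing works on the selected sublist)
```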

Manipulations (change the list)
1. Change/replace: fam[7] = 1.86 -> changes the height of dad
2. Change slice: fam[0:2] = ["lisa", 1.74] -> replaces the elements at index 0 and 1 with new values
3. Add/extend: fam + ["me", 1.79] -> returns a new list with "me" and 1.79 added at the end; assign the result to keep it
4. Remove: del(fam[2]) -> removes "emma" from the list
> Watch out: the indexes of the elements after the removed one have now changed!
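
The four manipulations can be tried in sequence. This is a sketch that assumes fam holds the names and heights shown in the first line; the name fam_ext is hypothetical.

```python
# Assumed starting contents of fam.
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]

fam[7] = 1.86                   # 1. replace one element
fam[0:2] = ["lisa", 1.74]       # 2. replace a slice
fam_ext = fam + ["me", 1.79]    # 3. + builds a new list; fam itself is unchanged
del fam[2]                      # 4. remove "emma"

# After the deletion every later element moves one index to the left:
# fam is now ['lisa', 1.74, 1.68, 'mom', 1.71, 'dad', 1.86]
print(fam.index("mom"))         # 3 (was 4 before the deletion)
```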

How lists work
> The list itself is stored in your computer's memory; the variable only holds a reference to it.
> To create a new variable containing a copy of an existing list, do not simply assign the existing variable (y = x): both names would then refer to the same list.
> Create a new list instead: x = ['a', 'b', 'c'] and then y = list(x) or y = x[:]
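
The difference between copying a reference and copying the list shows up as soon as one of the two is changed. A minimal sketch (the names x2, y2 and y3 are hypothetical):

```python
# Plain assignment copies the reference, not the list.
x = ["a", "b", "c"]
y = x
y[1] = "z"
# x is now ['a', 'z', 'c'] as well, because x and y point to the same list.

# list(...) or [:] creates a new, independent list.
x2 = ["a", "b", "c"]
y2 = list(x2)
y3 = x2[:]
y2[1] = "z"
y3[2] = "q"
# x2 is still ['a', 'b', 'c'].
```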

Functions and packages
• Function: a piece of reusable code, aimed at solving a particular task (on a certain variable, for example).
> function(variable)
> Examples: type(), print(), str(), int(), float(), bool(), max(), round(), len(), sorted()
> Use help(function) to see the documentation: how the function works and which inputs it takes.
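
A few of the built-in functions listed above, applied to a hypothetical list of heights:

```python
# Hypothetical example data.
heights = [1.73, 1.68, 1.71, 1.89]

tallest = max(heights)                      # 1.89
count = len(heights)                        # 4
rounded = round(1.68, 1)                    # 1.7 (second argument: number of decimals)
descending = sorted(heights, reverse=True)  # [1.89, 1.73, 1.71, 1.68]

# help(round) would print the documentation of round(), including its arguments.
```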

• Methods: functions that belong to objects; which methods are available depends on the type of the object.
> object.method(argument)
• List methods syntax: list.method(element) example: fam.index("mom") returns 4
• String methods syntax: string.method() example: sister.capitalize() returns 'Liz'
Strings:                            Lists:
len(x) (a function, not a method)   x.append(element)
x.upper()                           x.remove(element)
x.capitalize()                      x.index(element)
x.replace(old, new)                 x.sort()

> Methods with the same name can behave differently depending on the type of the object, and some methods change the object they are called on (e.g. append) while others return a new object (e.g. upper).
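
A sketch of list and string methods side by side, assuming the fam list and the sister string hold the values shown below:

```python
# Assumed example values.
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
sister = "liz"

mom_index = fam.index("mom")          # 4
name = sister.capitalize()            # 'Liz'
shout = sister.upper()                # 'LIZ'
renamed = sister.replace("z", "sa")   # 'lisa'

# String methods return a new string; sister itself is still 'liz'.
# append() changes the list in place and returns None.
fam.append("me")
```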


Packages (NumPy, Matplotlib, scikit-learn, etc.)
• Packages: a directory of Python scripts; each script is a module that defines new functions, methods and types.
You have to install a package on your own system before you can import it.

Importing a package or a module of a package
1. import numpy > numpy.array()
2. import numpy as np > np.array() (preferred)
3. from numpy import array > array() (inconvenient: it is no longer clear which package the function comes from)
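
The three import styles side by side, as a small sketch. It assumes NumPy is installed; the variable names a, b and c are hypothetical.

```python
# 1. Import the whole package: call functions with the full package name.
import numpy
a = numpy.array([1, 2, 3])

# 2. Import with an alias (preferred): shorter, and the origin stays visible.
import numpy as np
b = np.array([1, 2, 3])

# 3. Import a single function: shortest, but the call no longer shows
#    that array() comes from NumPy.
from numpy import array
c = array([1, 2, 3])
```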

NumPy (Numerical Python)
> The NumPy array is similar to the list, but has one additional feature: performing calculations on entire arrays.

1. Import numpy package as np.
2. Convert the regular lists into NumPy arrays using np.array(list) so that you can do calculations on them.
3. Do calculations with the arrays

> NumPy arrays can hold only one type (homogeneous)! If the input mixes types, NumPy automatically converts all elements to a common type, e.g. to strings when numbers and strings are mixed.
> Regular lists and NumPy arrays have different behaviours when e.g. using operators.
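
The three steps and the two remarks can be sketched with a BMI calculation. The height and weight values are hypothetical, and the example assumes NumPy is installed.

```python
import numpy as np                       # step 1: import the package

# Hypothetical example data (metres and kilograms).
height = [1.73, 1.68, 1.71]
weight = [65.4, 59.2, 63.6]

np_height = np.array(height)             # step 2: convert lists to arrays
np_weight = np.array(weight)

bmi = np_weight / np_height ** 2         # step 3: element-wise calculation

# Lists and arrays react differently to the same operator.
doubled_list = height + height           # concatenation: a list of 6 elements
doubled_arr = np_height + np_height      # element-wise sum: an array of 3 elements

# Mixed input is converted to one common type, here strings.
mixed = np.array([1.0, "is", True])
print(mixed.dtype.kind)                  # 'U' (unicode string)
```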

Subsetting NumPy arrays


2D Numpy Arrays (list of lists / 2 rows) Subsetting
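
The original examples for subsetting are not present in this text, so the following is only a sketch of common ways to subset a 2D array. The array np_2d and its values are hypothetical.

```python
import numpy as np

# Hypothetical 2D array built from a list of lists: 2 rows, 4 columns.
np_2d = np.array([[1.73, 1.68, 1.71, 1.89],
                  [65.4, 59.2, 63.6, 88.4]])

print(np_2d.shape)             # (2, 4)

first_row = np_2d[0]           # the whole first row
element = np_2d[0, 2]          # row 0, column 2 -> 1.71 (same as np_2d[0][2])
cols = np_2d[:, 1:3]           # all rows, columns 1 and 2

# Boolean subsetting: keep only the values that satisfy a condition.
tall = first_row[first_row > 1.7]
```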

Statistics
np.mean(array)
np.median(array)
np.min(array)
np.max(array)
np.sum(array)
np.std(array)
np.corrcoef(array1, array2)
> Pass a subset such as array[:, 0] to compute the statistic for a single column.
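
A sketch of these functions applied to the columns of a small 2D array. The array np_city and its values are hypothetical.

```python
import numpy as np

# Hypothetical data: one row per person, columns are height (m) and weight (kg).
np_city = np.array([[1.64, 71.78],
                    [1.37, 63.35],
                    [1.60, 55.09],
                    [2.04, 74.85]])

heights = np_city[:, 0]
weights = np_city[:, 1]

mean_h = np.mean(heights)
median_h = np.median(heights)
min_h = np.min(heights)
max_h = np.max(heights)
sum_w = np.sum(weights)
std_h = np.std(heights)

# corrcoef returns a 2x2 correlation matrix; the off-diagonal entries
# hold the correlation between the two arrays.
corr = np.corrcoef(heights, weights)
```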

Generate data
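
The code for this heading is not present in this text. One common way to generate test data, shown here purely as an assumption, is to draw random samples from a normal distribution and stack them as columns. The means, standard deviations, sample size and seed below are hypothetical.

```python
import numpy as np

# A seeded generator makes the random numbers reproducible.
rng = np.random.default_rng(seed=123)

# normal(mean, standard deviation, number of samples), rounded to 2 decimals.
height = np.round(rng.normal(1.75, 0.20, 5000), 2)
weight = np.round(rng.normal(60.32, 15, 5000), 2)

# Combine the two 1D arrays into one 2D array with 5000 rows and 2 columns.
np_city = np.column_stack((height, weight))
print(np_city.shape)    # (5000, 2)
```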
