Summary

Summary Cheatsheet for Data Mining for Business and Governance Exam (2 pages)

Course
Data Mining for Business & Governance (880662-M-6) (880662M6)

Institution
Tilburg University (UVT)

Prepare effectively for your Data Mining for Business and Governance exam with this concise and structured cheatsheet. Spanning 2 pages, this resource is tailored for exam success, offering a quick reference guide organized by lecture topics. Featuring LaTeX-rendered mathematical formulas fo...

Uploaded on May 27, 2024
Number of pages 2
Written in 2023/2024
Type Summary

cheatsheet
data mining
machine learning
data science

Education
Data Science & Society

§ Basic
Positively (right) skewed: mean > median > mode. Chebyshev's inequality: at least $(1 - 1/k^2) \times 100\%$ of the values lie within $k$ standard deviations of the mean.
Label encoding: assign an integer to each category. It only makes sense if there is an ordinal relationship among the categories. One-hot encoding: encodes nominal features that lack an ordinal relationship; increases the problem dimensionality.
Class imbalance: oversampling; undersampling; SMOTE (might induce noise).
Population variance: $\mathrm{Var}_P(x) = E(x^2) - E(x)^2$
Pearson correlation: $r = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$
$\chi^2$ association measure: $\chi^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$, with $E_{ij} = \dfrac{p_i \times p_j}{k}$; $O_{ij}$: observed frequency of $i$ and $j$ together; $E_{ij}$: expected frequency; $p_i$, $p_j$: marginal totals; $k$: total number of observations.
Scaling without data leakage (fit the scaler on the training data only, then reuse it on the test data):
train_df_scaled = scaler.fit_transform(train_df)
test_df_scaled = scaler.transform(test_df)

§ Evaluation and Model Selection
Out-of-sample evaluation. Optimizing hyperparameters requires three disjoint sets: training, validation and test. Stratification: similar class distribution in every split. k-fold cross-validation: k mutually exclusive subsets of equal size. Nested k-fold CV (OIO): an outer loop estimates performance while an inner loop tunes the hyperparameters. Hyperparameter tuning: random search. Bias: difference between the predictions and the ground truth. Variance: consistency of the predictions. Complexity ↑ → bias ↓, variance ↑. Decision tree pruning: pre-pruning: node → leaf (stop growing early); post-pruning: branches → leaf (prune a fully grown tree).
CV: cv_results = cross_validate(RandomForestClassifier(random_state=42), X, y, cv=5)
Grid search: grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5)
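The § Basic formulas (population variance, Chebyshev's bound, Pearson correlation and the $\chi^2$ association measure) can be checked on toy data with a minimal sketch in plain Python, standard library only. The helper names (`population_variance`, `within_k_sd`, `pearson`, `chi_square`) and the data are hypothetical; `chi_square` assumes a contingency table of observed counts, with $p_i$, $p_j$ taken as row and column totals and $k$ as the grand total.

```python
from math import sqrt

# Hypothetical helper names; toy data below is invented for illustration.


def mean(xs):
    return sum(xs) / len(xs)


def population_variance(xs):
    # Var_P(x) = E(x^2) - E(x)^2  (divides by n, not n - 1)
    return mean([x * x for x in xs]) - mean(xs) ** 2


def within_k_sd(xs, k):
    # Share of values within k standard deviations of the mean;
    # Chebyshev guarantees it is at least 1 - 1/k^2 (meaningful for k > 1).
    m, sd = mean(xs), sqrt(population_variance(xs))
    return sum(abs(x - m) <= k * sd for x in xs) / len(xs)


def pearson(xs, ys):
    # r = sum((x_i - x̄)(y_i - ȳ)) / sqrt(sum((x_i - x̄)^2) * sum((y_i - ȳ)^2))
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den


def chi_square(observed):
    # observed[i][j] = O_ij: number of instances where i and j occur together
    k = sum(sum(row) for row in observed)          # total number of observations
    p_row = [sum(row) for row in observed]         # p_i (row totals)
    p_col = [sum(col) for col in zip(*observed)]   # p_j (column totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = p_row[i] * p_col[j] / k            # E_ij = p_i * p_j / k
            chi2 += (o - e) ** 2 / e
    return chi2


xs = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(xs))                          # 4.0
print(within_k_sd(xs, 2) >= 1 - 1 / 2 ** 2)             # True
print(round(pearson(xs, [2 * x + 1 for x in xs]), 6))   # 1.0
print(round(chi_square([[10, 20], [20, 10]]), 4))       # 6.6667
```

A table whose cells equal $E_{ij}$ exactly, such as `[[10, 20], [20, 40]]`, gives $\chi^2 = 0$ (no association).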
Drop columns with too many missing values (keep columns with at least 90% non-missing values): df.dropna(thresh=int(0.9 * len(df)), axis=1, inplace=True)
Mean imputation: df['f'] = df['f'].fillna(df['f'].mean())
Normalization: from sklearn.preprocessing import MinMaxScaler; scaler = MinMaxScaler(); df[['f']] = scaler.fit_transform(df[['f']])
Standardization: scaler = StandardScaler()
Label encoding: encoder = LabelEncoder(); df['sex'] = encoder.fit_transform(df['sex'])
label_encoder = LabelEncoder(); encoded_data = label_encoder.fit_transform(Cancer_risk)
(StandardScaler and LabelEncoder also come from sklearn.preprocessing.)

§ Classification Algorithms
Rule-based learning: decision tree. Internal node: test on an attribute; branch: outcome of the test; leaf (terminal) node: class label; root node: topmost node.
Entropy: $\mathrm{entropy}(P) = -\sum_i p_i \log_2 p_i$, a measure of disorder (0 → pure). Information value: weighted entropy. Information gain: $\mathrm{gain}(f_i) = \mathrm{info}(\mathrm{root}) - \mathrm{info}(f_i)$.
Bayesian learning: assumes the features are independent. Bayes' theorem: $P(C_i \mid X) = \dfrac{P(X \mid C_i) \cdot P(C_i)}{P(X)}$. Naïve Bayes: $P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \cdot P(x_2 \mid C_i) \cdots P(x_n \mid C_i)$. Normalization: $P(C_1 \mid X) / (P(C_1 \mid X) + P(C_2 \mid X))$. The assumptions of independence and equal importance of the features are rarely fulfilled.
Lazy learning: similar instances should lead to the same decision classes. KNN: works well when the classes are clearly separated; use an odd k (avoids ties in binary problems); sensitive to outliers, the number of neighbors and the distance function. Minkowski: p = 1 → Manhattan, p = 2 → Euclidean; Chebyshev: maximum difference; cosine similarity = $\cos(\theta)$, cosine distance = $1 - \cos(\theta)$.
Ensemble learning: Bagging (bootstrap aggregation): majority vote. Random forest: builds several decision trees, each using a random selection of instances (drawn with replacement) and a random selection of features. Boosting: after a classifier $M_i$ is learned, the weights of the difficult instances are updated for the next classifier $M_{i+1}$.
Accuracy = (TP + TN) / all; Precision = TP / (TP + FP); Recall = TP / (TP + FN); $F_\beta = \dfrac{(1 + \beta^2)\, p\, r}{\beta^2 p + r}$ (p: precision, r: recall); Jaccard index (IoU) = overlap / union.

§ XAI
Interpretability: a model's implicit capacity to explain its reasoning process. Explainability: providing a justification for the predictions. Transparency: algorithmic transparency, decomposability and simulatability. Intrinsically interpretable models: linear regression, decision tree, k-nearest neighbors; parsimonious (less is more).
Post-hoc explanation methods:
Model-agnostic post-hoc: measure how changes in the inputs affect the model's outputs.
1 Partial dependence plots: the marginal effect of a feature on the model's prediction when fixing the feature values → average the class probabilities for a desired decision class. The plot allows inspecting whether the relation between the feature and the target variable is monotonic, linear, etc.
2 Permutation feature importance: compute the feature importance as the increase in the model error when permuting the values of the feature being analyzed. Drawback: assumes an unrealistic independence between features.
3 Shapley values (SHAP): compute the contribution of each feature; can be used in both local and global contexts. Cons: computationally expensive.
4 Local surrogates (LIME): generate synthetic instances around small groups of instances. Cons: unstable.
5 Global surrogates: approximate the behavior of the complex model with a transparent model. Cons: describe the black-box model rather than the problem.
6 Counterfactual explanations: describe the smallest change to the feature values that produces a different, desired output.
Model-specific post-hoc: based on the representation structures of the black-box models.
1 Random forests: compute the importance of each problem feature from their inner knowledge structures. Cons: impurity-based feature importance can be misleading when features have many unique values.
2 Fuzzy cognitive maps: recurrent neural networks whose neurons denote variables. Feature importance is computed from the absolute values of the weights connected to each neuron in the network. Cons: doesn't consider the activation values of the neurons.
Evaluation and measures: Function level (number of rules of
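The decision-tree entries in § Classification Algorithms (entropy, information value as weighted entropy, information gain) can be sketched in plain Python as below. This is a minimal illustration assuming categorical feature values; the function names and the toy labels are hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical helper names; toy labels below are invented for illustration.


def entropy(labels):
    # entropy(P) = -sum_i p_i * log2(p_i); 0 means the node is pure
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info(feature_values, labels):
    # information value of a split: entropy of each partition,
    # weighted by the share of instances that fall into it
    n = len(labels)
    partitions = {}
    for v, y in zip(feature_values, labels):
        partitions.setdefault(v, []).append(y)
    return sum(len(p) / n * entropy(p) for p in partitions.values())


def gain(feature_values, labels):
    # gain(f_i) = info(root) - info(f_i)
    return entropy(labels) - info(feature_values, labels)


labels = ["yes", "yes", "no", "no"]
print(entropy(labels))                      # 1.0
print(gain(["a", "a", "b", "b"], labels))   # 1.0  (perfect split)
print(gain(["a", "b", "a", "b"], labels))   # 0.0  (useless split)
```

A tree learner following this scheme would split on the feature with the highest gain at each node.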

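Bayes' theorem, the naïve product $P(X \mid C_i) = \prod_k P(x_k \mid C_i)$ and the normalization step from § Classification Algorithms fit together as in this minimal sketch. It assumes categorical features and estimates probabilities from plain counts with no Laplace smoothing, so an unseen value zeroes a class score (and the division fails if every class is zeroed). `train_naive_bayes`, `predict_proba` and the weather data are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical helper names; no Laplace smoothing (assumption for brevity).


def train_naive_bayes(X, y):
    priors = Counter(y)  # priors[c]: number of instances of class c
    # counts[c][k][v]: how often feature k takes value v within class c
    counts = defaultdict(lambda: defaultdict(Counter))
    for xs, c in zip(X, y):
        for k, v in enumerate(xs):
            counts[c][k][v] += 1
    return priors, counts, len(y)


def predict_proba(x, priors, counts, n):
    scores = {}
    for c, n_c in priors.items():
        # P(C_i) * prod_k P(x_k | C_i); P(X) is skipped because it is the same for every class
        score = n_c / n
        for k, v in enumerate(x):
            score *= counts[c][k][v] / n_c
        scores[c] = score
    # normalization: divide each score by the sum over all classes
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot"), ("sunny", "mild")]
y = ["no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(X, y)
print(predict_proba(("sunny", "mild"), *model))  # no ≈ 0.6, yes ≈ 0.4
```

Here the unnormalized scores are 0.4 · 1 · 0.5 = 0.2 for "no" and 0.6 · 1/3 · 2/3 ≈ 0.133 for "yes"; dividing by their sum gives 0.6 and 0.4.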
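The KNN notes (Minkowski with p = 1 for Manhattan and p = 2 for Euclidean, Chebyshev as the maximum difference, cosine distance as $1 - \cos(\theta)$, odd k with a majority vote) can be illustrated with this plain-Python sketch. The function names, the default Euclidean distance and the two-cluster toy data are assumptions for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical helper names; toy data below is invented for illustration.


def minkowski(a, b, p):
    # p = 1 -> Manhattan distance, p = 2 -> Euclidean distance
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def chebyshev(a, b):
    # maximum absolute difference over all coordinates
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    # cosine distance = 1 - cos(theta), with cos(theta) = a·b / (|a| |b|)
    dot = sum(x * y for x, y in zip(a, b))
    cos_theta = dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))
    return 1 - cos_theta


def knn_predict(train_X, train_y, query, k=3, dist=lambda a, b: minkowski(a, b, 2)):
    # lazy learning: no model is fitted; the k closest training instances vote
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.5, 0.5)))       # a
print(knn_predict(train_X, train_y, (5.5, 5.5), k=3))  # b
print(minkowski((0, 0), (3, 4), 1), minkowski((0, 0), (3, 4), 2), chebyshev((0, 0), (3, 4)))  # 7.0 5.0 4
```

Passing a different `dist` (for example `chebyshev` or `cosine_distance`) changes which neighbours are chosen, which is why the distance function is listed as a sensitivity of KNN.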

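Permutation feature importance from § XAI (importance = increase in model error after permuting one feature's values) is model-agnostic, so it only needs a prediction function. The sketch below assumes error rate as the error measure and averages over several shuffles; `permutation_importance`, `error_rate`, the `black_box` model and the data are hypothetical stand-ins, not a library API.

```python
import random

# Hypothetical helper names; toy model and data are invented for illustration.


def error_rate(model, X, y):
    return sum(model(row) != label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    # importance of feature j = average increase in error after shuffling column j,
    # which breaks the link between that feature and the target
    rng = random.Random(seed)
    baseline = error_rate(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + (v,) + row[j + 1:] for row, v in zip(X, column)]
            increases.append(error_rate(model, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def black_box(row):
    # hypothetical black box that only looks at feature 0
    return int(row[0] > 0.5)


X = [(0.1, 3.0), (0.2, 1.0), (0.3, 2.0), (0.4, 5.0), (0.6, 4.0), (0.7, 0.0), (0.8, 6.0), (0.9, 7.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(permutation_importance(black_box, X, y))  # feature 1 gets 0.0; feature 0 gets a positive score
```

Shuffling one column on its own is also where the drawback comes from: when features are correlated, the permuted rows can be combinations that never occur in real data.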