Summary

Practical 5.1: Advanced Data Analysis: full summary + explanations


Institution
Universiteit Antwerpen (UA)

PART 1 of Practical 5: Advanced Data Analysis: full summary + explanations, screenshots and answers to questions


Uploaded on 5 May 2023
Number of pages: 11
Written in 2021/2022
Type: Summary

Seller: Bi0med


Supervised classification with Decision Trees and Random Forests

1 The Breast Cancer dataset
This dataset contains features derived from images of malignant and benign
breast tumors. There are ten continuous features that describe the size and
shape of each tumor: a) radius (mean of distances from center to points on
the perimeter), b) texture (standard deviation of gray-scale values),
c) perimeter, d) area, e) smoothness (local variation in radius lengths),
f) compactness (perimeter^2 / area - 1.0), g) concavity (severity of concave
portions of the contour), h) concave points (number of concave portions of
the contour), i) symmetry, j) fractal dimension ("coastline approximation" - 1).

The goal is to use the provided features to predict whether a sample is
malignant (M) or benign (B). The dataset has been split into a training set
and a test set. We can read in both files with the following commands:
cancer.train <- read.csv(file = "breast-cancer-train.csv", header = TRUE, stringsAsFactors = TRUE)
cancer.test <- read.csv(file = "breast-cancer-test.csv", header = TRUE, stringsAsFactors = TRUE)
summary(cancer.train)
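
As a quick check of what stringsAsFactors = TRUE does, the sketch below writes a tiny made-up table to a temporary CSV file and reads it back the same way. The data values and the column names radius and texture are assumptions for illustration only; the real files may name their feature columns differently. Only the diagnosis column name is taken from the practical.

```r
# Hypothetical toy data standing in for breast-cancer-train.csv
toy <- data.frame(
  diagnosis = c("M", "B", "B", "M", "B"),
  radius    = c(17.9, 11.4, 12.5, 20.6, 13.1),
  texture   = c(10.4, 15.7, 14.1, 17.8, 18.6)
)

# Write to a temporary file so the example does not touch the real data
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Same call pattern as in the practical
toy.read <- read.csv(file = path, header = TRUE, stringsAsFactors = TRUE)

str(toy.read)               # diagnosis is read as a factor, not as character
levels(toy.read$diagnosis)  # class labels in alphabetical order
table(toy.read$diagnosis)   # number of samples per class
```

Running table() on the diagnosis column of the real training set in the same way shows how balanced the two classes are before any model is fitted.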

2 Decision tree
We will use the 'rpart' R package to learn and apply our decision trees, and
the 'rpart.plot' package to draw them. Install both from CRAN if you have not
already done so. We can load the libraries with the standard commands:
library(rpart)
library(rpart.plot)
We then need to apply it to the breast cancer dataset. One of the standard
features of the rpart() function is that it helps to optimize the number of
branches (splits) in the decision tree. The more branches, the higher the
chance of overfitting, but we need some branches to solve our classification
problem. rpart() controls the tree size through a complexity parameter (cp)
and runs an internal cross-validation (10-fold by default), in which part of
the training data is held out for validation, to estimate the error for each
candidate tree size.
tree <- rpart(diagnosis ~ ., data = cancer.train, method = "class")
# Overview of the optimization
printcp(tree)
# CV optimization of branch number
plotcp(tree)
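
The cross-validation results shown by printcp() can also be used to prune the tree explicitly. The sketch below is one possible way to do this, not necessarily the approach intended in the practical: it picks the cp value with the lowest cross-validated error (the xerror column) and passes it to prune(). The data are simulated, and the names toy, radius and texture are assumptions made only to keep the example runnable without the CSV files.

```r
library(rpart)

set.seed(1)  # cross-validation inside rpart() is random

# Simulated stand-in for the training data (hypothetical values)
n <- 200
toy <- data.frame(radius = rnorm(n, 14, 3), texture = rnorm(n, 19, 4))
toy$diagnosis <- factor(ifelse(toy$radius + rnorm(n, 0, 1.5) > 15, "M", "B"))

toy.tree <- rpart(diagnosis ~ ., data = toy, method = "class")

# cptable holds one row per candidate tree size:
# CP, nsplit, rel error, xerror (cross-validated error), xstd
cp.table <- toy.tree$cptable
best.cp <- cp.table[which.min(cp.table[, "xerror"]), "CP"]

# Cut the tree back to the size selected by cross-validation
toy.pruned <- prune(toy.tree, cp = best.cp)
printcp(toy.pruned)
```

With the real data, the same steps would use tree$cptable and prune(tree, cp = best.cp). If the lowest xerror already belongs to the largest tree, pruning leaves the tree unchanged.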

We set method to "class" because this is a classification problem. Now that
we have a decision tree for classification, we can visualize it with the
following command:
rpart.plot(tree, main = "Decision Tree for Cancer Dataset")
This gives a clear overview of the features that are used in the decision
tree.
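
The features used by the tree can also be listed without a plot, and the tree can be applied to held-out samples. The sketch below shows one way to do both with simulated data; the toy data frame, its radius and texture columns, and the 200/100 split are assumptions for illustration, not part of the practical. With the real data, tree and cancer.test would take the place of toy.tree and toy.test.

```r
library(rpart)

set.seed(42)

# Simulated stand-in for the breast cancer data (hypothetical values)
n <- 300
toy <- data.frame(radius = rnorm(n, 14, 3), texture = rnorm(n, 19, 4))
toy$diagnosis <- factor(ifelse(toy$radius + rnorm(n, 0, 1) > 15, "M", "B"))

# Hypothetical train/test split
train.idx <- sample(n, 200)
toy.train <- toy[train.idx, ]
toy.test  <- toy[-train.idx, ]

toy.tree <- rpart(diagnosis ~ ., data = toy.train, method = "class")

# Features that appear in at least one split ("<leaf>" marks terminal nodes)
split.vars <- setdiff(unique(as.character(toy.tree$frame$var)), "<leaf>")
print(split.vars)

# Predicted class labels for the held-out samples
pred <- predict(toy.tree, newdata = toy.test, type = "class")

# Confusion table and overall accuracy on the held-out samples
confusion <- table(predicted = pred, actual = toy.test$diagnosis)
print(confusion)
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```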
