Summary

Practical 5.1: Advanced Data Analysis: full summary + explanations


Course
Advanced Data Analysis (2052FBDBMW)

Institution
Universiteit Antwerpen (UA)

PART 1 of Practical 5: Advanced Data Analysis: full summary + explanations, screenshots and answers to questions


Published on 5 May 2023
Number of pages 11
Written in 2021/2022
Type Summary

Study programme
Biomedische Wetenschappen

Bi0med


Supervised classification with Decision Trees and Random Forests

1 The Breast Cancer dataset
This dataset contains features derived from images of malignant and benign
breast tumors. Ten continuous features describe the size and shape of each
tumor:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)

The goal is to use the provided features to predict whether a sample is
malignant (M) or benign (B). The dataset has been split into a training set
and a test set. We can read in both with the following commands:
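# Read the training and test sets; string columns are converted to factors
cancer.train <- read.csv(file = "breast-cancer-train.csv", header = TRUE, stringsAsFactors = TRUE)
cancer.test <- read.csv(file = "breast-cancer-test.csv", header = TRUE, stringsAsFactors = TRUE)
# Quick overview of every column
summary(cancer.train)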
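
Before fitting a model it can help to confirm that both files were read as expected. The sketch below is a minimal check, assuming the target column is called diagnosis and using a small made-up data frame in place of the real CSV files. The helper name check_split is hypothetical and not part of the practical.

```r
# Hypothetical helper: compare a training and a test data frame.
check_split <- function(train, test, target = "diagnosis") {
  stopifnot(target %in% names(train), target %in% names(test))
  list(
    same_columns  = identical(names(train), names(test)),
    train_balance = prop.table(table(train[[target]])),
    test_balance  = prop.table(table(test[[target]]))
  )
}

# Made-up stand-in data (NOT the real breast cancer measurements).
toy.train <- data.frame(
  diagnosis = factor(c("B", "B", "B", "M", "M", "B")),
  radius    = c(11.2, 12.5, 10.9, 18.3, 20.1, 13.0)
)
toy.test <- data.frame(
  diagnosis = factor(c("B", "M")),
  radius    = c(12.0, 19.4)
)

split.check <- check_split(toy.train, toy.test)
print(split.check$same_columns)            # do both sets share the same columns?
print(round(split.check$train_balance, 2)) # proportion of B and M in training data
```

With the real files loaded, the equivalent call would be check_split(cancer.train, cancer.test). A strongly unbalanced diagnosis column is worth knowing about before interpreting accuracy.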

2 Decision tree
We will use the 'rpart' R package to learn and apply our decision trees, and
'rpart.plot' to draw them. Install both from CRAN if you have not already done
so. We can load the libraries with the standard commands:
library(rpart)
library(rpart.plot)
We then need to apply it to the breast cancer dataset. A key choice when
fitting a decision tree is the number of branches to include. The more
branches, the higher the chance of overfitting, but some branches are needed
to solve the classification problem. rpart() addresses this by running an
internal cross-validation, in which part of the training data is held out for
validation, and reporting the cross-validated error for each candidate tree
size.
tree <- rpart(diagnosis ~ ., data = cancer.train, method = "class")
# Overview of the optimization
printcp(tree)
# CV optimization of branch number
plotcp(tree)
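
The cross-validation results can also be used to choose the tree size explicitly. The sketch below picks the complexity parameter (cp) with the lowest cross-validated error and prunes the tree to it. It uses the kyphosis data bundled with rpart as a stand-in so that it runs without the course files. Selecting the minimum xerror is one common convention and an assumption here, not necessarily the rule intended in the practical.

```r
library(rpart)

set.seed(1)  # the internal cross-validation assigns folds at random

# Stand-in data: kyphosis ships with rpart (not the breast cancer data).
demo.tree <- rpart(Kyphosis ~ Age + Number + Start,
                   data = kyphosis, method = "class")

# cptable has one row per candidate tree size, with columns
# CP, nsplit, rel error, xerror and xstd.
cp.table <- demo.tree$cptable
best.row <- which.min(cp.table[, "xerror"])
best.cp  <- cp.table[best.row, "CP"]

# Prune back to the subtree selected by cross-validation.
pruned.tree <- prune(demo.tree, cp = best.cp)

# Count internal (non-leaf) nodes, i.e. the number of splits.
count_splits <- function(fit) sum(fit$frame$var != "<leaf>")

cat("Selected cp:", best.cp, "\n")
cat("Splits before pruning:", count_splits(demo.tree), "\n")
cat("Splits after pruning:", count_splits(pruned.tree), "\n")
```

With the practical's data, demo.tree would simply be replaced by the tree object fitted on cancer.train.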

We set method to "class" because this is a classification problem. Now that we
have a fitted decision tree, we can visualize it with the following command:
rpart.plot(tree,main="Decision Tree for Cancer Dataset")
This gives a clear overview of the features that are being used in the decision
tree.
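
The features used by a tree can also be read from the fitted object without a plot. This is a minimal sketch, using the kyphosis data bundled with rpart as a stand-in for the course files; frame and variable.importance are components of a fitted rpart object.

```r
library(rpart)

# Stand-in data bundled with rpart (not the breast cancer data).
demo.tree <- rpart(Kyphosis ~ Age + Number + Start,
                   data = kyphosis, method = "class")

# Each row of frame is a node; var holds the split variable,
# or "<leaf>" for terminal nodes.
split.vars    <- as.character(demo.tree$frame$var)
used.features <- unique(split.vars[split.vars != "<leaf>"])
print(used.features)

# Importance score for each variable that contributed to the tree.
print(demo.tree$variable.importance)
```

For the practical's model, the same two lookups would be applied to tree instead of demo.tree.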
