Summary Endterm Business Analytics week 5 - week 7
WEEK 5 CHAPTERS
In this chapter, we study approaches for predicting qualitative responses, a process that is known as
classification.
4.1 An Overview of Classification
Classification problem example:
A person arrives at the emergency room with a set of symptoms that could possibly be
attributed to one of three medical conditions. Which of the three conditions does the individual have?
Just as in the regression setting, in the classification setting we have a set of training observations
$(x_1, y_1), \ldots, (x_n, y_n)$ that we can use to build a classifier. We want our classifier to perform well not only
on the training data, but also on test observations that were not used to train the classifier.
4.2 Why Not Linear Regression?
Suppose we have three possible diagnoses.
Linear regression would not be appropriate in this case, because the response is qualitative with no natural ordering. Coding the conditions as numbers, e.g. 1 = stroke, 2 = epileptic seizure, 3 = drug overdose, would imply that the difference between stroke and epileptic seizure is the same as the difference between epileptic seizure and drug overdose, which need not be true. Each possible coding would produce a fundamentally different linear model, ultimately leading to different sets of predictions on test observations.
Only if the response variable's values had a natural ordering, such as mild, moderate, and severe, and we felt the gap between mild and moderate was similar to the gap between moderate and severe, would a 1, 2, 3 coding be reasonable.
For a binary (two-level) qualitative response, the situation is better. For instance, perhaps there are
only two possibilities for the patient’s medical condition: stroke and drug overdose. We could then
use the dummy variable approach where stroke = 0 and drug overdose = 1.
For a binary response with a 0/1 coding as above, regression by least squares does make sense; it can be shown that the $X\hat{\beta}$ obtained using linear regression is in fact an estimate of $\Pr(\text{drug overdose} \mid X)$ in this special case. However, if we use linear regression, some of our estimates might fall outside the $[0, 1]$ interval (e.g. below 0), making them hard to interpret as probabilities! Nevertheless, the predictions provide an ordering and can be interpreted as crude probability estimates. Curiously, it turns out that the classifications we get from linear regression on a binary response are the same as those from linear discriminant analysis (LDA).
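A minimal sketch of this contrast on simulated data (all names and values here are illustrative, not from the text): least squares on a 0/1 response produces fitted values that escape $[0, 1]$, while logistic regression does not.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Simulated binary-response data (illustrative values).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Least squares on the 0/1 coding: the fitted values estimate
# Pr(Y = 1 | X) but are not constrained to [0, 1].
lin = LinearRegression().fit(X, y)
grid = np.linspace(-3, 3, 7).reshape(-1, 1)
print(lin.predict(grid))               # some values fall below 0 or above 1

# Logistic regression keeps every estimate strictly inside (0, 1).
log_reg = LogisticRegression().fit(X, y)
print(log_reg.predict_proba(grid)[:, 1])
```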
4.3 Logistic Regression
4.3.1 The Logistic Regression Model
How should we model the relationship between $p(X) = \Pr(Y = 1 \mid X)$ and $X$? (For convenience we are using the generic 0/1 coding for the response.)
To avoid the problem of $p(X)$ falling below 0 or above 1, we must model $p(X)$ using a function that gives outputs between 0 and 1 for all values of $X$. Many functions meet this description. In logistic regression, we use the logistic function

$$p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$
To fit the model, we use a method called maximum likelihood, which we discuss in the next section.
The logistic function will always produce an S-shaped curve of this form, and so regardless of the
value of X, we will obtain a sensible prediction.
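A quick numerical check of this claim, with assumed coefficient values (not taken from the text):

```python
import numpy as np

# Assumed coefficients for illustration.
beta0, beta1 = -2.0, 1.5

def logistic(x):
    """The logistic function p(X) = e^(b0 + b1 x) / (1 + e^(b0 + b1 x))."""
    z = beta0 + beta1 * x
    return np.exp(z) / (1 + np.exp(z))

x = np.linspace(-10, 10, 9)
print(logistic(x))  # every output lies strictly between 0 and 1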
After a bit of manipulation of the logistic function, we find that

$$\frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X}$$

The quantity $p(X)/[1 - p(X)]$ is called the odds, and can take on any value between 0 and $\infty$. Values of the odds close to 0 and $\infty$ indicate very low and very high probabilities of default, respectively. By taking the logarithm of both sides of the above formula, we arrive at

$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X$$
The left-hand side is called the log-odds or logit. In a logistic regression model, increasing $X$ by one unit changes the log-odds by $\beta_1$, or equivalently multiplies the odds by $e^{\beta_1}$. The amount that $p(X)$ changes due to a one-unit change in $X$ depends on the current value of $X$. But regardless of the value of $X$, if $\beta_1$ is positive then increasing $X$ is associated with increasing $p(X)$, and if $\beta_1$ is negative then increasing $X$ is associated with decreasing $p(X)$.
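As a worked illustration (the value of $\hat{\beta}_1$ here is assumed, not from the text): if $\hat{\beta}_1 = 0.1$, then a one-unit increase in $X$ multiplies the odds by

$$e^{\hat{\beta}_1} = e^{0.1} \approx 1.105,$$

i.e. roughly a 10.5% increase in the odds, whatever the current value of $p(X)$.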
4.3.2 Estimating the Regression Coefficients
We could use (non-linear) least squares to fit the model, but the more general method of maximum
likelihood is preferred, since it has better statistical properties.
The basic intuition behind using maximum likelihood to fit a logistic regression model is as follows: we seek estimates for $\beta_0$ and $\beta_1$ such that the predicted probability $\hat{p}(x_i)$ of default for each individual corresponds as closely as possible to the individual's observed default status. In other words, we try to find $\hat{\beta}_0$ and $\hat{\beta}_1$ such that plugging these estimates into the model for $p(X)$ yields a number close to one for all individuals who defaulted, and a number close to zero for all individuals who did not. This intuition can be formalized using a mathematical equation called a likelihood function:

$$\ell(\beta_0, \beta_1) = \prod_{i : y_i = 1} p(x_i) \prod_{i' : y_{i'} = 0} \big(1 - p(x_{i'})\big)$$
The estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are chosen to maximize this likelihood function.
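A minimal sketch of this procedure on simulated data, maximizing the log of the likelihood above numerically (equivalently, minimizing its negative; all values are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data from a logistic model with beta0 = -1, beta1 = 2.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 2 * x))))

def neg_log_likelihood(beta):
    """-log l(b0, b1), where l = prod p(xi)^yi * (1 - p(xi))^(1 - yi)."""
    p = 1 / (1 + np.exp(-(beta[0] + beta[1] * x)))
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=np.zeros(2))
print(result.x)  # estimates of (beta0, beta1), roughly (-1, 2)
```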
4.3.3 Making Predictions
Once the coefficients have been estimated, it is a simple matter to compute the probability of $Y$ for any given $X$. For example, using the coefficient estimates for the Default data ($\hat{\beta}_0 = -10.6513$, $\hat{\beta}_1 = 0.0055$), the predicted probability of default for an individual with a balance of \$1,000 is

$$\hat{p}(X) = \frac{e^{-10.6513 + 0.0055 \times 1000}}{1 + e^{-10.6513 + 0.0055 \times 1000}} \approx 0.00576,$$

which is less than 1%.
One can use qualitative predictors with the logistic regression model using the dummy variable
approach explained in 4.2.
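A small sketch of such a prediction, plugging assumed coefficient estimates (echoing the worked Default example above) into the fitted model:

```python
import numpy as np

# Assumed coefficient estimates, in the spirit of the example above.
beta0_hat, beta1_hat = -10.6513, 0.0055

def p_hat(x):
    """Predicted probability from the fitted logistic model."""
    z = beta0_hat + beta1_hat * x
    return np.exp(z) / (1 + np.exp(z))

print(p_hat(1000))  # about 0.00576, i.e. below 1%
```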
4.3.4 Multiple Logistic Regression
We now consider the problem of predicting a binary response using multiple predictors. By analogy with the extension from simple to multiple linear regression in Chapter 3, we can generalize the simple logistic regression formula as follows:

$$p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}$$

where $X = (X_1, \ldots, X_p)$ are $p$ predictors. This equation can be rewritten as

$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$$
Again, we use the maximum likelihood method to estimate the coefficients.
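A minimal sketch of a multiple logistic regression fit on simulated data with $p = 3$ predictors (all values are illustrative; `C` is set large so scikit-learn's penalized fit approximates the plain maximum likelihood estimates):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated data: 3 predictors with known coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_beta = np.array([1.0, -2.0, 0.5])
p = 1 / (1 + np.exp(-(0.3 + X @ true_beta)))
y = rng.binomial(1, p)

# Large C approximates the unpenalized maximum likelihood fit.
model = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
print(model.intercept_, model.coef_)  # roughly 0.3 and (1, -2, 0.5)
```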
Confounding
A phenomenon in which the results obtained using one predictor may be quite different from those obtained using multiple predictors, especially when there is correlation among the predictors.
Table 4.2 has only student status as a predictor. Table 4.3 has two predictors, credit card balance and student status. In the Default data, students appear riskier when considered alone, but for a fixed balance they are actually less likely to default: student status is confounded with balance.
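A hypothetical simulation of this pattern (the coefficients and scales are assumptions, not the Default data): students carry higher balances, balance drives default, and given balance students are slightly safer, so the one-predictor and two-predictor fits disagree on the sign of the student effect.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulated confounding; balance is measured in hundreds of dollars.
rng = np.random.default_rng(1)
n = 5000
student = rng.binomial(1, 0.3, n)
balance = rng.normal(10 + 4 * student, 3)      # students carry more debt
logit = -8 + 0.5 * balance - 0.8 * student     # given balance, students are safer
default = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# One predictor: student status absorbs the balance effect.
m1 = LogisticRegression(C=1e6, max_iter=1000).fit(student.reshape(-1, 1), default)
print(m1.coef_)  # positive: students look riskier on their own

# Two predictors: holding balance fixed, the sign flips.
m2 = LogisticRegression(C=1e6, max_iter=1000).fit(
    np.column_stack([student, balance]), default)
print(m2.coef_)  # student coefficient now negative
```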