Lecture 5: Logistic Regression
Logistic Regression: Step by Step
• Step 1 | Defining the objectives
• Step 2 | Designing the study
• Step 3 | Checking assumptions
• Step 4 | Estimating the model and assessing fit
• Step 5 | Interpreting the results
• Step 6 | Validating the results
Step 1: Defining Objectives
The purpose of logistic regression is to
´ Predict the likelihood that an event will or will not occur
´ Assess variables that affect occurrence of the event
- Direction of influence
- Magnitude/importance (how strong the effect of a change in a variable will be)
Examples:
o What is the probability that a person will respond to a Neckerman direct mailing, and how can Neckerman
adjust its mail content to increase this probability?
o Does improved waiting time at the checkout increase the likelihood of visiting a C1000 store?
o What makes people more likely to donate to Foster Parents charity?
All three examples look for a causal relationship:
o A set of variables influences an outcome → the likelihood of an event happening
o Dependence method: there is a clear outcome variable (something will or will not happen), and we are interested in
the variables that affect the likelihood of it happening (as with ANOVA and linear regression)
Step 2: Designing the analysis
´ …..involves decisions on:
- The variables to be included
- The sample
o Size
o Composition
o Estimation versus holdout
Logistic Regression vs. ANOVA/Linear Regression:
o The outcome variable in logistic regression is a 0/1 variable (a dummy)
o The dummy can take on two values that are mutually exclusive (they cannot occur together)
o The two values are exhaustive (it is always one or the other)
Variables:
´ The dependent variable can be …
- “naturally” dichotomous (0/1: two mutually exclusive/exhaustive options) (a mailing is responded
to or it isn’t), or
- Reconstructed from a metric variable (polar extremes approach → recoded into a 0/1 variable)
´ The independent variables:
- Can be metric or dummy variables (non-metric variables must first be transformed into dummies)
- Selection based on theory or intuition
Polar extremes approach
1. Rank order your observations
2. Split the data into three parts (low/mid/high values)
3. Drop the mid values and keep the low and high values (recode them as 0 and 1)
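The three steps above can be sketched in a few lines of Python. This is only an illustration, assuming the split is into three equally sized groups; the function name `polar_extremes` and the scores are hypothetical and not taken from the lecture.

```python
def polar_extremes(values):
    """Recode a metric variable into 0/1 and drop the middle group.

    Returns a dict mapping the original row index to 0 (low) or 1 (high).
    Assumption: the data is split into three equally sized groups.
    """
    # Step 1: rank order the observations (remember the original row index).
    ranked = sorted(range(len(values)), key=lambda i: values[i])
    # Step 2: split into three parts (low / mid / high).
    third = len(ranked) // 3
    low = ranked[:third]
    high = ranked[len(ranked) - third:]
    # Step 3: drop the mid values, code low as 0 and high as 1.
    coded = {i: 0 for i in low}
    coded.update({i: 1 for i in high})
    return coded


scores = [7, 2, 9, 4, 5, 1, 8, 3, 6]  # hypothetical metric scores
coded = polar_extremes(scores)
```

Rows that fall in the middle group do not appear in `coded`, so the sample used for estimation becomes smaller.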
Example in lecture: Advertising through in-store screens
o Do consumers notice ad messages on TV screens in store?
o How does this depend on message characteristics?
o Which consumers are more likely to notice the message?
o Store intercept survey: 879 respondents
o Dependent variable: whether the in-store display was seen or not seen (recall?); the model predicts the probability that it was seen
o Independent variables (explanatory variables):
- Consumer: store visit frequency, spending on electronics, education (3 levels), home TV ad seen
(yes/no)
- Message: length (3 levels), sound (yes/no)
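Non-metric variables with three levels, such as education or message length, have to be turned into dummies before they can enter the model. A minimal sketch, assuming one level is left out as the reference category; the level labels and the helper `dummy_code` are hypothetical.

```python
def dummy_code(value, levels, reference):
    """Return 0/1 dummies for every level except the reference level."""
    return {
        f"is_{level}": int(value == level)
        for level in levels
        if level != reference
    }


education_levels = ["low", "mid", "high"]  # hypothetical labels for the 3 levels
row = dummy_code("high", education_levels, reference="low")
reference_row = dummy_code("low", education_levels, reference="low")
```

A respondent in the reference category gets 0 on all dummies, so three levels need only two dummy variables.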
´ Sample:
- Ratio of observations to independent variables (at least 20 rows in your data set for each explanatory
variable)
- Group sizes (the yeses and nos): are they representative, or is oversampling needed? Oversample only for a rare
event → build the data set so that it contains more people with a yes, which gives a
better chance of finding the effect (but be careful when interpreting the results)
- Analysis versus holdout sample
- Proportionally stratified subsamples → the proportion of yeses and nos is the same in
your estimation and in your holdout sample
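A proportionally stratified split can be sketched as follows: the yeses and the nos are split separately, so both subsamples keep the same proportion. The holdout share, the seed and the toy data are assumptions chosen for illustration only.

```python
import random


def stratified_split(outcomes, holdout_share=0.25, seed=0):
    """Split row indices into estimation and holdout samples.

    The 0s and the 1s are split separately, so the proportion of 1s
    is the same (up to rounding) in both samples.
    """
    rng = random.Random(seed)
    estimation, holdout = [], []
    for group in (0, 1):
        rows = [i for i, y in enumerate(outcomes) if y == group]
        rng.shuffle(rows)
        n_holdout = round(len(rows) * holdout_share)
        holdout.extend(rows[:n_holdout])
        estimation.extend(rows[n_holdout:])
    return sorted(estimation), sorted(holdout)


outcomes = [1] * 20 + [0] * 80  # hypothetical data: 20% yeses
est, hold = stratified_split(outcomes, holdout_share=0.25)
share_est = sum(outcomes[i] for i in est) / len(est)
share_hold = sum(outcomes[i] for i in hold) / len(hold)
```

With these toy numbers both `share_est` and `share_hold` equal the 20% share of yeses in the full data set.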
Step 3: Checking assumptions
´ Two groups for outcome variable
- (More than two: extension to the multinomial logit model)
´ Robust to deviations from multivariate normality and homoscedasticity (no need to check these, since the model is
robust)
- Homoscedasticity: equal variances of the error terms in the group of 0s and the group of 1s
´ Check multicollinearity: overlap between explanatory variables
´ Outcome = probability (a likelihood)
- Must lie between zero and one
- Need to adjust model form: S-shaped
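One simple way to look for the overlap between explanatory variables mentioned in the multicollinearity check is to inspect pairwise correlations. The sketch below is an assumption about how such a check could look, not the procedure used in the lecture; the 0.8 cut-off, the helper names and the data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def high_correlations(columns, threshold=0.8):
    """List variable pairs whose absolute correlation reaches the threshold."""
    names = list(columns)
    flagged = []
    for pos, first in enumerate(names):
        for second in names[pos + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) >= threshold:
                flagged.append((first, second, round(r, 3)))
    return flagged


# Hypothetical data for three explanatory variables.
data = {
    "visit_frequency": [1, 2, 3, 4, 5, 6],
    "spending": [10, 22, 29, 41, 52, 58],
    "sound": [0, 1, 0, 1, 1, 0],
}
flagged = high_correlations(data)
```

Pairwise correlations only reveal overlap between two variables at a time; overlap that involves several variables together needs a more thorough diagnostic.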
If we used a regular linear regression model, we would have a problem: there is no guarantee that, whatever
values the explanatory variables (the x's) take, the predicted outcome will lie between 0 and
1 → we use a different, S-shaped function (see plot below)
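The difference can be illustrated numerically. The sketch below compares a straight-line prediction with the logistic function, which is the S-shaped curve that gives logistic regression its name; the coefficients `b0` and `b1` and the x values are hypothetical.

```python
from math import exp


def logistic(z):
    """S-shaped logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


b0, b1 = -1.0, 0.5  # hypothetical intercept and slope
xs = [-10, 0, 2, 10]  # hypothetical values of one explanatory variable

# A straight line can produce values below 0 or above 1.
linear_pred = [b0 + b1 * x for x in xs]

# Passing the same linear combination through the logistic function
# always yields a value strictly between 0 and 1.
probs = [logistic(z) for z in linear_pred]
```

Here the linear predictions range from -6 to 4, which cannot be read as probabilities, while every entry of `probs` stays between 0 and 1 and still increases with x.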