Lecture 5: Logistic Regression
Logistic Regression: Step by Step
• Step 1 | Defining the objectives
• Step 2 | Designing the study
• Step 3 | Checking assumptions
• Step 4 | Estimating the model and assessing fit
• Step 5 | Interpreting the results
• Step 6 | Validating the results
Step 1: Defining Objectives
The purpose of logistic regression is to
• Predict the likelihood that an event will or will not occur
• Assess the variables that affect the occurrence of the event
- Direction of influence
- Magnitude/importance (how strongly a change in a variable affects the likelihood)
Examples:
o What is the probability that a person will respond to a Neckerman direct mailing, and how can Neckerman
adjust its mail content to increase this probability?
o Does improved waiting time at the checkout increase the likelihood of visiting a C1000 store?
o What makes people more likely to donate to Foster Parents charity?
All three examples look for a causal relationship:
o A set of variables influencing an outcome → the likelihood of an event happening
o Dependence method: there is a clear outcome variable (whether something will happen), and we are interested in
the variables that affect the likelihood of it happening (as in ANOVA and linear regression)
Step 2: Designing the analysis
• Involves decisions on:
- The variables to be included
- The sample
o Size
o Composition
o Estimation versus holdout
Logistic Regression vs. ANOVA / Linear Regression:
o The outcome variable in logistic regression is a 0/1 variable (a dummy)
o The dummy can take on two values that are mutually exclusive (they cannot occur together)
o The two values are exhaustive (it is one or the other)
Variables:
• The dependent variable can be …
- “Naturally” dichotomous (0/1: two mutually exclusive and exhaustive options; e.g. a mailing is responded
to or it isn’t), or
- Reconstructed from a metric variable into a 0/1 variable (polar extremes approach)
• The independent variables:
- Can be metric or dummy variables (non-metric variables must first be transformed into dummies)
- Selection based on theory or intuition
Polar extremes approach
1. Rank order your observations
2. Split the data into three parts (low/mid/high values)
3. Drop the mid values and keep the low and high values (code them as 0 and 1)
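The three steps above can be sketched in a few lines of Python. This is only an illustration: the notes do not say how large each of the three parts should be, so splitting into equal thirds is an assumption, and the function name `polar_extremes` and the scores are made up.

```python
def polar_extremes(values):
    """Recode a metric variable into 0/1 using the polar extremes approach.

    Assumption: the three parts are equal thirds of the ranked observations.
    Returns (value, label) pairs; observations in the middle part are dropped.
    """
    ranked = sorted(values)                 # Step 1: rank order the observations
    third = len(ranked) // 3                # Step 2: size of the low and high parts
    low = ranked[:third]
    high = ranked[len(ranked) - third:]
    # Step 3: drop the middle part, code low as 0 and high as 1
    return [(v, 0) for v in low] + [(v, 1) for v in high]


# Made-up scores on a metric variable, for illustration only
scores = [7, 2, 9, 4, 5, 1, 8, 3, 6]
coded = polar_extremes(scores)
print(coded)
```

Note that the middle observations are lost entirely, so the recoded sample is smaller than the original one.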
Example in lecture: Advertising through in-store screens
o Do consumers notice ad messages on TV screens in store?
o How does this depend on message characteristics?
o Which consumers are more likely to notice the message?
o Store intercept survey: 879 respondents
o Dependent variable: whether the in-store display was seen or not seen (recall?)
o Independent variables (explanatory variables):
- Consumer: store visit frequency, spending on electronics, education (3 levels), home TV ad seen
(yes/no)
- Message: length (3 levels), sound (yes/no)
• Sample:
- Ratio of observations to independent variables (at least 20 rows in the data set for each explanatory
variable)
- Group sizes (the yeses and the noes): are they representative, or is oversampling needed? Oversample only
for a rare event → construct the data set so that it contains more yes cases, which gives a
better chance of finding the effect (but be careful when interpreting the results)
- Analysis versus holdout sample
- Proportionally stratified subsamples → the proportion of yeses and noes is the same in
the estimation sample and in the holdout sample
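A minimal sketch of the two sample checks above, using only the Python standard library. The function names, the 30% holdout share, and the data are assumptions made for illustration; the notes only give the rule of 20 rows per explanatory variable and the requirement that both subsamples keep the same proportion of yeses and noes.

```python
import random


def meets_rule_of_thumb(n_rows, n_explanatory, rows_per_variable=20):
    """Check the rule of thumb: at least 20 rows per explanatory variable."""
    return n_rows >= rows_per_variable * n_explanatory


def stratified_split(rows, outcome_key, holdout_share=0.3, seed=0):
    """Split rows into estimation and holdout samples, separately per outcome
    group, so that both samples keep (about) the same share of 1's."""
    rng = random.Random(seed)
    estimation, holdout = [], []
    for label in (0, 1):
        group = [r for r in rows if r[outcome_key] == label]
        rng.shuffle(group)
        n_holdout = round(len(group) * holdout_share)
        holdout.extend(group[:n_holdout])
        estimation.extend(group[n_holdout:])
    return estimation, holdout


def share_of_ones(rows, outcome_key):
    """Proportion of rows whose outcome equals 1."""
    return sum(r[outcome_key] for r in rows) / len(rows)


# Made-up data: 100 respondents, 30 of whom saw the message
data = [{"id": i, "seen": 1 if i < 30 else 0} for i in range(100)]
est, hold = stratified_split(data, "seen", holdout_share=0.3)

print(len(est), len(hold))
print(share_of_ones(est, "seen"), share_of_ones(hold, "seen"))
# The lecture example has 879 respondents and six listed explanatory variables
print(meets_rule_of_thumb(879, 6))
```

Splitting within each outcome group, rather than shuffling the whole data set once, is what keeps the proportions aligned; with small groups the shares can still differ slightly because of rounding.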
Step 3: Checking assumptions
• Two groups for the outcome variable
- (More than two groups: extension to the multinomial logit model)
• Robust to deviations from multivariate normality and homoscedasticity (no need to check these, since the
method is robust)
- Homoscedasticity: equal variances of the error terms in the group of 0’s versus the group of 1’s
• Check multicollinearity: overlap between explanatory variables
• Outcome = probability (a likelihood)
- Must lie between zero and one
- Need to adjust the model form: S-shaped
If we used a regular linear regression model, we would have a problem: there is no guarantee that the predicted
outcome lies between 0 and 1 for every value of the explanatory variables (the x’s) → we use a different,
S-shaped function (see plot below)
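To illustrate the point, the sketch below compares a straight-line prediction with the standard logistic (S-shaped) function for a single explanatory variable. The coefficients `b0` and `b1` are hypothetical values chosen for the example, not estimates from the lecture data.

```python
import math


def linear_prediction(x, b0, b1):
    """Straight-line prediction: can fall below 0 or rise above 1."""
    return b0 + b1 * x


def logistic_probability(x, b0, b1):
    """Logistic function of the same linear term: always strictly between 0 and 1."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, for illustration only
b0, b1 = -0.2, 0.3

for x in (-5, 0, 5, 10):
    lin = linear_prediction(x, b0, b1)
    logit = logistic_probability(x, b0, b1)
    print(f"x={x:>3}  linear={lin:6.2f}  logistic={logit:.3f}")
```

For small and large values of x the linear prediction leaves the 0 to 1 range, while the logistic value flattens out towards 0 or 1, which is what gives the curve its S shape.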