6.2.1. Ridge regression
Recall that least squares regression estimates the coefficients by minimizing RSS. The resulting estimates are unbiased:
no shrinkage is applied, so on average they equal the true coefficients, but their variance can be high.
Ridge regression instead minimizes RSS + λ Σ βj² (sum over j = 1, ..., p):
-> first term= RSS
-> second term= shrinkage penalty: term that shrinks the coefficients towards 0
-> λ =tuning parameter that controls the relative impact of the penalty term on the regression model
λ is large: coefficients must be small to keep the second term small
-> coefficient estimates that come from ridge regression = biased: the penalty shrinks them towards 0
-> different values of λ will produce different sets of coefficient estimates
-> choose proper λ value through cross-validation
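The effect of λ on the coefficient estimates can be illustrated with a minimal sketch. It assumes the simplest possible setting, which is not part of these notes: one centered predictor and no intercept, where minimizing RSS + λβ² has the closed-form solution β = Σxy / (Σx² + λ). The function name `ridge_1d` and the toy data are hypothetical.

```python
def ridge_1d(x, y, lam):
    """Ridge estimate for one centered predictor without intercept.

    Minimizes sum((y_i - beta * x_i)^2) + lam * beta^2, whose solution is
    beta = sum(x_i * y_i) / (sum(x_i^2) + lam).
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


# Toy data (made up for illustration), already centered.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.2, 2.1, 3.7]

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
estimates = [ridge_1d(x, y, lam) for lam in lambdas]

for lam, beta in zip(lambdas, estimates):
    print(f"lambda = {lam:7.1f}  ->  beta = {beta:.4f}")
```

With λ = 0 the estimate is the least squares estimate; each larger λ pulls it closer to 0 without ever reaching it.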
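SCALING OF THE VARIABLES IS IMPORTANT
-> ridge coefficient estimates change when a predictor is rescaled (least squares fits do not)
-> standardize the predictors before applying ridge regression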
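Why scaling matters can be shown with a small sketch, again under the assumption of one predictor and no intercept. The same made-up predictor is expressed in two units (metres and kilometres, chosen only for illustration); `ridge_1d`, `standardize` and `fitted` are hypothetical helper names. Standardizing here means centering and dividing by the standard deviation computed with 1/n.

```python
import math


def ridge_1d(x, y, lam):
    """Ridge estimate for one centered predictor without intercept."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


def standardize(x):
    """Center x and divide by its standard deviation (1/n version)."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
    return [(xi - mean) / sd for xi in x]


def fitted(x, y, lam):
    """Fitted values x_i * beta for the ridge estimate beta."""
    beta = ridge_1d(x, y, lam)
    return [xi * beta for xi in x]


# The same (made-up) predictor in two units.
x_m = [-2000.0, -1000.0, 0.0, 1000.0, 2000.0]
x_km = [xi / 1000.0 for xi in x_m]
y = [-4.1, -1.9, 0.2, 2.1, 3.7]
lam = 10.0

# Without standardizing, the ridge fit depends on the unit of x.
raw_m = fitted(x_m, y, lam)
raw_km = fitted(x_km, y, lam)

# After standardizing, both units give the same fit.
std_m = fitted(standardize(x_m), y, lam)
std_km = fitted(standardize(x_km), y, lam)

print("raw fit, metres:     ", [round(v, 3) for v in raw_m])
print("raw fit, kilometres: ", [round(v, 3) for v in raw_km])
print("standardized, both:  ", [round(v, 3) for v in std_m])
```

The penalty acts on the size of the coefficient, and that size depends on the unit of the predictor, so the same λ penalizes the two versions very differently unless they are standardized first.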
Ridge regression> least squares regression
Advantage ridge regression: bias-variance tradeoff
λ =0: high variance, no bias -> penalty term has no effect
λ increases -> flexibility of ridge regression decreases -> variance decreases -> bias increases
=> variance of the ridge regression predictions as a function of λ
if p is almost as large as n: use ridge regression (bc least squares regression has high variance)
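The bias-variance tradeoff can be made concrete in a setting where both quantities have simple formulas. The sketch below assumes a one-predictor model y = βx + ε with independent errors of variance σ², a fixed centered x and no intercept; the ridge estimate Σxy / (Σx² + λ) then has bias -λβ / (Σx² + λ) and variance σ²Σx² / (Σx² + λ)². The true β, σ and the x values are hypothetical.

```python
def ridge_bias_variance(x, beta_true, sigma, lam):
    """Squared bias and variance of the one-predictor ridge estimate.

    Assumes y_i = beta_true * x_i + eps_i with independent errors of
    variance sigma^2, a fixed centered predictor x and no intercept.
    """
    sxx = sum(xi * xi for xi in x)
    bias = -lam * beta_true / (sxx + lam)
    variance = sigma ** 2 * sxx / (sxx + lam) ** 2
    return bias ** 2, variance


# Hypothetical setting chosen only for illustration.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
beta_true, sigma = 0.5, 2.0

lambdas = [0.0, 1.0, 5.0, 20.0, 100.0]
results = [ridge_bias_variance(x, beta_true, sigma, lam) for lam in lambdas]

for lam, (bias_sq, var) in zip(lambdas, results):
    print(f"lambda = {lam:6.1f}  bias^2 = {bias_sq:.4f}  "
          f"variance = {var:.4f}  MSE = {bias_sq + var:.4f}")
```

In this toy setting the variance falls and the squared bias rises as λ grows, and their sum is smallest at an intermediate value of λ rather than at λ = 0.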
Ridge regression > subset selection
=> computational advantages: ridge only fits a single model
Disadvantages:
-> will include all p predictors in the final model
-> penalty will shrink all of the coefficients towards 0 but will not set any of them exactly to 0
(unless λ = ∞)
-> problem for model interpretation when p is large
6.2.2. Lasso regression
-> shrinks coefficient estimates towards 0 (like ridge)
-> different penalty, λ Σ |βj| instead of λ Σ βj²: forces some of the coefficient estimates to be exactly zero
when the tuning parameter λ is large enough
=> lasso regression performs variable selection (easier to interpret the final model)
• λ =0: least squares fit
• λ is sufficiently large: null model (all coefficient estimates = 0)
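The two extremes and the variable selection in between can be sketched with a small lasso solver. Cyclic coordinate descent is used here as one common way to minimize RSS + λ Σ |βj|; it is a choice made for this illustration, not a method named in these notes. The sketch assumes centered predictors and no intercept; `soft_threshold`, `lasso_cd` and the toy data are hypothetical.

```python
def soft_threshold(z, t):
    """Shrink z towards 0 by t; return 0.0 if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(columns, y, lam, sweeps=200):
    """Minimize RSS + lam * sum(|beta_j|) by cyclic coordinate descent.

    `columns` is a list of predictor columns (assumed centered, no intercept).
    """
    p = len(columns)
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Partial residual: remove the fit of all predictors except j.
            resid = [
                yi - sum(beta[k] * columns[k][i] for k in range(p) if k != j)
                for i, yi in enumerate(y)
            ]
            rho = sum(xij * ri for xij, ri in zip(columns[j], resid))
            zj = sum(xij * xij for xij in columns[j])
            # One-dimensional lasso update for coordinate j.
            beta[j] = soft_threshold(rho, lam / 2.0) / zj
    return beta


# Made-up centered predictors; y depends on x1 and x3 but not on x2.
x1 = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
x3 = [1.0, 1.0, -2.0, -2.0, 1.0, 1.0]
noise = [0.1, -0.2, 0.05, 0.1, -0.1, 0.05]
y = [2.0 * a + 0.5 * c + e for a, c, e in zip(x1, x3, noise)]
X = [x1, x2, x3]

for lam in [0.0, 4.0, 100.0]:
    beta = lasso_cd(X, y, lam)
    print(f"lambda = {lam:5.1f}  ->  beta = {[round(b, 4) for b in beta]}")
```

On this toy data λ = 0 reproduces the least squares fit, λ = 4 sets the coefficient of x2 exactly to zero while keeping x1 and x3, and λ = 100 gives the null model.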
ridge regression vs lasso regression
lasso can produce a model involving any number of variables
ridge will always include all of the variables
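ridge regression > lasso regression
=> response is a function of a large number of predictors, all with coefficients of roughly equal size
ridge regression < lasso regression
=> response is a function of only a few of the predictors (remaining coefficients are very small or zero)
-> neither method is always better: compare them with cross-validation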
- all the points on a given ellipse share a common value of the RSS
- the further away from the least squares coefficient estimates, the more the RSS increases
- the lasso and ridge regression coefficient estimates are given by the first point at which an
ellipse contacts the constraint region (= blue region)
- the lasso constraint has corners
=> the ellipses often touch the constraint region on an axis => the corresponding coefficient is equal to zero
- here: intersection at β1=0: resulting model will only include β2
- ridge: circular constraint with no sharp points
=> the intersection will generally not occur on an axis => coefficient estimates are non-zero
p=3
ridge regression= sphere
lasso= polyhedron
p>3
ridge= hypersphere
lasso= polytope
advantage lasso:
-> more interpretable models that involve only a subset of the predictors
-> because of variable selection
TYPES OF SHRINKAGE
(simple special case: n = p, X = identity matrix, no intercept)
o ridge: shrinks each least squares coefficient estimate by the same proportion
o lasso: shrinks each least squares coefficient estimate towards zero by a constant amount
-> coefficients that are less than this amount in absolute value are shrunk entirely to 0
= soft thresholding
=> feature selection
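The two types of shrinkage can be compared directly in the simple special case n = p with X the identity matrix and no intercept, where the least squares estimates are the observations yj themselves. In that case ridge gives yj / (1 + λ) and the lasso applies soft thresholding at λ/2. The function names and the numbers below are hypothetical.

```python
def ridge_shrink(y, lam):
    """Ridge estimate in the special case: beta_j = y_j / (1 + lam)."""
    return [yj / (1.0 + lam) for yj in y]


def lasso_shrink(y, lam):
    """Lasso estimate in the special case: soft thresholding at lam / 2."""
    out = []
    for yj in y:
        if yj > lam / 2.0:
            out.append(yj - lam / 2.0)
        elif yj < -lam / 2.0:
            out.append(yj + lam / 2.0)
        else:
            out.append(0.0)
    return out


# In this special case the least squares estimates equal y itself (made-up values).
least_squares = [3.0, -0.4, 1.0, -2.0]
lam = 2.0

ridge = ridge_shrink(least_squares, lam)
lasso = lasso_shrink(least_squares, lam)

for ls, r, l in zip(least_squares, ridge, lasso):
    print(f"least squares = {ls:5.2f}   ridge = {r:7.4f}   lasso = {l:5.2f}")
```

Ridge divides every estimate by the same factor, so none becomes zero; the lasso subtracts the same amount from every estimate and sets those with absolute value at most λ/2 to exactly zero.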
BAYESIAN INTERPRETATION
▪ Gaussian prior on β (with mean zero and standard deviation a function of λ)
=> posterior mode for β (= most likely value for β given the data) = ridge regression solution
= posterior mean
▪ Double-exponential prior on β (Laplace, with mean zero and scale parameter a function of λ)
=> posterior mode for β = lasso solution (not the posterior mean)
SELECTING THE TUNING PARAMETER λ
1. create a grid of different λ values
2. compute the cross-validation error for each value of λ
3. choose the value of λ with the lowest cross-validation error
4. refit the model on all available observations using the selected λ
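The selection procedure above can be sketched as follows. This is a minimal illustration under simplifying assumptions that are not in the notes: one centered predictor, no intercept, ridge regression as the model, 5 folds assigned by index, and a small hand-picked grid. The names `ridge_1d` and `cv_error` and the data are hypothetical.

```python
def ridge_1d(x, y, lam):
    """Ridge estimate for one centered predictor without intercept."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


def cv_error(x, y, lam, k):
    """k-fold cross-validation mean squared error for a given lambda."""
    n = len(x)
    total = 0.0
    for fold in range(k):
        # Observation i belongs to the test fold when i % k == fold.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        beta = ridge_1d([x[i] for i in train_idx],
                        [y[i] for i in train_idx], lam)
        total += sum((y[i] - beta * x[i]) ** 2 for i in test_idx)
    return total / n


# Made-up data for illustration.
x = [-4.5, -3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5]
y = [-2.0, -2.6, -0.4, -1.5, 0.3, -0.6, 1.2, 0.5, 2.4, 2.7]

# 1. grid of lambda values, 2. cross-validation error for each value
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
errors = {lam: cv_error(x, y, lam, k=5) for lam in grid}

# 3. pick the lambda with the lowest cross-validation error
best_lam = min(errors, key=errors.get)

# Refit on all observations with the selected lambda.
beta_final = ridge_1d(x, y, best_lam)

for lam in grid:
    print(f"lambda = {lam:6.1f}  CV error = {errors[lam]:.4f}")
print("selected lambda:", best_lam, " final beta:", round(beta_final, 4))
```

The same loop applies to the lasso by swapping the fitting function; only the model fitted inside each fold changes.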