ISYE 6501 - MIDTERM 2 | 2024
QUESTIONS AND ANSWERS
when might overfitting occur - when the # of factors is close to or larger than the # of
data points causing the model to potentially fit too closely to random effects
Why are simple models better than complex ones - less data is required; there is less
chance of including insignificant factors; and they are easier to interpret
what is forward selection - we start with no factors. At each step we select the best
new factor and, if it's good enough (by R^2, AIC, or p-value), add it to our model and fit
the model with the current set of factors. We stop when no remaining factor is good
enough or we have as many factors as we want. Then at the end we remove any factors
that fail a certain threshold
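A minimal sketch of the greedy loop described above, in plain Python. The "good enough" test here is an assumed minimum gain in R^2 (the hypothetical `min_gain` parameter); AIC or a p-value could be used instead. The data is synthetic and the final pruning step is left out.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def r_squared(X, y, cols):
    """R^2 of a least-squares fit of y on an intercept plus the chosen columns."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def forward_selection(X, y, min_gain=0.01):
    """Greedily add the factor that raises R^2 the most; stop when the gain is too small."""
    chosen, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scores = {c: r_squared(X, y, chosen + [c]) for c in remaining}
        c = max(scores, key=scores.get)
        if scores[c] - best < min_gain:
            break  # best new factor is not good enough
        chosen.append(c)
        remaining.remove(c)
        best = scores[c]
    return chosen, best

# Synthetic data: only columns 0 and 2 actually drive y; columns 1 and 3 are noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [3 * r[0] - 2 * r[2] + rng.gauss(0, 0.5) for r in X]
chosen, r2 = forward_selection(X, y)
print(chosen, round(r2, 3))
```

Because each step only looks at the single best addition, this is a greedy procedure: it never revisits an earlier choice.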
what is backward elimination - we start with all factors and find the worst one. If it is
worse than a supplied threshold (e.g. p-value above 0.15), we remove it and start the
process over. We do that until no factor is bad enough to remove or we have the number
of factors that we want. Then we remove any factors that fail a second, stricter
threshold (e.g. p-value above 0.05) and fit the model with the remaining set of factors
what is stepwise regression - it is a combination of forward selection and backward
elimination. We can start with either all factors or no factors, and at each step we
remove or add a factor. After adding each new factor, and again at the end, we
immediately eliminate any factors that no longer appear good enough.
what type of algorithms are stepwise selection methods? - Greedy algorithms - at each
step they do the one thing that looks best, without taking future options into account
what is LASSO - a variable selection method where the coefficients are determined by
minimizing the squared error subject to the constraint that the sum of their absolute
values does not exceed a certain threshold t
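Written out as an optimization problem, the description above looks like the following. Only t comes from the notes; the symbols a_0, a_j, x_ij, y_i, n and m are notation assumed here for the intercept, coefficients, factor values, responses, number of data points and number of factors.

```latex
\min_{a_0, a_1, \dots, a_m} \; \sum_{i=1}^{n} \Bigl( y_i - \bigl( a_0 + \sum_{j=1}^{m} a_j x_{ij} \bigr) \Bigr)^2
\quad \text{subject to} \quad \sum_{j=1}^{m} \lvert a_j \rvert \le t
```

The budget t is shared by all coefficients, so spending it on an unhelpful factor leaves less for the helpful ones; that pressure is what pushes some coefficients to zero.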
How do you choose t in LASSO - fit LASSO with different values of t and see which
gives the best trade-off between the number of factors and the quality of the model
why do we have to scale the data for LASSO - if we don't, the units each factor is
measured in will artificially affect how big its coefficient needs to be, and therefore
how much of the constraint it uses up
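A small sketch of the scaling issue, using only the standard library. The distances and prices are made-up numbers, and a one-factor least-squares slope stands in for a full LASSO fit: it shows that changing units changes the coefficient size, while standardizing first removes the effect.

```python
import math
import statistics

def slope(x, y):
    """Least-squares slope of y on a single factor x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def standardize(x):
    """Rescale x to mean 0 and standard deviation 1."""
    m, s = statistics.fmean(x), statistics.pstdev(x)
    return [(a - m) / s for a in x]

# Hypothetical data: the same distances expressed in meters and in kilometers.
distance_m = [1200.0, 3400.0, 560.0, 7800.0, 4300.0, 2500.0]
distance_km = [d / 1000.0 for d in distance_m]
price = [10.5, 14.1, 9.0, 22.3, 16.0, 12.4]

slope_m = slope(distance_m, price)    # tiny coefficient (per meter)
slope_km = slope(distance_km, price)  # 1000 times larger (per kilometer)

# After standardizing, the choice of units no longer matters.
slope_std_m = slope(standardize(distance_m), price)
slope_std_km = slope(standardize(distance_km), price)

print(slope_m, slope_km)
print(slope_std_m, slope_std_km)
```

Under a constraint on the sum of absolute coefficients, the per-kilometer version would use up 1000 times more of the budget than the per-meter version for exactly the same relationship.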
What is elastic net? - A variable selection method that works by minimizing the squared
error and constraining the combination of absolute values of coefficients and their
squares
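One common way to write the elastic net constraint is sketched below. The threshold t, the mixing weight \lambda (between 0 and 1), and the symbols a_0, a_j, x_ij, y_i, n and m are notation assumed here, not taken from the notes.

```latex
\min_{a_0, a_1, \dots, a_m} \; \sum_{i=1}^{n} \Bigl( y_i - \bigl( a_0 + \sum_{j=1}^{m} a_j x_{ij} \bigr) \Bigr)^2
\quad \text{subject to} \quad
\lambda \sum_{j=1}^{m} \lvert a_j \rvert + (1 - \lambda) \sum_{j=1}^{m} a_j^2 \le t
```

With \lambda = 1 only the absolute-value term remains, and with \lambda = 0 only the squared term remains.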
what is a key difference between stepwise regression and lasso regression *** - LASSO
requires the data to be scaled first. If the data is not scaled, the coefficients can
have artificially different orders of magnitude, which means they'll have unbalanced
effects on the lasso constraint.
Why doesn't Ridge Regression perform variable selection? - The constraint is on the
sum of the squared coefficients, which shrinks (regularizes) them toward zero, but the
coefficient values generally do not become exactly zero
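A toy illustration of the difference, assuming a single standardized factor and the penalized (rather than constrained) form of each method. Here `b` is the unregularized least-squares coefficient and `lam` is a hypothetical penalty weight; both closed forms follow from setting the derivative of the stated objective to zero.

```python
def lasso_shrink(b, lam):
    """Soft-thresholding: minimizer over a of 0.5*(b - a)**2 + lam*abs(a)."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # small coefficients are set exactly to zero

def ridge_shrink(b, lam):
    """Minimizer over a of 0.5*(b - a)**2 + 0.5*lam*a**2."""
    return b / (1.0 + lam)  # scaled toward zero, but zero only if b is zero

# Hypothetical unregularized coefficients: one strong, one weak, one very weak.
ols = [2.0, 0.5, -0.1]
lam = 0.6
lasso = [lasso_shrink(b, lam) for b in ols]
ridge = [ridge_shrink(b, lam) for b in ols]

print(lasso)
print(ridge)
```

The LASSO-style rule drops the two weak coefficients entirely, while the ridge-style rule keeps all three at reduced size, which is why ridge regression by itself does not select variables.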
What are the pros and cons of Greedy Algorithms (Forward selection, backward
elimination, stepwise regression) - Good for initial analysis, but they often don't
perform as well on other data because they fit more to random effects than you'd like,
so they appear to have a better fit than they really do
What are the pros and cons of LASSO, Ridge and Elastic Net - They are slower to
compute but help make models that make better predictions
Which two methods does elastic net look like it combines, and what are the advantages
and downsides of that? - Ridge Regression and LASSO.
Advantages: variable selection from LASSO and predictive benefits of Ridge.
Disadvantages: arbitrarily rules out some correlated variables (from LASSO, which
doesn't know which one should be left out); underestimates coefficients of very
predictive variables (from Ridge Regression)
What are some downsides of surveys? - Even if you have what appears to be a
representative sample in simple ways, maybe it isn't in more complex ways.
If we're testing to see whether red cars sell for higher prices than blue cars, we need to
account for the type and age of the cars in our data set. This is called: - Controlling
what is a blocking factor *** - a source of variability that is not of primary interest to the
experimenter
what is an example of a blocking factor - The type of car, sports car or family car, is a
blocking factor: it could account for some of the difference between red cars and
blue cars, because sports cars are more likely to be red. If we account for the
difference, we can reduce the variability in our estimates
Under what conditions should you run A/B tests - When you can collect data quickly,
when the data is representative, and when the amount of data needed is small compared
to the whole population
Do you have to decide the sample size ahead of time for A/B tests - no, and we can run
the hypothesis test anytime we want
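One possible hypothesis test for an A/B comparison is a two-proportion z-test, sketched below with the standard library. The choice of test, the function name, and the visitor and conversion counts are all assumptions for illustration; the notes do not say which test to use.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

# Hypothetical counts: 120 of 1000 visitors converted on A, 150 of 1000 on B.
z, p_value = two_proportion_z_test(120, 1000, 150, 1000)
print(z, p_value)
```

The same function can be rerun on the running totals as more data arrives.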