ISYE7406_HW2_2 QUESTIONS AND ANSWERS
HW2 Introduction
Work with the “fat” data set and apply 7 linear regression models to it. For each model, find the testing error. Since the data set is small, apply Monte Carlo cross-validation to all the models and report the average testing error of each model over 100 loops.

Exploratory Data Analysis
The “fat” data set has 252 observations and 18 variables. The first column, “brozek”, is the dependent variable and represents the percentage of body fat; the remaining 17 variables are potential predictors. We split the data into two data sets, a training data set and a testing data set. Since the data set is small, we hold out 10% of the observations as the testing sample.

After splitting the data, we plot the scatter plot and the distribution of the dependent variable. The distribution of the dependent variable is right-skewed, with a long tail on its right. From the plot below, we can identify the maximum value, 45.10, as an outlier.

Next, we look at the correlations among all variables and their correlations with the dependent variable “brozek”. The siri variable is almost perfectly correlated with “brozek” (a correlation of essentially 100%), and the density variable is also strongly (negatively) correlated with “brozek”. Besides these strong correlations with the dependent variable, some of the independent variables are also highly correlated with each other.

Methods
We apply 7 different linear regression models to analyze the data:
Mod1: linear regression with all predictors;
Mod2: linear regression with the best subset of k = 5 predictor variables;
Mod3: linear regression with variables selected stepwise using AIC;
Mod4: ridge regression;
Mod5: LASSO;
Mod6: principal component regression;
Mod7: partial least squares.
First, we fit the 7 models to the training data set. The training error is obtained by applying the 7 fitted models to the training d
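The correlation check in the exploratory analysis can be sketched with NumPy as below. This is a minimal illustration, not the homework's actual code: the helper `correlation_report`, the 0.9 threshold, and all data values are hypothetical. The synthetic columns assume the standard Brozek (457/D - 414.2) and Siri (495/D - 450) body-fat equations, both functions of body density D, which is one way to see why siri and density track brozek so closely and why they are also highly correlated with each other.

```python
import numpy as np

def correlation_report(data, response, threshold=0.9):
    """Return each column's Pearson correlation with `response`, plus the
    predictor pairs whose absolute correlation is at least `threshold`."""
    names = list(data)
    R = np.corrcoef(np.column_stack([data[k] for k in names]), rowvar=False)
    j = names.index(response)
    with_response = {names[i]: float(R[j, i]) for i in range(len(names)) if i != j}
    predictors = [i for i in range(len(names)) if i != j]
    high_pairs = [
        (names[a], names[b], float(R[a, b]))
        for ai, a in enumerate(predictors)
        for b in predictors[ai + 1:]
        if abs(R[a, b]) >= threshold
    ]
    return with_response, high_pairs

# Hypothetical synthetic stand-in for a few "fat" columns (not the real data).
rng = np.random.default_rng(7406)
n = 252
density = rng.uniform(1.03, 1.09, size=n)
data = {
    "brozek": 457.0 / density - 414.2,   # assumed Brozek equation
    "siri": 495.0 / density - 450.0,     # assumed Siri equation
    "density": density,
    "age": rng.integers(22, 82, size=n).astype(float),  # unrelated noise column
}
with_response, high_pairs = correlation_report(data, "brozek")
```

On this synthetic data, siri shows a correlation near +1 with brozek, density a correlation near -1, and the siri/density pair is flagged as highly correlated predictors.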
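For Mod2, best subset selection with k = 5 can be done by exhaustive search: for 17 predictors there are C(17, 5) = 6188 candidate subsets, which is small enough to enumerate. The sketch below is a hypothetical helper, assuming subsets are ranked by training residual sum of squares (for a fixed k this is equivalent to ranking by R²); it uses a 10-predictor synthetic example so the search is quick.

```python
import itertools
import numpy as np

def best_subset(X, y, k=5):
    """Exhaustive best-subset search: return the k column indices whose OLS fit
    (with intercept) has the smallest residual sum of squares, and that RSS."""
    n, p = X.shape
    ones = np.ones((n, 1))
    best_rss, best_cols = np.inf, None
    for cols in itertools.combinations(range(p), k):
        X1 = np.hstack([ones, X[:, list(cols)]])      # intercept + chosen columns
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        rss = float(np.sum((y - X1 @ beta) ** 2))
        if rss < best_rss:
            best_rss, best_cols = rss, cols
    return best_cols, best_rss

# Hypothetical example: 10 synthetic predictors, only columns 0, 2, 4, 6, 8 matter.
rng = np.random.default_rng(2)
X = rng.normal(size=(227, 10))
y = X[:, [0, 2, 4, 6, 8]] @ np.array([3.0, -2.0, 2.5, -3.0, 2.0]) + rng.normal(scale=0.5, size=227)
cols, rss = best_subset(X, y, k=5)
```

Here the search recovers the five informative columns; on real data the selected subset would then be refit and evaluated on the testing sample like the other models.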
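The Monte Carlo cross-validation loop (a fresh random 90/10 split on each of 100 loops, fit on the training part, then average the training and testing errors per model) might look like the sketch below. Assumptions: mean squared error as the error measure, only Mod1 (OLS) and Mod4 (closed-form ridge with a hypothetical penalty `lam=1.0` on standardized predictors) as example models, and synthetic 252 x 17 data in place of the fat data set. The other models would plug in the same way as additional entries in `fitters`.

```python
import numpy as np

def fit_ols(X, y):
    """Mod1-style fit: least squares on all predictors plus an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def fit_ridge(X, y, lam=1.0):
    """Mod4-style fit: closed-form ridge on standardized predictors; intercept not penalized."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd
    y_bar = y.mean()
    coef = np.linalg.solve(Z.T @ Z + lam * np.eye(X.shape[1]), Z.T @ (y - y_bar))
    return lambda Xn: y_bar + ((Xn - mu) / sd) @ coef

def monte_carlo_cv(X, y, fitters, n_loops=100, test_frac=0.10, seed=0):
    """Repeat a random train/test split n_loops times; return the mean training
    and mean testing MSE of each model."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(round(test_frac * n))  # 25 of 252 rows when test_frac = 0.10
    train_err = {name: [] for name in fitters}
    test_err = {name: [] for name in fitters}
    for _ in range(n_loops):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        for name, fit in fitters.items():
            predict = fit(X[train], y[train])
            train_err[name].append(np.mean((y[train] - predict(X[train])) ** 2))
            test_err[name].append(np.mean((y[test] - predict(X[test])) ** 2))
    return ({k: float(np.mean(v)) for k, v in train_err.items()},
            {k: float(np.mean(v)) for k, v in test_err.items()})

# Hypothetical synthetic data with the same shape as the fat predictors (252 x 17).
rng = np.random.default_rng(1)
X = rng.normal(size=(252, 17))
y = X[:, :5] @ np.array([2.0, -1.0, 0.5, 1.5, -0.5]) + rng.normal(size=252)
avg_train, avg_test = monte_carlo_cv(X, y, {"Mod1": fit_ols, "Mod4": fit_ridge})
```

Returning a prediction function from each `fit_*` keeps the loop model-agnostic. With `lam=0.0` the ridge fit reduces to ordinary least squares, which is a handy sanity check.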
Document information
- Uploaded on: 24 October 2023
- Number of pages: 21
- Written in: 2023/2024
- Type: Exam (worked answers)