ISYE 7406 Homework 2 (ISYE7406_HW2_2)
Introduction

This assignment works with the "fat" data set and applies 7 linear regression models to it. For each model, we compute the testing error. Because the data set is small, we also apply Monte Carlo cross-validation to all the models and report each model's average testing error over 100 loops.

Exploratory Data Analysis

The "fat" data set has 252 observations and 18 variables. The first column, "brozek", is the dependent variable and gives the percentage of body fat. The remaining 17 variables are potential predictors. We split the data into a training set and a testing set. Because the data set is small, we hold out 10% of the observations as the testing sample. After splitting the data, we plot the scatter and the distribution of the dependent variable. The distribution is skewed to the right, with a long right tail. From the plot below, the maximum value, 45.10, can be identified as an outlier.

Next, we look at the correlations among all variables and at each variable's correlation with the dependent variable "brozek". The "siri" variable is almost perfectly correlated with "brozek" (a correlation of about 100%). The "density" variable is also highly (negatively) correlated with "brozek". Some of the independent variables are also highly correlated with each other, in addition to their correlation with the dependent variable.

Methods

Apply 7 different linear
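The Monte Carlo cross-validation described in the introduction can be sketched in Python. This is a minimal sketch under stated assumptions and not the report's own code. The seven regression models are not reproduced here. Instead, a hypothetical one-predictor least-squares fit, `fit_simple_ols`, stands in for any model. The testing error is assumed to be the test mean squared error. The 10% test fraction and 100 loops mirror the text. The function names and the seed are illustrative only.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# A "fit" function takes training x and y and returns a predictor.
Predictor = Callable[[float], float]
FitFn = Callable[[Sequence[float], Sequence[float]], Predictor]


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Predictor:
    """Hypothetical stand-in model: least-squares fit of y = a + b * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return lambda xi: a + b * xi


def monte_carlo_cv(
    x: Sequence[float],
    y: Sequence[float],
    fit: FitFn,
    n_loops: int = 100,
    test_frac: float = 0.10,
    seed: int = 7406,
) -> Tuple[float, List[float]]:
    """Average test MSE over n_loops random train/test splits."""
    rng = random.Random(seed)  # fixed seed: every model sees the same splits
    n = len(y)
    n_test = max(1, round(test_frac * n))
    errors: List[float] = []
    for _ in range(n_loops):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random split on each loop
        test, train = idx[:n_test], idx[n_test:]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        mse = statistics.fmean((y[i] - predict(x[i])) ** 2 for i in test)
        errors.append(mse)
    return statistics.fmean(errors), errors
```

Each model enters only through the `fit` argument. The same loop can therefore score all seven models. Because the seed is also shared, every model is evaluated on identical random splits, which keeps their average testing errors directly comparable.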
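The exploratory checks in the EDA section can be sketched with Python's standard library. These checks cover the right skew of "brozek", an outlier at the top of its distribution, and strongly correlated variable pairs. Several assumptions apply. The text does not name its outlier rule, so the common 1.5 × IQR boxplot rule is used here. Also, `statistics.quantiles` defines quartiles differently from R's boxplot hinges, so borderline points may be classified differently. Finally, the helper names and the 0.9 correlation threshold are hypothetical and are not values taken from the report.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def skewness(v: Sequence[float]) -> float:
    """Moment skewness m3 / m2**1.5; a positive value means a long right tail."""
    m = statistics.fmean(v)
    m2 = statistics.fmean((x - m) ** 2 for x in v)
    m3 = statistics.fmean((x - m) ** 3 for x in v)
    return m3 / m2 ** 1.5


def iqr_outliers(v: Sequence[float], k: float = 1.5) -> List[float]:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed boxplot-style rule)."""
    q1, _, q3 = statistics.quantiles(v, n=4)
    iqr = q3 - q1
    return [x for x in v if x < q1 - k * iqr or x > q3 + k * iqr]


def high_correlations(
    cols: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """All variable pairs whose |correlation| meets the hypothetical threshold."""
    names = list(cols)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(cols[a], cols[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs
```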
Written for
- Institution: Georgia Institute of Technology
- Course: ISYE 7406
Document information
- Uploaded on: October 24, 2023
- Number of pages: 21
- Written in: 2023/2024