Predictive Modelling using R
Student’s Name
Professor’s Name
Institute Name
Date
Predictive Modelling using R
Introduction
The purpose of this report is to identify suitable predictive models (linear regression, logistic regression, and decision tree) for the employee data set of Globex Pharma, a pharmaceutical company, using RStudio. The data set contains 26 variables. The models are intended to predict whether an employee leaves the company or remains with it. This outcome is captured by the categorical variable “Attrition”, which takes two values, “Yes” and “No”. Attrition can be defined as the departure of an employee from the company for any reason, such as termination, resignation, retirement, or death (Gartner, 2022). In the given context, it most likely refers to resignation. Multiple linear regression, logistic regression, and decision tree models are developed with “Attrition” as the dependent variable and a number of independent variables. Suitable assumptions about the data set are also made in order to run the models.
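As a rough illustration of how the three model types can be set up in R, the following minimal sketch fits each of them to a small synthetic data frame. The data values, the choice of predictors, the column spellings (Age, JobSatisfaction, OverTime), and the 0/1 coding of Attrition are assumptions made for this example only and are not taken from the Globex Pharma data.

```r
# Synthetic stand-in for the employee data; all values are made up for illustration.
set.seed(42)
n <- 200
employees <- data.frame(
  Age             = sample(20:60, n, replace = TRUE),
  JobSatisfaction = sample(1:4, n, replace = TRUE),
  OverTime        = factor(sample(c("No", "Yes"), n, replace = TRUE))
)

# Hypothetical attrition mechanism, used only to generate example labels.
p_leave <- plogis(-1 + 1.2 * (employees$OverTime == "Yes") -
                    0.4 * employees$JobSatisfaction)
employees$Attrition <- factor(ifelse(runif(n) < p_leave, "Yes", "No"),
                              levels = c("No", "Yes"))

# Linear regression needs a numeric response, so Attrition is coded as 0/1 (assumption).
employees$AttritionNum <- as.integer(employees$Attrition == "Yes")
lin_fit <- lm(AttritionNum ~ Age + JobSatisfaction + OverTime, data = employees)

# Logistic regression models the probability that Attrition is "Yes".
log_fit <- glm(Attrition ~ Age + JobSatisfaction + OverTime,
               data = employees, family = binomial)

# Decision tree via rpart, fitted only if the package is installed.
if (requireNamespace("rpart", quietly = TRUE)) {
  tree_fit <- rpart::rpart(Attrition ~ Age + JobSatisfaction + OverTime,
                           data = employees, method = "class")
}

# Predicted probability of leaving for each employee, from the logistic model.
log_prob <- predict(log_fit, type = "response")
```

In a sketch like this, summary(lin_fit) and summary(log_fit) would report the estimated coefficients, and the predicted probabilities in log_prob could be compared against a cut-off such as 0.5 to classify each employee as likely to leave or stay.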
Methodology
The given data set includes variables that take numerical codes, such as Education, Environment, Job Involvement, Job Satisfaction, Relationship Satisfaction, Stock Options Level, Work-Life Balance, and High Performance. These values are codes that represent a level or rating for the variable. For example, Education takes the values 1, 2, 3, 4, and 5, which correspond to the employee's education level: high school, diploma, bachelor's, master's, and Ph.D. respectively. Other variables, such as Gender, Department, Job Role, Business Travel, and Marital Status, are categorical, and their values name the category directly. The remaining variables, such as Age, DistanceFromHome, Monthly Income, NumCompaniesWorked, Salary Increase, TotalWorkingYears, TrainingTimeLastYear, YearsAtCompany, YearsInCurrentRole, and YearsSinceLastPromotion, hold the actual measured values for each employee. The variables Over18 and OverTime take the two values “Yes” and “No”, like the Attrition variable.
Linear regression is taken as the model for predicting whether employees leave the organization in the given data set. A linear regression model is used to describe or predict the relationship between a dependent variable and an independent variable (Terence, 2021). For the given data set, the dependent and independent variables are defined and the regression analysis is performed in RStudio. Employees can leave an organization for multiple reasons, such as job involvement, job satisfaction, number of working years, salary increment, work-life balance, relationship with managers, promotion, rewards or recognition, and employee satisfaction rate (Melanie, 2021). Therefore, a multiple linear regression model is used, in which there are several independent variables and one dependent variable. In the given