Data Mining Final Exam 2024 Questions with 100% correct Answers

  • August 16, 2024
Edwardsus


In a data set with 22 variables, if 13% of the values, randomly spread across observations, are missing
(blank), what is the probable percent of complete and usable observations?

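4.67%;

An observation is usable only if all 22 of its values are present. With values missing independently,
each value is present with probability 1 − 0.13 = 0.87, so the expected share of complete observations
is (1 − 0.13)^22 = 0.0467, or 4.67%.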




In a data set with 20 variables, if 8% of the values, randomly spread across observations, are missing
(blank), what is the probable percent of complete and usable observations?

18.87%;

(1 − 0.08)^20 = 0.1887, or 18.87%.
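
The two missing-data calculations above can be checked with a short R sketch. The helper name `complete_share` is hypothetical, and the calculation assumes each value is missing independently of the others with the same probability.

```r
# Share of rows with no missing values, assuming each value is missing
# independently with probability p_missing (hypothetical helper name).
complete_share <- function(p_missing, n_vars) {
  (1 - p_missing)^n_vars
}

share_22 <- complete_share(0.13, 22)  # 22 variables, 13% of values missing
share_20 <- complete_share(0.08, 20)  # 20 variables,  8% of values missing

# Percentages rounded to two decimals: 4.67 and 18.87
print(round(100 * c(share_22, share_20), 2))
```

Even a modest missing-value rate leaves few complete rows once there are many variables, which is why the answer drops so quickly as the number of variables grows.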




When performing an analysis, one technique is called RFM. Which of the following is not reflective of
RFM?

Relevancy;

RFM stands for recency, frequency, and monetary value.




Mark wants to have a better understanding of his client base at the credit union. To do so, he is
running a report to show loan amount approval with corresponding credit scores. He realized the data
set is quite large and wants to create categories by grouping. To do this, he needs to do all the
following except

Remove 20% of the data to create a training set;



Binning takes the entire data set, divides the values of the chosen variable into smaller groups
(bins), ensures that the bins do not overlap, and labels each bin accordingly.
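
As a rough illustration of binning in R, the sketch below groups credit scores with `cut()`. The scores, bin edges, and labels are made-up assumptions for the example, not values from the exam or official credit-score bands.

```r
# Hypothetical credit scores; in practice this would be a column of the loan data.
credit_score <- c(580, 640, 699, 700, 745, 810)

# Bin edges are illustrative assumptions. right = FALSE makes each bin include
# its lower edge and exclude its upper edge, so no score can fall into two bins.
score_bin <- cut(
  credit_score,
  breaks = c(300, 600, 700, 800, 851),
  labels = c("Low", "Fair", "Good", "Excellent"),
  right = FALSE
)

# Every observation is kept; count how many land in each bin.
print(table(score_bin))
```

Note that no rows are removed: binning relabels the full data set, which is why splitting off a training set is not part of the procedure.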

In R, Mary wants to understand the number of days between rain events in Chicago, IL. What function
is used to find the number of days between today and January 1, 2026?

difftime
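
A minimal sketch of how base R's `difftime()` counts the days between two dates. A fixed start date is assumed here so the result is reproducible; replacing it with `Sys.Date()` would count from today instead.

```r
# Fixed dates keep the example reproducible; use Sys.Date() for "today".
start_date <- as.Date("2025-01-01")  # assumed example date
end_date   <- as.Date("2026-01-01")

days_between <- difftime(end_date, start_date, units = "days")
print(days_between)

# as.numeric() drops the "difftime" class when a plain number is needed.
n_days <- as.numeric(days_between)
```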




Using R, what command uses the weekdays function to display the day of the week for
November 15, 2020?

weekdays(as.Date("2020-11-15"))




Using R, what function is used to evaluate the categories in the variable to identify the dummy
variables?

ifelse
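
One way `ifelse()` might be used to build a dummy variable is sketched below. The data frame and its column names are hypothetical.

```r
# Hypothetical employee data with one categorical variable.
employees <- data.frame(
  name = c("A", "B", "C", "D"),
  sex  = c("Female", "Male", "Female", "Male")
)

# ifelse() tests each value and returns 1 where the condition holds, 0 otherwise.
employees$female <- ifelse(employees$sex == "Female", 1, 0)

print(employees)
```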




Michael is examining a data set and trying to determine which category he can transform into a
dummy variable. Of the four variables, Employee Number, Pay Rate, Hire Date, and Sex, which is the
best fit to use a dummy variable?

Sex




Marcus wants to include the month of the year in the analysis as categories. How many dummy
variables will be needed?

11;

With k = 12 categories, k − 1 = 12 − 1 = 11 dummy variables are needed; the omitted category serves
as the reference.
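
The k − 1 rule can be seen in how R encodes a factor. The sketch below uses base R's `model.matrix()` with its default treatment coding; the variable names are chosen for illustration.

```r
# month.abb is a built-in vector of the 12 abbreviated month names.
month <- factor(month.abb, levels = month.abb)

# model.matrix() builds an intercept column plus one dummy column for every
# level except the first (January), which becomes the reference category.
design <- model.matrix(~ month)

n_dummies <- ncol(design) - 1  # subtract the intercept column
print(n_dummies)
```

Including a twelfth dummy alongside the intercept would make the columns perfectly collinear, which is why one category is left out.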




Kara is reviewing categories where a series of numbers represent the type of loan. She would prefer
the actual name of the loan be retained when running her analysis. Using Microsoft Excel, what
function will allow Kara to retain the category names instead of recording them as numbers?

IF function;

An IF function allows for statements to be crafted to transform numbers into category names.




What data preparation technique is Maeve using when she extracts a payroll data set into two
separate files, one for hourly employees and one for salary employees?

Subsetting
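
A small sketch of subsetting in R, using base `subset()`. The payroll data frame, its column names, and the output file names are assumptions made for the example.

```r
# Hypothetical payroll data; column names are assumptions for illustration.
payroll <- data.frame(
  employee_id = 1:5,
  pay_type    = c("Hourly", "Salary", "Hourly", "Salary", "Hourly"),
  pay_rate    = c(18.5, 62000, 21.0, 75000, 19.25)
)

# subset() keeps only the rows that satisfy the condition.
hourly <- subset(payroll, pay_type == "Hourly")
salary <- subset(payroll, pay_type == "Salary")

# Each piece could then be written to its own file, for example:
# write.csv(hourly, "hourly.csv", row.names = FALSE)
# write.csv(salary, "salary.csv", row.names = FALSE)
```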




Regression analysis captures the relationship between only two distinct variables.

False;



Regression analysis captures the relationship between two or more variables.




The response variable is the outcome variable, whereas the predictors are the input variable(s).

True




R² in linear regression is the correlation coefficient.

False;

R² in linear regression is the coefficient of determination, which is the proportion of the sample
variation in the response variable that is explained by the sample regression equation. The correlation
coefficient measures the strength and direction of the linear relationship between two variables.




R², also known as the coefficient of determination, quantifies the proportion of the sample variation
in the predictor variables (xi) that is explained in the sample regression equation.

False;

R² quantifies the sample variation of the response variable y that is explained in the sample
regression equation, not the predictor variables.
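
To make the distinction concrete, the sketch below computes R² from its definition and compares it with the value R reports. It uses `cars`, a data set that ships with R, purely as an assumed example; the variable names are illustrative.

```r
# cars ships with R: stopping distance (dist) recorded against speed.
fit <- lm(dist ~ speed, data = cars)

# R-squared from its definition: 1 - SSE / SST, both computed on the response.
sse <- sum(residuals(fit)^2)
sst <- sum((cars$dist - mean(cars$dist))^2)
r2_manual <- 1 - sse / sst

r2_reported <- summary(fit)$r.squared

# With a single predictor, R-squared equals the squared correlation coefficient.
r_xy <- cor(cars$speed, cars$dist)

print(c(manual = r2_manual, reported = r2_reported, r_squared = r_xy^2))
```

Both sums of squares are taken over the response variable, which matches the answers above: R² describes explained variation in y, and it is related to, but not the same as, the correlation coefficient.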
