True or false: Bar charts are useful for comparing a single statistic (e.g. average, count,
percentage) across groups. The height of the bar represents the value of the statistic, and
different bars correspond to different groups. - ✔️✔️True
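Here is a minimal Python sketch of the statistic behind each bar. It assumes a simple list of hypothetical (group, value) pairs, and the bar heights would be these per-group averages:

```python
from collections import defaultdict

def group_means(rows):
    """Return {group: mean value}; each bar's height would be one of these means."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in rows:
        totals[group] += value
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in totals}

# Hypothetical example data: (region, sales) pairs
data = [("East", 10.0), ("East", 14.0), ("West", 8.0), ("West", 6.0), ("West", 10.0)]
print(group_means(data))  # {'East': 12.0, 'West': 8.0}
```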
Assume that you are running the Neural platform in JMP Pro. Which penalty method should
be chosen if your data set has a large number of X variables, and you think that a few of
them contribute more than others to the predictive ability of the model? [ No penalty ;
Absolute ; Logarithmic ; Squared ] - ✔️✔️Absolute
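The source does not give JMP's exact penalty formulas, so this sketch assumes the generic forms (absolute = sum of |w|, squared = sum of w²) and uses hypothetical weights. It suggests why the absolute penalty fits the case where "a few inputs matter more": it charges the same for concentrated and spread-out weights, while the squared penalty punishes concentration much harder.

```python
def absolute_penalty(weights, lam=1.0):
    # Assumed generic form: lam * sum of absolute weights
    return lam * sum(abs(w) for w in weights)

def squared_penalty(weights, lam=1.0):
    # Assumed generic form: lam * sum of squared weights
    return lam * sum(w * w for w in weights)

concentrated = [2.0, 0.0, 0.0, 0.0]   # a few inputs carry the signal
spread = [0.5, 0.5, 0.5, 0.5]         # signal shared evenly across inputs

print(absolute_penalty(concentrated), absolute_penalty(spread))  # 2.0 2.0
print(squared_penalty(concentrated), squared_penalty(spread))    # 4.0 1.0
```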
To obtain an honest estimate of future classification error, we use the classification
matrix that is computed from ________. - ✔️✔️Validation data
Identify whether the task required is supervised or unsupervised learning: Predicting
whether a company will go bankrupt based on comparing its financial data to those of
similar bankrupt and nonbankrupt firms. - ✔️✔️Supervised learning; the outcome (bankrupt or
not) is known for the firms used for comparison
Identify whether the task required is supervised or unsupervised learning: Printing of
custom discount coupons at the conclusion of a grocery store checkout based on what
you just bought and what others have bought previously. - ✔️✔️Unsupervised learning;
there is no known outcome variable to predict (association rules, or market basket analysis)
True or false: The test data are used to build models, or to further tweak the model or
improve its fit. - ✔️✔️False
_____________ is used for assessing the performance of the final chosen model on
new data - ✔️✔️The test data partition
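This is a sketch of a random three-way partition in plain Python. The 60/20/20 proportions and the `partition` helper are assumptions for illustration and do not reflect JMP's Validation column mechanics. Models are fit on the training set, tuned or compared on the validation set, and only the final model is scored on the test set:

```python
import random

def partition(rows, train=0.6, valid=0.2, seed=1):
    """Randomly split rows into training, validation, and test sets (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train)
    n_valid = int(n * valid)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])   # remainder is the test set

train_set, valid_set, test_set = partition(list(range(100)))
print(len(train_set), len(valid_set), len(test_set))  # 60 20 20
```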
When a model is fit to training data, zero error with those data is not necessarily good.
This special case is called ______. - ✔️✔️Overfitting
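The following hypothetical illustration in Python shows a 1-nearest-neighbor classifier memorizing its training data. Its training error is zero, yet it misclassifies new points that lie near the noisy training records. The data and helper names are made up for this sketch:

```python
def one_nn_predict(train, x):
    """Predict the label of the closest training point (1-nearest neighbor)."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def error_rate(train, data):
    """Share of records in `data` misclassified by the 1-NN model built on `train`."""
    wrong = sum(1 for x, y in data if one_nn_predict(train, x) != y)
    return wrong / len(data)

# Hypothetical data: label is 1 when x > 5, except noisy points at x=2 and x=8
train = [(1, 0), (2, 1), (3, 0), (6, 1), (7, 1), (8, 0)]
new = [(2.2, 0), (7.8, 1)]
print(error_rate(train, train))  # 0.0 -> perfect fit to the training data
print(error_rate(train, new))    # 1.0 -> the memorized noise misleads on new data
```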
Which of the following are the most popular visualization tools in JMP Pro? -
✔️✔️Graph Builder, Fit Y by X, Distribution
Scatter plots play an important role in prediction; a natural next step can be developing a model.
Scatter plots provide information about relationships (linear or nonlinear) between
variables. The variables in a scatter plot are ________. - ✔️✔️Numerical
In a box plot, the box includes 50% of the data, the horizontal line inside the box represents
(i)____________, and the top and bottom of the box represent (ii)________, respectively. -
✔️✔️(i) the median (50th percentile); (ii) the 75th and 25th percentiles
In JMP, a means diamond is displayed in the box, and the center of the diamond is
_________. - ✔️✔️The mean
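This sketch uses Python's `statistics` module on a hypothetical sample. It computes the box-plot quartiles, the median line, and the mean at the center of the diamond. The `"inclusive"` quantile method is an assumption. JMP may use a different quantile definition, so values can differ slightly on small samples:

```python
import statistics

values = [2, 4, 4, 5, 7, 8, 9, 11, 30]  # hypothetical sample with one high outlier

# Quartiles: 25th percentile, median (50th), 75th percentile
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
mean = statistics.mean(values)

print(q1, median, q3)  # 4.0 7.0 9.0 -> box bottom, middle line, box top
print(mean)            # about 8.89, pulled above the median by the outlier 30
```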
The density ellipsoid in a scatterplot matrix is a good graphical indicator of the correlation
between two variables. The ellipsoid collapses diagonally as the correlation between the
two variables approaches either 1 or -1.
The ellipsoid is more circular if the two variables are more correlated. (TRUE or
FALSE?) - ✔️✔️False; The ellipsoid is more circular (less diagonally oriented) if the
two variables are less correlated
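The ellipse's shape follows the Pearson correlation coefficient r. A minimal Python sketch of r (hypothetical data, hypothetical `pearson_r` helper) shows the two extremes:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0 -> ellipse collapses onto a diagonal line
print(pearson_r(x, [2, 5, 1, 4, 3]))   # 0.1 -> weak correlation, nearly circular ellipse
```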
True or False: Sensitivity and Specificity are plotted on an ROC Curve. - ✔️✔️True
(sensitivity is plotted against 1 − specificity)
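Each point on an ROC curve corresponds to one cutoff. Sensitivity goes on the y-axis and 1 − specificity on the x-axis. The Python sketch below uses hypothetical actual classes and predicted probabilities. It assumes that records scoring at or above the cutoff are classified as 1:

```python
def sens_spec(actual, scores, cutoff):
    """Sensitivity and specificity when scores >= cutoff are classified as 1."""
    tp = sum(1 for a, s in zip(actual, scores) if a == 1 and s >= cutoff)
    fn = sum(1 for a, s in zip(actual, scores) if a == 1 and s < cutoff)
    tn = sum(1 for a, s in zip(actual, scores) if a == 0 and s < cutoff)
    fp = sum(1 for a, s in zip(actual, scores) if a == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]   # hypothetical predicted probabilities
for cutoff in (0.85, 0.5, 0.2):
    sens, spec = sens_spec(actual, scores, cutoff)
    # Each (1 - specificity, sensitivity) pair is one point on the ROC curve
    print(cutoff, round(1 - spec, 3), round(sens, 3))
```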
How do you calculate the error rate on a classification matrix (Confusion Chart)? -
✔️✔️Total incorrect predictions / total predictions
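Here is a sketch of the error-rate calculation on a hypothetical 2x2 classification matrix. It assumes rows are actual classes and columns are predicted classes:

```python
def error_rate(confusion):
    """confusion[i][j] = count of records with actual class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))  # diagonal cells
    return (total - correct) / total

# Hypothetical 2x2 classification matrix (rows = actual, columns = predicted)
cm = [[50, 10],
      [5, 35]]
print(error_rate(cm))  # 0.15 -> 15 misclassified out of 100
```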
The 'portion' of a lift curve represents what percent of the data, and how is this portion
sorted? - ✔️✔️The portion (e.g. portion = p = 0.2) represents the top p (20%) of the records,
sorted in descending order of their predicted probability of belonging to the class of interest
The lift of a lift curve represents what? - ✔️✔️The lift value (e.g. lift = 2.2) is the
proportion of records belonging to the class of interest within that top portion, divided by
the proportion of that class in the data overall (lift = 2.2 means the top portion contains
2.2 times as many members of the class of interest as a randomly selected portion of the same size would)
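A minimal sketch of the lift at a given portion follows. It assumes the class of interest is coded 1 and that records are ranked by a hypothetical predicted probability. `lift_at_portion` is an illustrative helper, not a JMP function:

```python
def lift_at_portion(actual, scores, portion):
    """Lift of the top `portion` of records, ranked by predicted probability (descending)."""
    ranked = [a for _, a in sorted(zip(scores, actual), reverse=True)]
    k = max(1, round(len(ranked) * portion))      # number of records in the top portion
    rate_in_top = sum(ranked[:k]) / k             # class-of-interest rate in the top portion
    overall_rate = sum(actual) / len(actual)      # class-of-interest rate overall
    return rate_in_top / overall_rate

actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]  # hypothetical
print(round(lift_at_portion(actual, scores, 0.2), 2))  # 2.5
print(round(lift_at_portion(actual, scores, 0.5), 2))  # 1.5
```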
True or false: Principal Component Analysis (PCA) is intended for use with quantitative
values - ✔️✔️True
True or false: The idea of PCA is to find a linear combination of the two variables that
contains most, even if not all, of the information, so that this new variable can replace
the two original variables. - ✔️✔️True
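For two variables, the first principal component can be found in closed form from the 2x2 covariance matrix. The Python sketch below (hypothetical data and helper name) returns the component's weights and the share of total variance it captures. PCA is often run on standardized variables instead of raw covariances:

```python
import math

def first_pc(xs, ys):
    """Weights of the first principal component and its share of total variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)                    # var(x)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)                    # var(y)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)  # cov(x, y)
    # Largest eigenvalue of the covariance matrix [[a, b], [b, c]]
    lam1 = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if b != 0:
        vx, vy = lam1 - c, b          # an eigenvector for lam1
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam1 / (a + c)

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical, strongly correlated with x
weights, share = first_pc(x, y)
print(weights, share)  # one new variable carries nearly all of the variance
```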
How would the correlations change if we normalized the data first? - ✔️✔️Correlations
will not change, since computing a correlation already normalizes the data (it is the
covariance of the standardized variables)
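This quick numerical check in Python uses hypothetical data. Standardizing each variable to z-scores leaves the Pearson correlation unchanged, because r is already computed from centered, scaled values:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def zscore(vals):
    """Standardize values: subtract the mean, divide by the sample standard deviation."""
    n = len(vals)
    m = sum(vals) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in vals) / (n - 1))
    return [(v - m) / sd for v in vals]

x = [1.0, 2.0, 4.0, 7.0]
y = [10.0, 14.0, 15.0, 30.0]
# The two values agree (up to floating-point rounding)
print(pearson_r(x, y), pearson_r(zscore(x), zscore(y)))
```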
True or false: Pairs of variables that have a very strong (positive or negative) correlation
contain duplicative information. Therefore, we want to omit variables that are
strongly correlated with others to avoid multicollinearity (when fitting models). - ✔️✔️True
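Here is a sketch of correlation-based variable removal in Python. For each pair whose absolute correlation exceeds a threshold, it keeps the first variable and drops the second. The 0.9 threshold, the greedy order, and the data are assumptions for illustration:

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def drop_correlated(columns, threshold=0.9):
    """Greedily drop the second variable of any still-kept pair with |r| above threshold."""
    kept = list(columns)
    for a, b in combinations(columns, 2):
        if a in kept and b in kept and abs(pearson_r(columns[a], columns[b])) > threshold:
            kept.remove(b)
    return kept

# Hypothetical data: height_cm and height_in carry duplicate information
data = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [30, 25, 41, 35, 28],
}
print(drop_correlated(data))  # ['height_cm', 'age']
```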
Which of the following are the methods that we use for dimension reduction? (4
correct answers) - ✔️✔️Removing independent variables from the model ; random
selection of variables for model development ; logistic regression ; removing one of the