1.1) In statistics, the variance inflation factor (VIF) is the quotient of the
variance in a model with multiple terms by the variance of a model with one
term alone.[1] It quantifies the severity of multicollinearity in an ordinary
least square’s regression analysis. It provides an index that measures how
much the variance (the square of the estimate's standard deviation) of an
estimated regression coefficient is increased because of collinearity.
Cuthbert Daniel claims to have invented the concept behind the variance
inflation factor, but did not come up with the name.
We can calculate k different VIFs (one for each Xi) in three steps:
Step one
First we run an ordinary least square regression that has Xi as a function of all
the other explanatory variables in the first equation.
If i = 1, for example, equation would be :
where is a constant and e is the error term.
Step two
Then, calculate the VIF factor for with the following formula:
where R2i is the coefficient of determination of the regression equation in step
one, with on the left-hand side, and all other predictor variables (all the
other X variables) on the right-hand side.
Step three
Analyse the magnitude of multicollinearity by considering the size of the
. A rule of thumb is that if then multicollinearity is high
(a cut-off of 5 is also commonly used).
Some software instead calculates the tolerance which is just the reciprocal
of the VIF. The choice of which to use is a matter of personal preference.
, 1.2) The coefficient of determination, R^2, is used to analyse how differences
in one variable can be explained by a difference in a second variable. For
example, when a person gets pregnant has a direct relation to when they give
birth.
Step 1: Find the correlation coefficient, r (it may be given to you in the
question). Example, r = 0.543.
Step 2: Square the correlation coefficient.
0.5432 = .295
Step 3: Convert the correlation coefficient to a percentage.
.295 = 29.5%
1.3) The C-statistic (sometimes called the “concordance” statistic or C-index)
is a measure of goodness of fit for binary outcomes in a logistic regression
model. In clinical studies, the C-statistic gives the probability a randomly
selected patient who experienced an event (e.g. a disease or condition) had a
higher risk score than a patient who had not experienced the event. It is
equal to the area under the Receiver Operating Characteristic (ROC) curve
and ranges from 0.5 to 1.
The benefits of buying summaries with Stuvia:
Guaranteed quality through customer reviews
Stuvia customers have reviewed more than 700,000 summaries. This how you know that you are buying the best documents.
Quick and easy check-out
You can quickly pay through credit card or Stuvia-credit for the summaries. There is no membership needed.
Focus on what matters
Your fellow students write the study notes themselves, which is why the documents are always reliable and up-to-date. This ensures you quickly get to the core!
Frequently asked questions
What do I get when I buy this document?
You get a PDF, available immediately after your purchase. The purchased document is accessible anytime, anywhere and indefinitely through your profile.
Satisfaction guarantee: how does it work?
Our satisfaction guarantee ensures that you always find a study document that suits you well. You fill out a form, and our customer service team takes care of the rest.
Who am I buying these notes from?
Stuvia is a marketplace, so you are not buying this document from us, but from seller cilliersfrederick. Stuvia facilitates payment to the seller.
Will I be stuck with a subscription?
No, you only buy these notes for $13.80. You're not tied to anything after your purchase.