Biostatistics Final Review (updated): questions well answered
The lengths of stay for six patients were 0, 0, 1, 2, 2, and 16 days. Which is (are) the best measure(s) to
summarize these data?
Median, mean, median and range, mean and standard deviation, median and standard deviation -
correct answer ✔✔Median and range. Because the data are skewed and contain an outlier (16 days), the
median and range summarize them best. The mean (3.5 days) is pulled well above the median (1.5 days).
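To check the numbers in this answer, here is a minimal sketch using only Python's standard library. The variable names are illustrative and not part of the original question.

```python
import statistics

# Lengths of stay (days) from the question
stays = [0, 0, 1, 2, 2, 16]

mean_stay = statistics.mean(stays)      # pulled upward by the 16-day stay
median_stay = statistics.median(stays)  # middle of the ordered data
sd_stay = statistics.stdev(stays)       # sample standard deviation
stay_range = max(stays) - min(stays)    # largest minus smallest value

print(f"mean={mean_stay}, median={median_stay}, "
      f"sd={sd_stay:.2f}, range={stay_range}")
```

The single 16-day stay makes the mean more than twice the median, which is why the mean is a poor summary of the typical stay here.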
An epidemiologist attempts to predict the weight of an elderly person from demispan. She randomly
chooses 70 elderly subjects in a particular geographic area and records their weight and demispan
measurements in the form of (x_i, y_i) for i = 1, ..., 70. Given that the value of the Pearson correlation
coefficient is zero, what can be deduced?
There is a strong negative relationship between weight and demispan, there could be some nonlinear
relationship between weight and demispan, there is no relation between weight and demispan, all pairs
of values of weight and demispan are practically identical, there is an almost perfect relationship
between weight and demispan. - correct answer ✔✔There could be some nonlinear relationship
between weight and demispan. The Pearson correlation measures only linear relationships, so a value of
zero means there is no linear relationship, but there could still be a nonlinear one. For example, for the
points (-3, 9), (-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4), (3, 9), the Pearson correlation is zero even though
Y = X squared exactly.
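The Y = X squared example can be verified directly. The sketch below computes the Pearson correlation from its definition. The helper name pearson_r is illustrative, not taken from any library.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Points from the example: y is an exact function of x, but not a linear one
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)
```

The positive and negative x values cancel in the cross-product sum, so r comes out as zero even though y is completely determined by x.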
Which of the following statistical tests is not considered a nonparametric test?
Mann-Whitney, Tukey's, Kruskal-Wallis, Wilcoxon rank-sum - correct answer ✔✔Tukey's test; there are
actually two Tukey's tests. One is a post hoc procedure for ANOVA, and the other is a test for additivity
used in ANOVA. Neither is a nonparametric test.
A researcher is designing a new questionnaire to examine patient stress levels on a scale of 0 to 5. What
type of outcome variable is being collected?
Interval, ratio, binary, ordinal, nominal - correct answer ✔✔Ordinal; data are at the ordinal level of
measurement if they can be arranged in some order, but differences between data values either cannot
be determined or are meaningless.
If the chances for a second event to occur stay the same, regardless of the outcome of a first event, then
the two events are: indeterminate, independent, mutually exclusive, equally likely - correct answer
✔✔Independent; Two events A and B are independent if the occurrence of one does not affect the
probability of the occurrence of the other. If A and B are not independent, they are considered
dependent.
In simple linear regression, what is a method of determining the slope and intercept of the best-fitting
line?
Least squares, r-square, least error, regression, minimum error - correct answer ✔✔Least squares;
Simple linear regression involves data on a dependent variable y and a single independent variable x
(multiple regression extends this to several predictors x_1, x_2, etc.). Regression analysis involves
finding the "best" mathematical model (within some restricted class of models) to describe y as a
function of x or to predict y from x. The regression line is the graph of the regression equation.
Residuals are used to determine the best-fitting line; each residual is the observed y value minus the
value predicted by the regression line. A straight line satisfies the least-squares property if the sum of
the squares of the residuals is the smallest sum possible.
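As an illustration of the least-squares property, the sketch below fits a line using the standard closed-form slope and intercept formulas, then compares its sum of squared residuals with that of a slightly different line. The data points and function names are hypothetical, chosen only for the example.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed y - predicted y)^2 for a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
best_sse = sum_squared_residuals(xs, ys, slope, intercept)
other_sse = sum_squared_residuals(xs, ys, slope + 0.1, intercept)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"SSE of fitted line={best_sse:.3f}, SSE of perturbed line={other_sse:.3f}")
```

Any other choice of slope or intercept gives a larger sum of squared residuals than the fitted line.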
In a group of individuals, the probability of characteristic C is 0.4, and the probability of characteristic D
is 0.2. The probability of their intersection is 0.10. Which of the following statements is correct?
Characteristics C and D are not independent, characteristics C and D are mutually exclusive, C and D are
independent and mutually exclusive, not enough information is given, C and D are independent - correct
answer ✔✔Characteristics C and D are not independent. Two events A and B are independent if the
occurrence of one does not affect the probability of the occurrence of the other, which requires
P(A and B) = P(A) × P(B). Here P(C) × P(D) = 0.4 × 0.2 = 0.08, which does not equal the stated
intersection probability of 0.10, so the events are not independent. They are also not mutually
exclusive, because the intersection probability is not zero.
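The independence check for this question is a one-line comparison of P(C) × P(D) with P(C and D). A minimal sketch, with illustrative variable names:

```python
import math

p_c = 0.4      # P(C)
p_d = 0.2      # P(D)
p_both = 0.10  # P(C and D), as given in the question

# Independent only if the joint probability equals the product of the marginals
product = p_c * p_d
independent = math.isclose(product, p_both)

# Mutually exclusive only if the events can never occur together
mutually_exclusive = math.isclose(p_both, 0.0, abs_tol=1e-12)

# Conditional probability P(C | D); under independence it would equal P(C)
p_c_given_d = p_both / p_d

print(f"P(C)P(D)={product:.2f}, P(C and D)={p_both:.2f}")
print(f"independent={independent}, mutually_exclusive={mutually_exclusive}")
print(f"P(C|D)={p_c_given_d:.2f} vs P(C)={p_c:.2f}")
```

Knowing that D occurred raises the probability of C from 0.4 to 0.5, which is another way to see that the events are dependent.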
If all of the numbers in a list increase by 2, then the standard deviation is: unchanged, cannot be
determined without the actual list of numbers, increased by 4, increased by 2 - correct answer
✔✔Unchanged; adding a constant to every value in a list does not change the standard deviation, because
each value's distance from the mean stays the same. It does shift the mean by that constant.
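A quick numerical check of this property, using a hypothetical list of numbers chosen only for illustration:

```python
import math
import statistics

# Hypothetical data, for illustration only
values = [3, 7, 8, 12, 15]
shifted = [v + 2 for v in values]  # every value increased by 2

sd_before = statistics.stdev(values)
sd_after = statistics.stdev(shifted)
mean_before = statistics.mean(values)
mean_after = statistics.mean(shifted)

print(f"mean: {mean_before} -> {mean_after}")
print(f"sd:   {sd_before:.4f} -> {sd_after:.4f}")
print("sd unchanged:", math.isclose(sd_before, sd_after))
```

The mean moves up by 2 while the spread around it stays the same.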
The sensitivity of a particular screening test for a disease is 95%, and the specificity is 90%. Which of the
following statements is most correct?
If a person has the disease, there is a 5% chance that the test will be negative. If a person does not have
the disease, there is a 5% chance that the test will be positive, of 100 people sampled from a population
with the disease, the test will correctly detect 90 individuals, of 100 people sampled from a population
with the disease, the test will correctly detect 95 individuals as positive for the disease, if a person tests
positive, the probability of having the disease is 0.95. - correct answer ✔✔Of 100 people sampled from a
population with the disease, the test will correctly detect 95 individuals as positive for the disease;
sensitivity is the proportion of truly diseased people in the screened population who are identified as
diseased by the screening test. It is a measure of the probability of correctly diagnosing a case, or the
probability that any given case will be identified by the test (i.e., a true positive). Specificity is the
proportion of truly non-diseased people who are so identified by the screening test. It is a measure of
the probability of correctly identifying a non-diseased person with the screening test (i.e., a true
negative).
Specificity - correct answer ✔✔Specificity is the proportion of truly non-diseased people who are so
identified by the screening test. It is a measure of the probability of correctly identifying a non-diseased
person with the screening test (i.e., a true negative).
Sensitivity - correct answer ✔✔Sensitivity is the proportion of truly diseased people in the screened
population who are identified as diseased by the screening test. It is a measure of the probability of
correctly diagnosing a case, or the probability that any given case will be identified by the test (i.e., a
true positive).
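Both definitions reduce to simple ratios of counts. The sketch below uses the 95% sensitivity and 90% specificity from the screening question, applied to 100 diseased people and, as an assumption made only for illustration, 100 non-diseased people. The function names are illustrative.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of diseased people the test identifies as diseased."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of non-diseased people the test identifies as non-diseased."""
    return true_neg / (true_neg + false_pos)

# 100 people with the disease: 95 test positive, 5 test negative
true_pos, false_neg = 95, 5
# 100 people without the disease (assumed group size): 90 negative, 10 positive
true_neg, false_pos = 90, 10

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)

# Probability of disease given a positive test in this particular sample;
# it depends on how common the disease is, not only on sensitivity
ppv = true_pos / (true_pos + false_pos)

print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, PPV={ppv:.3f}")
```

In this assumed 50/50 sample the probability of disease given a positive test is about 0.905, not 0.95, which illustrates why the option "if a person tests positive, the probability of having the disease is 0.95" is not the correct reading of sensitivity.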
Which is the most correct statement about a scatterplot?
It is used to compare the means of two variables, it is used to investigate the relationship between two
continuous variables, it shows the relationship between any two variables, it is a useless plot when the
relationship between two variables is nonlinear, it is used to determine whether to perform a linear
regression - correct answer ✔✔It is used to investigate the relationship between two continuous
variables; A scatterplot diagram is a plot of paired (x,y) data with a horizontal x-axis and a vertical y-axis.
A scatterplot can be used to investigate the relationship between two continuous variables as well as to
identify outliers within a data set.
The Central Limit Theorem states that: - correct answer ✔✔The sample mean is approximately normally distributed;
the central limit theorem states that if the sample size is large enough, the distribution of the sample
means can be approximated by a normal distribution, even if the original population is not normally
distributed. In other words, the distribution of the sample means approaches a normal distribution as
the sample size increases.
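The theorem can be illustrated with a small simulation. The sketch below draws repeated samples from a strongly skewed (exponential) population and checks that the sample means behave roughly as a normal distribution would. The population, sample size, number of samples, and seed are all arbitrary choices made for this illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 50    # observations per sample
NUM_SAMPLES = 2000  # number of repeated samples

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, sd 1)."""
    draws = [random.expovariate(1.0) for _ in range(n)]
    return sum(draws) / n

means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

mean_of_means = sum(means) / NUM_SAMPLES
sd_of_means = statistics.stdev(means)
standard_error = 1.0 / math.sqrt(SAMPLE_SIZE)  # population sd / sqrt(n)

# For a normal distribution, about 95% of values lie within 1.96 sd of the mean
within = sum(abs(m - 1.0) <= 1.96 * standard_error for m in means) / NUM_SAMPLES

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1.0)")
print(f"sd of sample means:   {sd_of_means:.3f} (theory {standard_error:.3f})")
print(f"fraction within 1.96 standard errors: {within:.3f}")
```

Even though individual exponential draws are far from normal, the sample means cluster around the population mean with a spread close to the theoretical standard error.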
Assume that a researcher has measured weight in a sample of 100 overweight adults before and after a
diet and exercise program conducted at the local health department's weekly Eat Healthy-Be Fit
community program. To determine whether the mean weight decreased six weeks after the exercise
program compared to the initial baseline measures, the researcher should: - correct answer ✔✔Conduct
a t-test for dependent (paired) samples. A t-test is a hypothesis test used to compare means. In this
case, the samples are dependent because the before and after measurements are taken on the same
individuals.
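A paired t-test works on the within-person differences. The sketch below computes the test statistic by hand with the standard library. The six before/after weights are hypothetical values invented for illustration (the question's study has 100 participants and gives no data), and the function name is illustrative.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired before/after measurements."""
    diffs = [a - b for b, a in zip(before, after)]  # after minus before
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample sd of the differences
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical weights (lb) for the same six people, for illustration only
before = [210, 195, 250, 230, 205, 240]
after = [204, 193, 241, 228, 199, 233]

t_stat, df = paired_t_statistic(before, after)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

A negative t indicates that weights were lower after the program. Whether the decrease is statistically significant is judged by comparing t with the critical value of the t distribution for the given degrees of freedom.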