Table of Contents
Part I Exploring and Collecting Data
Chapter 1 Data and Decisions
Chapter 2 Displaying and Describing Categorical Data
Chapter 3 Displaying and Describing Quantitative Data
Chapter 4 Correlation and Linear Regression
Case Study: Paralyzed Veterans of America
Part II Modeling with Probability
Chapter 5 Randomness and Probability
Chapter 6 Random Variables and Probability Models
Chapter 7 The Normal and Other Continuous Distributions
Part III Gathering Data
Chapter 8 Data Sources: Observational Studies and Surveys
Chapter 9 Data Sources: Experiments
Part IV Inference for Decision Making
Chapter 10 Sampling Distributions and Confidence Intervals for Proportions
Case Study: Real Estate Simulation
Chapter 11 Confidence Intervals for Means
Chapter 12 Testing Hypotheses
Chapter 13 More about Tests and Intervals
Chapter 14 Comparing Two Means
Chapter 15 Inference for Counts: Chi-Square Tests
Brief Case: Loyalty Program
Part V Models for Decision Making
Chapter 16 Inference for Regression
Chapter 17 Understanding Residuals
Chapter 18 Multiple Regression
Copyright © 2019 Pearson Education, Inc.
Chapter 1 – Data and Decisions
SECTION EXERCISES
SECTION 1.1
1. a) Each row represents a different house that was recently sold. It can be described as a case.
b) There are six variables in each row plus a house identifier for a total of seven variables.
2. a) Each row represents a different transaction (not customer or book). It can be described as a case.
b) There are six variables plus two identifiers in each row for a total of eight variables.
SECTION 1.2
3. a) House_ID is an identifier (categorical, not ordinal); Neighborhood is categorical (nominal); Mail_ZIP is
categorical (nominal – ordinal in a sense, but only on a national level); Acres is quantitative (units – acres);
Yr_Built is quantitative (units – year); Full_Market_Value is quantitative (units – dollars); Size is
quantitative (units – square feet).
b) These data are cross-sectional. Each row corresponds to a house that recently sold, so all were recorded at
approximately the same point in time.
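The classification in Exercise 3a can be recorded as a small data dictionary. The following Python sketch is illustrative only: the variable names and types come from the answer above, while the dictionary layout and the helper function `names_of_type` are hypothetical.

```python
# Variable roles for the house-sales data, as classified in Exercise 3a.
# The layout of this dictionary is an illustrative assumption.
HOUSE_VARIABLES = {
    "House_ID": {"type": "identifier", "units": None},
    "Neighborhood": {"type": "categorical", "units": None},
    "Mail_ZIP": {"type": "categorical", "units": None},
    "Acres": {"type": "quantitative", "units": "acres"},
    "Yr_Built": {"type": "quantitative", "units": "year"},
    "Full_Market_Value": {"type": "quantitative", "units": "dollars"},
    "Size": {"type": "quantitative", "units": "square feet"},
}


def names_of_type(variables, kind):
    """Return the variable names whose recorded type equals `kind`."""
    return [name for name, info in variables.items() if info["type"] == kind]


quantitative = names_of_type(HOUSE_VARIABLES, "quantitative")
print(quantitative)
```

Writing down units next to each quantitative variable makes it harder to mistake a numeric-looking label such as Mail_ZIP for a measurement.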
4. a) Transaction ID is an identifier (categorical, nominal, not ordinal); Customer ID is an identifier
(categorical, nominal); Date can be treated as quantitative (how many days since the transaction took place,
days since Jan. 1 2009, for example) or categorical (as month, for example); ISBN is an identifier
(categorical, nominal); Price is quantitative (units – dollars); Coupon is categorical (nominal); Gift is
categorical (nominal); Quantity is quantitative (unit – counts).
b) These data are cross-sectional. Each row corresponds to a transaction at a fixed point in time. However,
the date of the transaction has been recorded so the data could be reconfigured as a time series. It is likely
that the store had more than one sale in any given time period, so the raw transactions are not appropriate
as a time series.
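One way to picture the reconfiguration mentioned in Exercise 4b is to total the transactions for each calendar day. The Python sketch below is a minimal illustration under stated assumptions: the records are invented, Price is assumed to be a per-item price, and the function name `daily_revenue` is hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (transaction date, price in dollars, quantity).
# These values are invented for illustration and are not from the exercise.
transactions = [
    (date(2009, 1, 5), 12.50, 1),
    (date(2009, 1, 5), 30.00, 2),
    (date(2009, 1, 6), 8.25, 1),
    (date(2009, 1, 8), 15.00, 3),
]


def daily_revenue(records):
    """Sum price * quantity per calendar day; return (day, total) pairs in date order."""
    totals = defaultdict(float)
    for day, price, quantity in records:
        totals[day] += price * quantity
    return sorted(totals.items())


for day, total in daily_revenue(transactions):
    print(day.isoformat(), total)
```

Several transactions can share a date, so each day is reduced to a single total. Days with no sales (January 7 here) simply do not appear and would have to be filled in before the result could be treated as a regularly spaced series.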
SECTION 1.3
5. It is not specified whether or not the real estate data of Exercise 1 are obtained from a survey. The data
would not be from an experiment, a data gathering method with specific requirements. Rather, the real
estate major’s data set was derived from transactional data (on local home sales). The major concern with
drawing conclusions from this data set is that we cannot be sure that the sample is representative of the
population of interest (e.g., all recent local home sales or even all recent national home sales). Therefore,
we should be cautious about drawing conclusions from these data about the housing market in general.
6. The student is using a secondary data source (from the Internet). No information is given about how, when,
where, and why these data were collected or whether they were the result of a designed experiment. It is also not
stated that the sample is representative of companies. There are concerns about using these data for
generalizing and drawing conclusions because the data could have been collected for a different purpose
(not necessarily for developing a stock investment strategy). Therefore, the student should be cautious
about using this type of data to predict performance in the future.
CHAPTER EXERCISES
7. The news. Answers will vary.
8. The Internet. Answers will vary.
9. Survey. The description of the study has to be broken down into its components in order to understand the
study. Who–who or what was actually sampled–college students; What–what is being measured–opinion of
electric vehicles: whether there will be more electric or gasoline powered vehicles in 2025 and the likelihood
of whether they would purchase an electric vehicle in the next 10 years; When–current; Where–your
location; Why–automobile manufacturer wants college student opinions; How–how was the study
conducted–survey; Variables–there are two categorical variables: what students think about whether
there will be more electric or gasoline powered vehicles in 2025, and how likely, on a scale, the student
would be to buy an electric vehicle in the next 10 years (this second variable is also ordinal);
Source–the data are not from a designed survey or experiment; Type–the data are cross-sectional;
Concerns–none.
10. Your survey. Answers will vary.
11. World databank. Answers will vary but chosen from the following possible indicators:
GDP growth (annual %)
GDP (current US$)
GDP per capita (current US$)
GNI per capita, Atlas method (current US$)
Exports of goods and services (% of GDP)
Foreign direct investment, net inflows (BoP, current US$)
GNI per capita, PPP (current international $)
GINI index
Inflation, consumer prices (annual %)
Population, total
Life expectancy at birth, total (years)
Internet users (per 100 people)
Imports of goods and services (% of GDP)
Unemployment, total (% of total labor force)
Agriculture, value added (% of GDP)
CO2 emissions (metric tons per capita)
Literacy rate, adult total (% of people ages 15 and above)
Central government debt, total (% of GDP)
Inflation, GDP deflator (annual %)
Poverty headcount ratio at national poverty line (% of population)
12. Arby’s menu. Who–Arby’s sandwiches; What–type of meat, number of calories (in calories), and serving
size (in ounces); When–not specified; Where–Arby’s restaurants; Why–assess the nutritional value of the
different sandwiches; How–information was gathered from each of the sandwiches on the menu at Arby’s,
resulting in a census; Variables–there are 3 variables: the number of calories and serving size are
quantitative, and the type of meat is categorical; Source–data are not from a designed survey or experiment;
Type–data are cross-sectional; Concerns–none.
13. MBA admissions. Who–MBA applicants (in northeastern U.S.); What–sex, age, whether or not accepted,
whether or not they attended, and the reasons for not attending (if they did not accept); When–not specified;
Where–a school in the northeastern United States; Why–the researchers wanted to investigate any patterns
in female student acceptance and attendance in the MBA program; How–data obtained from the admissions
office; Variables–there are 5 variables: sex, whether or not the students accepted, whether or not they
attended, and the reasons for not attending if they did not accept (all categorical) and age which is
quantitative; Source–data are not from a designed survey or experiment; Type–data are cross-sectional;
Concerns–none.
14. MBA admissions II. Who–MBA students (in program outside of Paris); What–each student’s standardized
test scores and GPA in the MBA program; When–2009 to 2014; Where–outside of Paris; Why–to
investigate the association between standardized test scores and performance in the MBA program over
five years (2009–2014); How–not specified; Variables–there are 2 quantitative variables: standardized test
scores and GPA; Source–data are not from a designed survey or experiment, data are available from student
records; Type–although the data are collected over 5 years, the purpose is to examine them as cross-
sectional rather than as time-series; Concerns–none.
15. Pharmaceutical firm. Who–experimental volunteers; What–herbal cold remedy or sugar solution, and cold
severity; When–not specified; Where–major pharmaceutical firm; Why–scientists were testing the
effectiveness of an herbal compound on the severity of the common cold; How–scientists conducted a
controlled experiment; Variables–there are 2 variables: type of treatment (herbal or sugar solution) is
categorical, and severity rating is quantitative; Source – data come from an experiment; Type–data are
cross-sectional and from a designed experiment; Concerns–the severity of a cold might be difficult to
quantify (beneficial to add actual observations and measurements, such as body temperature). Also,
scientists at a pharmaceutical firm could have a predisposed opinion about the herbal solution or may feel
pressure to report negative findings about the herbal product.
16. Start-up company. Who–customers of a start-up company; What–customer name, ID number, region of
the country (coded as 1 = East, 2 = South, 3 = Midwest, 4 = West), date of last purchase, amount of
purchase ($), and item purchased; When–present day; Where–not specified; Why–the company is building a
database of customers and sales information; How–assumed that the company records the needed
information from each new customer; Variables–there are 6 variables: name, ID number, region of the
country, and item purchased are categorical; date and amount of purchase are quantitative. Date
could be coded as categorical as well; Source–data are not from a designed survey or experiment; Type–
data are cross-sectional; Concerns–although region is coded as a number, it is still a categorical variable.
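The concern about region codes in Exercise 16 can be illustrated with a short Python sketch. The code-to-label mapping comes from the exercise; the list of customer codes and the function name `region_counts` are hypothetical.

```python
from collections import Counter

# Region codes as given in the exercise: 1 = East, 2 = South, 3 = Midwest, 4 = West.
REGION_LABELS = {1: "East", 2: "South", 3: "Midwest", 4: "West"}

# Hypothetical region codes for a handful of customers (invented for illustration).
customer_regions = [1, 4, 4, 2, 3, 4, 1]


def region_counts(codes):
    """Count customers per region label; the codes are treated as labels, not numbers."""
    return Counter(REGION_LABELS[code] for code in codes)


counts = region_counts(customer_regions)
print(counts.most_common())
```

Counting by label is a sensible summary for a categorical variable. Averaging the codes (about 2.7 for this invented list) would produce a number with no interpretation, because the codes only name the regions.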
17. Vineyards. Who–vineyards; What–size of vineyard (most likely in acres), number of years in existence,
state, varieties of grapes grown, average case price ($), gross sales ($), and percent profit; When–not
specified; Where–not specified; Why–business analysts hope to provide information that would be helpful
to producers of U.S. wines; How–questionnaire to a sample of growers; Variables–there are 5 quantitative
variables: the size of vineyard (acres), number of years in existence, average case price ($), gross sales ($),
and percent profit; there are 2 categorical variables: state and variety of grapes grown; Source–data come from a designed
survey; Type–data are cross-sectional; Concerns–none.
18. Spectrem group polls. Who–not completely clear. Probably a sample of affluent and retired people; What–
pet preference, number of pets, services and products bought for pets (from a list); When–not specified;
Where–United States; Why–provide services for the affluent; How–survey; Variables–there are 3
categorical variables: pet preference, list of pets and list of services and products bought for pet; Source–
data from a designed survey; Type–data are cross-sectional; Concerns–none.
19. EPA. Who–every model of automobile in the United States; What–vehicle manufacturer, vehicle type (car,
SUV, etc.), weight (probably pounds), horsepower (units of horsepower), and gas mileage (miles per
gallon) for city and highway driving; When–the information is currently collected; Where–United States;
Why–the EPA uses the information to track fuel economy of vehicles; How–“among the data EPA analysts
collect from the automobile manufacturers are the name of the manufacturer (Ford, Toyota, etc.), vehicle
type….”; Variables–there are 6 variables: vehicle manufacturer and vehicle type are categorical variables;
weight, horsepower, and gas mileage for both city and highway driving are quantitative variables; Source–
data are not from a designed survey or experiment; Type–data are cross-sectional; Concerns–none.
20. Consumer Reports. Who–46 models of smart phones; What–brand, price (probably dollars), display size
(probably inches), operating system, camera image size (megapixels), and memory card slot (yes/no);
When–not specified; Where–not specified; Why–the information was compiled to provide information to
readers of Consumer Reports; How–not specified; Variables–there are a total of 6 variables: price,
display size and image size are quantitative variables; brand and operating system are categorical variables,
and memory card slot is a nominal variable; Source–not specified; Type–the data are cross-sectional;
Concerns–we don’t know whether these 46 models are a representative sample of smart phones or include
all of them. This is a rapidly changing market, so the data are at best a snapshot of the state of the market
at this time.
21. Zagat. Who–restaurants; What–% of customers liking restaurant, average meal cost ($), food rating (0-30),
decor rating (0-30), service rating (0-30); When–current; Where–not specified; Why–service to provide
information for consumers; How–not specified; Variables–there are 5 variables: % liking and average cost