QM1 Statistics terms
Chapter 1: data and decisions
Big data: the collection and analysis of data sets so large and complex that traditional
methods typically brought to bear on the problem would be overwhelmed.
Business analytics: the process of using statistical analysis and modeling to drive
business decisions.
Case (record/row): a case is an individual about whom or which we have data.
Categorical (or qualitative) variable: a variable that names categories (whether with
words or numerals).
Context: the context ideally tells who was measured, what was measured, how the data
were collected, where the data were collected, and when and why the study was
performed.
Cross-sectional data: data taken from situations that vary over time but measured at a
single time instant are said to be a cross-section of the time series.
Data: recorded values, whether numbers or labels, together with their context.
Data mining (or predictive analytics): the process of using a variety of statistical tools to
analyze large databases or data warehouses.
Data table: an arrangement of data in which each row represents a case, and each
column represents a variable.
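As a minimal sketch of this layout, the snippet below stores a small data table as a list of dictionaries. The values and the column names (`customer_id`, `region`, `order_total_eur`) are hypothetical and chosen only for illustration.

```python
# Hypothetical data table: each dict is one case (row),
# each key is one variable (column).
data_table = [
    {"customer_id": "C001", "region": "North", "order_total_eur": 25.50},
    {"customer_id": "C002", "region": "South", "order_total_eur": 12.00},
    {"customer_id": "C003", "region": "North", "order_total_eur": 40.25},
]

# The variables are the column names shared by every case.
variables = list(data_table[0].keys())

# A single variable holds the same characteristic for all cases.
regions = [case["region"] for case in data_table]

print(variables)
print(regions)
```

Here `customer_id` plays the role of an identifier variable, `region` is categorical, and `order_total_eur` is quantitative with euros as its units.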
Data warehouse: a large database of information collected by a company or other
organization usually to record transactions that the organization makes, but also used
for analysis via data mining.
Experimental unit: an individual in a study for which or for whom data values are
recorded. Human experimental units are usually called subjects or participants.
Identifier variable: a categorical variable that records a unique value for each case, used
to name or identify it.
Metadata: auxiliary information about variables in a database, typically including how,
when, and where (and possibly why) the data were collected; who each case represents;
and the definitions of all the variables.
Nominal variable: the term “nominal” can be applied to a variable whose values are used
only to name categories.
Ordinal variable: the term “ordinal” can be applied to a variable whose categorical values
possess some kind of order.
Participant (subject): a human experimental unit.
Quantitative variable: a variable in which the numbers are values of measured quantities
with units.
Record: information about an individual in a database.
Relational database: a relational database stores and retrieves information. Within the
database, information is kept in data tables that can be “related” to each other.
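One way to picture "related" data tables is a short sketch with Python's built-in `sqlite3` module. The table names (`customers`, `orders`), the columns, and all values are hypothetical; the point is only that a shared key (`customer_id`) lets one table be combined with another.

```python
import sqlite3

# In-memory database with two hypothetical data tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (customer_id TEXT PRIMARY KEY, region TEXT)")
cur.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT, total_eur REAL)"
)
cur.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("C001", "North"), ("C002", "South")],
)
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "C001", 25.5), (2, "C001", 10.0), (3, "C002", 12.0)],
)

# Relate the two tables through customer_id and total the orders per region.
rows = cur.execute(
    "SELECT c.region, SUM(o.total_eur) "
    "FROM orders o JOIN customers c ON o.customer_id = c.customer_id "
    "GROUP BY c.region ORDER BY c.region"
).fetchall()
conn.close()

print(rows)
```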
Respondent: someone who answers, or responds to, a survey.
Spreadsheet: a spreadsheet is a layout designed for accounting that is often used to
store and manage data tables.
Subject (participant): a human experimental unit.
Time series: data measured over time. Usually, the time intervals are equally spaced or
regularly spaced.
Units: a quantity or amount adopted as a standard of measurement.
Variable: a variable holds information about the same characteristic for many cases.
Chapter 2: visualizing and describing categorical data
Area principle: in a statistical display, each data value is represented by the same
amount of area.
Bar chart (relative bar chart): a chart that represents the count (or percentage) of each
category in a categorical variable as a bar, allowing easy visual comparisons across
categories.
Cell: each location in a contingency table, representing the values of two categorical
variables, is called a cell.
Column percent: the proportion of each column contained in the cell of a frequency
table.
Conditional distribution: the distribution of a variable when the “who” is restricted to
a smaller group of individuals.
Contingency table: a table displaying the frequencies (sometimes percentages) for each
combination of two or more variables.
Distribution: the distribution of a variable is a list of:
All the possible values of the variable
The relative frequency of each value
Frequency table (relative frequency table): a table that lists the categories in a
categorical variable and gives the number (the percentage) of observations for each
category.
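A frequency table and its relative version can be sketched in a few lines. The survey answers below are hypothetical, and the variable names are only illustrative.

```python
from collections import Counter

# Hypothetical answers to one categorical survey question.
responses = ["yes", "no", "yes", "undecided", "yes", "no", "yes", "no"]

# Frequency table: number of observations per category.
counts = Counter(responses)

# Relative frequency table: share of observations per category.
n = len(responses)
relative = {category: count / n for category, count in counts.items()}

for category, count in counts.items():
    print(f"{category:<10} {count:>3} {relative[category]:>7.1%}")
```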
Independent variables: variables for which the conditional distribution of one variable is
the same for each category of the other.
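The definition of independence can be checked directly on a table of counts, as in the sketch below. The counts and the helper names (`conditional_distribution`, `looks_independent`) are hypothetical. The check is exact, so it illustrates the definition only; conditional distributions in sample data are rarely identical.

```python
import math

def conditional_distribution(counts):
    """Turn one row of counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

def looks_independent(table):
    """True if every row has the same conditional distribution."""
    distributions = [conditional_distribution(row) for row in table.values()]
    first = distributions[0]
    return all(
        math.isclose(dist[category], first[category])
        for dist in distributions
        for category in first
    )

# Hypothetical counts of yes/no answers in two stores.
same_shape = {"store A": {"yes": 20, "no": 30}, "store B": {"yes": 40, "no": 60}}
different_shape = {"store A": {"yes": 20, "no": 30}, "store B": {"yes": 70, "no": 30}}

print(looks_independent(same_shape))
print(looks_independent(different_shape))
```

In `same_shape` both stores have 40% "yes" and 60% "no", so the answer does not depend on the store. In `different_shape` the shares differ between stores.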
Marginal distribution: in a contingency table, the distribution of either variable alone. The
counts or percentages are the totals found in the margins (usually the right-most column
or bottom row) of the table.
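The sketch below builds the cells of a contingency table from raw cases and then derives the marginal distributions and one conditional distribution. The cases (region and payment method) are hypothetical.

```python
from collections import Counter

# Hypothetical cases: (region, payment method) for eight orders.
cases = [
    ("North", "card"), ("North", "cash"), ("North", "card"),
    ("South", "cash"), ("South", "cash"), ("South", "card"),
    ("North", "card"), ("South", "cash"),
]

# Cells of the contingency table: one count per combination of values.
cells = Counter(cases)

# Marginal distributions: totals for each variable alone.
region_margin = Counter(region for region, _ in cases)
payment_margin = Counter(payment for _, payment in cases)

# Conditional distribution of payment, restricted to the North cases.
north_total = region_margin["North"]
payment_given_north = {
    payment: cells[("North", payment)] / north_total for payment in payment_margin
}

print(dict(cells))
print(dict(region_margin), dict(payment_margin))
print(payment_given_north)
```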
Mosaic plot: a mosaic plot is a graphical representation of a (usually two-way)
contingency table. The plot is divided into rectangles so that the area of each rectangle
is proportional to the number of cases in the corresponding cell.
Pie chart: pie charts show how a “whole” divides into categories by showing a wedge of a
circle whose area corresponds to the proportion in each category.
Row percent: the proportion of each row contained in the cell of a frequency table.
Segmented (or stacked) bar chart: a segmented bar chart displays the conditional
distribution of a categorical variable within each category of another variable.
Simpson’s paradox: a phenomenon that arises when averages (or percentages) taken within
different groups appear to contradict the overall averages.
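A small numeric sketch can make the reversal concrete. The figures below are hypothetical (successes, attempts) pairs for two sales reps in two customer segments, invented purely for illustration.

```python
# Hypothetical (successes, attempts) per rep and segment.
results = {
    "rep A": {"segment 1": (9, 10), "segment 2": (30, 90)},
    "rep B": {"segment 1": (80, 90), "segment 2": (3, 10)},
}

# Success rate within each segment.
group_rates = {
    rep: {segment: s / n for segment, (s, n) in segments.items()}
    for rep, segments in results.items()
}

# Overall success rate, pooling both segments.
overall = {
    rep: sum(s for s, _ in segments.values()) / sum(n for _, n in segments.values())
    for rep, segments in results.items()
}

for rep in results:
    print(rep, group_rates[rep], "overall:", overall[rep])
```

Rep A has the higher rate in each segment (0.90 vs about 0.89, and about 0.33 vs 0.30), yet rep B has the higher overall rate (0.83 vs 0.39). The reversal comes from the unequal group sizes: most of rep A's attempts fall in the segment where rates are low.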
Total percent: the proportion of the total contained in the cell of a frequency table.
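Row, column, and total percents all describe the same cell but divide by different totals. The sketch below computes all three for one cell of a hypothetical table of counts; the helper name `cell_percents` is also hypothetical.

```python
# Hypothetical contingency table: rows are regions, columns are payment methods.
table = {
    "North": {"card": 30, "cash": 10},
    "South": {"card": 20, "cash": 40},
}

def cell_percents(table, row, column):
    """Return (row percent, column percent, total percent) for one cell."""
    count = table[row][column]
    row_total = sum(table[row].values())
    column_total = sum(r[column] for r in table.values())
    grand_total = sum(sum(r.values()) for r in table.values())
    return (
        100 * count / row_total,
        100 * count / column_total,
        100 * count / grand_total,
    )

row_pct, column_pct, total_pct = cell_percents(table, "North", "card")
print(row_pct, column_pct, total_pct)
```

For the North/card cell the count of 30 is 75% of the North row (40), 60% of the card column (50), and 30% of all 100 cases.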