QM1 Statistics terms
Chapter 1: data and decisions
Big data: the collection and analysis of data sets so large and complex that traditional
methods typically brought to bear on the problem would be overwhelmed.
Business analytics: the process of using statistical analysis and modeling to drive
business decisions.
Case (record/row): a case is an individual about whom or which we have data.
Categorical (or qualitative) variable: a variable that names categories (whether with
words or numerals).
Context: the context ideally tells who was measured, what was measured, how the data
were collected, where the data were collected, and when and why the study was
performed.
Cross-sectional data: data taken from situations that vary over time but measured at a
single time instant are said to be a cross-section of the time series.
Data: recorded values, whether numbers or labels, together with their context.
Data mining (or predictive analytics): the process of using a variety of statistical tools to
analyze large databases or data warehouses.
Data table: an arrangement of data in which each row represents a case, and each
column represents a variable.
Data warehouse: a large database of information collected by a company or other
organization usually to record transactions that the organization makes, but also used
for analysis via data mining.
Experimental unit: an individual in a study for which or for whom data values are
recorded. Human experimental units are usually called subjects or participants.
Identifier variable: a categorical variable that records a unique value for each case, used
to name or identify it.
Metadata: auxiliary information about variables in a database, typically including how,
when, and where (and possibly why) the data were collected; who each case represents;
and the definitions of all the variables.
Nominal variable: the term “nominal” can be applied to a variable whose values are used
only to name categories.
Ordinal variable: the term “ordinal” can be applied to a variable whose categorical values
possess some kind of order.
Participant (subject): a human experimental unit.
Quantitative variable: a variable in which the numbers are values of measured quantities
with units.
Record: information about an individual in a database.
Relational database: a relational database stores and retrieves information. Within the
database, information is kept in data tables that can be “related” to each other.
Respondent: someone who answers, or responds to, a survey.
Spreadsheet: a spreadsheet is a layout designed for accounting that is often used to
store and manage data tables.
Subject (participant): a human experimental unit.
Time series: data measured over time. Usually, the time intervals are equally spaced or
regularly spaced.
Units: a quantity or amount adopted as a standard of measurement.
Variable: a variable holds information about the same characteristic for many cases.
Chapter 2: visualizing and describing categorical data
Area principle: in a statistical display, each data value is represented by the same
amount of area.
Bar chart (relative frequency bar chart): a chart that represents the count (or percentage) of each
category in a categorical variable as a bar, allowing easy visual comparisons across
categories.
Cell: each location in a contingency table, representing the values of two categorical
variables, is called a cell.
Column percent: the proportion of each column contained in the cell of a frequency
table.
Conditional distribution: the distribution of a variable when the “who” is restricted to
only a smaller group of individuals.
Contingency table: a table displaying the frequencies (sometimes percentages) for each
combination of two or more variables.
Distribution: the distribution of a variable is a list of all the possible values of the
variable and the relative frequency of each value.
Frequency table (relative frequency table): a table that lists the categories in a
categorical variable and gives the number (the percentage) of observations for each
category.
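As an illustration of the frequency table and relative frequency table defined above, here is a minimal Python sketch. The data (a made-up list of payment methods) and the names `payments`, `freq`, and `rel_freq` are assumptions chosen for the example, not part of the course material.

```python
from collections import Counter

# Hypothetical categorical variable: payment method for eight cases.
payments = ["card", "cash", "card", "app", "card", "cash", "app", "card"]

# Frequency table: number of observations in each category.
freq = Counter(payments)

# Relative frequency table: each count as a percentage of all observations.
total = sum(freq.values())
rel_freq = {category: 100 * count / total for category, count in freq.items()}

# Print one row per category: name, count, percentage.
for category in sorted(freq):
    print(f"{category:<5} {freq[category]:>3} {rel_freq[category]:>6.1f}%")
```

The percentages in a relative frequency table add up to 100, which is a quick check that no case was dropped.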
Independent variables: variables for which the conditional distribution of one variable is
the same for each category of the other.
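The definition of independent variables can be checked mechanically: compute the conditional distribution within each category and compare them. The sketch below assumes a contingency table stored as a nested dictionary; the function names and the two example tables are hypothetical. With real sample data the conditional distributions are rarely exactly equal, so this exact comparison only illustrates the definition and is not a statistical test.

```python
import math

def conditional_distributions(table):
    """Conditional distribution of the column variable within each row category."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: count / row_total for col, count in cells.items()}
    return result

def looks_independent(table, tol=1e-9):
    """True if every row has the same conditional distribution (within tol)."""
    dists = list(conditional_distributions(table).values())
    first = dists[0]
    return all(
        math.isclose(d[col], first[col], abs_tol=tol)
        for d in dists[1:]
        for col in first
    )

# Hypothetical tables: rows are stores, columns are payment methods.
same_mix = {"store A": {"cash": 10, "card": 30}, "store B": {"cash": 20, "card": 60}}
different_mix = {"store A": {"cash": 10, "card": 30}, "store B": {"cash": 30, "card": 10}}

print(looks_independent(same_mix))
print(looks_independent(different_mix))
```

In `same_mix` both stores have 25% cash and 75% card, so knowing the store says nothing about the payment method; in `different_mix` the proportions differ by store.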
Marginal distribution: in a contingency table, the distribution of either variable alone. The
counts or percentages are the totals found in the margins (usually the right-most column
or bottom row) of the table.
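To make the contingency table, marginal distribution, and conditional distribution entries concrete, the following sketch computes each from a small table. The counts, the variable names (region and plan), and the dictionary layout are all assumptions made for illustration.

```python
# Hypothetical two-way contingency table: rows = region, columns = plan.
table = {
    "North": {"basic": 30, "premium": 10},
    "South": {"basic": 20, "premium": 40},
}
plans = ["basic", "premium"]

# Marginal distribution of region: the row totals (right-most margin).
row_totals = {region: sum(cells.values()) for region, cells in table.items()}

# Marginal distribution of plan: the column totals (bottom margin).
col_totals = {plan: sum(table[region][plan] for region in table) for plan in plans}

grand_total = sum(row_totals.values())

# Conditional distribution of plan, restricted to cases in the North region.
plan_given_north = {
    plan: table["North"][plan] / row_totals["North"] for plan in plans
}

print("row totals:", row_totals)
print("column totals:", col_totals)
print("plan given North:", plan_given_north)
```

Here the marginal distribution of plan is 50/50, while among North customers it is 75% basic and 25% premium, so the conditional and marginal distributions differ.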
Mosaic plot: a mosaic plot is a graphical representation of a (usually two-way)
contingency table. The plot is divided into rectangles so that the area of each rectangle
is proportional to the number of cases in the corresponding cell.
Pie chart: pie charts show how a “whole” divides into categories by showing a wedge of a
circle whose area corresponds to the proportion in each category.
Row percent: the proportion of each row contained in the cell of a frequency table.
Segmented (or stacked) bar chart: a segmented bar chart displays the conditional
distribution of a categorical variable within each category of another variable.
Simpson’s paradox: a phenomenon that arises when averages (or percentages) are taken
across different groups, and these group averages appear to contradict the overall
averages.
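A small numerical example can show how Simpson’s paradox comes about. The counts below are invented for illustration: two hypothetical sales reps, each with a number of successful deals out of attempts, split by deal difficulty.

```python
# Hypothetical counts: (successes, attempts) per sales rep and deal type.
results = {
    "rep A": {"easy": (9, 10), "hard": (30, 90)},
    "rep B": {"easy": (80, 90), "hard": (3, 10)},
}

def rate(successes, attempts):
    """Success rate within one group."""
    return successes / attempts

def overall_rate(groups):
    """Success rate after pooling all groups for one rep."""
    successes = sum(s for s, _ in groups.values())
    attempts = sum(n for _, n in groups.values())
    return successes / attempts

for rep, groups in results.items():
    parts = ", ".join(f"{g}: {rate(*c):.1%}" for g, c in groups.items())
    print(f"{rep}  {parts}, overall: {overall_rate(groups):.1%}")
```

Rep A has the higher success rate on easy deals and on hard deals, yet rep B has the higher overall rate, because rep B handled mostly easy deals and rep A mostly hard ones. The group sizes differ, so the pooled percentages are weighted differently for each rep.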
Total percent: the proportion of the total contained in the cell of a frequency table.
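Row percent, column percent, and total percent differ only in the denominator used for a cell. The sketch below computes all three for one cell of a hypothetical table; the counts and the helper name `cell_percents` are assumptions for the example.

```python
# Hypothetical table of counts: rows = region, columns = plan.
table = {
    "North": {"basic": 30, "premium": 10},
    "South": {"basic": 20, "premium": 40},
}

def cell_percents(table, row, col):
    """Return (row percent, column percent, total percent) for one cell."""
    count = table[row][col]
    row_total = sum(table[row].values())
    col_total = sum(cells[col] for cells in table.values())
    grand_total = sum(sum(cells.values()) for cells in table.values())
    return (
        100 * count / row_total,    # share of the cell's row
        100 * count / col_total,    # share of the cell's column
        100 * count / grand_total,  # share of all cases
    )

row_pct, col_pct, total_pct = cell_percents(table, "North", "basic")
print(f"row: {row_pct:.0f}%, column: {col_pct:.0f}%, total: {total_pct:.0f}%")
```

For the North/basic cell, the same count of 30 is 75% of its row, 60% of its column, and 30% of the whole table, so it matters which percentage a table reports.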