Population = the set of objects under investigation. The objects themselves are the elements.
Data (measurements) = measurements made on the elements that reflect some individual
characteristics of the elements.
Sample = the part of the population of interest that a study actually considers. The data, then,
are measurements of the elements in the sample. These data contain hidden information that
has to be detected by statistics in order to become knowledge.
Data have to be collected first, then the data have to be summarized in terms of informative
numbers. Often the measurements come from a sample.
Subdivision of statistics:
1. Descriptive statistics: includes the collecting of data, and summarizing and presenting
them by means of tables, graphs and distinctive numbers. Collecting data involves
observations that follow from experiments. Data must be summarized in such a way that no
important information is lost.
2. Probability theory: studies the behaviour and the laws of chance and probability in
experiments that allow more than one outcome. Precision of statistical procedures is
always expressed in terms of probabilities.
3. Sampling theory: studies methods of sampling and their properties. One method is
random sampling, where the elements of the population have the same chance of being
chosen in the sample.
4. Inferential statistics: studies and applies methods to draw conclusions about distinctive
numbers of the whole population of interest by considering only a sample.
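Random sampling as described in item 3 can be sketched in Python roughly as follows. The population of element IDs and the helper name `simple_random_sample` are made up for illustration; the sketch assumes sampling without replacement, which the notes do not specify.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; each element has the same chance of being chosen."""
    rng = random.Random(seed)          # a seed only makes the draw reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: elements identified by the numbers 1..100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
```

Inferential statistics would then use measurements on `sample` to draw conclusions about the whole `population`.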
Variables
- Characteristic = a feature of interest that is used to compare the elements.
- Population variable = a well-defined rule (prescription) for observing a characteristic.
- Observation/observed value (or data) = the result of measuring a variable at an element.
- Qualitative (categorical) variable = a variable whose values are categories.
o Nominal variable = if the values cannot be ordered in a natural way (e.g. gender)
o Ordinal variable = if the values can be ordered naturally (e.g. size)
- Quantitative (numerical) variable = values that are ordinary numbers. Can be:
o Discrete = if its set of possible values can be counted or;
o Continuous = if the set of possible values consists of all real numbers in an
interval (concerns large numbers).
- Alternative/dichotomous variables = qualitative variables that can take only two values
(man or woman for instance)
- Dummy variable = if one of the two values of an alternative variable is coded 1 and the
other as a 0
- Interval variable = a quantitative variable for which the ratio of two values is meaningless.
Otherwise it is a ratio variable
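The dummy coding of an alternative variable can be illustrated with a small Python sketch. The observations and the helper name `to_dummy` are hypothetical; which of the two values is coded 1 is a free choice.

```python
def to_dummy(values, coded_as_one):
    """Code an alternative (two-valued) variable as a 0-1 dummy variable."""
    if len(set(values)) > 2:
        raise ValueError("an alternative variable can take only two values")
    return [1 if v == coded_as_one else 0 for v in values]

# Hypothetical observations of an alternative variable.
answers = ["yes", "no", "yes", "yes", "no"]
dummy = to_dummy(answers, coded_as_one="yes")
```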
Populations versus samples
- Census = when a variable is observed at all elements of the population
- Population dataset = the dataset resulting from a census; it contains all possible information
- Sample = a subset of the population
- Sample dataset = the dataset resulting from a sample
- Sample statistic (or simply statistic) = a statistic computed, for a certain variable, from a
sample dataset
- Population statistic = a statistic computed from a population dataset
- Parameter = a number or other measurable factor forming one of a set that defines a
system
- A statistic measures some overall feature of a set of objects (fixed number)
- A variable measures some individual feature that can take different values for different
individual objects
Chapter 2: Tables and graphs
Nominal variables
- (absolute) frequency = the number of times that a certain value occurs in the dataset
- Relative frequency = the frequency of a value divided by the total number of observations,
i.e. the proportion of all observations in the dataset with that value. Times 100 it is a percentage
- Frequency distribution = overview of all different values in the dataset jointly with
accompanying frequencies
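A frequency distribution with absolute and relative frequencies could be computed along these lines. This is a minimal sketch: the dataset `colours` and the function name are invented for illustration.

```python
from collections import Counter

def frequency_distribution(data):
    """Map each distinct value to (absolute frequency, relative frequency)."""
    counts = Counter(data)
    n = len(data)
    return {value: (freq, freq / n) for value, freq in counts.items()}

# Hypothetical observations of a nominal variable.
colours = ["red", "blue", "red", "green", "red", "blue"]
dist = frequency_distribution(colours)
```

The relative frequencies always add up to 1 (or 100 when expressed as percentages).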
Ordinal variables
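- Cumulative frequency = the number of observations less than or equal to a given value; this is
meaningful because the values of an ordinal variable can be put in increasing order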
- Cumulative (relative) frequency distribution = overview of all different values
combined with the respective cumulative (relative) frequencies
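For an ordinal variable, the cumulative (relative) frequency distribution might be built as below. The natural order of the values has to be supplied explicitly; the size data and the function name are assumptions made for this sketch.

```python
from collections import Counter
from itertools import accumulate

def cumulative_distribution(data, ordered_values):
    """Rows of (value, frequency, cumulative frequency, cumulative relative frequency)."""
    counts = Counter(data)
    n = len(data)
    freqs = [counts[v] for v in ordered_values]
    cumulative = list(accumulate(freqs))   # running totals in the natural order
    return [(v, f, c, c / n) for v, f, c in zip(ordered_values, freqs, cumulative)]

# Hypothetical observations of an ordinal variable with natural order S < M < L.
sizes = ["S", "M", "L", "M", "S", "M", "L", "M"]
table = cumulative_distribution(sizes, ["S", "M", "L"])
```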
Quantitative variables
- Discrete variable = each different value forms a class if there are not too many
- Continuous variable = the classes are usually adjoining intervals
(Cumulative) distribution for a discrete variable
The cumulative distribution function (or distribution function for short) of a dataset of
observations of a discrete variable is the function F such that, for every real number b:
F(b) = relative frequency of the observations ≤ b
Properties of the distribution function for a discrete variable
- It is a non-decreasing step function
- It jumps to higher vertical levels at the different values of the dataset
- The jump sizes are just the relative frequencies of the different values in the dataset
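The definition of F and its step-function properties can be checked with a short Python sketch. The dataset `children` is hypothetical, and `distribution_function` is just an illustrative name.

```python
def distribution_function(data):
    """Return F, where F(b) is the relative frequency of the observations <= b."""
    n = len(data)

    def F(b):
        return sum(1 for x in data if x <= b) / n

    return F

# Hypothetical observations of a discrete variable (number of children per household).
children = [0, 1, 1, 2, 2, 2, 3, 4]
F = distribution_function(children)

# The jump at the value 2 equals the relative frequency of 2 in the dataset (3 out of 8).
jump_at_2 = F(2) - F(1)
```

Between two successive values of the dataset F stays flat, so for instance `F(1.5)` equals `F(1)`.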
Data of continuous variables
- Categorical system = a set of classes that have no values in common and together cover the
whole range of the data
- Classification = the categorical system chosen for recording the data
- Classified frequency distribution = frequency distribution that gives an overview of the
chosen classification and the respective frequencies
Cumulative distribution function when the variable is continuous
In this case F(b) is the proportion of the observations that, according to the classified frequency
distribution, is smaller than or equal to b.
If b is the upper bound of a class in the classification, then F(b) is just the cumulative relative
frequency up to and including that class. If b is smaller than the lower bound of the first class
in the classification or larger than the upper bound of the last class, then F(b) equals 0 or 1,
respectively.
When F(b) is defined for b in a class (l, u) of the classification with lower bound l and upper
bound u, note that F(u) is just the cumulative relative frequency up to and including that class,
while F(l) is the cumulative relative frequency up to and including the preceding class in the
classification. Hence, F(u) – F(l) is the relative frequency of the class (l, u). For b in a class
(l, u), the value F(b) follows by plotting the pairs (l, F(l)) and (u, F(u)) as dots in a
two-dimensional system of axes, connecting them with a straight line and reading F(b) off this
line: F(b) = F(l) + (b – l) / (u – l) × (F(u) – F(l)).
Figure: linear interpolation
(Cumulative) distribution function of a classified frequency distribution
The (cumulative) distribution function F of a classified frequency distribution is the function
that arises from the cumulative relative frequencies of the lower and upper bounds of the classes
in the classification by using the method of linear interpolation.
Property: F is not a step function but a continuous and non-decreasing function that on each
class (l, u) of the classification goes from F(l) to F(u) by way of a straight line.
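The linear interpolation method can be sketched in Python as follows. The class bounds and relative frequencies are invented, and the sketch assumes adjoining classes described by one increasing list of bounds.

```python
def classified_distribution_function(bounds, rel_freqs):
    """F for a classified frequency distribution, by linear interpolation.

    bounds:    class bounds b0 < b1 < ... < bk (adjoining classes)
    rel_freqs: the k relative frequencies of the classes (summing to 1)
    """
    cumulative = [0.0]
    for r in rel_freqs:
        cumulative.append(cumulative[-1] + r)   # F at each class bound

    def F(b):
        if b <= bounds[0]:
            return 0.0                          # below the first class
        if b >= bounds[-1]:
            return 1.0                          # above the last class
        for i in range(len(rel_freqs)):
            l, u = bounds[i], bounds[i + 1]
            if l <= b <= u:
                # straight line from (l, F(l)) to (u, F(u))
                return cumulative[i] + (b - l) / (u - l) * (cumulative[i + 1] - cumulative[i])

    return F

# Hypothetical classification: classes 0-10, 10-20 and 20-40.
F_classified = classified_distribution_function([0, 10, 20, 40], [0.2, 0.5, 0.3])
halfway_second_class = F_classified(15)   # halfway between F(10) and F(20)
```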
Time series data
- Cross-sectional data = measurements made at one moment in time; most datasets so far
have been of this kind
- Time series data = measurements of a single variable at successive periods or moments
in time. The sequence of successive data is called a time series.
Chapter 3: Measures of location
Statistics such as the mean value and the percentage overweight are of interest because they
help summarize the dataset. Location refers to some central position of the dataset and its
distribution but is not yet defined precisely. Examples of measures of location are the mode,
the median and the mean.
Nominal variables
- Mode / modal value = the value within the dataset that has the highest frequency
o Unimodal = if the dataset has one modal value
o Bimodal = if the dataset has two modal values
o Multimodal = if the dataset has more than two modal values
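Finding the modal values and labelling the dataset accordingly might look like this in Python. The sketch relies on `statistics.multimode` from the standard library (Python 3.8 or later); the transport data and the function name are hypothetical.

```python
from statistics import multimode

def describe_modes(data):
    """Return the modal values and whether the dataset is uni-, bi- or multimodal."""
    if not data:
        raise ValueError("the dataset is empty")
    modes = multimode(data)   # all values that share the highest frequency
    kind = {1: "unimodal", 2: "bimodal"}.get(len(modes), "multimodal")
    return modes, kind

# Hypothetical observations of a nominal variable.
transport = ["bike", "car", "bike", "train", "car"]
modes, kind = describe_modes(transport)
```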
Ordinal variables
- Median = if the number of data points is odd, the value of the middlemost observation
of the ordered data. If the number of data points is even, the middlemost pair of the
ordered data can be determined instead.
o If the dataset is a population dataset, its median is denoted by µmedian
o If the dataset is a sample dataset, its median is denoted by χmedian
Quantitative variables
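- Median = (Xm1 + Xm2) / 2 if the number of observations is even and the variable is
quantitative, where Xm1 and Xm2 are the two middlemost observations of the ordered data.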
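The two median rules (ordinal and quantitative) can be sketched together in Python. The helper names and example data are assumptions; for an ordinal variable the natural order must be supplied, here through a sort key.

```python
def middle_observations(data, key=None):
    """Middlemost observation (odd n) or middlemost pair (even n) of the ordered data."""
    if not data:
        raise ValueError("the dataset is empty")
    ordered = sorted(data, key=key)
    n = len(ordered)
    if n % 2 == 1:
        return [ordered[n // 2]]
    return [ordered[n // 2 - 1], ordered[n // 2]]

def quantitative_median(data):
    """Median of a quantitative dataset: the mean of the middle observation(s)."""
    middle = middle_observations(data)
    return sum(middle) / len(middle)

# Hypothetical ordinal data with natural order low < medium < high.
order = ["low", "medium", "high"]
ordinal_middle = middle_observations(["low", "high", "medium", "low"], key=order.index)

# Hypothetical quantitative data with an even number of observations.
even_median = quantitative_median([3, 1, 4, 2])
```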
Arithmetic mean
Proportion of successes
For qualitative variables the mean of a dataset is not defined, but for a quantitative variable it
is defined. It can be shown that the mean of observations of a 0-1 variable does make sense and
is equal to the proportion of ones in the dataset.
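That the mean of a 0-1 variable equals the proportion of ones is easy to verify on a small, made-up dataset:

```python
from statistics import fmean

# Hypothetical observations of a 0-1 (dummy) variable: 1 = success, 0 = failure.
outcomes = [1, 0, 1, 1, 0, 1, 0, 1]

mean_value = fmean(outcomes)                     # arithmetic mean of the zeros and ones
proportion = outcomes.count(1) / len(outcomes)   # proportion of successes
```

Both quantities are 5/8 here, because summing the observations simply counts the ones.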
Weighted mean
Geometric mean
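The notes give no formulas for these means. Assuming the standard textbook definitions (weighted mean = sum of weight times value divided by the sum of the weights; geometric mean = n-th root of the product of n positive values), a minimal Python sketch could be the following. The data and the `weighted_mean` helper are hypothetical.

```python
from statistics import fmean, geometric_mean

def weighted_mean(values, weights):
    """Weighted mean under the standard definition (an assumption, not from the notes)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical grades, the second counting three times as heavily as the first.
grades = [6.0, 8.0]
plain = fmean(grades)                      # arithmetic mean
weighted = weighted_mean(grades, [1, 3])   # (1*6 + 3*8) / 4

# Hypothetical yearly growth factors: +25% followed by -20%.
growth_factors = [1.25, 0.80]
average_factor = geometric_mean(growth_factors)   # square root of 1.25 * 0.80
```

The geometric mean is the natural average for growth factors: a rise of 25% followed by a fall of 20% leaves the level unchanged, which matches an average factor of about 1.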