Financial Data Decision Analysis
1. Introduction & Refresher
Empirical research
1. Motivation by previous studies
2. Formulation of a testable model
3. Collection of data
4. Model estimation
5. Interpret model
6. Use for further analysis, policy implications
Financial Data Characteristics I
Market data
- Frequent: daily, tick-by-tick
- High quality: no measurement error
- But: sometimes too much data, unstable relationships
Country data, macro-economic data
- Less frequent: monthly, quarterly, annual
- Less reliable, data revisions
Corporate finance data
- Plenty of data, but low frequency: quarterly, annual
- Proxies, measurement errors, data revisions
Financial Data Characteristics II
Types of data
- Numbers (cardinal data)
1. Returns: -10.2%, +2.3%, …
2. P/E multiples: 3.12, 5.10, -10.28, …
- Ordered data
1. Credit ratings: AAA, AA+, AA, AA-, A+, …
- Non-ordered data
1. Issuing debt, issuing equity or retaining dividends
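The three data types above could be represented as follows. This is a minimal sketch: the names `returns`, `Rating` and `financing_choices` are illustrative assumptions, and the rating scale contains only the grades listed above.

```python
from enum import IntEnum

# Cardinal data: plain numbers, so arithmetic such as averaging is meaningful.
returns = [-0.102, 0.023]  # the two returns listed above, written as decimals

# Ordered data: only the ranking is meaningful, not the distance between ranks.
class Rating(IntEnum):
    # Assumption for this sketch: a lower number means a better rating.
    AAA = 1
    AA_PLUS = 2
    AA = 3
    AA_MINUS = 4
    A_PLUS = 5

# Non-ordered data: categories with no natural ranking.
financing_choices = {"issue debt", "issue equity", "retain dividends"}

mean_return = sum(returns) / len(returns)   # valid for cardinal data
is_better = Rating.AAA < Rating.AA_PLUS     # comparison is valid for ordered data
```

Averaging the rating codes would not be meaningful in the same way, because the numeric gaps between grades are arbitrary.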
Cross-Sectional Data
Cross-sectional data are a sample of one or more variables collected at a single point in time.
Example of a cross-sectional regression analysis: the relationship between company size and the
return on investment in 2008.
Time Series Data
Time series data follow one country/firm/stock… over time. Examples are how a company’s stock price
has varied when it announced the value of its dividend payment, or which macroeconomic fundamentals
explain the changes in a sovereign CDS spread.
Panel Data
Panel data have the dimensions of both time series and cross sections. Examples are the impact of
bank debt on corporate risk over time, the relationship between company size and the return on
investment over the last 20 years, or 10 years of monthly prices for 100 companies traded on
the NYSE.
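The relationship between the three data structures can be sketched in a few lines of Python. The firm names and numbers below are made up purely for illustration, and the helper names `cross_section` and `time_series` are assumptions of this sketch.

```python
# Hypothetical panel: return on investment (in %) for three made-up firms over three years.
panel = {
    ("FirmA", 2006): 5.0, ("FirmA", 2007): 6.0, ("FirmA", 2008): 2.0,
    ("FirmB", 2006): 3.0, ("FirmB", 2007): 4.0, ("FirmB", 2008): 1.0,
    ("FirmC", 2006): 7.0, ("FirmC", 2007): 8.0, ("FirmC", 2008): 3.0,
}

def cross_section(data, year):
    """All firms observed at a single point in time."""
    return {firm: value for (firm, y), value in data.items() if y == year}

def time_series(data, firm):
    """One firm followed over time, ordered by year."""
    return [(y, value) for (f, y), value in sorted(data.items()) if f == firm]

print(cross_section(panel, 2008))
print(time_series(panel, "FirmA"))
```

Fixing the year yields a cross section, fixing the firm yields a time series, and keeping both dimensions gives the panel.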
When doing Empirical Research
Try to get to know the data first
- Descriptive statistics (mean, sd, min, max)
- Scatter and/or time-series plots
- Correlation analysis
Next step: Think and use your theory/intuition
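A first look at the data along these lines might be sketched as follows, using only the Python standard library. The two series are made-up numbers for illustration, and the helper names `describe` and `correlation` are assumptions of this sketch.

```python
import math
import statistics

# Made-up illustrative data: firm size (e.g. log assets) and return on investment (%).
size = [2.0, 3.0, 4.0, 5.0, 6.0]
roi = [1.0, 3.0, 2.0, 5.0, 4.0]

def describe(values):
    """Mean, sample standard deviation, minimum and maximum."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def correlation(x, y):
    """Pearson correlation coefficient from deviations about the means."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

print(describe(size))
print(describe(roi))
print(round(correlation(size, roi), 3))  # 0.8
```

Plots are omitted here to keep the sketch free of dependencies; in practice a scatter plot of the two series would accompany these numbers.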
Random variables and expectations
Definition: A random variable is any variable whose value cannot be predicted exactly. There are
discrete and continuous random variables:
- Discrete: a specific set of possible values (e.g. the throw of a die)
- Continuous: a continuous range of values (e.g. temperature)
Population: set of all possible values of the random variable
Probability distribution example: X is the sum of two dice
If each number has probability 1/6 on the red die and likewise on the green die, each of the 36
(red, green) outcomes in the table occurs with probability 1/36.
The distribution in this example is symmetrical: the probability is highest for X equal to 7 and declines on either side.
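The distribution of the sum of two dice can be reproduced by enumerating all 36 equally likely outcomes. The following is a minimal sketch; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each (red, green) pair is equally likely, with probability 1/36.
counts = Counter(red + green for red, green in product(range(1, 7), repeat=2))

# Probability of each possible sum X = 2, ..., 12, kept as exact fractions.
distribution = {x: Fraction(n, 36) for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
```

The printed probabilities rise from 1/36 at X = 2 to 1/6 at X = 7 and fall back to 1/36 at X = 12.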
Expected Value of a Random Variable
The expected value of a discrete random variable is the weighted average of all its possible values,
taking the probability of each outcome as its weight. You calculate it by multiplying each possible
value of the random variable by its probability and summing.
$$E(X) = X_1 P_1 + X_2 P_2 + \dots + X_n P_n = \sum_{i=1}^{n} X_i P_i$$
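As a worked example, the expected value of the sum of two dice can be computed directly from this definition. This is a minimal sketch; the function name `expected_value` is an assumption of the example.

```python
from fractions import Fraction

def expected_value(distribution):
    """Multiply each possible value by its probability and sum."""
    return sum(x * p for x, p in distribution.items())

# Sum of two dice: the value x arises from (6 - |x - 7|) of the 36 equally likely outcomes.
two_dice = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

print(expected_value(two_dice))  # 7
```

The result coincides with the centre of the symmetrical distribution.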
Expected Value Rules
For example:
𝑌 = 𝑏1 + 𝑏2 𝑋
𝐸(𝑌) = 𝐸(𝑏1 + 𝑏2 𝑋)
= 𝐸(𝑏1 ) + 𝐸(𝑏2 𝑋)
= 𝑏1 + 𝑏2 𝐸(𝑋)
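The derivation above can be checked numerically for a single fair die. This is a minimal sketch; the constants `b1 = 3` and `b2 = 2` are arbitrary values chosen for illustration.

```python
from fractions import Fraction

# One fair die: each face has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value(distribution):
    """Multiply each possible value by its probability and sum."""
    return sum(x * p for x, p in distribution.items())

b1, b2 = 3, 2  # arbitrary constants, assumed for this example

# Left-hand side: E(b1 + b2*X), computed value by value.
lhs = sum((b1 + b2 * x) * p for x, p in die.items())
# Right-hand side: b1 + b2*E(X).
rhs = b1 + b2 * expected_value(die)

print(lhs, rhs)  # 10 10
```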
Let g(X) be any function of X. Then the expected value of this function is given by:
$$E(g(X)) = g(X_1) P_1 + g(X_2) P_2 + \dots + g(X_n) P_n = \sum_{i=1}^{n} g(X_i) P_i$$
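For instance, taking g(X) = X² for a single fair die gives the following. This is a minimal sketch; the choice of g and the function name `expected_value_of` are assumptions of the example.

```python
from fractions import Fraction

# One fair die: each face has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value_of(g, distribution):
    """E(g(X)): weight g(x) by the probability of x and sum."""
    return sum(g(x) * p for x, p in distribution.items())

e_x = expected_value_of(lambda x: x, die)                # E(X)   = 7/2
e_x_squared = expected_value_of(lambda x: x ** 2, die)   # E(X^2) = 91/6

print(e_x_squared, e_x ** 2)  # 91/6 49/4
```

The two printed values differ, which illustrates that E(g(X)) is in general not the same as g(E(X)).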
Population Variance of a Discrete Random Variable
The population variance is defined as the expected value of the square of the difference between X
and its mean:
$$Var(X) = \sigma_x^2 = E\{(X - \mu_x)^2\} = (X_1 - \mu_x)^2 P_1 + \dots + (X_n - \mu_x)^2 P_n = \sum_{i=1}^{n} (X_i - \mu_x)^2 P_i$$
Note that the standard deviation is $\sigma_x = \sqrt{\sigma_x^2}$.
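Applied to the sum of two dice, the definition can be evaluated as follows. This is a minimal sketch; the function names `mean` and `variance` are assumptions of the example.

```python
import math
from fractions import Fraction

# Sum of two dice: the value x arises from (6 - |x - 7|) of the 36 equally likely outcomes.
two_dice = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

def mean(distribution):
    """Population mean: probability-weighted sum of the values."""
    return sum(x * p for x, p in distribution.items())

def variance(distribution):
    """Population variance: probability-weighted squared deviations from the mean."""
    mu = mean(distribution)
    return sum((x - mu) ** 2 * p for x, p in distribution.items())

var_x = variance(two_dice)
sd_x = math.sqrt(var_x)

print(var_x)           # 35/6
print(round(sd_x, 3))  # 2.415
```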