Research Methods in Finance
How to import files in Gretl?
1. Open Athena.
2. Open Ufora in Chrome (Athena) and download the file.
3. Click on ‘show in folder’ and move the file to your home folder (-Athena).
4. Open Gretl (Athena) and import the file.
1. INTRODUCTION AND DEALING WITH DATA (PART 1 & 2)
Slide 13
The application of statistical techniques to problems in finance. → to answer a research question.
Financial / economic econometrics
→ focuses on a different kind of data (financial or economic information)
→ financial data: e.g. stock price; economic data: e.g. GDP
→ financial data: less exposed to the small sample problem!! ☺ E.g. GDP is only observed 4 times
a year (quarterly), while a stock price is continually recalculated.
→ no measurement errors with financial data ☺
→ No data revisions with financial data because no measurement errors ☺
Slide 14
Important!! Framework we use when we do analyses
Starting point: look at earlier papers and understand them
1. Y = return Bel20 (to understand the performance of the stock market)
X = GDP
2. Regression model: link the different variables → effect?
3. Collect data for Bel20 and GDP
Slide 15
Non-linear: e.g. Y = a + bx^2
Slide 16
Assume x = 0 (we haven’t studied), then y = 25
x = 1000, then y = 75
Slope: b = (y2 - y1)/(x2 - x1) = (75 - 25)/(1000 - 0) = 0.05
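Worked example of the slope formula, as a minimal Python sketch. It uses the two points from the slide (x = 0, y = 25 and x = 1000, y = 75); the function name `slope` is only illustrative and not part of the course material.

```python
def slope(x1, y1, x2, y2):
    """Slope b of the straight line through (x1, y1) and (x2, y2)."""
    return (y2 - y1) / (x2 - x1)


# Points from the notes: x = 0 gives y = 25, x = 1000 gives y = 75.
b = slope(0, 25, 1000, 75)
a = 25  # intercept: the value of y when x = 0

print(b)  # 0.05
```

With a = 25 and b = 0.05 the line is y = 25 + 0.05x, which returns y = 75 again at x = 1000.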
Slide 17
First: read other papers that have done a similar analysis!!
Slide 19
Panel data is more informative!
Aggregation: if you want to analyze house prices you can look at
- The prices of individual houses
→ if you want to know the underlying individual house prices
- The house price index (= “what is the general price of houses within a region?”, e.g. Flanders or
Belgium), which you can compare to other regions.
→ you lose a lot of information because you don’t see the individual prices, but useful if you
want to see the bigger picture
The method to use depends on the research question!!
Slide 20
Some data will only be available at a lower / higher frequency, e.g. daily, weekly, monthly, yearly,…
GNP: only available quarterly
Unemployment: available at monthly level
Economic data: low frequency (quarterly basis)
Stock market data: high frequency (daily basis)
→ if you want to link these 2, you will have to express both variables in the highest frequency that
is available for both, i.e. the lower of the two. E.g. convert daily data → quarterly basis
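A minimal Python sketch of converting daily data to a quarterly basis. The prices are hypothetical, and keeping the last observation of each quarter is an assumption made here for illustration (taking the quarterly average would be another option).

```python
from datetime import date

# Hypothetical daily closing prices (illustrative numbers, not real data).
daily_prices = {
    date(2023, 1, 2): 100.0,
    date(2023, 2, 15): 104.0,
    date(2023, 3, 31): 102.0,
    date(2023, 4, 3): 103.0,
    date(2023, 6, 30): 110.0,
}


def to_quarterly(prices):
    """Keep the last available observation in each (year, quarter)."""
    quarterly = {}
    for day in sorted(prices):
        quarter = (day.year, (day.month - 1) // 3 + 1)
        quarterly[quarter] = prices[day]  # later days overwrite earlier ones
    return quarterly


quarterly_prices = to_quarterly(daily_prices)
print(quarterly_prices)  # {(2023, 1): 102.0, (2023, 2): 110.0}
```

After this step the stock market series has one value per quarter and can be matched with quarterly economic data.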
Slide 22
Panel data might be converted into pooled data
→ pooled data = panel data which is converted into cross sectional data
→ no time dimension: observations from different periods are put together, e.g. look at both year 1
and year 2 together.
Slide 25
Refinitiv Workspace/Datastream: data we need for the thesis
→ financial and macro-economic data
→ fundamental information about companies, bonds, derivatives,…
Slide 28
If you download data, you get raw data, e.g. a stock price; you want to look at changes in the price,
so you have to transform the data.
Slide 29
Return = percentage change in the price
Returns are unit-free, e.g. 1 → 2 or 1000 → 2000; both +100%!!
Continuously compounded returns = log returns
→ ln(p_t) - ln(p_{t-1})
→ this is the one we use; see next slide
Slide 30
R(1 → 5) = r1 + r2 + r3 + r4 + r5 (log returns are additive over time)
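A short Python sketch comparing simple and log returns and checking that the five one-period log returns add up to the five-period log return. The price series is hypothetical and only serves as an illustration.

```python
import math

# Hypothetical prices for periods 0..5 (illustrative numbers only).
prices = [100.0, 102.0, 101.0, 105.0, 107.0, 110.0]

# Simple return: percentage change in the price.
simple_returns = [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]

# Log return: ln(p_t) - ln(p_{t-1}).
log_returns = [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

# The five one-period log returns add up to the five-period log return.
total_log_return = math.log(prices[-1]) - math.log(prices[0])
print(sum(log_returns), total_log_return)

# Simple returns do not add up: their sum differs from the true
# five-period simple return.
total_simple_return = (prices[-1] - prices[0]) / prices[0]
print(sum(simple_returns), total_simple_return)
```

The first print shows two (numerically) identical values, the second shows two different ones, which is why log returns are convenient when working over several periods.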
Slide 32
S&P500_t = 4353 (index)
The value of an index is not that informative on its own; you have to compare it with earlier values!
The price of an individual stock, e.g. Apple = 145 = value of 1 share of Apple; this is informative in itself.
Slide 33
y1 = 100 and y2 = 106 → increased by 6%
y3 = 109 → increased by 9% relative to the base year
Inflation: (CPI2020 – CPI2019)/CPI2019
GDP growth: (GDP2020-GDP2019)/GDP2019
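The percentage changes above can be sketched in Python. The index values (100, 106, 109) come from the notes; the helper name `pct_change` is illustrative. The same formula gives inflation (from the CPI) and GDP growth.

```python
def pct_change(new, old):
    """Percentage change from old to new, expressed in percent."""
    return (new - old) * 100 / old


# Index values from the notes: base year y1 = 100, then y2 = 106 and y3 = 109.
y1, y2, y3 = 100, 106, 109

print(pct_change(y2, y1))  # 6.0 -> +6% relative to the base year
print(pct_change(y3, y1))  # 9.0 -> +9% relative to the base year
print(pct_change(y3, y2))  # about 2.83 -> growth from year 2 to year 3
```

Note that the 9% is measured against the base year; the growth from year 2 to year 3 is smaller because it is measured against 106.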
Slide 34
House price index: expressed in nominal terms, the price of a house
→ the prices can increase because there is a higher demand for houses
→ the house price can also increase because of inflation
Convert nominal into real house prices: exclude inflation by using CPI
Slide 35
2013: 162245 (nominal house price)
Real house price 2013 (in 2004 prices) = (nominal house price 2013 / CPI2013) * CPI2004
(162245/123.6)*100 = 131266
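The deflation step can be reproduced with a small Python sketch. The numbers are the ones from the slide (CPI 2004 = 100 as base year); the function name `to_real` is illustrative.

```python
def to_real(nominal, cpi_current, cpi_base=100.0):
    """Express a nominal value in base-year prices using the CPI."""
    return nominal / cpi_current * cpi_base


# Numbers from the notes: nominal price 2013 = 162245, CPI 2013 = 123.6,
# CPI 2004 = 100 (base year).
real_2013 = to_real(162245, 123.6, 100.0)
print(round(real_2013))  # 131266
```

The real price is lower than the nominal one because part of the nominal increase since 2004 is just inflation.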
Slide 42
To compute RF (the monthly risk-free rate), we divide USTB3M (quoted as a yearly rate) by 12
→ see next slide!
Excess return = return above the risk-free asset: is the return of the S&P500 higher than the risk-free
return?
→ you expect the answer to be yes, otherwise no one would invest in the stock market
→ when we calculate this, we see negative numbers when the stock market goes down. On average:
positive numbers!
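A minimal Python sketch of the excess return calculation. All numbers are hypothetical; the variable names follow the series names in the notes (RSANDP, USTB3M), and it is assumed that both series are expressed in percent.

```python
# Hypothetical monthly S&P 500 returns in percent (illustrative numbers only).
rsandp = [1.8, -2.4, 0.9, 3.1]

# Hypothetical 3-month T-bill yields, quoted in percent per year.
ustb3m = [2.4, 2.4, 3.0, 3.0]

# Put the risk-free rate on a monthly basis: yearly yield divided by 12.
rf_monthly = [y / 12 for y in ustb3m]

# Excess return = stock market return minus the risk-free return.
excess = [r - rf for r, rf in zip(rsandp, rf_monthly)]

for r, rf, e in zip(rsandp, rf_monthly, excess):
    print(f"return {r:5.2f}%  rf {rf:4.2f}%  excess {e:5.2f}%")

# Average excess return over the sample.
average_excess = sum(excess) / len(excess)
print(f"average excess return: {average_excess:.2f}%")
```

In the month where the market falls the excess return is negative, while the average over the (made-up) sample is positive.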
Slide 43
Answer: 2; the risk-free rate should also be expressed on a monthly basis → same frequency!!
Slide 46
Note: don’t put them on the same graph because it would not be very useful
Exercise: don’t use “multiple graphs”, but instead “graph specified vars” and select both variables.
→ Gretl uses two different y-axes (left and right)
→ right-click the plot: “use only 1 Y-axis”: you can barely see the values of Ford (not useful).
Slide 47
Answer: 3; you can only compare the two when you use only one y-axis.
Slide 48
Red line: simple regression for the relation between the two variables
→ positive relation: if RSANDP is high, RFORD is high and vice versa
Dot top right: outlier; the return of FORD is much higher than the return we would expect, based on
the return of the stock market.
You can choose to exclude these extreme observations (outliers).
Slide 49
Bins = number of intervals (normally the default option is okay).
Top left: test statistic for normality; p-value (0.0002) < 0.05, so we reject normality: this is not a
normal distribution!
Slide 51
Measures of central tendency: give us the most typical value of a series.
→ here: the value in the centre of the observations
Measures of spread: measure how the returns are spread over the intervals.
Slide 52
Random variable: you can’t predict the value you will get.
Slide 53
You don’t need to know the formula.
The distribution is symmetric, unimodal (only one peak) and fully described by 2 parameters (the mean
and the standard deviation).
Slide 55
N = number of observations.
Desirable econometric property: When the underlying variable is normally distributed, the mean will
also be normally distributed.
Continuous data: all the values will be different (a lot of digits after the decimal point).
Slide 56
You take the square because you want to treat the observations above the mean the same as the
observations below the mean; negative deviations become positive!
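Why the deviations are squared can be seen in a small Python sketch. The returns are hypothetical, and dividing by N - 1 (sample variance) is an assumption; the slide may divide by N instead.

```python
import math

# Hypothetical returns in percent (illustrative numbers only).
returns = [2.0, -1.0, 3.0, 0.0, 1.0]

n = len(returns)
mean = sum(returns) / n

# Without squaring, positive and negative deviations cancel out.
deviations = [r - mean for r in returns]
print(sum(deviations))  # 0.0

# After squaring, every deviation counts as a positive number.
squared = [d ** 2 for d in deviations]
variance = sum(squared) / (n - 1)  # sample variance (assumption: N - 1)
std_dev = math.sqrt(variance)
print(variance)  # 2.5
print(std_dev)
```

The standard deviation is the square root of the variance, so it is expressed in the same unit as the returns themselves.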
Slide 57
Q1: the value below which 25% of the observations lie
Q3: the value below which 75% of the observations lie
Semi-standard deviation: you only look at the values below the mean, and calculate the standard
deviation of those (you can also look at the sigma above the mean!!)
→ useful because investors don’t like returns below the mean: how likely are the negative
returns? How are the returns below the mean dispersed?
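A possible Python sketch of the semi-standard deviation. The returns are hypothetical, the function name `semi_std` is illustrative, and dividing by the number of selected observations is an assumption (some texts divide by the full sample size).

```python
import math


def semi_std(values, below=True):
    """Semi-standard deviation around the mean.

    Uses only the observations below the mean (below=True) or above it
    (below=False). Dividing by the number of selected observations is an
    assumption; other definitions divide by the full sample size.
    """
    mean = sum(values) / len(values)
    if below:
        selected = [v for v in values if v < mean]
    else:
        selected = [v for v in values if v > mean]
    if not selected:
        return 0.0
    return math.sqrt(sum((v - mean) ** 2 for v in selected) / len(selected))


# Hypothetical returns in percent (illustrative numbers only).
returns = [4.0, 1.0, 2.0, -7.0, 0.0]

print(semi_std(returns, below=True))   # dispersion of returns below the mean
print(semi_std(returns, below=False))  # dispersion of returns above the mean
```

In this made-up sample the downside figure is much larger than the upside one, something the ordinary standard deviation would hide.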
Unit free: not expressed in dollars, euros,…
→ example: rental values for 3 apartments in each city
Manchester: 90, 100 and 110
London: 900, 1000, 1100
Sigma(Manchester) < sigma(London) because the observations lie further away from the
mean in absolute terms! This is not the case in relative terms (the same).
When you divide the sigma by the mean, you get the CV (coefficient of variation), which is the same for both cities
Which one to use? Report different measures!!
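The Manchester / London example can be checked with Python's standard `statistics` module. The rental values are the ones from the notes; the function name `coefficient_of_variation` is illustrative.

```python
import statistics

# Rental values from the notes.
manchester = [90, 100, 110]
london = [900, 1000, 1100]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unit-free)."""
    return statistics.stdev(values) / statistics.mean(values)


# In absolute terms London is more dispersed ...
print(statistics.stdev(manchester), statistics.stdev(london))  # 10.0 100.0

# ... but relative to the mean both cities are equally dispersed.
print(coefficient_of_variation(manchester))  # 0.1
print(coefficient_of_variation(london))      # 0.1
```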
Slide 58
If you have a normal distribution, skewness = 0.
Often, skewness does not equal 0, so that the distribution will be asymmetric.
- Negatively skewed: higher probability of extreme negative observations (investors don’t like this
because the chance of extremely low returns is higher)
- Positively skewed: higher probability of extreme positive observations (investors like this because
the chance of extremely high returns is higher), e.g. the lottery: you can only lose what you pay for
your tickets.
Slide 59
If kurtosis > 3: higher chance of extreme observations (= fat-tailed). Investors don’t like this because
they are risk-averse.
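Skewness and kurtosis can be computed from the central moments, as in this minimal Python sketch. The data are hypothetical, and the plain moment-based formulas (dividing by N, no small-sample correction) are an assumption; software may apply corrections or report excess kurtosis (kurtosis minus 3) instead.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness and kurtosis (normal distribution: 0 and 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical returns in percent (illustrative numbers only).
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
lottery_like = [-1.0, -1.0, -1.0, -1.0, 4.0]  # small losses, one big win

print(skewness_and_kurtosis(symmetric))     # skewness 0: symmetric
print(skewness_and_kurtosis(lottery_like))  # skewness 1.5, kurtosis 3.25
```

The lottery-like series is positively skewed (one extreme positive observation) and has kurtosis above 3, in line with the interpretation in the notes.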
Slide 61
Answer: 2; Ford (0.7333).
Treasury bills are the lowest → less risk, so less return
Median is completely different! Reason: a lot of outliers ( → look at minimum and maximum!!)
Everything is expressed in percentages! E.g. 0.73%
Standard deviation is the highest for Ford: no offsetting between stocks (diversification) like we have with the S&P
CV: same scale, not very interesting to compute
Skewness: higher probability of having an extremely high observation for Ford.
Kurtosis: higher possibility for extreme positive or negative observations for Ford.
Slide 62
Answer: 1; Ford