HC 3 Forecasting
t = 1, 2, 3, … Index denoting the time period of interest (t = 1 is the first period)
y1, y2, y3, …, yn; yt denotes the value at time period t (e.g. average temperature: t = 1 stands for day 1, so y1 is the average temperature on day 1)
Note: gender should be treated as a factor in R rather than as a character, because gender is a categorical variable.
Forecasting process
1. Forecast Goal Definition; determine the forecast goal.
- Descriptive 'Time series analysis'; describe patterns in historical data (trends, seasonality)
- Predictive 'Time series forecasting'; used to predict future values (can't use future information)
- Forecast horizon & forecast updating; the forecast's use determines the horizon (e.g. a 1-month
forecast for revenue management). The level of automation depends on the task & its use in practice
→ Roll-forward forecast = refreshing the forecast based on new data
2. Get data: Data Collection & cleaning → data sources
3. Explore & Visualise Data; view data in R & check values (plot)
SYNTAX:
ts() is a function to create time series objects; start/end = begin/end date, freq = number of periods per year
→ name.ts <- ts(filename$category, start = c(year, month), end = c(year, month), freq = 12)   (freq = 12 for monthly data)
plot(x, y, main = "title", xlab = "x-axis name", ylab = "y-axis name", xlim = c(xmin, xmax), ylim = c(ymin, ymax))
Time Series Components
Systematic; - Level; describes average value of the series in given data
- Trend; looking at past sales to determine possible trends to predict future (growth/decline of graph)
- Seasonality; predictable cyclic variation depending on the time within the year
Non systematic; - Noise; random variation that results from measurement error or other causes that are not accounted for
Time series with additive components: yt = Level + Trend + Seasonality + Noise → additive; values vary by a constant amount
Time series with multiplicative components: yt = Level × Trend × Seasonality × Noise → multiplicative; values vary by a percentage
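The contrast between the two compositions can be seen with a minimal pure-Python sketch (the course itself works in R; all numbers here are hypothetical):

```python
level = 100.0
season_add = [5.0, -5.0, 5.0, -5.0]       # additive seasonal swings
season_mult = [1.05, 0.95, 1.05, 0.95]    # multiplicative seasonal factors

# Additive: yt = Level + Trend + Seasonality (noise omitted for clarity)
additive = [level + 2.0 * t + season_add[t] for t in range(4)]

# Multiplicative: yt = Level * Trend * Seasonality (noise omitted for clarity)
multiplicative = [level * (1 + 0.02 * t) * season_mult[t] for t in range(4)]

print(additive)        # seasonal swing stays a constant amount (+/- 5)
print(multiplicative)  # seasonal swing grows with the level (a percentage)
```

In the additive series the seasonal deviation is always ±5 units; in the multiplicative series it is always ±5% of the (growing) level.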
The STL function captures the four components. The time series components are level, noise, trend, and seasonality. Level and noise are always present. Trend is likely to be present. Seasonality is likely to be present if the data follows the same recurring patterns (e.g. seasons).
4. Pre-Processing; only a relevant subset of the data is needed. Check for:
- Obsolete or redundant fields
- Missing values
- Outliers; extreme values
- Values not consistent with policy
- Time span (irrelevant period or data)
- Unequally-spaced series; values recorded at irregular time intervals
Missing values can affect the ability to generate forecasts and also affect performance measures
How to handle missing data?
- Ignore, forecasting method does not use the variable
- Omit missing value
- Replace missing value with a constant, mean, randomly generated value etc.
- Replace missing value with value based on other characteristics of the data
Remove data that don't help with the forecast (mistakes / duplicates)
Imputing values: use prediction / forecasting methods to impute the values
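Two of the simple options above, sketched in pure Python (the course works in R; the series and the `None` marker for the missing value are hypothetical):

```python
series = [20.0, 22.0, None, 25.0, 24.0]  # hypothetical daily temperatures

# Option 1: replace the missing value with the mean of the observed values
observed = [v for v in series if v is not None]
mean = sum(observed) / len(observed)
mean_imputed = [mean if v is None else v for v in series]

# Option 2: replace the missing value with the last observed value
# (a naive forecast used as imputation)
naive_imputed = []
last = None
for v in series:
    if v is None:
        v = last
    naive_imputed.append(v)
    last = v

print(mean_imputed)   # missing slot filled with the mean, 22.75
print(naive_imputed)  # missing slot filled with the previous value, 22.0
```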
- Naive forecast (easiest) = predict the next value from the most recent observed value: Ft+1 = yt
Seasonal naive forecast: Ft+k = yt−M+k
→ Advantage; simplicity and ease of deployment
→ Disadvantage; it does not take trends & external information into account
→ Improve; Combine data-driven & model-based methods (average their Ft)
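The two naive formulas above can be sketched in pure Python (hypothetical quarterly data, M = 4 seasons per year):

```python
y = [10, 20, 30, 40, 12, 22, 32, 42]  # two years of quarterly values
M = 4
t = len(y)  # we stand at the end of the series

# Naive: F(t+1) = y(t), the next value equals the last observed value
naive = y[-1]

# Seasonal naive: F(t+k) = y(t-M+k), the next value equals the value
# observed one full seasonal cycle ago
seasonal_naive = [y[t - M + k] for k in range(M)]  # next 4 quarters

print(naive)           # 42
print(seasonal_naive)  # [12, 22, 32, 42]
```

Note how the seasonal naive forecast simply repeats last year's seasonal pattern, which is exactly why it ignores trends and external information.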
5. Partition Series; Workflow to create data partitions;
- Training set → contains data used to build various models
- Validation set → used to compare performance of each model; benchmark
- Test set → used to assess performance of chosen model with new data
Choose training/validation that mimics forecast horizon, this depends on:
- Forecast horizon
- Seasonality
- Length of series; not too long, as a longer period contains less recent info
- Underlying conditions affecting series
Difference between data partitioning in cross-sectional and time series data: cross-sectional data partitioning is usually done
randomly. In time series, random partitioning does not mimic the temporal uncertainty where we use the past to forecast the future.
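A chronological (never random) partition can be sketched in pure Python; the split sizes here are hypothetical:

```python
series = list(range(1, 25))     # 24 monthly values, oldest first
horizon = 6                     # we want to forecast 6 months ahead

# Train on the oldest observations, validate on the most recent ones,
# so the validation period mimics the actual forecast horizon
train = series[:-horizon]       # first 18 observations
validation = series[-horizon:]  # last 6 observations

print(len(train), len(validation))  # 18 6
```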
6. Apply Forecasting Method
What forecast methods are we going to use? Data-driven, model-based or judgmental? Combining or Ensembles?
Data-driven methods; "learn" patterns from the data (adjust over time). Possible with naive forecasts & smoothing methods.
Advantageous when the structure of the time series changes over time; they require less user input and are easily automated.
A large time series is necessary for adequate learning.
→ better for local (changing) patterns
When the data is unpredictable (fluctuating), data-driven is better, because it adapts over time.

Model-based methods; use a mathematical model to forecast. Work on small data sets and can include external information.
Training data is used to estimate the parameters; the estimated parameters are then used to generate forecasts.
Possible with regression, neural networks, machine learning.
→ better for series with a global pattern
With much noise and little signal, model-based is better, because it does not chase the fluctuations and can use a mathematical function.

Judgmental forecast; used when there is a lack of good data. Also used to adjust statistical forecasts and to compare/combine with data.
→ methods; Delphi method, scenario forecasting

- Combining multiple forecasts can improve predictive accuracy
- Two-level forecast: method 1 gives a forecast; method 2 uses the forecast errors (from 1) to generate future forecast errors
Ensembles; apply multiple forecasting methods to the same series and average the different forecasts for higher precision.
7. Evaluate & Compare Performance
Model which fits data well does not necessarily forecast well
- Underfitting; Performs poorly on training data. Model is too simple.
→ Increase complexity by adding trend/seasonality
- Overfitting; performs well on training data, but poorly on the validation set (cannot generalize to new data).
→ Decrease model complexity by removing trend/seasonality
If the forecasting model gives satisfactory performance on the validation set, the training and validation periods must be
re-combined, and the forecasting model should then be applied to the complete series before producing future forecasts.
8. Implement Forecast/System
Predictive Accuracy measures: forecast errors = residuals
A positive average forecast error → systematically under-forecasting
Forecast error = actual value − forecast value; gives the forecast deviation at time t
Mean absolute error/deviation → Gives magnitude of average absolute forecasting error.
Disadvantage; forecast error is scale dependent
Average Error → Gives an indication whether forecast are averagely over/under forecasting
Disadvantage: forecast error is scale dependent
Mean absolute percentage error → gives the forecast deviation as a percentage of the
actual values (if an actual value is 0, MAPE cannot be calculated).
Advantage; scale independent.
Disadvantage; cannot handle actual values of 0 & puts a heavier penalty on negative errors (over-forecasts)
Mean squared error → Gives averaged squared error as forecasting error.
Advantage: sensitive to large errors, accepts 0.
Disadvantage; Hard to interpret, not scale independent
Root mean squared error → has same units as data series (model with lower RMSE for
training period is more useful for describing components of time series, model with lower
RMSE for validation period is better for forecasting)
Performance Charts and metrics are used to: 1. Evaluate predictive accuracy on validation period
2. Assess overfitting by comparing training & validation performance.
Example
MAE = 1/3 × (|-1| + 1 + |-2|) = 4/3 ≈ 1.33
Average error = 1/3 × (-1 + 1 + -2) = -2/3 ≈ -0.67
MAPE = 1/3 × (|-1/1| + 1/11 + |-2/15|) × 100 ≈ 40.8%
RMSE = √(1/3 × ((-1)² + 1² + (-2)²)) = √2 ≈ 1.41
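The worked example (errors -1, 1, -2 against actuals 1, 11, 15) can be checked with a short pure-Python computation:

```python
import math

errors = [-1, 1, -2]   # e_t = actual - forecast
actuals = [1, 11, 15]
n = len(errors)

mae = sum(abs(e) for e in errors) / n                        # mean absolute error
avg_error = sum(errors) / n                                  # average error
mape = sum(abs(e / y) for e, y in zip(errors, actuals)) / n * 100  # in percent
rmse = math.sqrt(sum(e ** 2 for e in errors) / n)            # root mean squared error

print(round(mae, 2), round(avg_error, 2), round(mape, 1), round(rmse, 2))
# 1.33 -0.67 40.8 1.41
```

The negative average error (-0.67) signals slight over-forecasting on average, while the MAE (1.33) shows the typical error magnitude regardless of sign.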
HC 4
Smoothing method; based on averaging values over multiple periods to reduce noise
Estimating level; Naive forecasts, mean forecast, moving average (MA), and simple exponential smoothing (SES) are used to
forecast series with no observable trend or seasonality. Level estimated by previous values or averages.
1. Moving average
Centered moving average: based on a window of width w centered around time t
- Odd width: center the window on time t and average the values in the window (e.g. w = 5)
- Even width: take the two most centered windows and average their values (e.g. w = 4)
Trailing moving average: based on a window from time t backwards →
Window Width; Wider windows will expose more global trends, while narrow windows will reveal local trends.
Forecasting: use domain knowledge to determine the best window size.
Longest window → Over smoothing; you take average of entire sample (W=Sample size)
Smallest window → Under smoothing; apply naive forecast (W=1)
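A trailing moving average can be sketched in pure Python (the course works in R; the series and window width w = 3 are hypothetical):

```python
y = [10, 12, 11, 13, 15, 14]
w = 3

# Trailing MA at time t averages the last w values: y(t-w+1) .. y(t)
trailing = [sum(y[t - w + 1 : t + 1]) / w for t in range(w - 1, len(y))]

# The one-step-ahead forecast is the most recent trailing MA value
forecast = trailing[-1]

print(trailing)  # [11.0, 12.0, 13.0, 14.0]
print(forecast)  # 14.0
```

With w = 1 this collapses to the naive forecast (under-smoothing); with w = len(y) it becomes the mean forecast (over-smoothing).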
R → You should de-trend as well as de-seasonalize the series before using an MA to generate forecasts.
♦ When no actual values are available after year 20, the forecasts for years 21+t stay equal to the forecast made at year 20
♦ MA assumes no trend, so if the trend continues to increase, the MA forecast will most likely under-forecast
Link between MA & SES; In both methods the user must specify a single parameter. In moving average → window width (w).
In exponential smoothing → smoothing constant(α). Parameter determines importance of new information over older ones.
The relation between window width (w) and smoothing coefficient (α) is approximately w = 2/α − 1
α = 2/(w+1) → a low w gives a high α
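The w ↔ α correspondence above is easy to verify in pure Python:

```python
# Approximate correspondence between MA window width w and
# SES smoothing constant alpha: w ~ 2/alpha - 1, i.e. alpha = 2/(w + 1)

def alpha_from_w(w):
    return 2 / (w + 1)

def w_from_alpha(alpha):
    return 2 / alpha - 1

print(alpha_from_w(3))    # 0.5 -> narrow window, high alpha (reactive)
print(alpha_from_w(19))   # 0.1 -> wide window, low alpha (smooth)
print(w_from_alpha(0.5))  # 3.0
```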
2. Simple exponential smoothing; use a weighted moving average (α) so that weights decrease exponentially in the past
Compared with MA, SES gives more weight to recent observations
→ the forecast is a weighted sum of the given values yi
→ future forecasts remain equal to Lt (Ft+k = Lt) when the actual (newer)
values yt+k are not given
Smoothing constant alpha → α determines how much weight is given to the past
- α = 1; past observations have no influence on the forecast → under-smoothing
- α → 0; past observations have a large influence on the forecast → over-smoothing
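The notes do not write out the SES update explicitly; the standard recursion is Lt = α·yt + (1−α)·Lt−1, initialized with the first observation. A minimal pure-Python sketch (series and α = 0.5 are hypothetical):

```python
def ses(y, alpha):
    # Level update: L(t) = alpha * y(t) + (1 - alpha) * L(t-1)
    level = y[0]  # initialize with the first observation
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return level  # this level is the forecast for every future period

y = [10, 12, 14, 13]
forecast = ses(y, alpha=0.5)
print(forecast)  # 12.75
```

Because the forecast equals the final level, it stays flat for all horizons k, matching the note that future forecasts remain constant until new actuals arrive.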
3. Adaptive Learning Process
→ Future forecasts Ft+k remain constant when the actual values yt+k are not given
Differencing is removing a trend and/or seasonality from a time series by
taking the difference between two observations.
- Lag-1 difference: yt − yt−1 removes a trend
- Lag-M difference: yt − yt−M removes seasonality, depending on the timespan
(you compare winter 2017 with winter 2016 using a lag-4 difference, with 4 seasons per year)
- Double-differencing: difference the differenced series to remove a quadratic trend
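Both differencing operations can be sketched in pure Python (hypothetical quarterly data, so M = 4):

```python
y = [10, 20, 30, 40, 12, 22, 32, 42]
M = 4

lag1 = [y[t] - y[t - 1] for t in range(1, len(y))]   # removes a trend
lagM = [y[t] - y[t - M] for t in range(M, len(y))]   # removes seasonality

print(lag1)  # [10, 10, 10, -28, 10, 10, 10]
print(lagM)  # [2, 2, 2, 2]
```

The lag-1 series still shows the seasonal reset (-28), while the lag-4 series is flat (a constant +2 year-on-year growth): the seasonality has been removed.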
♦ NOTE! Simple exponential smoothing or Holt's method is better if the series has a lot of noise and no seasonality!