Let \(Y_t\) be the actual value of a point in a time series at time \(t\). A forecast for the
value at time \(t+1\), \(F_{t+1}\), can be made using a weighted average of the previous known
values:
$$ F_{t+1}=w_0Y_t+w_1Y_{t-1}+w_2Y_{t-2}+\cdots $$
When \(N\) terms are used and \(w_0 = w_1 = \cdots = w_N = \frac{1}{N}\), the series produces
an \(N\) length backwards moving average. Note, other moving averages such as central or forward moving averages can be constructed by defining where the range of data used falls
with respect to \(t\).
If instead of the above, the form \(w_i=\alpha(1-\alpha)^i\) for some constant \(\alpha\)
is used the forcast becomes
$$ F_{t+1}=\alpha Y_t+\alpha(1-\alpha)Y_{t-1}+\alpha(1-\alpha)^2Y_{t-2}+\alpha(1-\alpha)^3Y_{t-3}+\cdots $$
Factoring out \(1-\alpha\) from the second terms onwards on the right-hand side gives
$$ F_{t+1}=\alpha Y_t+(1-\alpha)(\alpha Y_{t-1}+\alpha(1-\alpha)Y_{t-2}+\alpha(1-\alpha)^2Y_{t-3}+\cdots)
$$
But, by definition
$$ \alpha Y_{t-1}+\alpha(1-\alpha)Y_{t-2}+\alpha(1-\alpha)^2Y_{t-3}+\cdots = F_{t} $$
So the expression for the forecast simplifies to
$$ F_{t+1}=\alpha Y_t+(1-\alpha)F_{t} $$
i.e., the forcast at the next time point is equal to the weighted average of the last
time point and the forecast value of the last time point. The weighting factor, \(\alpha\),
is found such that it minimises the difference between the known data and the series
forecasts for those points using the above induction, subject to
\(0 \leqslant \alpha \leqslant 1\). For the simple expontial smoothing example above the
value of \(\alpha\) which minimises (in this case) the mean absolute error is found
to be \(\alpha=0.521\). A value \(F_1=Y_1\) can be used to initialise the forecast.
The Holt-Winters method expands this exponential smoothing to include an expected level and slope,
\(a_t\) and \(b_t\) respectively, to allow for the trend, and a seasonal adjustment, \(s_t\),
to allow for seasonal variation. For an additive trend and seasonality, the smoothing is carried out using
\begin{aligned} a_t &= \alpha (Y_t - s_{t-p}) + (1-\alpha)(a_{t-1}+b_{t-1}) \\ \\
b_t &= \beta(a_t - a_{t-1})+(1-\beta)b_{t-1} \\ \\
s_t &= \gamma(Y_t - a_t)+(1-\gamma)s_{t-p}
\end{aligned}
where \(\alpha\), \(\beta\) and \(\gamma\) are the smoothing coefficients for the level,
slope and seasonal values respectively, and \(p\) is the period of the seasonal data; 12
in this example. To initialise the series at least one whole seasonal cycle plus an
extra value is required, and
the values of the forecast for fitting purposes starts at \(F_{p+1}\). The initialisations
for \(a_p\), \(b_p\) and \(s\) are
\begin{aligned}
a_p &=\frac{1}{p}(Y_1+Y_2+ \cdots +Y_p) \\ \\
b_p &= \frac{1}{kp} \sum_{i=1}^{k} \left( Y_{p+i} - Y_{i} \right) \\ \\
s_i &= Y_i - a_p, \quad i=1,2, \cdots , p.
\end{aligned}
where \(k\) is the number of data points beyond \(p\) used to initialise \(b_p\).
Ideally \(k=p\), so two whole seasonal cycles are used. However, if this is not possible
a lower value can be used subject to a minimum value of \(k=1\).
The forecast \(m\) months ahead is given by
$$ F_{t+m} = a_t + mb_t + s_{t-p+m}$$
Again, the values of \(\alpha\), \(\beta\) and \(\gamma\) are found
that minimises the difference between the known data and the series
forecasts for those points using the above induction, subject to
\(0 \leqslant \alpha,\beta,\gamma \leqslant 1\), and using \(m=1\).
The equations above are relevant for an additive trend and seasonality. There are other
possible combinations, as shown below, each with its own set of equations. The set below
is known as the Pegels' classification scheme.
Pagels' classification scheme of time series types.
No trend, no seasonal |
No trend, additive seasonal |
No trend, multiplicative seasonal |
Additive trend, no seasonal
|
Additive trend, additive seasonal
|
Additive trend, multiplicative seasonal
|
Multiplicative trend, no seasonal
|
Multiplicative trend, additive seasonal
|
Multiplicative trend, multiplicative seasonal
|