In the previous articles we explored two regulatory frameworks for the derivation of interest rate risk: Solvency II for the insurance sector in the EU, and Wtp for the pension sector in the Netherlands. Let’s now perform a model validation on a statistical model that calculates interest rate risk purely from historical data; we call this the BASE method.
The BASE method explores the full available distribution of changes in the data series and makes an inference about the likelihood of extreme observations over multiple projection horizons¹. The structure resembles the percentile calculations in the Wtp model, where the distribution of projections into the future is used to say something about the risk. The difference lies in the fact that the BASE method utilizes only observed, historical data points. The relevance and reliability of the calculated risks are therefore influenced by the length and appropriateness of the historical data.
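To make the mechanics concrete, here is a minimal sketch of how such a percentile projection could work, assuming the method applies the empirical distribution of overlapping h-year rate changes to the latest observed rate. The function name and the exact construction of the changes are our own illustration, not the production model:

```python
import numpy as np

def base_percentiles(rates, horizon, levels=(0.5, 50.0, 99.5)):
    """Sketch of a BASE-style projection: shift the latest rate by the
    empirical percentiles of overlapping `horizon`-year changes observed
    in an annual historical rate series (illustrative, not the article's code)."""
    rates = np.asarray(rates, dtype=float)
    # Overlapping horizon-year changes observed in the historical series
    changes = rates[horizon:] - rates[:-horizon]
    # Apply the empirical distribution of changes to the latest observed rate
    return {lvl: rates[-1] + np.percentile(changes, lvl) for lvl in levels}
```

For horizon 1 this reduces to shifting the current rate by the percentiles of the observed year-on-year changes; longer horizons reuse the same historical series with wider, overlapping differences.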
For the current analyses we have combined two datasets: long maturity Dutch sovereign bond rates (source: KEF, from 1900) as well as 12-year maturity zero-coupon bond rates (source: DNB, from 2004). Where the two datasets overlap, we assume that the DNB dataset is the most accurate. We have analyzed two periods: 1900-2023 (full period) and 1995-2023 (EU period).
How would our BASE interest rate risk model look if we have limited data points (EU period)?
First, we explore interest rate projections created using a restricted period starting from 1995, close to the establishment of the European Union. This period is mainly characterized by declining interest rates, with a few short periods of interest rate increases.
Given that we observe only relatively short periods of interest rate increases, it is not surprising that, when we use the BASE method with this dataset to calculate the risks up to 10 years ahead, we only observe an increase in the negative (VaR99.5%) scenario in the first 2 years. The upside risk then flattens until year 4 and follows a downward trajectory afterwards (see Figure 2). The negative scenario reaches a maximum of 6.1% in year 3, after which it decreases to 3.8% in year 10 (comparable to the 2023 level of 3.1%). For the VaR50% and VaR0.5% we see downward slopes for most of the predicted periods. This is not surprising, as the interest rate data over the restricted period, since 1995, clearly had a downward trajectory.
When we compare the BASE predictions with the extended Solvency II (2020) model, we note that the predictions up to 2 years ahead roughly align. However, as we go further into the future, we see that the BASE predictions are much less conservative than those of the Solvency II model.
How accurate is the BASE interest risk estimation based on the shorter period?
We use the model validation method introduced in previous articles to backtest the reliability of the BASE interest rate risk method. In this case we use 2005-2023 as the validation period, over which we estimate performance metrics (correlation and tail-breach percentages). The graph below plots monthly data of 1-year-ahead projections for the VaR99.5%, 50% and 0.5% scenarios relative to the actual observed rate in that period. For each data point in the validation timeframe, we use the real data from 1995 up to the preceding year to derive the percentiles. For example, for the 2008 percentile calculation we used the 1995-2007 realized interest rate data.
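The expanding-window setup described above can be sketched as follows. This is an illustrative reimplementation on an annual grid, not the article’s actual code; the breach-counting logic and function name are our own assumptions:

```python
import numpy as np

def backtest_1y(rates, start_idx):
    """Expanding-window backtest of 1-year-ahead percentile bands (sketch).
    For each point after start_idx, estimate the 0.5% and 99.5% bands from
    all earlier data and check whether the realized rate breaches them."""
    rates = np.asarray(rates, dtype=float)
    breaches = {0.5: 0, 99.5: 0}
    n_tests = 0
    for t in range(start_idx, len(rates) - 1):
        hist = rates[: t + 1]
        changes = np.diff(hist)                     # observed 1-year changes so far
        lo = hist[-1] + np.percentile(changes, 0.5)   # lower band
        hi = hist[-1] + np.percentile(changes, 99.5)  # upper band
        realized = rates[t + 1]                     # next year's actual rate
        if realized < lo:
            breaches[0.5] += 1
        if realized > hi:
            breaches[99.5] += 1
        n_tests += 1
    # Fraction of test points breaching each tail (expected: ~0.5% each)
    return {lvl: breaches[lvl] / n_tests for lvl in breaches}
```

Comparing the returned breach fractions against the nominal 0.5% per tail gives exactly the validation differences discussed in the text.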
With a correlation of 85% between the VaR50% and the realized DNB rate, the BASE method is relatively good at capturing the expected trend. Since we are interested in capturing risk, however, we want to focus on the tails. The validation shows that only 93% of the observations fall below the estimated VaR99.5% level; the remaining 7% breach the upper bound, 6.5 percentage points more than the expected 0.5%. Furthermore, 3% of the observed data lies below the VaR0.5% level, which indicates that we are not capturing the lower tail accurately either. Although these results are better than those of the current Solvency II (2016 version) model, we note that the overall volatility, as measured by the distance between the upper and lower tails, was too low during the COVID and post-COVID period, which does not align well with reality.
Can we improve the BASE interest risk accuracy by extending the historical dataset?
If we go further back in time and look at the development of the interest rate from 1900 onwards, we see that, except for the peaks around the two World Wars in the 1920s and 1940s, the period until 1960 is relatively stable. After that we observe a more volatile period, characterized by a peak in 1980 and a low in 2020, with significant fluctuations along the way.
Calculating the projected percentile levels using interest rate changes from the full dataset leads to more stable outcomes, which more closely resemble the projections of the extended Solvency II (2020) model. We can observe this in the graph below. The projected VaR50% level fluctuates slightly around the current level of 3.1% over the full prediction period. Similar to the prediction based on the shorter period (see Figure 2), the VaR99.5% level drops slightly in year 5. After that it increases to 7.4% by year 10. For the VaR0.5% level we observe an increase after year 6, contrary to the continuous decrease in the extended Solvency II model.
Backtest on the BASE model with full history
Utilizing the full history of the data series in the BASE method clearly leads to better results. When we again use the 2005-2023 period as our model validation period, we see clear improvements in the tail estimation of the BASE interest rate risk model, while the correlation between the realized DNB rate and the VaR50% remains strong at 85%.
The percentage of observations over the test period that lie below the 99.5% level is now 96%, and we do not see any realized data points below the 0.5% level. Furthermore, the range of the estimated risks tends to be somewhat larger. We also see that the model captures the interest rate jump in 2022 much sooner: around mid-2022 instead of the beginning of 2023 in the shorter validation period (see Figure 3). Overall, over this test period, we are clearly much better able to capture the extremes by utilizing the longer 1900 dataset rather than the shorter 1995 version.
What if we perform a long backtest, since 1950?
In addition to our backtest since 2005, we want to know how the same BASE model would perform over a long backtest period, from 1950 to 2023.
It is good to see that our model maintains its performance under varying conditions. For the upper tail we see 4% breaches where we expected 0.5%, meaning that the validation difference of 3.5% did not increase further. For the lower tail we now see 3% breaches instead of the expected 0.5%, a validation difference of 2.5%, which signals reasonably good accuracy.
It is also good to note that the costs of this model stay low and adjust gradually to the increasing interest rate volatility. We see an average cost of 1.47% for the upper tail and 1.16% for the lower tail. These numbers are clearly more workable than the model costs we observed for the Solvency II models.
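The cost metric is not defined explicitly here. One plausible reading, sketched below under that assumption, is the average distance between each projected tail and the realized rate, i.e. the buffer the model reserves on each side:

```python
import numpy as np

def tail_cost(realized, var_lo, var_hi):
    """Hypothetical 'model cost' metric: the average distance between each
    tail band and the realized rate. Wider bands are safer but costlier."""
    realized = np.asarray(realized, dtype=float)
    upper_cost = np.mean(np.asarray(var_hi, dtype=float) - realized)
    lower_cost = np.mean(realized - np.asarray(var_lo, dtype=float))
    return upper_cost, lower_cost
```

Under this reading, low breach rates combined with low tail costs indicate bands that are wide enough to contain the extremes without being needlessly conservative.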
How accurate are 10 year ahead projections?
In addition to the 1-year-ahead prediction, we have also performed validations on predictions multiple years into the future. These show that performance statistics remain high for projections up to 4 years into the future. However, as we try to project further into the future, we end up with less reliable estimates.
Figure 8 below, for example, shows the model validation results of interest rate risk projections 10 years ahead. We note that, due to the falling interest rates, all observations now fall below the 99.5% scenario (and even the 95%). This shows that we are too conservative with respect to the upper tail risk when we predict 10 years ahead. In addition, 20% of the observations now lie below the 0.5% scenario, which shows that the lower risk estimate was not low enough.
These observations also show that the size of the declines in the 2015-2020 period was quite exceptional compared to earlier periods in the dataset. We note that the realized rates did not cross the lower percentile aggressively, but rather stayed close to that lower band. This shows us that, although not very accurate, the model still provided a useful risk estimation for a prediction far in the future.
This also serves as a reminder that one should be more cautious when utilizing risks calculated for the more distant future, for example by using more conservative risk percentiles, or by updating the risk estimates more frequently, preferably on a yearly basis.
Do metrics above indicate that we should always use the full dataset when we calculate interest rate risk?
Figures 6 and 7 show that utilizing the full history leads to better backtest results compared to using only the relatively short EU period. Indeed, having sufficient observations is important for every statistical analysis; too few observations compromise the validity of the results.
Nevertheless, using the full dataset is not necessarily better if the current environment differs from most of the periods in the available dataset. For instance, if the available data mainly includes periods of relatively stable inflation, economic growth and employment, our model would produce quite mild risk estimates that would not be appropriate if the macro-economic environment changes to what we saw in 2022.
Therefore, what matters more than using the full dataset is using the “correct” data points. That is, we would want to restrict our estimation to periods that resemble the current and potential subsequent economic situation to get a proper risk estimation. It is evident that the shorter EU dataset failed to provide us with a context similar to the post-COVID environment.
In a nutshell…
Regulatory frameworks such as Solvency II or Wtp can be a good starting point for an interest rate risk estimation. However, this model validation shows that a purely statistical approach, such as the BASE method, can lead to a more accurate interest rate risk estimation.
We also see that the length of the historical data series is of paramount importance for an accurate risk estimation. When using the shorter dataset starting from 1995 we see that, although better than the current Solvency II model (2016 version), it does not capture the interest rate dynamics of the post-COVID period. When we use the full dataset since 1900, however, we see a substantial improvement in the backtest results, which proved more accurate than the updated Solvency II model (2020 version) and less costly.
Limitations
The BASE method has some limitations. One is that it does not say much about the context we are in. If we do not properly understand the current dynamics that affect the interest rate, we cannot be sure whether the full (1900-) or the EU-era (1995-) period is better at estimating the future interest rate risk. We could of course opt to always choose the dataset that shows the most conservative risk. However, if the environment is one of relative stability, many would find this too conservative, unnecessary and costly. Therefore we cannot consider those predictions ‘better’ if the current context is very different from the past dataset.
The second limitation stems from the inability to adequately capture the risk at different maturities. The dataset before 2004 only contains 10-15-year maturities. Therefore, if we want to use the BASE method for different maturities, we would have to make additional model assumptions.
We will address the limitation of context independence by performing a model validation on the REGIME method. If you want to read more about the principles behind this method, you can read the article on reference class methodology.
In addition, we want to know how all these models compare to each other when measured against objective criteria. We do this in the article about the selection of an optimal risk model.
Footnotes
- This model can be seen as a simplified Vasicek model. We do not make any assumptions about the distribution but follow a purely data-driven approach. We do not apply a mean reversion correction; this makes the model slightly more conservative compared to a model where mean reversion is applied. This approach is consistent with findings by Jan Willem van den End, who found a mean reversion parameter close to 0 for long-term interest rates in his research paper ‘Statistical evidence on the mean reversion of interest rates’.