In a previous post, we discussed the prospect of inflation and how one might create an appropriate stress test for such a scenario on FactSet. The specific example we discussed raises a more general question about the proper design of stress tests. Remember, the problem that we encountered was that most models did not have a history that contained any significant rise in CPI. The highest CPI rise observed in the past 20 years was about 5%, while we wanted to see what a 10% rise would look like. The premise of our search was that a 5% rise is fundamentally different from a 10% rise when we are talking about CPI, and thus it would be inappropriate to simply extrapolate linearly from the 5% environment to the 10% environment (a scalar of 2).
What is the general idea at work here? Do we always have to observe in history some number of exactly the same impacts as we are trying to model, in order for the covariance structure to make sense? If not, what kind of extrapolation is appropriate and what kind is not? We will first provide some guidelines, and in the next post will dwell more on the reasoning and theoretic economic issues involved.
In general, the process of stress testing should force the practitioner to answer the following questions (we are talking only about factor stress tests, since historical stress testing design is quite obvious):
Question 1: What kind of impact are we trying to model?
Stress testing is not about predicting specific events like a particular company’s default or a natural disaster. It is all about impacts. In order to properly design stress tests, you have to think about stress testing as a tool that allows you to examine systematic weaknesses in your portfolio. The best analogy for the process is car crash testing, where a designer cares not about what may cause a particular accident, but rather about a limited number of possible impacts. That is why we use the term Portfolio Crash Testing when referring to stress testing.
The impacts can come from a few categories:
- Broad market impacts described by indices such as S&P 500
- Sector impacts described by indices such as S&P Financials or S&P Technologies
- Economic variables in a loose sense of this term (e.g. oil, gold, CPI, GDP…)
Sometimes it is useful to combine impacts from the same category or from multiple categories. Our multiple factor stress testing functionality was designed specifically for that purpose.
Question 2: Now that we have decided on the financial impacts we are modeling, we have to ask: is the model in use well suited for modeling this impact?
The model is well suited for our task if similar impacts were observed in its history. Similar does not mean exactly the same. A useful simplification for the purposes of stress testing is to think of each of the above-mentioned factors as exhibiting roughly two kinds of behavior. The first kind could be characterized as more or less a trading range; this environment, as we will see in the second part of the post, is what economic theory usually describes as an equilibrium or near-equilibrium state. In it, the relationships between assets (described in most risk models as correlations) are mostly stable.
The second kind of environment is one in which changes are sharp and the relationships can change rapidly (e.g., a rise in correlations). This is the extreme environment in which market participants lose any sense of equilibrium, and supply and demand fluctuate sharply, possibly becoming strongly mismatched. If we have some observations of the extreme variety, it is fair to extrapolate from them, even if the magnitude of our stress testing shock is considerably larger. For example, if we saw a 30% decline in the S&P 500, it is fair to extrapolate to 50% or even 60%, because the events differ in degree, but do not differ qualitatively in a major way. However, if nothing that we could call a major shock to a given factor was observed in the sample, linear extrapolation is likely to be hopelessly wrong. This goes for inflation. For the economy, 10% inflation is fundamentally different from 2% or 3%, or even 5%. That is why linear extrapolation from existing CPI data will not work.
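To make the extrapolation argument concrete, here is a minimal Python sketch of a single-factor stress test. It is an illustration under stated assumptions, not FactSet's implementation: the return series are made-up numbers, and `beta` and `linear_stress` are hypothetical helper names. The beta is estimated once on a calm (trading range) sample and once on an extreme sample, and the same 50% index decline is then applied to both.

```python
def beta(asset, factor):
    """Slope of asset returns on factor returns: cov(asset, factor) / var(factor)."""
    n = len(factor)
    mean_f = sum(factor) / n
    mean_a = sum(asset) / n
    cov = sum((f - mean_f) * (a - mean_a) for f, a in zip(factor, asset))
    var = sum((f - mean_f) ** 2 for f in factor)
    return cov / var


def linear_stress(asset, factor, shock):
    """Linear extrapolation: implied asset return is beta times the factor shock."""
    return beta(asset, factor) * shock


# Made-up illustrative returns (assumption, not real market data).
# Calm sample: small index moves, asset reacts mildly.
calm_index = [0.01, -0.02, 0.015, -0.01, 0.02, -0.015]
calm_asset = [0.004, -0.009, 0.007, -0.003, 0.009, -0.008]
# Extreme sample: large index moves, asset reacts much more strongly.
extreme_index = [-0.12, -0.20, 0.10, -0.30, 0.15, -0.08]
extreme_asset = [-0.17, -0.31, 0.13, -0.46, 0.20, -0.12]

shock = -0.50  # hypothetical 50% index decline
from_calm = linear_stress(calm_asset, calm_index, shock)
from_extreme = linear_stress(extreme_asset, extreme_index, shock)

print(f"Implied asset return, beta from calm sample:    {from_calm:.1%}")
print(f"Implied asset return, beta from extreme sample: {from_extreme:.1%}")
```

Within either sample, doubling the shock exactly doubles the implied return, which is the "scalar of 2" form of extrapolation. What matters is which sample the beta came from: in this toy data the calm sample badly understates the loss, which is the situation we face with CPI, where no extreme sample exists.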
We should be clear that for the vast majority of impacts the risk model has some observations from the extreme sample; therefore it is quite fair to extrapolate, as long as those extreme observations get enough weight in the calculation of the covariance matrix (see next question). In summary, we assert that market conditions when, for example, the S&P 500 went down significantly in 1998 are similar to those observed in the 2008 crash and will be similar should another major sell-off occur. There are many reasons for this and we will elaborate on some in the follow-up to this post.
Question 3: Should I use the Event Weighted or Time Weighted method for stress testing?
A detailed discussion along with empirical testing can be found in Tail Risk and VaR: Reconciling Theory with Reality in FactSet’s Portfolio Analysis. In short, Event Weighted is best suited for extreme impacts, because it overweights the extreme observations in the calculation of the covariance matrix. Since stress testing is mostly concerned with major impacts, the Event Weighted method is preferred in the majority of cases. The Time Weighted method should be used when we want to determine portfolio moves in an environment where the relationships stay as they are now and were recently (i.e., no sharp disequilibrium occurs). It is important to note that in times of major market reversals the Time Weighted and Event Weighted methods converge, because the Time Weighted method assigns higher weight to the recent observations, which also happen to be extreme.
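The difference between the two methods, and their convergence after a major reversal, can be sketched in a few lines of Python. The exact weighting formulas are not given in this post, so everything below is an assumption made for illustration: time weights decay exponentially with a hypothetical half-life, event weights are proportional to the squared factor move, returns are treated as zero-mean (a common simplification), and the data is made up so that the asset has a beta of 0.5 in calm periods and 1.5 in extreme ones.

```python
def time_weights(n, half_life):
    """Assumed scheme: exponential decay, the last (most recent) observation weighs most."""
    raw = [0.5 ** ((n - 1 - t) / half_life) for t in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def event_weights(factor):
    """Assumed scheme: weight proportional to the squared size of the factor move."""
    raw = [f * f for f in factor]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_beta(asset, factor, weights):
    """Weighted covariance over weighted variance, treating returns as zero-mean."""
    cov = sum(w * f * a for w, f, a in zip(weights, factor, asset))
    var = sum(w * f * f for w, f in zip(weights, factor))
    return cov / var


def both_betas(factor, asset, half_life=3.0):
    """Return (time-weighted beta, event-weighted beta) for one history."""
    tw = weighted_beta(asset, factor, time_weights(len(factor), half_life))
    ew = weighted_beta(asset, factor, event_weights(factor))
    return tw, ew


# Made-up data (assumption): beta is 0.5 in calm periods and 1.5 in extreme ones.
calm_f = [0.01, -0.02, 0.015, -0.01, 0.02, -0.015] * 8
extreme_f = [-0.12, -0.20, 0.10, -0.30, 0.15, -0.08]
calm_a = [0.5 * f for f in calm_f]
extreme_a = [1.5 * f for f in extreme_f]

# History A: the crisis is old, the recent past is calm.
tw_old, ew_old = both_betas(extreme_f + calm_f, extreme_a + calm_a)
# History B: same observations, but the crisis has just happened.
tw_new, ew_new = both_betas(calm_f + extreme_f, calm_a + extreme_a)

print(f"Crisis long ago: time-weighted {tw_old:.2f}, event-weighted {ew_old:.2f}")
print(f"Crisis recent:   time-weighted {tw_new:.2f}, event-weighted {ew_new:.2f}")
```

In this toy setup the event-weighted beta stays close to the extreme-period value regardless of when the crisis occurred, while the time-weighted beta reflects whatever happened most recently. The two agree only when the recent observations are themselves extreme.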
There is one more question remaining.
Question 4: What if we are trying to model something that has no precedent in the history of the model, e.g., a significant rise in CPI?
The way to approach this problem is to consider other impacts that may become highly correlated with the one we are trying to model if a major move in one of them occurs. For example, when we were designing our stress testing product in 2006-2007, one of the most interesting shocks that we wanted to test was a significant decline in housing prices. However, significant declines in housing had not yet occurred, and there were no broad home price declines in the history of the models. This led us to consider what else would happen if housing were to drop significantly. The first and obvious observation was that the financial sector was likely to suffer a great deal in a falling home price environment. Thus, we designed tests around major declines in S&P Financials and called them housing price stress tests. Subsequent events showed that our hypothesis was correct and portfolio reactions were predicted reasonably accurately.
Another useful, if more complicated, example is inflation. In the previous post, we described such an inflation proxy test, which simultaneously shocks gold up 40% and keeps real estate flat. We chose to stress gold up 40% and housing prices 0% (flat) for the following reasons. Gold is a well-known inflation hedge, because it really has no reason to move apart from inflation of the money supply. However, it has risen very significantly since 2001, even though there was no large consumer inflation for most of this period. How is that possible?
Our hypothesis was that the inflation really was there all along, but it was channeled into assets like real estate and financial assets. It was kept out of consumer products, because China was exporting deflation. China did this by keeping its currency artificially low vs. the U.S. Dollar. In other words, it forced its citizens to underconsume, since their currency was worth less than it would have been had market forces been allowed to play out. This, coupled with the fact that China is a key exporter of consumer products to the U.S., kept consumer prices artificially low. The significant inflation of the money supply, which had been going on since at least 2002, was channeled into assets like real estate and financial assets, as we said above.
What did China get out of this symbiotic relationship? It got to build a huge production base to position itself as the economic powerhouse by using U.S. consumption at the expense of the underconsumption of its citizens as the engine of growth. What did the U.S. get out of it? The U.S. got the ability to lower the Fed Funds rate without paying the price of the consumer price inflation and the resulting instability.
All this long reasoning explains why creating a stress test with only gold prices up 40% was not enough to model this scenario. It would have simply given us the asset inflation scenario that we observed in 2002-2008. That is why we explicitly added the cap of 0% on the real estate return. The flat real estate return expresses that we want to see the impact of a 40% rise in gold prices without the asset inflation component (since real estate was the major beneficiary of asset inflation); that is, we want to see consumer inflation.
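The effect of pinning real estate at 0% can be illustrated with a generic two-factor conditional expectation. This is a sketch under stated assumptions rather than a description of how FactSet computes multiple factor stress tests: the returns are made-up numbers in which gold and real estate rise together (the asset inflation pattern), the hypothetical asset is constructed to lose 0.3 for each unit of gold and gain 0.8 for each unit of real estate, and `two_factor_betas` is a hypothetical helper that solves the 2x2 regression by hand.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance of two equally long return series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def two_factor_betas(asset, f1, f2):
    """Solve the 2x2 system (factor covariance) * betas = (factor-asset covariances)."""
    v11, v22, v12 = cov(f1, f1), cov(f2, f2), cov(f1, f2)
    c1, c2 = cov(f1, asset), cov(f2, asset)
    det = v11 * v22 - v12 * v12
    b1 = (v22 * c1 - v12 * c2) / det
    b2 = (v11 * c2 - v12 * c1) / det
    return b1, b2


# Made-up returns (assumption): gold and real estate move together.
gold = [0.02, 0.05, -0.01, 0.04, 0.03, -0.02, 0.06, 0.01]
real_estate = [0.03, 0.04, 0.00, 0.05, 0.02, -0.01, 0.05, 0.02]
# Hypothetical asset: hurt by gold (inflation), helped by real estate.
asset = [-0.3 * g + 0.8 * r for g, r in zip(gold, real_estate)]

# Test 1: gold up 40% only. Real estate is left free to follow gold,
# so its historical co-movement leaks into the result.
gold_only_beta = cov(gold, asset) / cov(gold, gold)
gold_only = gold_only_beta * 0.40

# Test 2: gold up 40% AND real estate explicitly held at 0%.
b_gold, b_re = two_factor_betas(asset, gold, real_estate)
gold_up_re_flat = b_gold * 0.40 + b_re * 0.0

print(f"Gold +40% only:              {gold_only:.1%}")
print(f"Gold +40%, real estate flat: {gold_up_re_flat:.1%}")
```

In this toy data the two tests disagree even in sign: with gold alone, the asset appears to gain because real estate is implicitly assumed to rise alongside gold, while holding real estate flat isolates the exposure we actually want to examine.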
As you can see, the process of creating proxy tests is quite elaborate and requires much more effort in the design stage. We believe that we have to resort to this approach fairly infrequently, but when we do, it can be of great value.
In the next post, we will trace the origins of equilibrium thinking back through Harry Markowitz to the work of Leon Walras and will show why the Event Weighted method corrects for the problems inherent in assuming that financial markets run as “a constant and known statistical process” (as quoted in a Basel Committee on Banking Supervision report).
Make sure you see part two of this entry by subscribing to this blog.