Tuesday, February 24, 2009

How different are the risk model providers, part two: absolute risk

Continuing with the question from my February 4 entry "How different are the risk model providers?" I shift my definition of risk from Tracking Error (Active Risk in Aegis-speak) to Absolute Risk (Portfolio Risk in Aegis-speak). Another way to think of Absolute Risk is as the tracking error against a cash benchmark.
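To make the distinction concrete, here is a minimal sketch (with made-up monthly return series, not real fund data) showing that absolute risk is simply the tracking error computed against a constant cash benchmark:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical monthly returns -- illustrative numbers only
portfolio = rng.normal(0.008, 0.045, 60)
benchmark = rng.normal(0.007, 0.040, 60)
cash      = np.full(60, 0.003)           # a constant cash return

# Tracking error: volatility of active returns vs. the benchmark
tracking_error = np.std(portfolio - benchmark, ddof=1)

# Absolute risk: volatility of the portfolio itself...
absolute_risk = np.std(portfolio, ddof=1)

# ...which equals the tracking error against a constant cash benchmark,
# since subtracting a constant does not change the standard deviation.
te_vs_cash = np.std(portfolio - cash, ddof=1)

print(f"tracking error: {tracking_error:.4f}")
print(f"absolute risk:  {absolute_risk:.4f}")
print(f"TE vs. cash:    {te_vs_cash:.4f}")  # matches absolute risk
```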

While both forms of risk are important, I am sure many would have shared my initial guess that the conclusions of the analysis would have been virtually identical and, frankly, not worth another post. To my surprise, this wasn’t the case.

Using the same analytical construct, let’s review the results:
At first glance, Model Y clearly predicts much higher risk than the other models, a departure from what we observed when we considered tracking error. The results are consistent across all styles, and the magnitude of the difference is striking: you can see at a glance that Model Y is very different.

Model X predicts higher risk than Model Z across all style groups, though the magnitude varies, and the smallest differences are in small cap. Probably the most important question in this comparison is whether the difference is statistically significant:
Welch’s t-test confirms what we can see by eyeballing the averages. So, in total, this analysis suggests the three models are indeed significantly different predictors of absolute risk.

Coming soon: When we talk to clients about risk, we always focus on magnitude and direction. So, for my final perspective on this study, we will consider the three- and 12-month change in predicted risk. Has the predicted risk changed similarly across the three models?

To receive future posts by e-mail, subscribe to this blog.

Monday, February 23, 2009

Taking Risk Bonus: Hear our interview with "My Life as a Quant" author Emanuel Derman

The latest in our podcast series features Emanuel Derman, author of My Life As a Quant and formerly the head of the Quantitative Risk Strategies group at Goldman Sachs. In this interview, Derman discusses how financial models have been recently misused as well as his perspectives on how much to rely on models in your process.

Along with defending the use of risk models, Derman proposes possible solutions for restoring confidence in ratings agencies. He notes that he is often asked how much of a role financial engineering played in the crisis:
"It’s not black and white, but the Icelandic banks didn’t go under because their value-at-risk model was wrong or because of subprime CDOs, but they just rode a boom of borrowing short and leveraging themselves and lending lots of money, and one day they couldn’t borrow anymore."
To hear more from Emanuel Derman, listen to our entire interview on iTunes. You can also listen to the full audio online, or read a transcript of the interview. Part two of the interview will be released next week.

Thursday, February 19, 2009

The five best financial and risk management journals

When I returned Monday from a week of vacation, the pile of financial journals that had accumulated on my desk sparked an idea for this week's blog. At FactSet, we receive numerous questions from clients asking which financial publications are most relevant, since there are so many places to read about finance in general and about risk management and quantitative investing in particular.

Among the scores of websites, magazines, journals, and newsletters vying for our attention each day, these are my top five recommendations:
  1. CFA Institute Financial NewsBrief. I start each day reading this morning e-mail, which summarizes top news stories of the day on topics such as the global financial markets, business, economics, and politics. All summaries contain links to the full article. You can sign up to receive the daily NewsBrief here.

  2. Financial Analysts Journal. Published six times per year by the CFA Institute, the Journal is a collection of papers on general finance topics from some of the top names in the industry. The most recent issue includes articles from John Bogle, Harry Markowitz, and Emanuel Derman.

  3. CFA Digest. Another CFA Institute publication, this one comes out once each quarter and provides brief summaries of approximately 50 recent papers. After skimming the topics and authors on the contents page, I then read the summaries of the topics I find interesting. The summaries list the publication in which the complete paper was published. (The CFA Institute also puts out some other worthwhile publications such as the CFA Magazine.)

  4. Journal of Portfolio Management and 5. GARP Risk Review. For topics more directly related to portfolio, quantitative, and risk management, these are the two publications I read most frequently.

Honorable Mention: Other great sources of research on these topics can be found on the websites of FactSet’s risk model vendors. Northfield, Barra, APT, and Axioma all produce an extensive number of research papers on their websites.

Please share your thoughts on which publications you find most valuable in the comments below.


Wednesday, February 11, 2009

The behavioral psychology behind stress testing

If you were to predict people’s reactions to a stressful event, such as a fire, where would you start? Would you gauge their responses to a similar high-stress event like an earthquake, scale up their reactions to a mildly stressful event like a traffic jam, or monitor their emotions on a relaxing weekend day?

The Basel Committee on Banking Supervision recently released a very interesting document called “Principles for Sound Stress Testing Practices and Supervision.” The ideas expressed in it are broadly relevant and go far beyond banking industry-specific issues. Here are two examples:

“Most risk management models, including stress tests, use historical statistical relationships to assess risk. They assume that risk is driven by a known and constant statistical process i.e. they assume that historical relationships constitute a good basis for forecasting the development of future risks.”
"…given a long period of stability, backward-looking historical information indicated benign conditions so that these models did not pick up the possibility of severe shocks nor the build up of vulnerabilities within the system. Historical statistical relationships, such as correlations, proved to be unreliable once actual events started to unfold… Extreme reactions (by definition) occur rarely and may carry little weight in models that rely on historical data.”
These quotes are interesting to us for two reasons. First, they exhibit a very common misconception that assuming a constant process and measuring stable correlations is pretty much the only way to use historical data. Second, they indirectly validate the method for performing factor stress tests that FactSet introduced last spring, called Event Weighted testing.

As for the first point, the crux of the matter is in the assumption of a stable and linear system. This type of thinking can be roughly understood in the following way: Airline stocks depend on oil prices as related to their inputs. Oil stocks depend on oil prices as related to their outputs. Therefore, there should be some stable economic relationship between the airline companies and the oil companies that would warrant their stocks moving together in a stable pattern. This is the constancy assumption. Using correlation as the measure of that relationship, which implicitly assumes a normal distribution, completes the picture by introducing the assumption of linearity.

Stability and linearity of relationships are ideas that led to the much publicized talk of “decoupling” of international markets from the U.S. in the summer of 2008. It was prominent enough for Donald Kohn, the Vice Chairman of the Federal Reserve, to discuss at the International Research Forum on Monetary Policy as late as June 2008. The idea was that foreign markets – and especially emerging markets – had diversified their trade and could withstand a U.S. meltdown. Of course, the ensuing events quickly put that idea to rest, but it seemed plausible based on the assumption of stability and linearity.

What it failed to take into account is that during extreme events the forces in play are not really economic, but rather trading-induced and psychological. The financial system is not some random process; it is driven by real human beings who frequently make very similar judgments (call it herd behavior if you like, though I don’t care for the term) based on their emotions, especially in times of crisis.

There is no permanent linear relationship of the kind described by: when stock A goes down 4%, stock B usually goes down by about 2%; therefore, when stock A goes down 40%, stock B will go down by about 20%. In fact, stock B is likely to follow stock A all the way down 40%, because in a crisis the economic relationships are not nearly as important. This discontinuity produces the so-called “rise in correlations” that the document cites.
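A small simulation makes the point. The return series and regime parameters below are invented for illustration; the idea is that a beta fitted to calm days badly understates comovement in a crisis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily returns for two stocks (numbers are illustrative,
# not real market data).
# Calm regime: B moves about half as much as A, plus noise.
a_calm = rng.normal(0.0, 0.01, 250)
b_calm = 0.5 * a_calm + rng.normal(0.0, 0.005, 250)

# Crisis regime: everything sells off together, nearly one-for-one.
a_crisis = rng.normal(-0.05, 0.03, 25)
b_crisis = 1.0 * a_crisis + rng.normal(0.0, 0.01, 25)

# Beta estimated on calm data alone (slope of a simple regression)
beta_calm = np.polyfit(a_calm, b_calm, 1)[0]

# Linear extrapolation: if A falls 40%, the calm-regime beta predicts
# B falls only about half as much.
print(f"calm-regime beta: {beta_calm:.2f}")
print(f"linear prediction for A -40%: {beta_calm * -0.40:.1%}")

# But the realized crisis relationship is much closer to one-for-one:
beta_crisis = np.polyfit(a_crisis, b_crisis, 1)[0]
print(f"crisis-regime beta: {beta_crisis:.2f}")
```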

One could say hindsight is 20-20. But the idea of decoupling was not difficult to disprove even based on the historical data available in August of 2008.

Consider the following table:

Correlations of S&P 500 with Emerging Markets as of 8/31/2008
The “Calm” row contains correlations of the S&P 500 with various emerging markets calculated using one year of data prior to 8/31/2008. However, if we used only the 25 most extreme days during the 8/31/2007 to 8/31/2008 period, the picture would be completely different. The decoupling and the suspect diversification benefits vanish even without considering the events of Fall 2008. This is because extreme environments, though triggered by different events (in fact, an infinity of possible trigger events), produce similar market dynamics. Those dynamics can be modeled by focusing on the historical extreme data points, rather than on all historical data without regard for the period from which it was drawn.

This is precisely the idea behind the Event Weighted stress testing that FactSet introduced in presentations last spring. In short, stress tests should not use data from benign conditions at all. Depending on the stress test, the historical data points used to model the correlations must be drawn from the most similar extreme events. Risk in financial markets is very much about people’s behavior, and you don’t learn about people’s behavior during a fire by observing them on a calm picnic day.
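The selection step at the heart of the idea can be sketched in a few lines. The return series below are simulated, and the code only illustrates the concept of conditioning on extreme days; it is not FactSet's actual Event Weighted methodology:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical daily returns: a benchmark (say, the S&P 500) and an
# emerging-market index. The regime structure is an illustrative
# assumption: on ~10% of days a shared crisis shock drives both series.
n = 252
crisis = rng.random(n) < 0.1                # ~10% of days are "extreme"
common_shock = rng.normal(0, 0.03, n)       # shared crisis factor

sp500 = np.where(crisis, common_shock, rng.normal(0, 0.008, n))
em    = np.where(crisis, common_shock + rng.normal(0, 0.005, n),
                 rng.normal(0, 0.010, n))   # independent on calm days

# Full-sample correlation, dominated by calm days
full_corr = np.corrcoef(sp500, em)[0, 1]

# Event Weighted idea: keep only the 25 most extreme benchmark days
extreme_idx = np.argsort(np.abs(sp500))[-25:]
stress_corr = np.corrcoef(sp500[extreme_idx], em[extreme_idx])[0, 1]

print(f"full-sample correlation:          {full_corr:.2f}")
print(f"25-most-extreme-days correlation: {stress_corr:.2f}")
```

The diversification benefit visible in the full sample evaporates once the estimate is conditioned on the extreme days, which is the pattern the table above shows with real data.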


Monday, February 9, 2009

Taking Risk Bonus: Hear our interview with R-Squared's Jason MacQueen

Jason MacQueen, one of the founders of R-Squared, a custom risk model provider, sat down with Taking Risk's Sean Carr to discuss a new short-term global model available in Portfolio Analysis. MacQueen provided insights into the logic behind short-term models, the difference between a global and a domestic perspective, and the R-Squared perspective on dummy variables.

To hear the interview in its entirety, listen on iTunes or stream the full audio online (the file may take a few moments to download). You can also read a full transcript of the interview.

Wednesday, February 4, 2009

Really, how different are the various risk model providers?

In my countless client visits over the last 10 years, the most popular risk question has easily been some form of, “Really, how different are the various risk model providers?” I have always responded by jumping into the differences between the models. Usually, this starts with named factor vs. principal component vs. hybrid models. From there, I delve into how the factors are defined. But I don’t know that I ever really answered the question.

So, I want to share a pretty straightforward analysis to address the risk practitioner’s question, “How different are APT, Barra, and Northfield?” Over the next few weeks, I will share additional variations of our core question.

But first, let’s begin with a comparison of the tracking error.

Using FactSet’s LionShares database of mutual fund holdings, I’ll focus on 300 U.S. equity funds: the 30 fund constituents of each of Lipper’s mutual fund indices for the nine standard style boxes plus equity income. I chose these because I wanted real portfolios rather than an arbitrary selection. I focus on tracking error (Active Risk in the language of Aegis) as of 12/31/2008, using the Russell style index as the benchmark for each style box and the S&P 500 as the benchmark for Equity Income.

I compare the tracking error of APT U.S. Long, Barra USE3L, and Northfield U.S. Fundamental. First, I compare the average tracking error and then judge whether the difference is statistically significant by whether the Welch’s t-test statistic is greater than two. While it is reasonable to debate whether this is the perfect theoretical test, I believe it best represents the perspective of an investment manager or plan sponsor.
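For readers who want to replicate the significance test, here is a minimal sketch using SciPy. The tracking-error numbers are invented for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical predicted tracking errors (%) for the same 30 funds
# from two risk models; the numbers are made up for illustration.
rng = np.random.default_rng(2)
model_x = rng.normal(6.0, 1.5, 30)
model_z = rng.normal(4.5, 1.2, 30)

# Welch's t-test: equal_var=False drops the equal-variance assumption
# of the standard two-sample t-test.
t_stat, p_value = stats.ttest_ind(model_x, model_z, equal_var=False)

# The post's rule of thumb: treat |t| > 2 as significant.
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("significant" if abs(t_stat) > 2 else "not significant")
```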

For our purposes, we will refer to the three models as X, Y, and Z. My task isn’t to suggest which model is “best.” Frankly, my analysis doesn’t offer genuine insight on that question, and naming the model providers might mislead readers into drawing conclusions about which is “best.”

Let’s review the results:


Looking at Table 1, Models X and Y appear similar. The overall difference is quite small, and neither model is consistently higher. If anything, Model X suggests slightly higher tracking error for Large Cap and Mid Cap strategies, while Model Y predicts slightly higher tracking error for Small Cap strategies. But are the differences significant?

On the other hand, Model Z appears to predict far lower tracking error than either of the other models. This is true across all fund categories.

Turning our attention to the statistical significance:

The test of statistical significance largely bolsters the intuitive conclusions drawn from reviewing the averages. The tentative observation that Models X and Y, while similar in aggregate, might differ for large, mid, and small caps lacks support. We also see that the differences between Model Z and either X or Y are virtually always significant.

So, how different are APT, Barra, and Northfield? This analysis suggests two of the three models are similar and the third predicts significantly lower tracking error. Though the analysis covers only U.S. equity, the conclusions are quite consistent across size and style.

Coming soon: In my next entry, I will remove the benchmark to determine how (or if) our conclusions change when we change from Tracking Error/Active Risk to Absolute Risk/Portfolio Risk.
