Wednesday, February 4, 2009

Really, how different are the various risk model providers?

In my countless client visits over the last 10 years, the most popular risk question has easily been some form of, “Really, how different are the various risk model providers?” I have always responded by jumping into the differences between the models. Usually, this starts with named-factor vs. principal-component vs. hybrid models. From there, I delve into how the factors are defined. But, I don’t know that I ever really answered the question.

So, I want to share a pretty straightforward analysis to address the risk practitioner’s question, “How different are APT, Barra, and Northfield?” Over the next few weeks, I will share additional variations of our core question.

But first, let’s begin with a comparison of the tracking error.

Using FactSet’s LionShares database of mutual fund holdings, I focus on 300 U.S. equity funds: the 30 fund constituents of each of Lipper’s mutual fund indices for the nine standard style boxes plus Equity Income. I chose these because I wanted real portfolios rather than arbitrarily selected ones. I measure tracking error (Active Risk in the language of Aegis) as of 12/31/2008, using the corresponding Russell style index as the benchmark for each style box and the S&P 500 as the benchmark for Equity Income.
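
To make the benchmark assignment concrete, here is a rough sketch of how the ten categories might map to benchmarks. The specific Russell style indices per box are my assumptions for illustration, not a statement of the exact indices used in the analysis.

```python
# Illustrative benchmark assignment for the ten fund categories.
# The specific Russell style indices are assumed for illustration only;
# the post states only "the Russell style index for each style box"
# and the S&P 500 for Equity Income.
BENCHMARK_BY_CATEGORY = {
    "Large Value":   "Russell 1000 Value",
    "Large Core":    "Russell 1000",
    "Large Growth":  "Russell 1000 Growth",
    "Mid Value":     "Russell Midcap Value",
    "Mid Core":      "Russell Midcap",
    "Mid Growth":    "Russell Midcap Growth",
    "Small Value":   "Russell 2000 Value",
    "Small Core":    "Russell 2000",
    "Small Growth":  "Russell 2000 Growth",
    "Equity Income": "S&P 500",
}
```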

I compare the tracking error of APT U.S. Long, Barra USE3L, and Northfield U.S. Fundamental. First, I compare the average tracking errors and then judge whether the difference is statistically significant by whether the Welch’s t-statistic exceeds two in absolute value. While it is reasonable to debate whether this is the perfect theoretical test, I believe it best represents the perspective of an investment manager or plan sponsor.
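
For readers who want to reproduce the significance test outside of FactSet, here is a minimal sketch in Python. The tracking-error inputs are randomly generated placeholders, and the two-sided |t| > 2 rule of thumb simply mirrors the criterion described above.

```python
# A minimal sketch of the Welch's t-test comparison, assuming two arrays of
# ex-ante tracking errors (one value per fund) produced by two risk models.
import numpy as np
from scipy import stats

def compare_models(te_model_a, te_model_b):
    """Compare mean tracking error of two models across the same set of funds."""
    te_a = np.asarray(te_model_a, dtype=float)
    te_b = np.asarray(te_model_b, dtype=float)

    # Welch's t-test: does not assume equal variances across the two samples.
    t_stat, p_value = stats.ttest_ind(te_a, te_b, equal_var=False)

    return {
        "mean_a": te_a.mean(),
        "mean_b": te_b.mean(),
        "t_stat": t_stat,
        # The rule of thumb used in the post: |t| > 2 counts as significant.
        "significant": abs(t_stat) > 2,
    }

# Hypothetical example: tracking errors (in %) for the 30 funds of one style box.
rng = np.random.default_rng(0)
te_x = rng.normal(5.0, 1.5, 30)   # "Model X"
te_y = rng.normal(5.2, 1.5, 30)   # "Model Y"
print(compare_models(te_x, te_y))
```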

For our purposes, we will refer to the three models as X, Y, and Z. My task isn’t to suggest which model is “best.” Frankly, my analysis doesn’t offer genuine insight on that question, and including the names of the model providers might mislead the reader into drawing conclusions about which model is “best.”

Let’s review the results:


Looking at Table 1, Models X and Y appear similar. The overall difference is quite small, and we don’t see either model consistently larger. In fact, if anything, we would say that Model X suggests slightly higher tracking error for Large Cap and Mid Cap strategies while Model Y predicts slightly higher tracking error for Small Cap strategies. But, are the differences significant?

Model Z, on the other hand, suggests far lower tracking error than either of the other models, and this is true across all fund categories.

Turning our attention to the statistical significance:

The test of statistical significance mostly bolsters the intuitive conclusions drawn from reviewing the averages. The tentative observation that Models X and Y might be similar in aggregate but different for large, mid, and small caps finds no support. We also see that the differences between Model Z and either X or Y are virtually always significant.

So, how different are APT, Barra, and Northfield? This analysis suggests two of the three models are similar, while the third predicts significantly lower tracking error. Though this analysis covers only U.S. equity, the conclusions are quite consistent across size and style.

Coming soon: In my next entry, I will remove the benchmark to see how (or whether) our conclusions change when we move from Tracking Error/Active Risk to Absolute Risk/Portfolio Risk.

To receive future posts by e-mail, subscribe to this blog.

4 comments:

  1. You refer here to the concept of a 'best' risk model. Can you expand a bit on this: what does this mean to you? Surely the determination of 'best' is in the interpretation and use of the model, rather than in the model's construction?

  2. A better test, of course, would be to plot predicted tracking error vs. realized tracking error. Simply add a fourth column in Table 1 which is realized and look at statistical significance against realized, not against each other.
    Steve Greiner; Allegiant Asset Mgmnt

  3. Steve, thanks for your comment. I agree comparing ex-post vs. ex-ante is a more appropriate test to determine which model provider is “best” for you.

    Bill Latimer, former FactSet Quant Specialist, did an interesting presentation on this subject at the FactSet Symposium in 2003 in San Diego. Also, in snooping around for research on this topic, Sean Carr found an interesting study from Jean Paul van Straalan at ABN Amro (http://www.financialplaces.com/actemem/ms/suptech/managing-ex-ante-tracking-error). Let me just elaborate a bit more on the comparison.

    Because the three models have long-term horizons, you should compare the ex-ante risk against the twelve-month ex-post risk. So, looking at calendar 2008, for example, ex-ante risk as of 12/31/2007 compares to the ex-post tracking error computed from monthly returns over 12/31/2007 to 12/31/2008. This comparison can be easily done in Portfolio Dashboard.

    But, you can’t use the actual portfolio returns to calculate ex-post risk because you would incorrectly account for trades that aren’t known to the ex-ante risk calculation. So, you need to freeze the holdings as of 12/31/2007 and then look at the subsequent twelve monthly returns as if there were no trading in the portfolio. This is easily done in Portfolio Analysis by setting the holdings mode to “Beginning of Period” and the calculation frequency to “Single.” We need to make the same adjustment for the benchmark; otherwise, the Russell mid-year rebalance would significantly affect our ex-post result when it isn’t even known to our ex-ante risk calculation.

    The 24 monthly returns (twelve for the frozen portfolio and twelve for the frozen benchmark) can be easily generated in a Performance report in Portfolio Analysis and instantly archived to SPAR for computation of the returns-based, ex-post 2008 tracking error.

    These steps would allow you to generate the four numbers (three ex-ante tracking errors plus one ex-post tracking error) needed to make the comparison for a single portfolio. A small sketch of the final returns-based calculation appears after the comments.

  4. Allow me to jump in here on a few issues with trying to perform the above analysis:

    Point one is about the nature of the two tracking error measurements: the first is a forecast tracking error, an average expected deviation given all possible futures; the second is a realised tracking error, a measure of the deviation realised over one single path of that future. Over a long run these should come together, but basing your judgement on 2008 alone, a truly exceptional year for risk, would be flawed, I believe.

    My second point relates to the nature of the data used in the calculation of the measurements. Risk models are traditionally constructed using monthly data, and if the comparison were to use any other frequency then you would not be comparing like with like, since variance changes with frequency (red/blue noise, as variance increases/decreases with period).

    On the nature of what is best, would I prefer a model that has consistently provided me with accurate long-term estimates over a number of portfolios for the past decade, or one that was much more responsive to the recent volatility witnessed in the market and that I could use to navigate the short-term horizon? Or should I have both?

    I know that each of the risk providers considered here publishes research to support its models and is generally very happy to share it. Find them at www.apt.com, www.mscibarra.com and www.northinfo.com.

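
The returns-based, ex-post calculation described in comment 3 may be easier to follow in code. Below is a minimal sketch, assuming the twelve monthly returns for the frozen portfolio and frozen benchmark are already in hand; the return figures are hypothetical placeholders, not FactSet output.

```python
# A minimal sketch of the returns-based ex-post tracking error from comment 3:
# twelve monthly returns on frozen 12/31/2007 holdings for both the portfolio
# and the benchmark, annualized as the standard deviation of the monthly
# active returns times sqrt(12). All return figures below are hypothetical.
import numpy as np

def ex_post_tracking_error(portfolio_returns, benchmark_returns):
    """Annualized ex-post tracking error from monthly return series (in %)."""
    active = np.asarray(portfolio_returns, dtype=float) - np.asarray(benchmark_returns, dtype=float)
    # Sample standard deviation of monthly active returns, annualized.
    return active.std(ddof=1) * np.sqrt(12)

# Hypothetical twelve monthly returns for calendar 2008 (frozen holdings, %).
port_2008  = [-6.1, -2.8, -0.5, 4.9, 1.6, -8.2, -0.7, 1.1, -9.5, -17.2, -7.9, 1.3]
bench_2008 = [-6.0, -3.2, -0.4, 4.8, 1.3, -8.4, -0.8, 1.4, -9.0, -16.8, -7.2, 0.8]

te_2008 = ex_post_tracking_error(port_2008, bench_2008)
print(f"Ex-post tracking error for 2008: {te_2008:.2f}%")
# This single ex-post figure would then be compared against the three
# ex-ante tracking errors (one per risk model) as of 12/31/2007.
```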