A fantastic article in today's Wall Street Journal examines the problems with aggregated data, a topic we have discussed previously on this blog.
The article discusses Simpson’s Paradox, a statistical phenomenon in which a trend that holds within every subgroup can reverse in the aggregate because the subgroups differ in size. It uses the example of unemployment. The current unemployment rate is 10.2%, not as bad as the 10.8% peak of the 1982 recession. However, according to Princeton University economics professor Henry Farber, compared with a similarly educated worker in 1982, “the worker today has higher unemployment at every education level.” It turns out that the average unemployment rate is lower now only because today’s workers are, on average, more educated.
College graduates, who have the lowest unemployment rate, are now more than a third of the work force, compared with roughly 25% in 1983, says the Labor Department. Meanwhile, the share of high-school dropouts has shrunk to roughly 10% of the work force, from nearly 20% in 1983.
It could easily be argued that this recession is worse than the one in 1982, since both college graduates (4.9% now versus 3.6% then) and high school dropouts (14.9% versus 13.6%) are having more trouble finding jobs.
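The arithmetic behind this reversal is easy to sketch. In the snippet below, the dropout and college-graduate unemployment rates come from the article; the two middle-group rates and all of the work-force shares are illustrative assumptions, chosen only to show how a shift toward more-educated workers can pull the overall average down even while every group's rate rises.

```python
# Simpson's Paradox sketch: every education group's unemployment rate is
# higher "today" than in 1982, yet the overall (share-weighted) rate is
# lower because the work force has shifted toward better-educated groups.
# Dropout and college-grad rates are from the article; the middle-group
# rates and all work-force shares are illustrative assumptions.

groups = ["dropout", "high school", "some college", "college grad"]

rates_1982  = [13.6, 11.0, 9.0, 3.6]    # unemployment rate, percent
shares_1982 = [0.20, 0.40, 0.15, 0.25]  # share of work force (sums to 1)

rates_today  = [14.9, 11.5, 9.5, 4.9]   # higher in EVERY group
shares_today = [0.10, 0.30, 0.27, 0.33] # fewer dropouts, more graduates

def overall(rates, shares):
    """Work-force-wide unemployment rate: a share-weighted average."""
    return sum(r * s for r, s in zip(rates, shares))

print(f"1982 overall:  {overall(rates_1982, shares_1982):.2f}%")   # 9.37%
print(f"Today overall: {overall(rates_today, shares_today):.2f}%") # 9.12%
```

Even though each group is worse off, the weighted average improves, which is exactly the trap the article warns about.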
Aggregate data is tricky and often obscures the truth behind the numbers. People find statistics persuasive, and many groups cite statistics to “prove” their position. This article points out that it is entirely possible for the statistics they cite to prove exactly the opposite position.
In the investment industry, junk statistics can sometimes crop up in backtesting. It’s important to know how the backtest was conducted: whether the data set has survivor bias, how many parameters were fit to the data, and what kind of testing for robustness was done. All too often, a product’s behavior going forward does not match what was expected from the backtest. Part of the appeal of our Systematic Relative Strength family of products, I think, is that the statistical testing is well done. In fact, in the not-too-distant future we plan to put out a white paper on our proprietary testing methods and how they differ from what is typically seen. If you are interested in receiving this white paper and are not already on our distribution list, please sign up here.