Monday, June 18, 2012

Having Too Much Data

Nassim Taleb, in his new book (via Ritholtz):

The more frequently you look at data, the more noise you are disproportionally likely to get (rather than the valuable part called the signal); hence the higher the noise to signal ratio. And there is a confusion, that is not psychological at all, but inherent in the data itself. Say you look at information on a yearly basis, for stock prices or the fertilizer sales of your father-in-law’s factory, or inflation numbers in Vladivostock. Assume further that for what you are observing, at the yearly frequency the ratio of signal to noise is about one to one (say half noise, half signal) —it means that about half of changes are real improvements or degradations, the other half comes from randomness. This ratio is what you get from yearly observations. But if you look at the very same data on a daily basis, the composition would change to 95% noise, 5% signal. And if you observe data on an hourly basis, as people immersed in the news and markets price variations do, the split becomes 99.5% noise to .5% signal. That is two hundred times more noise than signal —which is why anyone who listens to news (except when very, very significant events take place) is one step below sucker.
There is a biological story with information. I have been repeating that in a natural environment, a stressor is information. So too much information would be too much stress, exceeding the threshold of antifragility.

This is how I imagine the NSA spying program must work. Think of all the useless emails, phone calls and text messages you send in a day. Now multiply that by 300 million (or several billion). Now think about trying to find the needle in that haystack: an actual, real threat. I wonder how many brown people have become targets of surveillance for sending an email saying they'd like to blow something up, maybe only figuratively.

I also see this problem in the lean events and metric boards we use at work, especially the metric boards. They are meant to make people accountable, but many of the stats being tracked are pretty opaque and poorly defined. We also have an issues list, and you sure as hell don't want to end up on it. Whatever the issue is, it had better be something that can be solved in a day. Otherwise you will be questioned every day about where you stand until it is solved, or until you can make a credible case that it has been.