Friday, September 30, 2011

Using Math To Ferret Out Fraud

The Guardian, via nc links:
Government figures are subjected to various audits already, of course, but alongside checking that things marry up with one another, forensic statisticians also have ways of spotting suspicious patterns in the raw numbers, and thus estimating the chances that figures from a set of accounts have been tampered with. One of the cleverest tools is something called Benford's law.
Imagine you have data on, say, the population of every world nation. Now, take only the "leading digit" from each number: the first number in the number, if you like. For the UK population, which was 61,838,154 in 2009, that leading digit would be "six". Andorra's was 85,168, so that's "eight". And so on.
If you take all those leading digits, from all the countries, then overall, you might naively expect to see the same number of ones, fours, nines, and so on. But in fact, for naturally occurring data, you get more ones than twos, more twos than threes, and so on, all the way down to nine. This is Benford's law: the distribution of leading digits follows a logarithmic distribution, so you get a "one" most commonly, appearing as first digit around 30% of the time, and a nine as first digit only 5% of the time.
Next time you're waiting for a bus, you can think about why this happens (bear in mind what leading digits do when quantities repeatedly double, perhaps) but reality agrees with this theory pretty neatly, and if you go to the website testingbenfordslaw.com you'll see the proportions of each leading digit from lots of real-world datasets, graphed alongside what Benford's law predicts they should be, with data from Twitter users' follower counts to the number of books in different libraries across the US.
It doesn't work perfectly: it only works when you're examining groups of numbers that span several orders of magnitude, for example.
This subject interests me, as I've noticed that I fall into a pattern whenever I try to come up with random numbers for something. Nate Silver had a post on the subject when he was accusing Strategic Vision of falsifying poll data. Using math to catch criminals is as good a use of it as using it for gambling.
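
For anyone who wants to try the bus-stop exercise from the quoted piece, here is a rough sketch in Python. It assumes the usual statement of Benford's law, that leading digit d shows up with probability log10(1 + 1/d), and it uses repeated doublings (the first 1,000 powers of two) as made-up test data, following the article's hint. The function names are my own and not from any library. Note that the formula puts a leading nine at about 4.6%, which the article rounds to 5%.

```python
import math
from collections import Counter


def benford_expected(d):
    """Expected share of leading digit d (1 through 9) under Benford's law."""
    return math.log10(1 + 1 / d)


def leading_digit(n):
    """First digit of a nonzero integer, e.g. 61838154 -> 6."""
    return int(str(abs(n))[0])


def observed_shares(values):
    """Fraction of values starting with each digit 1 through 9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Illustrative data only (an assumption, not from the article's datasets):
# a quantity that doubles 1,000 times.
doublings = [2 ** k for k in range(1, 1001)]
shares = observed_shares(doublings)

if __name__ == "__main__":
    print("digit  expected  observed")
    for d in range(1, 10):
        print(f"{d:>5}  {benford_expected(d):8.3f}  {shares[d]:8.3f}")
```

Swapping `doublings` for a real list of figures (populations, invoice amounts, poll results) gives the same side-by-side comparison. A big gap between the two columns is a reason to look closer, not proof of tampering, and as the article says, the comparison only means much when the numbers span several orders of magnitude.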
