Most English-speakers are familiar with Murphy’s Law, which states:
Anything that can go wrong, will go wrong.
Realistically, this should read:
Anything that can go wrong, will go wrong… in the worst possible way at the most inconvenient time.
There are a lot of opportunities for Murphy’s Law to prove itself in the world of data analysis. Here are the ones I run into most often in my traipsing through the data.
Any data you previously checked and assume is now fine has been corrupted.
The fact that the word “assume” is in there should be warning enough (as should the word “should”). But we humans like to think that once something is fixed, it will remain that way. Not so, thanks to Murphy. Unless you fixed the source, fixed the data, QA’ed the data, and ran in production for months without issue, the data is likely still wrong. And even if you did manage all that, the data will still find a way of being wrong … for instance, when the sender of the data changes the layout.
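One cheap defense against the changed-layout surprise is to compare the incoming header row against the layout you expect before trusting anything else. This is only a minimal sketch: the column names and the `check_layout` helper are made up for illustration, and it assumes a comma-delimited file with a header row.

```python
import csv
import io

# Hypothetical expected layout; these column names are illustrative only.
EXPECTED_COLUMNS = ["customer_id", "order_date", "amount"]

def check_layout(csv_text, expected=EXPECTED_COLUMNS):
    """Compare a delimited file's header row against the expected layout."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty list if the file has no rows at all
    return {
        "missing": [c for c in expected if c not in header],
        "unexpected": [c for c in header if c not in expected],
        # Same columns, different order: harmless by name, fatal by position.
        "reordered": header != list(expected) and sorted(header) == sorted(expected),
    }

# The sender quietly moved a column and added a new one.
sample = "customer_id,amount,order_date,promo_code\n1,9.99,2024-01-01,X\n"
problems = check_layout(sample)
print(problems)
```

It will not stop Murphy, but it does move the discovery from the middle of the analysis to the first thirty seconds.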
Any analysis required in a hurry will uncover horrendous data quality problems.
This is as certain as the Sun rising in the East. The boss or client calls and says they need something by end of day. You scramble to code up whatever-it-is-that-cannot-wait only to find… the data is missing! Or it doesn’t match what it was last week at all. Or there’s no link between the data elements you need to join (see also: You can’t get there from here). This has got to be one of the better ways to do data quality testing – apart from the randomness and stress, of course.
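If you would rather find the missing values and the broken joins before the deadline does, a quick pre-flight check helps. What follows is a rough sketch under simple assumptions: rows are already loaded as dictionaries, and the `preflight` function, the field names, and the sample rows are all invented for the example.

```python
def preflight(left_rows, right_rows, key, required):
    """Count blank required fields and left-side keys with no match on the right."""
    right_keys = {row.get(key) for row in right_rows}
    # A field counts as blank if it is absent, None, or an empty string.
    blanks = {
        field: sum(1 for row in left_rows if row.get(field) in (None, ""))
        for field in required
    }
    # Rows with no key at all are skipped here so the result stays sortable.
    left_keys = {row[key] for row in left_rows if row.get(key) is not None}
    unmatched = sorted(left_keys - right_keys)
    return {"rows": len(left_rows), "blanks": blanks, "unmatched_keys": unmatched}

# Invented sample data: one blank amount, one order with no matching customer.
orders = [
    {"customer_id": "1", "amount": "9.99"},
    {"customer_id": "2", "amount": ""},
    {"customer_id": "3", "amount": "4.50"},
]
customers = [{"customer_id": "1"}, {"customer_id": "2"}]

report = preflight(orders, customers, key="customer_id", required=["amount"])
print(report)
```

Running something like this on a schedule, rather than under deadline, takes out most of the randomness and at least some of the stress.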
Any project in which you have thought up and planned for every possible contingency will find new ways to break.
There’s all kinds of talk about the 7 Ps. I’m here to tell you that it’s a bunch of bunk when it comes to data. Here is my version for analytics: Prudent Prior Planning Presents Plural Potential Problems. If databases had vocal cords, I’m convinced that you could hear them laughing a mile away. No matter how many ways you think you’ve got it covered, there will be some new and interesting FUBAR that prevents you from doing the job efficiently.
Please comment to share your favorite (or least favorite, as the case may be) Murphy’s Law moments in analysis.