The Danger of Data
“There are Lies, Damned Lies and Statistics,” is a remark commonly attributed to Benjamin Disraeli, twice British Prime Minister in the late nineteenth century. Recently John Thornhill wrote in the Financial Times of ‘Lessons from History on the dangers of blind trust in data’. In our era of measurement we would do well to heed the warnings. As a child I was taught that ‘98% of people die in bed; therefore bed is a dangerous place’. Later years taught me that it could be – but not because of dying.
There are two aspects of data that create problems. First, its reliability. There has always been fake news, more accurately described as lies. Some of that has been hyperbole, exaggeration to illustrate a point. Authors and others have had – or, perhaps, assumed – a license to exaggerate to avoid language so qualified that it becomes meaningless. Official reports make the point. Their wording is often turgid and has little or no impact.
The official objective, of course, is to impart the truth. Pontius Pilate is reported to have questioned ‘what is truth?’ It is surely a relevant question. Literal truth can be extremely misleading. Indeed, the whole adversarial system of law is demonstrably a game of cunning for at least some of the time. How a question is asked can determine the relevance and therefore the validity of the answer. Defining truth as ‘intention’ is a way round this problem but it raises other issues of the motivation behind the intention. Motives are seldom simple.
At some stage in the development of law, lies became so prevalent that courts adopted an oath for hearing witness statements. I cannot find a date when this first happened; no doubt it was a long time ago. It was an admission that much of ‘everyday communication’ was untrue. Lying under oath then became perjury, regarded as a very serious crime. However, it has become tacitly accepted that someone may lie in court about their own guilt to avoid punishment. This is illegal, but relatively few perjury prosecutions are brought even though the practice is prevalent.
The second problem with data is its interpretation, especially when it is used out of context. Was the airworthiness of the two Boeing planes that crashed recently attested before or after the computer fix was applied to correct an acknowledged mistake? Press statements that I have seen have not been clear about this. It seems hardly possible that a company like Boeing – or any company for that matter – would intentionally mislead on such an important issue. The context of a statement is clearly key to its validity.
Our greatest concern about data is its availability to business, government and hackers. The search for a new version of privacy will not slow but the goal is still far off. Even so, the race to prove theory by measurement will proceed apace. Now is the time to reflect on what it all means. Data can be so easily rigged. As we know, with data it’s not the answers that matter but the questions. What question would be asked, for example, if there were a second referendum on the UK leaving (or not) the EU? We saw what politicians did with the answers to the last referendum. Who knows what they will do to the question(s) in the next?
Measurement is good when it is done well. When the tailor under-measures your collar size and you find that you cannot fasten the top button to allow you to wear a tie, you have to buy a new shirt. Lesson: always make the collar bigger than the measurement you are told by your tailor. To extend the sartorial advice – never buy shoes in the morning. Your feet swell after lunch. Buy shoes only after 3pm. ‘Morning measuring sells second pair of shoes after lunch.’
Measurement of public opinion and attitudes to sensitive subjects is especially difficult. The question you ask may well not be the one your respondents answer. Asking ladies in the 1950s why they preferred stockings with a seam down the back to ones without a seam produced all sorts of answers. None of them said ‘because I don’t want to be thought a slut who doesn’t wear stockings’. That, however, was the correct answer. More than 60 years later the issues are somewhat different. But the danger of misleading data is as bad if not worse.
Key to the proper use of data is understanding the relationship between different bits of it. Excellent specialists will prescribe advanced and sophisticated drugs for complex medical situations identified by ingenious diagnostic technology. The absence of a General Practitioner keeping an eye on the overall strategy can easily lead to conflicting remedies being administered, sometimes with disastrous consequences. The ‘what if’ question is not asked frequently enough.
Data, however cleverly construed, cannot yet equal the creative mind of a human. That is not to say it never will. Nor is it to deny that on balance computers make fewer errors than humans. The difference is that when they do go wrong they ‘think’ (if we can attribute such a concept to a computer) they are doing what they were told. And we know how often people doing what they are told go wrong. Frequently.
That is where our data danger can lead if we are not careful.
Handle data with care.