My partner works in the actuarial field, and I usually switch off when he starts to talk about statistics and data, but over the past couple of weeks I’ve found myself drawn into learning far more than I ever thought I would.
One of the defining stories of the last few months in politics has been the failure of conventional polling methods to predict both the UK vote to leave the European Union and the election of Donald Trump in the United States. Studying politics led me to believe that trying to control human beings is like herding cats, so I was interested to read a couple of articles that offer reasonable explanations for how these perceived upsets could have been predicted – if we look at the use of big data.
An article in The Guardian by William Davies, a lecturer at Goldsmiths, argues that the general public are now suspicious of statistics, and that using them only serves to close down discussion rather than prove a point. Davies takes us through a fascinating account of the use of statistics to shape our modern nation states since the 1700s.
“Casting an eye over national populations, states became focused upon a range of quantities: births, deaths, baptisms, marriages, harvests, imports, exports, price fluctuations. Things that would previously have been registered locally and variously at parish level became aggregated at a national level.”
These quantities have been used ever since to predict the behaviour of groups and make decisions on the management of countries. However, as we become more and more connected to the internet through things like our communication devices, shopping, and travel habits, information is built up that reveals new quantities that have nothing to do with those traditionally imposed upon us from above. Would I vote in the same way as another white, middle-aged, middle-class woman? Not necessarily, but the polls assume I would.
This explains in part why the pollsters were so wrong about the votes of the last few months. An article published in Das Magazin [in German] explained the use of big data by two campaigns – one working to convince UK voters to leave the EU, and the other supporting Donald Trump’s bid to become president. It was later translated into English and published by Motherboard (Vice Media).
In the 1980s psychologists developed a model known as OCEAN, which describes most people’s personalities as a combination of five traits – Openness, Conscientiousness, Extroversion, Agreeableness and Neuroticism – and holds that a person’s behaviour can be predicted reasonably well if these are known. The problem was the lack of data, because it took many hours to collect enough information from individuals.
This problem was solved in 2007, by a team of psychologists at Cambridge University in the UK. They developed a platform for Facebook which allowed people to fill out quizzes about themselves. You know what’s coming – instead of the expected few mates of the students, several million people completed the quizzes. Realising the potential of this, the team refined the model.
“In 2012, Kosinski proved that on the basis of an average of 68 Facebook “likes” by a user, it was possible to predict their skin color (with 95 percent accuracy), their sexual orientation (88 percent accuracy), and their affiliation to the Democratic or Republican party (85 percent). But it didn’t stop there. Intelligence, religious affiliation, as well as alcohol, cigarette and drug use, could all be determined. From the data it was even possible to deduce whether someone’s parents were divorced.”
Using this new way of analysing potential voters, the campaigns were able to target people with incredible precision. Ads were used not just to convince citizens to vote for Trump, but also to keep potential Clinton supporters away from the voting booths. Up to 175,000 variations of the same ads were deployed in order to find the versions that worked.
I’m still figuring out how I feel about this. On the one hand, it looks unsettlingly like murky politics driven by companies with unclear ownership. On the other, there is incredible potential for big data to be used in a positive way. We could use information about populations to build smarter cities and create cheaper and more environmentally friendly ways of living. I’m optimistic about the potential for good, but more than just a little apprehensive about the ways in which it appears to have been used thus far.