We exist in a world where ‘big data’ drives many decisions about what happens to us, what we are offered, and how we interact. Advanced business analytics is welcomed by some, who enjoy everything from customised medicine to seamless online experiences, whilst others are afraid of what organisations know about them, and what they can do with this knowledge.
Does ‘big data’ mean ‘bad behaviour’?
So what is all the fuss about? What are people scared of with data analytics – and should they be? If companies have access to all of our data, does that mean that they can do things which we would consider unacceptable breaches of our privacy or individual identity, or make bad decisions – that is, demonstrate ‘bad behaviour’?
What is ‘big data’ anyway?
Data analytics is seen in business as a massive driver of competitive advantage. The more you know about someone, the better you can both predict their behaviour and offer them products and services that suit them best.
Data collection is now deeply and continuously integrated into normal life. Each time you interact with technology or an organisation, each time you give your details, you are providing data. Your phone sends out continuous streams of data (especially about your location), and every search or action on the web provides a data source.
The data has therefore shifted from passive and descriptive, to more active and predictive. The data from almost every action and interaction can be collected and provides so much more ‘information’ that can be correlated in the attempt to understand us as a community, but also as individuals.
What is collected?
From a behavioural standpoint, data can be viewed as ‘communication’ between individuals and the organisations that capture and apply analytics to decode it. This can be viewed at three levels:
• Syntax: the data points, metadata, text strings and interactions that occur. This is the pure information.
• Semantics: the meaning that is drawn from the data. This happens through the amalgamation and correlation of data, the interrogation of the text for keywords and sentiment, the connection of event data and the context data. The quality of the algorithms employed determines the insights (semantics) that can be drawn from the data.
• Pragmatics: the outcomes of receiving the insights. This is simply what we do with the outputs of the data analysis. The actions it drives and the behaviours that it modifies.
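These three levels can be made concrete with a toy sketch in Python. Everything in it – the events, the inference rule and the resulting action – is invented purely for illustration; real analytics pipelines are far more sophisticated.

```python
# Syntax: the raw data points as they are captured.
events = [
    {"user": "u1", "action": "search", "term": "running shoes"},
    {"user": "u1", "action": "view",   "term": "trail shoes"},
    {"user": "u1", "action": "view",   "term": "marathon training"},
]

# Semantics: meaning drawn by amalgamating and correlating the raw points.
def infer_interest(events):
    terms = " ".join(e["term"] for e in events)
    return "runner" if "shoes" in terms or "marathon" in terms else "unknown"

# Pragmatics: what is done with the insight -- the action it drives.
def choose_action(interest):
    return {"runner": "show running-gear offer"}.get(interest, "show default page")

interest = infer_interest(events)
print(interest, "->", choose_action(interest))  # runner -> show running-gear offer
```

The first block is cheap; the second and third functions are where the commercial power (and the potential for misuse) sits.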
Most people ‘sign up’ to giving away access to their data, but this is only the start of the communication process – it is the semantics and pragmatics where the value in the data actually lies.
What are people scared of?
In general, the fears that people have can be broken down into control, significance and secrets. We are essentially scared that, by allowing our data to be used to make decisions about us based upon populations and probability, we lose control over what experiences we can have. When computers can take actions based upon data analysis, there is a strong feeling of loss of control.
Further, as we are reduced to a set of ‘data points’, we lose the significance of being an individual person, and are simply reduced to a ‘number’. Although things can be more highly customised to enhance our experience, we are no longer a ‘person’ but rather a set of data. This provides a deep fear of manipulation and again a lack of control of one’s own personal experience.
The other thing that people fear is that ‘big brother’ is watching everything, and either knows, or can work out, all of their secrets. This idea of being ‘fully exposed’, and then even exploited, is deeply intrusive.
In the end, these fears tap into the whole ‘big brother’ versus individualism view of society. Each society needs to have the discussion about what is an appropriate use of data – where privacy starts and finishes, and what is in the ‘public good’.
What has changed?
In a way, ‘big data’ is really a misnomer. There has always been the ability to analyse complex data sets, and even develop machine learning algorithms to refine and enhance this analysis. It is not the ‘data’ that worries people, it is more the semantics and pragmatics – the insights that are drawn from its analysis, and what is then done with them.
Three things have happened in the past few years which have drastically changed data analytics. The fundamental algorithms of analysis have barely changed in 30 years; what has changed is the following:
• The speed of analysis has drastically improved. What would have been an overnight ‘batch run’ can, in some instances, almost be performed in real time. Systems can be built to respond to immediate actions, experiences and preferences.
• The storage capability of data has grown exponentially. The simple ability to hold more data allows more data points to be used in any analysis.
• The types of data have changed. Instead of simple descriptive data, we are now seeing a deep integration of data extraction into our lives. Facebook, Twitter, your Fitbit, your mobile phone’s location data, your search history, and your preferences are all new data points which are captured in real time to provide significantly more data points about your experiences. The digitisation of old data also provides historical information (including text) from which to draw meaning, which adds longitudinal data to what is currently available.
Should we be ‘scared’?
We have no need to fear ‘data’. It is what we ask of the data, and how we use it, that provides the potential for adding benefit or causing harm. In the end, these are human decisions. A computer only does the calculations we ask it to perform. It learns to manipulate information from previously analysed data sets only through the algorithms we give it.
‘Big data’ by itself has no value, and can often act to confuse and distract. We have to choose what to focus upon within the masses of information available, and to select what will help us make decisions or add value to experiences. If we do this well it offers massive advantage. It also gives us the ability to act in high-quality ways – or exhibit bad behaviour with the knowledge we have.
It is always the quality of the question we ask of a data set which determines the value of ‘big data’. If we ask poor (or inappropriate) questions, then we get poor outcomes (the old adage ‘garbage in, garbage out’ is always true in analytics). We then need quality analytical processes to extract the answer from the available data – or to determine that the answer cannot be ‘stretched’ from the data that is available.
We should therefore concern ourselves more with the human element of data analytics, rather than the data itself. It is here where the ‘bad behaviour’ is more likely to occur.
How should big data be used?
The human factor becomes important in the selection of the question that we ask of the data, the interpretation of the meanings that we draw from the insights, and what we do with the data.
If we distance ourselves from moral, ethical or purpose-led thinking, then we have the potential to misuse data analytics and cause great harm. However, if we allow ourselves to always filter analytics through these lenses, then the outcomes can serve the business, the individual and society.
It is great to hear, for example, that Google uses an ethical board to help guide such judgements.
Consider the following circumstances:
• Analytics show that people of a certain racial background are more likely to be involved in automobile accidents. Should we charge that race more for insurance?
• Analytics determine that a key predictor of a behaviour is sexual preference. Should you now start asking your customers for this information?
• The analytics suggest that there is a probability someone may be involved in a criminal action. Should you use that information to pre-emptively arrest them?
In each of these circumstances, a moral, ethical or ‘purpose’-based consideration is required. Are we asking what is acceptable in society, what will be good for the individual, who could be harmed, and what the potential unintended consequences of our choices are?
It has been shown that we also get into massive problems when we simply let computer systems process data and take actions without human oversight. If we allow computers to act on the analytics alone, unintended outcomes, compounding errors and critical failures become possible (and, in fact, likely).
We have seen share market systems where a computer program ‘predicted’ a downward market shift from a series of indicators, so the system initiated a sale. The sale did not take into account a change in the context of a lead indicator (an innovation that made the indicator less relevant in the model). That sale set off another process which drove further sales, and a negative, recursive process began.
Given the speed of such systems, in a matter of minutes significant actions with unwarranted or unintended consequences emerge. This has happened several times in the stock market. New systems now only run for a few minutes in such processes before they are triggered to pause and ‘get a human’ involved.
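The cascade described above can be caricatured in a few lines of Python. The starting price, the naive prediction rule and the trade limit are all invented; the point is only the shape of the loop – each automated sale moves the market in a way that re-triggers the sell rule, until a circuit breaker pauses and ‘gets a human’.

```python
# A toy model of a runaway sell cascade with a circuit breaker.
# All numbers and the trigger rule are illustrative, not real market logic.

def run_market(price, drop_signal=0.03, impact=0.05, max_auto_trades=5):
    """Each automated sale depresses the price, which re-triggers the rule."""
    trades = 0
    while trades < max_auto_trades:
        predicted_drop = impact   # naive model: the last sale's impact predicts the next drop
        if predicted_drop < drop_signal:
            break                 # indicator no longer fires; the loop ends on its own
        price *= (1 - impact)     # the sale itself moves the market down
        trades += 1
    human_needed = trades == max_auto_trades  # circuit breaker: pause, get a human
    return round(price, 2), trades, human_needed

print(run_market(100.0))  # (77.38, 5, True): five automated sales, then a human is called
```

With a higher signal threshold the rule never fires and no cascade starts; the danger is precisely the regime where each automated action keeps the trigger condition true.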
Innovation can also make models obsolete. Although this can be managed with machine learning, an innovative break can require a new algorithm to be developed. Using the old algorithm can lead to poor analysis if the relevance of data sources or data points changes. Therefore every process needs to be appropriately monitored so that what it produces is known to be still relevant (and not simply assumed to be so).
We also need humans to provide curiosity. Asking ‘what else?’ about processes, inputs, outputs, uses and relevance is a human trait which allows innovation in systems and processes.
Further, using outliers to ask questions of the algorithm or data can lead to disruptive change opportunities. Often the real insight is in the outlier data that doesn’t fit the ‘usual’ model, but instead provides insight into something unexpected, and perhaps valuable, that is not served by the current process.
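A minimal sketch of this idea in Python: instead of discarding points that don’t fit, surface them for a human to interrogate. The purchase figures and the two-standard-deviation cut-off are invented for illustration.

```python
# Flag unusual values for human review rather than silently dropping them.
from statistics import mean, stdev

def flag_outliers(values, n_sigma=2.0):
    """Return the values lying more than n_sigma standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > n_sigma * s]

purchases = [12, 14, 11, 13, 12, 15, 14, 95]  # one customer behaves 'unusually'
print(flag_outliers(purchases))  # [95] -- a prompt for curiosity, not deletion
```

The flagged value might be an error, or it might be the unexpected, valuable behaviour the current model was never built to serve.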
In the end, the machines only crunch the data in the way that people ask them to. The human factor, and in particular the quality of the thinking and evaluation of the people creating, communicating and acting on the analytics is the most important thing.
What can we do?
What is true is that most people do not even read the terms and conditions of what they sign up to. They blindly authorise programs to collect, transmit and own data which is then utilised in analytics – for good or for bad. They worry about losing their individual identity, yet they are the chief architects of allowing organisations to gather and analyse their information. Taking control of these choices is an important step for individuals to regain their personal sense of control in the process.
We can also demand that the information that is collected is used in responsible ways to enhance the experiences of individuals, communities and the organisations processing the data. Ensuring that quality thinking happens throughout the process – in establishing the question, in designing the algorithms, in reviewing the semantics and in acting on the pragmatics – is important. These are all critical ‘check points’ on the journey to bad (or reasonable) behaviour. When the data is used in moral, ethical and purpose-based ways, it should add significant value and enhance our experiences. Being aware of how data is being collected and used, and having conversations about what each culture deems ‘reasonable’, will allow appropriate understanding and guidelines to help the ‘human interface’ with data behave well with the resource they command.
‘Big data’ conjures up images of The Terminator and Minority Report. Data by itself is not the cause of ‘bad behaviour’ – but it opens up the possibility of it. Unless we encourage data to be used appropriately and responsibly, and unless we take responsibility for what we allow to be collected and how we define (as a society) its use, bad behaviour based on our data may very well occur.