Rhema Vaithianathan: Data - the heavy lifting can be done blind

April 7, 2017

Rightly, the idea of our personal data being collected or passed on, without our permission, has a tendency to spark alarm in New Zealand.

But the good news is that, when it comes to data analytics, we can achieve an astonishing amount without the need for personalised data.

Between the extremes of personalised data and population-level data is the useful (and often misunderstood) category of de-identified or "confidentialised" data.

It tells the same story about the experiences of an individual, across aspects like health, employment, education and justice. However, crucial pieces of personal data are absent, such as names, addresses, birth dates and other details that would identify individuals.
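
To make the idea concrete, here is a minimal sketch of what de-identifying a single record might look like. It is an illustration only, not a description of how Statistics New Zealand or any other agency does it: the field names, the `deidentify` function and the keyed-hash `link_key` are all assumptions made for this example. The link key lets records about the same person be joined across health, employment, education and justice without the name ever appearing.

```python
import hashlib
import hmac

# Hypothetical field names, chosen to match the details the article lists.
IDENTIFYING_FIELDS = ("name", "address", "birth_date")


def deidentify(record, secret_key):
    """Return a copy of `record` without identifying fields.

    A keyed hash of the identifying details is kept as `link_key`, so that
    two records about the same person can still be matched. Only the trusted
    third party holding `secret_key` could recompute it.
    """
    identity = "|".join(str(record.get(field, "")) for field in IDENTIFYING_FIELDS)
    link_key = hmac.new(secret_key, identity.encode("utf-8"), hashlib.sha256).hexdigest()

    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    cleaned["link_key"] = link_key
    return cleaned


# Made-up example records for the same fictional person in two datasets.
key = b"held-only-by-the-trusted-third-party"
health = {"name": "A. Person", "address": "1 Example St", "birth_date": "1980-01-01",
          "hospital_visits": 3}
jobs = {"name": "A. Person", "address": "1 Example St", "birth_date": "1980-01-01",
        "employed": True}

health_clean = deidentify(health, key)
jobs_clean = deidentify(jobs, key)
```

In practice, removing these fields is only a first step. Combinations of the remaining details can still point to an individual, which is why access and reporting are also controlled.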

For researchers and policy makers, large sets of this de-identified data are gold. They give us everything we need and nothing we don't.

And when it comes to statistical research, we can find out an enormous amount by interrogating millions of lines of rich but nameless records.

We can measure how effective a particular programme is by comparing what happens to people who receive a service with what happens to those who don't. In this way we can find out what works.
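
In its simplest form, that comparison is a difference between two rates. The sketch below assumes hypothetical field names (`received_service`, `good_outcome`) and uses made-up records. It is deliberately naive: a real evaluation would also have to account for the ways in which people who receive a service differ from those who don't.

```python
def outcome_rate(records, received_service):
    """Share of people in one group who had a good outcome."""
    group = [r for r in records if r["received_service"] == received_service]
    if not group:
        raise ValueError("no records in this group")
    return sum(1 for r in group if r["good_outcome"]) / len(group)


def naive_effect(records):
    """Difference in good-outcome rates: service group minus comparison group."""
    return outcome_rate(records, True) - outcome_rate(records, False)


# Made-up, nameless records: four people received the service, four did not.
records = [
    {"received_service": True, "good_outcome": True},
    {"received_service": True, "good_outcome": True},
    {"received_service": True, "good_outcome": True},
    {"received_service": True, "good_outcome": False},
    {"received_service": False, "good_outcome": True},
    {"received_service": False, "good_outcome": False},
    {"received_service": False, "good_outcome": False},
    {"received_service": False, "good_outcome": False},
]

effect = naive_effect(records)  # 0.75 - 0.25 = 0.5
```

Nothing in this calculation needs a name or an address; it only needs to know who received the service and what happened next.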

We can also identify factors that predispose people to certain adverse events, by working backwards from those adverse events to the precursors.

De-identified data can also be used to create predictive risk models – algorithms capable of predicting the risk that a particular individual will experience a certain event. We train those algorithms on vast collections of historical and de-identified data until they are as accurate as possible.
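
As a rough illustration of what "training" means here, the sketch below fits a very small logistic regression, one common kind of risk model, to made-up historical records. It is not the model used in any real programme: the single input (a count of prior events), the function names and the training settings are all assumptions chosen to keep the example short.

```python
import math


def sigmoid(z):
    """Map any number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_risk_model(prior_events, had_event, learning_rate=0.1, epochs=2000):
    """Fit a one-input logistic regression by batch gradient descent.

    prior_events: list of counts, one per (nameless) historical record
    had_event:    list of 0/1 flags, whether the adverse event occurred
    Returns (weight, bias).
    """
    weight, bias = 0.0, 0.0
    n = len(prior_events)
    for _ in range(epochs):
        grad_w, grad_b = 0.0, 0.0
        for x, y in zip(prior_events, had_event):
            error = sigmoid(weight * x + bias) - y
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


def predict_risk(model, prior_event_count):
    """Estimated probability (0 to 1) that the adverse event occurs."""
    weight, bias = model
    return sigmoid(weight * prior_event_count + bias)


# Made-up history: people with more prior events were more likely to have the event.
history_x = [0, 1, 2, 3, 4, 5]
history_y = [0, 0, 0, 1, 1, 1]

model = train_risk_model(history_x, history_y)
low_risk = predict_risk(model, 0)
high_risk = predict_risk(model, 5)
```

The model learns only from patterns in the historical records. A real risk model would use many more inputs and would be checked against data it had not seen during training.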

All of this is possible using a large, detailed data set that never reveals names or identifying details of the individuals in it.

Data sets of this kind are a win/win. Researchers and policy makers have access to incredibly rich data; but individuals do not have to give up their privacy in order to benefit.

The availability of de-identified data relies on a trusted third party meticulously transforming personalised information into de-identified data.

In New Zealand, Statistics New Zealand has been doing this effectively for decades. The census, Household Labour Force Survey and New Zealand Health Survey are just a few examples where extremely personal information is collected and then transformed into robust, secure and de-identified research datasets.

It speaks volumes for the trusted position of Statistics NZ that New Zealanders are happy to hand over extremely sensitive information about aspects like religion, relationship, health and addiction status, knowing that it will be kept secure while at the same time being used to create vital new knowledge and insights.

High response rates for significant but voluntary surveys like the Household Labour Force Survey and New Zealand Health Survey (both around the 80 per cent mark) reflect the fact that there have been no notable breaches of de-identification protocol in New Zealand.

Having a government department that takes responsibility for de-identifying data, and for controlling access to, and use of, that data has been critical. Preventing re-identification of individuals is of paramount importance, hence the strict controls on how we as researchers and policy makers can use the data and report what we find.

The findings of research done using de-identified data can be incredibly useful. For example, if a programme is found to be extremely effective, it may receive additional funding. And vice versa.

Of course, in some cases accessing personal, identifiable data can make it possible to apply research findings even more effectively. For example, access to personal data could allow a government agency to locate individuals to offer them additional support or preventative programmes. But first, a convincing case has to be presented, and this is where de-identified data comes into its own.

The willingness of New Zealanders to allow third parties access to personally identified data cannot be assumed. It is a question for New Zealand as a whole, and New Zealanders as individuals with different levels of comfort around the concept. Without a broader social licence, this will not come to pass.

Fortunately, while we are really just beginning to explore how and when personal data use is appropriate, researchers and policy makers can achieve a huge amount of social good without ever knowing your name.

Professor Rhema Vaithianathan is a health economist, co-director of the Centre for Social Data Analytics at Auckland University of Technology and a member of the New Zealand Data Futures Partnership Working Group.

First published in the Dominion Post on Tuesday 28 March 2017: http://www.stuff.co.nz/dominion-post/comment/90877788/Rhema-Vaithianathan-Data-the-heavy-lifting-can-be-done-blind