De-identification: the de-vil is in the de-tail

By Timothy Pilgrim

Thursday November 3, 2016

As you read this there is a bill before the federal parliament to create a new criminal offence of re-identifying de-identified Australian government data. If you’re not sure what that means, or how it affects you, let me explain.

Right now Australian Government agencies are in something of a data space race, seeking to unlock the substantial research and innovation potential of their information holdings. It is a pursuit with great economic and community value — as contemporary data analytics and computing power are constantly unlocking ways to use data that were unfathomable a generation ago.

Imagine, for example, that you visit your doctor and undertake a test or scan for a potentially serious condition. Then imagine if your doctor had access to software that allowed your results to be compared with millions of similar tests worldwide. Then imagine if this could then be linked to treatment records of millions of similar cases — yielding treatment options and their statistically verifiable success rates. The wisdom of your doctor just became the wisdom of thousands, and your prospects have improved.


But equally, consider the personal information that lies at the foundation of this potential — millions of intimate records of patient data; powerful and lifesaving in the right hands but destructive and ruinous in the wrong.

This is the conundrum of our data innovation potential. As both the Australian Information Commissioner and Australian Privacy Commissioner, I understand the great value of information, and that this value is often best realised when it can be shared, used and built upon. I also know that if that data includes personal information, Australians retain rights as to how that data is used or reused, and I strongly uphold those rights.

If only there were a way to separate the ‘personal’ from the ‘information’ when our personal information is used for research purposes. Well, in theory there is, and it’s called de-identification.

De-identification is a smart and contemporary response to the privacy challenges of data — using the same technology that allows data analytics to strip data sets of their personal identification potential, while retaining research utility.

When done correctly, it has the potential to solve the privacy dimensions of data analytics.  But, if ever there was a topic apt for the expression “the devil is in the detail”, this is it.

Expertise is essential to prevent re-identification

De-identification is simple in concept but expert in execution.  It is far more complicated than removing names or postcodes, and the risks of getting it wrong can be substantial and public. Famous examples of ‘re-identification’ by hackers and privacy advocates point to the risks of poorly executed de-identification strategies.
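To illustrate why removing names is not enough, here is a minimal sketch in Python. It is not a method described by my office or in the bill; it uses one commonly cited measure, k-anonymity, and entirely made-up records. The field names, the choice of quasi-identifiers and the helper functions are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical, made-up records with names already removed.
records = [
    {"birth_year": 1975, "sex": "F", "postcode": "2600", "diagnosis": "asthma"},
    {"birth_year": 1975, "sex": "F", "postcode": "2600", "diagnosis": "diabetes"},
    {"birth_year": 1982, "sex": "M", "postcode": "2601", "diagnosis": "asthma"},
    {"birth_year": 1990, "sex": "F", "postcode": "2602", "diagnosis": "migraine"},
]

# Assumed quasi-identifiers: fields that are not names but can still
# single a person out when combined.
QUASI_IDENTIFIERS = ("birth_year", "sex", "postcode")


def group_sizes(rows, keys):
    """Count how many rows share each combination of values for `keys`."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def smallest_group_size(rows, keys):
    """Return k, the size of the smallest group (0 for an empty data set)."""
    sizes = group_sizes(rows, keys)
    return min(sizes.values()) if sizes else 0


def unique_rows(rows, keys):
    """Return the rows whose combination of `keys` appears exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


# k with all quasi-identifiers, and k after also dropping the postcode.
k_all = smallest_group_size(records, QUASI_IDENTIFIERS)
k_no_postcode = smallest_group_size(records, ("birth_year", "sex"))
still_unique = unique_rows(records, ("birth_year", "sex"))

print(k_all, k_no_postcode, len(still_unique))
```

In this toy data set, two people remain unique on birth year and sex alone, even with names and postcodes gone. Real de-identification work weighs many more factors than this single measure, which is why expertise matters.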

Fortunately, the track record of expertly de-identified data is strong, and my office is in favour of using de-identification as a privacy tool, provided that checks and balances, audit and review, and quality control are built in.

The Australian government is providing further assurance by legislating to make it an offence to re-identify data that has been de-identified. Re-identification is a significant challenge when de-identification has been done correctly, but it is prudent to proscribe it, given the ever-increasing pace of computing technology.

However, the better approach is to avoid the prospect of re-identification in the first place, by developing the most robust frameworks and processes for de-identification possible. And in this aspect Australia can take a leadership role, just as we are doing with the proscription of re-identification.

To achieve that leadership, government, business and academia will need to work together to explore, test and refine, using our combined skills and expertise. Because if there’s one thing we can all understand about de-identification, it’s that getting the detail right is critical.

To this end, my office is hosting a de-identification workshop in Canberra this month, bringing together leading experts from Australia and overseas, to get to the forefront of this important data and privacy issue.

These experts include Dr Khaled El Emam, who has been instrumental in developing frameworks to successfully free up the use of health data for research purposes in the US and Canada. I expect the workshop will be a robust and up-to-date exploration of an issue that is of great importance to Australia’s push into the data age.

While de-identification is not the only approach available to manage the privacy dimensions of data, it is one with powerful potential, provided we get it right. And while the devil may be in the detail, so is the solution.

Timothy Pilgrim is the Australian Information Commissioner and Australian Privacy Commissioner.

He will host the Data Sharing and Interoperability workshop at GovInnovate in Canberra on November 16. For details visit
