The Commonwealth has released a new guide to managing the risk of private information about individuals being nefariously obtained from anonymised datasets that are shared and analysed for secondary public benefit.
The new data de-identification decision-making framework recognises the risk of re-identification can’t be totally eliminated, but sets out the best practice for how it can be reduced. Data61’s website states “the need for well-thought-out de-identification has never been more acute” and notes there is “a growing body of examples” where it hasn’t been done properly.
Federal information commissioner Timothy Pilgrim explains: “It is an exercise in risk management, rather than an exact science, and it’s important that we strike the right balance between maintaining useful data and making sure it’s safe.”
But what is the right balance? If we assume that “making sure it’s safe” should always come before any secondary use of public data, then the key question becomes: how safe is safe enough?
The guide is based on the “functional de-identification” approach, while also acknowledging the popular but less comprehensive Five Safes system, and rests on five key principles:
- It is impossible to decide whether data is safe to share/release by looking at the data alone.
- But it is still essential to look at the data.
- De-identification is a process to produce safe data but it only makes sense if safe useful data is produced.
- Zero risk is not a realistic possibility in producing useful data.
- The measures put in place to manage risk should be proportional to the risk and its likely impact.
The new book warns it does not eliminate the need for expert advice and that “complex judgement calls” will still be required. It points out that data sharing involves ethical, as well as legal, concerns, and adds:
“Data subjects may not want data about them being re-used in general, by specific third parties, or for particular purposes.”
The most practical way to enable secondary statistical use of personal data that has been through a de-identification process is by sharing it only with certain trusted users, under certain conditions and perhaps in a secure location.
“Open data environments are really only appropriate to data that is either not personal in the first place or has been through an extremely robust data-focussed de-identification process that ensures with a very high degree of confidence that no individual could be re-identified and no disclosure could happen under any circumstances,” the framework cautions.
The authors explain de-identification as a highly contextual and ongoing process of risk management – not one that leads to the dataset being permanently safe – using a building analogy: “We do not expect reinforced concrete to be indestructible, but we do expect that a structure made out of it will have a negligible risk of collapsing.”
Leading IT lawyer Peter Leonard, who retired this year and chairs the Privacy and Communications Committee of the NSW Law Society as well as the Data Access, Use and Privacy Workstream for the Internet of Things Alliance Australia, thinks the book is well adapted for Australia’s regulatory environment.
He points out similar issues are covered in a white paper recently published by an Australian Computer Society team, led by NSW chief data scientist Ian Oppermann, who heads up the state government’s Data Analytics Centre. Leonard also sits on the DAC advisory board.
“It’s a little bit broader in its coverage than the privacy commissioner’s paper, in that it’s not looking only at the technical issues around making de-identification work, but the broader issues around the management of data sharing and the assessment of the degree of risk in using different data types,” he told The Mandarin. “That’s an important piece of work.”
As well as improved understanding of the technical side of de-identification, he believes data custodians generally need a better understanding of privacy law. More public engagement is also required, he argues, to determine what the Australian public’s risk appetite really is.
The debate around big data and privacy has become highly polarised. Leonard doesn’t agree with privacy advocates who present a black-and-white situation where data sharing inherently puts privacy at unacceptable risk.
He points out that “if you judge everything in terms of release into the public arena, where this sort of full array of re-identification attacks can be waged against the data” then you wouldn’t do any sharing, because “almost all data will be re-identifiable — if not today, in the not-too-distant future.”
But as the guide points out, most of what researchers and organisations want to do can be done within controlled environments where the risk of a re-identification attack is very low.
It makes sense to weigh the specific risks in each case against the social benefits. As the privacy commissioner, Pilgrim has recognised a need to build “social licence” for data sharing in the past, and all governments make some efforts in this regard. The new guide is a reasonable step toward improving the situation, but more needs to be done.
In Leonard’s view, public and private sector leaders have generally “squibbed” on engaging in genuine public debate about what is acceptable use of data sharing and analytics.
“I think industry has ducked for cover on it because it’s too hard and their view has been that each time they try and discuss this issue, they end up bringing more trouble on themselves,” he said.
“And I think government has been even worse, in that it’s had data breaches and responded with re-identification offence legislation and so on, rather than really engaging with the community in a constructive way saying: ‘These are the sorts of things we want to do. This is how we think we can control those uses in appropriate ways. Are you comfortable with that?’”
But Australia’s political leaders don’t need to look too far for inspiration, either. In his view, New Zealand’s Data Futures Partnership was one of the best examples in the world of public engagement on these issues, using a series of discussion papers in plain English and public forums around the country.
“That’s exactly what I think we need to do in Australia,” Leonard contends. “We’ve been hampered by lack of coordination between the states and the feds on the issue; the feds came to the issue later than New South Wales, Victoria, and South Australia, which have looked at data sharing legislation in a lot more detail. The feds are only now starting to look at it.”
He sees a serious lack of nuance in discussions of data sharing and the risks involved; one side says it is unsafe, the other insists it is safe, which leads most people to simply pick a side.
“And that kind of stifles the debate,” Leonard said. “The privacy advocates point to the re-identification risk of releasing data to the public, and the government says, ‘Well, no, we’re not proposing to release it to the public, we’re doing it in a controlled environment.’
“And sectors of the population then say, ‘Well, how can we trust you? Look what happened with the Census, look at what happened with robodebt and so on.’
“I just think that the time has come for government to be more open about what it wishes to do, and to recognise that it actually needs to lead a discussion with its citizens about what it wants to do, and why they should not be as concerned, because government will lift its game in ensuring that these uses of data sharing are properly evaluated and socially beneficial.”
The public service echo-chamber
In the experienced IT lawyer’s view, “the public is legitimately and rightly worried about erosion of privacy” that has been occurring through technological change more generally, and public servants haven’t been taking their responsibilities in this regard seriously enough.
“From my experience in dealing with both government agencies and private sector organisations, there is still, to my view, a much too narrow … understanding of how privacy law operates, and how to properly evaluate the privacy impact of what they are doing,” he said.
“Secondly, there is still much too much bravado in both government and business around their own management of information in order to comply with privacy laws.
“I’ve practiced in this area for over 30 years and I’ve seen just about everything go wrong in some way or in another and the one thing that has really come home to me is that, unless people get out of their offices and think more broadly around how other sectors of the community see issues and try and take a more community-oriented view of how some sectors of the community view uses of information, they sort of end up in echo chambers that reinforce to them their views that they know best and they can manage the privacy issues.”
Canberra especially suffers from this, Leonard believes.
“I’d call it an echo-chamber reinforcement of over-confidence, within both government agencies and businesses, that they can best decide how the community should think about privacy issues, and that they can manage it without properly explaining to the community why they should be trusted to manage it, doing everything that they should do to manage privacy properly.”
Leonard argues the legislative frameworks don’t support a more nuanced debate either.
“The problem is that the privacy legislation stands alone and without consideration of social benefits and the weighing of social benefit against privacy impact,” he explained.
“For example, the data sharing legislation in New South Wales has not been very effective, because it basically lifted the restrictions under individual statutes that previously restricted, for example, Education sharing information with Health, or sharing information with Family and Community Services, but it said all of that remains subject to the Privacy Act.
“And the Privacy Act then, in effect, imposes an absolute restriction on sharing of sensitive personal information held by state government agencies without the consent of the affected individuals.
“So there is no mechanism for weighing up privacy impact against social benefit and asking, ‘Well, if there is some marginal effect on some individual’s privacy, is the social benefit from that activity sufficiently outweighing that marginal impact to be of benefit to the community?’”
As an example, he suggests there might be a public benefit in giving GPs information on the outcomes of medical procedures performed by different specialists, to inform their decisions about where to refer patients in the future. Arguably this is personal data about those specialists, in which case each would have to give their separate consent.
Leonard says there is no legal mechanism “for weighing up the social benefit, or a government benefit, or anything else” against the rights of the specialists, in this example, to control information about their professional performance, regardless of what social benefits the hypothetical scheme would deliver.
This has fed the over-confidence in de-identification, which has been presented, including by many government agencies, as a simple process that turns personally identifiable information into safely anonymised statistical data no longer subject to privacy law. This is a major misunderstanding, as the new guidebook emphasises.
The Australian Bureau of Statistics and the Institute of Public Administration Australia (ACT Branch) are jointly hosting a half-day conference on public sector data integration in Canberra on November 3.