The Department of Health breached the Privacy Act by publishing open data, according to a ruling made last week by former information and privacy commissioner Timothy Pilgrim, a day before his retirement.
A cynic might think the publication of today’s statement from the Office of the Australian Information Commissioner just before the Easter long weekend is an attempt to avoid publicity.
The case concluded with Pilgrim accepting an enforceable undertaking from the department, in which it acknowledges that the information it released “potentially enabled persons who were highly skilled and committed to identify some [Medicare] providers” — but does not accept patients have also been identified from the same data.
Cryptography researchers Vanessa Teague, Chris Culnane and Ben Rubinstein reported last December that they had successfully re-identified at least some patients from the anonymised dataset. “This was disclosed to the Department in December 2016,” they wrote at the time.
Pilgrim concluded this re-identification of patients — which involved matching data points about individual people with other easily accessible information from the internet — was not such a big deal. As such, the Privacy Act breach relates only to the less worrisome report from the same group in September 2016 that MBS service provider ID numbers could be re-identified.
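That linking technique can be illustrated with a small, entirely invented example: join the “de-identified” records with auxiliary information on shared quasi-identifiers such as birth year and postcode. All names, attributes and the `link` helper below are hypothetical, not drawn from the actual MBS dataset:

```python
# Hypothetical sketch of a linkage (re-identification) attack: join a
# "de-identified" dataset with public auxiliary information on shared
# quasi-identifiers. All records here are invented for illustration.

# De-identified records: no names, but quasi-identifiers remain.
deidentified = [
    {"record_id": "r1", "birth_year": 1948, "postcode": "3000", "procedure": "P123"},
    {"record_id": "r2", "birth_year": 1990, "postcode": "3052", "procedure": "P456"},
]

# Auxiliary data an attacker might find online (news stories, social media).
auxiliary = [
    {"name": "Alice Example", "birth_year": 1948, "postcode": "3000"},
]

def link(deid, aux):
    """Match records whose quasi-identifiers coincide; a unique match
    suggests (but does not prove) re-identification."""
    matches = []
    for a in aux:
        candidates = [d for d in deid
                      if d["birth_year"] == a["birth_year"]
                      and d["postcode"] == a["postcode"]]
        if len(candidates) == 1:
            matches.append((a["name"], candidates[0]["record_id"]))
    return matches

print(link(deidentified, auxiliary))  # -> [('Alice Example', 'r1')]
```

The danger is that each extra quasi-identifier (age, location, a rare procedure) shrinks the candidate set, so seemingly anonymous records can narrow to one person.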
Teague disagrees, but told journalists today that she applauded the decision regarding the provider numbers in any case.
Why Pilgrim thinks patient data was mostly safe
Pilgrim decided patients were not “reasonably identifiable” in the data for the purposes of the Privacy Act, a decision he explained in an investigation report also published by the OAIC.
“Drs Culnane, Rubinstein and Teague reported having a high level of confidence that they had re-identified a small number of patients in the dataset by linking the dataset with information that was available online,” Pilgrim wrote.
“They identified a number of scenarios in which individuals might be identifiable. The report compiled by Data61 also outlined a number of re-identification scenarios.
“In contrast, the Department submitted that there had not been confirmed identification of these individuals, and that the possibility of a false match had not been ruled out.”
Health also argued that patients at risk of re-identification “had unique or rare attributes, such as individuals having undergone a series of unusual procedures, or having undergone certain procedures at a highly atypical age” and Pilgrim agreed the risk was low.
“While there is some risk of re-identification of patients by a sufficiently informed and skilled person, this risk is extremely low,” he concluded in the investigation report.
“Further, in the event that a possible match between a known person and a patient in the dataset occurs, it would be extremely difficult to confirm whether the match is correct.
“Because the dataset only covers a 10% sample of the population, it is possible that the known person is in fact in the 90% of the population not included in the dataset, and merely shares certain attributes with a person (the possible match) who is within the 10% sample.
“It may be possible to look to external sources of information to rule out this possibility for some individuals, but the need for this further step lowers the risk of re-identification further.”
Important lessons for data custodians
Health simply accepts “it could have employed a more robust encryption methodology to prevent the decryption of the provider number field when selecting the method of encryption to be used in respect of certain fields in the 10% datasets” and adds:
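The undertaking does not spell out what the flawed methodology was, but a common failure mode with pseudonymised identifiers is a deterministic, unsalted transformation over a small ID space, which an attacker can reverse simply by enumerating every candidate. A minimal sketch of that general weakness, assuming (hypothetically) an unsalted hash of six-digit provider numbers — not the department’s actual scheme:

```python
import hashlib

# Hypothetical weak scheme (the department's actual method is not public
# in this detail): an unsalted hash of a six-digit provider number.
# Deterministic encoding of a small ID space is reversible by enumeration.
def weak_encode(provider_id):
    return hashlib.sha256(str(provider_id).encode()).hexdigest()

def brute_force(target, id_space=range(1_000_000)):
    # A million candidate hashes take only seconds on an ordinary laptop.
    for candidate in id_space:
        if weak_encode(candidate) == target:
            return candidate
    return None

published = weak_encode(271828)   # value as it might appear in a dataset
print(brute_force(published))     # recovers 271828
```

This is why “robust” protection of identifiers generally means a keyed or salted scheme (or random surrogate IDs), so that knowing the algorithm alone is not enough to reverse it.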
“As at the date the Department offers this undertaking, no individual, be they a provider or a patient, has complained to the Department that the publication of the 10% datasets by the Department interfered with the privacy of that individual.”
The OAIC notes favourably that Health was co-operative in the investigation, and took “quick and comprehensive steps” to mitigate the issue before putting in place improvements to data governance and release processes. Perhaps needlessly, it notes the breach was unintentional and that the public servants thought they had done enough to protect privacy.
“This incident holds important lessons for custodians of valuable datasets containing personal information,” the OAIC states.
“Determining whether information has been appropriately de-identified requires careful, expert, and likely independent evaluation. Who the information is released to must also be considered.
“Appropriate processes should sit behind any decision to release de-identified personal information. This incident offers an opportunity for Australian Government agencies to strengthen their approach to publishing data derived from personal information.”
The enforceable undertaking lasts for two years. The department has agreed to a two-stage process in which external, independent experts will review its policies and procedures around the release of open data, with oversight by the OAIC. It will have to implement any recommendations made by the independent reviewer or reviewers.
Any future release of similar “unit record level data” will need to be treated as a high-risk project as defined by the new Australian Public Service privacy code that takes effect in July. The undertaking does not protect the department if any individual wants to take legal action over the breach.
Health has also undertaken to follow the new whole-of-government Process for Publishing Sensitive Unit Record Level Public Data as Open Data “including considering alternatives to public release of such datasets, such as release to trusted recipients and release in secure environments.”
Along with the new code and a more rigorous process across government, the OAIC reminds agencies it has a series of new guides to de-identification and data analytics on its website.