The Australian Computer Society has made the case for the government to develop a systemised and standardised safe data-sharing regime.
Government policy makers, data scientists, and businesses that gather and use data about individuals face the dilemma of utilising the benefits of smart services and data sharing without taking away the privacy rights of Australians, according to the ACS’ new Privacy-Preserving Data Sharing Frameworks report.
NSW chief data scientist and ACS vice president Dr Ian Oppermann argued de-identifying data is a complex but key issue for the future of government.
“Answering the question of ‘will linking de-identified datasets actually lead to being able to identify someone?’ turns out to be a very subtle and complex challenge,” he said.
“That challenge however underpins our ability to use smart services in the future, from smart lights to smart cities, even smart government. The work described in this year’s technical whitepaper takes us further down the path to being able to address that challenge.”
The ACS has been exploring various aspects of data sharing over the past three years, in an attempt to develop useful risk frameworks, most recently to address privacy and personal information concerns.
The risk frameworks presented in the latest report were built on the assumption that a measure for the amount of personal information in a linked de-identified dataset can be developed. This measure — which the ACS has dubbed a Personal Information Factor (PIF) — looks at the uniqueness of the members of a dataset and the information gained from identifying these individuals.
The report explored re-identification risk in de-identified data using a PIF, and set out a framework for privacy-preserving data sharing.
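To make the idea of "uniqueness" concrete, the following is a minimal sketch and not the ACS's PIF calculation, which this article does not spell out. It assumes a toy dataset and hypothetical column names (`postcode`, `age_band`, `sex`), counts how many records share each combination of attributes, and reports the information gained, in bits, from pinning down the rarest combination.

```python
from collections import Counter
from math import log2


def equivalence_class_sizes(records, attributes):
    """For each record, count how many records share its attribute combination."""
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    return [counts[k] for k in keys]


def uniqueness_summary(records, attributes):
    """Summarise how distinguishable individuals are on the chosen attributes."""
    sizes = equivalence_class_sizes(records, attributes)
    n = len(records)
    return {
        # Smallest group of indistinguishable records (the "k" in k-anonymity).
        "min_class_size": min(sizes),
        # Share of records that are the only one with their combination.
        "fraction_unique": sum(1 for s in sizes if s == 1) / n,
        # Self-information of the rarest combination: log2(n / class size).
        "max_surprisal_bits": max(log2(n / s) for s in sizes),
    }


# Hypothetical de-identified records, for illustration only.
records = [
    {"postcode": "2000", "age_band": "30-39", "sex": "F"},
    {"postcode": "2000", "age_band": "30-39", "sex": "F"},
    {"postcode": "2000", "age_band": "40-49", "sex": "M"},
    {"postcode": "2600", "age_band": "30-39", "sex": "F"},
]

full = uniqueness_summary(records, ["postcode", "age_band", "sex"])
# Suppressing the postcode column makes more records indistinguishable.
suppressed = uniqueness_summary(records, ["age_band", "sex"])
print(full)
print(suppressed)
```

In this toy example half of the records are unique on all three attributes, and dropping the postcode reduces that share to a quarter. A real measure would also need to account for linked datasets and for what an attacker already knows.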
“While our approach is heuristic, the processes we present demonstrate credible ways to consider the challenges of data sharing and — it is hoped — provide a basis for building principles-based data sharing and governance frameworks,” the report stated.
“While this paper has focused on personal information, preserving privacy is a cornerstone of any safe data sharing framework. Ultimately, systemising and standardising algorithmic calculations of safe data sharing — with independent verification demonstrating the trust, efficacy and benefit to the community — will be required for Australia to truly benefit from evolving digitally driven developments.”
ACS president Yohan Ramasundara said the paper was an important milestone in developing a framework that takes advantage of the benefits of shared data while protecting personal information.
“With the invention of digitised data, information is plentiful and creatively leveraged by public and private interests. While data is very important for governments and businesses, preserving individual privacy is critical,” he said.
The framework addresses the technical, regulatory, and authorising mechanisms for creating smart services and for sharing data across jurisdictions between governments and industry.
The report made seven conclusions:
1. Many of the voiced concerns about data sharing are expressed as concerns about privacy. In practice they are based on concerns about the sensitivity of data and use of outputs to address these concerns.
2. The use case for data strongly influences the risk framework required and the methods (aggregation, suppression, obfuscation, perturbation) appropriate for increasing data safety.
3. It is feasible to develop a meaningful Personal Information Factor (PIF) giving a measure of personal information in de-identified, people-centric data. Information-theoretic metrics show promise for many common protection methods and can be enhanced to cover perturbed data.
4. Re-identification risk and levels of personal information in data are related but different concepts.
5. Understanding the relationship between different features in a dataset helps to identify those that have the greatest impact on data utility after protection methods are applied.
6. Development of a meaningful measure of relative utility is feasible for datasets protected through aggregation, generalisation, obfuscation and perturbation. Information-theoretic metrics based on Mutual Information (between original and protected datasets) show promise.
7. Dealing with “trajectories” (or pathways) in data is critical to its safe use and release. Developing methods to address trajectories is possible. The methods explored in this paper show promise; however, the complexity of the approaches may limit real-world implementation.
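As a rough illustration of the sixth conclusion, the sketch below estimates the Mutual Information between an original column and a generalised version of it. The ages, the ten-year banding, and the choice to normalise by the original column's entropy are all assumptions made for this example, not the report's method.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Empirical mutual information, in bits, between two paired columns."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Hypothetical original column and its protected (generalised) counterpart.
ages = [23, 27, 34, 38, 45, 49, 52, 58]
age_bands = [f"{(a // 10) * 10}s" for a in ages]  # e.g. 23 -> "20s"

mi = mutual_information(ages, age_bands)
# Assumed normalisation: share of the original information that survives.
relative_utility = mi / entropy(ages)
print(f"mutual information: {mi:.2f} bits")
print(f"relative utility:   {relative_utility:.2f}")
```

Here the eight distinct ages carry 3 bits and the four bands retain 2 of them, so about two thirds of the original information is preserved. Heavier generalisation would lower that ratio, which is the trade-off between safety and utility the conclusions describe.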
Correction: an earlier version of this article incorrectly described Dr Ian Oppermann as the NSW “chief scientist” rather than chief data scientist. In fact the NSW chief scientist and engineer is Professor Hugh Durrant-Whyte.