The federal government is increasingly keen on big data, whether it’s used in academic research, private business or its own bureaucracy, but appears unsure of how to bring the public with it and establish confidence that it can be trusted to balance the benefits with risks to privacy.
A lot of people in government, business and research are very clear on the value of big data and keen to maximise it by combining and sharing as much data as possible to create ever larger statistical resources.
This value is often explained in general terms, as though it were self-evident that statistics are good, so more accurate and insightful statistics can only be a good thing. Such explanations rarely excite the wider populace about the potential of valuable social and economic insights or useful new apps.
At the same time, concerns about the erosion of individual privacy that has occurred at an accelerating pace over the past 20 years have grown and are now widely held. The level of concern, expressed dramatically through last year’s Census boycott movement, far outweighs any public enthusiasm for big data analytics.
The Commonwealth has a growing list of data-related projects and is trying to sell their expected benefits. Assistant Minister for Digital Transformation Angus Taylor, who recently announced the Digital Transformation Agency had taken over responsibility for the “high value” data.gov.au and NationalMap platforms, plays the unofficial role of chief data evangelist.
In May, Taylor announced some details of the Data Integration Partnership for Australia (DIPA), a project to “maximise use of the government’s vast data assets” that was funded with $130.8 million worth of past efficiency dividends through the Public Sector Modernisation Fund.
The plan is for the Australian Public Service to build on the Multi-Agency Data Integration Project, a major cross-agency trial run by the Australian Bureau of Statistics and five other agencies that is currently in an evaluation phase. Its purpose is to build and continuously grow sets of detailed longitudinal data about Australian citizens.
Another of the building blocks for further data integration in the APS is the Department of Industry, Innovation and Science’s Business Longitudinal Analysis Data Environment (previously known as EABLD before the name was re-arranged to spell BLADE). This uses Australian Business Numbers as a statistical linkage key to facilitate research on companies, so personal privacy is not a major concern.
One couldn’t say the same of the MADIP, a more ambitious project combining sensitive information provided by the Australian Taxation Office and four departments: Education and Training, Health, Human Services and Social Services. Or the Australian Digital Health Agency’s plans for “secondary use” of data from e-health records, which are detailed in a new public consultation paper released last week.
Chief data evangelist
Part of Taylor’s role is trying to reassure a wary public and sceptical advocates for privacy rights that government can be trusted, both in its own increasingly risky forays into data integration and in setting and maintaining standards for how other organisations handle their growing stores of sensitive personal information.
One such effort, a YouTube video in which Google’s director of research Peter Norvig tells Taylor what government agencies can learn from the data giant about earning and maintaining public trust, was viewed fewer than 100 times in its first three weeks online.
Norvig speaks of the value of plain English and letting people use services without having to log in and have their details and activity recorded and analysed for personalisation. Google’s various products use lots of data and clever algorithms to give business users capabilities they could never develop themselves, he explains.
“Government could certainly participate in the same way if that made sense for them; they could call in to use our data,” Norvig tells Taylor.
He suggests the government “could be looking at the kinds of things [Google is] doing, seeing if they can duplicate that” and should “popularise the fact” that it has loads of data and ask, ‘Who wants to do something useful with it?’
He also warns of “garbage in, garbage out” errors. One recent example is the way annualised income data from tax returns was matched with periodic income data reported to Centrelink, leading the welfare agency to suggest thousands of debts might exist when they did not.
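The kind of mismatch described above can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than a description of Centrelink's actual system: the figures are invented, the function names are hypothetical, and the rule of spreading annual income evenly over 26 fortnights is a simplification used only to show how accurate inputs can still produce a spurious discrepancy.

```python
FORTNIGHTS_PER_YEAR = 26  # assumption: income is reported fortnightly


def averaged_fortnightly_income(annual_income):
    """Spread an annual figure evenly across the year (hypothetical rule)."""
    return annual_income / FORTNIGHTS_PER_YEAR


def flag_discrepancies(annual_income, reported_fortnightly, tolerance=0.01):
    """Return the fortnights where the averaged figure exceeds what was reported."""
    average = averaged_fortnightly_income(annual_income)
    return [
        index
        for index, reported in enumerate(reported_fortnightly)
        if average - reported > tolerance
    ]


# Invented example: $26,000 earned over 13 fortnights of work,
# then 13 fortnights with no income, all reported accurately.
annual_income = 26_000
reported = [2_000] * 13 + [0] * 13

flagged = flag_discrepancies(annual_income, reported)
print(len(flagged))  # 13 fortnights flagged, although every report was correct
```

In this invented case the reported amounts add up exactly to the annual figure, yet half the year is flagged, because an annual total says nothing about when the money was earned.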
“Move quickly. Make it easy for people to try new ideas,” Norvig advises. He says Google tells its employees to work on helping the users and then mostly lets them experiment, but also has processes to protect user privacy — “there’s need-to-know and there’s anonymisation and all that,” he says casually.
Google doesn’t share “user data” with anyone outside the company, according to Norvig. “But there’s other types of data like data we might have collected off the web, [and] we can supply that to third party researchers,” he says. One example is “traffic data” that is not supposed to be personally identifiable.
Asked by the assistant minister about building longitudinal datasets from data about people, which is the goal of MADIP, the Google exec is more circumspect.
“Because we are so worried about the personally identifiable data, we tend not to do a lot with longitudinal studies of, say, your search history. We don’t want to know! Right? Because it would [be] too dangerous for that to leak out,” Norvig tells Taylor.
The video’s blurb suggests “transparency is key to building public trust in data use” and says government agencies and private companies should act as “shepherds” of data that describes real people.
“Individuals should therefore have clear explanations about how their data is used, including commitments to privacy and benefits of participating,” it adds, but Taylor’s recent efforts to take charge of this for the Commonwealth haven’t had a lot of impact.
Engagement needs to go DIPA
The ABS has published fairly extensive information about MADIP, including four moderately detailed case studies, but it’s doubtful many of the public have heard of it. And much like the YouTube video, Taylor’s May announcement about the DIPA was mostly ignored.
The assistant minister said the DIPA was an “extraordinary opportunity to support policy development and deliver real outcomes for Australians” but gave almost no details explaining why other Australians should catch his data enthusiasm.
“A central capability within the DIPA will coordinate specialised teams focused on social, industry, environmental and government efficiency policies,” he explained somewhat mysteriously.
To be fair, the statement does mention a few slightly specific possibilities: “identifying and preventing risk of disability in the workplace, supporting ongoing workforce participation for those with a disability, and better understanding the effects of medications to avoid adverse reactions” – and a general idea of the expected outputs:
“The DIPA will create high-value national data assets to build longitudinal data about populations, businesses, the environment and government to inform the development and evaluation of policies and programs.”
He included an assurance that the data will all be “de-identified and analysed in controlled environments governed by strict processes and legislation” and a final statement that generally speaking, this will be a good thing: “The DIPA aims to benefit all Australians through improvements in social and economic welfare and better outcomes for businesses.”
The DIPA fact sheet explains the ABS and Australian Institute of Health and Welfare are the only two agencies so far that are accredited to handle this kind of data integration, which basically means storing the sensitive data in secure environments, only giving access to trusted users, and being careful about what outputs are published.
Other organisations can also apply to the National Statistical Service to become Accredited Integrating Authorities. There is also a reassurance that these new datasets won’t be used to accuse citizens of doing the wrong thing:
“The DIPA will only use data for statistical and research purposes to help better understand patterns and trends, not individuals. It will not be used for enforcement or compliance purposes.”
Another four brief paragraphs somewhat grandiosely described as “case studies that demonstrate the uses and value of data integration” point out this isn’t a new thing for government. They note MADIP and BLADE, as well as New Zealand’s Integrated Data Infrastructure, which is over a decade old, along with Western Australia’s 20-year-old healthcare data linkage scheme.
Taylor also recently foreshadowed a new data rights framework to give consumers more access and control over data about them that is stored by companies.
“The government is working with industry to develop a standardised set of APIs that must make data easily available to consumers and their agreed third party advocates,” he told a conference in the United States.
He announced this “universal data right” as a complement to the Digital Transformation Agency’s GovPass digital identity system, which is due to go public early next year, and explains both mainly as ways to increase competition in the economy by making it easier for customers to switch between competing firms.
It doesn’t seem to be one of Taylor’s main talking points but he has suggested they should also boost privacy – GovPass by foiling identity theft, and data rights by somehow moderating the growing power of companies that are “accumulating huge amounts of data” about our lives; in a Sky News interview his key examples were Facebook and the Chinese jack-of-all-trades WeChat.
What this will mean is not clear yet but the assistant minister told Ticky Fullerton he thinks “we can avoid the downside of the huge value that we can generate from data” with his new principle of data ownership.
As governments move swiftly ahead with data integration, it is imperative that they demonstrate that “huge value” to the whole of society in ways that even the least tech-savvy among us can easily grasp, because the “downside” is presently much better understood.
The routine assurances that there is nothing to worry about are based on the fact the information will be made anonymous, which leaves the spectre of re-identification attacks as the key privacy risk to manage.
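To make the re-identification risk concrete, here is a minimal sketch of one common check: counting how many records share the same combination of indirect identifiers. The records, field names and choice of identifiers are all hypothetical, and this is not a description of how the ABS or any other integrating authority assesses risk.

```python
from collections import Counter

# Hypothetical "de-identified" records: names are gone,
# but indirect identifiers (quasi-identifiers) remain.
records = [
    {"postcode": "2600", "birth_year": 1980, "sex": "F"},
    {"postcode": "2600", "birth_year": 1980, "sex": "F"},
    {"postcode": "2600", "birth_year": 1975, "sex": "M"},
    {"postcode": "2913", "birth_year": 1962, "sex": "F"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")  # assumed for illustration


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """Count how many rows share each combination of quasi-identifiers."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_records(rows, keys=QUASI_IDENTIFIERS):
    """Rows whose combination appears only once, so linkage could single them out."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


def k_anonymity(rows, keys=QUASI_IDENTIFIERS):
    """Size of the smallest group; 1 means at least one person stands alone."""
    return min(group_sizes(rows, keys).values())


print(len(unique_records(records)))  # 2
print(k_anonymity(records))          # 1
```

Two of the four invented records are unique on postcode, birth year and sex, so anyone holding another dataset with those fields and a name attached could potentially match them, which is why removing names alone does not settle the question.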
Read more: A new data de-identification decision-making guide has been released to help public servants and others navigate this area. But there is still more work to be done on engagement, says Peter Leonard, a leading expert on the legal side of data use and privacy who discussed these issues with The Mandarin.
The Australian Bureau of Statistics and the Institute of Public Administration Australia (ACT Branch) are jointly hosting a half-day conference on public sector data integration in Canberra on November 3.