APS Census open data set removed amid privacy concerns

By Stephen Easton

Wednesday October 5, 2016

A privacy risk has been identified in a set of data from the Australian Public Service employee Census that was hosted on data.gov.au, prompting the second take-down of a public information asset in a week.

According to the Australian Public Service Commission, the annual open data set from the APS Census was uploaded with agency identification numbers attached for the first time this year. This created a risk that personal details of individual public servants could be identified. The APSC has been quoted elsewhere explaining:

“We decided that extra care should be taken to ensure individual officers could not be inadvertently identified. A review of the data is underway. We anticipate this data will be available via data.gov.au in the coming week.”

The data set was accessed 58 times before it was taken down, according to the APSC.

Update, 6.30pm, October 5:

The APSC confirms the information was uploaded to the open data portal on August 26, 2016, and taken down on September 13. A spokesperson told The Mandarin:

“The APS employee census collects attitudinal data. It is not administrative data and it does not collect respondent names or contact details.

“The de-identified and significantly aggregated APS employee census dataset is published annually on data.gov.au. Respondents are advised of this when they complete the survey.

“This year, for the first time, the census dataset included a numeric code for each APS agency. No agency was named.

“The dataset was removed from data.gov.au on 13 September because the APSC decided that extra care needed to be taken to make certain that individual officers could not be identified, especially if cross referenced with a range of other publicly available data.”

It appears the privacy commissioner is not investigating at this stage, and the APSC did not respond to questions about how the issue was discovered or by whom.

Re-identification risks loom

After last week’s revelation that ID numbers for doctors and pharmacists could be extracted from a set of Department of Health data, today’s news is a second very public hiccup for the whole open data initiative, which hinges on the concept that anonymised data does not breach the privacy of the people it refers to.

However, governments face a serious challenge: deciding how much anonymity must be engineered into de-identified data sets before release so that they are sufficiently secure but still useful.

Over the last decade, data researchers have greatly improved their ability to re-identify individual people from anonymous information about them, including from open data sets, their browser history or even the links they anonymously share on social media.

Consultant Steve Wilson, an expert on digital identity and privacy who has kept up with these developments, believes governments have failed to do the same. He believes a lot of the keenest supporters of open data — including researchers in other fields who use the statistical information it provides — “seem to be completely blithe to the privacy risks” that have increased in recent years.

Wilson, who is also a principal analyst with San Francisco-based Constellation Research, is of the view that these problems identified in APS Census and Department of Health data are “damaging” to the whole open data initiative.

Based on the scant details available on Wednesday morning, he speculated the problem could be to do with the “small cell size” dilemma.

In this case, knowing that a line of data refers to someone working in a specific agency that is very small would make it possible to deduce the identity of that single anonymous public servant. This might also require some other freely available information, such as that on the APSC website or in the Government Online Directory.
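The small cell problem can be illustrated with a short sketch. This is not the APSC's method, which the article does not describe. It is a minimal example of one common safeguard: counting how many records share each agency code and masking codes that fall below a minimum count. The field names, the sample records and the threshold of 5 are all assumptions made for illustration.

```python
from collections import Counter

def small_cells(records, key, threshold=5):
    """Return each value of `key` held by fewer than `threshold` records,
    mapped to its record count."""
    counts = Counter(r[key] for r in records)
    return {value: n for value, n in counts.items() if n < threshold}

def suppress_small_cells(records, key, threshold=5, placeholder="SUPPRESSED"):
    """Return copies of the records with `key` masked wherever it falls
    in a small cell. The input records are left unchanged."""
    risky = small_cells(records, key, threshold)
    return [
        {**r, key: placeholder} if r[key] in risky else dict(r)
        for r in records
    ]

# Hypothetical survey rows: eight respondents in one agency, one in another.
records = (
    [{"agency_code": 101, "response": "agree"} for _ in range(8)]
    + [{"agency_code": 202, "response": "disagree"}]
)

risky = small_cells(records, "agency_code")
safe = suppress_small_cells(records, "agency_code")
print(risky)
```

Here agency 202 has a single respondent, so anyone who knows who works there could attribute the response to that person. Masking the code removes that direct link, although suppression alone may not be enough if other columns can still be cross-referenced with outside data.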

“There’s been a pattern over the last few years of richer and richer data releases for research purposes,” Wilson told The Mandarin. “On the other hand, little or no privacy impact assessments (PIA) are being done.”

The decision to add agency identifiers to this year’s APS employee Census data release, which caused the problem, is “exactly what a PIA is for”, in Wilson’s view. “Privacy advocates are not naive about this,” he said. “We know there’s a trade-off, but every time you change the nature of a data set, you should do a PIA.”

He believes moves by the Australian Bureau of Statistics to enhance the data collected in the national Census, and to make it possible to link that data with other data sets, were part of this push “under pressure from researchers to make these data sets richer and more useful”, but came with little appreciation of the privacy risks revealed by other researchers in the field of information security.

The consultant described himself as “an enormous supporter of open government” but questions whether open data is all it’s cracked up to be.

“They’re not really releasing data about the machinery of government, they’re releasing data about the worker bees and the citizens,” he said, adding that “there is not much evidence all this data is actually influencing government policy”.

At the same time, he sees public servants and the researchers who make use of the data often complaining that privacy concerns are a hindrance to their work and the exciting possibilities they envisage from open data, while laying a guilt trip on those who want to put the brakes on.

“Statisticians ask us to believe that all of this finer grain research is going to lead to better policy outcomes, and I’m very cynical about that,” Wilson said. “I’m very uneasy about scientists saying that their craft is going to improve the lot of ordinary Australians … when the scientists are yet to show us any evidence that their data is improving policy.”

He also thinks there will be some deterrent effect from criminalising the act of re-identifying the personal details of individuals from anonymised government data and associated actions, as Attorney-General George Brandis plans to do.

“I think that’s probably, by and large, a good move, but what people don’t understand is re-identifying data is already illegal under the Privacy Act,” said Wilson.
