Big data useless without human element

By David Donaldson

September 4, 2015

Big data is an invaluable tool for government, but by itself won’t reveal all the answers you’re looking for — for that you need people, argues Barry Sandison, deputy secretary, Health Compliance and Information at the Department of Human Services.

“You have to be curious. You’ve got to wonder, you’ve got to look at ‘what if?’,” he said at last month’s Australia and New Zealand School of Government conference in Melbourne.

“You can’t just assume that data is going to get there and say ‘here is the answer that’s what we’ve been looking for’. If you are, you’re probably trying to channel the answer and get the answer you want, rather than what the data will tell you.”

He said when he first heard Donald Rumsfeld’s maxim about known-knowns, known-unknowns et cetera, he thought it trite, but “getting involved in the data space over the last few years, it actually started to make sense.” There are plenty of unknown-unknowns in data, and the best way to uncover them is to start digging.

Sandison’s advice was to make sure there are people familiar with the subject matter on board and maintain an open mind about how to approach the data:

“We can use the data in a variety of ways … you’re only getting in there with truly understanding the attributes of your data set. Again that’s back to the people who know and understand how to get in there,” he said.

“Go and find something, have a starting proposition, a bit of a target, but then work through, don’t narrow it down too much, because then you’re asking what you think is the right question about what you think is the right data, and you could be totally wrong and waste a lot of time.”

Records of address changes, payment histories, incomes and a range of other indicators over long periods mean DHS datasets are useful in longitudinal studies. Even if data analytics ultimately tell you what you already suspected, having hard evidence helps in battles for funding with central agencies.

Take foster care, for example. Foster carers receive the family tax benefit and are thus trackable as a cohort. Thanks to DHS data, policy developers know that those who had one foster carer had a 40% likelihood of being on welfare ten years later, whereas 82% of those who had six carers were on welfare after a decade.

“No social worker would be surprised at all, but it gives evidence, gives data, to argue, ‘well what might be done differently?’,” he said.

“And if you’re curious, it might get you to ask other questions — is it more meaningful to when you’re a teenager versus when you’re a young person?”

DHS is improving its understanding and use of unstructured data. “We’re getting better at that, we’re not great. But understanding who’s saying what, how we can interact as an organisation immediately if something starts trending about a particular issue,” he stated.

Although many of the insights from unstructured data relate to policy questions, sometimes the service delivery becomes the talking point: queues, waiting times and technology platforms all come up in conversations.

Sandison welcomed collaboration, on a cost-recovery basis, with other departments, agencies and organisations outside government in accessing some of the vast trove of data DHS holds on the millions of Australians who have had interactions with the welfare system.

A lot of the skill is in asking the right questions, he argues — so DHS has “helped some people to ask us the right question … give us an idea of the question, we help frame it, and then we’ll see if we can respond.”

But because de-identifying and cleaning datasets can be time-consuming, Sandison recommends seeing if there’s already data available on or elsewhere.

“Before you get to integrating data, and looking through datasets, look at what data is around that’s de-identified. Don’t have the fight that could take you a year to work your way through. You can use state data alone, Commonwealth data, whatever, but take your time, understand what’s in there, work out what your true focus might be, then if need be have the debate around privacy,” he said.

Understanding the numbers is all about people. Sandison gave a simple example of how different approaches, the selection of different datasets on the same group of interest, can lead to different conclusions.

When it comes to public housing, “when somebody moves, surprise surprise, somebody else on welfare moves into the state housing. So if you just look at the isolated dataset, it looks like nothing has changed, there’s been no improvement,” he explained.

“But if you stop and really understand what’s in the data and look at it, and you look at perhaps people moving out and having a change in their life, and somebody else moving in, [that means] you look at the overall changes and the patterns that sit within the community, not just the raw set of numbers,” said Sandison.
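The contrast Sandison describes can be sketched in a few lines of Python. This is only an illustration: the records, field layout and function names below are invented for the example and do not reflect how DHS stores or analyses its data. The snapshot view counts who is living in public housing each year, while the longitudinal view follows the people who moved out.

```python
# Hypothetical, made-up records for illustration only.
# Each tenancy row: (year, dwelling, person, on_welfare_while_housed)
tenancies = [
    (2013, "unit_1", "A", True),
    (2013, "unit_2", "B", True),
    (2014, "unit_1", "C", True),  # A moved out, C moved in
    (2014, "unit_2", "B", True),
]

# Hypothetical welfare status of each person by year, housed or not.
welfare_status = {
    (2013, "A"): True, (2013, "B"): True, (2013, "C"): True,
    (2014, "A"): False, (2014, "B"): True, (2014, "C"): True,
}


def snapshot_welfare_share(year):
    """Share of public housing tenants on welfare in a given year."""
    rows = [t for t in tenancies if t[0] == year]
    return sum(1 for t in rows if t[3]) / len(rows)


def movers_off_welfare(start, end):
    """People housed in `start` but not in `end` who left welfare in between."""
    housed_before = {t[2] for t in tenancies if t[0] == start}
    housed_after = {t[2] for t in tenancies if t[0] == end}
    movers = housed_before - housed_after
    return sorted(
        p for p in movers
        if welfare_status[(start, p)] and not welfare_status[(end, p)]
    )


# Snapshot view: the share looks identical in both years.
print(snapshot_welfare_share(2013), snapshot_welfare_share(2014))
# Longitudinal view: following individuals shows who moved out and off welfare.
print(movers_off_welfare(2013, 2014))
```

In this toy data the yearly snapshot shows no change, yet tracking individuals reveals that one former tenant is no longer on welfare, which is the kind of pattern the isolated dataset hides.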

“One of the things is you’ve got to be curious. Data won’t give you the instant answer. You’ve got to stop and think about what it might be and follow the data trail. Follow your curiosity and see what happens. Bit by bit, some of the pictures become clearer.”
