Big data overload: keep analytics focused on business needs

By Stephen Easton

Friday July 22, 2016

The astounding possibilities of the big data era and the internet of things are only just beginning to emerge and already there is a problem: data overload.

Artificial intelligence is a big part of successful new data-driven approaches to public sector challenges, like the Department of Immigration and Border Protection’s Border Risk Identification System, and might also be the key to extracting business value from the data deluge.

The ACT government’s recently appointed chief digital officer Jon Cumming says it’s “pretty clear” that AI has a major role to play in big data analytics, and sees new challenges emerging as we increasingly rely on it to fish meaningful, useful insights from the growing seas of digital information.

Questions that interested early science fiction authors like Isaac Asimov — who speculated about both AI and the possibilities of predictive data analysis before either was anywhere near reality — are becoming real concerns.

“Those sort of possibilities actually become real, where the artificial intelligence … starts making decisions that weren’t necessarily entirely consistent with what we intended,” said Cumming, in a recent presentation to the Canberra Evaluation Forum.

He imagines that in future, advice or decisions made by AI systems might need to be peer-reviewed by other separate AI entities.

“The exciting part is, things in the past we thought didn’t matter will suddenly be shown to actually matter, and they’ll be indicators, or proxies for other things,” Cumming added.

“We can’t sit there just analysing everything and expect to make progress.”

According to an estimate from IBM, 90% of the world’s data was created in the last two years. The amount is growing exponentially and the internet of things will only add to the abundance, with the flood of real-time data making predictive analytics increasingly accurate.

Another interesting factoid, quoted by Cumming, is that only about 80% of that data is being analysed.

“And I would suggest that’s probably going to drop rather than get bigger, unless we do something different to the way we have,” he continued, pointing out that the problem of paralysis by analysis — common in the public sector more generally — can still apply in the digital realm.

Taking larger and larger amounts of data and analysing it more and more quickly won’t solve all the world’s problems. It won’t necessarily reveal what’s important for particular business aims. Nor will it make decisions for us.

“We can’t sit there just analysing everything and expect to make progress, because the learning curve is all ahead of us,” said Cumming, who came across from New Zealand to be the ACT government’s first CDO.

He argued the best bet to explore what potential there is for big data, the internet of things and AI was to “just get this stuff, play with it in a safe way, and then we’ll learn and understand more and more about it — and we’ll understand what we don’t understand, which sometimes is more important”.

Most of the time, Cumming points out, government has to work pretty hard to gain the trust of citizens and their consent to use data about them.

“For government, the challenge we have is that we’re trying to achieve really complex outcomes … and that requires a high level of trust from our consumers that we’re doing the right thing,” he said, pointing out that while most people give very broad consent to companies like Facebook and Google to collect and use data about them for a wide range of uses, they are much more wary of government.

“Information overload may end up resulting in simplistic prejudiced soundbites.”

People agree to Facebook’s terms and conditions because they accept the personal benefit outweighs the perceived cost. As the Australian Bureau of Statistics has found with its decision to retain names in this year’s Census, it’s much more difficult to get individuals to make a similar trade-off in exchange for the benefits that a particular piece of public service work will provide society as a whole.

Another risk from the big data overload is a situation of widespread confusion, where it gets increasingly difficult to discern which pieces of information to believe, according to Cumming.

“Information overload may end up resulting in simplistic prejudiced soundbites, and the response to that is to make sure that we are very human-centered in the way that we present our data — not just claiming logical superiority [but] genuinely convincing people,” he said.

“Other people are out there making simple statements that are entirely wrong, so we need to be intelligent and simple and human-centred in the way we talk about things.”

The situation that already plays out in politics, where the “loudest” and simplest claim often drowns out more accurate understandings, could get much, much worse because “somewhere in that big data, you’ll be able to find a piece of data that supports your argument, and you’ll just shout it louder”.

“And our job [as public servants] is to actually look into dark corners and pull out all the data and put it together in a way that gives a genuine feel for the landscape of what we’re looking at,” said Cumming.

Data is only useful when it leads to actionable information

Immigration’s impressive Border Risk Identification System provided the CEF attendees with an example of what can be achieved quickly by a small team of geeks experimenting in a safe environment. The prototype was built in just six weeks on a desktop and is able to analyse data about all incoming passengers, collected at point of departure. It took about a year and less than $1 million to fully roll out.

“My contention is I can regulate less and find more bad guys at the same time.”

Aiming to perform considered analysis on information about 100% of passengers would have been “ridiculous” if human staff had been applied to the job, said Gavin McCairns, who helped develop BRIS as a first assistant secretary at Immigration several years ago.

But that’s what a learning system built by a small team could do. One result was that DIBP halved the number of people its officers “tapped on the shoulder” on arrival, but actually doubled the number of people it refused entry.

“That’s astonishing,” said McCairns, who is now in charge of chasing down money launderers as acting chief executive of AUSTRAC. “It’s a 400% efficiency gain. Volume is still growing, no more staff.”
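The arithmetic behind that “400% efficiency gain” follows directly from the two reported changes: half the interventions yielding twice the refusals means the hit rate per tap on the shoulder is four times higher. A quick sketch with invented baseline numbers (the article gives ratios, not absolute figures) makes this concrete:

```python
# Illustration only: the baseline figures below are invented; the article
# reports only that interventions halved while refusals doubled.
baseline_interventions, baseline_refusals = 1000, 100   # hypothetical baseline
bris_interventions, bris_refusals = 500, 200            # halved / doubled

baseline_hit_rate = baseline_refusals / baseline_interventions  # 0.10
bris_hit_rate = bris_refusals / bris_interventions              # 0.40

# Twice the refusals from half the interventions = 4x the hit rate.
gain = bris_hit_rate / baseline_hit_rate
print(f"Hit rate per intervention: {baseline_hit_rate:.0%} -> {bris_hit_rate:.0%}, "
      f"a {gain:.0f}x gain")
```

Whatever the absolute volumes, the ratio is the same: doubling outcomes while halving effort quadruples the yield per intervention.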

The passenger risk assessment algorithm behind BRIS was built using an open source programming language called R, and profiles passengers based on details like their age, gender, nationality and country of birth. It is guided by patterns in a wealth of data about people who had visas granted, refused and cancelled, as well as who was detained and who was deported.
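The actual BRIS model was built in R and its internals are not public, but the basic shape described above — scoring passengers on a handful of demographic features against historical visa and enforcement outcomes — can be sketched in a few lines. The following Python illustration is an assumption-laden toy, not the real algorithm: field names, records, and the simple averaged-rate scoring are all invented for clarity.

```python
from collections import defaultdict

# Toy passenger risk profiler in the spirit described for BRIS (the real
# system was built in R; these field names and records are invented).
# Each historical record: feature dict + outcome (True = refused/cancelled).
history = [
    ({"age_band": "20-29", "nationality": "X"}, True),
    ({"age_band": "20-29", "nationality": "X"}, True),
    ({"age_band": "20-29", "nationality": "Y"}, False),
    ({"age_band": "50-59", "nationality": "Y"}, False),
    ({"age_band": "50-59", "nationality": "X"}, False),
]

def fit(records):
    """Learn a refusal rate for each (feature, value) pair from past outcomes."""
    totals, refused_counts = defaultdict(int), defaultdict(int)
    for features, refused in records:
        for key in features.items():
            totals[key] += 1
            refused_counts[key] += int(refused)
    return {key: refused_counts[key] / totals[key] for key in totals}

def risk_score(rates, passenger):
    """Score a new passenger as the mean refusal rate of their feature values."""
    keys = list(passenger.items())
    return sum(rates.get(key, 0.0) for key in keys) / len(keys)

rates = fit(history)
score = risk_score(rates, {"age_band": "20-29", "nationality": "X"})
```

A real system would use a proper statistical model and far richer data, but the principle is the same: patterns in past grant, refusal, cancellation, detention and deportation outcomes guide which incoming passengers get a closer look.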

While it has vastly increased the agency’s ability to regulate the flow of people into the country, it all happens without the passengers knowing, so there is no extra red tape.

“You’d think if I was profiling everyone I’d bug more people,” said McCairns, who emphasised the point that government regulators like DIBP and AUSTRAC not only have to “stop more bad guys” but they have to do it without placing too much burden on the legitimate economy.

“So if I apply this to my regulatory environment at the moment [at AUSTRAC], my contention is I can regulate less and find more bad guys at the same time,” he told the CEF. “That’s a really hard call, but that’s my contention, and I’m going to try and build models to do that. I’ve only been here six months, so I need to do some more models.”

Evaluation is built into BRIS, according to McCairns, because “all the reality is plugged back into the model” and used by the system to learn.

For example, if the “dodgy” passengers were commonly using fake Norwegian passports, the system would notice and more would be pulled aside on arrival. If the crims realised Norwegian passports were suddenly under suspicion, McCairns says BRIS would soon adjust.

“So we’ve built an evaluation within the model,” he said, “and the model starts to slow itself down if the reality doesn’t match the model. We’ve built artificial intelligence into it. So what it does is it actually builds its own new models … to go with the reality.”
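The self-correcting behaviour McCairns describes — a model that loses influence when its predictions stop matching reality, and triggers a rebuild — is a form of drift detection. This sketch is a guess at the general mechanism, not BRIS itself; the window size and accuracy threshold are invented:

```python
from collections import deque

# Hedged sketch of "the model slows itself down": track how often recent
# predictions matched reality, discount the model as accuracy falls, and
# flag a rebuild when it drops below a floor. Thresholds are illustrative.
class DriftGuard:
    def __init__(self, window=100, floor=0.6):
        self.matches = deque(maxlen=window)  # 1 = prediction matched outcome
        self.floor = floor                   # accuracy below this means drift

    def record(self, predicted_risky, actually_refused):
        self.matches.append(int(predicted_risky == actually_refused))

    @property
    def accuracy(self):
        return sum(self.matches) / len(self.matches) if self.matches else 1.0

    def weight(self):
        """How much to trust the current model's scores (0.0 to 1.0)."""
        return min(1.0, self.accuracy / self.floor)

    def needs_rebuild(self):
        full = len(self.matches) == self.matches.maxlen
        return full and self.accuracy < self.floor

guard = DriftGuard(window=10)
# Simulate the crims changing tactics: the model stops matching reality.
for predicted in [True, True, True, False, False, False, False, False, False, False]:
    guard.record(predicted, actually_refused=True)
```

After this run the guard’s accuracy has fallen to 30%, its weight halves the model’s influence, and a rebuild is flagged — the “builds its own new models to go with the reality” step.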

A recently completed statutory review is likely to give AUSTRAC more scope to share data, within the public sector and with private sector organisations, subject to privacy regulations, of course. The agency also intends to establish a Financial Intelligence Centre of Excellence to foster collaborative work and information sharing.

With AUSTRAC’s growing corps of overseas staff stationed inside equivalent financial tracking agencies, McCairns hopes to set up some kind of clever analytical system with data fed into it from all over the world.

But despite his enthusiasm for what big data analytics can achieve, McCairns is adamant that business needs always have to come first and technology second. Opening his talk, he emphasised that to be valuable, data must lead to information that can help inform decisive action.
