Plenty of government agencies are keen to use data to improve their own efficiency or better the services they provide the public, but simply lack the experience to know which questions to ask.
That’s the call from one of the world’s leading experts on data in government, who says agencies need to hire data-smart staff and start asking questions.
“Even if they have the people, they don’t quite know what can be done,” says Rayid Ghani, director of the University of Chicago’s Center for Data Science & Public Policy. It doesn’t help that there tend not to be enough staff to lend time to pursuing an undefined goal, either.
Part of the problem is that while there are always examples of using data to improve business processes coming out of large companies like Google, the work of government is often too different or specialised to fit with private sector use cases.
“If you’re a retailer and you’re doing pricing, you go buy a pricing tool and install it,” the former Obama 2012 campaign chief scientist mused at a Monash University event earlier this week.
“But if you’re a health agency or human services agency and you want to predict who’s at risk of something, you start with something vanilla. People will sell you lots of tools, but none of them do that. They do something else, and then you have to pay them to do customisation and integration and all of that, whereas the private sector already has these tools for very customised problems.”
But figuring out how to use data analysis well can be extremely worthwhile. It can provide useful answers to some highly vexed questions — and some surprising ones.
One example Ghani offered was a risk prediction system for police brutality in the United States. In the system Ghani reviewed, the criteria for which officers were most likely to offend were arbitrary, having been drawn up by people sitting around a table suggesting what they thought would be the key indicators. This rendered it basically useless as a predictive tool. Not only did it flag 50% of all officers as at-risk, making it impossible to target interventions at the right people, but some of those who would later offend weren’t flagged at all: the system pre-identified only 70% of problem individuals.
Looking at historical data allowed for a far more accurate picture of who was likely to offend. It also threw up some unexpected results, suggesting the number of attendances at suicide or family violence-related incidents in the previous two weeks was one predictor for brutality — underlining that the solution needed to not just be about getting rid of risky people, but managing the mental health of all staff better.
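The kind of data-driven flagging Ghani describes can be sketched in a few lines. Everything below is hypothetical — the features, the weights and the officer records are invented for illustration; a real system would learn its weights from historical adverse-event data rather than hard-coding them.

```python
# Hypothetical sketch: flag the k officers with the highest risk score,
# where the score is a weighted sum of historical features.
def risk_score(officer, weights):
    return sum(weights[f] * officer.get(f, 0) for f in weights)

def flag_top_k(officers, weights, k):
    ranked = sorted(officers, key=lambda o: risk_score(o, weights), reverse=True)
    return [o["id"] for o in ranked[:k]]

# Illustrative weights only. Recent attendances at suicide or
# family-violence incidents was one predictor in the work Ghani
# describes; "prior_complaints" is an assumed extra feature.
WEIGHTS = {"stress_callouts_14d": 2.0, "prior_complaints": 1.0}

officers = [
    {"id": "A", "stress_callouts_14d": 4, "prior_complaints": 1},
    {"id": "B", "stress_callouts_14d": 0, "prior_complaints": 3},
    {"id": "C", "stress_callouts_14d": 1, "prior_complaints": 0},
]
print(flag_top_k(officers, WEIGHTS, k=2))  # → ['A', 'B']
```

The point of ranking rather than flagging half the workforce is that interventions — here, mental health support — can be targeted at the small group most likely to need them.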
The seven most common applications for data
Ghani, who often works with government agencies to come up with data solutions, explained that most problems fit into one of seven types. The first four are the top operational problems he sees, while the others focus on longer-term issues.
First, prevention and early warning systems. Crunching a large amount of data — some of which does not even need to be collected, but is sitting in a siloed database in a different branch or agency — can help predict where problems are likely to arise with much greater accuracy than existing systems.
Ghani gave the example of lead-based paint in houses in the United States. As the US Environmental Protection Agency notes, “if your home was built before 1978, there is a good chance it has lead-based paint.” Young children are particularly at risk of harm from peeling lead-based paint, thanks to their habit of picking up things off the floor — which may be covered in lead-contaminated dust — and putting them in their mouths.
But previously the government had no predictive system in place, relying on the clearly problematic risk indicator of children developing lead poisoning, which causes irreversible damage, to know where the problem lay. A predictive approach was needed, but it would not be practical to test every old house in the country for lead contamination.
Instead, it turned out there was already a large amount of useful compliance data that had not been tapped for prevention, so by putting that pre-existing data to work, the US government was able to pinpoint which houses were most at risk of having lead paint, and is now able to work with householders most at risk of exposure to deal with the problem.
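The compliance-data approach can be sketched as a simple scoring rule. Everything below is hypothetical — the weights, the extra features and the addresses are invented for illustration; only the pre-1978 rule of thumb comes from the EPA guidance quoted above, and a real model would learn its scoring from the historical data rather than use hand-set points.

```python
# Hypothetical sketch: rank houses for lead-paint outreach using data
# already on hand, instead of testing every old house in the country.
def lead_risk(house):
    score = 0
    if house["year_built"] < 1978:
        score += 2                                  # pre-ban construction
    score += house.get("prior_violations", 0)       # compliance history
    if house.get("young_children", False):
        score += 1                                  # exposure risk is highest here
    return score

houses = [
    {"addr": "12 Oak St", "year_built": 1950, "prior_violations": 2, "young_children": True},
    {"addr": "3 Elm Ave", "year_built": 1995, "prior_violations": 0},
    {"addr": "7 Pine Rd", "year_built": 1970, "young_children": True},
]
by_risk = sorted(houses, key=lead_risk, reverse=True)
print([h["addr"] for h in by_risk])  # → ['12 Oak St', '7 Pine Rd', '3 Elm Ave']
```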
Of course, as Ghani pointed out, the data itself does not solve the problem, but is one effective tool that contributes to the solution — the issue now is convincing parents something as seemingly mundane as paint could be a serious risk to their child’s health.
The second common use for data is prioritisation in compliance and inspections — the problem of “I’ve got this many things I need to inspect, I can only inspect this many, how do I prioritise that?” Ghani explained. In his work with the EPA, he discovered that many of the criteria for deciding where inspections should happen were arbitrary, meaning inspectors were not targeting those likely to be non-compliant. Again, data analysis can help reveal who is most likely to be breaking the rules, allowing the agency’s finite resources to be spent where they will get results — and meaning companies doing the right thing aren’t being bothered by government.
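Prioritisation under a fixed budget reduces to ranking by predicted risk. The sketch below is hypothetical — the site names and violation probabilities are invented stand-ins for a model's outputs, not EPA figures.

```python
# Hypothetical sketch of "I can only inspect this many": spend a fixed
# inspection budget on the sites predicted most likely to be non-compliant.
def prioritise(sites, budget):
    ranked = sorted(sites, key=lambda s: s["p_violation"], reverse=True)
    return ranked[:budget]

sites = [
    {"name": "plant-1", "p_violation": 0.82},
    {"name": "plant-2", "p_violation": 0.10},
    {"name": "plant-3", "p_violation": 0.55},
    {"name": "plant-4", "p_violation": 0.05},
]
chosen = prioritise(sites, budget=2)
print([s["name"] for s in chosen])  # → ['plant-1', 'plant-3']

# Expected violations found: targeted inspection vs. inspecting 2 at random.
targeted = sum(s["p_violation"] for s in chosen)
random_baseline = 2 * sum(s["p_violation"] for s in sites) / len(sites)
print(round(targeted, 2), round(random_baseline, 2))  # → 1.37 0.76
```

The gap between the two numbers is the payoff: the same inspection budget catches more rule-breakers, and compliant operators are visited less.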
Third is scheduling of service delivery — “ambulances, medics, any type of thing that’s figuring out who do I send, where do I send them, how do I move them around,” he says. When emergency services respond, for example, they need to know whether one police car needs to be sent, or two. Or three fire trucks. If you don’t dispatch enough, precious time is wasted correcting the error; dispatch too many and someone else who needs them might miss out. Being able to use data to inform those decisions can help the system run more efficiently — not only are services better but you might not need to buy as many new police cars.
He gave the example of a non-government organisation in Kenya that provides public toilets and had reached its service capacity. It was sending teams to empty every toilet every day — there was no public sewage system — but could not afford to hire anyone else, so could not build more. Data on how each toilet was used allowed the NGO to attend each one only as often as was necessary, boosting its capacity two and a half times — a huge return from a relatively small system tweak. The City of Melbourne recently started doing something similar with so-called smart bins.
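The arithmetic behind that capacity gain is simple. The fill rates below are invented — the NGO's real figures aren't public — and are chosen so the ratio comes out near the two-and-a-half-times improvement described above.

```python
# Hypothetical sketch: visit each site only as often as its measured
# fill rate requires, instead of once a day regardless.
def visits_per_day(fill_rates, capacity=1.0):
    # A site filling at rate r (capacities per day) needs r/capacity
    # visits per day on average; a site that fills weekly needs 1/7.
    return sum(r / capacity for r in fill_rates)

fill_rates = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.05]  # 10 sites
naive = len(fill_rates)             # one visit per site per day
informed = visits_per_day(fill_rates)
print(naive, round(informed, 2), round(naive / informed, 1))  # → 10 4.0 2.5
```

The freed-up visits are the extra capacity: the same teams can now service roughly two and a half times as many sites.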
Fourth is routing information within the organisation — “you’ve got requests coming in and you need to figure out which department should they go to,” Ghani explained. “It’s a pretty mundane, boring task and right now most often humans do that.” Data can help software figure out where to send files; automation means lower costs.
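A minimal version of that routing task can be sketched with keyword matching. The departments and keyword lists below are made up for illustration; a production system would more likely use a text classifier trained on past routing decisions than hand-picked keywords.

```python
# Hypothetical sketch: route an incoming request to whichever
# department's keyword set overlaps most with the request text.
KEYWORDS = {
    "roads":   {"pothole", "road", "traffic", "signage"},
    "water":   {"leak", "pipe", "water", "sewage"},
    "permits": {"permit", "licence", "application", "approval"},
}

def route(request):
    words = set(request.lower().split())
    return max(KEYWORDS, key=lambda dept: len(KEYWORDS[dept] & words))

print(route("There is a pothole on my road"))  # → roads
print(route("My water pipe has a leak"))       # → water
```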
Fifth is using data to figure out which intervention is most worth doing to get the desired impact. Ghani has recently been working with the Mexican government, for example, around maternal mortality.
“They tried to reduce it but they just weren’t sure why it wasn’t going down. So they wanted more data to figure out of the 3000 policies they could possibly modify, which ones should we prioritise? Which five or six should we narrow it down to that we then explore and decide which one is the policy to do?”
Sixth is conducting evaluations that can then be used to optimise policy, “where you’re really looking at historical data to see who did the policy work on, which people did it help, which people did it hurt.” Often evaluation is used to justify funding, but Ghani says historical data can be used more often to figure out how to better target citizens in future.
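The “which people did it help, which people did it hurt” question amounts to breaking outcomes down by subgroup. The records below are invented, and this is a deliberately naive sketch — a credible evaluation needs a proper comparison group, not just raw historical splits.

```python
from collections import defaultdict

# Hypothetical sketch: outcome rate for treated minus untreated,
# computed separately within each subgroup of the historical data.
def effect_by_group(records):
    tallies = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for r in records:
        cell = tallies[r["group"]][r["treated"]]
        cell[0] += r["good_outcome"]
        cell[1] += 1
    return {
        g: cells[True][0] / cells[True][1] - cells[False][0] / cells[False][1]
        for g, cells in tallies.items()
    }

records = [
    {"group": "urban", "treated": True,  "good_outcome": 1},
    {"group": "urban", "treated": True,  "good_outcome": 1},
    {"group": "urban", "treated": False, "good_outcome": 0},
    {"group": "urban", "treated": False, "good_outcome": 1},
    {"group": "rural", "treated": True,  "good_outcome": 0},
    {"group": "rural", "treated": True,  "good_outcome": 0},
    {"group": "rural", "treated": False, "good_outcome": 1},
    {"group": "rural", "treated": False, "good_outcome": 0},
]
print(effect_by_group(records))  # → {'urban': 0.5, 'rural': -0.5}
```

A positive number means the policy appears to have helped that group; a negative one means it may have hurt — exactly the kind of split that can inform better targeting next time.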
The final use, Ghani notes, is working out how to turn unstructured data into structured data. This means processing information held in audio, video or text to enter into a database which can then be used for other purposes.
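At its simplest, this is extraction: pulling fixed fields out of free text. The pattern and the report lines below are invented for illustration; real pipelines lean on OCR, speech-to-text or NLP models, but the end goal is the same — structured fields other systems can query.

```python
import re

# Hypothetical sketch: pull a date and a dollar amount out of each
# line of a free-text report and emit database-ready rows.
PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2}).*?\$(\d+(?:\.\d{2})?)")

def extract(lines):
    rows = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            rows.append({"date": m.group(1), "amount": float(m.group(2))})
    return rows

notes = [
    "Inspection on 2016-03-14 found issues, fine of $250 issued",
    "Follow-up call, no action",
    "Paid $120.50 on 2016-04-02",  # amount precedes date -- not matched here
]
print(extract(notes))  # → [{'date': '2016-03-14', 'amount': 250.0}]
```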
Commit to genuine collaboration
Education and training need to improve if governments are to properly harness the possibilities of data, Ghani says.
“Universities generally do a pretty bad job at training people how to do useful things with data science in general, but especially with problems governments are facing. Most students, you ask them what are the top five problems that a health agency faces? You’re going to get a blank look,” he suggests.
Even hackathons tend to be of limited value, often resulting in a map that tells agencies things they already know.
Too many public sector bosses don’t know which questions to ask, how to build a data team, or even how to consume the results they’re given, he adds.
There is a lack of use cases governments can adapt. And the incentives make it worse — doing data analysis can have a significant positive impact, but the perceived risks, and the fact that such work is rarely seen as mission-critical, can mean momentum never builds.
Collaboration between the public sector and universities is a great way to build experience and produce useful results.
“But our constraint is these projects have to be real projects, they can’t be made up,” Ghani notes. It needs to be worth the effort, which will be undermined by secrecy, control or unwillingness to use the results.
“It needs to have real data that you have. It can’t be you download some data from somewhere on the web and play around with it. And it needs to be a real agency, an organisation that’s willing to implement and validate and actually do something with this.”
The other thing standing in the way of governments using data science more is that the tools developed often remain on a (digital) shelf gathering dust.
“They often stay siloed somewhere in whoever did the work and nobody else finds out about them. So the next person has to start from scratch,” he says.
“So given that government is about helping people, what about open sourcing these things, or … you could create reusable software or data. Those are the type of things we’ve been focused on.”