Evidence-based policy … if everyone claims to want it, and practice it, why do so many ‘announceables’ get glowing report cards without the slightest tinkering? Nicholas Gruen on the problems with policy evaluation.
Calling for policy to be more “evidence-based” rolls off the tongue with beguiling ease. What’s there not to like?
Yet a closer look reveals that policy has always been, and remains today, a largely evidence-free zone. In this essay I want to explore the forces that bring this about; my next piece will propose some steps toward a solution.
The flow of information is central to governance. That’s why specifying standard weights and measures turns up as part of the sovereign’s role in Magna Carta. The integrity of the information flow is, if you’ll pardon the tautology, integral to the package. So it’s not surprising that vouchsafing the integrity of accounts also goes back centuries. The UK National Audit Office traces its lineage to the auditor of the exchequer in 1314.
Integrity institutions like the auditor-general came into their own in the nineteenth century. But if the professionalisation of audit arose from the increasing complexity of government, what’s developed since draws us well beyond the basic idea of audit for integrity.
Indeed today, auditors-general’s performance audits deliberate on efficiency and effectiveness. But if an organisation lacks the right information systems, an audit from outside cannot determine with any accuracy its effectiveness or efficiency. Of course auditors-general can critique an agency’s monitoring and evaluation system. Yet, many years of routine performance auditing notwithstanding, one study found “less than $1 of every $100 the [US] federal government spends is backed by even the most basic evidence”. Would things be much different here?
Of course we want government activities to be evidence-based. And yes, there really are lots of ways in which governments could take advantage of “big data”. But that’s the TED talk. Back in the world of actual experience, policy development and even a lot of policy thinking hasn’t got far beyond high-level slogans. In 2007, Allen Schick’s review of performance budgeting and accrual budgeting offered this observation: “Australia’s ambitious strategy to thoroughly evaluate all programmes was accorded much publicity, but no announcement was made when the strategy was terminated”.
In fact it had been the Hawke/Keating governments that had launched the high-level evaluation strategy identified by Schick, and substantial work was done and capability was built under the program. The Howard government wound it back. But for all new prime minister Kevin Rudd’s talking up his own wonkish commitment to evidence-based policymaking, little was done to build public sector capability or practice in the area.
The NSW Centre for Program Evaluation was a product of the NSW Commission of Audit in 2012. Active for three years now, it has not released any publication on any NSW policy initiative. In fact, a review of the NSW liquor and lock-out laws was completed last year, but in a sensitive political climate it has not been released.
Agendas, endless loops and too many cooks
Once one gets beyond the slogans, it becomes clear how much difficult detail there is. Monitoring and evaluation involves all manner of compromises between objectives, like the choice to be approximately right rather than precisely wrong. If you’re trying to promote something as intangible as social capital, or something as (relatively) simple as better targeting government income support to those most in need, what will you measure?
Apart from the intrinsic difficulty of measuring such things, there are all manner of other issues involved. Different parties will want different information. Those at the coalface — if they’re any good — will want evidence that assists in the endless feedback loop of measuring their impact and improving it over time. Those at the centre of the system may be interested in this information, but they’ll have their own objectives.
To establish a good monitoring and evaluation regime you need to work methodically from general objectives to delivery at the coalface. But here’s the thing. Those at the centre of the system need to listen to those at the coalface every bit as much as those at the coalface need to listen to those at the centre. After all, those at the coalface are where the action is. Where connections might be made between one objective government has — reducing domestic violence for instance — and other objectives — like improving educational outcomes and keeping families together and kids safe.
So the coalface and those at the centre of the system must listen conscientiously to each other to jointly serve the wellbeing of the whole system. Yet the centre and the coalface of the system are also, respectively, the top and bottom of a hierarchy. Now those at the bottom of the system listen intently to the wishes of those at the top as if their career prospects depend on it (they do). But those at the top listen to those below, when it happens at all, as an act of noblesse oblige.
Thus for instance in JobActive (previously known as the Job Network), service providers may be required to determine how many job applications a job seeker has made. Yet those at the coalface may have good evidence that such reporting is counterproductive — and puts the whole interaction in a punitive frame which may vitiate the very objectives of the program.
The best evaluation money can buy
Another huge issue arises when considering the purpose of evidence. If evidence is used by those higher in a system about those lower in the system, this unleashes strong incentives which will probably degrade the information flow.
It’s commonsense that, while the agency should be closely involved in any evaluation of its performance, it shouldn’t control the evaluation. Yet this is standard fare. Thus for about fifteen years state services have been benchmarked by the Productivity Commission’s “blue book”. Yet each year I search the statement for any discussion of auditing the data or seeking to ensure it’s untainted by the bias of specific interests of those collecting it. I’ve never found it.
Then there’s agencies performing their own evaluations, or choosing “independent” parties to provide such evaluation of agency programs. Obviously a crock from the start. If you’re one of those contractors, you can be — you’re encouraged to be — independent. But within reason. Guess how much repeat business you’ll get if you find that the agencies that commission evaluations aren’t delivering value for money? Likewise in the role-play that is — ahem — Best Practice Regulation, agencies who see it as their job to get up regulation for their ministers do the regulatory impact statements. Thus the letter of this law of evidence-based policymaking is met while delivering a travesty of its spirit.
Indeed, there are incentives at every level of the system for those generating information at a lower level to obfuscate and euphemise the bad news and highlight the good news to those above them. As I argued in another context, government is performance. This generates incentives, originally projected down from the political level but assiduously transmitted at each layer in the hierarchy, producing a generalised preference of the system for generality over specificity, euphemism over candour.
Everyone wins a prize. There are no unsuccessful projects. (And the inability to fire anyone for generalised uselessness creates some pretty strong incentives for avoiding information systems from which one could diagnose it.)
Ideally information should be generated by the coalface for its own purposes — namely to optimise its own performance — and then aggregated through to the centre in such a way that doesn’t unleash perverse incentives.
This is what Toyota seems to have done within its production system. Toyota spent literally ten times as much per worker as American auto firms on training. Its workers were trained in statistical control, had job security which minimised perverse incentives and were given the task of using the information to endlessly optimise their productivity. That’s one model of how we should be trying to set up the information capillaries and arteries of our programs. But it’s a difficult, fraught business.
Think outside the university
There are also a bunch of cultural and resource issues. Lots of the coalface workers delivering services feel uncomfortable in an evaluation culture — certainly one imposed from above. If they’re to come up with a regime that works well they may need additional resources from others more skilled in and comfortable with evaluation.
And profound culture problems in academia lie in wait for the unwary. For many it’s natural to think of universities when seeking improvements in evidence-based policymaking, the performance of independent evaluations of policy and so on.
Yet apart from the incentive incompatibilities of allowing those delivering programs to commission their own evaluations, academic participation often pushes the system towards “Rolls Royce” solutions.
Academics’ incentives are to display their (academic) expertise and to generate learned publications on the boundaries of their discipline’s knowledge. But the thing is, most data for decision-making in programs is like data for decision-making in business. There should be virtually no consideration of arbitrary standards of statistical significance — a major preoccupation of academic evaluation — and the timeframes should often be days — when academics’ time frames are months and years.
The standard of evidence-based decision making in a well run business is some considered (often just commonsensical) compromise between values such as timeliness, the probative force of evidence, cost, convenience and the absence of good reasons to the contrary. Thus various “nudge units” have made A/B testing normal in government, at last! But this is a skill that requires a low degree of academic prowess, and should be done in real time. The same goes for most decisions in most programs, though of course more substantial independent evaluations may have some place in the scheme of things.
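To make the contrast with academic significance testing concrete, here is a minimal sketch of how a program team might read a simple A/B test as a decision rather than a hypothesis test. Everything in it is an assumption for illustration: the function names, the counts (invented replies to two versions of a reminder letter) and the `min_confidence` threshold, which a team would set according to how cheap and reversible the switch is. It uses a rough normal approximation, not a formal method.

```python
from statistics import NormalDist


def prob_b_beats_a(success_a, n_a, success_b, n_b):
    """Approximate probability that variant B's true rate exceeds A's.

    Uses a normal approximation to the difference of two proportions:
    good enough for a rough operational call, not a formal test.
    """
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    # Standard error of the difference between two independent proportions.
    se = (rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b) ** 0.5
    if se == 0:
        # No observed variation at all: fall back to a direct comparison.
        if rate_b == rate_a:
            return 0.5
        return 1.0 if rate_b > rate_a else 0.0
    return NormalDist().cdf((rate_b - rate_a) / se)


def choose_variant(success_a, n_a, success_b, n_b, min_confidence=0.8):
    """Switch to B only if it is likely enough to be the better option.

    min_confidence is a hypothetical, program-specific threshold: lower
    when switching is cheap and reversible, higher when it is costly.
    """
    p = prob_b_beats_a(success_a, n_a, success_b, n_b)
    return ("B" if p >= min_confidence else "A"), p


# Invented counts: replies received to letter A and letter B.
choice, p = choose_variant(success_a=120, n_a=1000, success_b=145, n_b=1000)
print(choice, round(p, 2))
```

With these made-up numbers the difference would fall short of the conventional 5 per cent two-sided significance threshold, yet the sketch still favours B because it is probably better and switching letters costs next to nothing. That is the kind of commonsensical compromise described above.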
So I think we need to work towards a quite new model of evidence-based policy, delivery and decision making, a subject I’ll take up in the next instalment of this essay.
Schick, Allen, 2007. “Performance Budgeting and Accrual Budgeting: Decision Rules or Analytic Tools?”, Journal on Budgeting, Volume 7, No. 2, OECD.
 Keith Mackay, 2011, The Performance Framework of the Australian Government, 1987 to 2011, Vol. 11, (3), pp 1–48.
 It’s remarkable that, given economics’ provenance as a ‘science’ of public policy, monitoring and evaluation of the efficacy of policy isn’t a compulsory part of an economics degree. (After all, ‘evidence’ is a part of a law degree – though of course it doesn’t involve much theoretical reflection on the nature of evidence but rather learning the legal rules – both wise and foolish – made by the profession around evidence).
 David Colander’s article “Creating Humble Economists: A Code of Ethics for Economists” is worthwhile in this regard, proposing that economists should think of themselves as engineers not scientists:
“Engineering is different than science. The primary goal of engineers is solving a specific problem with available resources, and an engineering solution can only be judged relative to its cost. Whereas the scientific method does not allow shortcuts to save time and money, the engineering method does. Engineering is by nature applied, and it has no scientific core, or general formal methodological prescriptions based on the scientific methods. Billy Vaughn Koen, (2003) who has written what appears to be the current standard methodological treatise for engineering defines the engineering method as “The strategy for causing the best change in a poorly understood or uncertain situation within the available resources.” He describes an engineer as an individual who solves problems using engineering heuristics. He argues that an engineer makes no pretence of having found the truth, or having found the “correct” model. An engineer focuses on finding solutions that work, and uses whatever methods he finds best leads to finding a solution to the particular problem he is trying to solve. In the engineering field, there are no rigid prescriptions guiding method.”
Continue reading at The Mandarin: Nicholas Gruen on why Australia needs an evaluator-general