In the first part of this essay, I elaborated on evidence-based policymaking and service delivery, pointing to all manner of pathologies that must be dealt with to deliver something effective. The way in which KPIs distort reporting and can pervert incentives has been well known at least since Gosplan, though no doubt one could find examples in the ancient world.
But there are many more problems, from the myriad practical challenges and compromises involved in measuring outputs and outcomes to pathologies of culture and delegation. Yet feedback between the delivery of policy and services and the measurement of outcomes is fundamental to building a high-performing organisational capability. I also pointed to the grand, progressive project of the nineteenth century that saw all manner of institutional development.
Just as an aside, since I mentioned how antithetical academic culture is to evidence-based policy, I note that think tanks have evolved to fill that space but sadly they are mostly funded to push ideological barrows. The British have introduced ‘what works’ centres to fill the void. They’re focused on discovering and communicating information to practitioners in a way that is useful – whereas academics are engaged in an activity that (scandalously) many argue is largely indifferent to being useful or even right.
In any event, this post sets out the case for a new institution — the evaluator-general. The problems the institution must solve or, to speak more modestly, ameliorate, include:
- ensuring that monitoring and evaluation is done well – no mean feat given the inherent difficulty of the task coupled with its lowly status compared with grand policymaking, which is where the big public service career payoffs are.
- the bureaucracy’s penchant for soft secrecy and euphemism which ramifies through every level of the service and through to its political masters
- the need to build a relationship of mutual respect and accommodation — of true collaboration — between monitoring and evaluation at the coalface and its aggregation and standardisation at the system level. That means:
- Finding ways to interdict the ‘default’ setting in such arrangements, whereby those in the centre of the system — who are typically at the top of a hierarchy — dictate terms to those at the coalface.
- Monitoring and evaluation at the coalface should be principally for the purpose of optimising effectiveness and efficiency at the coalface. But its output should also flow to the centre where it will be used to make decisions on the quality and efficacy of the services being delivered at the coalface and thence in ‘managing’ the coalface including by rewarding and penalising, expanding and contracting services. The perverse incentives such a process unleashes must degrade the information and its flow as little as possible.
So here’s what I propose. Rather than simply being talked about by prime ministers as if it were all commonsense, evidence-based policy and its organisational underpinning — monitoring and evaluation — becomes a key function of any concerted endeavour in the public sector, just as accounting is. (I’ll make some limiting comments on this later, but for the purposes of exposition assume a concerted endeavour is a program — like chaplains in schools, the R&D tax credit or a police program to reduce domestic violence.)
No new program could be introduced without a properly worked up monitoring and evaluation plan. Existing programs would be systematically exposed to this regime over time. Evaluation should be done at the level of delivery. As this occurs, further synthesis and analysis at various levels will typically occur.
Evaluation would be done by people with domain skills in both evaluation and in the service delivery area who were formally officers of the Office of the Evaluator-General. They would report to both the portfolio agency delivering the program and to the evaluator-general with the EG being the senior partner in the event of irreconcilable disagreement. All data and information gathered would travel to the centre of both the EG’s and the departmental systems. Meanwhile, the portfolio agency would report to their minister but the EG would report to Parliament — as the auditor-general does.
The monitoring and evaluation system would be built from the start to maximise the extent to which its outputs can be made public and the public could be given access to query the system, though the system itself would only provide public information outputs that met strict privacy safeguards.
Good intentions versus good outcomes
So does this ameliorate the problems enumerated above?
- Expertise in evaluation has a home which is independent, has some teeth and offers career progression (a critical thing in addressing what I call problems of irreducibility which I have not elaborated in this post).
- Soft secrecy and euphemism find it difficult to contaminate the evaluation system. It won’t be the case, as it is now, that “everyone gets a prize” — that virtually all programs are publicly reported as a success. The EG has their ‘man on the ground’ in the bowels of the system — measuring its efficacy. Officers of the EG will make sure there’s an auditable chain of accountability to resist pressures to game the system. The information emerges in the ordinary course of the EG’s reporting — without pressures to suppress information and gild the lily. As with the auditor-general, there are minimal incentives to make the government or the wider executive look good — or bad.
- The involvement of EG officers in the system, in system design and at the ‘coalface’ of delivery, should help reduce the tendency of the top of hierarchies to dictate terms to those at the coalface, which is necessary if an organic relationship of mutual accommodation is to be built. The whole monitoring and evaluation system only has an interest in generating accurate information. The intrinsic motivation of EG officers should generally lead them to help the agencies they work with to improve their performance. At the very least, they should have minimal incentives to degrade the information system, which is itself a huge de-motivator.
This is also a development of the principles of Westminster government, which I’d argue is constructed from two separate systems. Both aim at the public good, but the former (which delivers government services and assists the political executive to decide what should be done) does so on the presumption that the public good is best served via a competition between political representatives from which the public can choose. The latter (which one might suggest provides the informational superstructure on which the former operates) reports on what is being done and guards basic integrity. It aims directly at public good outputs that are owned by all.
This offers some scope to reconfigure the vexed idea of civil service independence along the is/ought, positive/normative distinction. Reporting on what is should, in principle, be independent – i.e. constituted as a direct public good reporting through the Parliament. Advising on and doing what the government judges ought to be done is a ‘contestable’ public good and occurs at the direction of the government of the day. This is also the principle appealed to in a recently published Mandarin piece of mine and Nick Kamper’s arguing for the release of analytical models and their movement to “neutral territory” under the aegis of the Parliamentary Budget Office.
In proselytising the case, piggybacking a proposal onto an existing institution usually helps to squeeze it through the Overton Window. One could make the EG an office within the auditor-general’s function. After all, program evaluation is already one of the auditor’s functions. This might be worth trying in the short term, but I doubt it’s a good idea even to that extent. Despite auditors-general’s fondest endeavours, their involvement is often seen as inimical to innovation. By contrast, a central purpose of the EG would be to grow the intelligence for the system to innovate successfully.
Finally, I’m not much of a fan of widely rolling out ambitious new approaches (even my own) before we’ve figured out if they work. So I think this approach should be trialled, initially with just a few programs, perhaps in several different portfolios, and adapted, expanded or abandoned on its merits over time.