Expanding on his Evaluator General theory, lateral-thinking economist Nicholas Gruen suggests we look at evaluation in a completely new light.
Because the idea I have called “the Evaluator General” is several ideas knitted together to try to resolve several dilemmas, it comes with numerous implications and intentions that are often missed or misunderstood. So I’ve addressed them separately in specific articles. This article is in the same spirit, explaining that a central goal for me is for evaluation to become less of a ‘thing’ – separate from the activity it’s evaluating.
For better or worse, policymakers tend to come at evaluation from one of just two perspectives. First, for program managers it can provide an independent set of eyes to help assess how they’re going and how to improve. This can be particularly important in the public sector, where objectives are multiple and won’t generally map onto any financial metric the way profit is the ultimate indicator of success in the private sector. Second, for those governing and funding programs, evaluation also meets accountability needs: if one party is doing something technically challenging, and what they’ve achieved or how they achieved it isn’t transparent to outsiders, separate evaluation can be useful.
This, and various other exigencies such as people’s desire to build and participate in ‘professions’, has led to the growing institutionalisation and professionalisation of evaluation. There’s plenty to like about this. And as a new discipline and profession, evaluation is much fresher than mature disciplines whose intellectual foundations are deeply unsatisfactory and yet ossified years ago. This is true of my discipline economics1 but of others too, their commanding heights confined to academia, an increasingly bureaucratised, fast-foodified institution.
Evaluation is also many-sided – only a few decades old, yet now a vast, loose network of approaches. Yet in the push for more evaluation, it’s being taken to be something far more settled and definitive than it is. I think it contains riches. But getting something evaluated isn’t like getting financial accounts audited or one’s design for a bridge checked by an engineer.
Indeed one of the more interesting and, I expect, productive areas of evaluation if done well is so-called ‘goal-free’ evaluation. There, the evaluator assesses the impact of the program without calibrating it against – and ideally without even knowing – the program’s stated goals. This can improve program hygiene just as double blindness adds to the hygiene of a randomised controlled trial. It can also facilitate wider, and so potentially more powerful, evaluative insights. (Nothing could demonstrate its value better than the fact that it’s hard to find any sign that those in the central agencies know of its existence. It rarely dawns on the Great and the Good that it might be best if they forbore from directing each and every purpose of the endeavours they fund.)
Further, if you look around at the great technical achievements of humanity, ‘evaluation’ didn’t play much of a role in them – the Apollo program, the development of the internet, the Manhattan Project which, in a few short years, took an equation and produced two working prototypes of weapons that can now blow up the entire species. Nor did ‘evaluation’ – conceived as formal and separate from delivering the goods – play much of a role in the delivery of AlphaZero’s technical wizardry in chess or the miracle of the Toyota Production System.
All of those achievements involved lots of evaluative thinking. But that thinking took place as part of the process of doing the work, not as a professionally delivered ‘thing’ from outside. But this isn’t how professions work. Professions sell services and so ‘evaluation’ is being brought into the production of government services as plumbing or landscaping would be. That’s just one reason why it’s not working all that well and won’t if we continue to misunderstand it.
On reading through the PC’s recent Background Paper accompanying its Draft Strategy on Indigenous Evaluation, you can read some of its stipulations as wise advice on how government agencies should manage their use of this new profession:
Evaluation is most effective when it is integrated into each stage of policy and program development, from setting policy objectives and collecting baseline data, through to using evaluation findings to inform future policy and program design.2
Or you can read it as evidence of the inherently problematic nature of evaluation when it’s configured as something separate from, and in principle outside, the thing it’s evaluating.
As I developed my idea of the Evaluator General, I had this second interpretation in mind. Thinking about it carefully, I simply don’t know how you can operationalise what the PC specifies as the requirements of good evaluation except by bringing it into and alongside operations in an ongoing capacity. Evaluative thinking is of the essence in most of the improvement organisations manage. And it’s in short supply – thus, for instance, the New Zealand government’s Wellbeing strategy is focused on measuring wellbeing without directly considering how wellbeing might be improved. No doubt they’ll evaluate it at some stage, but I can tell them that now.
Good program design should contain a great deal of evaluation. If a particular mechanism is important – that people will respond to a particular call to action in a letter, for instance, or that children with particular learning needs are best handled in some particular way – it can be tested before we commit to it. And then again and again after we have. This is one of the things that, sad to say, it took ‘nudge units’ to introduce into many government programs. But evaluation and testing go on all the time in a well-run organisation. They’re going on in Facebook and Google and Amazon and Toyota across numerous sites and programs as we speak.
There will no doubt be times when there’s a case for stepping back and so putting some space between operations and their evaluation. But that’s really quite rare in well-run organisations, other than those corralled into it by the requirements of process. In numerous examples presented in boxes by the Commission in the Guide to its Draft Strategy, evaluation is asked to answer questions that come up, and could easily be handled, as the program went on. In fact I’m not sure there are any cases where that wouldn’t arguably be better.
Be that as it may, this was precisely one of the things I wanted to encourage with my proposal for an Evaluator General. Under the arrangements as I envisage them, those delivering services work away for their line agency alongside those with expertise in evaluation who report to the line agency but are formally under the direction of the Evaluator General. Together those whose job is to do, and those whose job is to know collaborate to understand and improve the program day in day out.
In his best-seller The Lean Startup, Eric Ries writes about how start-ups should use their presence in the market to learn. Instead of making complex plans based on lots of assumptions, he recommends making:
constant adjustments with a steering wheel called the Build-Measure-Learn feedback loop. Through this process of steering, we can learn when and if it’s time to make a sharp turn called a pivot or whether we should persevere along our current path.3
Now re-read the earlier passage quoted from the Commission. I defy you to explain how what’s called for can be delivered if evaluation is separated from what it’s evaluating. That’s why, in my model, the Evaluator General is responsible for monitoring and evaluation. Its officers are tasked with knowing and recording and prompting the evaluative thinking which, while it should assist with meeting pre-set program goals, should also range more broadly around all the things the program is achieving and might be brought to achieve.
1. As the philosopher Martha Nussbaum put it:
[W]e have to grapple with the sad fact that contemporary economics has not yet put itself onto the map of conceptually respectable theories of human action. (Indeed, it has repudiated the rich foundations that the philosophical anthropology of Adam Smith offered it).
2. p. 154.
3. Eric Ries. The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses, Currency, p. 41.