Tom Burton: five lessons from #CensusFail (updated)

By Tom Burton

August 14, 2016

It is still early days in the #CensusFail debacle — and many of the important facts are not known or verified — but there are some obvious early lessons.

One: learn well

This is the digital equivalent of an airline crash. If there is one thing the aviation industry is good at, it is meticulously reviewing what actually went wrong and what are the systemic and practical learnings for all players.

For governments and their agencies who are being implored to innovate and change, it is particularly important there be a sober and considered assessment based on what can be learnt. The public sector and its leadership are forever being lectured about being risk averse. If we are to move away from the command and control culture, then we have to be mature enough to sensibly understand and learn from failures and poor judgement. As the mantra goes: fail fast.

For Malcolm Turnbull this is even more so. His whole brand as a Prime Minister has been to remake the traditional blame game that has so infected Australian politics. As he said at the launch of The Mandarin: “We’ve got to try new things and, if you try new things, a lot of them won’t work, but so what? If you smash people because they try something and it doesn’t work then they’ll never try anything new again.”

Two: risk management

Changing the public sector risk model is a theme Turnbull has consistently pushed. At the launch of the Innovation and Science Agenda in December he called for a fundamentally new approach: “One of the aspects of the political paradigm I am seeking to change is the old politics where politicians felt they had to guarantee that every policy would work, they had to water everything down so there was no element of risk,” he said.

“If some of these policies are not as successful as we like, we will change them. We will learn from them. Because that is what a 21st century government has got to be. It has got to be as agile as the start-up businesses it seeks to inspire.”

And at an IPAA speech this year he implored senior bureaucrats to take up his transformation program. “Open your minds and be bold,” he told the cream of Canberra’s public service.

If at the first hurdle we revert to an old-fashioned lynch mob, then you can say goodbye to any real public sector innovation out of Canberra for another decade.

Three: accountability in a digital era

In the Westminster system it is the portfolio minister who is meant to take the rap. In this case it is the Treasurer, Scott Morrison, aided and abetted in recent times by tyro Minister Michael McCormack. While there are lots of politically inspired observations about who should be responsible, the reality is this is a multi-player problem involving a very fickle technology.

I am fond of saying that web technology is akin to the Apollo 11 mission. Much of it is stuck together with rubber bands and band-aids. Every day we fly to the moon and back by the grace of God. The digital world is fabulously transformative, but at the end of the day it is basically electrons being herded (at massive pace) through the technological equivalent of a sheep gate, AKA a transistor.

That said, the public agencies that have, or should have had, clear roles in delivering a successful census include the Australian Bureau of Statistics, Treasury as the policy and portfolio department, the respective ministerial offices and their ministers, the Digital Transformation Office, the Australian Cyber Security Centre and the Australian Signals Directorate.

The prime contractor IBM (a content partner of The Mandarin) and its network partners, NextGen Networks and Telstra, will have to fully account for the system and application design and in particular the cyber defence plan. All are best of breed, with massive technical capability between them, but the revelation by Turnbull that there was no DDoS mitigation plan beggars belief.

Telstra, for instance, operates a sophisticated clean pipe technology and scrubbing system that purports to remove malevolent traffic and which normally could be relied upon to repel most DDoS attacks. IBM for its part has one of the biggest security practices in the world, monitoring over 20 billion security events a day.

Web installations are like large plumbing exercises, with various associated supportive hardware such as DNS servers and routers. The installation is configured with different size pipes and powered by a stack of computers. Some of this technology is actual hardware, some is virtualised. Some is locally hosted, some is in a faraway data centre, aka the cloud. These days, cyber security also includes sophisticated techniques to monitor and defeat malevolent activity well before it reaches the core.

How the actual security architecture was put together for this site, the hardware redundancy plan and the professional judgments that were made once the site was inevitably attacked will be the nub of the review. And hopefully it will yield a good set of learnings, albeit done the hard way.

An important fact is that it was IBM that took the site down. This was not because of any performance issues.

The primary DDoS mitigation play was a geo-blocking strategy dubbed “Island Australia.” It was set up and activated to isolate the site from offshore attack, the typical source of most amplification attacks. This strategy was foiled when one of the network providers, NextGen (and its partner Vocus), inadvertently left open a pipe through Singapore, leaving the site exposed.
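To see why a single open pipe undoes a geo-block, here is a minimal sketch. It is a toy model, not the Census network: the link names, the `UpstreamLink` and `Packet` types and the `reaches_site` function are all hypothetical, and real geo-blocking is applied by routers and carriers rather than in application code.

```python
from dataclasses import dataclass

# Hypothetical toy model of an "Island Australia" style geo-block.
# None of these names or links come from the real Census configuration.


@dataclass(frozen=True)
class UpstreamLink:
    name: str
    geo_block_enabled: bool  # True if this link drops offshore traffic


@dataclass(frozen=True)
class Packet:
    source_country: str  # e.g. "AU", "SG"
    link: str            # name of the upstream link the packet arrives on


def reaches_site(packet, links, home_country="AU"):
    """Return True if the packet gets through to the site."""
    if packet.source_country == home_country:
        return True  # domestic traffic is never geo-blocked
    # Offshore traffic only gets through on a link with no block applied.
    return not links[packet.link].geo_block_enabled


def exposed_links(links):
    """List the links that would still carry offshore traffic."""
    return sorted(name for name, link in links.items()
                  if not link.geo_block_enabled)


links = {
    "carrier_a": UpstreamLink("carrier_a", geo_block_enabled=True),
    # The assumed misconfiguration: one path left without the block.
    "carrier_b_offshore_path": UpstreamLink("carrier_b_offshore_path",
                                            geo_block_enabled=False),
}

print(reaches_site(Packet("SG", "carrier_a"), links))
print(reaches_site(Packet("SG", "carrier_b_offshore_path"), links))
print(exposed_links(links))
```

The point of the sketch is that a geo-block is only as complete as its weakest upstream link: blocking on every path but one still leaves the site reachable from offshore.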

The site was shuttered after a 7:25 pm amplification attack appeared to camouflage suspicious malicious activity inside the installation. The attack was relatively small, but caused one of the routers to fail, leaving the site essentially naked. Closer analysis later revealed the Census site had not been infiltrated. The false positive was caused by a poorly programmed device IBM had set up to monitor security of the site.

But faced at the time with real evidence suggesting the site had been compromised, it was decided to bail. That was a real tough professional call, but imagine for one moment the outcry if the site had been kept open and the census data was hacked. My crude political assessment: Turnbull would have been toast.

Cyber security is enormously challenging for every organisation, government or otherwise. There are literally web shopping malls, as easy to use as Amazon, where you can order up any sort of cyber skullduggery at ridiculously low prices.

Moreover, web operators have to deal with a gaggle of vendors, all claiming their technologies and defence strategies are superior. This means any agency looking to put in place a robust cyber strategy has to rely on multiple vendors and technologies, each peddling distinctly different strategies. Because of the military and security culture that dominates within many vendors, it is also an industry that plays with its elbows up. When a major breach occurs, their arch competitors pile in to grab market opportunities, and are not shy about claiming the alleged deficiencies of their competitors.

Against this backdrop governments (and corporates) need to be careful when assessing the various claims and counter claims of the various “experts”. This is challenging for the federal agencies, where each agency is broadly allowed to make its own calls, subject to complying with the base security manuals and in the future some sort of benchmark PM&C is developing.

The ABS, for example, chose a few years ago to be exempted from the traditional gateway system that protects most Canberra agencies from malicious cyber attack. This meant it goes it alone and in-house, a heroic call given the sophistication needed.

This is a highly technical area and given the profile of the Census, why ACSC and ASD were not all over this program will also be an important question to answer. Turnbull has been singing the praises of ASD as an international leader — but leaving them to play in the dot mil space, has left the civilian APS very underdone for cyber smarts. Especially for a one-off large live event site, like the Census.

At a big picture level there are some serious questions to be asked about vendor management and vendor capture. In the technology world it is a perennial sport to blame the big system integrators when things go wrong. But if the banks, telcos and airlines have learnt anything from their large scale IT plays, it is that client agencies need a robust and informed governance system to oversee the massively complicated world of system development and management. This means having the internal capability to understand the risks and design features of any system. The digital equivalent of not letting the generals run the war.

In the case of cybersecurity, this also means having a robust governance, reporting and testing environment and high level C-suite visibility and knowledge around cybersecurity issues. Not just for one off projects, but deeply embedded in the managerial capability of the agency.

By my observation, outside of the big agency players (ATO, DHS, Defence, Border Force), this serious approach to cyber governance and leadership just does not exist in most Canberra agencies. Ditto in almost all the state government agencies. That is a major risk as we move headlong into large scale digitalisation.

Rapidly improving public sector cyber security is also not helped by the intensely secretive attitude the key agencies ACSC and ASD have to cyber security. We still have absolutely zero public learnings agencies can use from last year’s Bureau of Meteorology attack. In the US breaches are reported as a matter of course, which contributes to a much more sensible and mature discussion about the absolute reality of cyber security: that there are only two types of organisations in the world, those who have been hacked and those who don’t know they have been hacked.

The appointment of former Australian Federal Police officer Alastair MacGibbon as a special cyber security advisor is a good step forward, but as a country we need to be much more ready to admit vulnerabilities, if only to avoid some of the hysteria on full display this week. This inevitably means lifting the spend on cyber security. The two governments considered world leaders, Singapore and Israel, insist their public agencies spend well above ten per cent of their ICT budgets on security. In Canberra the equivalent cyber spend is estimated to be around two per cent.

Four: Census in a world full of data

The ABS was already pushing to have a once-in-10-years Census, but this was rejected by the Abbott government, which instead opted to stick to the old schedule with reduced funding. Why have a central census in a world where we are swimming in data collecting points, be they population changes, transport, health, employment, income and so on? All in real time! And most available to government right now.

Simply cloning the Census in a web format also opened up risks that were not necessary. The most obvious was the push to have fifteen million households try to fill out the census web form on the one night. This was the traditional way it was done when it was paper based, but as a web application it created a large technical risk by pushing everyone to complete on the night. There would have been little statistical impact from spreading the completion period over, say, a week or two, or going state by state. As it turned out we have all been given until September 23 in any case.
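A back-of-envelope sketch shows how much the completion window matters. Only the fifteen million households figure comes from the text above; the share of households filing online and the four-hour evening window are assumptions for illustration, and the function name is hypothetical.

```python
def average_forms_per_second(households, online_share, days, hours_per_day):
    """Average submission rate if online forms arrive evenly across the window."""
    window_seconds = days * hours_per_day * 3600
    return households * online_share / window_seconds


HOUSEHOLDS = 15_000_000   # figure quoted in the article
ONLINE_SHARE = 0.5        # assumption: half of households file online
EVENING_HOURS = 4         # assumption: forms are filed in a 4-hour evening window

one_night = average_forms_per_second(HOUSEHOLDS, ONLINE_SHARE, 1, EVENING_HOURS)
two_weeks = average_forms_per_second(HOUSEHOLDS, ONLINE_SHARE, 14, EVENING_HOURS)

print(f"one night: {one_night:.0f} forms per second on average")
print(f"two weeks: {two_weeks:.0f} forms per second on average")
print(f"reduction factor: {one_night / two_weeks:.0f}x")
```

Whatever the assumed online share, spreading the same volume over fourteen evenings instead of one cuts the average rate by a factor of fourteen. Real traffic is burstier than an even spread, so actual peaks would sit above both averages.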

Five: digital arrogance

The high profile failure of the Census site to work under attack will no doubt see many call a pause for the digital transformation agenda in government. Bit like the Italian priests of the 15th century trying to stop the printing presses.

But if there is a bigger lesson to be learnt, it is that we have to understand better how citizens are reacting to, and struggling with, the warp speed of change that digital is bringing with it. The legitimate concerns around privacy were badly misread and not anticipated by almost all — from the PMO downwards. This led to a high level of anxiety and arguably elevated the Census to the hacker’s honey pot it became. That same anxiety also arguably underpinned the decision to quickly bring down the site, amid fears of a network breach.

6 years ago

In the APS there has long been a culture of reverence for policy development and policy advice.

There is much less concern and respect for intellectual and practical skills essential to deliver programs, projects and procurement successfully, with excellent communication and risk management.

High level commercial capabilities are also now essential to collaborate effectively with the external providers that are increasingly essential for the support and delivery of services to the community.

These capabilities cannot be developed by training programs. They are learned primarily through experience, enriched by coaching and mentoring.

To improve performance the Government should reconsider and redefine the skills that APS leaders and organisations must have to perform well and how it will assure the community that they do.

Geoff Edwards
6 years ago

Tom Burton, you haven’t mentioned the reported budget cuts at ABS and the appointment of a reportedly non-content-rich economist as Australian Statistician.

When I engaged with ABS during my higher degree studies in c.2007, it was proud of its standing as equal-most highly regarded statistical agency in the world along with Canada. Something has gone seriously wrong with the management or funding of the organisation to lead to such a failure of public administration. Risk management and poor contract admin seem like proximate causes – what were the ultimate drivers?

6 years ago

I’m pretty sure IBM hasn’t been ‘best of breed’ for around 30 years – have you forgotten Westpac’s CS90? (Australia’s biggest IT disaster to this day – by IBM)…and Telstra? When I worked there we called it a ‘work free smokeplace’. Best of breed would be: Amazon for cloud with Microsoft breathing down their necks. An ISP? NextGen maybe (they handled half the traffic for the census – without issue).

5 years ago

The following are a set of possible events which have been suggested by some supposed insiders.

1/ ABS did not use a comprehensive DDoS prevention service; it just used a geoblock of traffic from outside Australia in the event of an attack.

2/ Unfortunately, the last attack hit them from inside Australia. This was a straight up DNS reflection attack with a bit of ICMP thrown in for good measure. It filled up their firewall’s state tables. Their solution was to reboot their firewall, which was operating in a pair.

3/ They hadn’t synced the ruleset for the secondary when they rebooted the firewall so the secondary did not work. This resulted in a short outage.

4/ Sometime later IBM’s monitoring equipment spat out some alerts that were interpreted by the people receiving them as data exfiltration. Already jittery from the DDoS disaster and wonky firewalls, they became convinced they’d been owned and the DDoS attack was a distraction to draw their focus away from the exfiltration. A common strategy often warned against.

5/ ABS pulled the pin and ASD was called in.

6/ The IBM alerts were false positives incorrectly characterising offshore-bound system information as exfiltrations.

7/ ASD undertook a full incident response before the website went live again, hence the delay of two days.

Assuming the above points are heading in the general direction of the truth, it would appear that this specific problem was caused by failures in the project’s infrastructure area – both design and operations. Unless you can see the contract and related correspondence you would not know who may be to blame legally. However the takeaway message for me is that management both in ABS and suppliers probably did not pay enough attention to their technologists. They always need to ask:

Are the right technical people actually doing the right jobs properly and has someone independent confirmed all is ok to go live?

Is the cost of risk mitigation strategies (DDOS prevention) being correctly assessed against true cost of failure (was the right level of management involved in that decision and fully briefed)?
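Points 2/ and 3/ above can be sketched in a few lines. This is a toy model of the unverified account only: the `Firewall` class, the table size and the port number are hypothetical, the flood is generic rather than a real DNS reflection attack, and nothing here reflects the actual Census equipment.

```python
class Firewall:
    """Toy stateful firewall: one state-table slot per tracked flow."""

    def __init__(self, name, max_states, allowed_ports):
        self.name = name
        self.max_states = max_states
        self.allowed_ports = set(allowed_ports)  # the "ruleset"
        self.states = set()                      # the state table

    def accept(self, flow):
        """flow is a (source, destination_port) tuple."""
        _, port = flow
        if port not in self.allowed_ports:
            return False          # no rule permits this traffic
        if flow in self.states:
            return True           # already tracked
        if len(self.states) >= self.max_states:
            return False          # state table full: new flows are dropped
        self.states.add(flow)
        return True


def rulesets_in_sync(primary, secondary):
    """Pre-go-live check: would the secondary behave like the primary?"""
    return primary.allowed_ports == secondary.allowed_ports


primary = Firewall("primary", max_states=1000, allowed_ports={443})
# Assumed fault from point 3/: the secondary never received the ruleset.
secondary = Firewall("secondary", max_states=1000, allowed_ports=set())

# A flood of distinct flows fills the primary's state table.
for i in range(1000):
    primary.accept((f"flood-{i}", 443))

legitimate = ("household-1", 443)
print(primary.accept(legitimate))             # table is full
print(secondary.accept(legitimate))           # no rules loaded
print(rulesets_in_sync(primary, secondary))   # the check that would have flagged it
```

All three checks come out `False` in this model: the flooded primary turns away the legitimate household, the unsynced secondary turns away everyone, and a simple ruleset comparison before go-live would have exposed the gap.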

5 years ago

Enjoyed your article Tom. I’m hoping that we haven’t unintentionally clipped the government’s risk appetite for using tech in policy. Census 2016 was a failure but it was also a sign of progress and, most tellingly, a measure of how far we have yet to go to fully innovate government. There is no doubt that the ABS needs to get smarter but the best way to learn is by experience and they will certainly have gained plenty from this last week. I argue we should probably encourage #censusfailfast
