Service continuity: national auditor urges drills for disruption

By Harley Dennett

November 12, 2014

It’s not just terrorism that threatens the safety of staff and the continued operation of government services, the federal Attorney-General has warned agency heads in a new directive. Bushfires, floods or a simple power outage can have a profound and costly impact on essential ICT systems, employees and families in the community.

Preparation is key: the A-G reminded agency heads to apply the Protective Security Policy Framework and promote protective security as part of their agency’s culture. The directive stated this would build trust with Australians and international partners, with an added bonus:

“A progressive protective security culture that engages with risk will foster innovation, leading to the increased productivity of Government business.”

The Australian National Audit Office agrees. In the latest report in its audit series on Business Continuity Management, released last week, the ANAO examined three agencies for their preparedness and their openness to learning from past incidents. Beyond dealing with the root cause of an incident, Auditor-General Ian McPhee said, agencies must consider and address the impact of interrupted operations on the community:

“When a disruption occurs, often an entity will initially activate emergency response or disaster management arrangements to ensure safety of staff and assets. However, entities also need to have arrangements in place for the continuity and/or resumption of essential services and ultimately return to business as usual.”

As expected, all three agencies (the Civil Aviation Safety Authority, the Department of Finance and the Department of Social Services) had taken the core steps towards complying with the PSPF:

  1. Establish governance structures;
  2. Assess risks;
  3. Identify critical functions, services or assets;
  4. Undertake Business Impact Analysis (BIA); and
  5. Develop a Business Continuity Plan (BCP).

The ANAO set out to see how effective those steps were, as all three agencies had experienced a number of business disruptions since January 2010. These ranged from the minor and inconvenient, such as partial evacuations, to the significant, such as week-long office closures due to extreme weather events. The audit found many examples of good work being done:

“Finance’s approach was the most structured, providing a clear line of sight between the 17 functions it identified as critical and the action that would be undertaken to recover in the event of disruption, including key dependencies and resource requirements.”

Indeed, Finance has incrementally improved its responses using post-incident reports to assess disruptions, in particular identifying the reasons why the BCPs were, or were not, initiated during an incident. Using an in-house template for its post-incident reports, Finance has encouraged discussion of the impact of disruptions, actions taken to resolve the problem, and further actions to improve continuity in the future.

Business continuity management structures and key responsibilities (Source: ANAO, adapted from entity BCM frameworks)

Capturing disruption experience

In August last year, an overnight water leak from a tenant on the floor above flooded a server room, putting Finance’s communications infrastructure at risk and threatening several critical projects. The BCP for the service centre and switchboard was enacted. Although the incident lasted only a day, the post-incident report included a wealth of detail to enable analysis: a detailed event log, staff impact, root cause, corrective actions, implications for IT planning and communications, and a summary of issues arising from the incident, with assigned actions and due dates.

To prepare for incidents that have never occurred before, Finance tests its response with exercises both entity-wide and at the critical function level. In 2012-13 the department ran “Exercise Sparky”, a practice run for a major power outage affecting its office buildings and tenancies.

In the first phase, the central control team declared a business interruption event and activated the BCM arrangements. In phase two, the communications strategies and decision-making processes used by the central control team and enabling services advisors were activated to test the co-ordinated response to the scenario. In phase three, the enabling services advisors separately convened members of their recovery teams to practise a response. The audit found significant value in such exercises:

“Exercise Sparky provided the executive board and BCM stakeholders with insights into the level of assurance Finance’s BCM procedures provide in the event of a business interruption event. The exercise also highlighted some areas for improvement to be followed up.”

In addition, Finance tested individual branches and external entities such as the Reserve Bank of Australia — with its consent.

Thanks to its comprehensive and methodical approach to internal assessment, Finance escaped any major recommendations from the Auditor-General.

The Department of Social Services also conducted exercises, including the “Iron Triangle IV” scenario, which imagined a situation where malware was discovered on key funding and grant management systems, resulting in system outages while the team isolated the problem and sanitised the systems. But not all went as smoothly as hoped.

Social Services found it could improve communication and ensure the crisis response team considered wider implications for the community as a whole. Further, there needed to be a clearer understanding of strategic issues, such as whether BCPs are adequate for the circumstances, and consideration of mission-critical activities and their maximum acceptable outages in priority order.

The ANAO suggests agencies monitor overall preparedness by regularly testing BCPs through exercises and by reporting on real-life disruptions and how the agency responded. It also suggests critically examining whether the business impact analysis reflects an up-to-date prioritisation of critical functions and assets. It further recommended looking upstream for hidden and bottleneck dependencies that could indirectly affect a critical system:

“It is important for entities to identify the activities and resources that support critical business processes, as well as internal and external dependencies. Then the entity can analyse the consequences of business interruption. Finally, prioritisation of the key processes enables the organisation to apply its limited resources in the most effective manner.”
