Feature
posted 11 May 2004 in Volume 7 Issue 1
From the brink of disaster
Recent terror threats, together with the impact of 9/11 and the more recent Madrid bombings, have made firms far more aware of the necessity of disaster-recovery planning. Carolyn Lees, IT director at Kennedys, particularly focuses on the IT perspective of disaster recovery to suggest some effective action plans for facing the future with confidence.
Imagine the following dialogue between an IT director and managing partner:
ITD: “In the event of a major disaster, what would you be willing to lose?”
MP: “Well, nothing really, or perhaps we could do without the photocopiers.”
ITD: “So we should restore all of our services, except photocopiers. Within how many hours?”
MP: “Well, being realistic, our clients would expect us to be up and running after 24 hours. We have legal obligations to fulfil.”
ITD: “OK, if we want to put everything back in place within 24 hours, it will cost us £2m per year.”
MP: “How much?!”
This is not a far-fetched scenario. What are we willing to pay? Of course, our awareness of threats to our business has increased since 9/11. After all, the cost of the physical damage alone was estimated at $19bn, but the ripple effect of loss that spread over numerous economic activities ran into thousands of billions.
An efficient and effective disaster-recovery plan utilises resources in the best way possible and delivers the desired outcome. A cost-effective plan has to ensure that the value-for-money element is built into the various stages of the planning cycle. Blanket statements, such as the one made by our imaginary managing partner, that all services are important, reveal the extent of the task at hand. Before a plan can be formulated, assumptions have to be challenged, risks assessed, impacts estimated and money made available. These are the many building blocks of a complex puzzle. Let’s start by looking at the type of risks a business faces and work our way through the impact assessment and the elements that can deliver efficient and cost-effective disaster-recovery strategies.
Terrorist attacks have raised awareness of the possible impact of a major disaster on our own businesses. The loss of lives and business opens our eyes to our vulnerability in the face of not only horrific terrorist attacks, but also any form of natural or other disaster that could affect us. Most recently, the Madrid bombings were a warning not to let the disasters of yesteryear fade into a distant memory. It is a risk that none of us can afford to take.
In the US, the investment in disaster recovery following the 1987 stock market crash and the 1993 World Trade Center bomb led to much higher levels of business continuity than existed previously. Cantor Fitzgerald, which lost 680 out of 1,000 employees, was operational for bond trading two days after 9/11 [1].
The Confederation of British Industry is urging the government to put more pressure on companies to deal with business continuity. After all, statistics show a survival rate of less than ten per cent for those without a plan [2]. Many businesses, including our clients, are insisting on stringent business-continuity plans being in place as part of the terms and conditions of appointment. Do we need more reasons to act? While terrorist attacks may be prevalent in our minds, they are not our only source of vulnerability:
- Virus attacks threaten us daily and could bring our networks to a standstill;
- Hackers who delight in causing disruption are itching to gain access to our networks to cause the maximum amount of damage;
- Damage from fire or floods can cause large-scale disruption;
- Power failures lead to disruption and loss of data;
- Server or communications failure.
What premium do we pay to cover that risk and what is the risk in financial terms?
Calculating the financial impact of a disaster or interruption to business is not an exact science. Even applying standard guidelines of what to include in an analysis of impact cannot reveal an exact amount. If all of our services are important to us and we expect, as per our managing partner’s statement, a return to business after 24 hours, do we calculate the loss of business as a daily proportion of the balance-sheet value of the business today? If so, this does not include the intangible assets that we have not recorded on the balance sheet, such as our intellectual capital, the consequential loss of business, the damage to our reputation and so forth. The final result of our business-impact calculation will not make for happy reading. The notion of an annual £2m premium may already not seem quite so daunting.
From an investment point of view, the majority of costs to cover business continuity concern IT. Of course, there are a host of non-IT related issues that need to be addressed, such as the PR strategy in the event of a disaster, access to bank accounts, the formulation of an internal-communications strategy and so forth, but the bulk of the cost covers that fundamental IT building block.
Assessment of our risk must take our stakeholders’ interests into account (see figure one). However, a plan cannot fully accommodate all stakeholder interests. For example, the library manager will deem it critical to have access to online catalogues; the training manager to training records; and the HR manager to personnel files. The essential starting point is to address:
- Who are your main stakeholders?
- How important are the stakeholders to the business?
- What are their interests?
- What are the IT systems that are critical to uphold the main stakeholders’ interests?
- How quickly do we want to have which critical IT systems up and running again?
- Which IT systems are not critical?
- How quickly do we want to have non-critical IT systems up and running again?
Establishing what is critical relies on us understanding what the stakeholders require and which IT systems meet those requirements. Making a distinction between critical and non-critical systems allows the firm to apply the disaster-recovery strategy at two levels:
- How do we provide critical services within x hours?
- How do we provide non-critical services within x+y hours?
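The two-level approach can be sketched in a few lines of Python. This is only an illustration: the article leaves x and y open, so the values of 24 and 48 hours below are assumptions, as is the choice to treat the library catalogue and training records as non-critical. The names `Service` and `recovery_order` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical recovery windows: the article does not fix x or y.
X_HOURS = 24  # assumed target for critical services
Y_HOURS = 48  # assumed extra time allowed for non-critical services


@dataclass(frozen=True)
class Service:
    name: str
    critical: bool

    @property
    def recovery_target_hours(self) -> int:
        """Critical services get x hours, non-critical ones x + y hours."""
        return X_HOURS if self.critical else X_HOURS + Y_HOURS


def recovery_order(services):
    """Sort services by recovery target, soonest first (stable within a tier)."""
    return sorted(services, key=lambda s: s.recovery_target_hours)


# Illustrative classification only; each firm has to make its own.
services = [
    Service("training records", critical=False),
    Service("e-mail plus diary", critical=True),
    Service("library online catalogue", critical=False),
    Service("accounts system", critical=True),
]

for s in recovery_order(services):
    print(f"{s.name}: restore within {s.recovery_target_hours} hours")
```

Writing the classification down in this form forces the firm to give every system an explicit recovery target, rather than relying on the blanket statement that everything is important.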
On completion of the risk assessment, the impact analysis and the definition of critical and non-critical services, we can move onto the next stage of the process – putting the plan together. Now, we need to look at our resources, the required outcome, and the efficiency factors that can be built into the plan.
The telecommunications infrastructure of your business is central to this plan and should act as the starting point because, without it, resources cannot communicate. Your network architecture shows your current communications map. Structures can vary from a single office to multiple national or international office sites.
The central-communications infrastructure then has to be adapted to accommodate the identified critical services in the most resourceful way possible.
For many years, businesses have used the term ‘virtual’, describing a simulated function that would normally have a physical manifestation. An example of this type of service in the legal sector would be an online deal room. The virtual environment has opened up many more possibilities of transparent boundaries. In the 80s and 90s, the term ‘outsourcing’ more often than not referred to the uprooting of an entire IT department into the hands of an outside supplier. The IT industry had not developed sufficiently to offer the level of fragmented outsourcing services that we are able to purchase from a variety of vendors today. Nowadays, we can easily operate beyond the boundaries of our physical structure to take advantage of the many services that suppliers offer to support our IT operations. Vendors are competing heavily to supply services, from Application Service Providers (ASPs) to off-site storage and off-site back-up facilities, to name but a few. We make our selection of what we want to outsource and what we don’t. It is here that we have to apply our assessment and knowledge to make our disaster-recovery investment work.
Our approach to critical services should be such that we look for a business-continuity strategy that we can apply to the shortest disruption, as well as to a serious disaster.
In figure one, we can see that the list of critical systems consists of e-mail plus diary, DMS, virus protection, CMS, remote access, accounts system, back-up services, firewall and internet access. Applying the outsourcing model, the firm can build resilience into its infrastructure upfront. Instead of using expensive back-up tapes, which, in addition to the tapes themselves, have high maintenance costs, an off-site vaulting service can be used. Restoring files can be tested regularly and with ease, thereby ensuring reliability. Data-storage requirements can be easily negotiated in a competitive-suppliers market. This means that your data is secure and you can rely on its integrity for restores. If an incident requires the rebuild of a critical server, then you can restore data across your back-up line without having the delays and anguish of restoring tape.
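One simple way to make a regular restore test concrete is to compare a checksum of the original file with a checksum of the restored copy. The sketch below assumes both files can be read from the machine running the test. The function names are hypothetical and it is not tied to any particular vaulting product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, restored: Path) -> bool:
    """True if the restored copy is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

A scheduled job could restore a sample of files from the vault each week and call `restore_matches` on each pair, flagging any mismatch for investigation.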
Virus protection for e-mails can be provided by outsourced services such as Messagelabs, so that you know that infected e-mails never hit your system. Virus protection from infected files should be provided by different scanning engines so that you maximise your protection. Use one product for desktop protection, another one for HTML protection and a different one still for additional in-house e-mail virus protection. Install an intrusion-detection system so that you know what is happening on your network before it can affect any services. Make the investment pay off by providing the highest possible level of cover during your day-to-day operation, adopting the adage that prevention is better than cure. After all, a serious virus attack can bring your entire network to a standstill.
E-mail is undoubtedly considered the quintessential IT service of today. Ensuring that you can provide hardware as well as software resilience is paramount in business-continuity planning. We all know that when the e-mail system fails, the phones in the IT department jump off the hook. A cluster solution is one way to provide hardware resilience, but software errors occur more frequently. It is therefore equally important, if not more so, to address this by investing in replication technology. Ensure that the replication server sits off-site and build it into the disaster-recovery strategy. That way you can address small disruptions as well as long outages with equal efficiency.
The data link, if running a replication service to an existing site, must have sufficient bandwidth to accommodate the traffic flow. Disaster-recovery sites will need to have a dedicated link.
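Whether a link has ‘sufficient bandwidth’ can be estimated with simple arithmetic: the volume of data that changes each day, divided by the time available to send it. The sketch below is a rough sizing aid under stated assumptions, not a vendor formula. The function name, the 70-per-cent usable fraction and the example figures are all hypothetical.

```python
def required_mbps(changed_gb_per_day: float, window_hours: float,
                  usable_fraction: float = 0.7) -> float:
    """Link speed (megabits per second) needed to ship a day's changed data
    within the given window, assuming only part of the line rate is usable."""
    if window_hours <= 0 or not 0 < usable_fraction <= 1:
        raise ValueError("window must be positive and usable_fraction in (0, 1]")
    megabits = changed_gb_per_day * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = window_hours * 3600
    return megabits / seconds / usable_fraction


# Hypothetical example: 20 GB of changes replicated over an 8-hour window.
print(f"{required_mbps(20, 8):.1f} Mbps")  # prints "7.9 Mbps"
```

If the replication has to keep pace throughout the working day rather than overnight, the window shrinks and the required link speed rises accordingly.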
Model one only focuses on a small component of a much larger picture. What, for example, would happen if firm A suffered a bomb attack? The server at the disaster-recovery site would be idle, disconnected from its source. The picture, therefore, has to be expanded to take into account the central component of any effective disaster-recovery plan: the telecommunications infrastructure.
Continuing with the example of the e-mail service as above, model two utilises a second disaster-recovery site (separate grid and telephone exchange) that can access the e-mail server, but is also already vaulting back-up data on a daily basis from the main site. The expanded model shows that we can spread our risk while utilising the investment – in this case, the link to the back-up service provider. You could add replication services for critical systems at either site. You can also make arrangements for hardware-standby servers at the back-up site for fast data restores.
The basis of safety in a disaster lies in data integrity and distance. Data integrity can be checked through regular testing. Distance is provided by the most expensive building block in a disaster-recovery plan, namely the telecommunications infrastructure. The focus of utilising resources efficiently, therefore, requires maximum usage of your telecommunications.
If your wide-area-network topology is based on routing all traffic via a central IT hub, then the connection from a disaster-recovery site to other offices has to be arranged, otherwise the failure of the IT hub will bring down all other offices. Question whether your telecommunications infrastructure is flexible. Does it rely on expensive leased-line connectivity? If so, consider the option of using VPN services, such as BT Equip, which will allow any location to ‘join the cloud’ and talk to any other office. It does not rely on dedicated inflexible point-to-point leased lines.
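The hub dependency described above can be checked on paper, or with a short script that models sites and links as a graph and asks which sites remain reachable once the hub has failed. The site names and links below are hypothetical and the model ignores capacity. It only tests connectivity.

```python
from collections import deque


def reachable(links, start, failed=frozenset()):
    """Return the set of sites reachable from `start` over undirected links,
    ignoring any site listed in `failed`."""
    neighbours = {}
    for a, b in links:
        if a in failed or b in failed:
            continue
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        site = queue.popleft()
        for nxt in neighbours.get(site, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical sites: every office routes through a central IT hub.
hub_and_spoke = [("hub", "office_a"), ("hub", "office_b"), ("hub", "dr_site")]
# Same network with the disaster-recovery site also linked to each office.
with_dr_links = hub_and_spoke + [("dr_site", "office_a"), ("dr_site", "office_b")]

print(sorted(reachable(hub_and_spoke, "dr_site", failed={"hub"})))
# prints ['dr_site']
print(sorted(reachable(with_dr_links, "dr_site", failed={"hub"})))
# prints ['dr_site', 'office_a', 'office_b']
```

In the first case the disaster-recovery site is stranded when the hub fails. In the second, the offices can still reach it, which is the property an any-to-any VPN service is meant to provide.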
Many firms consider using existing offices as disaster-recovery sites rather than dedicated hot sites. Would it be an effective measure if a London firm had another office site safely tucked away in the country where the threat from a terrorist attack would be highly unlikely? Our first reaction would, of course, deem it a sensible option. The office probably exists on a different electricity grid and telephone exchange. The difficulty arises when you start to think about what it is you want this extra site to provide as a back-up. If you have set up another office site to act as a replacement ‘hub’ for your IT department, then that in itself requires enormous investment. All the logistical essentials of an IT computer room would have to be replicated: back-up power supply; fire and smoke alarm; and air-conditioning. Add to that the need to accommodate key staff from another office. Does your other office site have unoccupied space to accommodate these staff and if so, have you calculated the cost of this unused office space or do you need to oust other staff to make room for the key personnel? Have you calculated the cost of the accommodation expenses of the key personnel, as the new office may well be too far from home to commute every day? Are there sufficient telephone lines to deal with the diverted calls? These are all practical and critical considerations that need to be taken into account when you think about the efficiency of the plan.
A professional disaster-recovery hot site can provide the infrastructure required for restoring critical and non-critical services efficiently. If you run replicated services to a hot site, then the outage is dramatically shortened. If you use your back-up site for server space and restores, your restore time is dramatically reduced. Spread the risk and, as long as your supplier arrangements have been negotiated properly, and you have the communications infrastructure in place, you will not incur any unnecessary delay in restoring critical or non-critical services.
The effective element of a disaster-recovery plan ensures that the interests of the main stakeholders are addressed. An efficient plan then ensures that there are acceptable levels of continuity for all types of outages, ranging from short disruptions to major catastrophes. Making the resources work for us during our daily operations provides us with peace of mind as we can see the processes work. A resilient and flexible telecommunications infrastructure is an essential building block in a successful disaster-recovery plan. Testing the plan is the final and crucial element.
References:
1. Colorado University Response Report #140
2. Touche Ross
3. Software Solutions
Carolyn Lees is IT director at Kennedys. She can be contacted at: c.lees@kennedys-law.com.