EDUCAUSE Enterprise 2006. Summary: Business Continuity Planning
Business Continuity Planning
Brian D. Voss, CIO, Louisiana State University
May 25, 2006
Business continuity planning has become a major resource demand in IT over the past few years, made more urgent by the hurricane disasters in the Gulf region this past year. As CIOs and institutions take stock of the potentially broad impacts of threats from natural disasters to the possibilities of environmental accidents, terrorist attacks, and pandemics, the IT community is coming to realize that business continuity planning is more than an IT disaster recovery plan. This discussion will provide a framework for developing or refining your own college's business continuity plans and the IT disaster recovery plans within them.
Brian Voss stated that traditional disaster recovery asks “What if my data center is lost?”
In the aftermath of Katrina, Rita, and other disasters, he suggested that broader disaster recovery is needed. The new questions are “What if my campus is lost?” and “What if the city where my campus is located is lost?” Additionally, we should think in terms of survivor disaster recovery: “What if I am fine, but everyone around me is not?”
Voss asked “Do you have a workable DR plan? Have you had to use it? How’d that go? Do you drill on this plan?” It seems today that no institution is happy with its traditional DR plan.
The second line of queries was “What has been your experience with your administration’s responsiveness to DR/BCP? Has it changed since 9-11 or Katrina/Rita?” What is the risk vs. the cost of doing BCP? How do you talk to your administration about the issues? For some, the answer depends upon where they are located. Vermont, for example, doesn’t perceive the same risk as coastal or heavily populated areas.
Ohio State recently did a DR audit and decided something had to be changed. They gave the responsibility for campus-wide DR to the CIO. They do an annual drill exercise and each year it is a different scenario.
Voss, with his significant personal experience in Louisiana, made the point that it is the data that is important. Hardware can be replaced; data cannot be. If we lose our data centers, the need for rapid response will be acute and immediate, since our institutions cannot operate without IT.
Regarding broader disaster recovery, Voss noted the following questions that should be resolved:
- Are our off-sites conveniently (and perhaps tragically) close? If so, perhaps they should not be.
- Do we have arrangements to get key services restored at a distance? What’s critical to the operation of our institutions? Web, email, financial/HR, etc.
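The first question above can be checked mechanically. The following is a minimal sketch, not anything presented in the session: it computes great-circle distances with the haversine formula and flags off-site locations that sit too close to campus. The site names and coordinates are hypothetical, and the 100-mile default is an assumption that echoes the distance LSU cites for its rapid recovery site later in this summary.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def sites_too_close(campus, sites, min_miles=100.0):
    """Return the names of off-site locations nearer to campus than min_miles."""
    return [name for name, (lat, lon) in sites.items()
            if great_circle_miles(campus[0], campus[1], lat, lon) < min_miles]


# Hypothetical coordinates, for illustration only (not real facilities)
campus = (30.0, -91.0)
sites = {
    "tape-vault-across-town": (30.05, -91.05),
    "recovery-site-upstate": (32.5, -92.0),
}

print(sites_too_close(campus, sites))  # ['tape-vault-across-town']
```

Straight-line distance is only a first filter. A site can be far away and still share the same flood plain, power grid, or evacuation route.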
The general consensus was that, when disaster strikes, people in administrative positions don’t remember that they were unwilling to talk about or fund Business Continuity Planning (BCP) in the past. While no one is thankful for 9-11, Katrina, or other recent disasters, most are happy to see that these events have triggered the serious involvement of internal affairs units and the requirement that BCP audits be conducted regularly.
Two Key Lessons Learned
- People are our most important resource – but expect them to be burdened with other priorities
- Knowing what we’ll need to do and having it organized is more important than knowing all about “how” we’ll do it when we get there.
Voss suggested considering a data center lifeboat plan. He was able to fund one for $82K. The lifeboat plan is essentially a response to the question: What if we had very short notice (4-8 hrs) of an impending disaster? What do we ‘scoop up’ and take with us in an 8’ x 12’ van?
Key things to recover and key things to address:
- The value of a flexible and capable staff is paramount. But we have to remember that they are dealing with unimaginable demands. How do we prepare them? How can we support them in times of crisis?
- Is there a stock of equipment to set up a large support operation, perhaps in another location, in short order?
- How will we do our DR/BCP on top of our normal jobs, as campus life resumes and student enrollment increases?
- How ready is our campus administration to take on the role of disaster response center? They may not have a choice. Will we be working side-by-side with them? Everything is affected by IT and its support. Will we be sitting at the “big” table?
- Having a good stock of networking equipment and mobile and desktop computing available can be helpful in supporting not only our own institution but other DR teams that set up support programs at the institution.
- Having strong relationships with key vendors can be helpful in times of crisis.
- Architectures count in DR/BCP. How divisible are the components of your systems? What’s removable as a component and what’s too tightly integrated?
- Be prepared to be flexible: adapt, improvise, and overcome. And remember that we can’t have thin skin in times of crisis. Keep your friends close. Don’t have enemies.
- Everything we’ve been saying about the strategic value of IT is valid: IT enables everything in the 21st century. Does our administration understand this and understand that DR/BCP is not a luxury?
Getting Started – based on Voss’ experience
- Work on a basic nimble document
- Address the 3 disaster levels
- Request funding (and seed it with funding from your own budget)
- Construct a “permanent” emergency operations center (EOC) rather than relying upon a hastily assembled one [LSU’s cost $150K]
- The LSU Chancellor put out a call for DR/BCP plans across key campus units. They met with the EOC commander to review plans and coordinate preparation
- Changed the location of their server backups from Port Allen to higher ground farther away.
- The organizational focus on DR/BCP led to the creation of an IT officer position with IT policy responsibility
- Started to update disaster plans (from 1984)
- Provided for payroll contingency – able to continue salary payments
- Moved long distance offsite storage to another location
- Working on the security of tapes in light of breach notification law
- Established LSU rapid recovery site 100+ miles away (not south)
- Increased stand-by presence
- Formal hot-site contract for mainframe support
- Email emergency service contract implemented.
Taking tapes off site – what is “safe”? Are we encrypting the tapes to protect the data? This is another cost that needs to be figured into the budget.
- Conduct a risk evaluation and business impact analysis
- Define and prioritize your mission critical systems.
- Decide what must be up within 24 hrs
- Identify your backup/recovery site
- Vendors might be able to provide offsite storage of mission critical backup tapes, remote data centers, and temporary office locations
- Consider co-sourcing or reciprocal agreements with other regional higher education institutions for facility and equipment use
- Develop a plan with your key hardware vendors to readily replace any damaged hardware/communication systems.
- Develop and document a communication and contact plan
- Be wary of wireless – cellular circuits can quickly become overloaded and unavailable during a regional or national incident.
- At your centralized command/communication center, use a variety of communications links such as web, cell, fax, landline, radio, sticky-note bulletin board, etc., with the hope that some of them will still work during and after the incident.
- Don’t forget about the people-side of your institution.
- Finally, document and distribute your plan
Voss noted that in a disaster the rules and the plan “go out the window.” We must focus on the process of planning and not on the plan itself, examining how we will position ourselves and our assets to be flexible in responding to a disaster. We should focus on knowing what will need to be done in the first stages, what we’ll need to do those things, and who will do them. Plan to be flexible. Plan to improvise, adapt, and overcome. Drill on these things.
Each disaster is different. They say that New Orleans will be evacuated 8 times this year because people are living in temporary trailers that can’t withstand smaller storms. Many things are now patched and not in the good shape they were in before the first major catastrophe, so they will go down more quickly the next time.
Three musts (LSU cost: $180K initial, $80K per year)
- Web site – back up in 4 hrs
- Email – back in 12 hrs
- Produce payroll – within 24 hrs
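Recovery targets like these are easiest to hold to when they are written down in a form a drill can be scored against. The following is a minimal sketch, assuming the three targets above are encoded as data and compared with restore times measured during an exercise. The `RecoveryTarget` and `missed_targets` names and the drill figures are hypothetical, not part of LSU's plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTarget:
    service: str
    max_hours: float  # time allowed to restore the service


# Targets taken from the "three musts" listed above
TARGETS = [
    RecoveryTarget("web site", 4),
    RecoveryTarget("email", 12),
    RecoveryTarget("payroll", 24),
]


def missed_targets(drill_hours):
    """Return services that were restored too slowly or not exercised at all."""
    missed = []
    for target in TARGETS:
        measured = drill_hours.get(target.service)
        if measured is None or measured > target.max_hours:
            missed.append(target.service)
    return missed


# Hypothetical drill results: hours taken to restore each service
drill = {"web site": 3.5, "email": 14.0}

print(missed_targets(drill))  # ['email', 'payroll']
```

Treating an unexercised service as a miss is a deliberate choice here: a target that has never been drilled has not been shown to be achievable.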
How deep does the information in the plan go and to whom do you distribute it?
Voss responded: Thumb drives will have phone numbers and who can do what at all management levels. Exercising the plan is good, but if something actually comes up, LSU will execute the plan, not go back and read it.
Again: the three most important points were
- Hardware and facilities can be replaced in the periods following a disaster
- Data is the primary focus of what you need to be prepared to restore and the basis of continuity
- People are your most important asset
IT personnel are first responders, and in these kinds of circumstances we should think of each of them as part of a family unit, not as an individual. We may have people leave the scene to go home or to get away from the disaster.
Everyone on campus uses and relies upon IT today. If anyone is on campus our people need to be there. Are our folks ready to be first responders? It can be emotionally stressful. Is the CIO/CTO prepared to deal with this aspect of the role of IT and IT Staff?
If the campus is evacuated, how do we continue to support learning? If it is quarantined, how long can our people stay?
From the lessons learned from NASA disasters: we have had a failure of imagination.
We need to imagine the questions first so that we can find the answers.
As a community we need to seek answers. How can we leverage national cyber-infrastructure for broader approaches to DR/BCP?
Voss closed by saying that CIOs can no longer say they can’t imagine what could happen, because it just did: earthquake, tsunami, terrorist attack, accident, pandemic.
Brian Voss has posted his presentation slides to http://www.educause.edu/upload/presentations/ENT06/SESS13/ENT06%20BCPPKE.bdv.ppt