Was wondering if anyone else out there had significant issues with the freak October snowstorm this past weekend.
Drew was hit particularly hard. We are in an area of New Jersey that got a lot of wet, heavy snow, and we have thousands of large oak trees that lost huge limbs. In addition, we have an interesting situation with electrical power--the town of Madison has its own electric utility and we get our power from them, and they in turn bulk purchase electricity from JCP&L, which had hundreds of thousands of customers without power.
Drew's power (and that of the entire town of Madison) went out on Saturday afternoon around 4:30. All our server rooms have backup generators, which switched on cleanly and remained running all day Sunday. Power hadn't been restored by Sunday night, and the University made the decision to cancel classes. Students were advised to go home if possible and to take along a friend who could not easily go home; the remaining students were shipped to a nearby school and slept on cots in the gym.
Monday a few of us in IT went to campus to determine the scope of the situation. The generators were still running, and we were running temporary power to some switches so we could do work, as one generator ran out of fuel early in the morning. We were able to migrate most essential services out of that data center before the UPS power gave out, and most things were shut down cleanly. However, that room also houses our Internet connection, and thus we were offline for about half an hour until our (excellent) Facilities staff refueled the generator from 5-gallon yellow diesel cans (we have a diesel storage tank on campus, but no fuel trailer to transport the fuel). The 30 gallons they put in would last us at least 8 hours, we were told. The other generators were checked and had enough fuel to last overnight if necessary. At any rate, a fuel delivery was made that afternoon and all the generator tanks were topped off.
Power was restored to campus Monday afternoon, but it turned out only 2 of 3 feeder cables were energized. As we were in an undervoltage situation, the generators didn't shut off. We made the decision not to manually switch the server rooms back to utility power.
Tuesday morning, however, the power situation changed. One part of our campus loop started behaving oddly--it turned out some high-voltage equipment was damaged, but we didn't know that at the time. At any rate, power started fluctuating wildly, and for some reason the generator supporting that data center turned off and wouldn't reactivate. We had about 45 minutes of our server room UPS switching on and off utility power, and we were afraid it was going to get damaged. We again migrated services out of the room, turned off our redundant disk array in the building, and shut down the core network switch to avoid any damage. The equipment was down for several hours while Facilities brought in the municipal electric authority and high-voltage contractors to assess the loop and make temporary repairs to restore normal voltages and service. Permanent repairs are awaiting the delivery of parts--high-voltage equipment is not something electric supply places just have in stock.
A brief assessment shows that we had little or no permanent damage due to the power issues. We may be having issues with a UPS powering an aggregation switch but it was also 6 years old and in need of replacement. Most importantly, except for the brief generator outage and the corresponding loss of our Internet connection, we had little or no disruption in services for the duration. The University remains closed tomorrow, with students asked to return after 4pm Thursday for regular Friday classes. It is likely we will extend the semester, which was previously scheduled to end December 14th, but no official determination has been made.
I'm extremely proud of my staff and colleagues who came in over the last few days to respond to issues as they were occurring; our response likely avoided damage as well as downtime. I'm also thankful for our Facilities staff, who were all working 24-7 keeping things together. This was the longest unscheduled shutdown we've had, far worse than Hurricane Irene, and longer than when our main administrative building was destroyed by fire in 1989. The power issues we experienced were unprecedented. Interestingly, I did not see many people referring to our university emergency response manual, and most of our response was coordinated by a few mid-level administrators who knew their areas and made the hard decisions. We were prepared to shut things down and move to an emergency website, but we never had to activate that option. We used Facebook, Twitter and campuswide email to communicate technology issues, and the University used our emergency notification system to great effect.
It's still too early to talk about lessons learned, but I hope we have the time to sit down and do so. Although it's unlikely we'll have this scenario occur again, we have probably gained knowledge that will help us with the next unimagined crisis.