Facility issues

Created by Theresa Rowe (Oakland University) on January 3, 2007
We shut down our offices on 12/22 with the plan that we would have minimal physical need to visit our data center. We have an aging data center, with air conditioning units that are over 20 years old (actual age unknown). We received replacement funding just last October, but seasonal weather postponed the actual replacement project until spring. As we started planning for the replacement, we learned about zinc whiskers and the need to minimize dust contamination, so the extra time for planning has been valuable.

During the holiday break, our lead UNIX sys admin, Andrew, noticed at about 5 PM on Christmas Eve that email was not available. He went to the data center to check and found the temperature over 102 degrees. Both AC units were in a failed state. Andrew called, and our facilities manager Gail and I went in. Some of the servers had gone into automated shut-down due to the heat. Andrew proceeded to shut down everything except the most essential systems. We keep fans around because we've experienced failures before, so we opened the doors, turned on the fans, and called in the contracted AC technicians.

It was Christmas Eve, so it took a while to get a response. Gerard, the AC tech, came in and found that the main electrical feed to the roof chillers was not working. We then had to call in the university electricians (another drive wait), who found the circuit breaker tripped. The breaker was reset, cooling was restored, and we went home around 2 AM.

Because we had no explanation for the cause of the breaker trip, we came in on Christmas Day, only to find the situation repeating. Work continued on Christmas Day and the day after, as our electricians found that the power feed from the 480 volt sub-station to the distribution panel had developed a fault to ground, which caused the breakers to trip. A temporary fix was created by re-wiring the AC units to another panel. In the days since, we have had a few servers fail and have had to switch to our stand-by servers.
This was certainly a test of what is really "critical" and of whether we have the right standby systems. Another mini-test of our disaster recovery plan! We now have to add a permanent electrical fix to the AC replacement project. We certainly will want this wired differently.

A few nice things to have around -
What we needed but didn't immediately have -
One challenge was finding and waiting for the needed staff with expertise to work on and solve problems. We'll also be following up with our heating/cooling staff, who can now monitor the room temperature from their location. We didn't know that the service was available and had actually been planning to install our own system.
Unless otherwise noted, EDUCAUSE holds the copyright on all materials published by the association, whether in print or electronic form. In certain cases the work remains the intellectual property of the individual author(s) (see Special Circumstances). Content from conference speeches, presentations, blogs, wikis and feeds reflect the opinions of the author, and not necessarily those of EDUCAUSE or its members.