MDOT forges virtual machine disaster recovery plan

The Mississippi Department of Transportation (MDOT) adopted a two-pronged data backup and disaster recovery plan after virtualizing most of its physical servers, adding Acronis Backup & Recovery to protect servers along with CA ArcServe for tape backup.

MDOT has 3,000 employees at locations around the state, but backups are centralized at one of its three data centers. The department has a Hewlett-Packard Co. (HP) EVA Fibre Channel storage area network (SAN) for virtual machines and SQL databases, and Dell PowerVault iSCSI SANs a few miles apart for disaster recovery. MDOT uses Acronis for bare-metal restore and server backups, which has cut its backup times to minutes, according to enterprise systems architect Clint Johnson.

MDOT’s data protection changed after it began virtualizing its Windows servers with VMware in 2009. By late summer 2010, MDOT had 125 virtual servers, and Johnson said he plans to virtualize two-thirds of its 300 Windows servers by the end of the year.

“The biggest difference [with disk backup] is the whole snapshot technique,” he said. “We’ve basically taken a copy of frozen disk. You have to be aware of the effect that has on your backup. Before we had an agent installed on each machine and backed it up using Windows. For a full backup over Gigabit Ethernet, it now probably takes from eight to 12 minutes.”

With tape backup, Johnson said, MDOT would have to rebuild a failed server from scratch to restore it. That prompted him to look for a disk backup solution to supplement and eventually replace tape.

Johnson said MDOT began using Acronis for bare-metal restore on desktops in late 2009, then started using it for servers. This year he upgraded to Backup & Recovery 10 Advanced Server Virtual Edition with Universal Restore and Acronis Deduplication, which lets him install a single agent on the host to manage backups of all its virtual machines.

“Central management from the console was the missing piece,” he said. “It lets you take a copy of a machine while it’s running. Before, you had to manually schedule backup windows. We kept a spreadsheet that said ‘This backup starts at 1 a.m. and takes an hour, so don’t start the next backup until then.’ We had to constantly juggle our backup windows.”
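The spreadsheet juggling Johnson describes amounts to chaining backup windows so that each job starts only when the previous one ends. A minimal Python sketch of that arithmetic follows; the job names and durations are hypothetical and are not taken from MDOT's actual schedule.

```python
from datetime import datetime, timedelta


def schedule_windows(first_start, jobs):
    """Lay out backup jobs back to back so no two windows overlap.

    jobs is a list of (name, duration_minutes) pairs. The names and
    durations used below are hypothetical examples.
    """
    schedule = []
    start = first_start
    for name, minutes in jobs:
        end = start + timedelta(minutes=minutes)
        schedule.append((name, start, end))
        start = end  # the next job may not begin until this one finishes
    return schedule


if __name__ == "__main__":
    # Hypothetical jobs: the first starts at 1 a.m. and takes an hour.
    jobs = [("file-server", 60), ("sql-server", 45), ("print-server", 30)]
    for name, start, end in schedule_windows(datetime(2010, 9, 1, 1, 0), jobs):
        print(f"{name}: {start:%H:%M} to {end:%H:%M}")
```

Any change to one job's duration shifts every later window, which is why a hand-maintained schedule of this kind needs constant attention.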

Johnson said he plans to gradually eliminate tape. For now he still uses ArcServe to back up files, sending tapes off-site for storage. “We’ve chosen not to interrupt that process,” he said. “Our goal is to get rid of it, but we want to make sure we can stand on our own feet with Acronis and recover everything within a satisfactory period of time before we get rid of the off-site tape and courier.”

MDOT also wrote a script that enables all virtual machines under Acronis protection to be tracked and backed up, regardless of where they’re moved. Johnson said this lets MDOT restore its most critical machines to another VMware farm by using VMotion.
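The article does not show MDOT's script. As a rough illustration of the general idea only, the Python sketch below assumes each protected virtual machine is tracked by a stable identifier rather than by the host it runs on, so a move between hosts does not drop it from the backup plan. The identifiers, host names and inventory format are all hypothetical, and no VMware or Acronis API is used.

```python
def build_backup_plan(protected_ids, inventory):
    """Map each protected VM to its current host, wherever it has moved.

    protected_ids: set of stable VM identifiers (hypothetical format).
    inventory: dict of host name -> list of (vm_id, vm_name) pairs
               describing what currently runs on each host.
    Returns (plan, missing): plan maps vm_id -> (host, vm_name), and
    missing lists protected IDs not found on any host.
    """
    location = {}
    for host, vms in inventory.items():
        for vm_id, vm_name in vms:
            location[vm_id] = (host, vm_name)

    plan = {vm_id: location[vm_id] for vm_id in protected_ids if vm_id in location}
    missing = sorted(vm_id for vm_id in protected_ids if vm_id not in location)
    return plan, missing


if __name__ == "__main__":
    protected = {"vm-001", "vm-002", "vm-003"}
    # Hypothetical inventory after vm-002 has been moved from host-a to host-b.
    inventory = {
        "host-a": [("vm-001", "sql-prod")],
        "host-b": [("vm-003", "web-01"), ("vm-002", "print-01")],
    }
    plan, missing = build_backup_plan(protected, inventory)
    for vm_id in sorted(plan):
        host, name = plan[vm_id]
        print(f"{vm_id} ({name}) is on {host}")
    print("not found:", missing)
```

Keying the plan on the identifier instead of the host means the protected list never needs editing when a machine moves; only the inventory lookup changes.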

“I can take a piece of hardware out of commission in minutes and replace it with something better, then upgrade it or fix it,” he said. “I have some very critical systems that we don’t take down.”

iSCSI helps enable off-site disaster recovery

Johnson said two of the three data centers back up to the Dell PowerVault at the third data center for disaster recovery.

“We use iSCSI for the purpose of off-site backup for disaster recovery,” Johnson said. “In the event we have a crisis, we have a procedure for bringing up only the most critical systems. From the administrative data center, I can write across the Ethernet network to the off-site storage arrays. I mount the storage to the backup server.”

Johnson said bare-metal restore also plays a big role in MDOT’s virtual machine disaster recovery plan. “Mostly we make sure we have a bare-metal of each machine with a recent copy of every SQL database within a few hours of that location,” he said. “We don’t have a DR site where you press a button and everything’s running there. Our outages aren’t critical where we’d have to shift all running systems. We have a DR document with our most critical machines, and we can restore them to another VMware farm.”

As an example of the type of recovery he frequently needs, Johnson pointed to a print server he recently lost.

“It was a print server for about 1,500 users,” he said. “The hardware failed, and we didn’t have like hardware to move it to. We had to restore the physical machine. We already had backup using an Acronis agent on the physical machine. I went to the previous night’s backup and clicked on a virtual machine on the VM farm. It sent out hardware drivers and re-detected its hardware. I did another boot up, and it recreated the server. I cloned the physical machine into a virtual machine, spun the machine up, gave it a new IP address and started using it.”