BaJohn
Mar 11, 2015 Virtuoso
Failures of various RAID modes.
I'm intrigued by the failures. dbott67 wrote: ........ and 2 multiple disk failures where I had to replace the drives and restore from backup. In each case, I was able to recover without dat...
dbott67
Mar 12, 2015 Guide
Hi BaJohn (and StephenB),
With respect to the ReadyNASes, the one dual-disk failure was on a ReadyNAS Pro4 with 3 x 1 TB drives, IIRC configured in X-RAID2 (essentially a RAID5 expandable array). When the owner replaced the suspect drive (it had thrown a bunch of re-allocated sector errors) a second drive barfed during the resync and the unit went into life support mode.
I came onsite and did a bit of troubleshooting, but determined that it would just be faster and easier to restore from backup. I had configured his offsite backup unit (kept at home) to be exactly the same as his work unit and we were doing nightly RSYNC backups, so we dropped it on his work network and they were back in business in about 5 minutes. Then, I reconfigured the other NAS with replacement drives and restored the data and configuration. After the backup job restored the data, he took his home unit back home and everything has been working smoothly since (a couple of years now).
The other failure was probably recoverable, but I was at the proverbial cross-roads, so to speak. During the RAIDiator development, there was a version released that required a factory default in order to take advantage of new features, and I suffered a glitch that put my NAS in life support mode. Seeing as I had a complete backup, I decided that maybe the powers above wanted me to do a factory reset and now was as good a time as any. Not a big issue for me, as it was on my home unit.
With respect to the failed servers, we had a few different problems.
1. Mail Server - Dell PowerEdge 2850 with PERC controller, RAID5 (3x146 GB) - Looks like the RAID controller died and, seeing as I had just ordered the new integrated blade/SAN server (see below), I decided to restore onto a backup server and then migrate to a VM upon arrival of the new equipment. Re-installing the OS and mail software and restoring the data on the backup server took a few hours, but it was a minor inconvenience. When I migrated to the new VM hardware, I did fresh installs of the OS (Win2012-R2-DCE) and again restored from backup, and then cut over from the old server to the new VM with just a couple of minutes of downtime to capture the last delta, restore and then swap IP addresses.
2. Phone PBX - last week, the solid state drive in our Mitel 3300 PBX died. I sent a copy of the backup to our vendor and they staged a new PBX for us and then couriered it to us. I dropped the replacement on the network and we were back in business.
3. SIP Server - last fall, one of our other Dell servers failed to start after a reboot. Again, I just fired up a backup server and restored the config and we were back online pretty quickly.
At my place of work, we have some redundancy and resiliency, but we can't really afford some of the solutions that offer next-to-zero downtime with duplicate hardware, etc. We can tolerate a few hours of service interruption (i.e. our UPSes only offer around 30 minutes of protection) or the occasional failed hardware. In the event our main ILS system goes down, we have provisions to allow us to continue working and then upload the transactional data to the server when it comes back online.
What we can't afford is lost data, so I take great effort in making sure that the data is backed up and replicated off-site.
I recently purchased an integrated 4-bay blade server with 25-bay SAN storage. Currently, I've got it configured with 10 x 900 GB SAS drives in RAID50, with 2 hot spares for a total of 12 drives. This unit is hosting 8 VMs which are backed up to a Dell AppAssure DL-1000 appliance. I also back up just the data from various servers using RSYNC to the ReadyNAS 2100s and then replicate offsite. The DL-1000 provides a complete image of each VM plus snapshots every 3 hours.
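For anyone wondering what 10 x 900 GB in RAID50 works out to, here is a rough sketch of the arithmetic. RAID50 stripes across several RAID5 spans, and each span gives up one drive's worth of space to parity. The post doesn't say how many spans the array uses, so the value of 2 below is purely an assumption, and the function name is made up for illustration. Hot spares sit idle and add no capacity, so they are left out.

```python
def raid50_usable_gb(total_drives: int, drive_gb: int, spans: int) -> int:
    """Raw usable capacity of a RAID50 array, before filesystem overhead.

    Each RAID5 span loses one drive's worth of space to parity.
    """
    if spans < 2:
        raise ValueError("RAID50 needs at least two RAID5 spans")
    if total_drives % spans != 0:
        raise ValueError("drives must divide evenly across the spans")
    per_span = total_drives // spans
    if per_span < 3:
        raise ValueError("each RAID5 span needs at least three drives")
    return (per_span - 1) * spans * drive_gb


# 10 x 900 GB drives; spans=2 is an ASSUMPTION, not something stated in the post.
print(raid50_usable_gb(10, 900, spans=2))  # 7200
```

With a different span count the figure changes, so treat 7200 GB as an illustration rather than the actual size of this array.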
The RSYNC backups allow me to quickly recover files at a granular level (plus 6 months' worth of snapshots). They also let us recover from a major catastrophe in our data centre (such as a fire), although that might take some time, as we would have to order new equipment, stage the servers, restore the data and find a datacentre to host everything.
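The post doesn't describe how the snapshots are produced (the NAS may well handle that itself), so the following is not the setup described above. It is just a minimal sketch of one common way to get dated, file-level snapshots out of plain rsync: each run writes to a new dated directory and uses `--link-dest` so unchanged files are hard-linked to the previous snapshot instead of being copied again. The paths and the function name are hypothetical, and the function only builds the command without running it.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import List, Optional


def build_rsync_snapshot_cmd(source: str, dest_root: str, today: date,
                             previous: Optional[date] = None) -> List[str]:
    """Build (but do not run) an rsync command for a dated snapshot."""
    snapshot_dir = PurePosixPath(dest_root) / today.isoformat()
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, keeps perms/times)
    if previous is not None:
        # Unchanged files become hard links into the previous snapshot.
        link_dest = PurePosixPath(dest_root) / previous.isoformat()
        cmd.append(f"--link-dest={link_dest}")
    # Trailing slash on the source copies its contents, not the directory itself.
    cmd += [source.rstrip("/") + "/", f"{snapshot_dir}/"]
    return cmd


# Hypothetical paths, for illustration only.
cmd = build_rsync_snapshot_cmd("/srv/data", "/mnt/nas/snapshots",
                               date(2015, 3, 12), previous=date(2015, 3, 11))
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the paths are real. Pruning old snapshot directories (for example, anything older than six months) would be a separate step.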
Below is the Dell VRTX with 2 x M620 blades (total 24 cores, 128 GB RAM) and 8 x 900 GB SAS drives (I've since added 4 more). The top unit is the Dell AppAssure DL-1000, followed by the VRTX, the ReadyNAS 2100 and then a few of our Dell PowerEdge 2850/2650/2550 servers (there are 10). Most of the PowerEdge servers have been or will be migrated over to the VRTX.

