Forum Discussion
rcarr6502
May 19, 2018, Tutor
Advice on mixing and interleaving disks, RAID 10?
I'm considering buying a ReadyNAS 628X and configuring it for RAID 10. To increase the reliability of the array, I've been thinking of using two drive vendors, populating it with HGST NAS disks and ...
rcarr6502
May 19, 2018, Tutor
Thanks, RolandWausE & StephenB.
Yes, I'm using Gigabit Ethernet.
Why RAID-10? I've been running XRAID-2 (RAID-6) for a while; it's saved me at least twice when I experienced two-disk failures. But everything I've read suggests RAID-6 is less safe as the number of disks or volume size increases.
http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/
I'm leaning toward RAID-10 for reliability, ease of data recovery and reasonable performance. RAID-6 has the advantage of maximizing disk space, but I'm willing to sacrifice disk space since resiliency is more important to me. Disk space is cheap now, and disks are so large that growing the array horizontally over time (via X-RAID) isn't that important to me.
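To put a rough number on the disk-space trade-off, here is a minimal sketch in Python. The 8 bays match the 8-disk array discussed in this thread, but the 10 TB disk size and the function name `usable_capacity_tb` are assumptions for illustration only, and filesystem overhead is ignored.

```python
def usable_capacity_tb(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity in TB for equal-sized disks, ignoring filesystem overhead."""
    if level == "raid6":
        if disks < 4:
            raise ValueError("RAID-6 needs at least 4 disks")
        # Two disks' worth of space goes to parity.
        return (disks - 2) * disk_tb
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID-10 needs an even number of disks, at least 4")
        # Every disk is mirrored, so half the raw space is usable.
        return (disks // 2) * disk_tb
    raise ValueError(f"unsupported level: {level}")


# Hypothetical example: 8 bays filled with 10 TB disks (disk size is an assumption).
for level in ("raid6", "raid10"):
    print(level, usable_capacity_tb(level, disks=8, disk_tb=10.0), "TB")
```

With these assumed numbers, RAID-10 gives up two disks' worth of usable space compared with RAID-6.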
Based on what I've read:
* RAID 5/6 is most dangerous when the array is degraded and rebuilding. That's exactly when the 2nd disk has failed in my two experiences. RAID 5/6 accelerates failure of remaining disks in a way that RAID-10 does not.
* RAID-10 will rebuild its array much more quickly -- since it just has to copy missing data from a disk in one mirror to the other.
* RAID-10 can survive a minimum of 1 disk failure up to N/2 disk failures. It's true that if both disks in the same (RAID-10) mirror fail simultaneously, the array is dead. But the chance that a second disk failure will take out the array should be 1/(n - 1): 1/7 in an 8-disk array (according to:
https://aput.net/~jheiss/raid10/)
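The 1/(n - 1) figure can be checked by brute force. The sketch below enumerates every ordered pair of failed disks and counts the cases where the second failure hits the mirror partner of the first. It assumes every surviving disk is equally likely to fail next, which is itself debatable, and the pairing of disks 0/1, 2/3, and so on is a hypothetical layout chosen for illustration.

```python
from fractions import Fraction


def second_failure_kills_raid10(n_disks: int) -> Fraction:
    """Chance that a second, uniformly random disk failure hits the
    mirror partner of the first failed disk."""
    if n_disks < 4 or n_disks % 2:
        raise ValueError("RAID-10 needs an even number of disks, at least 4")
    fatal = 0
    total = 0
    for first in range(n_disks):
        partner = first ^ 1  # assumed layout: disks 2k and 2k+1 mirror each other
        for second in range(n_disks):
            if second == first:
                continue  # a disk cannot fail twice
            total += 1
            if second == partner:
                fatal += 1
    return Fraction(fatal, total)


print(second_failure_kills_raid10(8))  # 1/7
```

For 8 disks there are 56 ordered pairs, 8 of which are fatal, which reduces to 1/7.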
The conversation here:
made me strongly consider RAID-10, especially the observation that "In raid 6, during a rebuild it has to read every drive and recalculate the missing data. That means you have to read [potentially terabytes and terabytes] of data to rebuild that and hope a URE doesn't occur."
But I'm interested to hear about your experience and suggestions if you have a contrary opinion.
StephenB
May 19, 2018, Guru - Experienced User
rcarr6502 wrote:
* RAID 5/6 is most dangerous when the array is degraded and rebuilding. That's exactly when the 2nd disk has failed in my two experiences. RAID 5/6 accelerates failure of remaining disks in a way that RAID-10 does not.
* RAID-10 will rebuild its array much more quickly -- since it just has to copy missing data from a disk in one mirror to the other.
* RAID-10 can survive a minimum of 1 disk failure up to N/2 disk failures. It's true that if both disks in the same (RAID-10) mirror fail simultaneously, the array is dead. But the chance that a second disk failure will take out the array should be 1/(n - 1)...
made me strongly consider RAID-10, especially the observation that "In raid 6, during a rebuild it has to read every drive and recalculate the missing data. That means you have to read [potentially terabytes and terabytes] of data to rebuild that and hope a URE doesn't occur."
I'll start by saying the analysis in the "death of raid" articles seems too simplistic to me. They make it sound like every disk read is like playing Russian Roulette - one chance in 10**14 of the URE bullet exploding your data. I don't think UREs are like that. Disk failures have a cause; they aren't just random events. When a disk fails, the chance of a URE rises to 100%. When it's starting to fail, it rises very quickly from 0 to a much larger value. It's not a static 1 in 10**14 crapshoot.
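For reference, this is roughly the arithmetic behind the "Russian Roulette" model. It is only a sketch of that simplistic model: it treats the 1 in 10**14 figure as an independent per-bit error rate, and the 12 TB read size and the function name `p_at_least_one_ure` are assumptions picked for illustration.

```python
import math


def p_at_least_one_ure(bytes_read: float, ure_rate_per_bit: float = 1e-14) -> float:
    """Probability of at least one URE under the static, independent-bits model."""
    bits = bytes_read * 8
    # 1 - (1 - r)**bits, computed with log1p/expm1 to avoid rounding loss.
    return -math.expm1(bits * math.log1p(-ure_rate_per_bit))


# Hypothetical rebuild that reads 12 TB (decimal) from the surviving disks.
print(round(p_at_least_one_ure(12e12), 2))
```

Under those assumptions the model predicts a URE in well over half of such rebuilds (about 0.62), which is the kind of figure the articles rely on. It says nothing about whether real disks behave that way.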
I also don't think RAID 5/6 resync accelerates failures, though it is true that rebuilding the array requires either reading or writing every sector in it. More on that below.
Rebuilding RAID-10 is easier because it only requires mirroring one existing disk. The array fails if that existing disk fails during resync - but the 1/(n-1) probability is misleading (and in my opinion incorrect). Your "accelerate failures" concept is grounded in the idea that heavy disk I/O will create a failure in one of the remaining disks. With RAID-10 resync, the source disk (and of course the new mirror) are the only two disks that experience heavy I/O. So if that idea is correct, then the disk most likely to fail is in fact the source disk of the mirror.
I'm not really sold on that concept though. I think that when disks begin to fail, sectors silently become unreadable or unwritable. But (at least with my data) most of the sectors are only rarely read or written - so those failures aren't detected right away. Then when a disk is detected as having failed, you replace it - and then discover there are other failures you didn't know about when the raid resync reads (or writes) everything.
So I don't think the RAID resync creates the failures - I think most of the time it uncovers failures that have already occurred. I run the scheduled maintenance functions to try and detect those failures early.
Another observation - RAID-6 rebuild can survive UREs when you replace a single disk (because it has dual redundancy). Where it breaks down is if you have two or more UREs in the same stripe.
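The stripe-level rule can be sketched in a few lines. This is a simplified model, not how any particular RAID implementation tracks errors: the disk names, stripe numbers, and the `lost_stripes` helper are all hypothetical, and `spare_redundancy` is the number of extra unreadable blocks per stripe the array can still absorb during the rebuild (1 for RAID-6 with one disk missing, 0 for RAID-5 with one disk missing).

```python
from collections import Counter


def lost_stripes(ure_stripes_by_disk: dict[str, set[int]], spare_redundancy: int) -> set[int]:
    """Return the stripes that cannot be rebuilt.

    ure_stripes_by_disk maps each surviving disk to the stripe numbers
    where that disk has an unreadable sector.
    """
    # Count how many surviving disks have a URE in each stripe.
    hits = Counter(s for stripes in ure_stripes_by_disk.values() for s in stripes)
    # A stripe is lost once the UREs exceed the redundancy that is left.
    return {s for s, count in hits.items() if count > spare_redundancy}


# Hypothetical survivors: stripe 42 is bad on two disks, stripe 10 on one.
survivors = {"disk2": {10, 42}, "disk3": {42}, "disk4": set()}
print(sorted(lost_stripes(survivors, spare_redundancy=1)))  # RAID-6, one disk replaced
print(sorted(lost_stripes(survivors, spare_redundancy=0)))  # RAID-5, one disk replaced
```

In this made-up case the RAID-6 rebuild loses only stripe 42, while a RAID-5 rebuild with the same errors would lose both stripes.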
In my own experience, the odds of losing a RAID-5 array during resync are fairly small - certainly it happens sometimes, but not that often. In fact, I've never lost one that way. But there's always some chance your RAID array will fail, no matter what RAID mode you use - the defense against that is to have backups.
But if I were trying to solve the issues you are worried about, I think I'd go with multiple RAID-1 volumes instead of RAID-10. The resync process is the same with multiple RAID-1 as it is with RAID-10. But recovering data from RAID-1 is much easier than recovering data from any of the other RAID modes. Plus it'd be much easier to increase storage (since you'd only need to offload and restore 1/4 of the data).
Retired_Member wrote:
Used WD in the past, but switched to HGST, which shows 10 to 15% better performance. They run a bit warmer during standard operation, but give higher throughput and are more reliable in my experience. Well, do not mix them with WD, though.
Again, it's fine to mix them - you just won't get the performance gain. You could of course mix the HGST with other enterprise class drives, and then you would get the performance improvement.
HGST drives have a good reputation, and as far as I can tell, the folks here who've used them are quite happy with them. Personally I've found the WD Reds to be quite reliable - one or two failures since I started using them back in 2012. At the moment I have 14 in service.
- rcarr6502, May 20, 2018, Tutor
Thank you, StephenB. This gives me a lot to think about.
- TeknoJnky, May 23, 2018, Hero
No RAID level can prevent data loss.
Your only failsafe is backups and backups and backups.
Ideally on multiple devices, in multiple locations.
RAID 6 + backups = the best data safety.
I have 4 NAS units: 1 primary and 3 that each back up different sections of the primary.
You could also consider cloud backup, depending on the amount of data, your available bandwidth, and of course your budget limits.