6.2.0 Good...

anna_arun · ‎2014-09-23

Very good update.

Grievous · ‎2014-09-23

Thanks

canfeng_li · ‎2014-09-24

Great upgrade.

Hephaestus1 · ‎2014-09-25

Yes, good, but... on boxes using non-ECC RAM, what is the point of:

Another layer of data protection has been added with automatic bitrot protection. This feature will now both detect and correct data corruption due to media degradation on redundant RAID volumes

TeknoJnky · ‎2014-09-25

+1

garyd9 · ‎2014-09-25

Hephaestus wrote:
Yes, good, but... on boxes using non-ECC RAM, what is the point of:
Another layer of data protection has been added with automatic bitrot protection. This feature will now both detect and correct data corruption due to media degradation on redundant RAID volumes

Not related to ECC RAM. The following article might be helpful: http://arstechnica.com/information-tech ... lesystems/

Hephaestus1 · ‎2014-09-25

garyd9 wrote:
Hephaestus wrote:
Yes, good, but... on boxes using non-ECC RAM, what is the point of:
Another layer of data protection has been added with automatic bitrot protection. This feature will now both detect and correct data corruption due to media degradation on redundant RAID volumes
Not related to ECC RAM. The following article might be helpful: http://arstechnica.com/information-technology/2014/01/bitrot-and-atomic-cows-inside-next-gen-filesystems/

I am familiar with the Ars Technica article. It makes the simplifying assumption that RAM is 100% error-free and talks only about disk data corruption (note also that the Netgear ReadyNAS does not use the native BTRFS RAID configuration).

It appears to be questionable to provide "(...) automatic bitrot protection [which] will now both detect and correct data corruption due to media degradation" when another source of random bit errors remains: non-ECC RAM. See for example Google's study "DRAM Errors in the Wild: A Large-Scale Field Study": http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
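To make the concern concrete, here is a minimal sketch of why a checksum cannot catch an error that happens in RAM before the checksum is computed. It is not the ReadyNAS or btrfs code: `write_block` and `verify_block` are hypothetical helpers, and `zlib.crc32` is only a stand-in for whatever checksum the filesystem really uses.

```python
import zlib

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulates a memory error)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)

def write_block(data_in_ram: bytes):
    """Hypothetical write path: checksum whatever is in RAM, then store both."""
    return data_in_ram, zlib.crc32(data_in_ram)

def verify_block(stored: bytes, checksum: int) -> bool:
    """Hypothetical scrub/read path: recompute the checksum and compare."""
    return zlib.crc32(stored) == checksum

intended = b"important data"
in_ram = flip_bit(intended, 3)          # bit flips in non-ECC RAM before the write
stored, checksum = write_block(in_ram)  # checksum is computed over the bad data

passes_scrub = verify_block(stored, checksum)
matches_intent = stored == intended
print(passes_scrub, matches_intent)     # prints: True False
```

The block passes every later check because the checksum faithfully describes the already-corrupted buffer, so the protection only covers changes that happen after the checksum was taken.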

StephenB · ‎2014-09-25

Hephaestus wrote:
It appears to be questionable to provide "(...) automatic bitrot protection [which] will now both detect and correct data corruption due to media degradation" when another source of random bit errors remains: non-ECC RAM. See for example Google's study "DRAM Errors in the Wild: A Large-Scale Field Study": http://www.cs.toronto.edu/~bianca/paper ... rics09.pdf

Of course non-ECC RAM is also used in the clients that are reading/writing the data, and the routers and switches that are forwarding it.

The study is interesting, thx for sharing it.

I don't quite buy your assessment though - just because you aren't protected against all failure modes doesn't mean there is no value in being protected against some.

garyd9 · ‎2014-09-25

The fact is that, at least since the days of sealed HDDs, I've never actually seen so-called bit-rot. (I've seen errors in open drives, but those could have easily been a bit (no pun intended) of dust on the platter.) I'm not saying that it doesn't happen, but I've never seen it.

However, I have seen people on this forum asking for "bitrot" protection and correction and referencing that article. I've even seen people complaining that btrfs has built-in mechanisms for dealing with that and Netgear doesn't use them. So, the SE's put it (or something like it) in. Maybe it's just a marketing point. Maybe it's worth something. (Actually, I'm kind of curious about their implementation when the CRCs are coming from btrfs and the redundancy is coming at a lower level.)

Sure, there are other points of failure. In the real world, the biggest point of failure can't be addressed. That would be the users. A human being is hundreds if not millions of times more likely to cause data corruption than any decent non-ECC memory chip (that has passed quality controls) or hard drive.

It's not perfect, and even with fully ECC memory I could find failure points without a human interfering, but it's perhaps better than nothing.

Gary

Hephaestus1 · ‎2014-09-25

Well, it is never black and white, is it? While non-ECC RAM is commonly used, the fact is that every link in the chain would benefit from using ECC RAM, at increased equipment cost. Netgear agrees with that: the more expensive ReadyNAS models (516, 716) already use ECC RAM. Judging by history, one could predict that when ECC RAM prices drop further, it will eventually be used in less expensive boxes. I have to agree, however, that my assessment is a bit overly pessimistic: the chances of both RAM and disk errors happening at the same time, while scary, are statistically very, very low. One could probably argue that not having a dual (triple?) redundant power supply is possibly more important... and so on.

BTW, not experiencing silent data corruption does not mean that it does not exist, especially since it may go undetected for a long time, unless a special test is run or until the day the corrupted data is called for. If someone is more curious, Google finds many references to proper research papers describing why, how often, etc. Of course in modern hard drives and in "tested" RAM this is not a common issue, but it does exist (and it always will). And of course there is always a human error factor as well.

StephenB · ‎2014-09-26

garyd9 wrote:
The fact is that, at least since the days of sealed HDDs, I've never actually seen so-called bit-rot. (I've seen errors in open drives, but those could have easily been a bit (no pun intended) of dust on the platter.) I'm not saying that it doesn't happen, but I've never seen it.

I haven't seen any definitive cases either. Though it is something that might be hard to detect. Yesterday I ran across a couple of old Word files on a server that I could download, but which Word couldn't read. Bit-rot? I have no way to tell - possibly the original upload failed, and they were never readable. Or maybe the original disk failed and was cloned...

I am a bit skeptical of the idea though, since disks have their own crc32 checksums.

Hephaestus wrote:
BTW, not experiencing silent data corruption does not mean that it does not exist, especially since it may go undetected for a long time, unless a special test is run or until the day the corrupted data is called for. If someone is more curious, Google finds many references to proper research papers describing why, how often, etc. Of course in modern hard drives and in "tested" RAM this is not a common issue, but it does exist (and it always will). And of course there is always a human error factor as well.

It would be interesting to see the papers, including the mechanisms for silent bit-rot. This would be cases where the disk media degrades on its own to something readable but incorrect. RAM or software issues wouldn't be bit-rot (at least if they are, then btrfs checksums aren't guaranteed to detect or heal them).
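For the media-degradation case described above, the general detect-and-heal idea can be sketched as follows. This is only an illustration under stated assumptions, not Netgear's or btrfs's actual implementation: the two-copy mirror, the `read_with_repair` helper, and the use of `zlib.crc32` are all hypothetical stand-ins.

```python
import zlib

def read_with_repair(copies, checksum):
    """Return (data, repaired_indices) for a block stored on several mirrors.

    copies   -- list of bytearray, one per redundant copy (hypothetical layout)
    checksum -- CRC recorded when the block was originally written
    """
    # Find any copy whose contents still match the recorded checksum.
    good = None
    for copy in copies:
        if zlib.crc32(bytes(copy)) == checksum:
            good = bytes(copy)
            break
    if good is None:
        raise OSError("no copy matches the recorded checksum; cannot repair")

    # Overwrite every copy that fails the checksum with the good data.
    repaired = []
    for i, copy in enumerate(copies):
        if zlib.crc32(bytes(copy)) != checksum:
            copies[i] = bytearray(good)
            repaired.append(i)
    return good, repaired

data = b"family photos"
checksum = zlib.crc32(data)              # recorded at write time
mirror = [bytearray(data), bytearray(data)]
mirror[0][0] ^= 0x01                     # one bit degrades on the first disk

recovered, repaired = read_with_repair(mirror, checksum)
print(recovered == data, repaired)       # prints: True [0]
```

The checksum is what tells the good copy from the bad one; plain mirroring without it would only show that the two copies differ, not which one to trust.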

Hephaestus1 · ‎2014-09-26

Here is a short collection of articles I usually recommend to my students who are interested in this topic:

(1) very readable, comprehensive (and vastly better than the Ars Technica article often quoted on forums): "Bit Rot: Myth or Way of Life?", presented in 2010 at the Digital Preservation Seminar, University of Alberta:
http://www.exlibrisgroup.com/files/Customer_Center/NorthAmerica/Rosetta-Alberta-clarke-1.ppt
(PowerPoint presentation; people without MS Office can use the free OpenOffice or LibreOffice)

(2) very readable article by friend James Hamilton: "Observations on Errors, Corrections, & Trust of Dependent Systems":
http://perspectives.mvdirona.com/2012/02/26/ObservationsOnErrorsCorrectionsTrustOfDependentSystems.aspx

(3) good, short article from NEC: "Silent data corruption in disk arrays: A solution"
http://www.necam.com/docs/?id=54157ff5-5de8-4966-a99d-341cf2cb27d3

(4) good article from ACM (Association for Computing Machinery): "Keeping Bits Safe: How Hard Can It Be?":
http://queue.acm.org/detail.cfm?id=1866298

(5) and finally, a bit of heavy reading: "End-to-end Data Integrity for File Systems: A ZFS Case Study":
http://research.cs.wisc.edu/wind/Publications/zfs-corruption-fast10.pdf

For readers who are curious but do not want to turn it into a full-time study: just read (1) and (2), it won't take long. On the other hand, if someone wants to spend days reading... each of the above has many references.

And remember: the main thing is to have fun!
