jedas (Aspirant)
Dec 02, 2014
Disk spin-down problem, RN104, fw 6.2.0
Hello,
I'm using an RN104 with 2 x 4TB WD RED drives in JBOD/Flex mode (I don't need RAID). No apps are installed, and the samba and dlna services have been stopped to debug this issue. Http/https/ssh are enabled. I've enabled disk spin-down after 5 minutes from the GUI. The system log shows that the disks are stopped, but then immediately started again after 4-8 seconds:
Disk [0] going to standby...
Disk [0] spinning up...
Disk [0] going to standby...
Disk [0] spinning up...
After enabling write logging (echo 1 > /proc/sys/vm/block_dump) I see this in dmesg:
md0_raid1(694): WRITE block 8 on sda1 (1 sectors)
md0_raid1(694): WRITE block 8 on sdb1 (1 sectors)
jbd2/md0-8(719): WRITE block 3684640 on md0 (8 sectors)
jbd2/md0-8(719): WRITE block 3684648 on md0 (8 sectors)
jbd2/md0-8(719): WRITE block 3684656 on md0 (8 sectors)
jbd2/md0-8(719): WRITE block 3684664 on md0 (8 sectors)
jbd2/md0-8(719): WRITE block 3684672 on md0 (8 sectors)
jbd2/md0-8(719): WRITE block 3684680 on md0 (8 sectors)
jbd2/md0-8(719): WRITE block 3684688 on md0 (8 sectors)
jbd2/md0-8(719): WRITE block 3684696 on md0 (8 sectors)
leafp2p(1282): WRITE block 775816 on md0 (8 sectors)
md0_raid1(694): WRITE block 8 on sda1 (1 sectors)
md0_raid1(694): WRITE block 8 on sdb1 (1 sectors)
I believe these writes are what wakes the disks up, because they happen every 4-8 seconds. I've tried killing the leafp2p process, but it doesn't seem to be the source of the problem. I suspect these writes are made by the kernel, but I don't have any knowledge about the RAID internals. The same behaviour occurs when I type "hdparm -y /dev/sda": the disk spins down and wakes up again a second later. Please advise.
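For anyone trying to reproduce this diagnosis over ssh, the steps described above can be combined roughly as follows (a sketch only; /dev/sda is the first disk as in this post, and the block_dump output lands in the kernel log):

    # log every block-layer write in the kernel log (echo 0 to turn it off again)
    echo 1 > /proc/sys/vm/block_dump

    # force an immediate spin-down, then poll the drive's power state;
    # querying the state with -C does not itself wake the drive
    hdparm -y /dev/sda
    while true; do hdparm -C /dev/sda | grep 'drive state'; sleep 2; done

    # in a second session, see which processes issued the writes that woke it
    dmesg | tail -n 20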
23 Replies
- dsm1212 (Apprentice): Ah, ok, I think what it does then is read from both mirrors, and if it gets an IO error from one side of the mirror it will rewrite the data successfully read from the other side. md supports this automatically; you just need to trigger a read from both sides and it will happen. So bitrot protection will wake up the disks unless the program driving it checks to see if the disks are in standby.
steve
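One way to force such a read of both mirror legs on Linux md is the "check" action in sysfs (a sketch; it assumes the affected volume is md0, as in the dmesg output above, and a root ssh session):

    # read every block of both legs; sectors that return read errors are rewritten from the good copy
    echo check > /sys/block/md0/md/sync_action
    # watch progress
    cat /proc/mdstat
    # afterwards: count of sectors found to disagree between the copies during the check
    cat /sys/block/md0/md/mismatch_cnt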
- mdgm-ntgr (NETGEAR Employee Retired): Bitrot protection does more than what md raid can do by itself. It can fix some corruption that md raid by itself couldn't do anything about.
- StephenB (Guru - Experienced User):
dsm1212 wrote: "Ah, ok, I think what it does then is read from both the mirrors and if it gets an IO error from one side of the mirror it will rewrite the data successfully read from the other side. md supports this automatically, you just need to trigger a read from both sides and it will happen. So bitrot protection will wake up the disks unless the program driving it checks to see if the disks are in standby."
Well, normal RAID handles read IO errors, so there's no need to add more protection for that. Bitrot protection deals with the case where the data was "silently" corrupted without an IO error to trigger the recovery. And the bitrot might only affect the checksums themselves, so that possibility needs to be covered.
So bitrot recovery has to kick in after the btrfs checksum fails but no read errors have occurred. Once you know the checksum is bad, there are several strategies you could attempt to use. The simplest I can think of is to simulate read errors on each of the data blocks covered by the checksum, and see if substituting the recovered blocks results in a good checksum. We are just speculating of course, since Netgear isn't saying.
- dsm1212 (Apprentice): Right, a read will fix it, but if the leg with the problem is not read it sits there like a time bomb; then the drive with the good leg fails for some other reason, and now your last copy has this latent read error. You avoid this by forcing a read on both legs. Really, I would bet this is all it does, as this is a pretty well understood problem. There is another way they may be doing this that is much more efficient, but it's disk-vendor dependent and I doubt Netgear would go there.
steve
- StephenB (Guru - Experienced User): Anything your approach could fix is also fixed by a normal scrub. Per mdgm's comment, bitrot protection goes beyond what md raid can fix.
The weakness in all RAID is that it recovers from missing (unreadable) blocks, but has no way to recover from wrong blocks that are readable. It can only provide erasure correction, not error correction.
Blending in the checksum allows you to identify blocks that might be bad and then pretend they were erased, so they can be rebuilt. I've seen that approach used in other contexts; it is a well-understood technique if you are into this kind of stuff.
The main difference between our approaches is that I was simulating a read error/erasure, while you were assuming there must already be one. It's a small change, but it enables repairs that otherwise can't be made. For instance, suppose you attempt to fix an array failure by cloning bad disks to good ones: in that case there are scattered blocks that are wrong, but readable. My approach can fix most of those errors; yours cannot.
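A toy illustration of that substitution idea for the simple two-copy (RAID-1) case, purely to make the argument concrete - the partition names, block offset and recorded checksum below are hypothetical, and this is not a claim about how ReadyNAS OS actually implements its bitrot protection:

    # read the same 4 KiB block from each mirror leg and report which copy
    # still matches a previously recorded good checksum
    OFFSET=123456                         # hypothetical block number, in 4 KiB units
    GOOD="<previously recorded sha256>"   # hypothetical stored checksum
    for leg in /dev/sda3 /dev/sdb3; do    # hypothetical data partitions of the two legs
        sum=$(dd if=$leg bs=4096 skip=$OFFSET count=1 2>/dev/null | sha256sum | awk '{print $1}')
        [ "$sum" = "$GOOD" ] && echo "matching copy on $leg"
    done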
But again, Netgear isn't saying what they actually did - probably because they want to retain it as a competitive advantage.
- dsm1212 (Apprentice): Ok, I think I see what you are suggesting, but I don't understand what the case you are trying to prevent has to do with bit-rot. If a bit changes on the disk, then the sector won't pass the internal checksums done by the disk firmware and it will be an IO error. The chances of some bits flipping and the internal checksum still being good are pretty obscure (like not in our lifetimes). Statistically, a good read of the wrong data means either that someone wrote it or that there was a silent failed write that left the old data intact. Neither of those cases should be labeled "bit-rot". They happen either because someone wrote directly to a leg, or the RAID sw/hw itself was buggy, or there was a deferred write error.
steve
- StephenB (Guru - Experienced User): I think a lot depends on what you think bit rot is. Here's one article that uses the silent-corruption definition: http://arstechnica.com/information-tech ... lesystems/
From the article: "As a test, I set up a virtual machine with six drives. One has the operating system on it, two are configured as a simple btrfs-raid1 mirror, and the remaining three are set up as a conventional raid5. I saved Finn's picture on both the btrfs-raid1 mirror and the conventional raid5 array, and then I took the whole system offline and flipped a single bit - yes, just a single bit from 0 to 1 - in the JPG file saved on each array."
He then goes on to say that RAID-5 didn't find the error, but that the experimental btrfs raid feature did find and fix it. No read errors are happening; the data just went wrong somehow. (BTW, a RAID-5 scrub would have detected it, but it wouldn't know whether the parity block or the data block was wrong. In principle a RAID-6 scrub could be written which would find and repair the error, but I don't know if a normal RAID-6 scrub would do so.)
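On a btrfs data volume (which is what ReadyNAS OS 6 uses), this kind of checksum verification - and, on redundant profiles, repair from the surviving good copy - can also be kicked off by hand with a scrub (a sketch; the mount point shown is an assumption):

    btrfs scrub start /data     # verify checksums; on raid1/dup profiles, bad copies are rewritten from good ones
    btrfs scrub status /data    # progress and counts of checksum/read errors found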
Most of the bit-rot discussions on this forum were triggered by that article, so I think posters here are generally aligned with this understanding of the definition.
How often this happens in practice, and what the mechanisms for this "rot" are, is an interesting question, and I suspect there are different views. Personally, I don't think disks are likely to return wrong data when they fail. Like you, I believe it is far more likely that the disk's own error checks will uncover the problem and it will return a read error.
Cloning a bad disk is one obvious way it can happen in real life. Software errors, or crashes with uncompleted writes in the queue, could of course do it as well. Memory failures that corrupt the data in the cache (before it is written) could do it. But personally I don't believe it happens spontaneously on the disk itself.
- algiam (Aspirant): I have an RN104 with firmware 6.2.1; spindown worked fine prior to 6.2.0.
I have a problem with spindown: only 2 of the 4 disks spin down.
Disk 1: JBOD, 500GB
Disk 2: JBOD, 2TB
Disk 3: RAID 1, 1TB
Disk 4: RAID 1, 1TB
Only disks 1 and 3 spin down - why?
There might be some reason for disk 2, but why disk 4, which is in the RAID 1 with disk 3?
I have stopped all applications, I don't have anything installed, and I even stopped SMB and DLNA for disk 2, and it is the same:
Dec 23 05:56:14 Optimus noflushd[12881]: Spinning up disk 1 (/dev/sdd) after 2:58:17.
Dec 23 05:56:15 Optimus noflushd[12881]: Spinning up disk 3 (/dev/sdb) after 3:08:16.
Dec 23 06:07:02 Optimus noflushd[12881]: Spinning down disk 3 (/dev/sdb).
Dec 23 06:07:05 Optimus noflushd[12881]: Spinning down disk 1 (/dev/sdd).
Dec 23 06:25:15 Optimus noflushd[12881]: Spinning up disk 3 (/dev/sdb) after 0:18:10.
Dec 23 06:25:20 Optimus noflushd[12881]: Spinning up disk 1 (/dev/sdd) after 0:18:13.
Dec 23 06:35:22 Optimus noflushd[12881]: Spinning down disk 3 (/dev/sdb).
Dec 23 06:35:24 Optimus noflushd[12881]: Spinning down disk 1 (/dev/sdd).
Dec 23 11:07:53 Optimus noflushd[12881]: Spinning up disk 1 (/dev/sdd) after 4:32:26.
Dec 23 11:08:13 Optimus noflushd[12881]: Spinning up disk 3 (/dev/sdb) after 4:32:49.
Dec 23 11:15:13 Optimus noflushd[12881]: Quitting on signal...
Dec 23 11:17:20 Optimus noflushd[1419]: Enabling spindown for disk 3 [sdb,0:2:ST31000340NS:9QJ3D9PF:300F:7200]
Dec 23 11:17:20 Optimus noflushd[1419]: Enabling spindown for disk 4 [sda,0:3:ST31000340NS:9QJ2ZMYG:300F:7200]
Dec 23 11:17:20 Optimus noflushd[1419]: Enabling spindown for disk 2 [sdc,0:1:WDC_WD20EZRX-00D8PB0:WD-WMC4M0310978:80.00A08:]
Dec 23 11:17:20 Optimus noflushd[1419]: Enabling spindown for disk 1 [sdd,0:0:ST3500418AS:6VMDW5LD:HP34:7200]
Dec 23 11:28:11 Optimus noflushd[1419]: Spinning down disk 3 (/dev/sdb).
Dec 23 11:39:16 Optimus noflushd[1419]: Spinning up disk 3 (/dev/sdb) after 0:11:02.
Dec 23 11:49:17 Optimus noflushd[1419]: Spinning down disk 3 (/dev/sdb).
Dec 23 11:49:35 Optimus noflushd[1419]: Spinning down disk 1 (/dev/sdd).
Dec 23 11:56:44 Optimus noflushd[1419]: Spinning up disk 3 (/dev/sdb) after 0:07:24.
Dec 23 11:56:44 Optimus noflushd[1419]: Spinning up disk 1 (/dev/sdd) after 0:07:07.
Dec 23 12:01:40 Optimus noflushd[1419]: Quitting on signal...
Dec 23 12:01:40 Optimus noflushd[4067]: Enabling spindown for disk 3 [sdb,0:2:ST31000340NS:9QJ3D9PF:300F:7200]
Dec 23 12:01:40 Optimus noflushd[4067]: Enabling spindown for disk 4 [sda,0:3:ST31000340NS:9QJ2ZMYG:300F:7200]
Dec 23 12:01:40 Optimus noflushd[4067]: Enabling spindown for disk 2 [sdc,0:1:WDC_WD20EZRX-00D8PB0:WD-WMC4M0310978:80.00A08:]
Dec 23 12:01:40 Optimus noflushd[4067]: Enabling spindown for disk 1 [sdd,0:0:ST3500418AS:6VMDW5LD:HP34:7200]
Dec 23 12:28:19 Optimus noflushd[4067]: Spinning down disk 3 (/dev/sdb).
Dec 23 12:28:21 Optimus noflushd[4067]: Spindown of disk 3 (/dev/sdb) cancelled.
Dec 23 12:33:22 Optimus noflushd[4067]: Spinning down disk 3 (/dev/sdb).
Dec 23 12:33:25 Optimus noflushd[4067]: Spinning down disk 1 (/dev/sdd).
Dec 23 12:39:08 Optimus noflushd[4067]: Spinning up disk 1 (/dev/sdd) after 0:05:41.
Dec 23 12:39:23 Optimus noflushd[4067]: Spinning up disk 3 (/dev/sdb) after 0:05:58.
Dec 23 12:44:25 Optimus noflushd[4067]: Spinning down disk 3 (/dev/sdb).
Dec 23 12:44:27 Optimus noflushd[4067]: Spinning down disk 1 (/dev/sdd).
Dec 23 13:08:08 Optimus noflushd[4067]: Spinning up disk 1 (/dev/sdd) after 0:23:38.
Dec 23 13:08:34 Optimus noflushd[4067]: Spinning up disk 3 (/dev/sdb) after 0:24:07.
Dec 23 13:13:25 Optimus noflushd[4067]: Spinning down disk 1 (/dev/sdd).
Dec 23 13:13:58 Optimus noflushd[4067]: Spinning down disk 3 (/dev/sdb).
Dec 23 13:31:27 Optimus noflushd[4067]: Spinning up disk 3 (/dev/sdb) after 0:17:27.
Dec 23 13:31:27 Optimus noflushd[4067]: Spinning up disk 1 (/dev/sdd) after 0:17:59.
Dec 23 13:36:40 Optimus noflushd[4067]: Spinning down disk 3 (/dev/sdb).
Dec 23 13:37:32 Optimus noflushd[4067]: Spinning down disk 1 (/dev/sdd).
Dec 23 14:58:32 Optimus noflushd[4067]: Spinning up disk 1 (/dev/sdd) after 1:20:57.
Dec 23 14:58:43 Optimus noflushd[4067]: Spinning up disk 3 (/dev/sdb) after 1:22:01.
Can someone help me?
Thank you.
- StephenB (Guru - Experienced User): If you are comfortable with ssh, you can try this: viewtopic.php?f=154&t=77842&hilit=skywalker#p435474
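To see at a glance which drives are actually spun down at any given moment (complementing the noflushd log above), the power state can be queried directly over ssh; the query itself does not wake the drive. A sketch using the device names from the log:

    for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
        echo -n "$d: "
        hdparm -C $d | grep 'drive state'   # "standby" = spun down, "active/idle" = spinning
    done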
- mdgm-ntgr (NETGEAR Employee Retired): Disk 2 is a WD Green disk. Have you tried disabling the WDIDLE3 timer on that?
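For reference, the WD Green head-parking (idle3) timer can be read and disabled from Linux with idle3-tools, if that package is available - a sketch, assuming disk 2 is /dev/sdc as in the noflushd log above; the drive needs a full power-off afterwards for the change to take effect:

    idle3ctl -g /dev/sdc    # show the current idle3 timer setting
    idle3ctl -d /dev/sdc    # disable the timer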