Forum Discussion

Aspirant

May 27, 2021

Solved

Сhecksum errors in files on RAID5 Netgear ReadyNAS Pro 6 RNDP6000

Netgear ReadyNAS Pro 6 RNDP6000 RAID-5 (of 6 disks) 2TB FW 6.10.2 I got a checksum error message in the backup software. I was check disks and found on one many bad sectors. (using HDAT2 softw...

StephenB
Jun 04, 2021
Roman304 wrote:

I have partitioned jbod raid drives into each drive. RAID number 4 checksum error.

That's a clear indication that your issue is either linked to that disk or to that slot. The next step is to figure out which.

I suggest destroying RAID 1,2,5,6, and removing those disks. Then power down the NAS, and swap RAID 3 and RAID 4. Power up and re-run the test on both volumes. That will let you know if the problem is linked to the disk or the slot.

If the problem disappears on both disks, then it could be power-related. You can confirm that by adding the removed disks back one at a time, and see when the problem starts happening again.

Roman304

Aspirant

May 30, 2021

i tried to check mdadm by signal.

echo repair > /sys/block/md127/md/sync_action

9012.910037] md: requested-resync of RAID array md127
[ 9012.910043] md: minimum _guaranteed_  speed: 30000 KB/sec/disk.
[ 9012.910045] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
[ 9012.910053] md: using 128k window, over a total of 1948663808k.
[ 9084.033911] BTRFS warning (device md0): csum failed ino 26800 off 3297280 csum 2509189606 expected csum 3194441580
[ 9084.454904] BTRFS warning (device md0): csum failed ino 26800 off 4055040 csum 1637586726 expected csum 1282422897
[ 9084.581585] BTRFS warning (device md0): csum failed ino 26800 off 6422528 csum 1593964658 expected csum 1961461496
[ 9084.612492] sh (25655): drop_caches: 3
[ 9084.614361] BTRFS warning (device md0): csum failed ino 26800 off 5677056 csum 1932001198 expected csum 1489994020
[ 9084.615384] BTRFS warning (device md0): csum failed ino 26800 off 5677056 csum 1932001198 expected csum 1489994020
[ 9084.622754] BTRFS warning (device md0): csum failed ino 26800 off 7094272 csum 3606228094 expected csum 4246393588
[ 9084.695542] sh (25658): drop_caches: 3
[ 9085.341325] sh (25699): drop_caches: 3
[ 9085.424200] sh (25700): drop_caches: 3
[ 9085.607585] sh (25701): drop_caches: 3
[ 9085.704849] sh (25702): drop_caches: 3
[ 9085.731888] sh (25704): drop_caches: 3
[ 9085.732521] mdcsrepair[25705]: segfault at 1902230 ip 00000000004048df sp 00007ffea4e019b0 error 4 in mdcsrepair[400000+10000]
[ 9087.009513] sh (25731): drop_caches: 3
[ 9087.110047] sh (25732): drop_caches: 3
[ 9087.557204] sh (25762): drop_caches: 3
[ 9087.638523] sh (25766): drop_caches: 3
[ 9101.700580] BTRFS warning (device md0): csum failed ino 26800 off 3297280 csum 2509189606 expected csum 3194441580
[ 9102.140147] BTRFS warning (device md0): csum failed ino 26800 off 6729728 csum 843258588 expected csum 430662742
[ 9102.141347] BTRFS warning (device md0): csum failed ino 26800 off 4530176 csum 4014326161 expected csum 3266211526
[ 9102.142287] BTRFS warning (device md0): csum failed ino 26800 off 4055040 csum 1637586726 expected csum 1282422897
[ 9102.142732] BTRFS warning (device md0): csum failed ino 26800 off 4804608 csum 2561880108 expected csum 3042484091
[ 9103.060502] BTRFS warning (device md0): csum failed ino 26800 off 2076672 csum 188520152 expected csum 651639183
[ 9103.276951] sh (26055): drop_caches: 3
[ 9103.277401] sh (26056): drop_caches: 3
[ 9103.277815] sh (26058): drop_caches: 3
[ 9103.281850] sh (26054): drop_caches: 3
[ 9103.282294] sh (26057): drop_caches: 3
[ 9103.434179] sh (26062): drop_caches: 3
[ 9103.437582] sh (26060): drop_caches: 3
[ 9103.438571] sh (26059): drop_caches: 3
[ 9103.465113] sh (26061): drop_caches: 3
[ 9103.467454] sh (26063): drop_caches: 3
[ 9103.467969] sh (26064): drop_caches: 3
[ 9103.566172] sh (26066): drop_caches: 3
[ 9103.567000] mdcsrepair[26070]: segfault at 1ca6238 ip 00000000004048df sp 00007fffea17d9d0 error 4 in mdcsrepair[400000+10000]
[ 9103.568900] sh (26067): drop_caches: 3
[ 9103.568901] sh (26065): drop_caches: 3
[ 9103.569312] mdcsrepair[26071]: segfault at 1d29220 ip 00000000004048df sp 00007ffcf1ef8270 error 4 in mdcsrepair[400000+10000]
[ 9103.599827] sh (26069): drop_caches: 3
[ 9103.600311] mdcsrepair[26092]: segfault at 25a7228 ip 00000000004048df sp 00007ffd96a6c460 error 4 in mdcsrepair[400000+10000]
[ 9103.639920] BTRFS warning (device md0): csum failed ino 26800 off 7749632 csum 976488093 expected csum 400533962
[ 9103.640132] BTRFS warning (device md0): csum failed ino 26800 off 7749632 csum 976488093 expected csum 400533962
[ 9105.156635] sh (26145): drop_caches: 3
[ 9105.262331] sh (26146): drop_caches: 3
[ 9105.338203] sh (26148): drop_caches: 3
[ 9105.338666] mdcsrepair[26149]: segfault at 14e0228 ip 00000000004048df sp 00007ffd2601bda0 error 4 in mdcsrepair[400000+10000]
[ 9392.955849] BTRFS warning (device md0): csum failed ino 26800 off 3297280 csum 2509189606 expected csum 3194441580
[ 9393.396182] sh (27133): drop_caches: 3
[ 9393.407254] BTRFS warning (device md0): csum failed ino 26800 off 5730304 csum 1721535998 expected csum 1266098857
[ 9393.555932] sh (27134): drop_caches: 3
[ 9393.680572] BTRFS warning (device md0): csum failed ino 26800 off 4276224 csum 2257014538 expected csum 2876039261
[ 9393.705564] sh (27136): drop_caches: 3
[ 9393.705994] mdcsrepair[27138]: segfault at fb1238 ip 00000000004048df sp 00007fffde04bee0 error 4 in mdcsrepair[400000+10000]
[ 9394.203447] BTRFS warning (device md0): csum failed ino 26800 off 5115904 csum 406066205 expected csum 870083223
[ 9394.203615] BTRFS warning (device md0): csum failed ino 26800 off 5115904 csum 406066205 expected csum 870083223
[ 9395.340527] sh (27180): drop_caches: 3
[ 9395.465281] sh (27182): drop_caches: 3
[ 9395.529394] sh (27184): drop_caches: 3
[ 9395.529871] mdcsrepair[27187]: segfault at 134d230 ip 00000000004048df sp 00007ffeebc6fb40 error 4 in mdcsrepair[400000+10000]
[ 9395.831133] sh (27222): drop_caches: 3
[ 9395.950588] sh (27223): drop_caches: 3
[ 9395.951581] sh (27224): drop_caches: 3
[ 9395.980226] sh (27226): drop_caches: 3
[ 9396.077257] sh (27227): drop_caches: 3
[ 9396.077683] mdcsrepair[27229]: segfault at 1b93238 ip 00000000004048df sp 00007fff9c18ac80 error 4 in mdcsrepair[400000+10000]
[11045.200001] BTRFS warning (device md0): csum failed ino 26800 off 2174976 csum 549178347 expected csum 190035297
[11045.294967] BTRFS warning (device md0): csum failed ino 26800 off 4857856 csum 2091937243 expected csum 1364968076
[11045.297180] BTRFS warning (device md0): csum failed ino 26800 off 5017600 csum 455988020 expected csum 918977635
[11045.323613] BTRFS warning (device md0): csum failed ino 26800 off 6045696 csum 3782956367 expected csum 3432053272
[11045.324193] sh (31561): drop_caches: 3
[11045.380741] BTRFS warning (device md0): csum failed ino 26800 off 7426048 csum 3231217454 expected csum 3983792249
[11045.540305] sh (31563): drop_caches: 3
[11046.101205] sh (31610): drop_caches: 3
[11046.200518] sh (31611): drop_caches: 3
[11046.201664] sh (31612): drop_caches: 3
[11046.212111] sh (31613): drop_caches: 3
[11046.212113] sh (31614): drop_caches: 3
[11046.226738] sh (31616): drop_caches: 3
[11046.226747] sh (31615): drop_caches: 3
[11046.238619] sh (31618): drop_caches: 3
[11046.239139] sh (31617): drop_caches: 3
[11046.239189] mdcsrepair[31619]: segfault at 2179228 ip 00000000004048df sp 00007ffdd7ed7ba0 error 4 in mdcsrepair[400000+10000]
[33839.238730] md: md127: requested-resync done.

next cat /sys/block/md127/md/mismatch_cnt and results: 135008

but, no error status in GUI

/dev/md127:
           Version : 1.2
     Creation Time : Mon Mar 16 21:27:21 2020
        Raid Level : raid5
        Array Size : 9743319040 (9291.95 GiB 9977.16 GB)
     Used Dev Size : 1948663808 (1858.39 GiB 1995.43 GB)
      Raid Devices : 6
     Total Devices : 6
       Persistence : Superblock is persistent

       Update Time : Sun May 30 11:49:08 2021
             State : clean
    Active Devices : 6
   Working Devices : 6
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 64K

Consistency Policy : unknown

              Name : 33ea55f9:RAID-5-0  (local to host 33ea55f9)
              UUID : 04d214c4:ee331e6a:74ca0a04:5e846481
            Events : 979

    Number   Major   Minor   RaidDevice State
       6       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
       4       8       67        4      active sync   /dev/sde3
       5       8       83        5      active sync   /dev/sdf3

problem still exists

root@NAS-2:/RAID-5/TEST-FILE# dd if=/dev/urandom of=Test.flie bs=64M count=32dd: warning: partial read (33554431 bytes); suggest iflag=fullblock
0+32 records in
0+32 records out
1073741792 bytes (1.1 GB, 1.0 GiB) copied, 103.542 s, 10.4 MB/s
root@NAS-2:/RAID-5/TEST-FILE# md5sum Test.flie
71b8e1ea63c2d543dd1b521698f1f40b  Test.flie

after 5-10 minutes

root@NAS-2:/RAID-5/TEST-FILE# md5sum Test.flie
5952e1d1c6447efbbc4e76b13f090dbd  Test.flie

DEADDEADBEEF

Apprentice

May 31, 2021

It seems incredibly strange that BTRFS would have a file get corrupted like this and not throw BTRFS errors all over the place.. What's in the journal would not match what's on the disk!

Are you absolutely sure there's not some application out there touching/modifying the files? Can you also track the modify date of the file ($ stat <file>)? Perhaps even turn on Auditing? It's a pretty major thing for a file to get changed like that silently, BTRFS should detect and report corruption as soon as you try to access the file.. Unless.... maybe you have turned off checksumming on your data volume?

Roman304

Aspirant

May 31, 2021

DEADDEADBEEF wrote:

Are you absolutely sure there's not some application out there touching/modifying the files? Can you also track the modify date of the file ($ stat <file>)? Perhaps even turn on Auditing?

i am not sure... But i use default all services and nothing install outhere.

about $ stat <file> , file not modified, but checksum changed.

root@HQ-NAS-2:/RAID-5/TEST-FILE# dd if=/dev/urandom of=Test.flie bs=64M count=32
dd: warning: partial read (33554431 bytes); suggest iflag=fullblock
0+32 records in
0+32 records out
1073741792 bytes (1.1 GB, 1.0 GiB) copied, 103.885 s, 10.3 MB/s
root@HQ-NAS-2:/RAID-5/TEST-FILE# md5sum Test.flie
0542952ac3e7e9d494a26a37c41a6c9e  Test.flie
root@HQ-NAS-2:/RAID-5/TEST-FILE# stat Test.flie
  File: 'Test.flie'
  Size: 1073741792      Blocks: 2097160    IO Block: 4096   regular file
Device: 35h/53d Inode: 1049        Links: 1
Access: (0660/-rw-rw----)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-05-31 10:54:16.835309473 +0300
Modify: 2021-05-31 10:56:00.681735098 +0300
Change: 2021-05-31 10:56:00.681735098 +0300
 Birth: -
root@HQ-NAS-2:/RAID-5/TEST-FILE# md5sum Test.flie
0542952ac3e7e9d494a26a37c41a6c9e  Test.flie
root@HQ-NAS-2:/RAID-5/TEST-FILE# md5sum Test.flie
0efe119d6aba0648ba32fc722fd72095  Test.flie
root@HQ-NAS-2:/RAID-5/TEST-FILE# stat Test.flie
  File: 'Test.flie'
  Size: 1073741792      Blocks: 2097152    IO Block: 4096   regular file
Device: 35h/53d Inode: 1049        Links: 1
Access: (0660/-rw-rw----)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-05-31 10:54:16.835309473 +0300
Modify: 2021-05-31 10:56:00.681735098 +0300
Change: 2021-05-31 10:56:00.681735098 +0300
 Birth: -
root@HQ-NAS-2:/RAID-5/TEST-FILE# md5sum Test.flie
d289a229916a49bede053b9cdc778ec6  Test.flie
root@HQ-NAS-2:/RAID-5/TEST-FILE# stat Test.flie
  File: 'Test.flie'
  Size: 1073741792      Blocks: 2097152    IO Block: 4096   regular file
Device: 35h/53d Inode: 1049        Links: 1
Access: (0660/-rw-rw----)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-05-31 10:54:16.835309473 +0300
Modify: 2021-05-31 10:56:00.681735098 +0300
Change: 2021-05-31 10:56:00.681735098 +0300
 Birth: -