Forum Discussion
Dewdman42
Dec 10, 2024 · Virtuoso
Lots of bitrot errors, advice needed
My ReadyNAS is functioning for the most part, but I recently found out that IDrive had not been backing it up for quite a while, months... so I fixed that and finally ended up re-backing up the whole ...
StephenB
Dec 11, 2024 · Guru - Experienced User
Dewdman42 wrote:
Well, looking back further in the logs, I do find something from 18 months ago that must be the reason for this. See below.
I agree it is at least related.
800+ pending sectors does mean the disk needs to be replaced.
Dewdman42
Dec 14, 2024 · Virtuoso
What are you referring to about the 800 pending sectors? Please forgive me if this is obvious.
I haven't found any other errors apart from 18 months ago, when the drive was resynced for some reason; the errors were gone after that... except for these bitrot errors now, while doing the big re-backup.
- StephenB · Dec 14, 2024 · Guru - Experienced User
Dewdman42 wrote:
What are you referring to about the 800 pending sectors? Please forgive me if this is obvious.
See status.log above:
[23/07/09 03:52:40 MDT] crit:disk:LOGMSG_SMART_PENDING_SECT_30DAYS_WARN Detected increasing pending sector: count [878] on disk 2 (Internal) [WDC WD60EFRX-68L0BN1, WD-WX11DA7R3J4V] 33 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
This problem goes back to July 2023. "Pending sectors" are sectors that could not be read; they generate an error but have not been reallocated. The drive was marked as failed by the ReadyNAS (also on 9 July), but the reboot cleared that status. Although the system did eventually resync that disk after a couple of reboots, I suspect the issues on the disk caused the problem.
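If you want to watch this attribute yourself, here is a minimal sketch, assuming smartmontools 7.0+ is installed and that the member disk shows up as /dev/sda (both assumptions; substitute your actual device). It reads SMART attribute 197 (Current_Pending_Sector) from smartctl's JSON output:

```python
import json
import subprocess

# Hypothetical device path: replace with the actual RAID member disk.
DEVICE = "/dev/sda"

# "smartctl -A -j" prints the SMART attribute table as JSON
# (JSON output requires smartmontools 7.0 or later; run as root).
result = subprocess.run(
    ["smartctl", "-A", "-j", DEVICE],
    capture_output=True,
    text=True,
)
data = json.loads(result.stdout)

# Attribute 197 (Current_Pending_Sector) counts sectors that failed a
# read and are waiting to be remapped on the next successful write.
for attr in data["ata_smart_attributes"]["table"]:
    if attr["id"] == 197:
        pending = attr["raw"]["value"]
        print(f"{DEVICE}: {pending} pending sectors")
        if pending > 0:
            print("A nonzero, rising count usually means the disk "
                  "should be replaced.")
```

A count that keeps climbing over days (as in the log above, 33 increases in 30 days) is the pattern that matters, more than any single reading.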
The volume was also very full around that time:
[23/07/13 16:46:05 MDT] warning:volume:LOGMSG_VOLUME_USAGE_CRITICAL Less than 10% of volume data's capacity is free. data's performance is degraded and you risk running out of usable space. To improve performance and stability, you must add capacity or make free space.
BTRFS doesn't behave well when it runs out of usable free space, so I always recommend keeping at least 15% free.
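A simple way to guard against drifting below that threshold is a small stdlib-only check like this sketch, which could run from cron. The mount point /data is an assumption, matching the volume name in the log:

```python
import shutil

# Assumed mount point, matching the volume name "data" in the log above.
MOUNT = "/data"

usage = shutil.disk_usage(MOUNT)
free_pct = usage.free / usage.total * 100

# 15% mirrors the recommendation above: BTRFS is copy-on-write, so it
# needs headroom for metadata even when you are only deleting files.
if free_pct < 15.0:
    print(f"WARNING: {MOUNT} is only {free_pct:.1f}% free; "
          f"add capacity or free space before performance degrades.")
else:
    print(f"{MOUNT}: {free_pct:.1f}% free, OK")
```

Note that statvfs-based figures can be optimistic on BTRFS; `btrfs filesystem usage /data` (run as root) gives the authoritative allocation breakdown.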
- Dewdman42 · Dec 14, 2024 · Virtuoso
To be clear, as I stated earlier, the disk did not resync after a couple of reboots. I removed the disk from the array, rebooted, and then put it back to force a resync. After that, zero errors, other than these bitrot errors 18 months later when I finally got around to re-backing up some old files, which are now gone since I removed the reported files.
If the disk itself was destined to fail at a hardware level, why has it not been producing errors since July 2023? To me it seems like the RAID array was compromised in software somehow, for some unknown reason, and it may still be compromised for all I know. I did allow the volume to become quite full in the past, which could have triggered some software-induced problem in the RAID array itself; as you indicated, the file system does not behave well when the disk is full. Just thinking out loud here.
I still don't know what I should do. Buying a new drive may accomplish nothing if the drive synchronization was somehow compromised and is still compromised as a result. On the other hand, if the hardware actually is failing, despite no errors since 2023 and no SMART errors ever, then it could be a ticking time bomb.
With approx. 8 TB of data, I am half tempted to just buy a super big SSD and forget about using a NAS and RAID.
- StephenB · Dec 14, 2024 · Guru - Experienced User
Dewdman42 wrote:
To be clear, as I stated earlier, the disk did not resync after a couple of reboots. I removed the disk from the array, rebooted, and then put it back to force a resync. After that, zero errors, other than these bitrot errors 18 months later when I finally got around to re-backing up some old files, which are now gone since I removed the reported files.
To be clear, 933 pending sectors were reported by the drive right before the resync. The NAS marked the drive as failed, which in my opinion was exactly right. It should have been replaced at that time; forcing the resync was a bad idea. Sorting out exactly why the errors stopped would be difficult (or impossible) at this point. But it is quite possible (IMO probable) that the damage was done somewhat before that scrub, and detected during the scrub. My guess is that the scrub updated the parity blocks with the errored data, and the resync therefore would also have rebuilt the drive with the errored data.
I don't have the full log zip, so I can't see if there are any other possible causes.
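For background: ReadyNAS OS6 layers BTRFS on top of mdadm RAID, and the md layer keeps its own consistency counter that a "check" pass updates. Here is a minimal sketch for inspecting it, assuming the data array is md127 (an assumption; verify the real name in /proc/mdstat first):

```python
from pathlib import Path

# Assumed array name; check /proc/mdstat for the real one.
MD = Path("/sys/block/md127/md")

# mismatch_cnt is updated by a "check" or "repair" pass, which can be
# triggered by writing "check" to sync_action (as root).
state = (MD / "sync_action").read_text().strip()
mismatches = int((MD / "mismatch_cnt").read_text())

print(f"sync_action={state}, mismatch_cnt={mismatches}")
if mismatches:
    print("Data and parity disagree somewhere, which would be "
          "consistent with a scrub propagating bad blocks into parity.")
```

A zero count after a fresh check pass would at least rule out ongoing data/parity disagreement at the md layer; BTRFS checksum errors are a separate layer and show up in the scrub results instead.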
Dewdman42 wrote:
With approx. 8 TB of data, I am half tempted to just buy a super big SSD and forget about using a NAS and RAID.
You could of course do that. Or go with a single mechanical drive (which is quite a bit less expensive at that size). You would still need a good backup plan.
Switching to RAID-1 is a similar option, as mirroring is simpler than RAID-5, and it is easier to recover files when something goes wrong.
One of my backup NAS units has two JBOD volumes that comfortably hold everything I have. So at some point in the future, I could just install large enough disks in my application server for primary storage and use my NAS systems only for backup.