Forum Discussion
Dewdman42
Dec 10, 2024 · Virtuoso
Lots of bitrot errors, advice needed
My ReadyNAS is functioning for the most part, but I recently found out that IDrive had not been backing it up for quite a while, months... so I fixed that and finally ended up re-backing up the whole ...
StephenB
Dec 11, 2024 · Guru - Experienced User
Dewdman42 wrote:
Well, looking back further in the logs, I do find something from 18 months ago that must be the reason for this. See below.
I agree it is at least related.
800+ pending sectors does mean the disk needs to be replaced.
Dewdman42
Dec 14, 2024 · Virtuoso
What are you referring to about the 800 pending sectors? Please forgive me if this is obvious.
I haven't found any other errors apart from 18 months ago, when the drive was resynced for some reason; the errors were gone after that... except for these bitrot errors now, while doing the big re-backup.
- StephenB · Dec 14, 2024 · Guru - Experienced User
Dewdman42 wrote:
What are you referring to about the 800 pending sectors? Please forgive me if this is obvious.
See status.log above:
[23/07/09 03:52:40 MDT] crit:disk:LOGMSG_SMART_PENDING_SECT_30DAYS_WARN Detected increasing pending sector: count [878] on disk 2 (Internal) [WDC WD60EFRX-68L0BN1, WD-WX11DA7R3J4V] 33 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
This problem goes back to July 2023. "Pending sectors" are sectors that could not be read; they generate an error but have not been reallocated. The drive was marked as failed by the ReadyNAS (also on 9 July), but the reboot cleared that status. Although the system did eventually resync that disk after a couple of reboots, I suspect the issues on the disk caused the problem.
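If you want to watch this attribute yourself, here is a minimal sketch, assuming smartmontools 7.0+ is installed and that the member disk shows up as /dev/sda (both assumptions; substitute your actual device). It reads SMART attribute 197 (Current_Pending_Sector) from smartctl's JSON output:

```python
import json
import subprocess

# Hypothetical device path: replace with the actual RAID member disk.
DEVICE = "/dev/sda"

# "smartctl -A -j" prints the SMART attribute table as JSON
# (JSON output requires smartmontools 7.0 or later; run as root).
result = subprocess.run(
    ["smartctl", "-A", "-j", DEVICE],
    capture_output=True,
    text=True,
)
data = json.loads(result.stdout)

# Attribute 197 (Current_Pending_Sector) counts sectors that failed a
# read and are waiting to be remapped on the next successful write.
for attr in data["ata_smart_attributes"]["table"]:
    if attr["id"] == 197:
        pending = attr["raw"]["value"]
        print(f"{DEVICE}: {pending} pending sectors")
        if pending > 0:
            print("A nonzero, rising count usually means the disk "
                  "should be replaced.")
```

A count that keeps climbing over days (as in the log above, 33 increases in 30 days) is the pattern that matters, more than any single reading.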
The volume was also very full around that time:
[23/07/13 16:46:05 MDT] warning:volume:LOGMSG_VOLUME_USAGE_CRITICAL Less than 10% of volume data's capacity is free. data's performance is degraded and you risk running out of usable space. To improve performance and stability, you must add capacity or make free space.
BTRFS doesn't behave well when it runs out of usable free space, so I always recommend keeping at least 15% free.
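A simple way to guard against drifting below that threshold is a small stdlib-only check like this sketch, which could run from cron. The mount point /data is an assumption, matching the volume name in the log:

```python
import shutil

# Assumed mount point, matching the volume name "data" in the log above.
MOUNT = "/data"

usage = shutil.disk_usage(MOUNT)
free_pct = usage.free / usage.total * 100

# 15% mirrors the recommendation above: BTRFS is copy-on-write, so it
# needs headroom for metadata even when you are only deleting files.
if free_pct < 15.0:
    print(f"WARNING: {MOUNT} is only {free_pct:.1f}% free; "
          f"add capacity or free space before performance degrades.")
else:
    print(f"{MOUNT}: {free_pct:.1f}% free, OK")
```

Note that statvfs-based figures can be optimistic on BTRFS; `btrfs filesystem usage /data` (run as root) gives the authoritative allocation breakdown.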
- Dewdman42 · Dec 14, 2024 · Virtuoso
To be clear, as I stated earlier, the disk did not resync after a couple of reboots. I removed the disk from the array, rebooted, and then put it back to force a resync. After that, zero errors, other than these bitrot errors 18 months later when I finally got around to re-backing up some old files, which are now gone since I removed the reported files.
If the disk itself was destined to fail at a hardware level, why has it not been producing errors since July 2023? To me it seems like the RAID array was compromised in software somehow, for some unknown reason, and it may still be compromised for all I know. I did allow the volume to become quite full in the past, which could have triggered some software-induced problem in the RAID array itself; as you indicated, the file system does not behave well when the disk is full. Just thinking out loud here.
I still don't know what I should do. Buying a new drive may accomplish nothing if the drive synchronization was somehow compromised and is still compromised as a result. On the other hand, if the hardware actually is failing, despite no errors since 2023 and no SMART errors ever, then it could be a ticking time bomb.
With approx. 8 TB of data, I am half tempted to just buy a super big SSD and forget about using a NAS and RAID.
- StephenB · Dec 14, 2024 · Guru - Experienced User
Dewdman42 wrote:
To be clear, as I stated earlier, the disk did not resync after a couple of reboots. I removed the disk from the array, rebooted, and then put it back to force a resync. After that, zero errors, other than these bitrot errors 18 months later when I finally got around to re-backing up some old files, which are now gone since I removed the reported files.
To be clear, 933 pending sectors were reported by the drive right before the resync. The NAS marked the drive as failed, which in my opinion was exactly right. It should have been replaced at that time; forcing the resync was a bad idea. Sorting out exactly why the errors stopped would be difficult (or impossible) at this point. But it is quite possible (IMO probable) that the damage was done somewhat before that scrub, and detected during the scrub. My guess is that the scrub updated the parity blocks with the errored data, and the resync therefore would also have rebuilt the drive with the errored data.
I don't have the full log zip, so I can't see if there are any other possible causes.
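For background: ReadyNAS OS6 layers BTRFS on top of mdadm RAID, and the md layer keeps its own consistency counter that a "check" pass updates. Here is a minimal sketch for inspecting it, assuming the data array is md127 (an assumption; verify the real name in /proc/mdstat first):

```python
from pathlib import Path

# Assumed array name; check /proc/mdstat for the real one.
MD = Path("/sys/block/md127/md")

# mismatch_cnt is updated by a "check" or "repair" pass, which can be
# triggered by writing "check" to sync_action (as root).
state = (MD / "sync_action").read_text().strip()
mismatches = int((MD / "mismatch_cnt").read_text())

print(f"sync_action={state}, mismatch_cnt={mismatches}")
if mismatches:
    print("Data and parity disagree somewhere, which would be "
          "consistent with a scrub propagating bad blocks into parity.")
```

A zero count after a fresh check pass would at least rule out ongoing data/parity disagreement at the md layer; BTRFS checksum errors are a separate layer and show up in the scrub results instead.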
Dewdman42 wrote:
With approx. 8 TB of data, I am half tempted to just buy a super big SSD and forget about using a NAS and RAID.
You could of course do that. Or go with a single mechanical drive (which is quite a bit less expensive at that size). You would still need a good backup plan.
Switching to RAID-1 is a similar option, as mirroring is simpler than RAID-5, and it is easier to recover files when something goes wrong.
One of my backup NAS units has two JBOD volumes that comfortably hold everything I have. So at some point in the future, I could just install large enough disks in my application server for primary storage and use my NAS systems only for backup.