Suspect Disk scrubbing followed by fsck TRASHES filesystem. Now what?

I'm starting to suspect that the issue described in the community post titled "Disk scrubbing followed by fsck TRASHES filesystem?" just happened to me.

 

https://community.netgear.com/t5/ReadyNAS-in-Business/Disk-scrubbing-followed-by-fsck-TRASHES-filesy...

 

Wondering if anyone has any ideas as to what else I can try to recover the content.

 

So far:

 

I am running RAIDiator 4.2.28 on a ReadyNAS Ultra 6 Plus with six 2TB drives in X-RAID2 with dual redundancy.

 

The NAS disk check reports tons of errors, and the NAS can't seem to see anything on the data volumes, though the operating file system seems fine.

 

I've plugged the 6 disks into another computer via USB3 and run a slew of recovery software on them. Only two seem to recognize the RAID pattern: ReclaiMe Free RAID Recovery and UFS Explorer Pro. They each report a slightly different configuration, but both detect RAID 6 and seem to be happy about it. ReclaiMe took about 2 hours to detect it by trial and error and only detected 1 RAID pattern for the entire disk; UFS Explorer Pro found the RAID parameters for 3 partitions (4GB, 2GB, and 7.27TB) instantly.

 

The 4GB partition opens fine with UFS Explorer Pro, can be exported to an image, and I can recover any file I want from this file system image using rlinux from rtt.

Just to see how bad things were, I started with the 4GB partition for the operating system. I've saved an image of the un-RAIDed 4GB OS partition, which is mirrored on each of the 6 drives. After trial and error with ReclaiMe, GetDataBack and RStudio, I found that UFS Explorer Pro (trial edition) was the only package that was helpful in detecting this 6x mirrored partition configuration and exporting the disk image. However, it was rlinux (free) from rtt that allowed me to scan the partition and export the contents. So this partition seems to be intact, which makes sense because RAIDiator is still running when I boot the NAS. I compared the un-RAIDed image to a few of the mirrors individually, and they seem fine, with only a few issues with how symbolic links and duplicate (probably deleted versions) filenames are handled.

 

Not so lucky with the data volume

So then, using ReclaiMe Free RAID Recovery and the parameters it autodetected, I saved a destriped image of the array's volume partition (7.27TB). It took about 50 hours to write via USB3.

 

I've scanned the destriped image with several recovery programs. So far, only rlinux or rstudio seem to find the EXT filesystem (after 20 hours or so). ReclaiMe, GetDataBack and UFS Explorer can find a bunch of file fragments, plus ghosts of dozens of FAT and NTFS file systems that seem to belong to disk images, virtual hard drives and backups that I was storing, but no EXT partitions.

 

In rlinux, the file system directory appears remarkably intact, with about 823,000 files totaling about 4.44 TB. I can easily extract any one I want.

 

However, it is aggravating that when I recover some of these files and inspect them, small files (a few kilobytes, probably within one RAID block) are almost always OK, image files like JPG often have OK previews but turn out to be corrupt after about 10%-20% of the image when opened, and longer music/video files play end to end, but with 'static' / 'corruption' in the stream. 7z/gz/zip/iso files will not open, and I haven't tried any binary files or office documents.

 

So it seems to me like perhaps one of the blocks in the striping pattern got destroyed, and based on my logs, it fits with the known issue (albeit from before version 4.2.20) of 'file system gets trashed after raid scrubbing and fsck' described above.
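One way to test the "one bad block per stripe" idea, assuming a known-good copy of at least one large file exists (for example in a backup), is to compare the recovered file against it chunk by chunk and see whether the damaged chunks repeat at a regular interval. The sketch below is only illustrative: `bad_chunks`, `residue_histogram`, the 64 KiB chunk size and the count of 4 data disks are assumptions, not values taken from the NAS. Offsets are relative to the start of the file, so only the spacing between damaged chunks is meaningful, not which physical disk they sit on.

```python
from collections import Counter
from typing import List


def bad_chunks(recovered: bytes, reference: bytes,
               chunk_size: int = 64 * 1024) -> List[int]:
    """Return indices of chunk_size-byte blocks where the two buffers differ.

    chunk_size is an assumed value; use whatever the recovery tool reported.
    Only the overlapping length of the two buffers is compared.
    """
    n = min(len(recovered), len(reference))
    return [
        i // chunk_size
        for i in range(0, n, chunk_size)
        if recovered[i:i + chunk_size] != reference[i:i + chunk_size]
    ]


def residue_histogram(chunks: List[int], data_disks: int = 4) -> Counter:
    """Count damaged chunks by position within a stripe (index mod data_disks).

    data_disks=4 assumes 6 drives with dual redundancy (6 - 2 = 4).
    """
    return Counter(c % data_disks for c in chunks)
```

If nearly all damaged chunks fall into one residue class, that would be consistent with a single member (or a single position in the rotation) supplying wrong data. A flat histogram would point elsewhere, for example at fragmentation or at wrong destriping parameters.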

 

Questions - Please help!

Any suggestions on what I can try next, or is the file system 'trashed' as described in the post referenced above?

 

One more thing I thought to try is to export the disk image from UFS Explorer Pro, to see whether the ReclaiMe destriping is to blame for some of the corruption, and then try to recover the file system again using rlinux. That will take around 70 to 80 hours. Any chance that would succeed any better than the ReclaiMe export? Is there a better tool to 'destripe' the X-RAID2 array to a disk image that I should use instead?
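On the destriping question: if the data volume is a Linux md RAID 6 array in md's left-symmetric layout (an assumption; the thread only says both tools detected RAID 6, with slightly different parameters), the chunk placement can be sketched as below. `locate_chunk`, `locate_byte` and the 64 KiB chunk size are illustrative names and values, and member offsets are relative to the start of each member's data area, ignoring any metadata offset.

```python
from typing import Tuple


def locate_chunk(logical_chunk: int, n_disks: int = 6) -> Tuple[int, int, int, int]:
    """Map a logical data chunk to (stripe, data_disk, p_disk, q_disk).

    Assumes md RAID 6, left-symmetric: P rotates backwards one disk per
    stripe, Q follows P, and data chunks start on the disk after Q.
    """
    data_disks = n_disks - 2
    stripe, i = divmod(logical_chunk, data_disks)
    p_disk = n_disks - 1 - (stripe % n_disks)
    q_disk = (p_disk + 1) % n_disks
    data_disk = (p_disk + 2 + i) % n_disks
    return stripe, data_disk, p_disk, q_disk


def locate_byte(offset: int, chunk_size: int = 64 * 1024,
                n_disks: int = 6) -> Tuple[int, int]:
    """Map a byte offset in the destriped volume to (member disk, member offset)."""
    chunk, within = divmod(offset, chunk_size)
    stripe, data_disk, _, _ = locate_chunk(chunk, n_disks)
    return data_disk, stripe * chunk_size + within
```

A destriping tool that gets the disk order, chunk size or rotation wrong would pull some chunks from the wrong member at regular intervals. That could leave small files intact while damaging larger ones periodically, so comparing the parameters the two tools report against a mapping like this may be quicker than another 70-hour export.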

 

Once I have the image, is rlinux the best way to detect the EXT file system, or are there others? Are there any that can repair it?
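As a quick sanity check before a multi-hour scan, the image can be probed directly for ext2/3/4 superblocks. An ext filesystem keeps its primary superblock 1024 bytes from its start, with the little-endian magic number 0xEF53 at byte 56 of the superblock. The sketch below assumes the filesystem may not begin at offset 0 of the destriped image (for instance if a volume-manager layer such as LVM sits between the RAID and the filesystem, which is an assumption here). `find_ext_superblocks`, the 4 KiB step and the `limit` parameter are hypothetical choices.

```python
import struct
from typing import BinaryIO, Iterator, Optional

EXT_MAGIC = 0xEF53
SB_OFFSET = 1024    # primary superblock starts 1024 bytes into the filesystem
MAGIC_OFFSET = 56   # s_magic field inside the superblock, little-endian u16


def find_ext_superblocks(img: BinaryIO, size: int, step: int = 4096,
                         limit: Optional[int] = None) -> Iterator[int]:
    """Yield candidate filesystem start offsets in an image.

    Probes every `step` bytes up to `limit` (or the whole image) and checks
    for the ext magic where a primary superblock would be. Hits are only
    candidates: random data matches about once per 65536 probes, and ext
    filesystems inside stored disk images will also show up.
    """
    end = size if limit is None else min(size, limit)
    for start in range(0, end, step):
        img.seek(start + SB_OFFSET + MAGIC_OFFSET)
        raw = img.read(2)
        if len(raw) == 2 and struct.unpack("<H", raw)[0] == EXT_MAGIC:
            yield start
```

To use it, open the image with `open(path, "rb")`, pass `os.path.getsize(path)` as `size`, and set `limit` to the first few GiB to keep the run short. A hit near the start of the image would suggest the filesystem header survived and that a tool able to read ext from an offset might be worth trying on a copy.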

 

Are there any on-device things I should try on the NAS to recover the file system?

 

Recent system logs

 

These seem to show that this started right after a RAID scrubbing followed by a fsck on Jan 22. There was also a disk failure during a RAID scrubbing on Jan 5, but there was at least one successful fsck, on Jan 18, after the rebuild on Jan 11.

 

 

  • Fri Jan 22 16:42:28 EST 2016 The paths for the shares listed below could not be found. Typically, this occurs when the ReadyNAS is unable to access the data volume. vault media documents public content webroot common backup addons-config
  • Fri Jan 22 16:40:13 EST 2016 Volume scan found errors that it could not easily correct. Please ensure that you have current backups of all valuable data before performing a full volume scan by rebooting the NAS through Frontview with the volume scan option enabled.
  • Wed Jan 20 11:33:27 EST 2016 RAID scrubbing finished on volume C.
  • Tue Jan 19 01:00:26 EST 2016 RAID scrubbing started on volume C.
  • Mon Jan 18 01:08:08 EST 2016 The on-line filesystem consistency check completed without errors for Volume C.
  • Mon Jan 18 01:00:02 EST 2016 The on-line filesystem consistency check has started for Volume C.
  • Tue Jan 12 08:29:16 EST 2016 RAID sync finished on volume C.
  • Mon Jan 11 19:42:18 EST 2016 RAID sync started on volume C.
  • Mon Jan 11 19:41:51 EST 2016 Data volume will be rebuilt with disk 6.
  • Mon Jan 11 19:39:41 EST 2016 New disk detected. If multiple disks have been added, they will be processed one at a time. Please do not remove any added disk(s) during this time. [Disk 6]
  • Mon Jan 11 01:13:34 EST 2016 The on-line filesystem consistency check completed without errors for Volume C.
  • Mon Jan 11 01:00:01 EST 2016 The on-line filesystem consistency check has started for Volume C.
  • Tue Jan 5 14:42:01 EST 2016 A disk was removed from the ReadyNAS. For full protection of your data volume, please add a replacement disk as soon as possible.
  • Tue Jan 5 14:42:01 EST 2016 Disk removal detected. [Disk 6]
  • Tue Jan 5 08:42:20 EST 2016 RAID scrubbing finished on volume C.
  • Tue Jan 5 08:42:14 EST 2016 If the failed disk is used in a RAID level 1, 5, or X-RAID volume, please note that volume is now unprotected, and an additional disk failure may render that volume dead. If this disk is a part of a RAID 6 volume, your volume is still protected if this is your first failure. A 2nd disk failure will make your volume unprotected. If this disk is a part of a RAID 10 volume, a failure of this disk's mirror partner will render the volume dead. It is recommended that you replace the failed disk as soon as possible to maintain optimal protection of your volume.
  • Tue Jan 5 08:42:13 EST 2016 Disk failure detected.
  • Tue Jan 5 08:35:37 EST 2016 Detected increasing spin retry count[540701285] on disk 6 [WDC WD20EZRX-00D8PB0, WD-WCC4M1521546]. This often indicates an impending failure. Please be prepared to replace this disk to maintain data redundancy.
  • Tue Jan 5 01:00:33 EST 2016 RAID scrubbing started on volume C.
Message 1 of 7


All Replies
mdgm-ntgr
NETGEAR Employee Retired

Re: Suspect Disk scrubbing followed by fsck TRASHES filesystem. Now what?

Please send me your logs (see the Sending Logs link in my sig).

Message 2 of 7
mdgm-ntgr
NETGEAR Employee Retired

Re: Suspect Disk scrubbing followed by fsck TRASHES filesystem. Now what?

Well you could contact support and enquire about purchasing a data recovery contract. Note that data recovery attempts may be unsuccessful.

It appears that one of your disks may be failing.

Message 3 of 7

Re: Suspect Disk scrubbing followed by fsck TRASHES filesystem. Now what?

Thank you for looking at the logs.

 

"It appears that one of your disks may be failing."

 

Could you be more specific? Which disk, according to which log entry?

 

All the disks seem to test OK with WD diagnostic tools on an external PC, so it would be good to know how you learned that one could be failing.

I did just replace one disk and the array was rebuilt. Are you talking about disk 6? or another disk?

 

Also, would a failing disk potentially cause this data corruption? Would pulling the failing disk be expected to fix it? From my limited knowledge of RAID, that would not make sense. Would it potentially fix the problem?

 

Message 4 of 7
mdgm-ntgr
NETGEAR Employee Retired

Re: Suspect Disk scrubbing followed by fsck TRASHES filesystem. Now what?

If you look in disk_smart.log, one of your disks has some current pending sectors.

Message 5 of 7

Re: Suspect Disk scrubbing followed by fsck TRASHES filesystem. Now what?

I have destriped the RAID to an image using two separate software packages and run five recovery programs on the images. Two attempts have resulted in an almost complete file system, but every file larger than a few KB has random bits of other files inserted, periodically. In video, this looked like static or missing audio segments. In pictures it looks like partial pictures.

There was no rescuing it: basically the netgear box trashed my file system during a routine raid scrub+fsck.

The rebuild and restore from backup went smoothly, although my array has been down for over two weeks.

Based on the thread linked above, this seems like it may have been a known defect a few versions back and either was not fixed or was reintroduced.

I guess my only option for some sort of closure on this, to do something proactive to minimize the chance of this happening again, is to go with another vendor and hope they make fewer defects in their products.

Thanks for your efforts trying to make this right.
Message 6 of 7
mdgm-ntgr
NETGEAR Employee Retired

Re: Suspect Disk scrubbing followed by fsck TRASHES filesystem. Now what?

The thread you linked to is very old and concerns firmware several versions older than what you are running, so it would be about a different issue.

 

Before doing the offline volume scan you were advised to back up your data. Running a filesystem repair on any filesystem can be dangerous, so it is advisable to update your regular backup first.

 

The online filesystem check works off a snapshot so would have no impact on your data.

 

Disk scrubbing makes sure that your disks are in sync with each other and running scrubbing once in a while (e.g. every 2-3 months) is advisable, but it does not replace the need for backups.

Whichever NAS you use, from whichever brand, you should not store important data on just the one device.

Message 7 of 7