Forum Discussion
ale2 (Aspirant)
Jul 28, 2021
ReadyNAS 424 BTRFS error
Hi all,
A couple of days ago, when trying to re-add a private Time Machine share for a user after deleting it, I got an error saying it couldn't be written. After trying a few times, I restarted the NAS and noticed the following message in the Logs section:
Volume: The volume data encountered an error and was made read-only. It is recommended to backup your data.
I downloaded the logs and noticed the following in the kernel.log file:
Jul 25 01:16:27 nas2012 kernel: FAT-fs (sdd1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
Jul 25 01:17:11 nas2012 kernel: BTRFS error (device dm-0): parent transid verify failed on 8379516911616 wanted 2728973 found 2735258
Jul 25 01:17:11 nas2012 kernel: BTRFS error (device dm-0): parent transid verify failed on 8379516911616 wanted 2728973 found 2735258
Jul 25 01:17:11 nas2012 kernel: BTRFS warning (device dm-0): Skipping commit of aborted transaction.
Jul 25 01:17:11 nas2012 kernel: BTRFS: error (device dm-0) in cleanup_transaction:1864: errno=-5 IO failure
Jul 25 01:17:11 nas2012 kernel: BTRFS info (device dm-0): forced readonly
Jul 25 01:17:11 nas2012 kernel: BTRFS info (device dm-0): delayed_refs has NO entry
Jul 25 02:46:52 nas2012 kernel: BTRFS error (device dm-0): Remounting read-write after error is not allowed
It seems the first line (FAT-fs) points to /dev/sdd1, which could be the thumb drive with the encryption key. I didn't find anything in NETGEAR's documentation about the BTRFS error, and from reading the forums, the only "solution" for the "parent transid verify failed" error seems to be to format the NAS and start over. Is that really the case? Can the BTRFS filesystem get corrupted like this? I chose ReadyNAS because of BTRFS's bitrot protection, but it seems more fragile than other filesystems if it can get corrupted like this and everything is lost. Is there some other solution?
Thanks!
4 Replies
- rn_enthusiast (Virtuoso)
Hi ale2
I would like to take a peek at your logs, if you don't mind.
When you access the NAS web interface, go to System > Logs, where you will see a "Download Logs" button on the right-hand side. Click it to download a zip file with all the NAS logs inside.
Upload this zip file to Dropbox, Google Drive, or similar, and create a link where I can download it. Please PM me the link and I will have a look at what is going on.
Cheers
- rn_enthusiast (Virtuoso)
Hi ale2
Thanks for the logs
The issue started on the 25th at 01:11 AM.
Jul 25 01:11:15 nas2012 kernel: BTRFS error (device dm-0): parent transid verify failed on 8379516911616 wanted 2728973 found 2735258
Jul 25 01:11:15 nas2012 kernel: BTRFS error (device dm-0): parent transid verify failed on 8379516911616 wanted 2728973 found 2735258
Jul 25 01:11:15 nas2012 kernel: BTRFS warning (device dm-0): Skipping commit of aborted transaction.
Jul 25 01:11:15 nas2012 kernel: BTRFS: error (device dm-0) in cleanup_transaction:1864: errno=-5 IO failure
Jul 25 01:11:15 nas2012 kernel: BTRFS info (device dm-0): forced readonly
An I/O error happened somewhere, and that is typically disk-related. However, I cannot see your NAS complaining about your disks at any point. I trawled the kernel logs and I see no clear explanation for why this would have happened.
But it is still obvious that the filesystem ran into some I/O issue, which also caused checksum errors (hence the "parent transid verify failed" messages). You are using 8TB WDC WD80EFAX disks, which I believe are still CMR drives. StephenB and Sandshark can confirm this, as they know more about drive hardware than I do. If these are actually SMR drives, they should be replaced, as SMR is not suitable for a NAS.
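If you want to double-check the drive models yourself, smartctl is normally available over SSH on ReadyNAS OS 6. A minimal sketch, assuming the data disks are /dev/sda through /dev/sdd as in your logs:

# Print identity info (model, serial, firmware) for each data disk,
# so the model number can be checked against WD's CMR/SMR listings.
for dev in /dev/sd[a-d]; do smartctl -i "$dev"; done

# The same tool reports the raw start/stop and load-cycle counters
# discussed further below:
smartctl -A /dev/sdb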
I also notice you have spindown enabled and we see your disks spinning up and down a lot. Often after just a few seconds. Below is a random day:
Jul 22 00:42:42 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:08.
Jul 22 00:42:42 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 01:45:32 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 04:02:36 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:11.
Jul 22 04:02:36 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:08.
Jul 22 04:02:36 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 04:47:57 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:08.
Jul 22 04:47:57 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 04:49:05 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:05.
Jul 22 05:33:34 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:08.
Jul 22 05:33:34 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 06:18:51 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 06:48:17 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:05.
Jul 22 07:04:25 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 07:49:43 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:05.
Jul 22 08:49:15 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:11.
Jul 22 08:49:15 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:08.
Jul 22 08:49:15 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 17:25:06 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:05.
Jul 22 17:26:12 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:05.
Jul 22 18:55:46 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:11.
Jul 22 18:55:46 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:08.
Jul 22 18:55:46 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 21:17:00 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:08.
Jul 22 21:17:00 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:05.
Jul 22 22:03:13 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:08.
Jul 22 22:03:13 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 23:00:14 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:08.
Jul 22 23:00:14 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Looking at the drive statistics, the start/stop count is very high (due to constant spin up/down).
Device: sdb
Controller: 0
Channel: 1
Model: WDC WD80EFAX-68KNBN0
Serial: VAH1WPHL
Firmware: 81.00A81W
Class: SATA
RPM: 5400
Sectors: 15628053168
Pool: data
PoolType: RAID 5
PoolState: 1
PoolHostId: a449f54
Health data:
ATA Error Count: 0
Reallocated Sectors: 0
Reallocation Events: 0
Spin Retry Count: 0
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 41
Start/Stop Count: 31533
Power-On Hours: 17133
Power Cycle Count: 217
Load Cycle Count: 31609
That does not look like a good way to keep drives healthy. This many start/stops on the drives cannot be good, IMO. I don't see disk issues, the kernel complaining about disks, or a disk spindown happening when the issue first started, but I would still recommend disabling Disk Spindown. I don't have a good explanation for the BTRFS failure, but I suspect it is related to failed disk I/O somewhere. Since you can still access your data, do a backup now. Make sure the backup is good; you can then either delete and re-create the volume and restore from backup, or, if you want to be adventurous, you can try letting btrfs check attempt a repair on the volume (though this can make the problem even worse) - https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-check
This should be run on an unmounted volume, so you first need to unmount the volume via the CLI (SSH to the NAS). However, back up the data before doing anything else, to be safe. I understand that you chose a BTRFS filesystem for its bitrot protection and other advantages. I have a ReadyNAS myself, standing in a corner for some backups, but my main Ubuntu server uses a BTRFS RAID and all my backup drives are BTRFS-formatted too. I have never had an issue with it. I would not call the filesystem fragile, but I do want to point out that NETGEAR ships a roughly 3.5-year-old BTRFS version on the NAS (v4.16), which means it will ultimately not be as stable as a more modern version of the filesystem.
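To make the steps concrete, here is a rough sketch of that repair path, assuming the data volume is mounted at /data and maps to /dev/dm-0 as in your logs. Run it over SSH, and only after your backup is verified:

# 1. Unmount the data volume (stop any services still using it first).
umount /data

# 2. Dry run: a read-only check that makes no changes to the volume.
btrfs check --readonly /dev/dm-0

# 3. Last resort only - repair mode can make the damage worse,
#    so be certain the backup is complete before running this.
btrfs check --repair /dev/dm-0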
Cheers
- StephenB (Guru - Experienced User)
rn_enthusiast wrote:
You are using 8TB WDC WD80EFAX disks, which I believe are still CMR drives. StephenB and Sandshark can confirm this, as they know more about drive hardware than I do. If these are actually SMR drives, they should be replaced, as SMR is not suitable for a NAS.
The SMR NAS-purposed drives are:
- WD20EFAX
- WD30EFAX
- WD40EFAX
- WD60EFAX
The WD80EFAX is CMR (and I have one in my RN524, mixed with some WD80EFZX).
I haven't seen an up-to-date decoder for the WD model format, but the A in that position denotes the cache size (256 MB in the case of the WD80EFAX, 128 MB for the WD80EFZX). There isn't anything in the model format that tells you if the disk is SMR or not.
rn_enthusiast wrote:
That does not look like a good way to keep drives healthy. This many start/stops on the drives cannot be good, IMO.
I'd always assumed that frequent starts/stops would stress the drives, but there was a post here some years back that said that was not the case. The poster worked for a disk manufacturer, and his assertion was that the air bearings, etc. used in modern disks wouldn't wear out prematurely due to the starts/stops. I then looked at studies on disk reliability and on managing power use in data centers, and none of them included any trade-off of disk life vs. the power savings. I didn't find any data at all suggesting there was a trade-off to make.
Since I found no field data at all confirming this commonly held idea, I decided not to worry about it.
There is a middle ground - you can increase the spindown interval, so it doesn't spin down quite as often, and/or you can schedule spindown so it only engages when the NAS isn't normally in use.
On my main NAS, spindown is enabled at night, and uses the default 5 minute spindown setting. I disable it during the day because I don't want to wait for spin up before I can access my files.
rn_enthusiast wrote:
Jul 25 01:11:15 nas2012 kernel: BTRFS: error (device dm-0) in cleanup_transaction:1864: errno=-5 IO failure
An I/O error happened somewhere, and that is typically disk-related.
I just want to add that an I/O write failure will generally result in filesystem corruption, no matter what filesystem you use. The only way that is avoided is when RAID parity can recover the lost data, and RAID parity only helps if the system somehow knows which data block is errored.
At the end of the day, the best way to keep data safe is to back it up, so you have copies on multiple devices - ideally one off-site to give you disaster protection.
As far as my own experience here goes, I haven't found BTRFS to be more fragile than EXT, as long as you maintain enough free space. I find the other features of BTRFS (snapshots in particular) to be more compelling.
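Incidentally, the bitrot protection is something you can exercise on demand: a scrub reads every block and verifies it against its checksum, so silent corruption is at least detected (and repaired where btrfs-level redundancy exists). A minimal sketch, assuming the data volume is mounted at /data as is typical on ReadyNAS OS 6:

# Run a scrub in the foreground (-B) and report statistics when done.
btrfs scrub start -B /data

# Or start it in the background and poll its progress:
btrfs scrub start /data
btrfs scrub status /data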
- ale2 (Aspirant)
Thank you rn_enthusiast and StephenB for taking the time to check the logs and for your suggestions. I will try to back up what I can (I THINK I had the most important items backed up already), and will factory reset to start fresh.
Thanks!