Forum Discussion
ale2 (Aspirant)
Jul 28, 2021
ReadyNAS 424 BTRFS error
Hi all,
A couple of days ago, when trying to re-add a private Time Machine share for a user after deleting it, I got an error saying it couldn't be written. After trying a few times, I restarted the NAS and noticed the following message in the Logs section:
Volume: The volume data encountered an error and was made read-only. It is recommended to backup your data.
I downloaded the logs and noticed the following in the kernel.log file:
Jul 25 01:16:27 nas2012 kernel: FAT-fs (sdd1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
Jul 25 01:17:11 nas2012 kernel: BTRFS error (device dm-0): parent transid verify failed on 8379516911616 wanted 2728973 found 2735258
Jul 25 01:17:11 nas2012 kernel: BTRFS error (device dm-0): parent transid verify failed on 8379516911616 wanted 2728973 found 2735258
Jul 25 01:17:11 nas2012 kernel: BTRFS warning (device dm-0): Skipping commit of aborted transaction.
Jul 25 01:17:11 nas2012 kernel: BTRFS: error (device dm-0) in cleanup_transaction:1864: errno=-5 IO failure
Jul 25 01:17:11 nas2012 kernel: BTRFS info (device dm-0): forced readonly
Jul 25 01:17:11 nas2012 kernel: BTRFS info (device dm-0): delayed_refs has NO entry
Jul 25 02:46:52 nas2012 kernel: BTRFS error (device dm-0): Remounting read-write after error is not allowed
It seems the first line (FAT-fs) points to /dev/sdd1, which could be the thumb drive with the encryption key. I didn't find anything in NETGEAR's documentation about the BTRFS error, and from reading the forums, the only "solution" for the "parent transid verify failed" error seems to be to format the NAS and start over. Is that really the case? Can the BTRFS filesystem get corrupted like this? I chose ReadyNAS because of BTRFS's bitrot protection, but it seems more fragile than other filesystems if it can get corrupted like this and everything is lost. Is there some other solution?
Thanks!
4 Replies
- rn_enthusiast (Virtuoso)
Hi ale2
I would like to take a peek at your logs, if you don't mind.
When you access the NAS web interface, go to System > Logs, where you will see a "Download Logs" button on the right-hand side. Click it to download a zip file with all the NAS logs inside.
Upload this zip file to Dropbox, Google Drive, or similar, and create a link where I can download it. Please PM me the link and I will have a look at what is going on.
Cheers
- rn_enthusiast (Virtuoso)
Hi ale2
Thanks for the logs
The issue started on the 25th at 01:11 AM.
Jul 25 01:11:15 nas2012 kernel: BTRFS error (device dm-0): parent transid verify failed on 8379516911616 wanted 2728973 found 2735258
Jul 25 01:11:15 nas2012 kernel: BTRFS error (device dm-0): parent transid verify failed on 8379516911616 wanted 2728973 found 2735258
Jul 25 01:11:15 nas2012 kernel: BTRFS warning (device dm-0): Skipping commit of aborted transaction.
Jul 25 01:11:15 nas2012 kernel: BTRFS: error (device dm-0) in cleanup_transaction:1864: errno=-5 IO failure
Jul 25 01:11:15 nas2012 kernel: BTRFS info (device dm-0): forced readonly
An I/O error happened somewhere, and that is typically disk-related. However, I cannot see your NAS complaining about your disks at any point. I trawled the kernel logs and I see no clear explanation for why this would have happened.
But it is still obvious that the filesystem ran into some I/O issue, which also caused checksum errors (hence the "parent transid verify failed" messages). You are using 8TB WDC WD80EFAX disks, which I believe are still CMR drives. StephenB and Sandshark can confirm this, as they know more about drive hardware than I do. If these are actually SMR drives, they should be replaced, as SMR is not suitable for a NAS.
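If you want to double-check the drive models yourself, smartctl is normally available over SSH on ReadyNAS OS 6. A minimal sketch, assuming the data disks are /dev/sda through /dev/sdd as in your logs:

# Print identity info (model, serial, firmware) for each data disk,
# so the model number can be checked against WD's CMR/SMR listings.
for dev in /dev/sd[a-d]; do smartctl -i "$dev"; done

# The same tool reports the raw start/stop and load-cycle counters
# discussed further below:
smartctl -A /dev/sdb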
I also notice you have spindown enabled and we see your disks spinning up and down a lot. Often after just a few seconds. Below is a random day:
Jul 22 00:42:42 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:08.
Jul 22 00:42:42 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 01:45:32 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 04:02:36 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:11.
Jul 22 04:02:36 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:08.
Jul 22 04:02:36 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 04:47:57 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:08.
Jul 22 04:47:57 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 04:49:05 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:05.
Jul 22 05:33:34 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:08.
Jul 22 05:33:34 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 06:18:51 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 06:48:17 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:05.
Jul 22 07:04:25 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 07:49:43 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:05.
Jul 22 08:49:15 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:11.
Jul 22 08:49:15 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:08.
Jul 22 08:49:15 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 17:25:06 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:05.
Jul 22 17:26:12 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:05.
Jul 22 18:55:46 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:11.
Jul 22 18:55:46 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:08.
Jul 22 18:55:46 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 21:17:00 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:08.
Jul 22 21:17:00 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:05.
Jul 22 22:03:13 nas2012 noflushd[3402]: Spinning up disk 2 (/dev/sdb) after 0:00:08.
Jul 22 22:03:13 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Jul 22 23:00:14 nas2012 noflushd[3402]: Spinning up disk 3 (/dev/sda) after 0:00:08.
Jul 22 23:00:14 nas2012 noflushd[3402]: Spinning up disk 1 (/dev/sdc) after 0:00:05.
Looking at the drive statistics, the start/stop count is very high (due to constant spin up/down).
Device: sdb
Controller: 0
Channel: 1
Model: WDC WD80EFAX-68KNBN0
Serial: VAH1WPHL
Firmware: 81.00A81W
Class: SATA
RPM: 5400
Sectors: 15628053168
Pool: data
PoolType: RAID 5
PoolState: 1
PoolHostId: a449f54
Health data:
ATA Error Count: 0
Reallocated Sectors: 0
Reallocation Events: 0
Spin Retry Count: 0
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 41
Start/Stop Count: 31533
Power-On Hours: 17133
Power Cycle Count: 217
Load Cycle Count: 31609
That does not look like a good way to keep drives healthy. This many start/stops on the drives cannot be good, IMO. I don't see disk issues, the kernel complaining about disks, or a disk spindown happening when the issue first started, but I would still recommend disabling Disk Spindown. I don't have a good explanation for the BTRFS failure, but I suspect it is related to failed disk I/O somewhere. Since you can still access your data, do a backup now. Make sure the backup is good; you can then either delete and re-create the volume and restore from backup, or, if you want to be adventurous, you can try letting btrfs check attempt a repair on the volume (though this can make the problem even worse) - https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-check
This should be run on an unmounted volume, so you first need to unmount the volume via the CLI (SSH to the NAS). However, back up the data before doing anything else, to be safe. I understand that you chose a BTRFS filesystem for its bitrot protection and other advantages. I have a ReadyNAS myself, standing in a corner for some backups, but my main Ubuntu server uses a BTRFS RAID and all my backup drives are BTRFS-formatted too. I have never had an issue with it. I would not call the filesystem fragile, but I do want to point out that NETGEAR ships a roughly 3.5-year-old BTRFS version on the NAS (v4.16), which means it will ultimately not be as stable as a more modern version of the filesystem.
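To make the steps concrete, here is a rough sketch of that repair path, assuming the data volume is mounted at /data and maps to /dev/dm-0 as in your logs. Run it over SSH, and only after your backup is verified:

# 1. Unmount the data volume (stop any services still using it first).
umount /data

# 2. Dry run: a read-only check that makes no changes to the volume.
btrfs check --readonly /dev/dm-0

# 3. Last resort only - repair mode can make the damage worse,
#    so be certain the backup is complete before running this.
btrfs check --repair /dev/dm-0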
Cheers
- StephenB (Guru - Experienced User)
rn_enthusiast wrote:
You are using 8TB WDC WD80EFAX disks, which I believe are still CMR drives. StephenB and Sandshark can confirm this, as they know more about drive hardware than I do. If these are actually SMR drives, they should be replaced, as SMR is not suitable for a NAS.
The SMR NAS-purposed drives are:
- WD20EFAX
- WD30EFAX
- WD40EFAX
- WD60EFAX
The WD80EFAX is CMR (and I have one in my RN524, mixed with some WD80EFZX).
I haven't seen an up-to-date decoder for the WD model format, but the A in that position denotes the cache size (256 MB in the case of the WD80EFAX, 128 MB for the WD80EFZX). There isn't anything in the model format that tells you if the disk is SMR or not.
rn_enthusiast wrote:
That does not look like a good way to keep drives healthy. This many start/stops on the drives cannot be good, IMO.
I'd always assumed that frequent starts/stops would stress the drives, but there was a post here some years back that said that was not the case. The poster worked for a disk manufacturer, and his assertion was that the air bearings, etc. used in modern disks wouldn't wear out prematurely due to the starts/stops. I then looked at studies on disk reliability and on managing power use in data centers, and none of them included any trade-off of disk life vs. the power savings. I didn't find any data at all suggesting there was a trade-off to make.
Since I found no field data at all confirming this commonly held idea, I decided not to worry about it.
There is a middle ground - you can increase the spindown interval, so it doesn't spin down quite as often, and/or you can schedule spindown so it only engages when the NAS isn't normally in use.
On my main NAS, spindown is enabled at night, and uses the default 5 minute spindown setting. I disable it during the day because I don't want to wait for spin up before I can access my files.
rn_enthusiast wrote:
Jul 25 01:11:15 nas2012 kernel: BTRFS: error (device dm-0) in cleanup_transaction:1864: errno=-5 IO failure
An I/O error happened somewhere, and that is typically disk-related.
I just want to add that an I/O write failure will generally result in filesystem corruption, no matter what filesystem you use. The only way that is avoided is when RAID parity can recover the lost data, and RAID parity only helps if the system somehow knows which data block is errored.
At the end of the day, the best way to keep data safe is to back it up, so you have copies on multiple devices - ideally one off-site to give you disaster protection.
As far as my own experience here goes, I haven't found BTRFS to be more fragile than EXT, as long as you maintain enough free space. I find the other features of BTRFS (snapshots in particular) to be more compelling.
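Incidentally, the bitrot protection is something you can exercise on demand: a scrub reads every block and verifies it against its checksum, so silent corruption is at least detected (and repaired where btrfs-level redundancy exists). A minimal sketch, assuming the data volume is mounted at /data as is typical on ReadyNAS OS 6:

# Run a scrub in the foreground (-B) and report statistics when done.
btrfs scrub start -B /data

# Or start it in the background and poll its progress:
btrfs scrub start /data
btrfs scrub status /data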
- ale2 (Aspirant)
Thank you rn_enthusiast and StephenB for taking the time to check the logs and for your suggestions. I will try to back up what I can (I THINK I had the most important items backed up already), and will factory reset to start fresh.
Thanks!