
RN314 - Every morning: volume error read only.

ThiloS
Aspirant

RN314 - Every morning: volume error read only.

Hi there,

Every morning while my backups are running, the system reports an error on the "data" volume and puts it into read-only mode.

I checked all HDDs and they show no ATA errors.

I did a defragmentation, cleaning, and balancing, but the message still appears in the morning.

(If I restart the ReadyNAS, everything is fine and it works as if there were no problem.)

 

I made a backup to an external drive (but I would really like to keep the five years of snapshots which are stored on the volume...)

What can I do? Any suggestions?

OS: 6.10.2

Model: RN31400|ReadyNAS 300 Series 4-Bay (Diskless)
Message 1 of 12
StephenB
Guru

Re: RN314 - Every morning: volume error read only.

  • What firmware are you running?
  • How full is the data volume?
  • What is the error?

Try downloading the log zip file, and look in system.log and kernel.log for btrfs and disk errors.
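With SSH access, that check can also be scripted. A minimal sketch (the log path under `/tmp/nas_logs` is an assumption, and the sample lines below are stand-ins for your own extracted kernel.log):

```shell
# Build a tiny stand-in kernel.log, then grep it the way you would grep
# the real one extracted from the downloaded log zip (path is assumed).
mkdir -p /tmp/nas_logs
cat > /tmp/nas_logs/kernel.log <<'EOF'
Nov 19 07:01:59 NAS kernel: e1000e: eth0 NIC Link is Down
Nov 19 07:29:49 NAS kernel: BTRFS: error (device md127) in __btrfs_free_extent:7004: errno=-28 No space left
EOF

# Case-insensitive search for btrfs errors/warnings and ATA exceptions.
grep -iE 'btrfs.*(error|warn)|ata[0-9]+.*(error|exception)' /tmp/nas_logs/kernel.log
```

The same `grep` pattern works against system.log as well.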

 

FWIW, usually the read-only status is a sign that the file system is corrupted.

Message 2 of 12
ThiloS
Aspirant

Re: RN314 - Every morning: volume error read only.

Hi Stephen,

Firmware is 6.10.2.

The data volume has about 9 TB used and 6.8 TB free.

(4 disks: three 6 TB and one 10 TB)

system.log doesn't show any disk or btrfs errors, but kernel.log does!

Here is an example (the errors keep going on and on after that):

 

Nov 19 07:00:47 NAS kernel: md127: detected capacity change from 0 to 17988620648448
Nov 19 07:00:47 NAS kernel: Adding 1046524k swap on /dev/md1. Priority:-1 extents:1 across:1046524k
Nov 19 07:00:47 NAS kernel: BTRFS: device label 7c6e3e74:data devid 1 transid 221653 /dev/md127
Nov 19 07:01:59 NAS kernel: e1000e: eth0 NIC Link is Down
Nov 19 07:01:59 NAS kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Nov 19 07:01:59 NAS kernel: 8021q: adding VLAN 0 to HW filter on device eth0
Nov 19 07:01:59 NAS kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov 19 07:01:59 NAS kernel: NFSD: starting 90-second grace period (net ffffffff88d74240)
Nov 19 07:02:01 NAS kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Nov 19 07:02:01 NAS kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Nov 19 07:03:03 NAS kernel: nfsd: last server has exited, flushing export cache
Nov 19 07:03:04 NAS kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Nov 19 07:03:04 NAS kernel: NFSD: starting 90-second grace period (net ffffffff88d74240)
Nov 19 07:29:49 NAS kernel: ------------[ cut here ]------------
Nov 19 07:29:49 NAS kernel: WARNING: CPU: 3 PID: 2743 at fs/btrfs/extent-tree.c:7004 __btrfs_free_extent+0xaf4/0xb31()
Nov 19 07:29:49 NAS kernel: BTRFS: Transaction aborted (error -28)
Nov 19 07:29:49 NAS kernel: Modules linked in: vpd(PO)
Nov 19 07:29:49 NAS kernel: CPU: 3 PID: 2743 Comm: btrfs-transacti Tainted: P O 4.4.184.x86_64.1 #1
Nov 19 07:29:49 NAS kernel: Hardware name: NETGEAR ReadyNAS 314 /ReadyNAS 314 , BIOS 4.6.5 04/03/2013
Nov 19 07:29:49 NAS kernel: 0000000000000000 ffff88005b56fa90 ffffffff88363e61 ffff88005b56fad8
Nov 19 07:29:49 NAS kernel: 0000000000000009 ffff88005b56fac8 ffffffff880dd14a ffffffff88282c3d
Nov 19 07:29:49 NAS kernel: 00000000ffffffe4 ffff880023c66930 ffff88007c0e41e0 0000000000000000
Nov 19 07:29:49 NAS kernel: Call Trace:
Nov 19 07:29:49 NAS kernel: [<ffffffff88363e61>] dump_stack+0x4d/0x63
Nov 19 07:29:49 NAS kernel: [<ffffffff880dd14a>] warn_slowpath_common+0x8f/0xa8
Nov 19 07:29:49 NAS kernel: [<ffffffff88282c3d>] ? __btrfs_free_extent+0xaf4/0xb31
Nov 19 07:29:49 NAS kernel: [<ffffffff88063f94>] warn_slowpath_fmt+0x47/0x49
Nov 19 07:29:49 NAS kernel: [<ffffffff8828222f>] ? __btrfs_free_extent+0xe6/0xb31
Nov 19 07:29:49 NAS kernel: [<ffffffff88282c3d>] __btrfs_free_extent+0xaf4/0xb31
Nov 19 07:29:49 NAS kernel: [<ffffffff882dff48>] ? btrfs_merge_delayed_refs+0x60/0x43c
Nov 19 07:29:49 NAS kernel: [<ffffffff88286769>] __btrfs_run_delayed_refs+0xa78/0xcbf
Nov 19 07:29:49 NAS kernel: [<ffffffff88288537>] btrfs_run_delayed_refs+0x66/0x24f
Nov 19 07:29:49 NAS kernel: [<ffffffff88289144>] btrfs_write_dirty_block_groups+0x124/0x316
Nov 19 07:29:49 NAS kernel: [<ffffffff88288705>] ? btrfs_run_delayed_refs+0x234/0x24f
Nov 19 07:29:49 NAS kernel: [<ffffffff88298eb8>] commit_cowonly_roots+0x208/0x2ac
Nov 19 07:29:49 NAS kernel: [<ffffffff8829b36f>] btrfs_commit_transaction+0x4b5/0x9a2
Nov 19 07:29:49 NAS kernel: [<ffffffff8829bb1b>] ? start_transaction+0x2bf/0x3bc
Nov 19 07:29:49 NAS kernel: [<ffffffff882966b3>] ? btrfs_cleanup_transaction+0x458/0x458
Nov 19 07:29:49 NAS kernel: [<ffffffff88296784>] transaction_kthread+0xd1/0x1b3
Nov 19 07:29:49 NAS kernel: [<ffffffff88079cb2>] kthread+0xdc/0xe4
Nov 19 07:29:49 NAS kernel: [<ffffffff88079bd6>] ? kthread_worker_fn+0x129/0x129
Nov 19 07:29:49 NAS kernel: [<ffffffff888e6c6f>] ret_from_fork+0x3f/0x80
Nov 19 07:29:49 NAS kernel: [<ffffffff88079bd6>] ? kthread_worker_fn+0x129/0x129
Nov 19 07:29:49 NAS kernel: ---[ end trace 79a8d4758041e9eb ]---
Nov 19 07:29:49 NAS kernel: BTRFS: error (device md127) in __btrfs_free_extent:7004: errno=-28 No space left
Nov 19 07:29:49 NAS kernel: BTRFS info (device md127): forced readonly
Nov 19 07:29:49 NAS kernel: BTRFS: error (device md127) in btrfs_run_delayed_refs:2995: errno=-28 No space left
Nov 19 07:29:49 NAS kernel: BTRFS warning (device md127): Skipping commit of aborted transaction.
Nov 19 07:29:50 NAS kernel: BTRFS: error (device md127) in cleanup_transaction:1864: errno=-28 No space left
Nov 19 07:30:35 NAS kernel: BTRFS error (device md127): parent transid verify failed on 10458747142144 wanted 221668 found 221661
Nov 19 07:30:35 NAS kernel: BTRFS error (device md127): parent transid verify failed on 10458747142144 wanted 221668 found 221661
Nov 19 07:30:35 NAS kernel: BTRFS error (device md127): parent transid verify failed on 10458747142144 wanted 221668 found 221661
Nov 19 07:30:35 NAS kernel: BTRFS error (device md127): parent transid verify failed on 10458747142144 wanted 221668 found 221661
Nov 19 07:30:36 NAS kernel: BTRFS error (device md127): parent transid verify failed on 10458579861504 wanted 221668 found 221661
Nov 19 07:30:36 NAS kernel: BTRFS error (device md127): parent transid verify failed on 10458579861504 wanted 221668 found 221661
Nov 19 07:30:36 NAS kernel: BTRFS error (device md127): parent transid verify failed on 10458579861504 wanted 221668 found 221661
Nov 19 07:30:36 NAS kernel: BTRFS error (device md127): parent transid verify failed on 10458579861504 wanted 221668 found 221661
Nov 19 07:30:38 NAS kernel: BTRFS error (device md127): parent transid verify failed on 10458579861504 wanted 221668 found 221661
Nov 19 07:30:38 NAS kernel: BTRFS error (device md127): parent transid verify failed on 10458579861504 wanted 221668 found 221661
Nov 19 07:30:41 NAS kernel: verify_parent_transid: 10 callbacks suppressed

 

Message 3 of 12
StephenB
Guru

Re: RN314 - Every morning: volume error read only.

Ok.  Can you figure out if one backup job is triggering this problem?  If the damage is limited to one share, then you could potentially re-create that share.

Message 4 of 12
ThiloS
Aspirant

Re: RN314 - Every morning: volume error read only.

If I reboot the NAS, I can easily finish every backup job without a problem... :-(

So I can't tie the problem to a specific backup job.

Message 5 of 12
StephenB
Guru

Re: RN314 - Every morning: volume error read only.


@ThiloS wrote:

If I reboot the NAS, I can easily finish every backup job without a problem... :-(

So I can't tie the problem to a specific backup job.


If you can't figure out which subvolume is damaged, then you likely will need to do a full reset and restore everything from backup.

 

If you have ssh enabled, you could try doing a btrfs send for every subvolume to /dev/null, and look for errors after each send.  
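That per-subvolume check could be scripted roughly like this. A sketch only: the `/data/*` share layout is an assumption, and since `btrfs send` requires a read-only source, a temporary snapshot is created per share:

```shell
#!/bin/sh
# Stream each share to /dev/null via btrfs send and report read errors.
# check_share holds the per-share logic so it can be run individually.
check_share() {
    share=$1
    snap="$share/.chk_snap"
    # btrfs send needs a read-only snapshot as its source
    btrfs subvolume snapshot -r "$share" "$snap" >/dev/null 2>&1 || return 1
    if btrfs send "$snap" >/dev/null 2>&1; then
        echo "OK:   $share"
    else
        echo "FAIL: $share"   # corruption likely somewhere in this share
    fi
    btrfs subvolume delete "$snap" >/dev/null 2>&1
}

for share in /data/*/; do
    [ -d "$share" ] && check_share "${share%/}"
done
```

A share that reports FAIL is the one worth re-creating and restoring from backup.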

 

Are these backups writing to the NAS, or are they reading from the NAS?

Message 6 of 12
ThiloS
Aspirant

Re: RN314 - Every morning: volume error read only.

Another thought:

My NAS starts at 7:00 in the morning.

The first backup jobs are scheduled between 7:00 and 8:00.

The second "wave" runs from 8:00 till 9:00.

[All backup jobs are rsync via SSH from my webservers to the NAS.]

Since all backup jobs run fine in the middle of the day when I start them manually, I have now moved them to 8:00-9:00 and 9:00-10:00.

(The idea is that the anti-virus scans at system start-up while the first backup jobs are running. Maybe those two services don't like each other?)

At least the result for today: no errors, the backup jobs run well, and the volume is in its normal state.

What do you think? Could this be the cause?

Should I deactivate AntiVirus altogether?

 

Message 7 of 12
StephenB
Guru

Re: RN314 - Every morning: volume error read only.


@ThiloS wrote:

Another thought:

My NAS starts at 7:00 in the morning.

The first backup jobs are scheduled between 7:00 and 8:00. ...

Since all backup jobs run fine in the middle of the day when I start them manually, I have now moved them to 8:00-9:00 and 9:00-10:00.

At least the result for today: no errors, the backup jobs run well, and the volume is in its normal state.

 


That is a bit odd.  I'd leave it set up this way for now, and see if the problem comes back.  FWIW, the problem could have been initiated by an error when the NAS started up, and not necessarily the backup jobs.

 

If the problem doesn't recur over the next couple of days, then I'd suggest doing a scrub from the volume settings wheel - followed by a balance.  You'd need to leave the NAS on 24x7 until those complete.
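For reference, the same maintenance can also be kicked off over SSH. A dry-run sketch (the `/data` mount point and the `-dusage=60` filter are assumptions; it only prints the commands unless DRY_RUN=0):

```shell
#!/bin/sh
# Print (or, with DRY_RUN=0, actually run) a scrub followed by a balance.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi
}

run btrfs scrub start /data                # verify checksums across the volume
run btrfs scrub status /data               # poll until the scrub finishes
run btrfs balance start -dusage=60 /data   # then rewrite mostly-empty data chunks
run btrfs balance status /data
```

The volume settings wheel in the admin GUI does the same thing, so use whichever is more convenient; either way the NAS has to stay on until both operations complete.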

 


@ThiloS wrote:

 

(The idea is that the anti-virus scans at system start-up while the first backup jobs are running. Maybe those two services don't like each other?)
...

Should I deactivate AntiVirus altogether?

 


I don't think you have enough information to isolate this to AntiVirus, and there are other things that might be happening at startup (snapshot pruning for example).  Plus AntiVirus should also be scanning the files that are being updated by the backup jobs - no matter when they run.

 

Normally the NAS will go into read-only mode when it detects a problem with the file system itself - and you are seeing those errors in your log zip file.  No matter what the cause, it's important to sort out if there is corruption somewhere in the file system on-disk structures, and also to check disk health.  The scrub and the balance I suggested above will help with that (and if you've never run them, it would be good to run them anyway).  If the volume is healthy, then we can take a closer look at the cause.

 

BTW, if the backups are initiated by the web servers, then you might want to create snapshots when the NAS powers up - that would ensure that your snapshots are coherent.  Since snapshots can only be created on hourly boundaries, that would require you to start up the NAS an hour before the first backup (and also schedule the first backup wave a bit after the hour). I wouldn't rely on the NAS making a 7 am snapshot if it is booting up at 7 am.  FWIW, the NAS backup jobs start at 5 minutes past the hour.

 

As an aside, I suggest custom snapshots, and not smart ones. Custom snapshots allow you to set retention, which makes the snapshot space easier to manage.

 

 

Message 8 of 12
ThiloS
Aspirant

Re: RN314 - Every morning: volume error read only.

OK - thank you!!!

I'll have a look at it, check for errors over the next few days, and try scrubbing and balancing.

I make custom snapshots once a week (and have been doing so for 5 years now...).

If things turn out badly, I'll be back again :-)

Message 9 of 12
ThiloS
Aspirant

Re: RN314 - Every morning: volume error read only.

Hmmm - today I got the read-only warning again... OK, so there is a problem:

 

Kernel log says:

Nov 25 09:33:52 NAS kernel: Call Trace:
Nov 25 09:33:52 NAS kernel: [<ffffffff88363e61>] dump_stack+0x4d/0x63
Nov 25 09:33:52 NAS kernel: [<ffffffff880dd14a>] warn_slowpath_common+0x8f/0xa8
Nov 25 09:33:52 NAS kernel: [<ffffffff88282c3d>] ? __btrfs_free_extent+0xaf4/0xb31
Nov 25 09:33:52 NAS kernel: [<ffffffff88063f94>] warn_slowpath_fmt+0x47/0x49
Nov 25 09:33:52 NAS kernel: [<ffffffff8828222f>] ? __btrfs_free_extent+0xe6/0xb31
Nov 25 09:33:52 NAS kernel: [<ffffffff88282c3d>] __btrfs_free_extent+0xaf4/0xb31
Nov 25 09:33:52 NAS kernel: [<ffffffff882dff48>] ? btrfs_merge_delayed_refs+0x60/0x43c
Nov 25 09:33:52 NAS kernel: [<ffffffff88286769>] __btrfs_run_delayed_refs+0xa78/0xcbf
Nov 25 09:33:52 NAS kernel: [<ffffffff880e5eda>] ? set_page_dirty+0x3c/0x5b
Nov 25 09:33:52 NAS kernel: [<ffffffff88288537>] btrfs_run_delayed_refs+0x66/0x24f
Nov 25 09:33:52 NAS kernel: [<ffffffff8828bea7>] btrfs_start_dirty_block_groups+0x32c/0x3eb
Nov 25 09:33:52 NAS kernel: [<ffffffff8829af9f>] btrfs_commit_transaction+0xe5/0x9a2
Nov 25 09:33:52 NAS kernel: [<ffffffff8829bb1b>] ? start_transaction+0x2bf/0x3bc
Nov 25 09:33:52 NAS kernel: [<ffffffff882966b3>] ? btrfs_cleanup_transaction+0x458/0x458
Nov 25 09:33:52 NAS kernel: [<ffffffff88296784>] transaction_kthread+0xd1/0x1b3
Nov 25 09:33:52 NAS kernel: [<ffffffff88079cb2>] kthread+0xdc/0xe4
Nov 25 09:33:52 NAS kernel: [<ffffffff88079bd6>] ? kthread_worker_fn+0x129/0x129
Nov 25 09:33:52 NAS kernel: [<ffffffff888e6c6f>] ret_from_fork+0x3f/0x80
Nov 25 09:33:52 NAS kernel: [<ffffffff88079bd6>] ? kthread_worker_fn+0x129/0x129
Nov 25 09:33:52 NAS kernel: ---[ end trace 822d266df2293e05 ]---
Nov 25 09:33:52 NAS kernel: BTRFS: error (device md127) in __btrfs_free_extent:7004: errno=-28 No space left
Nov 25 09:33:52 NAS kernel: BTRFS info (device md127): forced readonly
Nov 25 09:33:54 NAS kernel: BTRFS: error (device md127) in btrfs_run_delayed_refs:2995: errno=-28 No space left

System log says:

Nov 25 09:01:14 NAS clamd[4699]: SelfCheck: Database status OK.
Nov 25 09:01:37 NAS connmand[2782]: ntp: adjust (slew): +0.011105 sec
Nov 25 09:03:07 NAS dbus[2765]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
Nov 25 09:03:07 NAS dbus[2765]: [system] Successfully activated service 'org.opensuse.Snapper'
Nov 25 09:03:07 NAS snapperd[6114]: THROW: invalid config
Nov 25 09:03:07 NAS snapperd[6114]: config '3' is invalid
Nov 25 09:03:07 NAS snapperd[6114]: THROW: invalid config
Nov 25 09:03:07 NAS snapperd[6114]: config '4' is invalid
Nov 25 09:05:01 NAS CRON[6371]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 25 09:05:01 NAS CRON[6370]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 25 09:05:01 NAS CRON[6372]: (root) CMD (/frontview/bin/fvbackup 108 &> /dev/null)
Nov 25 09:05:01 NAS CRON[6373]: (root) CMD (/frontview/bin/fvbackup 107 &> /dev/null)
Nov 25 09:05:01 NAS CRON[6371]: pam_unix(cron:session): session closed for user root
Nov 25 09:05:01 NAS CRON[6370]: pam_unix(cron:session): session closed for user root
Nov 25 09:09:01 NAS CRON[6411]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 25 09:09:01 NAS CRON[6412]: (root) CMD (  [ -x /usr/lib/php5/sessionclean ] && /usr/lib/php5/sessionclean)
Nov 25 09:09:06 NAS CRON[6411]: pam_unix(cron:session): session closed for user root
Nov 25 09:11:14 NAS clamd[4699]: SelfCheck: Database status OK.
Nov 25 09:17:01 NAS CRON[6526]: pam_unix(cron:session): session opened for user root by (uid=0)
Nov 25 09:17:01 NAS CRON[6527]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Nov 25 09:17:01 NAS CRON[6526]: pam_unix(cron:session): session closed for user root
Nov 25 09:18:41 NAS connmand[2782]: ntp: adjust (slew): -0.014350 sec
Nov 25 09:20:29 NAS smbd[6564]: [2019/11/25 09:20:29.251041,  0] ../lib/param/loadparm.c:1016(lpcfg_service_ok)
Nov 25 09:20:29 NAS smbd[6564]:   WARNING: No path in service RNXconfig - making it unavailable!
Nov 25 09:20:30 NAS smbd[6564]: [2019/11/25 09:20:30.937883,  0] ../lib/param/loadparm.c:1016(lpcfg_service_ok)
Nov 25 09:20:30 NAS smbd[6564]:   WARNING: No path in service RNXconfig - making it unavailable!
Nov 25 09:21:15 NAS clamd[4699]: SelfCheck: Database status OK.
Nov 25 09:31:15 NAS clamd[4699]: SelfCheck: Database status OK.
Nov 25 09:34:27 NAS msmtpq[6739]: mail for [ -C /etc/msmtprc info@inside-layout.de --timeout=60 ] : send was successful
Nov 25 09:35:18 NAS apache_access[3652]: 000.000.000.00 "GET /admin/ HTTP/1.1" 401
Nov 25 09:35:21 NAS apache_access[3652]: Admin session started from 192.168.178.44
Nov 25 09:35:24 NAS apache_access[3652]: 000.000.000.00 "POST /dbbroker HTTP/1.1" 401
Nov 25 09:35:30 NAS msmtpq[6849]: mail for [ -C /etc/msmtprc here-stays@mymail.de --timeout=60 ] : send was successful
Nov 25 09:35:45 NAS connmand[2782]: ntp: adjust (slew): +0.002666 sec
Nov 25 09:36:15 NAS apache_access[3653]: ::1 "OPTIONS * HTTP/1.0" 200
Nov 25 09:36:45 NAS dbus[2765]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
Nov 25 09:36:45 NAS dbus[2765]: [system] Successfully activated service 'org.opensuse.Snapper'
Nov 25 09:36:45 NAS snapperd[7122]: THROW: invalid config
Nov 25 09:36:45 NAS snapperd[7122]: config '3' is invalid
Nov 25 09:36:45 NAS snapperd[7122]: THROW: invalid config
Nov 25 09:36:45 NAS snapperd[7122]: config '4' is invalid

(IP address, mail, and NAS name are changed for privacy reasons 🙂 - greetings from a paranoid user :-) )

 

So - that means I have to set up the whole thing from scratch?

Or is there another possibility?

Message 10 of 12
StephenB
Guru

Re: RN314 - Every morning: volume error read only.


@ThiloS wrote:

 

So - that means I have to set up the whole thing from scratch?


That's what I would do. 

 

Though you could try running a balance first.
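For context on why a balance can help: errno -28 is ENOSPC, and on btrfs that can occur even with terabytes of free space if the metadata chunks are full; a balance can relieve it by returning under-used chunks to the unallocated pool. The allocation state can be checked over SSH with `btrfs filesystem df` (the figures below are made up for illustration, not from this NAS):

```shell
# Illustrative `btrfs filesystem df /data` output (numbers are invented):
# metadata nearly full while the data chunks still have room would explain
# ENOSPC (-28) errors that force the volume read-only.
sample='Data, single: total=8.90TiB, used=8.40TiB
System, DUP: total=32.00MiB, used=1.02MiB
Metadata, DUP: total=12.00GiB, used=11.97GiB'
echo "$sample"

# Flag the suspect condition: metadata "used" at or near "total".
echo "$sample" | grep '^Metadata'
```

If metadata is not exhausted and the transid verify failures persist, that points to on-disk corruption rather than space pressure, and a rebuild from backup is the safer path.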

Message 11 of 12
Sandshark
Sensei

Re: RN314 - Every morning: volume error read only.

I had a volume go read-only due to an eSATA connection to an EDA500 that had become loose.  Even though I knew for certain the root cause was fixed, I could never make the file system reliable again.  In my case, since it was an EDA500, I fixed it with a volume destroy and re-creation.  If you only have the one volume, a factory default seems to be the only sure way to make the problem go away for good.

Message 12 of 12