Re: RN214 file system read-only (again)
RN214 file system read-only (again)
It seems I'm extremely unlucky with this device, because this is not the first time a file system error has appeared (cf. https://community.netgear.com/t5/Using-your-ReadyNAS-in-Business/RN204-repeated-file-system-error/m-... )
RN214 (firmware v6.9.5 Hotfix1) with 4x8TB WD Purple HDDs, used for backing up some servers in an AD environment. About half of the volume is filled. Balance, defrag, and scrub are scheduled weekly.
Then the file system went read-only. The web interface shows this:
Jun 17, 2019 04:00:01 AM Volume: Scrub started for volume data.
Jun 17, 2019 03:00:01 AM Volume: Defragmentation complete for volume data.
Jun 17, 2019 03:00:01 AM Volume: Defragmentation started for volume data.
Jun 17, 2019 02:00:01 AM Volume: Balance complete for volume data.
Jun 17, 2019 02:00:01 AM Volume: Balance started for volume data.
Jun 16, 2019 10:13:58 PM Volume: The volume data encountered an error and was made read-only. It is recommended to backup your data.
Jun 11, 2019 10:22:24 AM Volume: Scrub completed for volume data'.
Jun 10, 2019 04:00:01 AM Volume: Scrub started for volume data.
Jun 10, 2019 03:53:36 AM Volume: Defragmentation complete for volume data.
Jun 10, 2019 03:00:01 AM Volume: Defragmentation started for volume data.
In systemd-journal.log I see a lot of entries like this:
Jun 16 22:12:20 hubu004 kernel: ------------[ cut here ]------------
Jun 16 22:12:20 hubu004 kernel: WARNING: CPU: 3 PID: 6743 at fs/btrfs/disk-io.c:541 btree_csum_one_bio+0x94/0xd8()
Jun 16 22:12:20 hubu004 kernel: Modules linked in: vpd(PO)
Jun 16 22:12:20 hubu004 kernel: CPU: 3 PID: 6743 Comm: kworker/u8:2 Tainted: P W O 4.4.157.alpine.1 #1
Jun 16 22:12:20 hubu004 kernel: Hardware name: Annapurna Labs Alpine
Jun 16 22:12:20 hubu004 kernel: Workqueue: btrfs-worker btrfs_worker_helper
Jun 16 22:12:20 hubu004 kernel: [<c0014690>] (unwind_backtrace) from [<c0011590>] (show_stack+0x10/0x14)
Jun 16 22:12:20 hubu004 kernel: [<c0011590>] (show_stack) from [<c035d294>] (dump_stack+0x7c/0x9c)
Jun 16 22:12:20 hubu004 kernel: [<c035d294>] (dump_stack) from [<c0090d98>] (warn_slowpath_common+0x80/0xac)
Jun 16 22:12:20 hubu004 kernel: [<c0090d98>] (warn_slowpath_common) from [<c001e700>] (warn_slowpath_null+0x18/0x20)
Jun 16 22:12:20 hubu004 kernel: [<c001e700>] (warn_slowpath_null) from [<c02787e0>] (btree_csum_one_bio+0x94/0xd8)
Jun 16 22:12:20 hubu004 kernel: [<c02787e0>] (btree_csum_one_bio) from [<c0277868>] (run_one_async_start+0x34/0x44)
Jun 16 22:12:20 hubu004 kernel: [<c0277868>] (run_one_async_start) from [<c02b55ec>] (btrfs_worker_helper+0xec/0x1ac)
Jun 16 22:12:20 hubu004 kernel: [<c02b55ec>] (btrfs_worker_helper) from [<c0031c44>] (process_one_work+0x1d4/0x30c)
Jun 16 22:12:20 hubu004 kernel: [<c0031c44>] (process_one_work) from [<c0032b30>] (worker_thread+0x2cc/0x440)
Jun 16 22:12:20 hubu004 kernel: [<c0032b30>] (worker_thread) from [<c0036c44>] (kthread+0xf4/0x104)
Jun 16 22:12:20 hubu004 kernel: [<c0036c44>] (kthread) from [<c000e920>] (ret_from_fork+0x14/0x34)
Jun 16 22:12:20 hubu004 kernel: ---[ end trace f005209bdac8c6a3 ]---
Then this:
Jun 16 22:12:37 hubu004 kernel: BTRFS: error (device md127) in btrfs_commit_transaction:2241: errno=-5 IO failure (Error while writing out transaction)
Jun 16 22:12:37 hubu004 kernel: BTRFS info (device md127): forced readonly
Jun 16 22:12:37 hubu004 kernel: BTRFS warning (device md127): Skipping commit of aborted transaction.
Jun 16 22:12:37 hubu004 kernel: BTRFS: error (device md127) in cleanup_transaction:1864: errno=-5 IO failure
Jun 16 22:12:37 hubu004 kernel: BTRFS info (device md127): delayed_refs has NO entry
Finally hundreds of lines of this:
Jun 17 08:52:36 hubu004 kernel: BTRFS critical (device md127): unable to find logical 764401909760 len 4096
This means a file system crash, right? Are there any options other than resetting the device and rebuilding the volume, losing all data? I do have a secondary backup, but to be honest, I am fed up with a file system crash every few months!
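With hundreds of repeated lines in systemd-journal.log, it can help to condense the BTRFS kernel messages before reading them in detail. The following is only a minimal sketch in Python, assuming the journal lines have the same layout as the excerpts quoted above; the function name `summarize_btrfs` and the regular expression are made up for illustration and are not part of any ReadyNAS tool. It counts messages per severity and reports when the volume was first forced read-only.

```python
import re
from collections import Counter

# Assumed line layout (taken from the excerpts in this post):
#   "Jun 16 22:12:37 hubu004 kernel: BTRFS info (device md127): forced readonly"
#   "Jun 16 22:12:37 hubu004 kernel: BTRFS: error (device md127) in ...: errno=-5 IO failure"
BTRFS_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"BTRFS:? (?P<level>error|critical|warning|info) "
    r"\(device (?P<dev>[^)]+)\)[: ]+(?P<msg>.*)$"
)

def summarize_btrfs(lines):
    """Return (message counts per severity, timestamp of first 'forced readonly')."""
    counts = Counter()
    first_readonly = None
    for line in lines:
        m = BTRFS_RE.match(line.strip())
        if m is None:
            continue  # not a BTRFS kernel message (e.g. stack trace lines)
        counts[m["level"]] += 1
        if first_readonly is None and "forced readonly" in m["msg"]:
            first_readonly = m["ts"]
    return counts, first_readonly

# Sample input copied from the journal excerpts quoted in this post.
SAMPLE = """\
Jun 16 22:12:20 hubu004 kernel: WARNING: CPU: 3 PID: 6743 at fs/btrfs/disk-io.c:541 btree_csum_one_bio+0x94/0xd8()
Jun 16 22:12:37 hubu004 kernel: BTRFS: error (device md127) in btrfs_commit_transaction:2241: errno=-5 IO failure (Error while writing out transaction)
Jun 16 22:12:37 hubu004 kernel: BTRFS info (device md127): forced readonly
Jun 16 22:12:37 hubu004 kernel: BTRFS warning (device md127): Skipping commit of aborted transaction.
Jun 16 22:12:37 hubu004 kernel: BTRFS: error (device md127) in cleanup_transaction:1864: errno=-5 IO failure
Jun 16 22:12:37 hubu004 kernel: BTRFS info (device md127): delayed_refs has NO entry
Jun 17 08:52:36 hubu004 kernel: BTRFS critical (device md127): unable to find logical 764401909760 len 4096
"""

counts, first_readonly = summarize_btrfs(SAMPLE.splitlines())
print(dict(counts), first_readonly)
```

For a real log, the lines would come from the file itself (for example `open("systemd-journal.log")`) instead of the `SAMPLE` string.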
Re: RN214 file system read-only (again)
Did you replace the disk that generated the errors the last time?
Re: RN214 file system read-only (again)
@edkedk wrote:
Some weeks ago I replaced a disk that started to develop bad sectors.
Ok. And that resynced ok?
Perhaps enable ssh, and enter
# smartctl -x /dev/sda
# smartctl -x /dev/sdb
# smartctl -x /dev/sdc
# smartctl -x /dev/sdd
and look for saved errors for the drives.
For example, something like this:
Error 12 [11] occurred at disk power-on lifetime: 36166 hours (1506 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 0c 27 df 40 40 00  Error: UNC at LBA = 0x0c27df40 = 203939648

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 80 00 c8 00 00 0c 27 df 40 40 08  1d+08:15:46.204  READ FPDMA QUEUED
  60 00 08 00 c0 00 01 06 34 36 98 40 08  1d+08:15:46.163  READ FPDMA QUEUED
  60 00 80 00 b8 00 00 0c 27 e4 40 40 08  1d+08:15:46.146  READ FPDMA QUEUED
  60 00 80 00 b0 00 00 0c 27 e9 40 40 08  1d+08:15:46.123  READ FPDMA QUEUED
  60 00 80 00 a8 00 00 0c 27 ee 40 40 08  1d+08:15:46.094  READ FPDMA QUEUED
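If you save the smartctl output of each drive to a text file, the saved errors can also be pulled out with a small script instead of scrolling through the full report. This is just a rough sketch in Python, assuming the error entries look like the example above; `parse_smart_errors` is a hypothetical helper name, not something smartctl provides. It extracts the error number, the power-on hours at which the error occurred, the error flags (UNC is an uncorrectable read error), and the LBA.

```python
import re

# Header and detail lines as they appear in the example entry above.
ERROR_HEADER_RE = re.compile(
    r"Error (?P<num>\d+) \[\d+\] occurred at disk power-on lifetime: (?P<hours>\d+) hours"
)
ERROR_DETAIL_RE = re.compile(
    r"Error: (?P<flags>[A-Z, ]+?) at LBA = (?P<hex>0x[0-9a-fA-F]+) = (?P<dec>\d+)"
)

def parse_smart_errors(text):
    """Return one dict per saved error entry found in smartctl output text."""
    headers = list(ERROR_HEADER_RE.finditer(text))
    entries = []
    for i, header in enumerate(headers):
        # Only look for the detail line before the next entry starts.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        detail = ERROR_DETAIL_RE.search(text, header.end(), end)
        entry = {
            "number": int(header["num"]),
            "power_on_hours": int(header["hours"]),
            "flags": [],
            "lba": None,
        }
        if detail is not None:
            entry["flags"] = [f.strip() for f in detail["flags"].split(",")]
            entry["lba"] = int(detail["hex"], 16)
        entries.append(entry)
    return entries

# Sample text shortened from the example entry above.
SAMPLE = """\
Error 12 [11] occurred at disk power-on lifetime: 36166 hours (1506 days + 22 hours)
  After command completion occurred, registers were:
  40 -- 51 00 00 00 00 0c 27 df 40 40 00  Error: UNC at LBA = 0x0c27df40 = 203939648
"""

entries = parse_smart_errors(SAMPLE)
print(entries)
```

An entry with `power_on_hours` close to the drive's current power-on hours is a recent error; an old one may be unrelated to the current problem.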
Re: RN214 file system read-only (again)
Yes, resync completed without any error.
Re: RN214 file system read-only (again)
Actually, the error I mentioned in the older thread I linked in the first post happened on the other RN214 device we have in the network...
The disk was replaced in the one this thread is about.
Re: RN214 file system read-only (again)
I have run the smartctl commands. Two of the disks show reallocated sectors (1 and 3, respectively). On sdd there is this error log:
Error 1 [0] occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  84 -- 41 00 00 00 00 00 00 00 00 00 00  Error: ICRC, ABRT at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 08 00 78 00 00 00 00 01 b8 40 08     00:00:32.826  READ FPDMA QUEUED
  60 00 08 00 70 00 00 00 00 01 b0 40 08     00:00:32.826  READ FPDMA QUEUED
  60 00 08 00 68 00 00 00 00 01 a8 40 08     00:00:32.825  READ FPDMA QUEUED
  60 00 08 00 60 00 00 00 00 01 a0 40 08     00:00:32.825  READ FPDMA QUEUED
  60 00 08 00 58 00 00 00 00 01 98 40 08     00:00:32.825  READ FPDMA QUEUED
Re: RN214 file system read-only (again)
@edkedk wrote:
I have run the smartctl commands. Two of the disks show reallocated sectors (1 and 3, respectively). On sdd there is this error log:
Error 1 [0] occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
Error: ICRC, ABRT at LBA = 0x00000000 = 0
It looks like that happened when the disk was initially powered up. Did you put it into the NAS right away, or did you connect it to a PC and test it first?
It's related to the communication between the NAS (or computer) and the drive, so it could simply be a poor connection when first plugged in. I don't think this error is concerning (and don't see how it would make your system read-only, since it happened before the drive was added to the array).
The other drives with the reallocated sectors might be part of the puzzle though, and probably should be tested.
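To keep an eye on those drives, the raw counts can be read out of the SMART attribute table and compared over time. Below is a minimal sketch in Python, assuming the usual smartctl attribute table layout in which the attribute name is the second column and the raw value is the last one. The helper name `raw_values` is hypothetical, and the sample rows are made up for illustration; they are not taken from the drives in this thread.

```python
# Attributes that commonly indicate media problems (smartctl attribute names).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def raw_values(attr_table, names=WATCHED):
    """Return {attribute name: raw value} for the watched SMART attributes."""
    found = {}
    for line in attr_table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID followed by the attribute name.
        if len(fields) >= 3 and fields[0].isdigit() and fields[1] in names:
            if fields[-1].isdigit():  # raw value is the last column
                found[fields[1]] = int(fields[-1])
    return found

# Illustrative rows only (hypothetical values, not from the poster's disks).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    3
  9 Power_On_Hours          -O--CK   099   099   000    -    1200
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   200   200   000    -    0
"""

values = raw_values(SAMPLE)
print(values)
```

A count that stays at 1 or 3 is less worrying than one that keeps climbing between checks, so it is worth recording the values with a date each time.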