BTRFS DMESG error on Readynas Pro
I've gotten a lot of mileage out of my hacked-up ReadyNAS Pro since moving it up to OS 6, and I've been very happy overall. I'm now moving from 6TB WD Reds to 8TB WD Reds and recently noticed an error in dmesg.
One of the volumes has a write error of some sort, and the wr count keeps incrementing:
[1195510.074643] BTRFS error (device md125): bdev /dev/md125 errs: wr 10614, rd 0, flush 0, corrupt 0, gen 0
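For anyone who wants to track how fast these counters are climbing, here is a minimal Python sketch that pulls the device name and the five counters out of a line like the one above. The regular expression is written against this single sample line, so treat the pattern as an assumption and adjust it if your kernel words the message differently. The function name `parse_btrfs_errs` is made up for illustration.

```python
import re

# Pattern based on the one dmesg line quoted in this thread (an assumption,
# not a documented format): "bdev <dev> errs: wr N, rd N, flush N, corrupt N, gen N"
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

def parse_btrfs_errs(line):
    """Return (device, counters) for a btrfs 'errs:' line, or None if no match."""
    m = ERRS_RE.search(line)
    if m is None:
        return None
    counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
    return m.group("dev"), counters

line = ("[1195510.074643] BTRFS error (device md125): bdev /dev/md125 "
        "errs: wr 10614, rd 0, flush 0, corrupt 0, gen 0")
print(parse_btrfs_errs(line))
```

Running this over successive dmesg snapshots and comparing the `wr` values gives a rough rate of new write errors. The same counters should also be readable with `btrfs device stats` on the mounted volume.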
Here is the layout of my device.
root@qubert:~/incoming# btrfs fi usage /data/
Overall:
    Device size:                  29.08TiB
    Device allocated:             24.98TiB
    Device unallocated:            4.11TiB
    Device missing:                  0.00B
    Used:                         24.89TiB
    Free (estimated):              4.20TiB      (min: 2.14TiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:24.92TiB, Used:24.83TiB
   /dev/md124     34.00GiB
   /dev/md125      7.88TiB
   /dev/md126      3.96TiB
   /dev/md127     13.04TiB

Metadata,RAID1: Size:30.00GiB, Used:28.78GiB
   /dev/md125      9.00GiB
   /dev/md126     26.00GiB
   /dev/md127     25.00GiB

System,RAID1: Size:32.00MiB, Used:2.72MiB
   /dev/md125     32.00MiB
   /dev/md126     32.00MiB

Unallocated:
   /dev/md124      1.79TiB
   /dev/md125      1.20TiB
   /dev/md126    573.87GiB
   /dev/md127    573.46GiB
I suspect some of this may be because the existing setup was more than 90% full, and it may resolve itself. Is this something I should be concerned about, and would a defrag or balance help here?
And is there a guide on the best way to upgrade the device? I have a 6-disk RAID5 and am replacing the 6TB drives with 8TB ones, one by one. I've replaced two so far.
TIA!!!
Re: BTRFS DMESG error on Readynas Pro
I'd definitely be concerned about the error message. Can you look at the underlying SMART stats for the disks?
I suggest using
# smartctl -x /dev/sda
etc for the rest.
The -x will give you access to an error history - which turned up some UNCs on a couple of my own WD60EFRX drives that I wasn't aware of.
Maybe also look for disk errors in the log. Once you locate the disk, you can replace it next. If it happens to be one of the new ones, exchange it with the seller.
Re: BTRFS DMESG error on Readynas Pro
Hey StephenB,
Thanks for replying. I can't seem to figure out how to isolate the problem at the actual disk level, only at the array level. The smartctl output is undecipherable to me, as I don't see any fields which indicate problems.
Poking around, it looks like under the covers these arrays are made when new disks are added, but can live on for a while. Is there a way to force the removal of old arrays? The other question is: would any of the maintenance tasks aggravate (or potentially fix) these problems?
I have a backup of the data, so I'm thinking about just continuing the upgrade, in case the problem is being caused by one of the older drives, which is on the way out anyway.
Thanks for your help!
Re: BTRFS DMESG error on Readynas Pro
@sandroid wrote:
Thanks for replying. I can't seem to figure out how to isolate the problem at the actual disk level, only at the array level. The smartctl output is undecipherable to me, as I don't see any fields which indicate problems.
Then look in system.log and kernel.log for btrfs errors. Likely you will see a disk i/o error nearby.
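A rough way to do that search is sketched below in Python: find every line mentioning a BTRFS error or warning, then report nearby lines that look like low-level disk trouble. The patterns for "disk trouble" (`I/O error`, `sdX` device names, `ataN` port names) are assumptions about what such messages usually contain, not a list taken from the ReadyNAS logs, and the function name is hypothetical.

```python
import re

BTRFS_RE = re.compile(r"BTRFS (error|warning)", re.IGNORECASE)
# Assumed markers of disk-level problems; adjust to what your logs actually show.
DISK_RE = re.compile(r"I/O error|\bsd[a-z]+\d*\b|\bata\d+", re.IGNORECASE)

def disk_lines_near_btrfs_errors(lines, window=20):
    """Return sorted (index, line) pairs for disk-looking lines that sit
    within `window` lines of any BTRFS error or warning line."""
    hits = []
    seen = set()
    for i, line in enumerate(lines):
        if not BTRFS_RE.search(line):
            continue
        start = max(0, i - window)
        stop = min(len(lines), i + window + 1)
        for j in range(start, stop):
            if j in seen or BTRFS_RE.search(lines[j]):
                continue  # skip duplicates and the BTRFS lines themselves
            if DISK_RE.search(lines[j]):
                seen.add(j)
                hits.append((j, lines[j]))
    return sorted(hits)
```

To use it, read kernel.log (or system.log) into a list with `open(path).read().splitlines()` and pass that list in. An empty result means no disk-looking lines were found close to the BTRFS messages, which is not the same as the disks being healthy.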
Re: BTRFS DMESG error on Readynas Pro
I did a search on the logs and there are no sdx errors there. I have docker installed, which generates a lot of networking-related error messages, and that limits how far back the logs go. I'll try the defrag/balance/scrub and see if anything else shakes loose before adding the next disk.
Thanks for your help.
Re: BTRFS DMESG error on Readynas Pro
@sandroid wrote:
The smartctl command is undecipherable to me as I don't see any fields which indicate problems.
Of course there are the usual stats - command timeouts, pending sectors, reallocated sectors.
-x gives you an error log that starts with
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 14
        CR     = Command Register
        FEATR  = Features Register
        COUNT  = Count (was: Sector Count) Register
        LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
        LH     = LBA High (was: Cylinder High) Register    ]   LBA
        LM     = LBA Mid (was: Cylinder Low) Register      ] Register
        LL     = LBA Low (was: Sector Number) Register     ]
        DV     = Device (was: Device/Head) Register
        DC     = Device Control Register
        ER     = Error register
        ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
You might find stuff like this in that section:
Error 14 [13] occurred at disk power-on lifetime: 44522 hours (1855 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 4b 81 cb 40 40 00  Error: UNC at LBA = 0x14b81cb40 = 5561764672

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 01 80 00 e8 00 01 4b 81 cb 40 40 08  8d+00:41:33.721  READ FPDMA QUEUED
  60 01 80 00 e0 00 01 4b 81 c9 40 40 08  8d+00:41:33.721  READ FPDMA QUEUED
  60 01 80 00 d8 00 01 4b 81 c7 40 40 08  8d+00:41:33.720  READ FPDMA QUEUED
  60 01 80 00 d0 00 01 4b 81 c5 40 40 08  8d+00:41:33.720  READ FPDMA QUEUED
  60 01 80 00 c8 00 01 4b 81 c3 40 40 08  8d+00:41:33.719  READ FPDMA QUEUED
This particular error is a UNC (short for uncorrectable).
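If reading that log by eye is painful, a small script can pick out just the UNC entries. The Python sketch below scans saved `smartctl -x` output for lines of the form shown above and returns the failing LBAs. The pattern is built from this one example, so other error types (anything not reported as `UNC at LBA`) will not be matched, and the function name is made up for illustration.

```python
import re

# Matches e.g. "Error: UNC at LBA = 0x14b81cb40 = 5561764672"
UNC_RE = re.compile(r"Error: UNC at LBA = 0x([0-9a-fA-F]+) = (\d+)")

def find_unc_lbas(smartctl_output):
    """Return the LBA of every UNC entry found in smartctl -x output text."""
    lbas = []
    for hex_lba, dec_lba in UNC_RE.findall(smartctl_output):
        lba = int(hex_lba, 16)
        # The log prints the LBA twice (hex and decimal); keep it only if both agree.
        if lba == int(dec_lba):
            lbas.append(lba)
    return lbas

sample = "40 -- 51 00 00 00 01 4b 81 cb 40 40 00  Error: UNC at LBA = 0x14b81cb40 = 5561764672"
print(find_unc_lbas(sample))
```

An empty list for every drive means no UNC errors are recorded in the drive's own log, although a drive can still misbehave without logging one.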
Re: BTRFS DMESG error on Readynas Pro
Here's a pastebin of the output from all the drives: https://pastebin.com/KqnUukd9
Maybe you can spot the problem, but nothing like you showed me jumped out.
Thanks!
Re: BTRFS DMESG error on Readynas Pro
@sandroid wrote:
Here's a pastebin of the output from all the drives: https://pastebin.com/KqnUukd9
Nothing is showing up for me either - everything looks healthy there.
Re: BTRFS DMESG error on Readynas Pro
@sandroid wrote:
Poking around, it looks like under the covers these arrays are made when new disks are added, but can live on for a while. Is there a way to force the removal of old arrays? The other question is: would any of the maintenance tasks aggravate (or potentially fix) these problems?
What the NAS does as you incrementally upgrade the drives is "stack" arrays on top of each other to make the volume larger. Those "old arrays" are still in use. The only way to be rid of them is to destroy the volume or factory default, then start over and recover the data from backup. With the kind of error you are seeing, that could be a good idea anyway. The process of syncing the array may trigger some SMART or other errors that are easier to decipher.
As far as living with it for a while, you are on shaky ground. Your volume may completely fail, so make sure you keep your backup up to date. On the other hand, if files are becoming corrupt while the volume stays intact, you may be backing up corrupt files.
If you need a better illustration of how the NAS "stacks" arrays, let's say you start with 4 x 1TB drives in a 4-bay NAS (4 drives horizontally, 1TB vertically). If you replace two drives with 4TB, it adds a 2 x 3TB array on top of the 4 x 1TB one. As you replace more, it horizontally expands that second layer. If you then start replacing with 6TB, it adds another 2TB high layer. It never vertically expands or deletes an MDADM array. Since BTRFS volumes can span more than one MDADM array, it's unnecessary. So, while the volume expands vertically, the arrays don't; the arrays get added to.
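The same picture can be put into numbers with a short Python sketch. It is a simplified model, not the NAS's actual expansion logic: it works out the layers from the current mix of drive sizes only, assumes a layer needs at least two drives, and assumes each layer gives up one drive's worth of space to redundancy. Because real arrays are never merged or deleted, a box with a longer upgrade history can end up with more layers than this model shows. The function names are made up for illustration.

```python
def stacked_layers(disk_sizes_tb):
    """Return [(height_tb, width)] per layer, bottom layer first.

    width  = number of drives tall enough to take part in the layer
    height = how much of each of those drives the layer uses
    """
    layers = []
    floor = 0  # top of the layer below
    for size in sorted(set(disk_sizes_tb)):
        width = sum(1 for d in disk_sizes_tb if d >= size)
        if width >= 2:  # assumption: one drive alone cannot form a layer
            layers.append((size - floor, width))
            floor = size
    return layers

def usable_tb(disk_sizes_tb):
    """Usable space, assuming one drive per layer goes to redundancy."""
    return sum(height * (width - 1) for height, width in stacked_layers(disk_sizes_tb))

print(stacked_layers([1, 1, 1, 1]))  # the starting 4 x 1TB box
print(stacked_layers([4, 4, 1, 1]))  # two drives replaced with 4TB
print(stacked_layers([4, 4, 4, 1]))  # a third replaced: second layer gets wider
```

With two 4TB drives in place, the model shows the original 1TB-high layer across all four drives plus a 3TB-high layer across two, which matches the 2 x 3TB array described above.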
Re: BTRFS DMESG error on Readynas Pro
Thanks for the explanation, Sandshark! I have two active ReadyNAS boxes, a PioneerPro and an Ultra4 (a poor old NV+ also sits maxed out for emergencies). I back up to the Ultra4, so I can re-create the data if needed. I haven't kept up with Replicate/DR and only manually rsync individual volumes, but I suppose I could do something fancier if I need to rebuild the main box to get rid of this bad volume.
Update:
I ran the defrag and balance, which both ran for a minute and didn't do anything. Then I patched (from 6.9.5 hotfix 1 to 6.9.6) and rebooted. The volume which was having the problem changed id. I'm not sure if the id changed on the same volume or the errors moved to a different volume; I think the former.
Now I'm running a scrub, and the error count, which was incrementing by one every 5 minutes, has stopped incrementing. We'll see what btrfs says at the end of the scrub and then decide whether erasing the main box and restoring from backup is the way to go.
Thanks for your help.
Re: BTRFS DMESG error on Readynas Pro
Well, that didn't work. The scrub stopped the incrementing errors temporarily, but it crashed the system when it got to a certain point, and the errors remain. I'm going to reset the box and copy over the backups, since that seems like the best way to clear everything in the long run, and it would take a week or two to expand the volume at this rate anyway.
I'd like to do one last backup, but I'm afraid I will overwrite good files with junk (although there's no evidence of problems). I wish there was a btrfs tool which would tell me which files were involved in the failed io writes.
Re: BTRFS DMESG error on Readynas Pro
If you have room for a separate backup from your main one, make that new one and use a utility to do a file compare. You'll have to use one that compares more than just name and date. Then you can decide (most likely based on date) which files are genuinely new or modified and which of the new copies are likely corrupt.
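If no such utility is handy, a content-based compare is not much code. The Python sketch below walks two backup trees, hashes every file with SHA-256, and lists files that exist on only one side or whose contents differ. The function names and the idea of pointing it at two mounted backup directories are assumptions for illustration. It reads every byte of both trees, so expect it to be slow on tens of terabytes.

```python
import hashlib
import os

def file_digest(path, chunk_size=1024 * 1024):
    """SHA-256 of a file's contents, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def relative_files(root):
    """Set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found

def compare_trees(old_root, new_root):
    """Compare two trees by content rather than by name and date."""
    old_files = relative_files(old_root)
    new_files = relative_files(new_root)
    changed = sorted(
        rel for rel in old_files & new_files
        if file_digest(os.path.join(old_root, rel))
        != file_digest(os.path.join(new_root, rel))
    )
    return {
        "only_in_old": sorted(old_files - new_files),
        "only_in_new": sorted(new_files - old_files),
        "changed": changed,
    }
```

The files in the `changed` list are the ones to check by hand: if the modification date says a file should not have changed since the older backup, the newer copy is the suspect one.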