Trying to understand how to identify which raid volume a file is on within ReadyNAS btrfs.
Hi,
I'm trying to understand a corruption issue I've run into before I take any measures to fix.
I'm hitting the standard "BTRFS error (device md125): parent transid verify failed" error, which in this forum and elsewhere usually draws the answer "just restore from backup"...
Well, before I restore, I want to understand whether there is any chance of recovering one of the corrupted files. I created a file recently, and before I had a chance to back it up this error seems to have corrupted its data.
Some background: this error occurred after the ReadyNAS had been up for 128 days without a reboot. There was no power loss; it's on a UPS. But curiously, only about a minute after a defrag completed, the volume went read-only with the error message:
warning:volume:LOGMSG_VOLUME_READONLY The volume data encountered an error and was made read-only. It is recommended to backup your data.
Here is what I'm trying to understand.
The corrupted file is called "MoveNewMusic.pl"; a stat of that file shows:
# stat MoveNewMusic.pl
  File: 'MoveNewMusic.pl'
  Size: 7610        Blocks: 16         IO Block: 4096   regular file
Device: ebh/235d    Inode: 1751        Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-11-11 10:06:26.312009653 -0700
Modify: 2020-11-11 22:45:56.855610574 -0700
Change: 2020-11-11 22:45:56.955601949 -0700
 Birth: -

# stat -f MoveNewMusic.pl
  File: "MoveNewMusic.pl"
    ID: 9b15acc4df0dcccb Namelen: 255     Type: btrfs
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 12689899104 Free: 4142678422 Available: 4140347222
Inodes: Total: 0           Free: 0
I'm not sure how the Device field, "eb" in this case, relates to any of the /dev/md* devices I have, whether I read it as a major device number of e (14) and a minor device number of b (11) or some other way.
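Decoding it in the shell suggests the split is by byte rather than by hex digit (a quick sketch, assuming GNU stat and the classic encoding with the major number in the high byte and the minor in the low byte):

# Decode the hex device number reported by stat
dev=$((16#$(stat -c '%D' MoveNewMusic.pl)))   # "eb" -> 235
echo "major=$((dev >> 8)) minor=$((dev & 0xff))"
# prints: major=0 minor=235

A major number of 0 is an anonymous device; btrfs assigns one per subvolume, which would explain why it matches none of the /dev/md* nodes.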
A df of that file shows a Filesystem of "-", which I'm not sure is normal, since a df of a good file shows a filesystem of /dev/md127. See below:
# df MoveNewMusic.pl
Filesystem     1K-blocks        Used   Available Use% Mounted on
-            50759596416 34188882728 16561388888  68% /run/nfs4/home/cthierman

# df version.txt
Filesystem     1K-blocks        Used   Available Use% Mounted on
/dev/md127   50759596416 34188882728 16561388888  68% /run/nfs4/home
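Two commands that may say more than df here, which prints "-" when a path's device doesn't match any entry in the mount table (a sketch, assuming util-linux's findmnt and btrfs-progs are on the box):

# Ask the mount table which filesystem the path belongs to
findmnt -T /run/nfs4/home/cthierman

# If the path is inside a btrfs subvolume, show its details
btrfs subvolume show /run/nfs4/home/cthierman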
dmesg is showing that I have errors on /dev/md125, but how can I confirm, from the above info, that /dev/md125 is where this file is (or was) sitting? Is it possible that I have corruption on one of the other two devices, /dev/md127 and /dev/md126, for which no errors or warnings appear in dmesg?
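One way I can think of to check this (a sketch, assuming filefrag from e2fsprogs and a reasonably recent btrfs-progs): filefrag reports a btrfs file's extents as offsets in the filesystem-wide logical address space, and the chunk tree records which member device (devid) each logical range maps to.

# 1. List the file's extents; on btrfs the "physical" offsets are
#    addresses in the filesystem-wide logical address space
filefrag -v MoveNewMusic.pl

# 2. Dump the chunk tree from any member device and find the
#    CHUNK_ITEM whose logical range covers those offsets; its
#    stripe entries name a devid
btrfs inspect-internal dump-tree -t chunk /dev/md127 | less

# 3. Map that devid back to a member device
btrfs filesystem show /data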
Here is a sample of the dmesg -T output:
[Mon Dec 7 01:51:28 2020] BTRFS error (device md125): parent transid verify failed on 1087635456 wanted 3045699 found 2651197
[Mon Dec 7 01:51:28 2020] BTRFS error (device md125): parent transid verify failed on 1087635456 wanted 3045699 found 2651197
[Mon Dec 7 01:51:29 2020] BTRFS error (device md125): parent transid verify failed on 1087635456 wanted 3045699 found 2651197
[Mon Dec 7 01:51:29 2020] __btrfs_lookup_bio_sums: 3955 callbacks suppressed
[Mon Dec 7 01:51:29 2020] BTRFS info (device md125): no csum found for inode 8866 start 14287339520
[Mon Dec 7 01:51:29 2020] BTRFS info (device md125): no csum found for inode 8866 start 14287343616
And here is a cat of /proc/mdstat, and yes, I'm presently running a scrub to see if that will fix anything.
# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md125 : active raid1 sdd5[0] sdf5[1]
      1952316416 blocks super 1.2 [2/2] [UU]
      [==================>..]  resync = 92.2% (1801338496/1952316416) finish=82.2min speed=30574K/sec

md126 : active raid5 sda4[0] sdf4[5] sde4[4] sdd4[3] sdc4[2] sdb4[1]
      34180206080 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md127 : active raid5 sdc3[8] sde3[10] sdf3[11] sdd3[9] sdb3[7] sda3[6]
      14627073920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md1 : active raid10 sda2[0] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1]
      1566720 blocks super 1.2 512K chunks 2 near-copies [6/6] [UUUUUU]

md0 : active raid1 sda1[6] sde1[10] sdf1[11] sdd1[9] sdc1[8] sdb1[7]
      4190208 blocks super 1.2 [6/6] [UUUUUU]
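Worth noting: the resync shown for md125 above happens at the md layer and is separate from a btrfs scrub; md knows nothing about btrfs checksums. If a btrfs-level scrub is running, its progress should show with:

# btrfs-level scrub progress (distinct from the md resync above)
btrfs scrub status /data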
My hope is that, if I can understand how the device relates to a btrfs filesystem/RAID volume, maybe I can figure out whether I have a chance to recover this file using some of the methods suggested in other posts. I.e., zero the log on (heaven forbid) /dev/md125, run a btrfs check... or maybe just reboot and cross my fingers...
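For reference, my understanding of the safe ordering for those steps (a sketch, not a recommendation; btrfs check wants the filesystem unmounted, and its default mode is read-only):

# Read-only consistency check; needs the filesystem unmounted.
# Any member device addresses the whole multi-device filesystem.
btrfs check /dev/md127

# Destructive last resorts -- only after everything is copied off:
# btrfs rescue zero-log /dev/md127
# btrfs check --repair /dev/md127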
Hoping someone well versed in the ins and outs of the ReadyNAS can fill me in on what I'm missing.
Thanks
Re: Trying to understand how to identify which raid volume a file is on within ReadyNAS btrfs.
Maybe start with something simpler:
# btrfs device stats /data
(substituting your actual volume name if it isn't data)
Re: Trying to understand how to identify which raid volume a file is on within ReadyNAS btrfs.
Sorry, I should have mentioned I tried that... and this was the output. I'm still hoping there is a way to find where that file ended up and whether I have a chance of recovering it.
# btrfs device stats /data
[/dev/md127].write_io_errs    0
[/dev/md127].read_io_errs     0
[/dev/md127].flush_io_errs    0
[/dev/md127].corruption_errs  0
[/dev/md127].generation_errs  0
[/dev/md126].write_io_errs    0
[/dev/md126].read_io_errs     0
[/dev/md126].flush_io_errs    0
[/dev/md126].corruption_errs  0
[/dev/md126].generation_errs  0
[/dev/md125].write_io_errs    0
[/dev/md125].read_io_errs     0
[/dev/md125].flush_io_errs    0
[/dev/md125].corruption_errs  0
[/dev/md125].generation_errs  0
Re: Trying to understand how to identify which raid volume a file is on within ReadyNAS btrfs.
I should probably also add that this Perl script I wrote, which is normally all text, is now just a file full of hex 01s, as seen here:
# od -x MoveNewMusic.pl
0000000 0101 0101 0101 0101 0101 0101 0101 0101
*
0016660 0101 0101 0101 0101 0101
0016672
That is not something you wish for from your NAS. So by understanding the problem, I'm hoping to learn to what extent this has affected the rest of my data...
P.S. The scrub did nothing to fix the problem... I'm reluctant to reboot until I have all the data moved to my new Synology NAS, where I've gone with ext4 instead of btrfs.
Sadly, without understanding what caused this, I have lost all confidence in the ReadyNAS platform.
Re: Trying to understand how to identify which raid volume a file is on within ReadyNAS btrfs.
Were checksums turned on for the volume?
Maybe after the data is copied, you can try btrfs check.
Re: Trying to understand how to identify which raid volume a file is on within ReadyNAS btrfs.
@StephenB wrote:
Were checksums turned on for the volume?
Maybe after the data is copied, you can try btrfs check.
Now that is an interesting question. For /data, yes. But the /home directory (volume) was created by ReadyNAS when it was imaged, so I assume the answer there would also be yes, as /home appears to sit on /dev/md127, the same device as /data. However, the subdirectory my script sits in shows up oddly, with a "-" for a Filesystem, when you do a df on it.
# df /home
Filesystem     1K-blocks        Used   Available Use% Mounted on
/dev/md127   50759596416 34188882728 16561388888  68% /home

# df /home/cthierman
Filesystem     1K-blocks        Used   Available Use% Mounted on
-            50759596416 34188882728 16561388888  68% /home/cthierman
So, I'm not really sure what the answer is to your question. Other than, I hope so....
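One thing I can check per file (a sketch, assuming lsattr is available; as far as I know the ReadyNAS bitrot-protection toggle maps to the per-share COW setting, and NOCOW files carry no data checksums):

# A 'C' in the attribute column means NOCOW, i.e. no data checksums
lsattr MoveNewMusic.pl

# Check whether the volume itself is mounted without checksumming
grep -E 'nodatacow|nodatasum' /proc/mounts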
As to the btrfs check: yes, that is the plan, after I get all the data moved... Sadly, it's many terabytes' worth... It will probably take some time.
Re: Trying to understand how to identify which raid volume a file is on within ReadyNAS btrfs.
@thierman wrote:
Sadly, many Terabytes worth.... Probably take some time.
I saw about 320 GiB an hour when I copied about 4.5 TiB from my main NAS to a Pro-6. That was using rsync (a backup job running on the destination NAS).
Re: Trying to understand how to identify which raid volume a file is on within ReadyNAS btrfs.
Yeah, interestingly, I have two copies running. One I started on Dec 6th at 11:53am my time, and it is presently Dec 8th 10:16am.
So that one has been running for two days, to a hard drive directly attached over USB to the ReadyNAS 316.
That copy has moved 992GB of data in that time.
Meanwhile, I started an rsync less than 24 hours ago across a 1Gb/s network to the Synology, and it has already transferred 1.4TB of data.
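For the record, the network copy is just a plain rsync over SSH, along these lines (the hostname and paths here are placeholders):

# -a preserves permissions/times, -H keeps hard links,
# --info=progress2 (rsync 3.1+) prints overall transfer progress
rsync -aH --info=progress2 /data/ admin@synology:/volume1/backup/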
I'm beginning to think that USB port on the ReadyNAS 316 is USB 1.0, or the drive I have is stupidly slow.
I also have a ReadyNAS Pro Pioneer, and I managed to rsync data to it faster than over USB... which is saying something, because the Pro isn't the fastest NAS...
Re: Trying to understand how to identify which raid volume a file is on within ReadyNAS btrfs.
@thierman wrote:
I'm beginning to think that USB port on the ReadyNas 316 is a version 1.0 USB.
The front port is USB 2.0. The rear ports are USB 3.0.
I am wondering if you are writing to an SMR drive. That can be very slow for sustained writes.
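If smartmontools is on the NAS, you can grab the drive's model number and check it against SMR drive lists (a sketch; /dev/sdX is a placeholder for however the USB drive enumerates):

# Print the drive's model/serial (some USB bridges need '-d sat')
smartctl -i /dev/sdX

# Or, without smartmontools:
lsblk -o NAME,MODEL,SIZE,TRAN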
Re: Trying to understand how to identify which raid volume a file is on within ReadyNAS btrfs.
It's a 3TB Hitachi drive... I suspect you're right. I'm not certain, but it looks like Hitachi was using SMR around the time I bought it, though from what I can see mainly on 4TB and larger drives. Still, I'm going to assume my 3TB drive is SMR as well.
An update on what I've found out...
I put together a table of the various settings I was using for each volume, as I found some volumes were unaffected while others had lots of corrupted files. Having no idea what caused the corruption, I wrote a script to find and identify the corrupt files. They all share the same signature: every byte has been replaced with hex 0x01 (all bits zero except the right-most). So, sampling the first 100 bytes, the script flags any file where every one of those bytes is 0x01.
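In outline, the check amounts to this (a bash sketch assuming GNU head, od, and find):

#!/bin/bash
# Flag any file whose first 100 bytes are all 0x01 as corrupt.
pattern=$(printf '01%.0s' $(seq 1 100))   # "0101...01", 100 bytes in hex
find /data -type f -size +99c -print0 |
while IFS= read -r -d '' f; do
    sig=$(head -c 100 -- "$f" | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "$pattern" ] && printf 'CORRUPT: %s\n' "$f"
done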
Here is the table:
What I first noticed is that the only volumes with corrupt files are the volumes where bitrot protection is turned on. This is an interesting find: for all that btrfs builds on COW as a solution to bitrot, in my case at least it appears to have been the single biggest contributing factor to a complete loss of data. I would advise anyone running btrfs to turn off copy-on-write (a.k.a. bitrot protection) ASAP. I ran for many years without an issue; then one night at 4am, wham! The volume went read-only, with huge swaths of corrupted files that I suspect can only be recovered from a backup.
I would love to know how to see where the inodes for these files sit, to check whether there is any correlation to a particular drive.
Right now, all I know is that a defrag had finished not much more than a minute before the ReadyNAS reported problems...
Perhaps a warning for others... And a question for those who have also seen their ReadyNAS corrupted: was bitrot protection on for the corrupted volumes?