
Lost another volume

ibell63
Aspirant

Lost another volume

On firmware 6.7.4, I ran a scrub before leaving for the weekend on Friday. I came back in today without ever having received a notification that the scrub had finished, and I couldn't make any changes to the contents of any shares; I was getting permission denied, even on accounts that have read/write access. I reset permissions for those shares.

 

I finally rebooted, and now the volume has disappeared. To my knowledge, this system has never suffered a power failure since the last time I rebuilt it. Please let me know which logs to post, if any.

Model: RN204|ReadyNAS204
Message 1 of 10


All Replies
Hopchen
Prodigy

Re: Lost another volume

Hi,

 

The first thing to check is whether the data RAID is running. Can you please post the mdstat.log?
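If SSH is enabled on the NAS, the same information can also be read live. A minimal sketch, run here against an embedded sample shaped like mdstat.log (on the NAS itself you would read /proc/mdstat, or run `mdadm --detail` on an array device):

```shell
# Sample lines shaped like the mdstat.log this post asks for;
# on a live NAS you would use:  cat /proc/mdstat
mdstat='md1 : active raid10 sda2[0] sdd2[3] sdc2[2] sdb2[1]
      1046528 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
md0 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      4190208 blocks super 1.2 [4/4] [UUUU]'

# Count arrays whose members are all up ("U" per member, "_" for a
# missing member). 4 disks -> look for [UUUU].
healthy=$(printf '%s\n' "$mdstat" | grep -c '\[UUUU\]')
echo "$healthy arrays with all members up"
```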

 

 

Thanks

Message 2 of 10
ibell63
Aspirant

Re: Lost another volume

mdstat.log

 

Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
md127 : active raid0 sda3[0] sdd3[3] sdc3[2] sdb3[1]
      23422691328 blocks super 1.2 64k chunks
      
md1 : active raid10 sda2[0] sdd2[3] sdc2[2] sdb2[1]
      1046528 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      
md0 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      4190208 blocks super 1.2 [4/4] [UUUU]
      
unused devices: <none>
/dev/md/0:
        Version : 1.2
  Creation Time : Mon Oct 10 14:44:16 2016
     Raid Level : raid1
     Array Size : 4190208 (4.00 GiB 4.29 GB)
  Used Dev Size : 4190208 (4.00 GiB 4.29 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Mon Jun 26 12:53:42 2017
          State : clean 
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

           Name : 117c606a:0  (local to host 117c606a)
           UUID : 1d244df2:605bbba2:95ce8d48:297bec93
         Events : 86

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1
/dev/md/1:
        Version : 1.2
  Creation Time : Mon May 15 18:47:04 2017
     Raid Level : raid10
     Array Size : 1046528 (1022.00 MiB 1071.64 MB)
  Used Dev Size : 523264 (511.00 MiB 535.82 MB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Mon Jun 26 10:41:10 2017
          State : clean 
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 512K

           Name : 117c606a:1  (local to host 117c606a)
           UUID : 7eb71118:4e648fc4:7b53a41b:99cd94fa
         Events : 19

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync set-A   /dev/sda2
       1       8       18        1      active sync set-B   /dev/sdb2
       2       8       34        2      active sync set-A   /dev/sdc2
       3       8       50        3      active sync set-B   /dev/sdd2
/dev/md/data-0:
        Version : 1.2
  Creation Time : Mon May 15 18:47:04 2017
     Raid Level : raid0
     Array Size : 23422691328 (22337.62 GiB 23984.84 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Mon May 15 18:47:04 2017
          State : clean 
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 64K

           Name : 117c606a:data-0  (local to host 117c606a)
           UUID : 7abc2a8b:b6a7cb06:e80da4b0:8c8d78e6
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
Message 3 of 10
Hopchen
Prodigy

Re: Lost another volume

Okay, so your data RAID is active, which is good. But I can see that it is a RAID0. A scrub is of no use on a RAID0: if the filesystem finds corrupt versions of your files during the scrub, there is no redundant copy to recover from. So if you run a RAID0 (or any other RAID level with no redundancy), don't run a scrub.

That being said, the scrub shouldn't break the volume - it just won't do anything useful. Can you enable SSH access, log in to the CLI of the NAS, and run this command:

journalctl | grep -i btrfs

That will show whether the system logged any filesystem warnings. We need to see if the filesystem is OK.
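The lines to worry about are the ones logged at error or warning level. A hypothetical filter over a journal snippet (the message texts below are typical btrfs complaints for illustration, not taken from this NAS):

```shell
# Sample journal lines; on the NAS the input would come from:
#   journalctl | grep -i btrfs
log='BTRFS info (device md127): disk space caching is enabled
BTRFS error (device md127): parent transid verify failed on 29360128
BTRFS warning (device md127): csum failed root 5'

# Keep only error/warning level messages; plain "info" lines are normal.
errors=$(printf '%s\n' "$log" | grep -Eic 'btrfs (error|warning)')
echo "$errors serious btrfs messages"
```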

Thanks

Message 4 of 10
ibell63
Aspirant

Re: Lost another volume

I couldn't post the output of that command here because it's too long, so I uploaded it to Pastebin:

https://pastebin.com/26WTbCLC

Message 5 of 10
Hopchen
Prodigy

Re: Lost another volume

Hi again,

Yes, so there is definitely some corruption on the filesystem, unfortunately. You can probably see that yourself reading through some of those messages. It would cause issues mounting the volume, which is why you no longer see the volume. This is not a result of the scrub, by the way.

It is one of the problems with running a RAID0. You are much more prone to these sorts of problems, as there is no fault tolerance at all. If any of the disks stalled, or if there are errors on any of the disks, it can cause serious issues for a RAID0. Are all the disks OK? You can check in disk_info.log.

Do you have a backup of the data? If so, you are best off doing a factory default and restoring from backups. Also, you might want to consider whether RAID0 is the right RAID level for your setup; it is rather risky across 4 drives, I think.

Message 6 of 10
ibell63
Aspirant

Re: Lost another volume

The data is not unique and I have backups I can restore from, so I'll reformat as XRAID. disk_info.log does not show any issues. Here it is below for your reference, also:

Device:             sda
Controller:         0
Channel:            0
Model:              WL6000GSA12872E
Serial:             WOL240343186
Firmware:           01.01C01
Class:              SATA
Sectors:            11721532032
Pool:               data
PoolType:           RAID 0
PoolState:          5
PoolHostId:         117c606a
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    36
  Start/Stop Count:               184
  Power-On Hours:                 8504
  Power Cycle Count:              133
  Load Cycle Count:               153

Device:             sdb
Controller:         0
Channel:            1
Model:              WL6000GSA6472E
Serial:             WOL240336490
Firmware:           01.0RRE2
Class:              SATA
RPM:                5700
Sectors:            11721045168
Pool:               data
PoolType:           RAID 0
PoolState:          5
PoolHostId:         117c606a
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    39
  Start/Stop Count:               227
  Power-On Hours:                 8890
  Power Cycle Count:              140
  Load Cycle Count:               187

Device:             sdc
Controller:         0
Channel:            2
Model:              WL6000GSA6472E
Serial:             WOL240336488
Firmware:           01.0RRE2
Class:              SATA
RPM:                5700
Sectors:            11721045168
Pool:               data
PoolType:           RAID 0
PoolState:          5
PoolHostId:         117c606a
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    39
  Start/Stop Count:               213
  Power-On Hours:                 8737
  Power Cycle Count:              136
  Load Cycle Count:               175

Device:             sdd
Controller:         0
Channel:            3
Model:              WL6000GSA6472E
Serial:             WOL240336487
Firmware:           01.0RRE2
Class:              SATA
RPM:                5700
Sectors:            11721045168
Pool:               data
PoolType:           RAID 0
PoolState:          5
PoolHostId:         117c606a
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    36
  Start/Stop Count:               221
  Power-On Hours:                 8826
  Power Cycle Count:              139
  Load Cycle Count:               178
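The health counters in a log like this can also be scanned mechanically. A hypothetical sketch over a snippet in the same shape as one disk's health block (non-zero reallocated, pending, or uncorrectable sector counts would be the red flags):

```shell
# Snippet shaped like one disk's health block in disk_info.log.
health='Reallocated Sectors:            0
Current Pending Sector Count:   0
Uncorrectable Sector Count:     0'

# Count any sector-related counter that is above zero.
bad=$(printf '%s\n' "$health" |
  awk -F: '/Sector/ {gsub(/ /,"",$2); if ($2+0 > 0) n++} END {print n+0}')
echo "$bad suspect counters"
```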

Message 7 of 10
Hopchen
Prodigy

Re: Lost another volume

Yes, your disks seem fine here. But a RAID0 is so intolerant that any hiccups can be problematic, and the more disks involved, the bigger the risk. It is hard to say exactly what caused it. The point is that there is no recovery for the filesystem once it encounters errors, as there is no redundancy.

I am glad you had a backup and that you are considering a RAID with some redundancy.

 

Just as an FYI, the things that typically lead to filesystem corruption are:

1. Disk issues.
2. Filling the filesystem too much. You should leave about 10% free space.
3. Non-graceful shutdowns (such as power-cuts).
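Point 2 is easy to watch for. A hypothetical sketch of that rule (the usage figure is made up for illustration; on the NAS you would take it from `df` or `btrfs filesystem usage`):

```shell
# Made-up usage figure for illustration.
used_pct=93

# Warn once the volume passes the ~90% mark mentioned above.
msg="volume usage OK"
if [ "$used_pct" -gt 90 ]; then
  msg="volume is ${used_pct}% full - free up space"
fi
echo "$msg"
```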

Best of luck!

Message 8 of 10
ibell63
Aspirant

Re: Lost another volume

ReclaiME finds my filesystem and it appears that it will be able to recover data from it.

Message 9 of 10
Hopchen
Prodigy

Re: Lost another volume

That is good news if you were in a potential data-loss situation. ReclaiME is good software and might find some files. BTRFS has a built-in tool for that as well, namely btrfs restore. But neither of those really fixes the filesystem, and they might recover corrupted versions of your files.

 

So unless you really need some data that isn't backed up, I would still go the factory default route.
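For reference, btrfs restore is typically run against the unmounted data device, with a destination that has enough free space. A sketch of the invocation only - /dev/md127 is the data array from the mdstat.log above, and /mnt/recovery is an assumed destination:

```shell
# Assumed device and destination; btrfs restore copies files out
# without writing to the damaged filesystem.
dev=/dev/md127
dest=/mnt/recovery
cmd="btrfs restore -v $dev $dest"
echo "$cmd"   # shown but not executed here - it needs the real device
```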

Message 10 of 10
Discussion stats
  • 9 replies
  • 7058 views
  • 0 kudos
  • 2 in conversation