Pro4 not rebooting

SurplusGadgets
Aspirant

I am really beginning to question ReadyNAS devices. I had three NV+ units for 4-5 years and never had an issue. I upgraded to an Ultra4+, a Pro4 and a Pro2, and every one of them has crashed badly within 2 years of use. This is the last of the three. The Ultra4+ supposedly crashed badly in May when a disk supposedly died while a firmware upgrade was taking place. 16TB down the tubes; I am still trying to recreate it. Anyway, the Pro4 suddenly stopped responding. I tried to reboot, but it stuck on "Booting..." (blinking power button). The memory test was fine. The disk test, 2 hrs after reaching 100%, finally reported Disk 3 as bad. So I removed Disk 3 and tried to reboot: stuck on "Booting..." again. I ran the disk test again; 10 hrs after reaching 100%, it was still on "Testing Disks...". OK, so I put it in Tech Mode and telnetted in.

# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : inactive sda2[0](S) sdd2[3](S) sdc2[2](S) sdb2[1](S)
      2095040 blocks super 1.2

md0 : active raid1 sda1[0] sdd1[3] sdb1[1]
      4190208 blocks super 1.2 [4/3] [UU_U]

md127 : active raid5 sda3[0] sdd3[3] sdb3[1]
      5845988352 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UU_U]
      bitmap: 1/15 pages [4KB], 65536KB chunk

unused devices: <none>
# mdadm --detail /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Thu Aug  7 10:59:49 2014
     Raid Level : raid5
     Array Size : 5845988352 (5575.17 GiB 5986.29 GB)
  Used Dev Size : 1948662784 (1858.39 GiB 1995.43 GB)
   Raid Devices : 4
  Total Devices : 3
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Sun Oct  4 13:19:02 2015
          State : active, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : 5e26f85a:data-0  (local to host 5e26f85a)
           UUID : 3c6efe9b:5528f63b:df1a2fab:feab3302
         Events : 6859

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       4       0        0        4      removed
       3       8       51        3      active sync   /dev/sdd3
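
For anyone reading output like the above, here is a minimal sketch of a helper that condenses /proc/mdstat-style text into one line per array. The function name `mdstat_summary` is made up for illustration, and it assumes a POSIX shell with awk, which the Tech Mode environment may or may not provide. It reads the `[total/present]` counter that the kernel prints for active arrays and reports inactive arrays separately.

```shell
#!/bin/sh
# Summarize md arrays from /proc/mdstat-style text read on stdin.
# Prints one line per array: name, state, and the [total/present] count if shown.
# (mdstat_summary is a hypothetical helper name, not part of mdadm.)
mdstat_summary() {
    awk '
        /^md[0-9]+ :/ {
            name = $1
            # Field 3 is "active" or "inactive".
            if ($3 == "inactive") { print name, "inactive"; name = "" }
            next
        }
        # For an active array, a following line carries "[total/present]".
        name != "" && match($0, /\[[0-9]+\/[0-9]+\]/) {
            counts = substr($0, RSTART + 1, RLENGTH - 2)
            split(counts, c, "/")
            status = (c[1] == c[2]) ? "healthy" : "degraded"
            print name, status, counts
            name = ""
        }
    '
}
```

On a live system it would be called as `mdstat_summary < /proc/mdstat`. For the output in this post it should flag md1 as inactive and both md0 and md127 as degraded with 3 of 4 members present.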

Is it just that the swap RAID device (md1) did not fail its third device/partition (sdc2), and that is what is holding up the boot? Can I just fail and remove that partition (mdadm /dev/md1 --fail /dev/sdc2 --remove /dev/sdc2) and try to reboot to get the device into limp mode? I want to get the Pro4 back up to make a current backup of some critical data. I do not understand why suddenly failing drives keep these systems from booting. Or is this an OS6 / BTRFS thing on these "older" devices that causes the issues? I never had drives fail until I upgraded to the new hardware (and new drives in some cases, but this is an older drive). (The drives are all on the HW compatibility list and are identical models. The Pro4 was purchased from an Amazon reseller, and Netgear would not allow it to be registered, saying it had already been registered. The Ultra4+ that failed in May was still under warranty and registered. SeaTools reports the "failed" drive is fine.)
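
One caveat on the command in the question: md1 is listed as inactive with every member marked (S), and mdadm's --fail and --remove are normally applied to a running array. A commonly used alternative is to stop the inactive array and reassemble it from only the members you trust. The sketch below shows that idea as a dry run. The helper names `run` and `reassemble_without` are invented for illustration, the device names are taken from the mdstat output above and should be re-checked, and whether md1 is actually what blocks the boot is not established in this thread.

```shell
#!/bin/sh
# Dry run by default: prints each command instead of executing it.
# Set RUN=1 only after checking the device names against /proc/mdstat.
# (run and reassemble_without are hypothetical helper names.)
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Usage: reassemble_without ARRAY MEMBER...
# Stops ARRAY and reassembles it from only the listed members.
reassemble_without() {
    array=$1; shift
    run mdadm --stop "$array"
    # --run asks mdadm to start the array even with fewer members than expected.
    run mdadm --assemble --run "$array" "$@"
    run cat /proc/mdstat
}
```

Calling `reassemble_without /dev/md1 /dev/sda2 /dev/sdb2 /dev/sdd2` prints the three commands for review, leaving out sdc2. Nothing is changed unless RUN=1 is set.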

Message 1 of 3

All Replies
SurplusGadgets
Aspirant

Re: Pro4 not rebooting

Likely answering my own question after more research. It seems all four drives are ST2000DL003 with FW version CC3C. They were purchased at three different times from three different vendors. The model is on the HW list, but the list did not mention FW versions at the time of the purchases. All of them currently test fine when I check them with SmartControl or similar tools; even SMART shows fine. But SeaTools simply reports "SMART Failure" on two of them (although the other SeaTools tests pass). The SMART registers as reported by SmartControl and similar tools look fine. So I am guessing ReadyNAS is seeing some register that SeaTools also sees and is now refusing to boot with these two drives. I am working to replace and clone all four drives to try and rebuild the array; maybe this was a time bomb waiting to go off. I am not sure why the ReadyNAS "Test Disk" does not report this (it only did for one of the two, and only once), or why there is no more informative message at boot that this may be the issue. Too much time and energy, let alone the sudden failure of the system with no warnings.
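
One way to cross-check a "SMART Failure" verdict is to look at the per-attribute table rather than the overall status. The sketch below assumes smartmontools is available and that the input is the attribute table printed by `smartctl -A`; the function name `smart_failed_attrs` is invented for illustration. It lists attributes whose normalized VALUE is at or below a non-zero THRESH, or whose WHEN_FAILED column is not "-". Whether SeaTools or the ReadyNAS firmware apply the same criterion is not known from this thread.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and list attributes that look failed.
# Columns assumed: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# (smart_failed_attrs is a hypothetical helper name.)
smart_failed_attrs() {
    awk '
        # Attribute rows start with a numeric ID and have at least 10 fields.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            failed_flag = ($9 != "-")
            below_thresh = ($6 + 0 > 0 && $4 + 0 <= $6 + 0)
            if (failed_flag || below_thresh)
                printf "%s %s value=%d thresh=%d when_failed=%s\n", $1, $2, $4, $6, $9
        }
    '
}
```

On a machine with the drive attached, it could be used as `smartctl -A /dev/sdX | smart_failed_attrs`, where /dev/sdX is a placeholder for the drive in question. Empty output means no attribute met either condition.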

 

Message 2 of 3
SurplusGadgets
Aspirant

Re: Pro4 not rebooting

So I am still working to recover the data. Under Tech Mode, I was able to re-add the dropped disk to the arrays, and then even mount the arrays (well, md0 and md127, which have BTRFS on them). The data array is just fine. The OS array, md0, mounts but exhibits two unrecoverable BTRFS issues as reported by scrub (although poking around, it looks fine). I did have to "fix" one bad partition table copy on md0. The key question is how to get the data off the data array, since the unit will not boot. The Tech Mode OS seems locked down / hobbled. No other executables will run, even if added to the RAM disk. The existing executables like Dropbear (ssh, scp) and similar (tar -c) have been disabled from copying off the array over the network (a security measure?). I cannot seem to mount a local USB drive either. I have not investigated booting off a USB stick yet (I think I read somewhere that this is possible). Simple ideas? I prefer the network, as it is much faster than USB. My backup solution is incremental and would take too long to recreate. I am just trying to avoid a loss of 8TB after this, the third machine since upgrading to OS6, somehow developed a glitch that prevents the current volumes from booting. I just cannot believe I will lose data again, a third time, and I am looking for a better solution. So much for RAID data protection 😞
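
One option that sidesteps the locked-down Tech Mode environment is to attach the disks to an ordinary Linux machine and assemble the data array there, read-only. The sketch below is a dry run of that idea and rests on several assumptions: the host has mdadm and btrfs support, the array can be assembled under the name /dev/md127 without clashing with an existing array, and the member partitions are passed in by the user. The helper names `run` and `mount_data_readonly` are invented for illustration.

```shell
#!/bin/sh
# Dry run by default: prints each command instead of executing it.
# Set RUN=1 only after confirming which partitions belong to the data array.
# (run and mount_data_readonly are hypothetical helper names.)
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Usage: mount_data_readonly MOUNTPOINT MEMBER...
mount_data_readonly() {
    mountpoint=$1; shift
    # --readonly starts the array without allowing writes or a resync.
    run mdadm --assemble --readonly /dev/md127 "$@"
    run mkdir -p "$mountpoint"
    # Mount the filesystem read-only as well.
    run mount -o ro /dev/md127 "$mountpoint"
}
```

For example, `mount_data_readonly /mnt/nas /dev/sdb3 /dev/sdc3 /dev/sdd3` prints the plan, with the partition names here being placeholders. Once the volume is mounted, the data can be copied off with an ordinary tool such as rsync or cp.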

Message 3 of 3