Recovering from a failed drive

jcalderone
Aspirant

I have a ReadyNAS NV. Until earlier this month it had four disks: three 500GB and one 1TB. It was in an X-RAID configuration with 1.5TB of available storage, almost all of it in use. At some point there was a hardware issue (it got pretty hot in my apartment one day, so perhaps it was heat related) and the NAS crashed. I checked all the disks in another machine, identified one that was having read failures, RMA'd it, and got a replacement. I put the replacement in and booted the NAS, but it didn't bring any shares online. Looking through the logs, I see some suspicious things:

Jul 30 18:31:29 higgs kernel: startstop XRAID command = start, flash_cache=0
Jul 30 18:31:29 higgs kernel: Evaluating last known good configuration.
Jul 30 18:31:29 higgs kernel: X_RAID clean shutdown indicator: 0xb.
Jul 30 18:31:29 higgs kernel: 0 2 2 0 1 0 0 0
Jul 30 18:31:29 higgs kernel: 0 1 0 0
Jul 30 18:31:29 higgs kernel: 1 0 0 0
Jul 30 18:31:29 higgs kernel: 0 0 0 0
Jul 30 18:31:29 higgs kernel: 0 0 0 0
Jul 30 18:31:29 higgs kernel: Update time for sb 1 = 4c337678.
Jul 30 18:31:29 higgs kernel: Update time for sb 2 = 4c337678.
Jul 30 18:31:29 higgs kernel: Update time for sb 3 = 0.
Jul 30 18:31:29 higgs kernel: Update time for sb 4 = 49c23714.
Jul 30 18:31:29 higgs kernel: recent_ID = 1, select_ID=1, most_ID=2 right_mac=4
Jul 30 18:31:29 higgs kernel: Selected sb 1, ctime=4c337678, id=a201190c.
Jul 30 18:31:29 higgs kernel: Use this image: 1
Jul 30 18:31:29 higgs kernel:
Jul 30 18:31:29 higgs kernel: VERSION/ID : SB=(V:0.1.0) ID=<a201190c.00000000.00000000.00000000> CT:4c337678
Jul 30 18:31:29 higgs kernel: RAID_INFO : DISKS(TOTAL:4 RAID:4 PARITY:3 ONL:3 WRK:3 FAILED:1 SPARE:0 BASE:2)
Jul 30 18:31:29 higgs kernel: SZ:0976752688 UT:00000000 STATE:0 LUNS:2 EXTCMD:1 LSZ:0976752686
Jul 30 18:31:29 higgs kernel: LOGICAL_DRIVE : 0: B:0000000002 E:0004096000 R:1 O:1 I:0:000000000 DM:7
Jul 30 18:31:29 higgs kernel: LOGICAL_DRIVE : 1: B:0004096002 E:0972656686 R:4 O:1 I:0:000000000 DM:7
Jul 30 18:31:29 higgs kernel: PHYSICAL_DRIVE: 0: DISK<N:0/1,hdc(22,0),ID:0,PT:1,SZ:1953504688,ST: :online>
Jul 30 18:31:29 higgs kernel: PHYSICAL_DRIVE: 1: DISK<N:1/2,hde(33,0),ID:1,PT:1,SZ:0976752688,ST: :online>
Jul 30 18:31:29 higgs kernel: PHYSICAL_DRIVE: 2: DISK<N:2/3,hdg(34,0),ID:2,PT:1,SZ:0976752688,ST: B:online>
Jul 30 18:31:29 higgs kernel: PHYSICAL_DRIVE: 3: DISK<N:3/4,hdi(56,0),ID:3,PT:1,SZ:0976752688,ST:P :faulty>
Jul 30 18:31:29 higgs kernel: CURRENT_DRIVE : DISK<N:0/1,XXX(22,0),ID:0,PT:1,SZ:1953504688,ST: :online>
Jul 30 18:31:29 higgs kernel: Not enough disks, would not start XRAID.
Jul 30 18:31:29 higgs kernel: hdc: hdc1 hdc2 hdc3 < hdc5 >
Jul 30 18:31:29 higgs kernel: hde: hde1 hde2 hde3 < hde5 >
Jul 30 18:31:29 higgs kernel: hdg: unknown partition table
Jul 30 18:31:29 higgs kernel: hdi: unknown partition table

PHYSICAL_DRIVE 2 corresponds to the disk that was just replaced, but there now seems to be an issue with PHYSICAL_DRIVE 3 as well: it is marked faulty, and its update time (sb 4 = 49c23714) is out of sync with the first two (original, hopefully good) disks, which both report 4c337678.
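To get a feel for how far out of sync the superblocks are, here is a minimal Python sketch that decodes the "Update time" values from the log. It assumes those values are hexadecimal Unix timestamps (seconds since 1970, UTC); nothing in the log confirms that, so treat the result as a guess. The names `UPDATE_TIMES` and `decode_update_time` are made up for this example.

```python
from datetime import datetime, timezone

# "Update time for sb N" values copied from the kernel log above.
UPDATE_TIMES = {1: "4c337678", 2: "4c337678", 3: "0", 4: "49c23714"}


def decode_update_time(hex_value):
    """Decode a hex string as a Unix timestamp in UTC.

    ASSUMPTION: the X-RAID driver logs update times as hex Unix seconds.
    Returns None for 0, which presumably means no superblock was written.
    """
    seconds = int(hex_value, 16)
    if seconds == 0:
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


for sb, raw in sorted(UPDATE_TIMES.items()):
    decoded = decode_update_time(raw)
    label = decoded.isoformat() if decoded else "no update recorded"
    print(f"sb {sb}: {raw:>8} -> {label}")
```

If the assumption holds, sb 1 and sb 2 decode to 6 July 2010 and sb 4 to 19 March 2009. That would mean the fourth disk's superblock was last updated more than a year before the others, which fits it being flagged as faulty.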

What can I do about this?
Message 1 of 4
Jedi_Knight
Tutor

Re: Recovering from a failed drive

I would suggest contacting Tech Support and sending in your full system logs with this forum post in the subject line. In the meantime, please refrain from doing anything to the system and work with Tech Support on it. Thanks.

Message 2 of 4
Jedi_Knight
Tutor

Re: Recovering from a failed drive

Thanks for the system logs. Have you contacted Tech Support? Please refer them to this forum post, as they will need to escalate your support case to Level 4 Support.
Message 3 of 4
jcalderone
Aspirant

Re: Recovering from a failed drive

I did contact tech support. The suggestion made was to clone a couple of the drives:

- Forgive the previous agent. We have gone through the logs again and Drive 3 and 4 are reporting incorrectly.
- There is a TLER error on disk 1 stating that disk one is not compatible. But it has not failed.
- You need to power the unit off. Take disk 3 and 4 out of the unit and clone them
- Steps to clone are here:

(lots of stuff about dd_rescue)

Since disk 3 is blank (the original was RMA'd), I don't think these instructions make sense. I've asked for clarification, though, and am waiting for a response.
Message 4 of 4