Forum Discussion

berillio

Aspirant

May 24, 2022

RN214 goes Offline and NICs may be dead

Hello RN Forum,

As you may remember from other posts (still on hiatus, sic), my current setup in Birmingham, UK consists of:

RN214a (4x WD40EFAX), FW 6.10.3, RAID 5

RN214b (3x WD80EFBX + 1x WD40EFRX), FW 6.10.3, RAID 5

RN424 (3x WD80EFAX + 1x WD80EFBX), FW 6.10.4 Hotfix1, RAID 5

The RN214s were purchased in April/May 2020 and the RN424 in April 2021.

I went back to Italy for Xmas, but family commitments kept me there until now (mid May).

While abroad I occasionally logged on to the NASs, and in February I noticed that RN214b had gone offline. I asked a friend to go and check, and she rebooted it (I told her to pull the plug, as it was unresponsive). I think I checked it again in April and it appeared to be “Online”, but it was Offline when I got back home.

This situation seems to be similar to the one described in

RN21400 Suddenly Goes Offline but is actually running

https://community.netgear.com/t5/Using-your-ReadyNAS-in-Business/RN21400-Suddenly-Goes-Offline-but-is-actually-running/m-p/2113700#M192729

as all my NASs have two NICs to allow bonding. In reality, I tried to implement bonding but failed on the RN214s. I thought that my “basic” router would not support bonding, so I purchased a GS32T. Bonding then worked on the RN424, but both RN214s would go offline as soon as the second “bonded” NIC was connected. I planned to look into this further with a forum post, but I already had two posts open, so I left it for later.

All NASs were left configured with bonding in place, but eth1 was unconnected on the RN214s (so both NICs were, unfortunately, “in use and powered”); this has been the situation for the last 12 to 18 months. No apps are installed on RN214b (Plex is on RN214a, but I have used it once and it is pretty unnecessary for my use).

I followed the advice in the post above, but unfortunately I did not get to the “happy ending”: RN214b, now connected through a single unbonded NIC (eth1 now, though there seems to be little difference between the two NICs), boots up, stays online for a matter of minutes, then becomes unavailable.

The fan does not seem to respond to the software settings either: even when I set it to “Cool”, it stays below 800 rpm. I swapped it for an apparently identical fan from a dead RN104; when I rebooted RN214b I saw 2785 rpm, but five minutes later the data was inaccessible, and on the next reboot the fan was running below 800 rpm again. The unit is currently off and “naked” (no side panels).

This is the situation thus far.

Is there any possibility that a FW update may fix any of these issues (which look like hardware issues to me)?

Are there any further tests I could do?

Further comments

The RN214a is currently using eth0. Is it advisable to switch to eth1, if that seems to be less temperature sensitive, or is that only the case when both NICs are bonded and “in use”?

Is it advisable to keep the double bonding on the RN424, given that the speed advantage is minimal, or should I just use a single unbonded NIC, alternating eth0 and eth1 once a year or so?

IF the unit has suffered a terminal fault on both NICs and has to be considered DEAD, then I really don’t know what to do, because NETGEAR ReadyNAS seems to have disappeared from the UK market. Six months ago I could still find RN424s on Amazon.de, but that no longer seems to be the case, so the option of moving the entire array into a new unit (214 or 424) doesn’t seem to be available to me anymore.

If I am correct, I can switch off RN214a, remove all its disks (ordered and labelled), load all the disks from the faulty RN214b and power it up. RN214a should then read the full array, which would allow me to transfer all the data onto a WD10EFAX that is currently empty. The disks could then be formatted and used elsewhere.

Thanks in advance for any help and suggestions,

Berillio

13 Replies


StephenB
Guru - Experienced User
May 24, 2022
berillio wrote:

Is it advisable to keep the double bonding on the RN424, given that the speed advantage is minimal

Why (given that the speed advantage is minimal)?

berillio wrote:

I can switch off the RN 214a, remove all disks (ordered & labelled), load all the disks from the faulty RN214b and power it up. The R214a should read that full array. That should allow me to transfer all the data on a WD10EFAX currently empty.

Correct. You can also migrate the disks to the RN424 (or in the other direction) - though the system will need to switch the OS from arm->x86 (or vice versa) when you do that.

berillio wrote:

The RN214a is currently using eth0. Is it advisable to switch to eth1

I don't think it matters though it would do no harm.

berillio wrote:
The RN214s were purchased in April / May 2020 and the RN424 in April 2021

The hardware warranty is 3 years, so you could request an RMA for RN214b.
- berillio
  Aspirant
  May 26, 2022
  Thank you, StephenB.
  “Correct. You can also migrate the disks to the RN424 (or in the other direction) - though the system will need to switch the OS from arm->x86 (or vice versa) when you do that.”
  
  I went for a simple array migration to the RN214a (which now calls itself RN214b, but uses a different IP).
  Unfortunately it showed the same problem as the previous “Unit B”. The file system came up, but very slowly: RAIDar showed the unit online, but the Admin and Browse icons did not appear for at least five minutes.
  Then I started a full TeraCopy of the data (minus the snapshots) to the WD10 target drive, but the copy did not start because the target drive was 76 GB too small. I only realised that 2 hours later when I checked, and by then the unit was frozen. I unplugged it and restarted it, only to see a CPU temperature of 71 °C and similarly high temperatures for the drives. OUCH.
  I let it cool down for 2 or 3 hours, then managed to transfer 78 GB of data before it hung. This morning I tried to copy one folder; transfer speeds were ~104 MB/s, but then it froze 10 seconds before the end. This evening the file system was up for a matter of seconds before hourglassing, and the unit hung even though all temperatures were below 30 °C. (Incidentally, I moved the unit to a more “exposed” position and removed the side cheeks and top panel to let more air in; the drives were also removed, left on the desk to cool down, and reinserted just before rebooting.)
  Now I don’t know what to think anymore.
  Should I return the RN214a array to “Unit A” and check whether it is still functional?
  Should I instead test the RN214a array in “Unit B” to see whether that hardware is faulty, as I assumed?
  Should I presume that the 6.10.3 FW on the RN214b array has become corrupted somehow, upgrade it to (say) 6.10.4, and see whether uncorrupted firmware can read the existing file system?
  Should I try the RN214b array in the RN424, where more robust hardware (with a much bigger fan) might be able to read the file system? That would imply a firmware change anyway (arm to x86_64), so is it basically the previous option plus a hardware advantage?
  
  Thanks to everybody in advance.
  
  P.S. This is the content of diskinfo.log from the log download taken before moving the array to the RN214a unit:
  
  Device:             sda
  Controller:         0
  Channel:            0
  Model:              WDC WD80EFBX-68AZZN0
  Serial:             VRHBHJRK
  Firmware:           85.00A85W
  Class:              SATA
  RPM:                7200
  Sectors:            15628053168
  Pool:               data
  PoolType:           RAID 5
  PoolState:          1
  PoolHostId:         1132353a
  Health data
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    41
  Start/Stop Count:               19
  Power-On Hours:                 4764
  Power Cycle Count:              19
  Load Cycle Count:               215
  
  Device:             sdb
  Controller:         0
  Channel:            1
  Model:              WDC WD80EFBX-68AZZN0
  Serial:             VRHBMEDK
  Firmware:           85.00A85W
  Class:              SATA
  RPM:                7200
  Sectors:            15628053168
  Pool:               data
  PoolType:           RAID 5
  PoolState:          1
  PoolHostId:         1132353a
  Health data
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    44
  Start/Stop Count:               18
  Power-On Hours:                 4744
  Power Cycle Count:              18
  Load Cycle Count:               213
  
  Device:             sdc
  Controller:         0
  Channel:            2
  Model:              WDC WD80EFBX-68AZZN0
  Serial:             VRGR7MNK
  Firmware:           85.00A85W
  Class:              SATA
  RPM:                7200
  Sectors:            15628053168
  Pool:               data
  PoolType:           RAID 5
  PoolState:          1
  PoolHostId:         1132353a
  Health data
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    43
  Start/Stop Count:               17
  Power-On Hours:                 4615
  Power Cycle Count:              17
  Load Cycle Count:               207
  
  Device:             sdd
  Controller:         0
  Channel:            3
  Model:              WDC WD40EFRX-68N32N0
  Serial:             WD-WCC7K6YX6PYY
  Firmware:           82.00A82W
  Class:              SATA
  RPM:                5400
  Sectors:            7814037168
  Pool:               data
  PoolType:           RAID 5
  PoolState:          1
  PoolHostId:         1132353a
  Health data
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    33
  Start/Stop Count:               1158
  Power-On Hours:                 17642
  Power Cycle Count:              78
  Load Cycle Count:               1277
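Every health counter in the dump above is 0, and the drive temperatures range from 33 to 44. When several of these dumps need comparing (for example from RN214a and RN214b), the check can be scripted. This is a minimal sketch, assuming the plain `Key: value` layout shown above; the names `parse_diskinfo`, `health_report` and `WARN_FIELDS`, the choice of counters, and the 50 °C threshold are all hypothetical and not part of any NETGEAR tool.

```python
# Minimal sketch: scan diskinfo.log-style text for non-zero health counters.
# Assumes the plain "Key: value" layout shown above; function names and the
# temperature threshold are hypothetical, not part of any NETGEAR tool.

# Counters treated here as warning signs when non-zero (an assumed selection).
WARN_FIELDS = (
    "ATA Error Count",
    "Reallocated Sectors",
    "Reallocation Events",
    "Spin Retry Count",
    "Current Pending Sector Count",
    "Uncorrectable Sector Count",
)


def parse_diskinfo(text):
    """Split the text into one dict per 'Device:' block."""
    disks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skips blank lines and the "Health data" heading
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Device":
            current = {"Device": value}
            disks.append(current)
        elif current is not None:
            current[key] = value
    return disks


def health_report(disks, max_temp=50):
    """Return (device, problem) pairs; max_temp (deg C) is a hypothetical limit."""
    problems = []
    for disk in disks:
        for field in WARN_FIELDS:
            if int(disk.get(field, "0")) != 0:
                problems.append((disk["Device"], f"{field} = {disk[field]}"))
        temp = int(disk.get("Temperature", "0"))
        if temp > max_temp:
            problems.append((disk["Device"], f"Temperature = {temp}"))
    return problems


if __name__ == "__main__":
    sample = """
Device:             sdd
Model:              WDC WD40EFRX-68N32N0
Health data
ATA Error Count:                0
Current Pending Sector Count:   0
Temperature:                    33
"""
    print(health_report(parse_diskinfo(sample)))  # → []
```

An empty list means none of the selected counters is non-zero and no drive exceeds the chosen temperature; it says nothing about the NAS board itself.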
  - StephenB
    Guru - Experienced User
    May 27, 2022
    berillio wrote:
    
    I went for a simple array migration to the RN214a (which now calls itself RN214b, but using a different IP).
    
    Unfortunately it showed the same problem as the previous “Unit b”. The file system came up but very slowly. RAIDair showed the unit online, but not the Admin & Browse icons for at least five minutes.
    
    So the disks in RN214b cause the same problem when migrated to RN214a.
    
    I would next try the RN214a disks in RN214b, and confirm that the problem doesn't occur in RN214b with RN214a's disks.
    
    I'd also take a look at the OS partition fullness (not that likely to be the issue, but easy to check). Look in volume.log, and scroll down to the df -h section. /dev/md0 is the OS partition.
    
    === df -h ===
    Filesystem      Size  Used Avail Use% Mounted on
    udev             10M  4.0K   10M   1% /dev
    /dev/md0        3.7G  633M  2.9G  18% /
    
    Did you have any apps running in RN214b?
    
    It might be worth asking a mod ( Marc_V or JeraldM ) to review the entire log zip of the problem system.
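The df -h check on the OS partition can also be scripted when comparing several log zips. This is a minimal sketch, assuming volume.log contains the df -h section in the standard column layout (Filesystem, Size, Used, Avail, Use%, Mounted on); the function name `df_usage` and the 80% "too full" threshold are hypothetical. With the example figures quoted in this thread, /dev/md0 is 18% full.

```python
# Minimal sketch: read the Use% of the OS partition from the "df -h" section
# of volume.log. The function name and the 80% threshold are hypothetical.

def df_usage(df_text, filesystem="/dev/md0"):
    """Return Use% (as an int) for `filesystem` in df -h output, or None."""
    for line in df_text.splitlines():
        fields = line.split()
        # Expected columns: Filesystem Size Used Avail Use% Mounted-on
        if len(fields) >= 6 and fields[0] == filesystem and fields[4].endswith("%"):
            return int(fields[4].rstrip("%"))
    return None


if __name__ == "__main__":
    sample = """=== df -h ===
Filesystem      Size  Used Avail Use% Mounted on
udev             10M  4.0K   10M   1% /dev
/dev/md0        3.7G  633M  2.9G  18% /
"""
    use = df_usage(sample)
    print(use)  # → 18
    if use is not None and use > 80:  # hypothetical "too full" threshold
        print("OS partition is nearly full")
```

Returning None when /dev/md0 is missing makes it easy to spot a log that was truncated or taken from a different layout.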