RN628 offline after spontaneous reboot

eph3 · ‎2020-02-18

My ReadyNAS RN628 completed its nightly rsync jobs around 4am. Around 9am, another ReadyNAS that shares its UPS sent an email saying it could no longer see the RN628's UPS (over Ethernet).

When I went to check on the RN628, the front panel displayed

kthread data+7

43%

I pressed the UI button and the display changed to

Booting...

43%

It seems to be hung. Any clues or suggestions?

StephenB · ‎2020-02-19

@eph3 wrote:

It seems to be hung. Any clues or suggestions?

I'd try rebooting it read-only next.

eph3 · ‎2020-02-19

The RN628 would not shut down gracefully, so I rebooted it by holding the front power button down for 5 seconds. Unfortunately, I fumbled holding down the reset button and missed the Boot Menu. (Having the reset button on the rear while reading and operating the front panel has never seemed like the best bit of UI design.)

The unit booted normally and the Admin page says everything is fine. I looked through the logs and found a couple of interesting things in kernel.log.

The unit had also rebooted earlier, on February 17, which I was unaware of.
42 seconds after that reboot, the log says:
Feb 17 20:10:57 Lorraine kernel: RIP: 0010:[<ffffffff8807a023>] [<ffffffff8807a023>] kthread_data+0x7/0xc
Recall that kthread data+7 was the message on the front panel when I went to see what the problem was.
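For anyone decoding the same message: the RIP line names the kernel function that was executing when the crash happened, in the form function+0xOFFSET/0xSIZE. So kthread_data+0x7/0xc means 7 bytes into kthread_data, a function 0xc (12) bytes long. That appears to be what the LCD shows as "kthread data+7" (my assumption is that the display just drops the underscore). Here is a minimal Python sketch that pulls those pieces out of a kernel.log line; parse_rip and RipSymbol are hypothetical helper names, not anything shipped with ReadyNAS OS.

```python
import re
from typing import NamedTuple, Optional


class RipSymbol(NamedTuple):
    function: str  # kernel function that was executing at the crash
    offset: int    # byte offset of the faulting instruction inside it
    size: int      # total length of that function in bytes


# symbol+0xOFFSET/0xSIZE, the format the kernel uses when printing code addresses
_SYMBOL = re.compile(
    r"(?P<func>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_rip(line: str) -> Optional[RipSymbol]:
    """Extract function, offset and size from a kernel 'RIP:' log line (hypothetical helper)."""
    if "RIP:" not in line:
        return None
    m = _SYMBOL.search(line)
    if m is None:
        return None
    return RipSymbol(m["func"], int(m["off"], 16), int(m["size"], 16))


line = ("Feb 17 20:10:57 Lorraine kernel: RIP: 0010:[<ffffffff8807a023>] "
        "[<ffffffff8807a023>] kthread_data+0x7/0xc")
sym = parse_rip(line)
print(sym)                              # RipSymbol(function='kthread_data', offset=7, size=12)
print(f"{sym.function}+{sym.offset}")   # kthread_data+7
```

This only tells you where the CPU was when it died, not why; a crash inside a small helper like kthread_data usually means something else handed it a bad pointer, so the rest of the trace in kernel.log matters more than this one line.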

I've attached the kernel.log.

So RAIDar and the Admin page show everything as green and happy. Any ideas? A few more background details:

I upgraded to OS 6.10.2 back in November
I have an EDA500 attached
The EDA500 has Seagate ST4000DM000 drives in it, which seem to cause high command timeout warnings from time to time. I recall some others reported some oddities with that particular drive and OS6.
The drives in the RN628 are all healthy, with only one drive reporting 2 reallocated sectors.

Any thoughts? It is scheduled to do rsync jobs tonight when it backs up another ReadyNAS. Should I let it do so?

StephenB · ‎2020-02-19

@eph3 wrote:

42 seconds after that reboot the log says
Feb 17 20:10:57 Lorraine kernel: RIP: 0010:[<ffffffff8807a023>] [<ffffffff8807a023>] kthread_data+0x7/0xc
Recall that kthread data+7 was the message on the front panel when I went to see what the problem was.

Yes. When the NAS crashes, it will display some of the crash message on the LCD.

@eph3 wrote:

The EDA500 has Seagate ST4000DM000 drives in it, which seem to cause high command timeout warnings from time to time. I recall some others reported some oddities with that particular drive and OS6.

The drives in the RN628 are all healthy, with only one drive reporting 2 reallocated sectors.

Personally I don't use desktop drives in my NAS - and you are correct in saying that many folks have had issues with DM drives (though I think the ST3000DM000 was the most problematic).

I do suggest running regular disk tests on the internal drives, and periodically downloading the log zip file to check the SMART stats. Netgear's thresholds for alerts are much higher than my own, so personally I don't want the drives to get so bad that the NAS generates an alert.

But without more information we can't really tie the crash to a disk problem.
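A rough sketch of that kind of personal-threshold check, assuming the SMART attribute table is in the `smartctl -A` text layout (the files inside the ReadyNAS log zip may be formatted differently, so treat the parsing as an assumption). MY_LIMITS, parse_smart_attributes and flag_drive are hypothetical names, and the zero limits are illustrative only, not Netgear's or anyone's actual thresholds.

```python
import re
from typing import Dict, List

# Hypothetical personal alert limits on RAW values (much stricter than a NAS default).
MY_LIMITS: Dict[str, int] = {
    "Reallocated_Sector_Ct": 0,
    "Current_Pending_Sector": 0,
    "Offline_Uncorrectable": 0,
}


def parse_smart_attributes(text: str) -> Dict[str, int]:
    """Map attribute name -> raw value from `smartctl -A` style rows."""
    attrs: Dict[str, int] = {}
    for row in text.splitlines():
        fields = row.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # RAW_VALUE is the 10th column.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        m = re.match(r"\d+", fields[9])  # ignore suffixes like "(Min/Max ...)"
        if m:
            attrs[fields[1]] = int(m.group())
    return attrs


def flag_drive(text: str, limits: Dict[str, int] = MY_LIMITS) -> List[str]:
    """Return a message for every watched attribute above its personal limit."""
    attrs = parse_smart_attributes(text)
    return [
        f"{name} = {attrs[name]} (limit {limit})"
        for name, limit in limits.items()
        if name in attrs and attrs[name] > limit
    ]


# Illustrative rows only, not taken from eph3's drives.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       2
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

print(flag_drive(SAMPLE))  # ['Reallocated_Sector_Ct = 2 (limit 0)']
```

A handful of reallocated sectors that stays flat over months is usually less worrying than a count that keeps climbing, so comparing successive log downloads tells you more than any single snapshot.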

@eph3 wrote:

It is scheduled to do rsync jobs tonight when it backs up another ReadyNAS. Should I let it do so?

I would.

eph3 · ‎2020-02-20

The overnight rsync jobs ran without comment in the logs.

For what it's worth, I do use WD Iron Wolf Pro drives in the RN628. I use the EDA500 for some low-probability-of-use backups, which is why I threw the old Seagates in it.

So, any theories? Could this have been a single event upset or is something more serious likely?

StephenB · ‎2020-02-20

@eph3 wrote:

So, any theories? Could this have been a single event upset or is something more serious likely?

I'd wait and see if it (or something similar) happens again.
