
Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time

Tomahna
Aspirant

Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time

Before I was able to replace the disks (for which SMART errors had been reported previously), the next reboot gave me an unrecoverable disk problem (End of Life error). The fs_check log revealed a set of block read errors resulting in short reads, and it suggests running fsck manually. (See the end of the log file below.)

 

.....

Error reading block 242810730 (Attempt to read block from filesystem resulted in short read). Ignore error? yes
Force rewrite? yes
fsck.ext4: Attempt to read block from filesystem resulted in short read while trying to re-open /dev/c/c
/dev/c/c: ********** WARNING: Filesystem still has errors **********
***** File system check performed at Sat Aug 25 19:18:09 CEST 2018 *****
fsck 1.42.12 (29-Aug-2014)
/dev/c/c: recovering journal
Error reading block 242799888 (Attempt to read block from filesystem resulted in short read).
/dev/c/c: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)

 

However, when I run fsck with -v -n to get more information, it does not get past the short read error. (See below.)

 

/root$ fsck.ext4 -v -n /dev/c/c
e2fsck 1.42.12 (29-Aug-2014)
fsck.ext4: Attempt to read block from filesystem resulted in short read while trying to open /dev/c/c
Could this be a zero-length partition?

 

How can I run fsck manually in the correct way, to attempt to identify and possibly resolve the disk errors?

 

Any suggestions are appreciated.

 

Regards,

Mark

Model: ReadyNAS RNDU4000|ReadyNAS Ultra 4 Chassis only
Message 1 of 18
Marc_V
NETGEAR Employee Retired

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time

Hi @Tomahna

 

Happy New Year!

 

We wouldn't recommend taking any further action on the array, as this might cause more damage and make data recovery harder.

 

I would suggest trying to get the data off the disks with data recovery software or a recovery service, and cloning the disks. Contacting Support for assistance is also advisable, though data recovery and support may need to be purchased. You can log in to my.netgear.com and create a case for your product so the experts can assist you and escalate the case.

 

Have you checked which of the 2 disks failed last? This might be needed for re-assembling the RAID.

 

Hope this helps!

 

 

Regards

 

 

Message 2 of 18
Tomahna
Aspirant

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time

Hello Marc,

 

Thank you for your reply. I will check out my.netgear.com

 

PS: Both disks failed at exactly the same time, which I found rather suspicious, but nevertheless the problem remains the same, unless another cause can be identified that would make both disks unusable (power supply or driver) and that can be recovered from in another way.

 

Best regards,

Mark

Message 3 of 18
Marc_V
NETGEAR Employee Retired

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time

Hi @Tomahna

 

If it were the PSU, the unit usually wouldn't boot or wouldn't even turn on. I hope you can contact Support so they can assist and provide the best solution for you.

 

 

Regards

Message 4 of 18
StephenB
Guru

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time


@Tomahna wrote:

 

... gave me an unrecoverable disk problem (End of Life error).

...PS disks failed at exactly the same time, which I found rather suspicious,

"End of Life" isn't a SMART statistic, and it's not an error I've seen before from a NAS. 

 

Are you sure this was the error?

Were these SSD disks?  

 

Perhaps download the log zip file and/or query the disk status with smartctl.

Message 5 of 18
Tomahna
Aspirant

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time

The “End of life error” was displayed on the Ultra4 unit itself. I cannot find it in the log files.

 

They are not SSD disks:

- Disk 1 WD2003FYYS-23W0B0 42D0788 42D0791IBM 1863 GB
- Disk 2 WD2003FYYS-23W0B0 42D0788 42D0791IBM 1863 GB
- Disk 3 Seagate ST3000DM001-1ER166 2794 GB
- Disk 4 Seagate ST3000DM001-1ER166 2794 GB

 

Please find below some interesting snippets from the system.log file

(I have also attached a screenshot of the web log view, positioned at the point where the problem started.)

 

Aug 29 10:49:49 Serenia kernel: All bugs added by David S. Miller davem@redhat.com

Probably some kind of joke...

 

Jan 2 17:36:52 Serenia kernel: ata2.00: cmd 60/00:10:d8:b4:26/01:00:27:00:00/40 tag 2 ncq 131072 in
Jan 2 17:36:52 Serenia kernel: res 41/40:00:e0:b4:26/00:00:27:00:00/40 Emask 0x409 (media error) <F>

 

A whole set of:
Jan 2 17:36:52 Serenia kernel: ata2.00: status: { DRDY ERR }
Jan 2 17:36:52 Serenia kernel: ata2.00: error: { UNC }
Jan 2 17:36:52 Serenia kernel: ata2.00: configured for UDMA/133
Jan 2 17:36:52 Serenia kernel: ata2: EH complete
Jan 2 17:36:52 Serenia kernel: ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Jan 2 17:36:52 Serenia kernel: ata2.00: irq_stat 0x40000008
Jan 2 17:36:52 Serenia kernel: ata2.00: failed command: READ FPDMA QUEUED

 

 

 

Jan 2 17:36:52 Serenia kernel: bio: create slab <bio-1> at 1
Jan 2 17:36:52 Serenia kernel: md/raid1:md0: active with 4 out of 4 mirrors
Jan 2 17:36:52 Serenia kernel: md0: detected capacity change from 0 to 4294955008
Jan 2 17:36:52 Serenia kernel: md0: unknown partition table
Jan 2 17:36:52 Serenia kernel: md: bind<sdb2>

 

Jan 2 17:36:52 Serenia kernel: md: using 128k window, over a total of 1948793216 blocks.
Jan 2 17:36:52 Serenia kernel: md2: unknown partition table
Jan 2 17:36:52 Serenia kernel: md: bind<sdd6>

Unrecovered read error - auto reallocate failed
Jan 2 17:36:52 Serenia kernel: sd 0:0:0:0: [sda] CDB: Read(10): 28 00 27 26 b4 e0 00 00 f8 00
Jan 2 17:36:52 Serenia kernel: end_request: I/O error, dev sda, sector 656848098
Jan 2 17:36:52 Serenia kernel: md/raid:md2: read error not correctable (sector 647410928 on sda5).
Jan 2 17:36:52 Serenia kernel: md/raid:md2: Disk failure on sdb5, disabling device.
Jan 2 17:36:52 Serenia kernel: <1>md/raid:md2: Operation continuing on 3 devices.
Jan 2 17:36:52 Serenia kernel: md/raid:md2: Disk failure on sda5, disabling device.
Jan 2 17:36:52 Serenia kernel: <1>md/raid:md2: Operation continuing on 2 devices.
Jan 2 17:36:52 Serenia kernel: Buffer I/O error on device dm-0, logical block 242778913
Jan 2 17:36:52 Serenia kernel: md/raid:md2: read error not correctable (sector 647410936 on sda5).
Jan 2 17:36:52 Serenia kernel: md/raid:md2: read error not correctable (sector 647410944 on sda5).
Jan 2 17:36:52 Serenia kernel: md/raid:md2: read error not correctable (sector 647410952 on sda5).
Jan 2 17:36:52 Serenia kernel: Buffer I/O error on device dm-0, logical block 242778914
Jan 2 17:36:52 Serenia kernel: md/raid:md2: read error not correctable (sector 647410960 on sda5).
Jan 2 17:36:52 Serenia kernel: Buffer I/O error on device dm-0, logical block 242778915
Jan 2 17:36:52 Serenia kernel: md/raid:md2: read error not correctable (sector 647410968 on sda5).
Jan 2 17:36:52 Serenia kernel: Buffer I/O error on device dm-0, logical block 242778916
Jan 2 17:36:52 Serenia kernel: ata1: EH complete
Jan 2 17:36:52 Serenia kernel: md: md2: resync done.
Jan 2 17:36:52 Serenia kernel: md: checkpointing resync of md2.
Jan 2 17:36:52 Serenia kernel: RAID conf printout:
Jan 2 17:36:52 Serenia kernel: --- level:5 rd:4 wd:2
Jan 2 17:36:52 Serenia kernel: disk 0, o:0, dev:sda5
Jan 2 17:36:52 Serenia kernel: disk 1, o:0, dev:sdb5
Jan 2 17:36:52 Serenia kernel: disk 2, o:1, dev:sdc5
Jan 2 17:36:52 Serenia kernel: disk 3, o:1, dev:sdd5
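For anyone reading along, the RAID conf printout above can be summarized mechanically. This is only a minimal Python sketch, not something the NAS itself runs. It assumes that `o:1` marks a member the kernel still treats as operational and that `wd` is the number of working members, which is how I read the md printout. The `summarize` helper and its field names are made up for illustration.

```python
import re

# Lines copied from the RAID conf printout in the system.log snippet above.
LOG = """\
Jan 2 17:36:52 Serenia kernel: --- level:5 rd:4 wd:2
Jan 2 17:36:52 Serenia kernel: disk 0, o:0, dev:sda5
Jan 2 17:36:52 Serenia kernel: disk 1, o:0, dev:sdb5
Jan 2 17:36:52 Serenia kernel: disk 2, o:1, dev:sdc5
Jan 2 17:36:52 Serenia kernel: disk 3, o:1, dev:sdd5
"""

HEADER = re.compile(r"level:(\d+) rd:(\d+) wd:(\d+)")
MEMBER = re.compile(r"disk (\d+), o:([01]), dev:(\w+)")


def summarize(log):
    """Summarize an md 'RAID conf printout' (hypothetical helper)."""
    header = HEADER.search(log)
    if header is None:
        raise ValueError("no RAID conf header found")
    level, raid_disks, working = map(int, header.groups())
    # Assumption: o:0 means the member is no longer considered operational.
    failed = sorted(dev for _slot, flag, dev in MEMBER.findall(log) if flag == "0")
    return {
        "level": level,
        "raid_disks": raid_disks,
        "working": working,
        "failed": failed,
        # RAID 5 tolerates one missing member; only meaningful for level 5.
        "raid5_can_run": level == 5 and working >= raid_disks - 1,
    }


print(summarize(LOG))
```

With two of four RAID 5 members flagged as failed, the summary reports that the array does not have enough members to run, which matches the "Operation continuing on 2 devices" line in the log.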

 

Message 6 of 18
StephenB
Guru

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time

The logical next step is to power down the NAS and test disks 1 and 2 in a Windows PC using WDC's lifeguard software.  You can connect the disks using either SATA or a USB adapter/dock.

 

You could also try powering down, removing disk 1, and then try to boot without disk 1.  I'm not seeing the details of the disk 2 failure in the log snippet, but there is plenty of evidence that disk 1 is struggling.  So it's conceivable that the system would boot w/o disk 1.  If it does, the next step is to back up the data.  Do that before you do anything else.
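One piece of that evidence can be cross-checked by hand: the `CDB: Read(10)` line for sda in the log snippet encodes the starting sector and length of the failed read. The following is a minimal Python sketch of that decoding, assuming the standard SCSI Read(10) layout (opcode 0x28, a 4-byte big-endian LBA in bytes 2 to 5, and a 2-byte transfer length in bytes 7 to 8). The function name `decode_read10` is just for illustration.

```python
def decode_read10(cdb_hex):
    """Return (start_lba, sector_count) from a Read(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a Read(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    length = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: transfer length
    return lba, length


# CDB and failing sector copied from the sda lines in the log snippet.
lba, length = decode_read10("28 00 27 26 b4 e0 00 00 f8 00")
failed_sector = 656848098
print(lba, length, lba <= failed_sector < lba + length)
# prints: 656848096 248 True
```

So the sector reported in the `end_request: I/O error, dev sda` line falls inside the range that this read command requested, which is consistent with a media error on that disk rather than an unrelated fault.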

Message 7 of 18
Tomahna
Aspirant

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time

Hello Stephen,

 

Thank you for the suggestions. I’ve tried to boot without disk 1. Initially it looked promising because, on the unit, disk lights 2, 3 and 4 came on, instead of lights 1 and 2 immediately starting to blink. However, shortly afterwards disk light 2 started blinking. Perhaps this could be resolved by rebooting the unit with the option to perform a file system check. However, I’m not sure if this is a wise action to take (with regard to data loss). I will have a look at the system.log to determine if I can find any new information.

 

Also, I’ve connected disk 1 using my USB docking station and ran the WD LifeGuard diagnostics. The extensive test ran for a long time and now reports that it found bad sectors that may be repairable. I have the option to select repair, but I don’t know if I should do this.

 

All suggestions are welcome.

 

Best Regards,

Mark 

Message 8 of 18
StephenB
Guru

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time

I don't recommend doing a file system check on the NAS, as any repair attempt could do more damage.  Though the NAS might already be doing that.  Have you tried looking at the status with RAIDar?  

 

I wouldn't try to repair the bad sectors either.

 

If there are failures on disk 2 also, then you probably will need data recovery.  If you are willing to pay for that (either a service or software), then it is best to not attempt any more repairs - it often makes the recovery more difficult.

Message 9 of 18
Tomahna
Aspirant

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time

I’ve checked with RAIDar. What is strange is that RAIDar reports all 4 disks with status green (see attached screen), even though disk 1 is not even present at the moment and disk 2 is currently unusable. The lights on the unit report it correctly though, as the light for disk 1 is off and the light for disk 2 is flashing.

 

I will look into the system.log and run the extensive check on disk 2. I’ll report back when I have more info.

Message 10 of 18
StephenB
Guru

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time


@Tomahna wrote:

What is strange is that RAIDar reports all 4 disks with status green. (See attached screen)

I think you forgot to attach it.
Message 11 of 18
Tomahna
Aspirant

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time

Something must have gone wrong with adding the attachment. Let me try again

Message 12 of 18
StephenB
Guru

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time


@Tomahna wrote:

Something must have gone wrong with adding the attachment. Let me try again


Well, clearly RAIDar is confused since disk 1 isn't installed.  Did you try logging into Frontview?

Message 13 of 18
Tomahna
Aspirant

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time

Yes, I looked at Frontview (see attachment for the status screen). The strange thing is that it also reports disk 1 as present, while the unit knows it is not (the light for it is off). I will also attach the system log file. It shows that it tries to configure the RAID for 3 disks instead of 4, so I find it confusing that the status is not reported consistently. Could there perhaps be some data corruption in the setup or in the local data of the unit itself (i.e. not stored on the RAID disks)?

Message 14 of 18
Tomahna
Aspirant

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time

And the system.log for today’s boot.

Message 15 of 18
StephenB
Guru

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time

The disk numbering starts from 0, so the boot process is detecting that the first disk isn't present.  And you are getting read failures on the second disk, so the array can't be mounted.

 

Your Frontview screenshot shows the first two disks as not working (disk 1 of course is removed) - the SMART stats button for both is grayed out.

 

Message 16 of 18
Tomahna
Aspirant

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time

Clear, thank you. I suspected as much when I looked at the system.log file.

 

What remains confusing is that Frontview displays the disk specifications for disk 1 (only without temperature) while it is not present, but that is not a problem for me. When I check the disk_smart log, it reported earlier that all 4 disks passed the tests. That is probably why RAIDar is reporting 4 available disks (green lights).

 

I will continue doing some more tests this week and determine my options. I’ll report back when I’ve reached that point. Again thank you!

Message 17 of 18
Tomahna
Aspirant

Re: Readynas Ultra 4 RAID 5 - 2 of 4 disks fail at the same time

It has been a while since I had time to continue working on this problem, but this week I made some progress. Unfortunately I ran into a new (seemingly unrelated) problem. I recovered the first disk by cloning it. With the replacement disk installed, the unit came back online! Note that I have not cloned the second disk, since it has so many bad sectors; I intend to use a new disk for slot 2 and then let it sync.

 

Unfortunately, for no apparent reason, one of the healthy disks has now been marked as unusable?! As a result only some of my shares have come back online. Others most likely depend on the presence of the other disk. The system log reports that disk 3 is non-fresh:

 

Apr 24 11:24:17 Serenia kernel: RAID conf printout:
Apr 24 11:24:17 Serenia kernel: --- level:5 rd:4 wd:3
Apr 24 11:24:17 Serenia kernel: disk 0, o:1, dev:sda5
Apr 24 11:24:17 Serenia kernel: disk 2, o:1, dev:sdb5
Apr 24 11:24:17 Serenia kernel: disk 3, o:1, dev:sdc5
Apr 24 11:24:17 Serenia kernel: md2: detected capacity change from 0 to 5986692759552
Apr 24 11:24:17 Serenia kernel: md2: unknown partition table
Apr 24 11:24:17 Serenia kernel: md: bind<sdc6>
Apr 24 11:24:17 Serenia kernel: md: bind<sdb6>
Apr 24 11:24:17 Serenia kernel: md: kicking non-fresh sdc6 from array!
Apr 24 11:24:17 Serenia kernel: md: unbind<sdc6>
Apr 24 11:24:17 Serenia kernel: md: export_rdev(sdc6)

 

Perhaps a shutdown was not completed properly during the attempts to recover the array (although I'm not aware of one), because I found the following information on a Linux forum:

 

This can happen after an unclean shutdown (like a power fail). Usually removing and re-adding the problem devices will correct the situation:
/sbin/mdadm /dev/md0 --fail /dev/sda5 --remove /dev/sda5
/sbin/mdadm /dev/md0 --add /dev/sda5

 

My question now is: what is the best way to re-activate disk 4? Should I attempt the above commands for sdc6, or should it be sdc5? Or should I go about it differently?

 

PS I'm confused about the reference to sdc6, because I only see sdc5 listed as a mounted disk in the system log...
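To make the sdc5 versus sdc6 question easier to see, here is a minimal Python sketch that groups the partitions named in the log excerpt by partition number and lists the ones md kicked. It is only a log-reading aid under stated assumptions: the printout shows the partition-5 members under md2, while the snippet does not say which md device the partition-6 members belong to, so treating them as a separate array is my assumption. The helper names `partitions_by_number` and `kicked` are hypothetical.

```python
import re
from collections import defaultdict

# Lines copied from the Apr 24 system.log excerpt above.
LOG = """\
Apr 24 11:24:17 Serenia kernel: disk 0, o:1, dev:sda5
Apr 24 11:24:17 Serenia kernel: disk 2, o:1, dev:sdb5
Apr 24 11:24:17 Serenia kernel: disk 3, o:1, dev:sdc5
Apr 24 11:24:17 Serenia kernel: md: bind<sdc6>
Apr 24 11:24:17 Serenia kernel: md: bind<sdb6>
Apr 24 11:24:17 Serenia kernel: md: kicking non-fresh sdc6 from array!
"""


def partitions_by_number(log):
    """Group every sdXN name in the log by its partition number N."""
    groups = defaultdict(set)
    for disk, number in re.findall(r"\b(sd[a-z])(\d+)\b", log):
        groups[int(number)].add(disk + number)
    return {number: sorted(parts) for number, parts in groups.items()}


def kicked(log):
    """Return the partitions md reported as non-fresh."""
    return re.findall(r"kicking non-fresh (\w+) from array", log)


print(partitions_by_number(LOG))
print(kicked(LOG))
```

In this excerpt all three partition-5 members are shown with `o:1`, and the only partition reported as non-fresh is sdc6. Note also that the commands quoted from the Linux forum name /dev/md0 and /dev/sda5, which match neither the kicked partition nor md2, so they would need adapting to the right array and partition before use; this sketch does not decide which those are.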

Message 18 of 18