
Re: ReadyTIER log message...problem?

sstillwell
Tutor

ReadyTIER log message...problem?

I've discussed elsewhere the setup of ReadyTIER on my ReadyNAS Pro unit, but I'm seeing some odd behavior in the logs and not sure what it implies...hoping someone has seen it before.

 

I've got a volume schedule set up twice a week to run a data migration from the SSD tier down to spinning storage.  That job runs as scheduled, and if the SSDs fill up to 90% before a scheduled run, it just does it on its own.  All good so far.

 

When a migration starts, I see the following in the log:

 

<timestamp> Volume: Data tier migration started for volume volume1.

 

Anywhere from an hour and a half to two and a half hours later, I'll see another message that is not so friendly....

 

<timestamp> Volume: Data tier migration failed to start for volume volume1.

 

Failed to START???  If I look at the status of the volume, I can see that the used SSD storage is indeed back down to just a bit over the amount of space that metadata takes, and I haven't seen corruption yet, so I THINK it's working...but what's up with that message?

 

Hoping that someone knows the details behind this.  Thanks in advance!

Model: RNDP6610|ReadyNAS Pro 6 6TB (6 x 1TB Enterprise)
Message 1 of 11
sstillwell
Tutor

Re: ReadyTIER log message...problem?

Anyone?  Is someone from Netgear around that might know?

Message 2 of 11
yxue
NETGEAR Expert

Re: ReadyTIER log message...problem?

Is a migration already running, or is your disk group almost full? Either one could cause another migration job to fail to start.
Message 3 of 11
sstillwell
Tutor

Re: ReadyTIER log message...problem?

I don't think that the group is filling up before the migration finishes...but that sure is a possibility.  I can try lowering the percentage from 90% down to 75% or even 50%.  The SSD tier is a pair of 512GB Samsung 860 PRO.  I've set it to twice a week or when 70% is reached...we'll see if that improves matters.

Message 4 of 11
sstillwell
Tutor

Re: ReadyTIER log message...problem?

Okay, I've tried a few different things.  I have lowered the "full" threshold to 70%, and I have scheduled migrations to occur daily at 5:00 AM instead of once or twice weekly.  When I last checked the volume before bed (at 2:00 AM), the volume highest tier was 48 GB full, ~450 GB available after a full day's activity.  At 5:00 AM a migration began, and at 5:20 AM another message was logged saying it failed to start migration.  This has been happening every time it migrates.  That seems VERY unlikely to have been a second migration needing to start due to tier reaching capacity.  I think we need to look further for a cause.
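For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch that pairs each "migration started" line with the next "failed to start" line for the same volume and reports the gap between them. The message text is copied from the log lines quoted earlier in this thread. The timestamp format (`%Y-%m-%d %H:%M:%S`) and the sample lines are assumptions for illustration only, since the real timestamps are elided above; adjust `ts_format` to match what your unit actually writes.

```python
import re
from datetime import datetime

# Message text as quoted in this thread; everything before " Volume:" is
# treated as the timestamp.
STARTED = re.compile(
    r"^(?P<ts>.+?) Volume: Data tier migration started for volume (?P<vol>\S+?)\.$"
)
FAILED = re.compile(
    r"^(?P<ts>.+?) Volume: Data tier migration failed to start for volume (?P<vol>\S+?)\.$"
)


def pair_migrations(lines, ts_format="%Y-%m-%d %H:%M:%S"):
    """Pair each 'started' line with the next 'failed to start' line
    for the same volume.

    ts_format is an assumed timestamp layout, not the documented one.
    Returns a list of (volume, started_at, failed_at, elapsed) tuples.
    """
    pending = {}  # volume name -> time of the most recent 'started' line
    pairs = []
    for line in lines:
        line = line.strip()
        m = STARTED.match(line)
        if m:
            pending[m["vol"]] = datetime.strptime(m["ts"], ts_format)
            continue
        m = FAILED.match(line)
        if m and m["vol"] in pending:
            started = pending.pop(m["vol"])
            failed = datetime.strptime(m["ts"], ts_format)
            pairs.append((m["vol"], started, failed, failed - started))
    return pairs


# Hypothetical sample mirroring the 5:00 AM / 5:20 AM case described above.
sample = [
    "2019-03-01 05:00:00 Volume: Data tier migration started for volume volume1.",
    "2019-03-01 05:20:00 Volume: Data tier migration failed to start for volume volume1.",
]
for vol, started, failed, elapsed in pair_migrations(sample):
    print(vol, started, failed, elapsed)
```

If every "started" line turns out to have a matching "failed to start" line a short time later, that suggests the second message is tied to the same job rather than to a separate threshold-triggered migration. That is only an inference from the log pattern, not a confirmed explanation.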

Message 5 of 11
sstillwell
Tutor

Re: ReadyTIER log message...problem?

Okay, now we know there's a problem...my NAS was unresponsive this morning, taking down all my VMs.  When I rebooted the NAS, it came back up with volume in read-only mode.  I rebooted one more time and the volume is dead.  This is incredibly bad.  I do have quite a lot of this backed up via Unitrends Virtual Appliance to storage on the NAS itself, which is now apparently gone, but it's also backed up as cold copies to Amazon S3.  I need to figure out if there is any chance to recover this.

 

I'm wishing I could have had more response about this error message before this...I'm feeling a bit let down.  It's my responsibility to make backups, and I have, but it's going to take significant amounts of time to recover from this and it's going to cost me money in lost opportunities.  If I rebuild this I'm thinking that Ready Tier isn't ready at all and I'm not going to use it further.

Message 6 of 11
sstillwell
Tutor

Re: ReadyTIER log message...problem?

I've had more luck than I deserved...I managed to get the volume to mount after using a 'btrfs rescue zero-log' (checked the system logs for errors during the mount and it was failing during log recovery).  I'm sure there's some form of data loss, but the systems weren't under heavy use during the middle of the night, so I'm hoping it's minimal.

 

In the meantime I have turned ReadyTIER from data tiering to metadata tiering only.  If I can find a definitive answer as to whether it's safe to remove the SSD tier without destroying the volume, I'm going to get rid of the tiering entirely...it just doesn't seem ready for prime time.

 

I also think it's time to get a new NAS...I'll relegate this unit to other duties.  Haven't decided whether I'll get another Netgear or look at other brands that shall remain nameless.

Message 7 of 11
Icewaterhot
NETGEAR Employee

Re: ReadyTIER log message...problem?

Hello sstillwell,

 

Could you please enable SDM and PM the code? We will take a look at this issue.
https://kb.netgear.com/000053266/ReadyNAS-OS-6-Enabling-Secure-Diagnostics-Mode

Message 8 of 11
sstillwell
Tutor

Re: ReadyTIER log message...problem?

I can do that if you want, but you should know that I've already destroyed and re-created the volume with just the 4 x 8 TB WD Red drives, let it resync, then ran a full disk check against it before starting to cautiously use it again.  If any logs are stored on nonvolatile storage rather than the disks, I'd be happy to let you see it, but I'm afraid there's not much there to see now.

Message 9 of 11
sstillwell
Tutor

Re: ReadyTIER log message...problem?

Ah, I guess at least the journalctl log goes back before the crash, so you may be able to get what you need.  I've sent you the port information via PM.

Message 10 of 11
sstillwell
Tutor

Re: ReadyTIER log message...problem?

...and after recreating a brand-new volume with only the spinning disks in RAID5, allowing it to fully sync and running successful disk checks on all disks...

 

It went read-only and then completely down again.  This time I may not be able to get it back.  Mounting it says there's a bad superblock, but btrfs rescue super-recover says "All supers are valid, no need to recover".  Mounting with -o recovery,nospace_cache,clear_cache says "wrong fs type, bad option, bad superblock on /dev/md127".  btrfs rescue zero-log says "ERROR: incorrect offsets 24187 2658730369", and btrfs check /dev/md127 says the same thing.

 

Whatever data I manage to recover, this unit is getting powered down and is going on a shelf or into the recycle bin.  I can't afford the time this is costing me, not to mention the data I'm probably going to lose...terabytes of archival material that cannot be replaced.  I'm angry enough that I want to use language that would not be approved here.  Note:  these drives are very new, and check out 100% okay...according to your own diagnostics.  Maybe I shouldn't trust those either, eh?

 

If anyone from Netgear wants to log in and look at it, I will turn on remote diagnostics while I still have it powered up, but note that this won't be for long...only until the point that I give up trying to get data back.  Once I'm done with that, it's all over.

Message 11 of 11