tominwi
Luminary
May 14, 2019

ReadyNAS NV+ after HDD upgrade

I decided to upgrade my NV+ V1 (4.1.16, X-RAID) from four 1TB drives to four 2TB drives yesterday morning. I followed the Netgear recommendation to pull the 1st drive hot and replace it, and AFAICT this went swimmingly: I got all the right emails, starting with "Disk fail event", "Disk add event", and "Disk initialization started . The estimated time of completions is 17 hour(s).." culminating last evening with "Disk initialization successfully finished" and "Sync started on C", and some 4 hours later "RAID sync finished on volume C. The volume is now fully redundant". This happened around 2 a.m. this morning.

 

So I approached the NV+ this morning with the intent to pull Disk 2 and repeat the process, but found my NAS in a kernel panic. I had to pull the power cord, as the power switch was unresponsive. It took half an hour, but the box booted after doing an FS and Quota check. The Frontview log timeline shows only the "RAID sync good--full redundancy" msg from 2am, my power pull "Improper shutdown...", and a new "RAID sync started" before a final "System is up" message half an hour later. The new 2TB drive shows up as Drive 1 and things seem OK to continue with replacement of Drive 2 EXCEPT...

 

The NV+ Act(ivity) LED is flashing rapidly!? What does anyone think my NAS is doing? Should I just wait it out to stop before continuing with Drive 2 replacement/upgrade?

26 Replies

  • Does the log say that the RAID resync has completed?  You can also see this status by hovering over the disk and volume icons in Frontview and RAIDar.

     

    Don't replace the second disk until the resync is done.

    • tominwi
      Luminary

      Hmm dunno where to "hover over icons" but found under Volumes > RAID Configuration this Status "Resync 15% complete, Time to finish 4 hr 51 min, Speed 45.1 MB/sec" so I guess it is indeed doing something.
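A quick sanity check on that status line: multiplying the reported speed by the reported time to finish gives the amount of data the NAS still expects to process. This is only a back-of-the-envelope sketch that assumes the speed stays constant for the rest of the resync; the function name is invented for the example.

```python
def implied_remaining_mb(hours, minutes, speed_mb_per_s):
    """Data left to process, assuming the reported speed holds steady."""
    seconds = hours * 3600 + minutes * 60
    return seconds * speed_mb_per_s


# Values from the status line: 4 hr 51 min at 45.1 MB/sec.
remaining = implied_remaining_mb(4, 51, 45.1)
print(f"about {remaining / 1000:.0f} GB still to go")
```

That comes to somewhat under 800 GB remaining, which looks plausible for a resync that is 15% of the way through a region roughly the size of a 1TB drive.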

       

      StephenB the Netgear kb I've read suggested to me that I could/should replace all disks and then restart. Was this wrong advice? Should I instead, after replacing each disk, do a restart and (apparently) Resync as it appears to be doing now?

       

      Also, any thoughts on why my kernel panicked? ;-)

      • StephenB
        Guru

        tominwi wrote:

        StephenB the Netgear kb I've read suggested to me that I could/should replace all disks and then restart.


        If you install all disks at once, then you are starting over.  You'd need to reconfigure the NAS (recreate user accounts, shares, install any add-ons, ...) and then restore all the files from a backup.  This can be faster than doing four resyncs.

         

        The path you are on will preserve your data.  You replace one disk at a time, wait for the resync, and move on to the rest.  After the fourth disk is resynced, the volume should expand (perhaps requiring a NAS reboot to trigger it).
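As a rough illustration of why the extra space only shows up after the fourth disk: assuming a single-parity layout in which each disk contributes only as much as the smallest disk in the set (an assumption about how this X-RAID volume behaves, not something stated in this thread), usable capacity is (n - 1) × the smallest disk. The function name below is made up for the example.

```python
def usable_capacity_tb(disk_sizes_tb):
    """Usable space for a single-parity array limited by its smallest disk.

    Assumption: every disk contributes only min(disk_sizes_tb), and one
    disk's worth of space goes to parity.
    """
    if len(disk_sizes_tb) < 2:
        raise ValueError("need at least two disks for parity")
    return (len(disk_sizes_tb) - 1) * min(disk_sizes_tb)


# Walk through the one-at-a-time upgrade from 4x1TB to 4x2TB.
disks = [1, 1, 1, 1]
for bay in range(4):
    disks[bay] = 2  # swap in a 2TB disk, then wait for the resync
    print(f"after replacing disk {bay + 1}: {usable_capacity_tb(disks)} TB usable")
```

Under that assumption the usable figure stays at 3 TB through the first three swaps and only moves to 6 TB once the last 1TB disk is gone.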

         


        tominwi wrote:

        Hmm dunno where to "hover over icons" but found under Volumes > RAID Configuration this Status "Resync 15% complete, Time to finish 4 hr 51 min, Speed 45.1 MB/sec" so I guess it is indeed doing something.

         


        Indeed.

         

        If you look on the bottom of the Frontview page you will see a round icon labeled volume, and another block of icons labeled "disks".  You can hover your mouse over those icons, and it should give you the status.   

         


        tominwi wrote:

        Also, any thoughts on why my kernel panicked? ;-)


        No. You should download the log zip file, there might be some clues in there. 

         

        I suggest looking at the disk health before you proceed (clicking on the SMART+ control for each disk on the health page).  You can't use Chrome for this, as it will show you a blank pop-up.  IE or Firefox will work ok.

         

        It would be wise to make sure you have an up-to-date backup of your files before resyncing the next disk.

  • My situation gets worse: I've reverted to the original 4x1TB drives, all of which have worked fine for years and which have good Health/SMART status afaict, and today I got another kernel panic, after it had been working just fine for a couple days (though only a few hours at a time; today is Backups day, so I leave it on all day and make an rsync backup to USB, which btw worked fine too, but it panicked mid-afternoon after being on for 8 or 10 hours). After pulling the plug on the NV+ and re-powering, I ran a Memory Test, which came out OK (I have the stock RAM), and upon booting it did the expected FS and Quota checks, but now the box is doing a Resync, as if I'd pulled or replaced a drive or ??? Says it's gonna take 12 hours to do this. What to do next? Shut it down and pull each drive one-by-one and test them? Or maybe I should try instead an OS Reinstall in case something got hosed in the config files, maybe when I installed those 2TB drives? I do have a spare platform I could try, but I'd have to update its firmware first, and honestly this doesn't feel like a Sparc or mobo or power supply problem to me. Here is the dmesg.log which shows a bad block in the NAND? What's that, the onboard firmware? I checked back on ancient (working system) logs and it has always been there, so I doubt that could be the problem now.
    • tominwi
      Luminary

      Linux version 2.6.17.14ReadyNAS (root@calzone) (gcc version 3.3.5 (Infrant 3.3.5-1)) #1 Wed Jun 20 20:08:20 PDT 2012
      You system is PZERO.
      ASIC=IT3107
      On node 0 totalpages: 16384
      Normal zone: 15360 pages, LIFO batch:3
      DMA zone: 1024 pages, LIFO batch:0
      zlist 0 802f115c
      zone 802f0f14, name Normal
      zlist 1 802f1170
      zone 802f0ccc, name DMA
      zlist 2 802f1184
      zone 802f0f14, name Normal
      Built 1 zonelists
      Kernel command line: root=/dev/ram0 init=/linuxrc rw raid=noautodetect profile=2
      kernel profiling enabled (shift: 2)
      PID hash table entries: 2048 (order: 11, 8192 bytes)
      Dentry cache hash table entries: 32768 (order: 3, 131072 bytes)
      Inode-cache hash table entries: 16384 (order: 2, 65536 bytes)
      Memory: 226176k/262144k available (2592k kernel code, 35888k reserved, 656k data, 96k init, 0k highmem)
      init_mm.pgd 8f0ff000
      Calibrating delay loop... 186.36 BogoMIPS (lpj=931840)
      Mount-cache hash table entries: 2048
      checking if image is initramfs...it isn't (no cpio magic); looks like an initrd
      Freeing initrd memory: 16384k freed
      NET: Registered protocol family 16
      usbcore: registered new driver usbfs
      usbcore: registered new driver hub
      NET: Registered protocol family 2
      IP route cache hash table entries: 2048 (order: -1, 8192 bytes)
      TCP established hash table entries: 8192 (order: 1, 32768 bytes)
      TCP bind hash table entries: 4096 (order: 0, 16384 bytes)
      TCP: Hash tables configured (established 8192 bind 4096)
      TCP reno registered
      audit: initializing netlink socket (disabled)
      audit(1070280003.130:1): initialized
      VFS: Disk quotas dquot_6.5.1
      Dquot-cache hash table entries: 4096 (order 0, 16384 bytes)
      Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
      Initializing Cryptographic API
      io scheduler noop registered
      io scheduler anticipatory registered
      io scheduler deadline registered
      io scheduler cfq registered (default)
      RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
      loop: loaded (max 8 devices)
      nbd: registered device at major 43
      tun: Universal TUN/TAP device driver, 1.6
      tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
      md: raid0 personality registered for level 0
      md: raid1 personality registered for level 1
      md: raid5 personality registered for level 5
      md: raid4 personality registered for level 4
      xor engine => SPARC.
      device-mapper: 4.6.0-ioctl (2006-02-17) initialised: dm-devel@redhat.com
      Serial: Padre driver $Revision: 1.1.1.1 $ 2 ports
      ttyS0 at I/O 0x0 (irq = 7) is a padre uart
      ttyS1 at I/O 0x0 (irq = 8) is a padre uart
      oprofile: using timer interrupt.
      TCP bic registered
      NET: Registered protocol family 1
      NET: Registered protocol family 17
      NET: Registered protocol family 5
      802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
      All bugs added by David S. Miller <davem@redhat.com>
      md: Skipping autodetection of RAID arrays. (raid=noautodetect)
      RAMDISK: Compressed image found at block 0
      VFS: Mounted root (ext2 filesystem).
      Freeing unused kernel memory: 96k freed
      padre_i2c: module license 'Infrant Technologies, Inc.' taints kernel.
      padre_i2c: no version for "udiv" found: kernel tainted.
      TWSI Initialize
      Padre NSPIO setup: 80353394... No argv, go to default.
      raid5: xor select to PADRE_RXA.
      Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
      ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
      padre chip scan,token=1
      Scan the padre NSP IO hardware.
      Need memory for RTEngine 63680
      PIO mode on chan 7
      DMA mode on chan 0
      DMA mode on chan 1
      DMA mode on chan 2
      DMA mode on chan 3
      Padre IDE controller, sata start:1
      No TLER on ST31000528AS €
      hdc: ST31000528AS (s/n:9VP56NT9), ATA DISK drive (ATAEXT)
      No TLER on ST31000528AS €
      hde: ST31000528AS (s/n:9VP52XV1), ATA DISK drive (ATAEXT)
      No TLER on ST31000528AS €
      hdg: ST31000528AS (s/n:9VP4YDZ0), ATA DISK drive (ATAEXT)
      No TLER on ST31000528AS €
      hdi: ST31000528AS (s/n:9VP55BNP), ATA DISK drive (ATAEXT)
      ide1 at 0x200-0x207,0x208 on irq 32
      ide2 at 0x280-0x287,0x288 on irq 33
      ide3 at 0x300-0x307,0x308 on irq 34
      ide4 at 0x380-0x387,0x388 on irq 35
      Update NSPIO settings 80353394.
      hdc: max request size: 512KiB
      hdc: use capacity 1953525168 sectors (1000204 MB)
      Drive support hpa, still should not change max addr.
      hdc: 1953108616 sectors (999991 MB), CHS=65535/255/63
      hdc: cache flushes supported
      hdc: hdc1 hdc2 hdc3 < hdc5 >
      hde: max request size: 512KiB
      hde: use capacity 1953525168 sectors (1000204 MB)
      Drive support hpa, still should not change max addr.
      hde: 1953108616 sectors (999991 MB), CHS=65535/255/63
      hde: cache flushes supported
      hde: hde1 hde2 hde3 < hde5 >
      hdg: max request size: 512KiB
      hdg: use capacity 1953525168 sectors (1000204 MB)
      Drive support hpa, still should not change max addr.
      hdg: 1953108616 sectors (999991 MB), CHS=65535/255/63
      hdg: cache flushes supported
      hdg:chn=2, statu/LP_S=0x(d0/d050)29, 8
      hdg1 hdg2 hdg3 < hdg5 >
      hdi: max request size: 512KiB
      hdi: use capacity 1953525168 sectors (1000204 MB)
      Drive support hpa, still should not change max addr.
      hdi: 1953108616 sectors (999991 MB), CHS=65535/255/63
      hdi: cache flushes supported
      hdi: unknown partition table
      Link to padre IO.

      RAID disks check:
      ALL = 22/33/34/56/0/0/0/0, 4
      IDE = 22/33/34/0/0/0/0/0, 3
      MD = 0/0/0/0/0/0/0/0, 0
      RAID rule check result: 0
      md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
      md: bitmap version 4.39
      Disk protected mark = 1
      x_raid_start: 1,current:0
      BDL_count= 0, fw=SN04, model=
      BDL_count= 1, fw=SN04, model=
      GOT MEMORY FOR DJ: 128*4k
      Drive hdc SB at 1953511632(-sbs) CURRENT
      Drive hde SB at 1953511632(-sbs) CURRENT
      Drive hdg SB at 1953511632(-sbs) CURRENT
      Drive hdi SB at 1953511632(-sbs) CURRENT
      x_raid_start: 1,result:0
      Find PHY: 0
      Lookup PHY ID: 0x000f, 0x01
      P0 GPIO initialization... BOARDID=1
      Boot type/reason: normal/2/0
      Adjusting fan pwm........................
      Found LM75 at 0x48
      NAND device: Manufacture ID: 0xad, Chip ID: 0x76 (Hynix NAND 64MiB 3,3V 8-bit)
      size of table 4096
      table is there 0x8
      bad block 1573 replacing by 4095
      total bad block 1
      bad 1573 = 4095 bad 4095 = -1 Total bad block number 1
      retlen = 0x0200
      VPD checksum = 0x9406
      ECC is ON
      Creating 2 MTD partitions on "NAND 64MiB 3,3V 8-bit":
      0x00000000-0x00100000 : "P0 flash partition 1"
      0x00100000-0x03ffc000 : "P0 flash partition 2"
      NEON flash: probing 8-bit flash bus
      CFI: Found no NEON flash device at location zero
      NEON flash: unknown flash device, mfr id 0x1, dev id 0x0
      NEON flash: Found no Atmel device at location zero
      This board is not supported.
      You can use parm_extport=X module parm.
      ID=6013 on i2c_addr=1f
      GPIO2X=3c
      lcd:driver loaded
      X_RAID_START
      startstop XRAID command = start, flash_cache=0
      X_RAID clean shutdown indicator: 0xf.
      0 4 4 4 4 0 0 0
      0 1 1 1
      1 0 1 1
      1 1 0 1
      1 1 1 0
      Update time for sb 1 = 5cdd603a.
      Update time for sb 2 = 5cdd603a.
      Update time for sb 3 = 5cdd603a.
      Update time for sb 4 = 5cdd603a.
      recent_ID = 1, select_ID=1, most_ID=4 right_mac=4
      Selected sb 1, ctime=5cdd603a, id=a20276e8.
      Use this image: 1

      VERSION/ID : SB=(V:0.1.0) ID=<a20276e8.00000000.00000000.00000000> CT:5cdd603a
      RAID_INFO : DISKS(TOTAL:4 RAID:4 PARITY:3 ONL:4 WRK:4 FAILED:0 SPARE:0 BASE:0)
      SZ:1953108616 UT:00000000 STATE:0 LUNS:2 EXTCMD:1 LSZ:1953108614
      LOGICAL_DRIVE : 0: B:0000000002 E:0004096000 R:1 O:1 I:1:000000000 DM:f
      LOGICAL_DRIVE : 1: B:0004096002 E:1949012614 R:4 O:1 I:1:000000000 DM:f
      PHYSICAL_DRIVE: 0: DISK<N:0/1,hdc(22,0),ID:0,PT:1,SZ:1953108616,ST: B:online>
      PHYSICAL_DRIVE: 1: DISK<N:1/2,hde(33,0),ID:1,PT:1,SZ:1953108616,ST: :online>
      PHYSICAL_DRIVE: 2: DISK<N:2/3,hdg(34,0),ID:2,PT:1,SZ:1953108616,ST: :online>
      PHYSICAL_DRIVE: 3: DISK<N:3/4,hdi(56,0),ID:3,PT:1,SZ:1953108616,ST:P :online>
      CURRENT_DRIVE : DISK<N:0/1,XXX(22,0),ID:0,PT:1,SZ:1953108616,ST: B:online>
      Need to do drives searching.
      Find p d at 3, chn 3
      Total=4; raid=4; ready=0; work=4; failed=0
      Check degraded mode, start_pos=1
      No drive missing, X_RAID run in opt mode.
      Change X_RAID running mode from 0 to 1
      Update backup SB.
      X_RAID: recovery thread got woken up ...
      New = 3, source drives = f, current/active=4/4
      hdc: hdc1 hdc2 hdc3 < hdc5 >
      hde: hde1 hde2 hde3 < hde5 >
      hdg: hdg1 hdg2 hdg3 < hdg5 >
      hdi: unknown partition table
      chn=1, statu/LP_S=0x(d0/d050)29, 32
      kjournald starting. Commit interval 5 seconds
      EXT3 FS on hdc1, internal journal
      EXT3-fs: recovery complete.
      EXT3-fs: mounted filesystem with ordered data mode.
      linked, 1000mbps mode
      kjournald starting. Commit interval 5 seconds
      EXT3 FS on hdc1, internal journal
      EXT3-fs: mounted filesystem with journal data mode.
      Adding 255968k swap on /dev/hdc2. Priority:0 extents:1 across:255968k
      Adding 255968k swap on /dev/hde2. Priority:0 extents:1 across:255968k
      Adding 255968k swap on /dev/hdg2. Priority:0 extents:1 across:255968k
      enable_irq(11) unbalanced from f804059c
      hdc: cache flushes supported
      hde: cache flushes supported
      hdg: cache flushes supported
      hdi: cache flushes supported
      chn=0, statu/LP_S=0x(d0/d050)29, 32
      hdi:chn=3, statu/LP_S=0x(d0/d050)29, 8
      unknown partition table
      chn=2, statu/LP_S=0x(d0/d050)29, 4
      chn=0, statu/LP_S=0x(d0/d050)29, 32
      kjournald starting. Commit interval 5 seconds
      EXT3 FS on dm-0, internal journal
      EXT3-fs: mounted filesystem with journal data mode.
      X_RAID_SYNC_FORCE
      X_RAID: recovery thread got woken up ...
      New = 3, source drives = f, current/active=4/4
      Prepare sync raid 1, source_image=f, sync=4, ready=3
      Start sync:0, source_image=1/3, between 2 to 4096002
      sata_hotplug: /sbin/hotplug xraid_sync_started hdiUser mode helper start.
      done do_sata_hotplug
      chn=0, statu/LP_S=0x(d0/d050)29, 32
      chn=0, statu/LP_S=0x(d0/d050)29, 32
      chn=0, statu/LP_S=0x(d0/d050)29, 32
      hdc: cache flushes supported
      chn=0, statu/LP_S=0x(d0/d050)29, 32
      hde: cache flushes supported
      chn=0, statu/LP_S=0x(d0/d050)29, 32
      hdg: cache flushes supported
      hdi: cache flushes supported
      chn=0, statu/LP_S=0x(d0/d050)29, 32
      chn=0, statu/LP_S=0x(d0/d050)29, 32
      chn=0, statu/LP_S=0x(d0/d050)29, 32
      chn=1, statu/LP_S=0x(d0/d050)29, 32
      Sync:1, source_image=1/1, between 2 to 4096002,all_tgt=f
      chn=1, statu/LP_S=0x(d0/d050)29, 32
      chn=1, statu/LP_S=0x(d0/d050)29, 32
      chn=1, statu/LP_S=0x(d0/d050)29, 32
      chn=1, statu/LP_S=0x(d0/d050)29, 32
      chn=1, statu/LP_S=0x(d0/d050)29, 32
      chn=1, statu/LP_S=0x(d0/d050)29, 32
      chn=1, statu/LP_S=0x(d0/d050)29, 32
      chn=1, statu/LP_S=0x(d0/d050)29, 32
      Sync:2, source_image=1/2, between 2 to 4096002,all_tgt=f
      chn=2, statu/LP_S=0x(d0/d050)29, 32
      chn=2, statu/LP_S=0x(d0/d050)29, 32
      Sync:3, source_image=1/3, between 2 to 4096002,all_tgt=f
      chn=2, statu/LP_S=0x(d0/d050)29, 32
      Start sync raid 4 from 4096002 to 1953108616
      chn=2, statu/LP_S=0x(d0/d050)29, 32
      chn=2, statu/LP_S=0x(d0/d050)29, 32
      chn=2, statu/LP_S=0x(d0/d050)29, 32
      chn=2, statu/LP_S=0x(d0/d050)29, 32
      chn=2, statu/LP_S=0x(d0/d050)29, 32
      EXT3 FS on dm-0, internal journal
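
For anyone comparing this log against older ones, a small script can pull out the lines discussed in this thread. This is a minimal sketch, not a diagnostic tool: the patterns are copied from lines in the log above, the labels and the function name are invented for the example, and a match only flags a line for a closer look.

```python
import re

# Patterns taken from lines that appear in the log above.
PATTERNS = {
    "nand_bad_block": re.compile(r"bad block (\d+) replacing by (\d+)"),
    "unknown_partition_table": re.compile(r"(hd[a-z]): unknown partition table"),
    "raid_sync_start": re.compile(r"Start sync raid (\d+) from (\d+) to (\d+)"),
}


def scan_dmesg(text):
    """Return (line_number, label, captured_groups) for each matching line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                hits.append((number, label, match.groups()))
    return hits


# A few lines copied from the log, plus one that should not match.
sample = """\
bad block 1573 replacing by 4095
hdi: unknown partition table
Start sync raid 4 from 4096002 to 1953108616
EXT3 FS on dm-0, internal journal
"""

for hit in scan_dmesg(sample):
    print(hit)
```

To run it on a saved log, read the file into a string first, e.g. pass the contents of dmesg.log from the log zip to scan_dmesg.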

    • StephenB
      Guru

      tominwi wrote:
      Here is the dmesg.log which shows a bad block in the NAND? What's that, the onboard firmware? 

      Yes.  The firmware install is stored in the onboard flash (and I believe the boot loader is also).  It looks like the system is sparing the bad block, so I agree that doesn't seem to be the main issue.

       


      tominwi wrote:
       but now the box is doing a Resync, as if I'd pulled or replaced a drive or ??? Says it's gonna take 12 hours to do this. What to do next? Shut it down and pull each drive one-by-one and test them?

      I'd let the resync continue to completion.   This is likely a side-effect of the kernel panic (unclean shutdown).

       


      tominwi wrote:
       today I got another kernel panic, after it had been working just fine for a couple days ... Or maybe I should try instead an OS Reinstall

      Ok.  So this means the kernel panic isn't linked to the new disks.

       

      I'd suggest backing up the data and trying to do a factory reset with the 4x1TB drives in place.  You could do an OS reinstall first I guess, and see if it helps.  The OS reinstall doesn't reinstall everything, so it might not help if there's some corruption in the OS partition.

       

      Another option is to do a factory install with one of the 2 TB drives (leaving the other bays empty).  Don't use the one that initially failed Seatools, since its health is unknown.  Then add the other disks one at a time, and see if the kernel panic strikes again.  If all looks well, then reconfigure the NAS and restore the files from backup.

       

      Though I do think you should consider how much time/energy you want to put into this.  It might be time to get a new NAS and let this one go.

       

       

      • tominwi
        Luminary

        All good thoughts, thanks. Now For The Dumb Questions: I've been backing-up the entire volume, the C: drive, using Rsync to an external USB. If I were to do the Factory Reset/Reinstall, how do I recover from this USB backup? And do I restore my CONFIG file using FV Before or After I get the files from USB drive back onto the NAS?
