Forum Discussion
Spielhaug
Mar 05, 2019 Aspirant
ReadyNAS NV+ v2 - HDD intermittently drops connection
My NAS has recently started to drop the connection to several HDDs: "Disk removal detected [Disk 2]", "Disk removal detected [Disk 1]". Approximately 4 hours later the drive is rediscovered and rebuilding starts....
Hopchen
Mar 05, 2019 Prodigy
Hi Spielhaug
This looks like either a disk issue or a board issue. I can take a look at the logs if you like.
If so, download the full log set and throw them up on a Google link or similar and PM me the link.
Also, now would be a good time to backup the data if you don't have a backup already.
Cheers :)
Hopchen
Mar 06, 2019 Prodigy
Hi Spielhaug
The situation is not good. I can see you have been using disks of two different sizes (likely 2x 2TB and 2x 3TB?). That in itself is fine, but it means the data is spread across two RAIDs layered on top of each other.
In the RAID output below we would expect to see md0 (the OS RAID), md1 (the swap RAID), md2 (data RAID 1) and md3 (data RAID 2). md2 is completely missing and all the other RAIDs are degraded.
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid5 sdd4[0]
976750784 blocks super 1.2 level 5, 64k chunk, algorithm 2 [2/1] [U_]
md1 : active raid1 sda2[4]
524276 blocks super 1.2 [4/1] [U___]
md0 : active raid1 sda1[4]
4193268 blocks super 1.2 [3/1] [U__]
unused devices: <none>
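For reference, if you want to poke at this yourself over SSH, here is a minimal sketch of the commands that produce this kind of information (assuming root SSH access is enabled on the NV+ v2, and assuming md2 lived on partition 3 of each disk, since md3 above is built from partition 4):

# Current state of all md arrays (this is where the output above comes from)
cat /proc/mdstat

# Details for an array that is still assembled, e.g. the second data RAID
mdadm --detail /dev/md3

# Examine the RAID superblocks on the data partitions to see which array each disk remembers being a member of
mdadm --examine /dev/sd[abcd]3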
The data volume cannot mount as a result of the missing md2, so you only see the OS volume information (the information for volume "C", the data volume, is missing). At this point, access to the data is lost. You will be able to see some shares appear as folders, but the content is not there, and you cannot write to the volume.
Filesystem Size Used Avail Use% Mounted on
/dev/md0 4.0G 763M 3.1G 20% /
tmpfs 16K 0 16K 0% /USB
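To confirm that only the OS volume is mounted, you can also check the mount table directly. A quick sketch (the /c mount point for the data volume is an assumption on my part):

# List mounted md devices; a healthy unit shows the data volume alongside /dev/md0
grep md /proc/mounts

# Or look for the data volume mount point directly (assumed to be /c here)
mount | grep " /c "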
This situation almost certainly occurred because of the disks dropping in and out of the RAIDs, causing double or even triple disk failures in the arrays. Your disks look fine from a health-check point of view. I can see you inserted a newer one, though that won't do much as the data RAIDs are already in trouble. I don't see any of the disks actually having been bad in the past. One disk had some errors on it, but only 2 ATA errors, and you replaced that one. The three others appear fine, yet we still see disks dropping in and out.
3!!Thu Feb 21 17:28:58 CET 2019!!root!!Disk removal detected. [Disk 1]
3!!Thu Feb 21 17:35:36 CET 2019!!root!!Disk removal detected. [Disk 2]
3!!Fri Feb 22 06:47:39 CET 2019!!root!!Disk removal detected. [Disk 2]
3!!Fri Feb 22 06:47:39 CET 2019!!root!!Disk removal detected. [Disk 3]
The above happens over and over, and it will surely have been messing up your data RAIDs and, in turn, your volume.
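If you want to see just how often this has been happening, you can count the removal events per disk slot straight from the downloaded log set. A rough sketch (the path is a placeholder for wherever you extracted the full logs):

# Count removal events per disk slot across the extracted log bundle (path is hypothetical)
grep -rih "disk removal detected" /path/to/extracted_logs | grep -o "Disk [0-9]" | sort | uniq -c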
I suspect the backplane might be bad, given that you didn't actually remove the disks yourself.
Do you have a backup of the data? I think data recovery is possible here, provided we still have at least 3 healthy disks that were part of the RAID before it went south. However, it would be wise to let NETGEAR do that part if you really need the data back. It will cost a data recovery contract, which runs a couple hundred bucks or something along those lines.
If you have a backup or don't need the data, then a factory default would probably be the best option. However, I would be wary of continuing to use the NAS myself, as this issue is possibly chassis related. I would advise testing each disk with a tool from its manufacturer to ensure the disks are fully OK. If they all pass, you can start suspecting the chassis is faulty.
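If you don't have the manufacturer tools handy, smartmontools on any Linux box the disks are attached to can run the same kind of extended self-test. A sketch, with /dev/sdX as a placeholder for each of the four disks:

# Start an extended (long) SMART self-test; repeat for each drive
smartctl -t long /dev/sdX

# Once the test completes (it can take several hours), review health, attributes and the error log
smartctl -a /dev/sdX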
Cheers
Sandshark
Mar 08, 2019 Sensei
Can you tell if the drives are dropping out when they spin up? If so, it could be a power supply issue. A main chassis issue is just as likely, though, I think.
Hopchen
Mar 09, 2019 Prodigy
Good point Sandshark
I don't necessarily see a spin-down at the exact time the disks drop out. However, there is indeed a lot of spin up/down activity in the logs, like this:
Feb 21 18:37:02 NAS-Server noflushd[1934]: Disks spinning up after 2 minutes.
Feb 21 18:42:05 NAS-Server noflushd[1934]: Spinning down disks.
Feb 21 18:44:30 NAS-Server noflushd[1934]: Disks spinning up after 2 minutes.
Feb 21 18:49:33 NAS-Server noflushd[1934]: Spinning down disks.
Feb 21 18:51:58 NAS-Server noflushd[1934]: Disks spinning up after 2 minutes.
Feb 21 18:57:01 NAS-Server noflushd[1934]: Spinning down disks.
Feb 21 18:59:26 NAS-Server noflushd[1934]: Disks spinning up after 2 minutes.
Feb 21 19:04:29 NAS-Server noflushd[1934]: Spinning down disks.
Feb 21 19:06:53 NAS-Server noflushd[1934]: Disks spinning up after 2 minutes.
Feb 21 19:11:56 NAS-Server noflushd[1934]: Spinning down disks.
Feb 21 19:14:21 NAS-Server noflushd[1934]: Disks spinning up after 2 minutes.
Feb 21 19:19:24 NAS-Server noflushd[1934]: Spinning down disks.
Feb 21 19:21:49 NAS-Server noflushd[1934]: Disks spinning up after 2 minutes.
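If you want to correlate the spin cycling with the drop-outs, you can poll the drives' power state while watching the system log. A minimal sketch, assuming hdparm is available on the unit (or wherever the disks are attached):

# Report whether each drive is currently active/idle or in standby (spun down)
for d in /dev/sd[abcd]; do hdparm -C "$d"; done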
Furthermore, diving a little deeper, I can see a few places where this is reported:
Feb 22 06:39:34 NAS-Server kernel: sd 0:0:1:0: timing out command, waited 180s
Feb 22 06:39:34 NAS-Server kernel: sd 0:0:1:0: timing out command, waited 180s
Feb 22 06:39:34 NAS-Server kernel: sd 0:0:1:0: [sdb] Unhandled error code
Feb 22 06:39:34 NAS-Server kernel: sd 0:0:1:0: [sdb] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Feb 22 06:39:34 NAS-Server kernel: end_request: I/O error, dev sdb, sector 0
Feb 22 06:39:34 NAS-Server kernel: __ratelimit: 32 callbacks suppressed
Feb 22 06:39:34 NAS-Server kernel: Buffer I/O error on device sdb, logical block 0
Feb 22 06:39:34 NAS-Server kernel: Buffer I/O error on device sdb, logical block 1
Feb 22 06:39:34 NAS-Server kernel: Buffer I/O error on device sdb, logical block 2
Feb 22 06:39:34 NAS-Server kernel: Buffer I/O error on device sdb, logical block 3
I think this could be a SATA connectivity issue though, especially given that the SMART test passes. It is worth checking all disks fully with the manufacturer's software, IMO. If they come out clean, it is probably a chassis issue.
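One more thing worth checking before blaming the chassis: the SATA CRC counters in SMART tend to increment when the link between disk and cable/backplane is flaky, rather than when the disk itself is failing. A sketch, assuming smartctl is available:

# Attribute 199 (UDMA_CRC_Error_Count) points at link problems (cable/backplane), not bad platters
for d in /dev/sd[abcd]; do echo "== $d =="; smartctl -A "$d" | grep -i crc; done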