Forum Discussion
pwptech
Feb 23, 2021 · Tutor
ReadyNas 628x Mysterious Storage Consumption
I have an RN628x with 8x 2TB drives in RAID6. In total I have about 10.89TB of provisioned storage. Out of the main storage I have 3 main shares. I also have 1 thick provisioned LUN. I do not utilize...
pwptech
Feb 24, 2021 · Tutor
Thanks for the suggestions, StephenB
I have disabled then re-enabled quota. I also did Disk Balance and it completed successfully. The snapshots have been removed now but I'm still facing the same issue where about 2TB seems to be missing. I am showing 10.09TB for Data in the frontend.
I will also perform a scrub and defrag. If those don't work, I'm not sure what else could be done except rebuilding the array, which I really don't want to do.
rn_enthusiast
Feb 25, 2021 · Virtuoso
Hi pwptech
Would you mind grabbing the NAS log-set for me? Then I will have a look at it. On the web admin page, go to "System" > "Logs" > click "Download logs".
This should download a zip file containing all the logs. You can then upload that zip file to Google Drive, Dropbox or similar and make a link which I can use to download it. PM me this link - don't post it publicly here.
Thanks for the ping StephenB :)
Cheers
- rn_enthusiast · Feb 26, 2021 · Virtuoso
The volume is 10.89 TiB and the used space is 10.22TiB (93% full).
Label: '0a4357ec:data'  uuid: 3e5e6709-18d6-4dc0-b238-ba3e4cbaf8de
    Total devices 1 FS bytes used 10.22TiB
I don't see a reason not to trust this report from the filesystem. By default, snapshots are stored in an unlistable directory, so a plain "du" probably isn't going to be accurate - as you can see, it only showed 7.4TB. I can see you have 2 snapshots, but one of them is odd: it lives in /data/._share, which is a config directory.
ID 42109 gen 51022532 top level 260 path ._share/Backups/.snapshot/b_1614215108_6056
Path would be /data/._share/Backups/.snapshot/b_1614215108_6056
I wonder how this came about... Can you run these commands and show the output:
btrfs subv show /data/._share/Backups/.snapshot/b_1614215108_6056
btrfs subv list -s /data
btrfs qgroup show /data
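On the du point above: if you want per-share numbers that account for data shared with snapshots, something like the following is worth a try. This is just a sketch - it assumes the firmware's btrfs-progs is new enough to include "filesystem du" (added around btrfs-progs 4.6, if I remember right):
btrfs filesystem du -s /data/Backups /data/Software /data/VeeamBackups
It splits each share's usage into exclusive data and data shared with snapshots (the "Set shared" column), which a plain du of the share directories can't do.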
Side note: you have a disk that failed your disk test and needs to be replaced - disk 1. The kernel logs are also complaining about this disk. Replace it ASAP.
[21/02/24 09:19:14 MST] warning:volume:LOGMSG_DISKTEST_RESULT_FAIL_DISK Disk test failed on disk in channel 1, model WDC_WD20EFRX-68EUZN0, serial WD-WCC4M0KH0HDA.
Device: sda
Model: WDC WD20EFRX-68EUZN0
Serial: WD-WCC4M0KH0HDA
Firmware: 82.00A82W
Class: SATA
RPM: 5400
Sectors: 3907029168
Pool: data
PoolType: RAID 6
PoolState: 1
PoolHostId: a4357ec
Health data:
ATA Error Count: 0
Reallocated Sectors: 0
Reallocation Events: 0
Spin Retry Count: 0
Current Pending Sector Count: 126
Uncorrectable Sector Count: 0
Temperature: 32
Start/Stop Count: 20
Power-On Hours: 24275
Power Cycle Count: 20
Load Cycle Count: 1034
There are many of these errors in the kernel logs. It is not good to keep this disk in the NAS.
[Thu Feb 25 06:59:36 2021] sd 0:0:0:0: [sda] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Thu Feb 25 06:59:36 2021] sd 0:0:0:0: [sda] tag#6 Sense Key : Medium Error [current] [descriptor]
[Thu Feb 25 06:59:36 2021] sd 0:0:0:0: [sda] tag#6 Add. Sense: Unrecovered read error - auto reallocate failed
[Thu Feb 25 06:59:36 2021] sd 0:0:0:0: [sda] tag#6 CDB: Read(10) 28 00 e3 18 6c d0 00 02 28 00
[Thu Feb 25 06:59:36 2021] blk_update_request: I/O error, dev sda, sector 3810028752
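If you want to keep an eye on it from the shell while you arrange a replacement, something along these lines should work (a sketch - I'm assuming smartctl is available on the NAS, which I believe ReadyNAS OS ships as part of smartmontools):
smartctl -A /dev/sda | grep -Ei 'pending|realloc|uncorrect'
A Current Pending Sector count of 126, as in the disk info above, is exactly the kind of thing that produces those unrecovered read errors.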
- rn_enthusiast · Mar 01, 2021 · Virtuoso
Update on this one. I worked with pwptech in PM.
We can see the 3 shares/LUNs of interest that are taking up most of the space on the NAS:
ID 0/281 = 3.47TiB = /data/Backups
ID 0/5548 = 90.09GiB = /data/Software
ID 0/31827 = 6.65TiB = /data/VeeamBackups
"VeeamBackups" is the LUN. This LUN is configured to 4.5TB (thick) as confirmed via screenshots of the config and we can see this too, from the command-line:
root@NAS-01:~# ls -lahR /data/VeeamBackups/
/data/VeeamBackups/:
total 32K
drwxr-xr-x 1 root root  12 Mar 27  2020 .
drwxr-xr-x 1 root root 142 Feb 22 10:35 ..
drwxr-xr-x 1 root root  74 Mar 27  2020 .iscsi

/data/VeeamBackups/.iscsi:
total 4.6T
drwxr-xr-x 1 root root   74 Mar 27  2020 .
drwxr-xr-x 1 root root   12 Mar 27  2020 ..
-rw-r--r-- 1 root root 4.5T Feb 28 18:29 iscsi_lun_backing_store   <<<=====
-rw-r--r-- 1 root root   36 Mar 27  2020 .serial_number
Yet the filesystem is allocating about 2TB more than expected to house this LUN. That lines up with what pwptech reported in the first place: that 2TB went missing "out of the blue". I looked at the status history on the NAS and found that it was stable at around 20% space left, leading up to the episode:
[21/01/16 11:50:44 MST] warning:volume:LOGMSG_VOLUME_USAGE_WARNING Less than 20% of volume data's capacity is free. Performance on volume data will degrade if additional capacity is consumed. NETGEAR recommends that you add capacity to avoid performance degradation.
[21/01/31 11:06:42 MST] warning:volume:LOGMSG_VOLUME_USAGE_WARNING Less than 20% of volume data's capacity is free. Performance on volume data will degrade if additional capacity is consumed. NETGEAR recommends that you add capacity to avoid performance degradation.
[21/02/02 09:32:22 MST] warning:volume:LOGMSG_VOLUME_USAGE_WARNING Less than 20% of volume data's capacity is free. Performance on volume data will degrade if additional capacity is consumed. NETGEAR recommends that you add capacity to avoid performance degradation.
[21/02/12 13:05:30 MST] warning:volume:LOGMSG_VOLUME_USAGE_WARNING Less than 20% of volume data's capacity is free. Performance on volume data will degrade if additional capacity is consumed. NETGEAR recommends that you add capacity to avoid performance degradation.
Then suddenly it dropped to less than 5% (essentially, the NAS filled up):
[21/02/20 16:47:25 MST] warning:volume:LOGMSG_VOLUME_USAGE_CRITICAL Less than 5% of volume data's capacity is free. data's performance is degraded and you risk running out of usable space. To improve performance and stability, you must add capacity or make free space.
What preceded this was a defragmentation on the NAS - which would also defrag the LUN file (iscsi_lun_backing_store):
[21/02/13 05:11:04 MST] notice:volume:LOGMSG_DEFRAGSTART_VOLUME Defragmentation started for volume data.
[21/02/13 13:16:14 MST] notice:volume:LOGMSG_DEFRAGEND_VOLUME Defragmentation complete for volume data.
This is the only time a defrag was ever run on the NAS, and a few days later the NAS reported an out-of-space condition. Keep in mind that these space warnings are not logged continuously, so it is quite possible the space issue actually hit right when the LUN was defragged, or very shortly thereafter.

So, the filesystem is using 6.65TiB to house a thick LUN of 4.5TB. This reminds me of an issue other ReadyNAS users reported, where defragging a LUN would balloon the space utilization on the NAS. mdgm might remember more about this, but we saw several reports of it. It has something to do with the extents backing the LUN increasing/breaking after the defrag - presumably because btrfs keeps an old extent fully allocated for as long as any part of it is still referenced, so a LUN file that gets random overwrites on top of large defragged extents can pin a lot of dead space. It is likely more of a BTRFS issue than a NAS issue per se. I very much suspect exactly the same thing happened here.

The only possible remedy I can think of is to defrag the LUN again, using a smaller target extent size:
btrfs fi defragment -t 8192 -v /data/VeeamBackups/.iscsi/iscsi_lun_backing_store
Or an even smaller 4K (4096) size could help more. It is not guaranteed to work, but it is probably worth a shot, I think. mdgm could have some insight here too, if he remembers these issues - but it has been a couple of years now :)
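If you do try it, you can gauge what the defrag does to the backing store by comparing its extent count and the volume usage before and after. A rough sketch (assuming filefrag from e2fsprogs is present on the NAS, and noting it can take a while on a 4.5TB file):
filefrag /data/VeeamBackups/.iscsi/iscsi_lun_backing_store
It prints a single "N extents found" line; comparing that number and the frontend's used-space figure before and after the re-defrag gives some visibility into whether the smaller target extent size is helping.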
Cheers
- mdgm · Mar 01, 2021 · Virtuoso
I don't think I remember that issue though it has been a while.
Another thing worth noting is that defragmentation breaks the CoW link between snapshots. If there were ever snapshots of the LUN, that could be related to the problem. Edit: oops - if there were snapshots, they have been deleted, so this is probably not relevant, I guess.
The snapshot with a name that started with a b would have been from a backup job. I think it was probably meant to be deleted when the backup job finished but for some reason it wasn’t. Perhaps there was a power failure at some point in the middle of a backup job or something like that.
Further edit: 1614215108 is epoch time for 25 February at about 1am. It could just be that the logs were downloaded whilst the relevant backup job was running.
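For anyone who wants to check that conversion themselves, GNU date does it in one line (this should be available on the NAS shell as well):
date -u -d @1614215108
which prints Thu Feb 25 01:05:08 UTC 2021 - and since the NAS log timestamps above are in MST, that corresponds to the evening of 24 February local time.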