RNDU2000 Unit systemd-journald.service entered failed state

firerain · 2017-01-26

Hey guys,

I've got a problem with my RNDU2000. Today, all of a sudden, I was unable to get to my shares and network traffic stopped (it served as a router between 2 networks). I wasn't able to shut it down gracefully, so I forced it off.

After the reboot it failed to boot: the fans spin properly and the activity and power LEDs blink as they should, but both drive LEDs are dead, SSH/Telnet connections are refused and the admin page is not accessible, although the unit still answers pings. Being terribly exhausted today, I rebooted the unit again the same way, twice (totally noobish of me).

After that it dawned on me: RAIDar. It says everything is all right, even the system diagnostics. I was able to fetch the logs and wasn't pleased to see lines like these in dmesg.log:

[Fri Jan 27 01:01:13 2017] BTRFS: bdev /dev/md0 errs: wr 2, rd 167, flush 0, corrupt 0, gen 0

[Fri Jan 27 01:01:13 2017] systemd[1]: Unit systemd-journald.service entered failed state.
[Fri Jan 27 01:01:13 2017] systemd[1]: systemd-journald.service start request repeated too quickly, refusing to start.

[Fri Jan 27 01:01:14 2017] systemd[1243]: Failed at step STDOUT spawning /sbin/swapon: Connection refused
[Fri Jan 27 01:01:14 2017] systemd[1]: dev-md1.swap swap process exited, code=exited status=209
[Fri Jan 27 01:01:14 2017] systemd[1]: Unit dev-md1.swap entered failed state.
[Fri Jan 27 01:01:14 2017] systemd[1254]: Failed at step STDOUT spawning /bin/mount: Connection refused
[Fri Jan 27 01:01:14 2017] systemd[1256]: Failed at step STDOUT spawning /sbin/udevadm: Connection refused
[Fri Jan 27 01:01:14 2017] systemd[1]: VOL1.mount mount process exited, code=exited status=209
[Fri Jan 27 01:01:14 2017] systemd[1]: Job apps.mount/start failed with result 'dependency'.
[Fri Jan 27 01:01:14 2017] systemd[1]: Job home.mount/start failed with result 'dependency'.
[Fri Jan 27 01:01:14 2017] systemd[1]: Job local-fs.target/start failed with result 'dependency'.
[Fri Jan 27 01:01:14 2017] systemd[1]: Unit VOL1.mount entered failed state.
[Fri Jan 27 01:01:14 2017] systemd[1]: udev-trigger.service: main process exited, code=exited, status=209
[Fri Jan 27 01:01:14 2017] systemd[1]: Unit udev-trigger.service entered failed state.

[Fri Jan 27 01:22:57 2017] BTRFS: bdev /dev/md0 errs: wr 2, rd 215, flush 0, corrupt 0, gen 0
[Fri Jan 27 01:22:57 2017] systemd[1]: Unit systemd-journald.service entered failed state.
[Fri Jan 27 01:22:57 2017] systemd[1]: systemd-journald.service start request repeated too quickly, refusing to start.
[Fri Jan 27 01:22:57 2017] systemd[1]: Unit systemd-journald.socket entered failed state.
[Fri Jan 27 01:49:19 2017] BTRFS: bdev /dev/md0 errs: wr 2, rd 216, flush 0, corrupt 0, gen 0
[Fri Jan 27 01:49:42 2017] BTRFS: bdev /dev/md0 errs: wr 2, rd 217, flush 0, corrupt 0, gen 0
[Fri Jan 27 01:49:42 2017] BTRFS: bdev /dev/md0 errs: wr 2, rd 218, flush 0, corrupt 0, gen 0
[Fri Jan 27 01:49:42 2017] BTRFS: bdev /dev/md0 errs: wr 2, rd 219, flush 0, corrupt 0, gen 0

Of course, these are only excerpts of the drama I saw in there.
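The BTRFS error counters in these lines are cumulative, so comparing the rd value between sightings shows whether new read errors keep appearing on /dev/md0 (here it climbs from 167 to 219 within one night). A minimal Python sketch, assuming the log lines have exactly the format quoted above; the function names are made up for illustration. On a running system the same counters can also be queried with btrfs device stats <mountpoint>.

```python
import re

# Matches the BTRFS per-device error counters printed in dmesg, e.g.
# "BTRFS: bdev /dev/md0 errs: wr 2, rd 167, flush 0, corrupt 0, gen 0"
ERRS_RE = re.compile(
    r"BTRFS: bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def btrfs_error_counters(lines):
    """Yield (device, counters) for every BTRFS error-counter line found."""
    for line in lines:
        match = ERRS_RE.search(line)
        if match:
            fields = match.groupdict()
            dev = fields.pop("dev")
            yield dev, {name: int(value) for name, value in fields.items()}


def read_error_growth(lines):
    """Return, per device, how much the 'rd' counter grew between its first and last sighting."""
    first, last = {}, {}
    for dev, counters in btrfs_error_counters(lines):
        first.setdefault(dev, counters["rd"])
        last[dev] = counters["rd"]
    return {dev: last[dev] - first[dev] for dev in last}


if __name__ == "__main__":
    sample = [
        "[Fri Jan 27 01:01:13 2017] BTRFS: bdev /dev/md0 errs: wr 2, rd 167, flush 0, corrupt 0, gen 0",
        "[Fri Jan 27 01:22:57 2017] BTRFS: bdev /dev/md0 errs: wr 2, rd 215, flush 0, corrupt 0, gen 0",
        "[Fri Jan 27 01:49:42 2017] BTRFS: bdev /dev/md0 errs: wr 2, rd 219, flush 0, corrupt 0, gen 0",
    ]
    print(read_error_growth(sample))  # {'/dev/md0': 52}
```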

My unit is running RNOS 6.5.1, stuffed with 2x WD30EFRX.

Is there any way I could try to fix these errors within my ReadyNAS? Or shouldn't I even bother, and just hook both drives up to a PC and start the recovery process?

mdgm-ntgr · 2017-01-26

You could send the logs zip file in (see the Sending Logs link in my sig).

Looks like there are problems with the 4GB root volume.

Are you seeing any SMART errors on the disks?

firerain · 2017-01-27

Thanks for your reply. I've just sent the logs (minus bash_history, as there's some sensitive data in it). I've searched all the logs for "smart" and found only a few informational lines, no warnings or errors at all. If you look in the logs, you will see a lot of low voltage warnings; I'm aware of them but not sure whether they could cause such errors.

I haven't tried USB recovery yet. Do you think it could help in this matter? For now, my unit is turned off to prevent any further disk corruption, if there could be any. I'm an IT specialist and have done a lot of data recoveries, and once one of them failed on my own data, so I prefer to be cautious.

I really appreciate the great job you are doing for the community!

mdgm-ntgr · 2017-01-29

The disks look healthy.

You could try running the memory test boot menu option if you want.

I can see lots of errors about voltage being out of spec in the logs. This could suggest a possible PSU issue.

firerain · 2017-01-30

Thanks for answering. I didn't find anything disturbing in the logs either. The memory test has been running for about 20 minutes. The LED sequence is Disk 1, Disk 2, Power, USB, so as far as I know it should be all right. I'll leave it running for some time, then do a disk test and report back here.

In one of my previous posts I mentioned the voltage warnings. I'm aware of them: they're caused by the self-made PSU (or rather 12V UPS) hooked up to my unit. I'll handle it later on by raising the output voltage by 0.3V or so.

firerain · 2017-01-30

All right, the disk test has finished (at least I hope so; only the power LED is blinking now). RAIDar isn't reporting any errors after the test.

Any other thoughts on this subject?

I could try swapping the RAM or an HDD; just give me a clue as to what would help in solving this.

mdgm-ntgr · 2017-01-30

Well perhaps fix the power problem and see what difference that makes.

firerain · 2017-01-31

I'm sorry, I didn't mean to be so pushy.

I've tried that. While running all the tests, my NAS was connected to the PSU that came in the box (it had only been in use for a few days, so I assume it's fully functional).

Some other thoughts/observations of mine:

- The boot process seems completely normal: the fans spin up to max at the beginning and slow down after a short time. However, the HDD LEDs don't light up at all, even though I can hear the drives working (sometimes louder than on standby, as during read operations). During the HDD test, the drive LEDs blinked and the corresponding read/write sounds could be heard.
- RAIDar still works perfectly, saying that everything is completely OK. However, trying to restart the unit from RAIDar does nothing, and both SSH and HTTP refuse connections.
- While reading the logs again, I noticed that some services do start (like Apache, but only on port 443, and the RAIDar daemon, which is obvious since RAIDar works). The admin page accessed over SSL opens up and asks for credentials, and after a long wait on the progress bar page it says "Admin Page is offline".

I think it's time to poke around with tech support mode and/or USB recovery 😉

firerain · 2017-01-31

Now for some more in-depth details:

The unit is now running in tech support mode. All filesystems mount without problems. smartctl doesn't report any errors other than a count of 1 in UDMA_CRC_Error_Count on one drive.
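When checking the disks this way, it can help to separate attributes that usually indicate damaged media (reallocated, pending or offline-uncorrectable sectors) from UDMA_CRC_Error_Count, which generally points at the SATA link (cable, connector, backplane) rather than the platters. A minimal Python sketch, assuming the standard ATA attribute table printed by smartctl -A; the function names and the sample output are illustrative, not taken from this unit.

```python
# Attributes that usually point at the disk surface itself.
MEDIA_ATTRS = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def smart_raw_values(smartctl_output):
    """Map attribute name -> RAW_VALUE (first token) from `smartctl -A` table rows.

    Assumes the usual column layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            values[fields[1]] = int(raw) if raw.isdigit() else raw
    return values


def classify(values):
    """Sum media-related counters and report link (CRC) errors separately."""
    media = [values[name] for name in MEDIA_ATTRS if isinstance(values.get(name), int)]
    return {
        "media_errors": sum(media),
        "crc_errors": values.get("UDMA_CRC_Error_Count", 0),
    }


if __name__ == "__main__":
    # Illustrative output only, not the actual smartctl output from this thread.
    sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1
"""
    print(classify(smart_raw_values(sample)))  # {'media_errors': 0, 'crc_errors': 1}
```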

I scrubbed the root filesystem three times, and every run reports an error and says it has been corrected.

dmesg shows the same thing we saw in the logs:

BTRFS: bdev /dev/md/0 errs: wr 2, rd 386, flush 0, corrupt 0, gen 0

BUT, there is also one more thing in there:

BTRFS: i/o error at logical 3377745920 on dev /dev/md/0, sector 7032488, root 5, inode 140793, offset 0, length 4096, links 1 (path: var/log/journal/17c5a086e9eb417ea121e4464ee323f9/system.journal

Of course, trying to cat this file gives me an I/O error.

Running # hdparm --read-sector 7032488 on sda and sdb (the array members) gives no error at all.
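One caveat about this check: the sector in the BTRFS message is counted from the start of /dev/md/0, while hdparm --read-sector takes an LBA on the raw disk, so reading 7032488 on sda/sdb most likely hits a different spot. A minimal sketch of the translation, assuming the root array is a plain RAID1 mirror with one partition per disk. The partition start and md data offset in the usage line are hypothetical placeholders; the real values would have to be read from the system (for example /sys/block/sda/sda1/start and the "Data Offset" line of mdadm --examine /dev/sda1).

```python
def member_lba(md_sector, partition_start, data_offset):
    """Translate a sector on a RAID1 md device to an absolute LBA on one member disk.

    md_sector       : sector reported against /dev/md/0 (BTRFS uses 512-byte units)
    partition_start : start of the member partition on the disk, in 512-byte sectors
    data_offset     : md "Data Offset" of that member, in 512-byte sectors
    """
    # In RAID1 every member holds a full copy, so the mapping is a plain shift.
    return partition_start + data_offset + md_sector


if __name__ == "__main__":
    # Hypothetical partition start (64) and data offset (2048), not read from this unit.
    print(member_lba(7032488, partition_start=64, data_offset=2048))  # 7034600
```

The resulting number is what would be passed to hdparm --read-sector on each member disk.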

Finally I ran # btrfsck --repair /dev/md0: no errors. Scrubbing afterwards reported an error again.

I'm lacking ideas for now...

mdgm-ntgr · 2017-02-01

Sounds like if possible backing up the data would be the next step. Assuming the data volume is fine this should be straightforward.

firerain · 2017-02-02

Yep, I've heard that btrfs isn't a really mature filesystem and that it can even lose a partition for no apparent reason, but I didn't expect to run into such problems myself.

The data partition is all right (according to scrub), so I'll head in that direction. Lots of copying over 100Base to my old NASes ahead of me 😕

Thanks, mdgm, for your advice and your patience with my laziness 😉