2001-11-29 16:07:20

by Chris Friesen

Subject: logging to NFS-mounted files seems to cause hangs when NFS dies


I'm working on an embedded platform and we seem to be having a problem with
syslog and logging to NFS-mounted files.

We have syslog logging to NFS and also logging to a server on another machine.
The desired behaviour is that if the NFS server or the net connection conks out,
the logs are silently dropped. (Critical logs are also logged in memory that
isn't wiped out on reboot.)

Currently, /var/log is mounted with the following options:
rw,rsize=4096,wsize=4096,timeo=7,retrans=3,bg,soft,intr

We started off with hard mounts due to the warnings about soft mounts, but that
led to boxes totally hanging when the network connections were pulled or the NFS
server was taken down. In this scenario we are even unable to log in as root at
the console. This forced us to go to soft mounts in an attempt to fix this
behaviour.

The problem we are seeing is that if we lose the network connection or the NFS
mount (which immediately causes an attempt to log the problem), it seems that
syslog gets stuck in NFS code in the kernel and other stuff can be delayed for a
substantial amount of time (many tens of seconds). Just for kicks we tried
logging to ramdisk, and everything works beautifully.

Now I'm a bit unclear as to why other processes are being delayed--does anyone
have any ideas? My current theories are that either the nfs client code has a
bug, or syslog() calls are somehow blocking if syslogd can't write the file
out. I've just started looking at the syslog code, but it's pretty rough going
as there are very few comments.

Help? We're running a customized 2.2.17 kernel and syslog 1.4.1.

Thanks,

Chris Friesen


--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]


2001-11-29 16:44:20

by Andreas Dilger

Subject: Re: logging to NFS-mounted files seems to cause hangs when NFS dies

On Nov 29, 2001 11:07 -0500, Christopher Friesen wrote:
> I'm working on an embedded platform and we seem to be having a problem with
> syslog and logging to NFS-mounted files.
>
> We have syslog logging to NFS and also logging to a server on another machine.

Why not just log to the syslog daemon on another machine? Logging to NFS
does not help you in this case.

> The desired behaviour is that if the NFS server or the net connection conks
> out, the logs are silently dropped. (Critical logs are also logged in memory
> that isn't wiped out on reboot.)

> The problem we are seeing is that if we lose the network connection or the
> NFS mount (which immediately causes an attempt to log the problem), it seems
> that syslog gets stuck in NFS code in the kernel and other stuff can be
> delayed for a substantial amount of time (many tens of seconds). Just for
> kicks we tried logging to ramdisk, and everything works beautifully.

Well, it seems obvious, doesn't it? If the network connection is lost, then
you can't very well write to the Network File System, can you? One of the
features of NFS is that if the network dies, or the server is lost, then
the client does not lose any data that was being written to the NFS mount.

> Now I'm a bit unclear as to why other processes are being delayed--does anyone
> have any ideas? My current theories are that either the nfs client code has a
> bug, or syslog() calls are somehow blocking if syslogd can't write the file
> out. I've just started looking at the syslog code, but its pretty rough going
> as there are very few comments.

This is entirely a syslog problem, if you want to do it that way. The NFS
code is working as expected, and will not be changed. You might have to
multi-thread syslog to get it to do what you want, but in the end you are
better off just using the network logging feature and writing the logs at
the server directly.
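
A minimal sketch of the multi-threading idea, in Python rather than in
syslogd's own C, and not taken from the syslogd source: the caller hands each
line to a bounded queue and never waits, while a background thread performs
the write that may stall. The class name DroppingLogWriter, the sink callable
and the max_pending parameter are all hypothetical names chosen for this
illustration.

```python
import queue
import threading


class DroppingLogWriter:
    """Hand log lines to a background thread; drop them when the queue is full."""

    def __init__(self, sink, max_pending=100):
        # sink is any callable that writes one line and may block for a long
        # time (for example, a write to a file on an unreachable NFS mount).
        self._sink = sink
        self._queue = queue.Queue(maxsize=max_pending)
        self.dropped = 0
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def log(self, line):
        """Queue a line without blocking; return False if it was dropped."""
        try:
            self._queue.put_nowait(line)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _drain(self):
        # Only this thread ever calls the (possibly blocking) sink.
        while True:
            line = self._queue.get()
            if line is None:  # sentinel from close()
                break
            self._sink(line)

    def close(self):
        # Note: this waits for the worker, so it can itself stall if the
        # sink never returns.
        self._queue.put(None)
        self._thread.join()
```

The trade-off is that messages are lost silently once max_pending lines are
waiting, which matches the "silently dropped" behaviour asked for above. The
worker thread itself can still be stuck in the write for as long as the mount
is unresponsive.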

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2001-11-29 21:01:44

by Peter Wächtler

Subject: Re: logging to NFS-mounted files seems to cause hangs when NFS dies

Christopher Friesen wrote:
>
> I'm working on an embedded platform and we seem to be having a problem with
> syslog and logging to NFS-mounted files.
>
> We have syslog logging to NFS and also logging to a server on another machine.
> The desired behaviour is that if the NFS server or the net connection conks out,
> the logs are silently dropped. (Critical logs are also logged in memory that
> isn't wiped out on reboot.)
>
> Currently, /var/log is mounted with the following options:
> rw,rsize=4096,wsize=4096,timeo=7,retrans=3,bg,soft,intr
>
> We started off with hard mounts due to the warnings about soft mounts, but that
> led to boxes totally hanging when the network connections were pulled or the NFS
> server was taken down. In this scenario we are even unable to login as root at
> the console. This forced us to go to soft mounts in an attempt to fix this
> behaviour.
>
> The problem we are seeing is that if we lose the network connection or the NFS
> mount (which immediately causes an attempt to log the problem), it seems that
> syslog gets stuck in NFS code in the kernel and other stuff can be delayed for a
> substantial amount of time (many tens of seconds). Just for kicks we tried
> logging to ramdisk, and everything works beautifully.
>
> Now I'm a bit unclear as to why other processes are being delayed--does anyone
> have any ideas? My current theories are that either the nfs client code has a
> bug, or syslog() calls are somehow blocking if syslogd can't write the file
> out. I've just started looking at the syslog code, but its pretty rough going
> as there are very few comments.
>
> Help? We're running a customized 2.2.17 kernel and syslog 1.4.1.
>

I can recommend syslogd's ability to log to remote syslogd via

/etc/syslog.conf

*.info @host.or.ip


The remote site has to run syslogd with "syslogd -r".
Since it uses UDP there is no blocking.
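
To illustrate why a UDP sender does not stall when the receiver is gone, here
is a rough Python sketch of sending one syslog-style datagram. It is not
syslogd's code; the function name send_syslog_udp and its defaults are
assumptions made for the example. The "<PRI>" prefix is facility * 8 +
severity, and 514/udp is the standard syslog port.

```python
import socket


def send_syslog_udp(message, host="127.0.0.1", port=514, facility=1, severity=6):
    """Send one syslog-style datagram; return False instead of waiting on errors.

    host should be an IP address: resolving a hostname could itself block.
    facility=1 (user) and severity=6 (info) are illustrative defaults.
    """
    pri = facility * 8 + severity
    payload = ("<%d>%s" % (pri, message)).encode("utf-8", "replace")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setblocking(False)
        # UDP has no connection and no acknowledgement: the datagram is
        # handed to the local network stack and the call returns.
        sock.sendto(payload, (host, port))
        return True
    except OSError:
        # For example, no route to the host: the message is simply dropped.
        return False
    finally:
        sock.close()
```

If the network or the log host is down, the datagram is lost and the sender
carries on, which is the "silently dropped" behaviour the original post asks
for.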

2001-11-29 21:08:24

by Peter Wächtler

Subject: Re: logging to NFS-mounted files seems to cause hangs when NFS dies

Andreas Dilger wrote:
>
> On Nov 29, 2001 11:07 -0500, Christopher Friesen wrote:
> > I'm working on an embedded platform and we seem to be having a problem with
> > syslog and logging to NFS-mounted files.
> >
> > We have syslog logging to NFS and also logging to a server on another machine.
>
> Why not just log to the syslog daemon on another machine. Logging to NFS
> does not help you in this case.
>
> > The desired behaviour is that if the NFS server or the net connection conks
> > out, the logs are silently dropped. (Critical logs are also logged in memory
> > that isn't wiped out on reboot.)
>
> > The problem we are seeing is that if we lose the network connection or the
> > NFS mount (which immediately causes an attempt to log the problem), it seems
> > that syslog gets stuck in NFS code in the kernel and other stuff can be
> > delayed for a substantial amount of time (many tens of seconds). Just for
> > kicks we tried logging to ramdisk, and everything works beautifully.
>
> Well, it seems obvious, doesn't it? If the network connection is lost, then
> you can't very well write to the Network File System, can you? One of the
> features of NFS is that if the network dies, or the server is lost, then
> the client does not lose any data that was being written to the NFS mount.
>
> > Now I'm a bit unclear as to why other processes are being delayed--does anyone
> > have any ideas? My current theories are that either the nfs client code has a
> > bug, or syslog() calls are somehow blocking if syslogd can't write the file
> > out. I've just started looking at the syslog code, but its pretty rough going
> > as there are very few comments.
>
> This is entirely a syslog problem, if you want to do it that way. The NFS
> code is working as expected, and will not be changed. You might have to
> multi-thread syslog to get it to do what you want, but in the end you are
> better off just using the network logging feature and write the logs at
> the server directly.
>

Well, it could use nonblocking IO like it does with named pipes.

When syslogd logs to a named pipe and the reader does not consume the data,
syslogd will not block but discards the messages.
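
The discard-instead-of-block behaviour for a named pipe can be sketched in
Python as follows. This is an illustration, not syslogd's code, and the
helper names open_fifo_nonblocking and write_or_drop are made up for it. It
assumes Linux, where opening a FIFO with O_RDWR succeeds even when no reader
is attached.

```python
import os


def open_fifo_nonblocking(path):
    """Open an existing FIFO so that neither open() nor write() can block."""
    # O_RDWR: on Linux the open does not wait for a reader to appear.
    # O_NONBLOCK: a write to a full pipe fails with EAGAIN instead of waiting.
    return os.open(path, os.O_RDWR | os.O_NONBLOCK)


def write_or_drop(fd, line):
    """Write one line (bytes); return False if the pipe is full and it was dropped.

    Lines should be no longer than PIPE_BUF so each write is all-or-nothing.
    """
    try:
        os.write(fd, line)
        return True
    except BlockingIOError:
        # The reader is not keeping up: discard the message.
        return False
```

Note that O_NONBLOCK generally has no effect on writes to regular files, so
this technique would not by itself carry over to a log file on an NFS mount.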

The best way, as you suggested, is to log to a remote syslogd started
with "syslogd -r".

2001-11-29 22:42:50

by Jesse Pollard

Subject: Re: logging to NFS-mounted files seems to cause hangs when NFS dies

On Thursday 29 November 2001 10:07, Christopher Friesen wrote:
> I'm working on an embedded platform and we seem to be having a problem with
> syslog and logging to NFS-mounted files.
>
> We have syslog logging to NFS and also logging to a server on another
> machine. The desired behaviour is that if the NFS server or the net
> connection conks out, the logs are silently dropped. (Critical logs are
> also logged in memory that isn't wiped out on reboot.)
>
> Currently, /var/log is mounted with the following options:
> rw,rsize=4096,wsize=4096,timeo=7,retrans=3,bg,soft,intr
>
> We started off with hard mounts due to the warnings about soft mounts, but
> that led to boxes totally hanging when the network connections were pulled
> or the NFS server was taken down. In this scenario we are even unable to
> login as root at the console. This forced us to go to soft mounts in an
> attempt to fix this behaviour.
>
> The problem we are seeing is that if we lose the network connection or the
> NFS mount (which immediately causes an attempt to log the problem), it
> seems that syslog gets stuck in NFS code in the kernel and other stuff can
> be delayed for a substantial amount of time (many tens of seconds). Just
> for kicks we tried logging to ramdisk, and everything works beautifully.
>
> Now I'm a bit unclear as to why other processes are being delayed--does
> anyone have any ideas? My current theories are that either the nfs client
> code has a bug, or syslog() calls are somehow blocking if syslogd can't
> write the file out. I've just started looking at the syslog code, but its
> pretty rough going as there are very few comments.
>
> Help? We're running a customized 2.2.17 kernel and syslog 1.4.1.

There is an easier way - depending on your point of view:

1. Designate a host on the network to be a loghost, and have syslog send the
messages there. Syslog will use UDP to send the messages so if the
network is dead, the messages will be dropped.

2. Don't log messages to an NFS-mounted disk. Part of the problem is that
doing so causes a cascade of messages (NFS timeout, log the timeout,
syslog writes to NFS... another NFS timeout plus a long delay while
syslog is still trying to write the first message... buffers fill up...).

There should be an example entry in the default /etc/syslog.conf for
logging to a remote host, but the line entry should be:

*.* @loghost

This can (most likely) be the only entry in the config file.

One advantage this has is that all log files can be scanned at
once, and on one system. Another is that if a workstation gets
hacked, at least they won't be able to remove any evidence that might
have shown up in the syslog output...