2003-04-04 23:56:57

by Philippe Gramoullé

Subject: Problems with statd and lockd


Hi,

Recently we started to get the following error messages on a Linux NFS server running
vanilla 2.4.21-pre5 on Debian unstable:

> statd: server localhost not responding, timed out
> nsm_mon_unmon: rpc failed, status=-5
> lockd: cannot monitor <Client IP here>

A restart of nfs-kernel-server cleared the problem.

Some of our filesystems are exported sync, others are exported async.

Can you tell me what the "-5" return code means?

I've looked at the source code (nsm_mon_unmon in mon.c,
which leads back to nfs3proc.c and the rpc_call wrapper),
but I'm lost after that :)

Can someone enlighten me?

Thanks,

Philippe


-------------------------------------------------------
This SF.net email is sponsored by: ValueWeb:
Dedicated Hosting for just $79/mo with 500 GB of bandwidth!
No other company gives more support or power for your dedicated server
http://click.atdmt.com/AFF/go/sdnxxaff00300020aff/direct/01/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-04-05 13:45:38

by Bernd Schubert

Subject: Re: Problems with statd and lockd

On Saturday 05 April 2003 01:56, Philippe Gramoullé wrote:
> Hi,
>
> Recently we started to get the following error messages on a Linux NFS
> server, running 2.4.21-pre5 vanilla, Debian unstable
>
> > statd: server localhost not responding, timed out
> > nsm_mon_unmon: rpc failed, status=-5
> > lockd: cannot monitor <Client IP here>
>
> A restart of nfs-kernel-server cleared the problem.
>
> We have filesystems that are exported sync, other are exported async.
>


Hi,

We also see those messages when we use the rpc.statd from the Debian packages. I
think the Debian nfs-utils packages are built with "--enable-secure-statd" as a
configure option. After recompiling nfs-utils without this option and starting the
new rpc.statd on the clients, those messages should disappear, even without
restarting the NFS server.
(I discovered this odd behaviour by accident.)


Best regards,
Bernd



2003-04-05 14:27:05

by Trond Myklebust

Subject: Re: Problems with statd and lockd

>>>>> " " == Bernd Schubert <[email protected]> writes:

> we also see those messages when we use the rpc.statd from
> debian packages. I think the debian nfs-utils packages got
> "--enable-secure-statd" as configure option. After recompiling
> the nfs-utils without this option and starting the new
> rpc.statd on the clients, those messages should disappear, even
> without restarting the nfs-server. (I discovered this odd
> behaviour by accident.)

--enable-secure-statd is there for a good reason. I wouldn't recommend
that anybody turn it off. In any case, if that is the problem, then
you should see a warning from 'statd' in your syslog (assuming you
have enabled 'warn' messages in syslog.conf).

Cheers,
Trond



2003-04-05 16:30:57

by Bernd Schubert

Subject: Re: Problems with statd and lockd

On Saturday 05 April 2003 16:27, Trond Myklebust wrote:
> >>>>> " " == Bernd Schubert <[email protected]> writes:
> > we also see those messages when we use the rpc.statd from
> > debian packages. I think the debian nfs-utils packages got
> > "--enable-secure-statd" as configure option. After recompiling
> > the nfs-utils without this option and starting the new
> > rpc.statd on the clients, those messages should disappear, even
> > without restarting the nfs-server. (I discovered this odd
> > behaviour by accident.)
>
> --enable-secure-statd is there for a good reason. I wouldn't recommend
> that anybody turn it off. In any case, if that is the problem, then
> you should see a warning from 'statd' in your syslog (assuming you
> have enabled 'warn' messages in syslog.conf).
>
> Cheers,
> Trond
>

Hello,

here are the entries from our /etc/syslog.conf about warnings:

*.=info;*.=notice;*.=warn;\
auth,authpriv.none;\
cron,daemon.none;\
mail,news.none -/var/log/messages

So I think warnings are enabled. Recently our self-built nfs-utils packages
(built without --enable-secure-statd) were replaced by an automatic update.
Some time afterwards our clients began to log these messages (shown here as
dmesg output):

statd: server localhost not responding, timed out
nsm_mon_unmon: rpc failed, status=-5
lockd: cannot monitor <server IP>
lockd: failed to monitor <server IP>

So on our systems it is not the server but the clients that report those
messages. I have to admit that our server is also running nfs-utils
without secure-statd support, but as far as I can remember there were
similar messages on the server when secure-statd was enabled.

/var/log/messages has these entries:

Apr 5 15:28:16 maxwell kernel: statd: server localhost not responding, timed out
Apr 5 15:28:16 maxwell kernel: lockd: cannot monitor <server IP>
Apr 5 15:28:16 maxwell kernel: lockd: failed to monitor <server IP>

What other messages do you expect?

As soon as I stop the Debian statd and start the new statd (which has
secure-statd disabled), those messages disappear.
Furthermore, some programs, e.g. KDE, have trouble running while those
messages appear (our home directories are served by this server).

What problems can be caused by a statd without the secure option?


Thanks in advance,
Bernd
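
One detail in the selector quoted above is worth checking: "cron,daemon.none" removes the daemon facility from this rule, so a warning logged under the daemon facility would not reach /var/log/messages even though "*.=warn" is present. The sketch below models that selector logic in simplified form. It assumes sysklogd-style semantics (parts applied left to right, "none" clearing a facility) and assumes that rpc.statd logs under the daemon facility; the function name selector_matches is hypothetical.

```python
# Syslog severities, lower number = more severe.
LEVELS = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
          "warn": 4, "warning": 4, "notice": 5, "info": 6, "debug": 7}


def selector_matches(selector, facility, priority):
    """Simplified model of a sysklogd selector; ignores the '!' operator."""
    matched = False
    for part in selector.split(";"):
        facilities, _, level = part.strip().partition(".")
        names = facilities.split(",")
        if "*" not in names and facility not in names:
            continue  # this part says nothing about our facility
        if level == "none":
            matched = False  # facility explicitly excluded
        elif level == "*":
            matched = True
        elif level.startswith("="):
            # '=' restricts the match to exactly this priority
            matched = matched or LEVELS[level[1:]] == LEVELS[priority]
        else:
            # a plain priority matches this level and anything more severe
            matched = matched or LEVELS[priority] <= LEVELS[level]
    return matched


# Selector from the /etc/syslog.conf excerpt, with continuations joined.
SELECTOR = ("*.=info;*.=notice;*.=warn;auth,authpriv.none;"
            "cron,daemon.none;mail,news.none")

print(selector_matches(SELECTOR, "kern", "warn"))    # kernel warning
print(selector_matches(SELECTOR, "daemon", "warn"))  # daemon warning
```

Under these assumptions the kernel lines (lockd, nsm_mon_unmon) match the rule while daemon warnings do not, so any statd warnings would have to be looked for in whichever file the daemon facility is routed to.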



2003-04-05 23:07:18

by Philippe Gramoullé

Subject: Re: Problems with statd and lockd


Now I remember: I also built nfs-utils a long time ago and, IIRC, I had disabled
secure statd. That was long enough ago, and so much has happened since, that I had forgotten about it.

As this one is a standard Debian package, it has secure statd enabled by default.

Running without secure statd would mean that statd can be used by callers other than lockd,
and also that hostnames are not required to be dotted quads, right?

Thanks,

Philippe

On Sat, 5 Apr 2003 18:31:09 +0200
Bernd Schubert <[email protected]> wrote:

| Recently the self-created nfs-utils packages
| (with disabled --enable-secure-statd) got replaced by an automatic update.
| Some time afterwards our clients began to give those messages (here as
| dmesg-output):
|
| statd: server localhost not responding, timed out
| nsm_mon_unmon: rpc failed, status=-5
| lockd: cannot monitor <server IP>
| lockd: failed to monitor <server IP>



2003-04-06 12:16:27

by Trond Myklebust

Subject: Re: Problems with statd and lockd

>>>>> " " == Bernd Schubert <[email protected]> writes:

> statd: server localhost not responding, timed out
> nsm_mon_unmon: rpc failed, status=-5
> lockd: cannot monitor <server IP>
> lockd: failed to monitor <server IP>

> So on our systems not the server, but the clients are reporting
> those messages.

That is normal: lockd is a strange being. It has to run as a server on
both the NFS client and the NFS server.

> Apr 5 15:28:16 maxwell kernel: statd: server localhost not responding, timed out
> Apr 5 15:28:16 maxwell kernel: lockd: cannot monitor <server IP>
> Apr 5 15:28:16 maxwell kernel: lockd: failed to monitor <server IP>

> What other messages do you expect ?

If secure-statd was the cause of the problem, I would expect
additional messages of the form

"Call to statd from non-local host blah"

or

"Attempt to register callback to 1000xy/z"

or

"Attempt to register host blah (not a dotted quad)"

> As soon as I stop the debian-statd and start the new statd
> (that has secure-statd disabled), those messages disappear.
> Furthermore, some programs as e.g. KDE have some problems to
> run when those messages appear (our home directory is served by
> this server).

> What problems can be caused by a statd without the secure
> option ?

People could use your statd to make arbitrary RPC calls.

Cheers,
Trond
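
To check a log for the three kinds of message listed above, something like the following could be used. It is only a sketch: the patterns are taken from the paraphrased wording in this message, so the exact text emitted by a given statd version may differ, and the sample log lines (including the address 192.0.2.1 and the host name foo) are invented for illustration.

```python
import re

# Patterns paraphrased from the message above; real statd wording may differ.
SECURE_STATD_PATTERNS = [
    re.compile(r"Call to statd from non-local host"),
    re.compile(r"Attempt to register callback to"),
    re.compile(r"Attempt to register host .* \(not a dotted quad\)"),
]


def secure_statd_warnings(lines):
    """Return the log lines that look like secure-statd rejections."""
    return [line.rstrip("\n") for line in lines
            if any(p.search(line) for p in SECURE_STATD_PATTERNS)]


# Hypothetical log lines, invented for illustration.
sample = [
    "Apr  5 15:28:16 maxwell kernel: lockd: cannot monitor 192.0.2.1\n",
    "Apr  5 15:28:17 maxwell rpc.statd[321]: Attempt to register host foo"
    " (not a dotted quad)\n",
]
for hit in secure_statd_warnings(sample):
    print(hit)
```

An open log file can be passed in directly, since iterating over a file yields its lines. Which file to scan depends on the local syslog configuration.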



2003-04-06 13:09:43

by Bernd Schubert

Subject: Re: Problems with statd and lockd

>
> If secure-statd was the cause of the problem, I would expect
> additional messages of the form
>
> "Call to statd from non-local host blah"
>
> or
>
> "Attempt to register callback to 1000xy/z"
>
> or
>
> "Attempt to register host blah (not a dotted quad)"

No, we have never seen anything like this.

> > What problems can be caused by a statd without the secure
> > option ?
>
> People could use your statd to make arbitrary RPC calls.
>

Hmm, though we are protected by a firewall on most ports, I'd better add a
statd line to /etc/hosts.allow ;-)


Is there something I can do to debug this problem?


Best regards,
Bernd





2003-04-05 10:14:57

by Trond Myklebust

Subject: Re: Problems with statd and lockd

>>>>> " " == Philippe Gramoullé <Philippe> writes:

> We have filesystems that are exported sync, other are exported
> async.

> Can you tell me what the "-5" return code means ?

-5 == -EIO. For a soft RPC call it usually means that a timeout has
occurred. This would be consistent with the logged 'localhost not
responding'.

Cheers,
Trond
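
The mapping from the status value to an errno name can be checked from user space. A minimal sketch, assuming a Linux host (where errno 5 is EIO); the helper name describe_rpc_status is made up for this illustration and is not part of any NFS tool.

```python
import errno
import os


def describe_rpc_status(status):
    """Translate a negative kernel status such as -5 into (name, message)."""
    code = abs(status)
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, message = describe_rpc_status(-5)
print(f"status=-5 -> -{name}: {message}")
```

The kernel reports errors as negated errno values, which is why the log shows -5 rather than 5.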



2003-04-05 11:18:44

by Philippe Gramoullé

Subject: Re: Problems with statd and lockd


On 05 Apr 2003 12:14:48 +0200
Trond Myklebust <[email protected]> wrote:

| >>>>> " " == Philippe Gramoullé <Philippe> writes:
|
| > We have filesystems that are exported sync, other are exported
| > async.
|
| > Can you tell me what the "-5" return code means ?
|
| -5 == -EIO. For a soft RPC call it usually means that a timeout has
| occurred. This would be consistent with the logged 'localhost not
| responding'.

Ok.

And do you have an idea of what could cause such errors?

We've been using Linux NFS servers and clients for a long time and never had problems like this before.

The last kernel before this one was 2.4.19-pre6 (servers and clients).
Now clients and servers run 2.4.21-pre5.

Thanks

Philippe



2003-04-05 11:50:34

by Trond Myklebust

Subject: Re: Problems with statd and lockd

>>>>> " " == Philippe Gramoullé <Philippe> writes:


> And do you have an idea of what could cause such errors ?

The message just means what it says. The problem could be anything
from a temporary hang of statd or the portmapper right down to a deep
kernel bug.

Cheers,
Trond
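
Since a missing statd or portmapper registration is among the candidates, a first check is whether the status service is still registered with the portmapper. A rough sketch, assuming the rpcinfo utility is installed and that "rpcinfo -p" prints one registered program per row with the program number in the first column; 100024 is the RPC program number of the status service. The sample output is invented for illustration.

```python
import subprocess

STATD_PROGRAM = 100024  # RPC program number of the NSM status service (statd)


def registered_programs(rpcinfo_output):
    """Return the set of RPC program numbers listed in `rpcinfo -p` output."""
    programs = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric program number; the header does not.
        if fields and fields[0].isdigit():
            programs.add(int(fields[0]))
    return programs


def statd_is_registered(host="localhost"):
    """Ask the portmapper on `host` whether statd is registered."""
    result = subprocess.run(["rpcinfo", "-p", host],
                            capture_output=True, text=True, check=False)
    return STATD_PROGRAM in registered_programs(result.stdout)


# Hypothetical sample output, for illustration only.
SAMPLE = """\
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100021    1   udp  32769  nlockmgr
"""
print(STATD_PROGRAM in registered_programs(SAMPLE))  # statd absent from sample
```

Note that registration only shows that statd started at some point. A statd that is registered but hung would still pass this check, so a timeout like the one logged here cannot be ruled out this way.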



2003-04-05 12:05:51

by Philippe Gramoullé

Subject: Re: Problems with statd and lockd


On 05 Apr 2003 13:50:29 +0200
Trond Myklebust <[email protected]> wrote:

| The message just means what it says. The problem could be anything
| from a temporary hang of statd or the portmapper right down to a deep
| kernel bug.
| Trond

Ok,

Portmapper was up. It was statd and lockd that were down, but both had been started
correctly before the problem happened.

I'll try to reproduce the problem, as it happened on 4 different NFS servers.

I'll turn RPC debugging on and see if I can get something interesting.

Thanks,

Philippe
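
For reference, RPC and lockd debugging are usually toggled through sysctl files under /proc/sys/sunrpc. The sketch below assumes a kernel built with RPC debugging support, that the files rpc_debug and nlm_debug exist there, and that 32767 (0x7fff) enables all debug flags; the function name set_rpc_debug is hypothetical. The directory is a parameter so the function can be tried against a scratch directory first.

```python
import os

# sysctl files assumed to be exposed by the sunrpc module.
DEBUG_FILES = ("rpc_debug", "nlm_debug")
ALL_FLAGS = 32767  # 0x7fff, assumed to set every debug bit


def set_rpc_debug(enabled, sysctl_dir="/proc/sys/sunrpc"):
    """Write the debug mask to each file (needs root on a real system)."""
    value = ALL_FLAGS if enabled else 0
    written = []
    for name in DEBUG_FILES:
        path = os.path.join(sysctl_dir, name)
        with open(path, "w") as handle:
            handle.write(f"{value}\n")
        written.append(path)
    return written
```

Calling set_rpc_debug(True) before reproducing the problem and set_rpc_debug(False) afterwards keeps the extra kernel log volume limited to the test window.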


