2003-09-13 20:02:51

by Frans Pop

Subject: Re: Debian bug #165744 - 'Received erroneous SM_UNMON request'

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello Chip, Neil,

I think I may have found the source of this bug.
In Debian, rpc.statd is compiled with --enable-secure-statd; when I recompile
with this option disabled, the error messages disappear!

I tried this after I took a look at the source for rpc.statd. I noticed that
in monitor.c there is the statement
my_name = "127.0.0.1"
when servicing SM_MON requests with secure-statd enabled.
I think that in my setup this causes the lookups in the run-time list (which
use my_name) to fail when servicing SM_UNMON requests, because in my situation
the my_name in those requests is the real address of my boxes rather than
127.0.0.1. That would cause the 'Received erroneous SM_UNMON request' errors.
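The suspected mismatch can be illustrated with a minimal, self-contained C
sketch. It is not the nfs-utils code: struct mon_entry, run_list, sm_mon() and
sm_unmon() are hypothetical names, a plain strcmp() stands in for statd's
matchhostname(), and the `secure` flag simply models the reported
my_name = "127.0.0.1" assignment on the SM_MON path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one entry in statd's run-time list. */
struct mon_entry {
    const char *mon_name;  /* host being monitored */
    const char *my_name;   /* requester name recorded at SM_MON time */
    int prog, vers, proc;  /* callback RPC identifiers (placeholders) */
};

#define MAX_ENTRIES 8
static struct mon_entry run_list[MAX_ENTRIES];
static int n_entries;

/* SM_MON: store a new entry. When `secure` is set, my_name is replaced
 * by "127.0.0.1", modelling the assignment reported in monitor.c.
 * Returns 1 if stored, 0 if the list is full. */
static int sm_mon(const char *mon_name, const char *my_name,
                  int prog, int vers, int proc, int secure)
{
    if (n_entries == MAX_ENTRIES)
        return 0;
    if (secure)
        my_name = "127.0.0.1";
    run_list[n_entries++] =
        (struct mon_entry){ mon_name, my_name, prog, vers, proc };
    return 1;
}

/* SM_UNMON: remove the entry matching the request. strcmp() stands in
 * for matchhostname(). Returns 1 on a match, 0 (and logs) otherwise. */
static int sm_unmon(const char *mon_name, const char *my_name,
                    int prog, int vers, int proc)
{
    assert(mon_name != NULL && my_name != NULL);
    for (int i = 0; i < n_entries; i++) {
        const struct mon_entry *e = &run_list[i];
        if (strcmp(e->mon_name, mon_name) == 0 &&
            strcmp(e->my_name, my_name) == 0 &&
            e->prog == prog && e->vers == vers && e->proc == proc) {
            run_list[i] = run_list[--n_entries];  /* swap-remove the match */
            return 1;
        }
    }
    fprintf(stderr, "Received erroneous SM_UNMON request from %s for %s\n",
            my_name, mon_name);
    return 0;
}
```

With `secure` set, sm_mon("10.19.66.2", "galadriel", ...) followed by the
matching sm_unmon() call finds only an entry recorded as 127.0.0.1, so it logs
the error and returns 0; with `secure` cleared, the same pair of calls matches
and returns 1.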

I think this would also explain the 'notify_host: failed to notify 127.0.0.1'
errors. This second error occurs when I reboot the server while an NFS
connection is up: after restarting, the server tries to notify the client that
it is back up, but can't, because the client is not at localhost but at
10.19.66.21, and there is no NFS client running on the server itself.

I think the reason the error appears in my setup could be that I have NFS
kernel support compiled in on both the server and the clients. In Debian, this
means rpc.lockd is not run from the init scripts. (In /etc/init.d/nfs-common,
'grep -q lockdctl /proc/ksyms' returns 1, so NEED_LOCKD is set to 'no'.)
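The init-script check in parentheses can be sketched in C for readers less
familiar with it. This is a hypothetical illustration, not code from nfs-utils
or Debian: grep_q() and need_lockd() are invented names that mimic the exit
status of grep -q (0 = found, 1 = not found, 2 = file unreadable). Only the
mapping of status 1 to NEED_LOCKD='no' comes from the message above; treating
every other status as 'yes' is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mimics the exit status of `grep -q PATTERN FILE`:
 * 0 = pattern found, 1 = not found, 2 = file could not be opened.
 * Assumes each line fits in the buffer (true for /proc/ksyms entries). */
static int grep_q(const char *pattern, const char *path)
{
    assert(pattern != NULL && path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 2;
    char line[512];
    int status = 1;
    while (status == 1 && fgets(line, sizeof line, f) != NULL)
        if (strstr(line, pattern) != NULL)
            status = 0;
    fclose(f);
    return status;
}

/* Status 1 (no "lockdctl" symbol) -> NEED_LOCKD=no, as reported for
 * /etc/init.d/nfs-common. ASSUMPTION: any other status yields "yes". */
static const char *need_lockd(const char *ksyms_path)
{
    return grep_q("lockdctl", ksyms_path) == 1 ? "no" : "yes";
}
```

Calling need_lockd("/proc/ksyms") on a kernel whose symbol table lacks
lockdctl would return "no", matching the situation described above.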

I have (from ps alx):
F UID PID ... WCHAN STAT ... COMMAND
140 1 129 poll S /sbin/portmap
040 0 135 rpciod SW [rpciod]
040 0 136 svc_re SW [lockd]
140 0 214 select S /sbin/rpc.statd

I can easily reproduce the errors by restoring the original rpc.statd binary.
Also, the error occurs on all (well, both) of my NFS clients. The error occurs
very frequently when I am running with the original binaries.

Hope this information helps. I am willing to help in any way I can to solve
this bug.

Regards,

Frans Pop

P.S. I already sent a similar message to [email protected] (CC Neil) on
August 29, but as there has been no reply to date, I am trying again. I would
appreciate a reply.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/Y3fkgm/Kwh6ICoQRApboAJ4gqgBY/KRL3WZg6QNu+jnni/5xwQCfTc6z
6/R4stFNt+a8EnlXjOp3NI8=
=/CN2
-----END PGP SIGNATURE-----



-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-09-15 02:22:51

by Chip Salzenberg

Subject: Re: Debian bug #165744 - 'Received erroneous SM_UNMON request'

According to Frans Pop:
> In monitor.c there is the statement
> my_name = "127.0.0.1"
> when servicing SM_MON requests with secure-statd enabled.
> I think in my setup this causes the lookups in the run-time list (which use
> my_name) to fail when servicing SM_UNMON requests if in my situation my_name
> is equal to the real addresses of my boxes, causing the 'Received erroneous
> SM_UNMON request' errors.

Interesting!

Neil, I'd seen the assignment to my_name, but I figured it was just a
housekeeping variable for connections. But if it's used in a later
lookup, assigning to my_name could be a Bad Thing. I'm not confident
enough in my understanding of statd to evaluate this suggestion.

> I think this would also explain the 'notify_host: failed to notify 127.0.0.1'
> errors. This second error occurs when I reboot the server while a
> NFS-connection was up: after restarting the server tries to notify the client
> it is back up, but can't because the client is not at localhost but at
> 10.19.66.21 and there is no NFS-client running on the server itself.

You've lost me.

> I think the reason the error appears in my setup could be because I have NFS
> kernel-support compiled in both server and clients. In Debian, this means
> rpc.lockd is not run from the init scripts.

What does the timing of lockd have to do with it? I ask because that's
the only difference between a user-space lockd and a kernel-thread lockd:
the kernel thread only starts up when something might need it, while
the user-space one starts up at boot time.
--
Chip Salzenberg - a.k.a. - <[email protected]>
"I wanted to play hopscotch with the impenetrable mystery of existence,
but he stepped in a wormhole and had to go in early." // MST3K



2003-09-15 16:47:12

by Frans Pop

Subject: Re: Debian bug #165744 - 'Received erroneous SM_UNMON request'

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

And here is the proof!
I added a simple log statement to the monitor.c source where SM_UNMON is
handled.

=== SNIP FROM MONITOR.C ===
 * OK, we are. Now look for appropriate entry in run-time list.
 * There should only be *one* match on this, since I block "duplicate"
 * SM_MON calls. (Actually, duplicate calls are allowed, but only one
 * entry winds up in the list the way I'm currently handling them.)
 */
while ((clnt = nlist_gethost(clnt, mon_name, 0))) {
+++++ ADDED LINES +++++
    log(L_WARNING, "Result of hostname lookup: %s is %s",
        my_name, NL_MY_NAME(clnt));
+++++ ADDED LINES END +++++
    if (matchhostname(NL_MY_NAME(clnt), my_name) &&
        NL_MY_PROC(clnt) == id->my_proc &&
        NL_MY_PROG(clnt) == id->my_prog &&
        NL_MY_VERS(clnt) == id->my_vers) {
        /* Match! */
=== END OF SNIP FROM MONITOR.C ===

This results in the following entries in my log:
rpc.statd[3066]: Result of hostname lookup: galadriel is 127.0.0.1
rpc.statd[3066]: Received erroneous SM_UNMON request from galadriel for 10.19.66.2

On Monday 15 September 2003 04:23, Chip Salzenberg wrote:
> According to Frans Pop:
> > I think the reason the error appears in my setup could be because I have
> > NFS kernel-support compiled in both server and clients. In Debian, this
> > means rpc.lockd is not run from the init scripts.
>
> What does the timing of lockd have to do with it? Because that's the
> only difference between a user-space lockd and a kernel-thread lockd;
> the kernel thread only starts up when something might need it, while
> the user-space one starts up at boot time.

I'm not sure how this would affect things. I was just looking for an
explanation of why this problem has not been reported by many, many more
people who use NFS.

Frans Pop
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/Zezigm/Kwh6ICoQRAsBOAJ0aCpzc+i2BSx5DOccWRHFYeijclACgum3p
6ZlKRKULyVMcaL5LqQQj5GE=
=Zkmx
-----END PGP SIGNATURE-----


