2003-08-12 15:21:35

by Chip Salzenberg

[permalink] [raw]
Subject: [[email protected]: Debian Bug#203918 - statd request on eth interface, not localhost?]

HEY NEIL. HEY EVERYBODY ELSE TOO.

Could you please give me some reply to this issue? The beginning of
the bug report looks boring but it's actually weird. To me anyway. :-/


----- Forwarded message from Chip Salzenberg <[email protected]> -----

Subject: [NFS] Debian Bug#203918 - statd request on eth interface, not localhost?
From: Chip Salzenberg <[email protected]>
To: [email protected]
Cc:
Date: Mon, 4 Aug 2003 20:48:53 -0400

OK, guys, I've got an interesting bug here. The server kernel is
2.4.20, the client is 2.6.0 (I think test2). The first mount attempt
succeeds, and things seem to work. But unmount requests fail. And
there's a statd anomaly along the way.

The server is "natura" (213.96.69.115) and the client is "nubol" (IP
unknown).

Here's the /etc/exports:
> /var/cache/apt/archives *.int.oskuro.net(rw,async,no_root_squash)
> /home *.int.oskuro.net(rw,async,no_root_squash)

And here's the intriguing server log message:

> Aug 3 18:22:42 natura rpc.statd[467]: Call to statd from non-local host 213.96.69.115
> Aug 3 18:22:42 natura rpc.statd[467]: STAT_FAIL to natura for SM_MON of 192.168.1.3
>
> This is the first mount. If you remember, it worked ok the first time I
> apt-get'ed. 213.96.69.115 is the public ip of the nfs server.

So let's get this straight. The server's kernel is apparently sending
an SM_MON request to its own statd, but not using the loopback address,
but rather the public ethernet address.

Can I get a "Huh?!" from the congregation? How does that even happen?

Subsequent trouble is perhaps a result of the first problem, somehow?

> Bah, now it won't umount.
> Aug 3 18:29:58 nubol automount[457]: shutdown failed: filesystem still busy
>
> But nothing is using it.
> 65567:jordi@nubol:~$ sudo lsof |grep autofs
> zsh: done sudo lsof |
> zsh: exit 1 grep autofs
> 65568:jordi@nubol:~$ sudo umount /var/autofs/natura/archives
> umount: /var/autofs/natura/archives: device is busy
>
> So I can't umount right now. Now, it stays mounted, and it works, but
> it'd be nice if I could get rid of it too :)
>
> The other day, it would umount and I got the connection refused error.

adTHANKSvance for debugging info.
--
Chip Salzenberg - a.k.a. - <[email protected]>
"I wanted to play hopscotch with the impenetrable mystery of existence,
but he stepped in a wormhole and had to go in early." // MST3K


-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet_072303_01/01
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs



----- End forwarded message -----

--
Chip Salzenberg - a.k.a. - <[email protected]>
"I wanted to play hopscotch with the impenetrable mystery of existence,
but he stepped in a wormhole and had to go in early." // MST3K


-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet_072303_01/01
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-08-21 16:59:04

by Chip Salzenberg

[permalink] [raw]
Subject: Re: Debian bug #165744 - 'Received erroneous SM_UNMON request'

According to Neil Brown:
> The first is the "Call to statd from non-local host". Given that
> the in-kernel lockd *always* uses 127.0.0.1 to talk to statd, this
> request cannot have come from there.

Neil, would you be so kind as to examine this old (nearly a year now),
but still open, Debian bug report?

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=165744

This complaint *is* very rare, but it seems not to be specific to the
use of 2.6. Or do you think you know what's up with this old report?
Please advise.
--
Chip Salzenberg - a.k.a. - <[email protected]>
"I wanted to play hopscotch with the impenetrable mystery of existence,
but he stepped in a wormhole and had to go in early." // MST3K


-------------------------------------------------------
This SF.net email is sponsored by: VM Ware
With VMware you can run multiple operating systems on a single machine.
WITHOUT REBOOTING! Mix Linux / Windows / Novell virtual machines
at the same time. Free trial click here:http://www.vmware.com/wl/offer/358/0
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2003-08-21 07:58:36

by Jordi Mallach

[permalink] [raw]
Subject: Re: [[email protected]: Debian Bug#203918 - statd request on eth interface, not localhost?]

On Wed, Aug 20, 2003 at 01:59:51PM -0400, Chip Salzenberg wrote:
> So, what news?

News are that I can't compile 2.6.0-test3, so I have no more info :(

> If you're still getting invalid statd requests from an IP other than
> 127.0.0.1, they're not from the kernel. Can you use tcpdump to find
> out where the statd requests are coming from?

From the server's log:

Aug 20 02:02:24 natura rpc.statd[467]: Call to statd from non-local host 213.96.69.115
Aug 20 02:02:24 natura rpc.statd[467]: STAT_FAIL to natura for SM_MON of 192.168.1.3

So, yes.. I'm still getting those 2.5 minutes after the apt-get cronjob
starts.

I have never used tcpdump more than to play a little, so I could use
some pointers about what I should look at.

Thanks,
Jordi
--
Jordi Mallach P?rez -- Debian developer http://www.debian.org/
[email protected] [email protected] http://www.sindominio.net/
GnuPG public key information available at http://oskuro.net/~jordi/


Attachments:
(No filename) (963.00 B)
(No filename) (189.00 B)
Download all attachments

2003-08-21 18:16:11

by Chip Salzenberg

[permalink] [raw]
Subject: Re: [[email protected]: Debian Bug#203918 - statd request on eth interface, not localhost?]

According to Jordi Mallach:
> I told Chip I'd try to upgrade to the latest 2.6.0 kernel soon. If this
> makes it improve, I'll tell you.

So, what news?

If you're still getting invalid statd requests from an IP other than
127.0.0.1, they're not from the kernel. Can you use tcpdump to find
out where the statd requests are coming from?
--
Chip Salzenberg - a.k.a. - <[email protected]>
"I wanted to play hopscotch with the impenetrable mystery of existence,
but he stepped in a wormhole and had to go in early." // MST3K


-------------------------------------------------------
This SF.net email is sponsored by: VM Ware
With VMware you can run multiple operating systems on a single machine.
WITHOUT REBOOTING! Mix Linux / Windows / Novell virtual machines
at the same time. Free trial click here:http://www.vmware.com/wl/offer/358/0
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2003-08-22 06:26:36

by NeilBrown

[permalink] [raw]
Subject: Re: Re: Debian bug #165744 - 'Received erroneous SM_UNMON request'

On Wednesday August 20, [email protected] wrote:
> According to Neil Brown:
> > The first is the "Call to statd from non-local host". Given that
> > the in-kernel lockd *always* uses 127.0.0.1 to talk to statd, this
> > request cannot have come from there.
>
> Neil, would you be so kind as to examine this old (nearly a year now),
> but still open, Debian bug report?
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=165744
>
> This complaint *is* very rare, but it seems not to be specific to the
> use of 2.6. Or do you think you know what's up with this old report?
> Please advise.

Sorry, but I'm at a bit of a loss.

It seems that
lockd asked statd to monitor some host.
statd gets information that the host has restarted.
statd tries to notify lockd about the restart
statd fails to even send a message, and so forgets about that host
lockd asked statd to unmonitor the host, and statd doesn't know
what it is talking about

If you (or someone) could play around and find a sequence of events
that would reproduce the problem it would, of course, help a lot.

NeilBrown


-------------------------------------------------------
This SF.net email is sponsored by: VM Ware
With VMware you can run multiple operating systems on a single machine.
WITHOUT REBOOTING! Mix Linux / Windows / Novell virtual machines
at the same time. Free trial click here:http://www.vmware.com/wl/offer/358/0
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2003-08-22 15:11:11

by Chip Salzenberg

[permalink] [raw]
Subject: Re: Debian bug #165744 - 'Received erroneous SM_UNMON request'

Thanks for the analysis, Neil. I followed it up until this point:

> statd tries to notify lockd about the restart
> statd fails to even send a message, and so forgets about that host

Do you know that? It seems all we can know for sure about this part
of the history is that the message wasn't received, but for all we
know it may have been sent.
--
Chip Salzenberg - a.k.a. - <[email protected]>
"I wanted to play hopscotch with the impenetrable mystery of existence,
but he stepped in a wormhole and had to go in early." // MST3K


-------------------------------------------------------
This SF.net email is sponsored by: VM Ware
With VMware you can run multiple operating systems on a single machine.
WITHOUT REBOOTING! Mix Linux / Windows / Novell virtual machines
at the same time. Free trial click here:http://www.vmware.com/wl/offer/358/0
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2003-08-21 15:09:00

by Chip Salzenberg

[permalink] [raw]
Subject: Re: [[email protected]: Debian Bug#203918 - statd request on eth interface, not localhost?]

According to Jordi Mallach:
> I have never used tcpdump more than to play a little, so I could use
> some pointers about what I should look at.

First you use 'rpcinfo -p' to see which ports statd is listening to
on the machine where statd is complaining:

$ rpcinfo -p | grep status
100024 1 udp 936 status
100024 1 tcp 939 status

That's udp port 936 and tcp port 939. Then you want to watch all
traffic to/from those ports:

# tcpdump udp port 936 or tcp port 939

You may also need to specify "-i eth1" or whatever if the IP address
you're seeing isn't on your eth0.

Then you sit and watch. A combination of "screen" and "script" may
also be helpful. The man page for tcpdump, section "OUTPUT FORMAT",
explains how to interpret the output.

The key thing is the source IP and port of whoever's talking to statd.
Then you go to the source machine and use 'lsof' to see what program
has bound to that port. If the program is gone by the time you get
there, you'll need to prevent statd from answering the request, which
will entail compiling a custom statd or perhaps using iptables to kill
any outgoing answer packets.
--
Chip Salzenberg - a.k.a. - <[email protected]>
"I wanted to play hopscotch with the impenetrable mystery of existence,
but he stepped in a wormhole and had to go in early." // MST3K


-------------------------------------------------------
This SF.net email is sponsored by: VM Ware
With VMware you can run multiple operating systems on a single machine.
WITHOUT REBOOTING! Mix Linux / Windows / Novell virtual machines
at the same time. Free trial click here:http://www.vmware.com/wl/offer/358/0
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2003-08-22 19:38:23

by Jordi Mallach

[permalink] [raw]
Subject: Re: [[email protected]: Debian Bug#203918 - statd request on eth interface, not localhost?]

On Thu, Aug 21, 2003 at 09:58:26AM +0200, Jordi Mallach wrote:
> News are that I can't compile 2.6.0-test3, so I have no more info :(

I finally got test3 under control, rebooted into it and...

Aug 22 19:29:31 nubol automount[3476]: running expiration on path /var/autofs/natura/archives
Aug 22 19:29:31 nubol rpc.statd[1909]: Received erroneous SM_UNMON request from nubol for 192.168.1.1
Aug 22 19:29:31 nubol automount[3476]: expired /var/autofs/natura/archives

I still get the weird messages at the server though:

Aug 22 19:22:43 natura ippl: sunrpc connection attempt from 192.168.1.3
Aug 22 19:22:43 natura rpc.statd[1982]: Call to statd from non-local host 213.96.69.115
Aug 22 19:22:43 natura rpc.statd[1982]: STAT_FAIL to natura for SM_MON of 192.168.1.3

So, the original problem of "busy mounts" has gone, apparently it was a
bug in -test2 fixed in -test3. I still get these logs that seem to make
you wonder a bit ;) but in general, my setup works ok again.

Jordi
--
Jordi Mallach P?rez -- Debian developer http://www.debian.org/
[email protected] [email protected] http://www.sindominio.net/
GnuPG public key information available at http://oskuro.net/~jordi/


Attachments:
(No filename) (1.17 kB)
(No filename) (189.00 B)
Download all attachments

2003-08-25 06:38:06

by NeilBrown

[permalink] [raw]
Subject: Re: Re: Debian bug #165744 - 'Received erroneous SM_UNMON request'

On Friday August 22, [email protected] wrote:
> Thanks for the analysis, Neil. I followed it up until this point:
>
> > statd tries to notify lockd about the restart
> > statd fails to even send a message, and so forgets about that host
>
> Do you know that? It seems all we can know for sure about this part
> of the history is that the message wasn't received, but for all we
> know it may have been sent.

The

elrond rpc.statd[179]: notify_host: failed to notify 127.0.0.1

messages indicate that the "sendto" in utils/statd/rmtcall.c(xmit_call)
failed, which suggests that the message wasn't sent.
It might actually be sending a port-lookup call to portmap at this
point. It is certainly odd that that fails....

NeilBrown


-------------------------------------------------------
This SF.net email is sponsored by: VM Ware
With VMware you can run multiple operating systems on a single machine.
WITHOUT REBOOTING! Mix Linux / Windows / Novell virtual machines
at the same time. Free trial click here:http://www.vmware.com/wl/offer/358/0
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2003-08-13 04:07:35

by NeilBrown

[permalink] [raw]
Subject: Re: [[email protected]: Debian Bug#203918 - statd request on eth interface, not localhost?]

On Tuesday August 12, [email protected] wrote:
> HEY NEIL. HEY EVERYBODY ELSE TOO.
>
> Could you please give me some reply to this issue? The beginning of
> the bug report looks boring but it's actually weird. To me anyway. :-/
>

There are two issues here which are possibly quite separate.

The first is the "Call to statd from non-local host".
Given that the in-kernel lockd *always* uses 127.0.0.1 to talk to
statd, this request cannot have come from there.
So it must have come from elsewhere.
Finding out where would not be easy.
You would want to modify statd so that it reports the source port
number as well as ip address, and then doesn't send a reply (so the
source will keep waiting).
Then cause the problem to re-appear and use lsof to hunt around and
find who has the offending port open.

I don't suppose there is any chance that a user-space nfsd is running
as well as the kernel one?

The second problem relates to the filesystem not being unmountable.
This could be the bug that Mark Hemmet recently reported. Look for
the subject:
"lockd fails to purge blocked NLM_LOCKs"

in the archives of this list. Or it could be some simlar bug that
hasn't yet been found.
Without a "how-to-reproduce" it is hard to know.

NeilBrown


-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet_072303_01/01
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2003-08-13 08:22:24

by Jordi Mallach

[permalink] [raw]
Subject: Re: [[email protected]: Debian Bug#203918 - statd request on eth interface, not localhost?]

On Wed, Aug 13, 2003 at 02:06:43PM +1000, Neil Brown wrote:
> I don't suppose there is any chance that a user-space nfsd is running
> as well as the kernel one?

I am 100% sure that I only have ther kernel daemon installed.

I told Chip I'd try to upgrade to the latest 2.6.0 kernel soon. If this
makes it improve, I'll tell you.

Jordi
--
Jordi Mallach P?rez -- Debian developer http://www.debian.org/
[email protected] [email protected] http://www.sindominio.net/
GnuPG public key information available at http://oskuro.net/~jordi/


Attachments:
(No filename) (551.00 B)
(No filename) (189.00 B)
Download all attachments