2003-07-27 16:31:59

by Chip Salzenberg

Subject: Debian Bug#203077: Locks not released on NFS client reboot

I confess I don't quite understand this bug report. Is this person
asking for something that NFS can't do, or is there perhaps some error
in configuration ... ?

Please advise.

The report is against Debian stable (woody), which uses nfs-utils 1.02.

According to Nick Nassar:
> File locks on NFS clients are not released on reboot. When machines shut
> down unexpectedly for whatever reason, the locks stay there with no
> apparent way to clear them except restarting NFS on the server.
>
> See "I'm having a lock file problem. What do I do?" in
> http://www.gnome.org/projects/gconf/ for an example of the kind of problem
> caused by this.
>
> The machine that I'm specifically having the trouble with is using a stock
> 2.4.20 kernel that I compiled from source.


--
Chip Salzenberg - a.k.a. - <[email protected]>
"I wanted to play hopscotch with the impenetrable mystery of existence,
but he stepped in a wormhole and had to go in early." // MST3K


-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet_072303_01/01
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-07-28 00:56:16

by NeilBrown

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Sunday July 27, [email protected] wrote:
> I confess I don't quite understand this bug report. Is this person
> asking for something that NFS can't do, or is there perhaps some error
> in configuration ... ?
>
> Please advise.

This is something that NFS *can* do, so it's either a config error or
a source-code error...

When the server gets a lock request from a client, it asks the local
statd to monitor that client.
The server-statd contacts the client-statd and tells it that it wants
to know about reboots. The client-statd records the address of the
server in /var/lib/nfs/sm/ as a new empty file.
Only if this succeeds does the server grant the lock.

When the client reboots, statd will be restarted and will move all
files from .../sm/ to .../sm.bak/ and will then iterate through those
files sending a message to statd on the relevant servers telling them
that a reboot has happened.
The server-statd will, on receipt of this message, tell the
server-lockd that the client has rebooted, and lockd will release all
the locks.
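
The flow described above can be sketched as a toy model. This is only an illustration: the class and method names are hypothetical, and the real statd and lockd exchange RPC messages rather than calling each other directly. It shows why locks stay behind when the reboot notification cannot reach the server.

```python
# Toy model of the monitor / reboot-notify flow. All names are hypothetical.

class Server:
    def __init__(self, name):
        self.name = name
        self.locks = {}          # path -> name of the client holding the lock

    def lock(self, client, path):
        # Grant the lock only if the client's statd agreed to monitor us.
        if path in self.locks:
            return False
        if not client.monitor(self.name):
            return False
        self.locks[path] = client.name
        return True

    def notify_reboot(self, client_name):
        # statd -> lockd: the client rebooted, so drop all of its locks.
        self.locks = {p: c for p, c in self.locks.items() if c != client_name}


class Client:
    def __init__(self, name, statd_reachable=True):
        self.name = name
        self.statd_reachable = statd_reachable
        self.sm = set()          # stands in for /var/lib/nfs/sm/

    def monitor(self, server_name):
        if not self.statd_reachable:
            return False
        self.sm.add(server_name)
        return True

    def reboot(self, servers, can_reach_servers=True):
        # Move sm/ to sm.bak/, then notify every server recorded there.
        sm_bak, self.sm = self.sm, set()
        if can_reach_servers:
            for server_name in sm_bak:
                servers[server_name].notify_reboot(self.name)
```

With `can_reach_servers=False` the server never hears about the reboot and keeps the lock, which matches the symptom in the bug report.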

I just tried this and it worked (2.4.19/21 kernels and nfs-utils 1.0.5)
so it doesn't seem to be a source-code error.

The most likely config error is not running statd on the client,
i.e. not having nfs-common installed. But given that the bug was
reported against nfs-common, that seems unlikely in this case.

The next most likely would be tcpwrappers problems.
I notice that the man page for statd says that "You have to give the
clients access to rpc.statd", but of course you need to give the server
access to statd on the clients as well.

In order for the server to allow a lock from the client, statd on
the client must allow access from the server.
In order for the client to be able to revoke locks on reboot, statd on
the server must allow access from the client.

It is possible that the problem is caused by the server not allowing
statd requests from the client.

Looking at the tcp_wrapper stuff used by statd, it looks rather bogus(*),
though I think it is more likely to give away access that it shouldn't
rather than restrict access that it should grant.

Anyway, I suggest that the person having the problem tries:

rpcinfo -u SERVERNAME status
from the client, and

rpcinfo -u CLIENTNAME status
from the server
and checks that both work. If either doesn't, I suspect that is the
problem. If they both work... I don't know.
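
For anyone who wants to script the two checks, a minimal sketch follows. The function names are hypothetical, and it assumes that rpcinfo is on the PATH and exits with a non-zero status when the call fails.

```python
import subprocess

def statd_check_command(target_host):
    # `rpcinfo -u HOST status` looks up the statd ("status") service via
    # HOST's portmapper and calls its null procedure over UDP.
    return ["rpcinfo", "-u", target_host, "status"]

def statd_reachable(target_host, timeout=30):
    """Return True if the rpcinfo check against target_host succeeds."""
    try:
        result = subprocess.run(
            statd_check_command(target_host),
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # rpcinfo missing, or the call hung: treat both as "not reachable".
        return False
    return result.returncode == 0
```

Run `statd_reachable("SERVERNAME")` on the client and `statd_reachable("CLIENTNAME")` on the server; both need to return True.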


(*) good_client in tcp_wrapper.c calls hosts_ctl twice, once with IP
address and once with hostname. If the first succeeds, the second
isn't tried.

So if my hosts.deny says that a specific hostname is restricted, but
doesn't say the IP address is restricted, then access is granted.
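
The short-circuit described in the footnote can be illustrated with a small model. This is not the code from tcp_wrapper.c: `hosts_ctl` here is a stand-in that only consults a set of denied keys, and the addresses and hostnames are made up.

```python
def hosts_ctl(deny_list, key):
    """Stand-in for the tcp_wrappers check: True means access is allowed."""
    return key not in deny_list

def good_client_as_described(deny_list, ip_addr, hostname):
    # Check by IP address first; if that passes, the hostname is never tried.
    if hosts_ctl(deny_list, ip_addr):
        return True
    return hosts_ctl(deny_list, hostname)

def good_client_strict(deny_list, ip_addr, hostname):
    # One possible alternative: require both checks to pass.
    return hosts_ctl(deny_list, ip_addr) and hosts_ctl(deny_list, hostname)
```

With a deny list that names only the hostname, the first variant grants access and the second refuses it.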

NeilBrown



2005-01-14 01:51:25

by Ara.T.Howard

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Mon, 28 Jul 2003, Neil Brown wrote:

> On Sunday July 27, [email protected] wrote:
>> I confess I don't quite understand this bug report. Is this person
>> asking for something that NFS can't do, or is there perhaps some error
>> in configuration ... ?
>>
>> Please advise.
>

SNIP

>
> Anyway, I suggest that the person having the problem tries:
>
> rpcinfo -u SERVERNAME status
> from the client, and
>
> rpcinfo -u CLIENTNAME status
> from the server
> and checks that both work. If either doesn't, I suspect that is the
> problem. If they both work... I don't know.

SNIP


i am seeing problems here on my system (which has rebooted and now has stale
locks on server)

client:

bligh:~ > rpcinfo -u mussel status
program 100024 version 1 ready and waiting

server:

mussel:~ > rpcinfo -u bligh status
rpcinfo: RPC: Port mapper failure - RPC: Unable to receive
program 100024 is not available

the client rebooted into a new kernel (latest enterprise) and the server has
not. could this cause this problem? if not what other info should i be
looking for.

not sure if this is helpful but:

client:

bligh:~ > rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  32768  status
    100024    1   tcp  32768  status
    100021    1   udp  32769  nlockmgr
    100021    3   udp  32769  nlockmgr
    100021    4   udp  32769  nlockmgr
    100021    1   tcp  32769  nlockmgr
    100021    3   tcp  32769  nlockmgr
    100021    4   tcp  32769  nlockmgr


server:

mussel:~ > rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  32768  status
    100024    1   tcp  32768  status
    100011    1   udp    768  rquotad
    100011    2   udp    768  rquotad
    100011    1   tcp    771  rquotad
    100011    2   tcp    771  rquotad
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100021    1   udp  33551  nlockmgr
    100021    3   udp  33551  nlockmgr
    100021    4   udp  33551  nlockmgr
    100021    1   tcp  37939  nlockmgr
    100021    3   tcp  37939  nlockmgr
    100021    4   tcp  37939  nlockmgr
    100005    1   udp    784  mountd
    100005    1   tcp    787  mountd
    100005    2   udp    784  mountd
    100005    2   tcp    787  mountd
    100005    3   udp    784  mountd
    100005    3   tcp    787  mountd


thanks in advance for any help.

kind regards.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself. --Shunryu Suzuki
===============================================================================



2005-01-14 02:57:30

by Ara.T.Howard

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Fri, 14 Jan 2005, Neil Brown wrote:

> On Thursday January 13, [email protected] wrote:
>>
>> i am seeing problems here on my system (which has rebooted and now has stale
>> locks on server)
>>
> ..
>> server:
>>
>> mussel:~ > rpcinfo -u bligh status
>> rpcinfo: RPC: Port mapper failure - RPC: Unable to receive
>> program 100024 is not available
> ...
>>
>> client:
>>
>> bligh:~ > rpcinfo -p
>> program vers proto port
>> 100000 2 tcp 111 portmapper
>> 100000 2 udp 111 portmapper
>> 100024 1 udp 32768 status
>> 100024 1 tcp 32768 status
> ...
>
>
> So bligh, the client, is running statd (the "status" service), but
> mussel can not talk to it. This is a problem.
>
> It would appear that some form of firewall is blocking access to
> bligh's statd from mussel, or that bligh's statd is ignoring requests
> from mussel. I don't know which.
>
> NeilBrown

alright - i'll look into this. as you might have guessed, i'm the developer
and not the sysad so i can't do much at the moment. i'm guessing you are
correct on the former count. i surely wasn't told about any changes but that
has been known to happen before. in addition to that government security
policies get stricter and stricter and redhat could have done something in the
new kernel that is 'safer'. i'll investigate and get back to you.

thanks tons for the lead... i'll post more tomorrow.

-a

2005-01-14 03:02:23

by Trond Myklebust

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Thu, 13.01.2005 at 18:51 (-0700), Ara.T.Howard wrote:

> server:
>
> mussel:~ > rpcinfo -u bligh status
> rpcinfo: RPC: Port mapper failure - RPC: Unable to receive
> program 100024 is not available

If the portmapper on the client is not responding to the server (as the
above error message appears to indicate) then that would explain your
problem.

Cheers,
Trond

--
Trond Myklebust <[email protected]>




2005-01-14 02:29:55

by NeilBrown

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Thursday January 13, [email protected] wrote:
>
> i am seeing problems here on my system (which has rebooted and now has stale
> locks on server)
>
..
> server:
>
> mussel:~ > rpcinfo -u bligh status
> rpcinfo: RPC: Port mapper failure - RPC: Unable to receive
> program 100024 is not available
...
>
> client:
>
> bligh:~ > rpcinfo -p
> program vers proto port
> 100000 2 tcp 111 portmapper
> 100000 2 udp 111 portmapper
> 100024 1 udp 32768 status
> 100024 1 tcp 32768 status
...


So bligh, the client, is running statd (the "status" service), but
mussel can not talk to it. This is a problem.

It would appear that some form of firewall is blocking access to
bligh's statd from mussel, or that bligh's statd is ignoring requests
from mussel. I don't know which.

NeilBrown



2005-01-14 14:54:06

by Ara.T.Howard

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Thu, 13 Jan 2005, Trond Myklebust wrote:

> On Thu, 13.01.2005 at 18:51 (-0700), Ara.T.Howard wrote:
>
>> server:
>>
>> mussel:~ > rpcinfo -u bligh status
>> rpcinfo: RPC: Port mapper failure - RPC: Unable to receive
>> program 100024 is not available
>
> If the portmapper on the client is not responding to the server (as the
> above error message appears to indicate) then that would explain your
> problem.

here's the thing - nothing has changed except a kernel upgrade on the client -
it rebooted to a new kernel and now the locks are stale. client is latest
enterprise and server is the next latest enterprise since it has not (yet)
been rebooted. i'm waiting for our sysads to get here to work on it... i'd
reboot now except i'm afraid i'll lose any debugging state...

-a

2005-01-14 17:47:31

by Dan Stromberg

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Fri, 2005-01-14 at 13:29 +1100, Neil Brown wrote:
> On Thursday January 13, [email protected] wrote:
> >
> > i am seeing problems here on my system (which has rebooted and now has stale
> > locks on server)
> >
> ..
> > server:
> >
> > mussel:~ > rpcinfo -u bligh status
> > rpcinfo: RPC: Port mapper failure - RPC: Unable to receive
> > program 100024 is not available
> ...
> >
> > client:
> >
> > bligh:~ > rpcinfo -p
> > program vers proto port
> > 100000 2 tcp 111 portmapper
> > 100000 2 udp 111 portmapper
> > 100024 1 udp 32768 status
> > 100024 1 tcp 32768 status
> ...
>
>
> So bligh, the client, is running statd (the "status" service), but
> mussel can not talk to it. This is a problem.
>
> It would appear that some form of firewall is blocking access to
> bligh's statd from mussel, or that bligh's statd is ignoring requests
> from mussel. I don't know which.
>
> NeilBrown

I'm actually seeing a lot of problems on *ix systems where a service is
registered, but then the corresponding daemon doesn't actually service
requests.

My rpc-health script allowed me to identify a lot of such problems
fairly quickly:

http://dcs.nac.uci.edu/~strombrg/rpc-health.html

...so I guess the upshot is "It isn't necessarily a firewall problem".
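
The rpc-health script itself is not reproduced in this thread. The following is only a guess at the general idea (list what a host's portmapper has registered, then probe each entry), not Dan's actual code; the function names are hypothetical.

```python
def parse_rpcinfo_p(text):
    """Parse `rpcinfo -p` output into (program, version, proto, name) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not a 5-column data row.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        program, version, proto, _port, name = fields
        entries.append((int(program), int(version), proto, name))
    return entries

def probe_command(host, program, version, proto):
    # rpcinfo -t / -u call procedure 0 of the program over TCP / UDP.
    flag = "-t" if proto == "tcp" else "-u"
    return ["rpcinfo", flag, host, str(program), str(version)]
```

A registered service whose probe fails is the "registered but not answering" case described above.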



2005-01-14 16:05:39

by Ara.T.Howard

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Fri, 14 Jan 2005, Neil Brown wrote:

> So bligh, the client, is running statd (the "status" service), but mussel
> can not talk to it. This is a problem.

are you saying inbound rpc traffic flowing from server -> client MUST not be
blocked by the firewall and that it is NOT sufficient to allow ONLY inbound
rpc traffic client -> server? sorry if this does not make sense - i'm a bit
out of my domain here...

> It would appear that some form of firewall is blocking access to bligh's
> statd from mussel, or that bligh's statd is ignoring requests from mussel.
> I don't know which.

does that fit with this scenario:

- after reboot client/server have stale locks

- oddly enough though, locking DOES work between client and server

the reason it works (even on the files with stale locks) is that i have built
in my own 'leasing' system to all the files i lock. it basically does

  if get_lock
    refresher = forked_process_touching_file_at_interval
    at_exit{ release_lock_and_kill_refresher }
  else
    if lock_is_too_old
      mv file file.tmp && mv file.tmp file
    end
    retry
  end

although it's quite a bit smarter than that (for instance it uses an nfs safe
lockfile to ensure only one node could attempt lock recovery at a time).

this seems to work because it gives the file a new inode and, therefore, the
stale lock is invalidated - though it obviously still exists.
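
The lock_is_too_old test in the pseudocode could look roughly like the sketch below. It assumes the lease is tracked through the lock file's mtime, which the refresher keeps current; the function names and the age threshold are hypothetical, and on NFS the result also depends on the clocks of client and server agreeing.

```python
import os
import time

def refresh_lease(path):
    # What the refresher process would do at each interval: touch the file.
    os.utime(path, None)

def lock_is_too_old(path, max_age_seconds, now=None):
    """Return True if path has not been touched within max_age_seconds."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime > max_age_seconds
```

A holder that dies without releasing the lock stops refreshing, so after max_age_seconds another node may treat the lock as stale.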

whenever i attempt this procedure - which is admittedly pretty sketchy - i
send emails to myself detailing the file in question (stale lock), its inode,
etc. i have only ever seen this happen one time in 8 months and that was
during brutal testing that did a bunch of kill -9's on things. that was
before yesterday - yesterday ALL my processes ran this procedure and this is
how i came to know that the system was fubar.

so, in summary, does your understanding indicate that it should be possible
for locks themselves to work but lock recovery to fail? is that consistent
with some sort of firewall mis-config between server and client? eg. is
the traffic pattern required different for the two?

many thanks for the insight!

kind regards.

-a

2005-01-14 18:20:14

by Ara.T.Howard

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Fri, 14 Jan 2005, Dan Stromberg wrote:

> On Fri, 2005-01-14 at 13:29 +1100, Neil Brown wrote:
>> On Thursday January 13, [email protected] wrote:
>>>
>>> i am seeing problems here on my system (which has rebooted and now has stale
>>> locks on server)
>>>
>> ..
>>> server:
>>>
>>> mussel:~ > rpcinfo -u bligh status
>>> rpcinfo: RPC: Port mapper failure - RPC: Unable to receive
>>> program 100024 is not available
>> ...
>>>
>>> client:
>>>
>>> bligh:~ > rpcinfo -p
>>> program vers proto port
>>> 100000 2 tcp 111 portmapper
>>> 100000 2 udp 111 portmapper
>>> 100024 1 udp 32768 status
>>> 100024 1 tcp 32768 status
>> ...
>>
>>
>> So bligh, the client, is running statd (the "status" service), but
>> mussel can not talk to it. This is a problem.
>>
>> It would appear that some form of firewall is blocking access to
>> bligh's statd from mussel, or that bligh's statd is ignoring requests
>> from mussel. I don't know which.
>>
>> NeilBrown
>
> I'm actually seeing a lot of problems on *ix systems where a service is
> registered, but then the corresponding daemon doesn't actually service
> requests.
>
> My rpc-health script allowed me to identify a lot of such problems
> fairly quickly:
>
> http://dcs.nac.uci.edu/~strombrg/rpc-health.html
>
> ...so I guess the upshot is "It isn't necessarily a firewall problem".

nice!

it is showing (mussel=server, bligh=client) :

mussel:

~ > ./rpc-health bligh
rpcinfo: can't contact portmapper: RPC: Remote system error - No route to
host

bligh:

~ > ./rpc-health mussel
Program portmapper/100000, Proto tcp, Version 2 is OK
Program portmapper/100000, Proto udp, Version 2 is OK
Program status/100024, Proto udp, Version 1 is OK
Program status/100024, Proto tcp, Version 1 is BAD <========
Program rquotad/100011, Proto udp, Version 1 is OK
Program rquotad/100011, Proto udp, Version 2 is OK
Program rquotad/100011, Proto tcp, Version 1 is BAD <========
Program rquotad/100011, Proto tcp, Version 2 is BAD <========
Program nfs/100003, Proto udp, Version 2 is OK
Program nfs/100003, Proto udp, Version 3 is OK
Program nfs/100003, Proto tcp, Version 2 is OK
Program nfs/100003, Proto tcp, Version 3 is OK
Program nlockmgr/100021, Proto udp, Version 1 is OK
Program nlockmgr/100021, Proto udp, Version 3 is OK
Program nlockmgr/100021, Proto udp, Version 4 is OK
Program nlockmgr/100021, Proto tcp, Version 1 is BAD <========
Program nlockmgr/100021, Proto tcp, Version 3 is BAD <========
Program nlockmgr/100021, Proto tcp, Version 4 is BAD <========
Program mountd/100005, Proto udp, Version 1 is OK
Program mountd/100005, Proto tcp, Version 1 is BAD <========
Program mountd/100005, Proto udp, Version 2 is OK
Program mountd/100005, Proto tcp, Version 2 is BAD <========
Program mountd/100005, Proto udp, Version 3 is OK
Program mountd/100005, Proto tcp, Version 3 is BAD <========

so apparently our system is severely misconfigured! i'm guessing all the BAD's
for tcp are o.k. but that the 'no route to host' is not a good thing. sound
accurate?

btw. here is a small patch:

[ahoward@mussel ahoward]$ diff -u rpc-health.org rpc-health
--- rpc-health.org 2005-01-06 17:48:53.000000000 -0700
+++ rpc-health 2005-01-14 11:11:20.000000000 -0700
@@ -1,7 +1,9 @@
-#!/dcs/bin/bash2
+#!/usr/bin/env bash
 
 #set -x
 
+PATH=$PATH:/usr/sbin:sbin # for rpcinfo
+
 function usage
 {
 echo Usage "$0" hostname 1>&2


kind regards.

-a

2005-01-14 21:46:48

by Lever, Charles

Subject: RE: Debian Bug#203077: Locks not released on NFS client reboot

> >> i poked around the nfs-faq and how-to and didn't see this... if i
> >> didn't miss it (quite possible) may i suggest it be added?
> >
> > so far this has not been a common issue, but i will consider it.
>
> great. it may not be common but it is, for a developer,
> __extremely__ difficult to debug. i guarantee every
> government lab will have this issue without knowing it
> because of increased security policies.

general question, then: what can we add to the client or server
implementation to make it easier to diagnose?



2005-01-14 19:43:59

by Trond Myklebust

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Fri, 14.01.2005 at 09:05 (-0700), Ara.T.Howard wrote:
> On Fri, 14 Jan 2005, Neil Brown wrote:
>
> > So bligh, the client, is running statd (the "status" service), but mussel
> > can not talk to it. This is a problem.
>
> are you saying inbound rpc traffic flowing from server -> client MUST not be
> blocked by the firewall and that it is NOT sufficient to allow ONLY inbound
> rpc traffic client -> server? sorry if this does not make sense - i'm a bit
> out of my domain here...

Bi-directional RPC traffic must be allowed if you plan on using NLM
locking, since it is callback based. A couple of issues that immediately
spring to mind in the case where the server cannot call the client are:

- Blocking locks (F_SETLKW) will be hampered since the client
expects the server's lockd daemon to call it back as soon as any
conflicting locks have been released and the lock granted...

- Server reboot recovery will be broken, since the server's
rpc.statd daemon will be incapable of notifying the clients that
their locks have been lost and need to be recovered.
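
For reference, the kind of blocking lock request discussed above looks like this from an application. It is a minimal sketch with a hypothetical helper name; Python's fcntl.lockf() without LOCK_NB waits until the lock is granted, which corresponds to F_SETLKW.

```python
import fcntl
import os

def with_blocking_lock(path, action):
    """Take an exclusive lock on path, waiting if necessary, then run action."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks here until no conflicting lock is held.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        try:
            return action()
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

On an NFS mount the wait relies on the server being able to call the client back, as described in the first point above.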

Cheers,
Trond
--
Trond Myklebust <[email protected]>




2005-01-14 19:50:51

by Ara.T.Howard

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Fri, 14 Jan 2005, Trond Myklebust wrote:

> On Fri, 14.01.2005 at 09:05 (-0700), Ara.T.Howard wrote:
>> On Fri, 14 Jan 2005, Neil Brown wrote:
>>
>>> So bligh, the client, is running statd (the "status" service), but mussel
>>> can not talk to it. This is a problem.
>>
>> are you saying inbound rpc traffic flowing from server -> client MUST not be
>> blocked by the firewall and that it is NOT sufficient to allow ONLY inbound
>> rpc traffic client -> server? sorry if this does not make sense - i'm a bit
>> out of my domain here...
>
> Bi-directional RPC traffic must be allowed if you plan on using NLM
> locking, since it is callback based. A couple of issues that immediately
> spring to mind in the case where the server cannot call the client are:
>
> - Blocking locks (F_SETLKW) will be hampered since the client
> expects the server's lockd daemon to call it back as soon as any
> conflicting locks have been released and the lock granted...
>
> - Server reboot recovery will be broken, since the server's
> rpc.statd daemon will be incapable of notifying the clients that
> their locks have been lost and need to be recovered.
>
> Cheers,
> Trond
> --
> Trond Myklebust <[email protected]>

thanks trond!

i don't know if you remember - but i was complaining about F_SETLKW performance
a while back. ;-) sounds like this IS the problem...

i poked around the nfs-faq and how-to and didn't see this... if i didn't miss
it (quite possible) may i suggest it be added? i certainly will volunteer but
am unsure of the procedure.

kind regards.

-a

2005-01-14 21:21:14

by Dan Stromberg

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Fri, 2005-01-14 at 11:19 -0700, Ara.T.Howard wrote:

> nice!

Why thank you. :)

> it is showing (mussel=server, bligh=client) :
>
> mussel:
>
> ~ > ./rpc-health bligh
> rpcinfo: can't contact portmapper: RPC: Remote system error - No route to
> host

That sounds like your portmapper is blocked somehow.

> bligh:
>
> ~ > ./rpc-health mussel
> Program portmapper/100000, Proto tcp, Version 2 is OK
> Program portmapper/100000, Proto udp, Version 2 is OK
> Program status/100024, Proto udp, Version 1 is OK
> Program status/100024, Proto tcp, Version 1 is BAD <========
> Program rquotad/100011, Proto udp, Version 1 is OK
> Program rquotad/100011, Proto udp, Version 2 is OK
> Program rquotad/100011, Proto tcp, Version 1 is BAD <========
> Program rquotad/100011, Proto tcp, Version 2 is BAD <========
> Program nfs/100003, Proto udp, Version 2 is OK
> Program nfs/100003, Proto udp, Version 3 is OK
> Program nfs/100003, Proto tcp, Version 2 is OK
> Program nfs/100003, Proto tcp, Version 3 is OK
> Program nlockmgr/100021, Proto udp, Version 1 is OK
> Program nlockmgr/100021, Proto udp, Version 3 is OK
> Program nlockmgr/100021, Proto udp, Version 4 is OK
> Program nlockmgr/100021, Proto tcp, Version 1 is BAD <========
> Program nlockmgr/100021, Proto tcp, Version 3 is BAD <========
> Program nlockmgr/100021, Proto tcp, Version 4 is BAD <========
> Program mountd/100005, Proto udp, Version 1 is OK
> Program mountd/100005, Proto tcp, Version 1 is BAD <========
> Program mountd/100005, Proto udp, Version 2 is OK
> Program mountd/100005, Proto tcp, Version 2 is BAD <========
> Program mountd/100005, Proto udp, Version 3 is OK
> Program mountd/100005, Proto tcp, Version 3 is BAD <========
>
> so apparently our system is severely misconfigured! i'm guessing all the BAD's
> for tcp are o.k. but that the 'no route to host' is not a good thing. sound
> accurate?

Those bads are probably instances of services that are registered, but
do not respond to a minimalist, "ping like" RPC procedure. They may
actually be troublesome. Or not. :)

> btw. here is a small patch:

Thanks. I've incorporated something along these lines now.




2005-01-14 21:24:04

by Lever, Charles

Subject: RE: Debian Bug#203077: Locks not released on NFS client reboot

> i poked around the nfs-faq and how-to and didn't see this...
> if i didn't miss it (quite possible) may i suggest it be added?

so far this has not been a common issue, but i will consider it.



2005-01-14 21:32:44

by Ara.T.Howard

Subject: RE: Debian Bug#203077: Locks not released on NFS client reboot

On Fri, 14 Jan 2005, Lever, Charles wrote:

>> i poked around the nfs-faq and how-to and didn't see this...
>> if i didn't miss it (quite possible) may i suggest it be added?
>
> so far this has not been a common issue, but i will consider it.

great. it may not be common but it is, for a developer, __extremely__
difficult to debug. i guarantee every government lab will have this issue
without knowing it because of increased security policies.

cheers.

-a

2005-01-15 05:44:06

by Ara.T.Howard

Subject: RE: Debian Bug#203077: Locks not released on NFS client reboot

On Fri, 14 Jan 2005, Lever, Charles wrote:

>>>> i poked around the nfs-faq and how-to and didn't see this... if i
>>>> didn't miss it (quite possible) may i suggest it be added?
>>>
>>> so far this has not been a common issue, but i will consider it.
>>
>> great. it may not be common but it is, for a developer,
>> __extremely__ difficult to debug. i guarantee every
>> government lab will have this issue without knowing it
>> because of increased security policies.
>
> general question, then: what can we add to the client or server
> implementation to make it easier to diagnose?

the hostname of the node holding a lock would be awesome. unless i am
mistaken (normally ;-)) fcntl will only tell you the pid of the process
holding the lock - not the hostname. this info in /proc/locks would be great
too. if that were easy to get at it would be easy for an application to
detect stale locks. i guess general (userland) meta-data from the nfs server
on files would be great (who has it locked, for how long, etc) - but i realize
this is severely limited by the vfs layer.
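
To make the point about /proc/locks concrete, here is a minimal Python sketch that parses its records. It assumes the usual Linux layout (id, lock type, advisory/mandatory, read/write, pid, major:minor:inode, start, end); the sample text and the names `LockEntry` and `parse_proc_locks` are made up for illustration. Each record carries a pid and a device/inode pair, but nothing that names the host.

```python
from collections import namedtuple

# One record from /proc/locks. There is no hostname field to fill in.
LockEntry = namedtuple("LockEntry", "kind mode access pid device inode start end")

# Hypothetical sample text, not taken from the machines in this thread.
SAMPLE = (
    "1: POSIX  ADVISORY  WRITE 1734 00:16:421337 0 EOF\n"
    "2: FLOCK  ADVISORY  WRITE 902 08:01:77 0 EOF\n"
)

def parse_proc_locks(text):
    """Parse the text of /proc/locks into a list of LockEntry tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Blocked waiters are listed as "1: -> POSIX ..."; drop the arrow.
        if len(fields) > 1 and fields[1] == "->":
            del fields[1]
        if len(fields) < 8:
            continue  # not a lock record
        kind, mode, access, pid, dev_inode, start, end = fields[1:8]
        major, minor, inode = dev_inode.split(":")
        entries.append(LockEntry(kind, mode, access, int(pid),
                                 f"{major}:{minor}", int(inode), start, end))
    return entries
```

On a live system the input would come from `open("/proc/locks").read()`. Matching the inode against `os.stat(path).st_ino` tells you which file is locked, but not which machine holds the lock.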

cheers.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself. --Shunryu Suzuki
===============================================================================



2005-01-18 21:00:02

by Ara.T.Howard

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Fri, 14 Jan 2005, Trond Myklebust wrote:

> On Fri 14.01.2005 at 09:05 (-0700), Ara.T.Howard wrote:
>> On Fri, 14 Jan 2005, Neil Brown wrote:
>>
>>> So bligh, the client, is running statd (the "status" service), but mussel
>>> can not talk to it. This is a problem.
>>
>> are you saying inbound rpc traffic flowing from server -> client MUST not be
>> blocked by the firewall and that it is NOT sufficient to allow ONLY inbound
>> rpc traffic client -> server? sorry if this does not make sense - i'm a bit
>> out of my domain here...
>
> Bi-directional RPC traffic must be allowed if you plan on using NLM
> locking, since it is callback based. A couple of issues that immediately
> spring to mind in the case where the server cannot call the client are:
>
> - Blocking locks (F_SETLKW) will be hampered since the client
> expects the server's lockd daemon to call it back as soon as any
> conflicting locks have been released and the lock granted...
>
> - Server reboot recovery will be broken, since the server's
> rpc.statd daemon will be incapable of notifying the clients that
> their locks have been lost and need to be recovered.
>
> Cheers,
> Trond
> --
> Trond Myklebust <[email protected]>

to anyone following this thread: this was indeed the problem - we had a
firewall rule in place that allowed only one-way traffic. if you are having
lock recovery issues look here!
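
For anyone who wants to test for a one-way firewall before digging into rules, a small Python sketch along these lines can be run on the server against the client's statd and lockd ports, and on the client against the server's. It only tries a TCP connect, so it says nothing about UDP. The function name `tcp_port_open` is made up for illustration, and the ports to try are whatever `rpcinfo -p` reports on the peer.

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or the name did not resolve.
        return False

# Hypothetical usage on the server, where "client.b" stands for the client's
# backdoor name and 32768 for the port its "status" service registered:
#   tcp_port_open("client.b", 32768)
```

A False result in only one direction is the pattern described above: the client can reach the server, but the server cannot call the client back.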

cheers.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself. --Shunryu Suzuki
===============================================================================



2005-01-21 00:24:41

by Ara.T.Howard

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Fri, 14 Jan 2005, Neil Brown wrote:

> So bligh, the client, is running statd (the "status" service), but mussel
> can not talk to it. This is a problem.
>
> It would appear that some form of firewall is blocking access to bligh's
> statd from mussel, or that bligh's statd is ignoring requests from mussel.
> I don't know which.

so i thought we had this figured - but it seems we do not. here is what we
are (still) seeing

client > obtain_lock

server > cat /proc/locks # shows client pid

client > reboot

client > obtain_lock # fails

server > cat /proc/locks # shows OLD client pid
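
The `obtain_lock` step above is not shown in the thread. A minimal Python stand-in, assuming it simply takes an exclusive fcntl (POSIX) lock on a file in the NFS mount, could look like this; the function body is a guess for illustration, not the actual tool.

```python
import fcntl

def obtain_lock(path):
    """Try to take an exclusive POSIX lock on path without blocking.

    Returns the open file object that holds the lock, or None if another
    holder already has a conflicting lock.
    """
    f = open(path, "a+")  # read/write open, creates the file if needed
    try:
        # LOCK_NB makes this a non-blocking request, so a stale lock
        # shows up as an immediate failure rather than a hang.
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f
```

To reproduce the sequence, keep the returned file object open (the lock is dropped when the process closes the file or exits cleanly) and reboot the client while it is held.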


so lock recovery is still not working. our firewalls are as follows:

server iptables:

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-N NFS
-N ICMP
-A INPUT -i lo -j ACCEPT
-A INPUT -p icmp --icmp-type any -j ICMP
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -m state --state NEW -p tcp -m tcp --dport 22 -j SSH
-A INPUT -m state --state NEW -p udp -m udp -j NFS
-A INPUT -m state --state NEW -p tcp -m tcp --dport 2049 -j NFS
-A INPUT -m state --state NEW -p tcp -m tcp --dport 111 -j NFS
-A INPUT -m state --state NEW,INVALID -j REJECT --reject-with icmp-host-prohibited
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A ICMP -s 0/0 -j ACCEPT
-A ICMP -j REJECT --reject-with icmp-host-prohibited
-A NFS -s 10.1.0.0/16 -j ACCEPT
-A NFS -j REJECT --reject-with icmp-host-prohibited
-A SSH -s 10.1.0.0/16 -j ACCEPT
-A SSH -j REJECT --reject-with icmp-host-prohibited
COMMIT

client iptables:

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-N NFS
-N ICMP
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp --dport 111 -j NFS
-A INPUT -p udp --dport 111 -j NFS
-A INPUT -p tcp --dport 32768 -j NFS
-A INPUT -p udp --dport 32768 -j NFS
-A INPUT -p tcp --dport 32769 -j NFS
-A INPUT -p udp --dport 32769 -j NFS
-A INPUT -s 10.1.0.0/16 -j ACCEPT
-A INPUT -p icmp --icmp-type any -j ICMP
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -m state --state NEW -p tcp -m tcp --dport 22 -j SSH
-A INPUT -m state --state NEW,INVALID -j REJECT --reject-with icmp-host-prohibited
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A ICMP -s 0/0 -j ACCEPT
-A ICMP -j REJECT --reject-with icmp-host-prohibited
-A SSH -s 10.1.0.0/16 -j ACCEPT
-A SSH -j REJECT --reject-with icmp-host-prohibited
-A NFS -s 10.1.0.0/16 -j ACCEPT
-A NFS -j REJECT --reject-with icmp-host-prohibited
COMMIT

if i understand correctly (and i realize this is off list) this should be
allowing everything between server and client. we added the hole between
client and server to confirm that the firewall is not the problem. we still,
however, see the problem. btw - we are 'clearing' the lock in question by doing

mv file_with_stale_lock foobar && mv foobar file_with_stale_lock

to give it a fresh inode. this leaves the record in /proc/locks but allows us
to continue testing (we can again get the lock). is there a better way to do
this that cleans out /proc/locks?

is anything obvious here?

some other bits of info:

- both the server and client have two network cards (frontdoor/backdoor).
nfs runs all on back door. the holes we opened up were on both client and
server for both frontdoor/backdoor.

- all names live in dns (server, server.b)

- we are seeing this kind of thing (not only assoc with lock recovery) in
/var/log/messages

rpc.statd[1734]: Received erroneous SM_UNMON request from <client> for <server>

i gather this is caused by some name confusion...

so. where to go from here? i can reproduce a 'dead' lock at will by simply
rebooting a client while holding a lock. if i understand correctly the server
should be notified by the client of any locks it held before halting on the
subsequent reboot? can this communication be logged verbosely somehow? is
there an easier way to cause the notification of old locks to the server?
perhaps something like 'service nfslock restart' or is rebooting the only way?

sorry for false positive earlier.

kind regards.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself. --Shunryu Suzuki
===============================================================================



2005-01-21 00:52:41

by Trond Myklebust

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Thu 20.01.2005 at 17:24 (-0700), Ara.T.Howard wrote:

> server iptables:
>
> *filter
> :INPUT ACCEPT [0:0]
> :FORWARD ACCEPT [0:0]
> :OUTPUT ACCEPT [0:0]
> -N NFS
> -N ICMP
> -A INPUT -i lo -j ACCEPT
> -A INPUT -p icmp --icmp-type any -j ICMP
> -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
> -A INPUT -m state --state NEW -p tcp -m tcp --dport 22 -j SSH
> -A INPUT -m state --state NEW -p udp -m udp -j NFS
> -A INPUT -m state --state NEW -p tcp -m tcp --dport 2049 -j NFS
> -A INPUT -m state --state NEW -p tcp -m tcp --dport 111 -j NFS
> -A INPUT -m state --state NEW,INVALID -j REJECT --reject-with icmp-host-prohibited
> -A INPUT -j REJECT --reject-with icmp-host-prohibited
> -A ICMP -s 0/0 -j ACCEPT
> -A ICMP -j REJECT --reject-with icmp-host-prohibited
> -A NFS -s 10.1.0.0/16 -j ACCEPT
> -A NFS -j REJECT --reject-with icmp-host-prohibited
> -A SSH -s 10.1.0.0/16 -j ACCEPT
> -A SSH -j REJECT --reject-with icmp-host-prohibited
> COMMIT

Where is the rule to accept incoming rpc.statd connections?

Cheers,
Trond

--
Trond Myklebust <[email protected]>




2005-01-21 18:32:24

by Ara.T.Howard

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Thu, 20 Jan 2005, Trond Myklebust wrote:

> On Thu 20.01.2005 at 17:24 (-0700), Ara.T.Howard wrote:
>
>> server iptables:
>>
>> *filter
>> :INPUT ACCEPT [0:0]
>> :FORWARD ACCEPT [0:0]
>> :OUTPUT ACCEPT [0:0]
>> -N NFS
>> -N ICMP
>> -A INPUT -i lo -j ACCEPT
>> -A INPUT -p icmp --icmp-type any -j ICMP
>> -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
>> -A INPUT -m state --state NEW -p tcp -m tcp --dport 22 -j SSH
>> -A INPUT -m state --state NEW -p udp -m udp -j NFS
>> -A INPUT -m state --state NEW -p tcp -m tcp --dport 2049 -j NFS
>> -A INPUT -m state --state NEW -p tcp -m tcp --dport 111 -j NFS
>> -A INPUT -m state --state NEW,INVALID -j REJECT --reject-with icmp-host-prohibited
>> -A INPUT -j REJECT --reject-with icmp-host-prohibited
>> -A ICMP -s 0/0 -j ACCEPT
>> -A ICMP -j REJECT --reject-with icmp-host-prohibited
>> -A NFS -s 10.1.0.0/16 -j ACCEPT
>> -A NFS -j REJECT --reject-with icmp-host-prohibited
>> -A SSH -s 10.1.0.0/16 -j ACCEPT
>> -A SSH -j REJECT --reject-with icmp-host-prohibited
>> COMMIT
>
> Where is the rule to accept incoming rpc.statd connections?
>
> Cheers,
> Trond

sorry, we edited out the critical info.

on the client we have

[root@bligh root]# rpcinfo -p
program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 32768 status
100024 1 tcp 32768 status
100021 1 udp 32769 nlockmgr
100021 3 udp 32769 nlockmgr
100021 4 udp 32769 nlockmgr
100021 1 tcp 32769 nlockmgr
100021 3 tcp 32769 nlockmgr
100021 4 tcp 32769 nlockmgr

[root@bligh root]# grep NFS /etc/sysconfig/iptables
-N NFS
-A INPUT -p tcp --dport 111 -j NFS
-A INPUT -p udp --dport 111 -j NFS
-A INPUT -p tcp --dport 32768:32769 -j NFS
-A INPUT -p udp --dport 32768:32769 -j NFS
-A NFS -s 10.1.186.70/32 -j ACCEPT
-A NFS -j REJECT --reject-with icmp-host-prohibited
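
Since the status and nlockmgr ports come from the portmapper and can change when the daemons restart, hand-written port rules can go stale. As a rough sketch, the rules can be derived from the `rpcinfo -p` text instead. The helper names `ports_for` and `iptables_rules` are made up for illustration; the sample text is the listing above.

```python
# The rpcinfo -p listing from bligh, as posted above.
SAMPLE = """\
program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 32768 status
100024 1 tcp 32768 status
100021 1 udp 32769 nlockmgr
100021 3 udp 32769 nlockmgr
100021 4 udp 32769 nlockmgr
100021 1 tcp 32769 nlockmgr
100021 3 tcp 32769 nlockmgr
100021 4 tcp 32769 nlockmgr
"""

def ports_for(rpcinfo_text, services=("portmapper", "status", "nlockmgr")):
    """Collect the distinct (proto, port) pairs listed for the given services."""
    pairs = set()
    for line in rpcinfo_text.splitlines():
        fields = line.split()
        if len(fields) != 5 or not fields[0].isdigit():
            continue  # skip the header and anything malformed
        _program, _vers, proto, port, name = fields
        if name in services:
            pairs.add((proto, int(port)))
    return sorted(pairs)

def iptables_rules(pairs, chain="NFS"):
    """Render one INPUT rule per (proto, port) pair, jumping to the given chain."""
    return [f"-A INPUT -p {proto} --dport {port} -j {chain}"
            for proto, port in pairs]
```

Running `iptables_rules(ports_for(SAMPLE))` yields the same six port rules that the grep output above expresses with the 32768:32769 range.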

this did not work. just to be safe we added

on the server - where ip is the client's ip

-A INPUT -s 10.1.186.71/32 -j ACCEPT

on the client, where ip is the server's ip

-A INPUT -s 10.1.186.54/32 -j ACCEPT

to the top of our ruleset before ANY denies. still no go. on shutdown/reboot
of the client there are no error messages whatsoever. however, we are seeing
lots of these in /var/log/messages

...
...
...
Jan 21 10:55:50 moby rpc.statd[1985]: Received erroneous SM_UNMON request from moby.ngdc.noaa.gov for 10.1.186.62
Jan 21 10:55:50 moby rpc.statd[1985]: Received erroneous SM_UNMON request from moby.ngdc.noaa.gov for 10.1.186.67
...
...
...

where the ips are those of various clients that are successfully performing
locking. what is this about?

as i said before, both the server and all clients are multihomed with nfs
running only on the backdoor. the frontdoor/backdoor have names like name,
name.b respectively. these names are all in dns. is there any chance this
could be related to lock recovery failure?
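
One way to test the name-confusion guess is to check whether each name resolves to an address that resolves back to the same name. The Python sketch below does only that; it is a diagnostic for the hypothesis, not a statement of how rpc.statd matches hosts. The function name `check_names` is made up, and the resolver arguments exist so the check can be exercised without DNS.

```python
import socket

def check_names(names,
                forward=socket.gethostbyname,
                reverse=lambda ip: socket.gethostbyaddr(ip)[0]):
    """Resolve each name to an address and back; report where they disagree.

    Returns {name: (status, ip, reverse_name)} with status "ok", "mismatch",
    or "error" (in which case ip and reverse_name may be None).
    """
    report = {}
    for name in names:
        ip = back = None
        try:
            ip = forward(name)
            back = reverse(ip)
        except OSError:
            # Lookup failures (socket.gaierror, socket.herror) land here.
            report[name] = ("error", ip, back)
            continue
        # A short name coming back as a fully qualified one also counts as
        # a mismatch here, so read the result by eye.
        report[name] = ("ok" if back == name else "mismatch", ip, back)
    return report

# Hypothetical usage with the frontdoor/backdoor naming described above:
#   check_names(["server", "server.b"])
```

If the backdoor name comes back as the frontdoor name (or the reverse), the two hosts may be identifying each other under different names than they registered.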

our sysad has suggested starting a tcpdump in the nfslock init.d script to see
what's happening - any other suggestions?

kind regards.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself. --Shunryu Suzuki
===============================================================================



2005-01-21 18:41:07

by Dan Stromberg

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot


A sniffer is your friend. ethereal is the best I've encountered.

If you fire up an ethereal to sniff packets from your client on the
server, and/or vice versa, you will likely get an idea of what's wrong
fairly quickly.
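
If a graphical sniffer is not available on the server, the same capture can be taken with tcpdump and opened in ethereal afterwards. A minimal Python sketch that builds such a command follows. The interface name `eth1`, the output file name and the function name `tcpdump_command` are assumptions; the ports should be whatever `rpcinfo -p` reports for portmapper, status and nlockmgr.

```python
def tcpdump_command(peer, ports, interface="eth1", outfile="nfslock.pcap"):
    """Build a tcpdump argv that captures traffic to/from peer on the given ports."""
    port_expr = " or ".join(f"port {p}" for p in ports)
    return [
        "tcpdump",
        "-n",              # do not resolve names, show raw addresses
        "-i", interface,   # capture on this interface (assumed backdoor NIC)
        "-w", outfile,     # write raw packets to a file for later analysis
        f"host {peer} and ({port_expr})",
    ]

# Hypothetical usage on the server, as root, before rebooting the client:
#   import subprocess
#   subprocess.run(tcpdump_command("10.1.186.71", [111, 32768, 32769]))
```

If the capture shows nothing arriving from the rebooted client on the status port, the reboot notification is not reaching the server at all.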

BTW, last I heard, NFS on linux could be firewalled, but it required
starting up some daemons with some magic options to hard code them to
specific ports, rather than allowing portmap/rpcbind to move them
around.

On Fri, 2005-01-21 at 11:32 -0700, Ara.T.Howard wrote:
> On Thu, 20 Jan 2005, Trond Myklebust wrote:
>
> > On Thu 20.01.2005 at 17:24 (-0700), Ara.T.Howard wrote:
> >
> >> server iptables:
> >>
> >> *filter
> >> :INPUT ACCEPT [0:0]
> >> :FORWARD ACCEPT [0:0]
> >> :OUTPUT ACCEPT [0:0]
> >> -N NFS
> >> -N ICMP
> >> -A INPUT -i lo -j ACCEPT
> >> -A INPUT -p icmp --icmp-type any -j ICMP
> >> -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
> >> -A INPUT -m state --state NEW -p tcp -m tcp --dport 22 -j SSH
> >> -A INPUT -m state --state NEW -p udp -m udp -j NFS
> >> -A INPUT -m state --state NEW -p tcp -m tcp --dport 2049 -j NFS
> >> -A INPUT -m state --state NEW -p tcp -m tcp --dport 111 -j NFS
> >> -A INPUT -m state --state NEW,INVALID -j REJECT --reject-with icmp-host-prohibited
> >> -A INPUT -j REJECT --reject-with icmp-host-prohibited
> >> -A ICMP -s 0/0 -j ACCEPT
> >> -A ICMP -j REJECT --reject-with icmp-host-prohibited
> >> -A NFS -s 10.1.0.0/16 -j ACCEPT
> >> -A NFS -j REJECT --reject-with icmp-host-prohibited
> >> -A SSH -s 10.1.0.0/16 -j ACCEPT
> >> -A SSH -j REJECT --reject-with icmp-host-prohibited
> >> COMMIT
> >
> > Where is the rule to accept incoming rpc.statd connections?
> >
> > Cheers,
> > Trond
>
> sorry, we edited out the critical info.
>
> on the client we have
>
> [root@bligh root]# rpcinfo -p
> program vers proto port
> 100000 2 tcp 111 portmapper
> 100000 2 udp 111 portmapper
> 100024 1 udp 32768 status
> 100024 1 tcp 32768 status
> 100021 1 udp 32769 nlockmgr
> 100021 3 udp 32769 nlockmgr
> 100021 4 udp 32769 nlockmgr
> 100021 1 tcp 32769 nlockmgr
> 100021 3 tcp 32769 nlockmgr
> 100021 4 tcp 32769 nlockmgr
>
> [root@bligh root]# grep NFS /etc/sysconfig/iptables
> -N NFS
> -A INPUT -p tcp --dport 111 -j NFS
> -A INPUT -p udp --dport 111 -j NFS
> -A INPUT -p tcp --dport 32768:32769 -j NFS
> -A INPUT -p udp --dport 32768:32769 -j NFS
> -A NFS -s 10.1.186.70/32 -j ACCEPT
> -A NFS -j REJECT --reject-with icmp-host-prohibited
>
> this did not work. just to be safe we added
>
> on the server - where ip is the client's ip
>
> -A INPUT -s 10.1.186.71/32 -j ACCEPT
>
> on the client, where ip is the server's ip
>
> -A INPUT -s 10.1.186.54/32 -j ACCEPT
>
> to the top of our ruleset before ANY denies. still no go. on shutdown/reboot
> of the client there are no error messages whatsoever. however, we are seeing
> lots of these in /var/log/messages
>
> ...
> ...
> ...
> Jan 21 10:55:50 moby rpc.statd[1985]: Received erroneous SM_UNMON request from moby.ngdc.noaa.gov for 10.1.186.62
> Jan 21 10:55:50 moby rpc.statd[1985]: Received erroneous SM_UNMON request from moby.ngdc.noaa.gov for 10.1.186.67
> ...
> ...
> ...
>
> where the ips are those of various clients that are successfully performing
> locking. what is this about?
>
> as i said before, both the server and all clients are multihomed with nfs
> running only on the backdoor. the frontdoor/backdoor have names like name,
> name.b respectively. these names are all in dns. is there any chance this
> could be related to lock recovery failure?
>
> our sysad has suggested starting a tcpdump in the nfslock init.d script to see
> what's happening - any other suggestions?
>
> kind regards.
>
> -a
> --
> ===============================================================================
> | EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
> | PHONE :: 303.497.6469
> | When you do something, you should burn yourself completely, like a good
> | bonfire, leaving no trace of yourself. --Shunryu Suzuki
> ===============================================================================



2005-01-21 18:59:24

by Ara.T.Howard

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Fri, 21 Jan 2005, Dan Stromberg wrote:

>
> A sniffer is your friend. ethereal is the best I've encountered.
>
> If you fire up an ethereal to sniff packets from your client on the server,
> and/or vice versa, you will likely get an idea of what's wrong fairly
> quickly.

probably the next step.

> BTW, last I heard, NFS on linux could be firewalled, but it required
> starting up some daemons with some magic options to hard code them to
> specific ports, rather than allowing portmap/rpcbind to move them around.


unless i am mistaken, by adding

>> on the server - where ip is the client's ip
>>
>> -A INPUT -s 10.1.186.71/32 -j ACCEPT
>>
>> on the client, where ip is the server's ip
>>
>> -A INPUT -s 10.1.186.54/32 -j ACCEPT

we effectively did NOT firewall ANYTHING between client and server so
portmapping shouldn't have made any difference right?
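
The reasoning here rests on first-match evaluation: a rule at the top of INPUT that accepts the peer's address wins before any later REJECT is reached. A toy Python model of that, looking only at source addresses and ignoring protocol, port and state matches, might be sketched as follows; `first_match` and `CLIENT_INPUT` are made-up names.

```python
import ipaddress

# Simplified client INPUT chain: the blanket accept for the server's address
# sits first, followed by a catch-all reject.
CLIENT_INPUT = [
    ("10.1.186.54/32", "ACCEPT"),
    ("0.0.0.0/0", "REJECT"),
]

def first_match(rules, src):
    """Return the target of the first rule whose source network contains src."""
    addr = ipaddress.ip_address(src)
    for network, target in rules:
        if addr in ipaddress.ip_network(network):
            return target
    return "ACCEPT"  # chain policy in the posted rulesets (:INPUT ACCEPT)
```

With these rules, `first_match(CLIENT_INPUT, "10.1.186.54")` is accepted and any other source is rejected, which is why the port numbers stop mattering for that one peer. It only holds if the packets really arrive with that source address, which on multihomed hosts is worth confirming with a capture.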

cheers.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself. --Shunryu Suzuki
===============================================================================



2005-01-21 19:06:00

by Dan Stromberg

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Fri, 2005-01-21 at 11:59 -0700, Ara.T.Howard wrote:
> >> on the server - where ip is the client's ip
> >>
> >> -A INPUT -s 10.1.186.71/32 -j ACCEPT
> >>
> >> on the client, where ip is the server's ip
> >>
> >> -A INPUT -s 10.1.186.54/32 -j ACCEPT
>
> we effectively did NOT firewall ANYTHING between client and server so
> portmapping shouldn't have made any difference right?

Likely, but the sniffer should give you an empirical indication, which
is better than what our theoretical discussion might give you.



2005-01-21 19:10:27

by Ara.T.Howard

Subject: Re: Debian Bug#203077: Locks not released on NFS client reboot

On Fri, 21 Jan 2005, Dan Stromberg wrote:

> On Fri, 2005-01-21 at 11:59 -0700, Ara.T.Howard wrote:
>>>> on the server - where ip is the client's ip
>>>>
>>>> -A INPUT -s 10.1.186.71/32 -j ACCEPT
>>>>
>>>> on the client, where ip is the server's ip
>>>>
>>>> -A INPUT -s 10.1.186.54/32 -j ACCEPT
>>
>> we effectively did NOT firewall ANYTHING between client and server so
>> portmapping shouldn't have made any difference right?
>
> Likely, but the sniffer should give you an empirical indication, which
> is better than what our theoretical discussion might give you.

that's the right attitude - never trust anything! ;-) we'll look further as
you recommend.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself. --Shunryu Suzuki
===============================================================================

