From: Neil Brown <neilb@suse.de>
Subject: Re: Regression: NFS locking hangs when statd not running.
Date: Tue, 24 Oct 2006 11:06:16 +1000
Message-ID: <17725.26376.280902.571606@cse.unsw.edu.au>
References: <17720.41873.549441.330938@cse.unsw.edu.au>
	<20061020124119.GE27351@suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Cc: Takashi Iwai <tiwai@suse.de>,
	Chuck Lever <chuck.lever@oracle.com>, nfs@lists.sourceforge.net
To: Olaf Kirch <okir@suse.de>
In-Reply-To: message from Olaf Kirch on Friday October 20
Sender: nfs-bounces@lists.sourceforge.net
Errors-To: nfs-bounces@lists.sourceforge.net

On Friday October 20, okir@suse.de wrote:
> 
> I believe this should depend on the semantics of the parent mount.
> Basically, we should copy intr,hard from the NFS mount to the lockd
> client we use, and from there to the portmap client. Otherwise
> in a HA setup where you have hard mounts, you will suddenly start
> seeing IO errors during failover.

Having almost implemented this, I find I disagree.  

Due to the state-management nature of lockd requests, I think they
need to be hard,nointr always (as they currently are); otherwise the
client and server can get out of sync, causing serious confusion.

Normally I would expect a successful GETATTR before a lock request,
and the chance of the server becoming unavailable in that window is
pretty small.
'soft' lock requests are just silly, and interrupting lock requests
should be handled by leaving an unlock request running asynchronously
(which maybe we already do).

So I don't think there is anything that needs to be done specifically
for lockd requests.  statd is what I am really interested in here.


> 
> The patch looks good, except maybe I'd use a different name, like
> RPC_CLNT_BIND_NORETRY or some such.

Hmmm... you prefer the name to reflect what happens rather than why it
happens, and that is not unreasonable.  Your proposed name doesn't
quite capture what I was doing.  I was only avoiding the retry if
statd wasn't registered.  If portmap isn't running or statd is
responding slowly (or has died I guess) then we still retry.  Maybe we
shouldn't?

When talking to statd or local portmap we really want to abort if
statd says 'no', or if we get ECONNREFUSED from portmap, and probably
even if we get ECONNREFUSED from statd... though I'm not 100% certain
about the last.
But if statd is slow, we still want to retry.
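
To make the cases above concrete, here is a rough sketch of the
decision as a standalone C function.  It is not taken from the patch:
every name in it (enum peer, enum outcome, should_retry, and the
statd_refused_aborts switch) is hypothetical, and the last of these
exists only because the "connection refused by statd" case is still
undecided.

```c
#include <stdbool.h>

/* All names below are hypothetical; none of them come from the kernel. */
enum peer { PEER_PORTMAP, PEER_STATD };

enum outcome {
    OUTCOME_SAID_NO,       /* peer answered, and the answer was a refusal */
    OUTCOME_CONN_REFUSED,  /* connect failed with ECONNREFUSED */
    OUTCOME_SLOW           /* no reply yet: timed out waiting */
};

/*
 * Return true if the request should be retried, false if it should abort.
 * statd_refused_aborts selects the still-undecided case of a refused
 * connection to statd.
 */
bool should_retry(enum peer peer, enum outcome outcome,
                  bool statd_refused_aborts)
{
    switch (outcome) {
    case OUTCOME_SAID_NO:
        /* An explicit 'no' will not change by asking again. */
        return false;
    case OUTCOME_CONN_REFUSED:
        if (peer == PEER_PORTMAP)
            return false;             /* local portmap is not running */
        return !statd_refused_aborts; /* statd: the uncertain case */
    case OUTCOME_SLOW:
        /* A slow statd may still answer, so keep trying. */
        return true;
    }
    return true; /* unknown outcome: fall back to retrying */
}
```

Keeping the uncertain case behind a single switch means the other
three answers do not depend on how that question is settled.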

I think I'll stick with the current name, but the next patch will look
different and maybe we can discuss the name issue again...

Stay tuned.

NeilBrown
