2006-10-20 10:23:22

by NeilBrown

Subject: Regression: NFS locking hangs when statd not running.


Hi Chuck et al.

There seems to be a regression in 2.6.19-rc which I'm blaming
on you :-)
I should hasten to say that I think you fixed something in the RPC
layer and this has exposed a new problem. I'm hoping that you can
tell me if my analysis and patch make sense.

In call_bind_status (net/sunrpc/clnt.c) there is code to handle the
case where portmap has reported that the program/version is
unavailable (via -EACCES).
It requests a 3 second delay followed by a retry. This can be expected
to recur until a major timeout.

In 2.6.18, this code is never exercised. I'm not sure of exactly
why, but when the portmap sub-task aborts, the whole task aborts.
In 2.6.19-rc, this case is exercised. I assume you fixed something
so that the whole task doesn't get aborted.

The problem is that there is a case where we don't want the retry.

In 2.6.18, if statd isn't running, then a lock attempt returns ENOLCK
immediately, which I think is good.
In 2.6.19-rc, in the same situation, a lock attempt waits for a major
timeout (30 seconds for TCP mounts) and is not interruptible for this
whole time (even with '-o intr' mounts).

So: what to do? Should we retry requests when portmap says "no such
service"?

I think that for requests to a remote service - lockd or nfsd - we do
want to retry. The server might be rebooting and so "no such
service" should be treated much like "no reply".

However for local services - statd - I don't think the timeout is
desired. So I would like to propose the following patch. It
introduces a new flag 'local' that gets set for statd requests and
causes the call_bind_status to abort rather than retry after a
timeout.
It also sets RPC_CLNT_CREATE_NOPING as I couldn't see an obvious way
to pass 'local' through to the ping request. Maybe this aspect of
the patch can be improved.
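
To make the intended behaviour concrete, here is a rough user-space
model of the -EACCES branch, not kernel code. The names model_clnt,
model_result and model_bind_unavailable are made up for illustration,
and the delay and major timeout are plain parameters rather than the
real RPC timer machinery.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one rpc_clnt field that matters here. */
struct model_clnt {
	bool cl_local;	/* mirrors the proposed cl_local bit */
};

/* Outcome of modelling the "program/version unavailable" branch. */
struct model_result {
	int retries;		/* number of delayed retries issued */
	int seconds_blocked;	/* time spent before the caller sees an error */
};

/*
 * Model of the -EACCES case: a local client gives up at once, any other
 * client sleeps delay_s per attempt until major_timeout_s has elapsed.
 * Both durations are parameters of the model, not kernel constants.
 */
static struct model_result
model_bind_unavailable(const struct model_clnt *clnt, int delay_s,
		       int major_timeout_s)
{
	struct model_result res = { 0, 0 };

	assert(delay_s > 0 && major_timeout_s >= 0);

	if (clnt->cl_local)
		return res;	/* abort: believe the "no such service" answer */

	while (res.seconds_blocked < major_timeout_s) {
		res.seconds_blocked += delay_s;	/* stands in for rpc_delay() */
		res.retries++;
	}
	return res;
}
```

With a 3 second delay and a 30 second major timeout, the model gives
10 retries and 30 seconds blocked for an ordinary client, and no delay
at all for a client with cl_local set.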

This also raises another issue. The 'soft' and 'intr' flags aren't
really passed around very much. An 'intr' mount still makes 'nointr'
requests to statd, and an 'intr,hard' rpc request will make a
'nointr,soft' request to portmap for binding. This doesn't seem
right though I'm not certain if there are bad consequences.
Have you thought about this issue? Would you like to convince me
that the current situation is fine?

Thanks for listening,
Comments on the patch appreciated.
NeilBrown

Signed-off-by: Neil Brown <[email protected]>

### Diffstat output
./fs/lockd/mon.c | 4 +++-
./include/linux/sunrpc/clnt.h | 2 ++
./net/sunrpc/clnt.c | 4 ++++
3 files changed, 9 insertions(+), 1 deletion(-)

diff .prev/fs/lockd/mon.c ./fs/lockd/mon.c
--- .prev/fs/lockd/mon.c 2006-10-20 16:30:56.000000000 +1000
+++ ./fs/lockd/mon.c 2006-10-20 18:45:49.000000000 +1000
@@ -138,7 +138,9 @@ nsm_create(void)
 		.program = &nsm_program,
 		.version = SM_VERSION,
 		.authflavor = RPC_AUTH_NULL,
-		.flags = (RPC_CLNT_CREATE_ONESHOT),
+		.flags = (RPC_CLNT_CREATE_ONESHOT|
+			  RPC_CLNT_CREATE_NOPING|
+			  RPC_CLNT_CREATE_LOCAL),
 	};

 	return rpc_create(&args);

diff .prev/include/linux/sunrpc/clnt.h ./include/linux/sunrpc/clnt.h
--- .prev/include/linux/sunrpc/clnt.h 2006-10-20 16:24:50.000000000 +1000
+++ ./include/linux/sunrpc/clnt.h 2006-10-20 18:41:24.000000000 +1000
@@ -42,6 +42,7 @@ struct rpc_clnt {
 				cl_intr : 1,/* interruptible */
 				cl_autobind : 1,/* use getport() */
 				cl_oneshot : 1,/* dispose after use */
+				cl_local : 1, /* don't retry if not registered */
 				cl_dead : 1;/* abandoned */

 	struct rpc_rtt * cl_rtt; /* RTO estimator data */
@@ -110,6 +111,7 @@ struct rpc_create_args {
 #define RPC_CLNT_CREATE_ONESHOT (1UL << 3)
 #define RPC_CLNT_CREATE_NONPRIVPORT (1UL << 4)
 #define RPC_CLNT_CREATE_NOPING (1UL << 5)
+#define RPC_CLNT_CREATE_LOCAL (1UL << 6)

 struct rpc_clnt *rpc_create(struct rpc_create_args *args);
 struct rpc_clnt *rpc_bind_new_program(struct rpc_clnt *,

diff .prev/net/sunrpc/clnt.c ./net/sunrpc/clnt.c
--- .prev/net/sunrpc/clnt.c 2006-10-20 15:21:14.000000000 +1000
+++ ./net/sunrpc/clnt.c 2006-10-20 18:42:07.000000000 +1000
@@ -238,6 +238,8 @@ struct rpc_clnt *rpc_create(struct rpc_c
 		clnt->cl_autobind = 1;
 	if (args->flags & RPC_CLNT_CREATE_ONESHOT)
 		clnt->cl_oneshot = 1;
+	if (args->flags & RPC_CLNT_CREATE_LOCAL)
+		clnt->cl_local = 1;

 	return clnt;
 }
@@ -860,6 +862,8 @@ call_bind_status(struct rpc_task *task)
 	case -EACCES:
 		dprintk("RPC: %4d remote rpcbind: RPC program/version unavailable\n",
 				task->tk_pid);
+		if (task->tk_client->cl_local)
+			break;
 		rpc_delay(task, 3*HZ);
 		goto retry_timeout;
 	case -ETIMEDOUT:

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2006-10-20 12:41:25

by Olaf Kirch

Subject: Re: Regression: NFS locking hangs when statd not running.

On Fri, Oct 20, 2006 at 08:23:13PM +1000, Neil Brown wrote:
> The problem is that there is a case where we don't want the retry.
>
> In 2.6.18, if statd isn't running, then a lock attempt returns ENOLCK
> immediately, which I think is good.
> In 2.6.19-rc, in the same situation, a lock attempt waits for a major
> timeout (30 seconds for TCP mounts) and is not interruptible for this
> whole time (even with '-o intr' mounts).
>
> So: what to do? Should we retry requests when portmap says "no such
> service".

When lockd tries to do an upcall to statd? Definitely no, I'd say.
Essentially, statd upcalls should be a one-shot affair with minimal
timeout.

> I think that for requests to a remote service - lockd or nfsd - we do
> want to retry. The server might be rebooting and so "no such
> service" should be treated much like "no reply".

I believe this should depend on the semantics of the parent mount.
Basically, we should copy intr,hard from the NFS mount to the lockd
client we use, and from there to the portmap client. Otherwise
in a HA setup where you have hard mounts, you will suddenly start
seeing IO errors during failover.

The patch looks good, except maybe I'd use a different name, like
RPC_CLNT_BIND_NORETRY or some such.

Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
[email protected] | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax


2006-10-20 13:00:28

by Chuck Lever

Subject: Re: Regression: NFS locking hangs when statd not running.

On 10/20/06, Olaf Kirch <[email protected]> wrote:
> On Fri, Oct 20, 2006 at 08:23:13PM +1000, Neil Brown wrote:
> > The problem is that there is a case where we don't want the retry.
> >
> > In 2.6.18, if statd isn't running, then a lock attempt returns ENOLCK
> > immediately, which I think is good.
> > In 2.6.19-rc, in the same situation, a lock attempt waits for a major
> > timeout (30 seconds for TCP mounts) and is not interruptible for this
> > whole time (even with '-o intr' mounts).
> >
> > So: what to do? Should we retry requests when portmap says "no such
> > service".
>
> When lockd tries to do an upcall to statd? Definitely no, I'd say.
> Essentially, statd upcalls should be a one-shot affair with minimal
> timeout.

I don't have a strong opinion here, but what you and Olaf say sounds reasonable.

> > I think that for requests to a remote service - lockd or nfsd - we do
> > want to retry. The server might be rebooting and so "no such
> > service" should be treated much like "no reply".
>
> I believe this should depend on the semantics of the parent mount.
> Basically, we should copy intr,hard from the NFS mount to the lockd
> client we use, and from there to the portmap client. Otherwise
> in a HA setup where you have hard mounts, you will suddenly start
> seeing IO errors during failover.

Copying the intr flag makes sense. Neil, I'd like to see your patch
address this too.

The hard v. soft issue is more difficult. The semantic you are
requesting for the local statd is clearly "soft" and that's what you
say you always want. Otherwise copying the soft flag makes sense.

Maybe you can re-use the soft flag and expose a way to set the soft
timeout for the mon client to get the exact behavior you want?

--
"We who cut mere stones must always be envisioning cathedrals"
-- Quarry worker's creed


2006-10-24 01:06:29

by NeilBrown

Subject: Re: Regression: NFS locking hangs when statd not running.

On Friday October 20, [email protected] wrote:
>
> I believe this should depend on the semantics of the parent mount.
> Basically, we should copy intr,hard from the NFS mount to the lockd
> client we use, and from there to the portmap client. Otherwise
> in a HA setup where you have hard mounts, you will suddenly start
> seeing IO errors during failover.

Having almost implemented this, I find I disagree.

Due to the state-management nature of lockd requests, I think they
need to be hard,nointr always (as they currently are) otherwise the
client and server can get out-of-sync causing serious confusion.

Normally I would expect a successful GETATTR before a lock request,
and the chance of the server becoming unavailable in that window is
pretty small.
'soft' lock requests are just silly, and interrupting lock requests
should be handled by leaving an unlock request running asynchronously
(which maybe we already do).

So I don't think there is anything that needs to be done specifically
to lockd requests. statd is what I am really interested in here.


>
> The patch looks good, except maybe I'd use a different name, like
> RPC_CLNT_BIND_NORETRY or some such.

Hmmm... you prefer the name to reflect what happens rather than why it
happens, and that is not unreasonable. Your proposed name doesn't
quite capture what I was doing. I was only avoiding the retry if
statd wasn't registered. If portmap isn't running or statd is
responding slowly (or has died I guess) then we still retry. Maybe we
shouldn't?

When talking to statd or local portmap we really want to abort if
statd says 'no', or if we get ECONNREFUSED from portmap, and probably
even if we get ECONNREFUSED from statd... though I'm not 100% certain
about the last.
But if statd is slow, we still want to retry.
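
As a rough illustration of that policy (a sketch only, not a proposed
kernel change), the decision could be written as a small classifier.
The names local_action and local_bind_action are invented for this
example, and treating every unlisted status as "retry" is an
assumption, not something settled above.

```c
#include <assert.h>
#include <errno.h>

enum local_action {
	LOCAL_ABORT,	/* believe the answer and fail the request now */
	LOCAL_RETRY	/* keep trying, the service may just be slow */
};

/*
 * Hypothetical classifier for a negative errno-style status seen while
 * talking to a local service such as statd or the local portmap.
 */
static enum local_action
local_bind_action(int status)
{
	assert(status < 0);

	switch (status) {
	case -EACCES:		/* portmap: program/version not registered */
	case -ECONNREFUSED:	/* nothing is listening on the port */
		return LOCAL_ABORT;
	case -ETIMEDOUT:	/* slow service: still worth retrying */
	default:		/* assumption: anything unlisted is retried */
		return LOCAL_RETRY;
	}
}
```

This does not distinguish ECONNREFUSED from portmap and ECONNREFUSED
from statd; both abort here, which is the less certain half of the
policy.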

I think I'll stick with the current name, but the next patch will look
different and maybe we can discuss the name issue again...

Stay tuned.

NeilBrown


2006-10-24 01:26:54

by NeilBrown

Subject: Re: Regression: NFS locking hangs when statd not running.

On Friday October 20, [email protected] wrote:
>
> Copying the intr flag makes sense. Neil, I'd like to see your patch
> address this too.

I'll see what I can do... but not for lockd requests. See previous mail.

>
> The hard v. soft issue is more difficult. The semantic you are
> requesting for the local statd is clearly "soft" and that's what you
> say you always want. Otherwise copying the soft flag makes sense.
>
> Maybe you can re-use the soft flag and expose a way to set the soft
> timeout for the mon client to get the exact behavior you want?

It's not really about timeouts - or about 'soft'. It is about getting
a definite 'no' answer (either from portmap or the network stack) and
believing it, which is only safe when talking to a local service
.... hmmm. Maybe that is also safe when doing the initial 'ping'.
That is currently always 'soft' and presumably 'expects' the server to
be there and functional so a 'no' reply could reasonably be believed
there.

Looking more at where an initial ping is used, I notice that the
lockd client does a (soft) ping first. So if you have a hard mount,
the first lock request can still fail due to an unresponsive server
because the ping will fail... Shouldn't lockd clients be created with
NO_PING ??

NeilBrown

>
> --
> "We who cut mere stones must always be envisioning cathedrals"
> -- Quarry worker's creed
A colleague has a message on his wall something like:
If you want to build great boats, teach your workers to yearn for
the wide open oceans.

or something like that. :-)


2006-10-24 01:54:21

by Trond Myklebust

Subject: Re: Regression: NFS locking hangs when statd not running.

On Tue, 2006-10-24 at 11:06 +1000, Neil Brown wrote:
> Due to the state-management nature of lockd requests, I think they
> need to be hard,nointr always (as they currently are) otherwise the
> client and server can get out-of-sync causing serious confusion.

Not really. As long as the interrupted lock request is followed by an
UNLOCK request, you should be safe.
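
A toy model of why the follow-up UNLOCK is enough, assuming the UNLOCK
itself is delivered reliably. The struct and function names are
invented for this sketch and are not taken from lockd.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-party model: what each side believes about one lock. */
struct lock_state {
	bool client_holds;
	bool server_holds;
};

/*
 * The client is interrupted while a LOCK is outstanding. It cannot tell
 * whether the server processed the request, so it assumes it holds nothing.
 */
static struct lock_state
interrupted_lock(bool server_processed_lock)
{
	struct lock_state s = { false, server_processed_lock };

	return s;
}

/* A follow-up UNLOCK clears the server's record whether or not it existed. */
static struct lock_state
send_unlock(struct lock_state s)
{
	assert(!s.client_holds);	/* only used after the client gave up */
	s.server_holds = false;
	return s;
}

/* True when client and server agree about the lock. */
static bool
in_sync(struct lock_state s)
{
	return s.client_holds == s.server_holds;
}
```

Without the UNLOCK the two sides disagree whenever the server did
process the LOCK. With it they agree in both cases.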


> When talking to statd or local portmap we really want to abort if
> statd says 'no', or if we get ECONNREFUSED from portmap, and probably
> even if we get ECONNREFUSED from statd.... though I'm not 100% certain
> about the last.
> But if statd is slow, we still want to retry.

No. We should back off and retry. If the user wants to be able to lock,
then the kernel shouldn't be overriding that choice. Only if the user
specifies "nolock" should we fall back to local locking.

Cheers,
Trond



2006-10-24 02:17:17

by NeilBrown

Subject: Re: Regression: NFS locking hangs when statd not running.

On Monday October 23, [email protected] wrote:
> On Tue, 2006-10-24 at 11:06 +1000, Neil Brown wrote:
> > Due to the state-management nature of lockd requests, I think they
> > need to be hard,nointr always (as they currently are) otherwise the
> > client and server can get out-of-sync causing serious confusion.
>
> Not really. As long as the interrupted lock request is followed by an
> UNLOCK request, you should be safe.

Yes, I guess so. Do we do that? Send an UNLOCK if a LOCK fails
mysteriously?

>
>
> > When talking to statd or local portmap we really want to abort if
> > statd says 'no', or if we get ECONNREFUSED from portmap, and probably
> > even if we get ECONNREFUSED from statd.... though I'm not 100% certain
> > about the last.
> > But if statd is slow, we still want to retry.
>
> No. We should back off and retry. If the user wants to be able to lock,
> then the kernel shouldn't be overriding that choice. Only if the user
> specifies "nolock" should we fall back to local locking.

Certainly we should only fall back to local locking if 'nolock' was
given. But in what circumstances can we return -ENOLCK?
I'm suggesting that if statd isn't running, then the best thing is to
return -ENOLCK quickly. Currently we block uninterruptibly for 30
seconds (on a TCP mount). For every lock request.

NeilBrown
