2007-04-03 16:09:47

by Frank Filz

[permalink] [raw]
Subject: Missing handling for NFS4ERR_OLD_STATEID in nfs4_handle_exception?

I'm looking at the following code, and wondering if something is missing
in the handling of NFS4ERR_OLD_STATEID. The result is that if this error
occurs, nfs4_map_errors() will print:

nfs4_map_errors could not handle NFSv4 error 10024

It also looks like the handling of NFS4ERR_DELAY etc. may be wrong,
since if nfs4_delay() returns without error, it falls through to the
handling of NFS4ERR_OLD_STATEID.

Based on the code in nfs4_async_handle_error(), it looks like it might
be sufficient to set ret = 0 in addition to exception->retry = 1.

Thanks for any thoughts

Frank Filz

/* This is the error handling routine for processes that are allowed
* to sleep.
*/
int nfs4_handle_exception(const struct nfs_server *server, int
errorcode, struct nfs4_exception *exception)
{
struct nfs_client *clp = server->nfs_client;
int ret = errorcode;

exception->retry = 0;
switch(errorcode) {
case 0:
return 0;
case -NFS4ERR_STALE_CLIENTID:
case -NFS4ERR_STALE_STATEID:
case -NFS4ERR_EXPIRED:
nfs4_schedule_state_recovery(clp);
ret = nfs4_wait_clnt_recover(server->client, clp);
if (ret == 0)
exception->retry = 1;
break;
case -NFS4ERR_FILE_OPEN:
case -NFS4ERR_GRACE:
case -NFS4ERR_DELAY:
ret = nfs4_delay(server->client, &exception->timeout);
if (ret != 0)
break;
case -NFS4ERR_OLD_STATEID:
exception->retry = 1;
}
/* We failed to handle the error */
return nfs4_map_errors(ret);
}



-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2007-04-23 17:10:36

by Frank Filz

[permalink] [raw]
Subject: Re: Missing handling for NFS4ERR_OLD_STATEID in nfs4_handle_exception?

On Wed, 2007-04-18 at 09:57 -0400, Jeff Layton wrote:
> In that case, then what would you think about this patch, based on Frank's
> suggestion. The other option might be to set "ret = -EIO" to make it avoid
> the printk, but if all callers are expected to retry in that situation, I
> don't guess it matters much:

I think this is what we want, returning -EIO doesn't make sense to me
(realizing that all callers ignore the return code if exception->retry
is 1).

Frank

> Signed-off-by: Jeff Layton <[email protected]>
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index f52cf5c..b456783 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -2774,6 +2774,7 @@ int nfs4_handle_exception(const struct nfs_server *server, int errorcode, struct
> break;
> case -NFS4ERR_OLD_STATEID:
> exception->retry = 1;
> + ret = 0;
> }
> /* We failed to handle the error */
> return nfs4_map_errors(ret);


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-04-24 12:29:58

by Jeff Layton

[permalink] [raw]
Subject: Re: Missing handling for NFS4ERR_OLD_STATEID in nfs4_handle_exception?

On Mon, Apr 23, 2007 at 10:14:30AM -0700, Frank Filz wrote:
> On Wed, 2007-04-18 at 09:57 -0400, Jeff Layton wrote:
> > In that case, then what would you think about this patch, based on Frank's
> > suggestion. The other option might be to set "ret = -EIO" to make it avoid
> > the printk, but if all callers are expected to retry in that situation, I
> > don't guess it matters much:
>
> I think this is what we want, returning -EIO doesn't make sense to me
> (realizing that all callers ignore the return code if exception->retry
> is 1).
>
> Frank
>

True, though you can make the case that returning with an error code of 0
implies "success" and that doesn't make much sense here either. Regardless,
since all current callers seem to retry indefinitely in this situation, then
it really doesn't make much difference either way.

-- Jeff

> > Signed-off-by: Jeff Layton <[email protected]>
> >
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index f52cf5c..b456783 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -2774,6 +2774,7 @@ int nfs4_handle_exception(const struct nfs_server *server, int errorcode, struct
> > break;
> > case -NFS4ERR_OLD_STATEID:
> > exception->retry = 1;
> > + ret = 0;
> > }
> > /* We failed to handle the error */
> > return nfs4_map_errors(ret);
>

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-04-18 13:58:17

by Jeff Layton

[permalink] [raw]
Subject: Re: Missing handling for NFS4ERR_OLD_STATEID in nfs4_handle_exception?

On Thu, Apr 12, 2007 at 02:07:48PM -0400, Trond Myklebust wrote:
> On Tue, 2007-04-03 at 09:09 -0700, Frank Filz wrote:
> > I'm looking at the following code, and wondering if something is missing
> > in the handling of NFS4ERR_OLD_STATEID. The result is that if this error
> > occurs, nfs4_map_errors() will print:
> >
> > nfs4_map_errors could not handle NFSv4 error 10024
>
> Agreed. We don't really need to worry users with that printk.
>
> > It also looks like the handling of NFS4ERR_DELAY etc. may be wrong,
> > since if nfs4_delay() returns without error, it falls through to the
> > handling of NFS4ERR_OLD_STATEID.
>
> The correct way to deal with both NFS4ERR_DELAY and NFS4ERR_OLD_STATEID
> is simply to retry the request.
>
> As I said in an earlier mail, the NFS4ERR_OLD_STATEID is occurring
> because some OPEN/OPEN_DOWNGRADE has caused the stateid to change. All
> we can do is retry with the new stateid.
>
> Cheers,
> Trond
>

In that case, then what would you think about this patch, based on Frank's
suggestion. The other option might be to set "ret = -EIO" to make it avoid
the printk, but if all callers are expected to retry in that situation, I
don't guess it matters much:

Signed-off-by: Jeff Layton <[email protected]>

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index f52cf5c..b456783 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2774,6 +2774,7 @@ int nfs4_handle_exception(const struct nfs_server *server, int errorcode, struct
break;
case -NFS4ERR_OLD_STATEID:
exception->retry = 1;
+ ret = 0;
}
/* We failed to handle the error */
return nfs4_map_errors(ret);

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-04-12 11:59:49

by Jeff Layton

[permalink] [raw]
Subject: Re: Missing handling for NFS4ERR_OLD_STATEID in nfs4_handle_exception?

Frank Filz wrote:
> I'm looking at the following code, and wondering if something is missing
> in the handling of NFS4ERR_OLD_STATEID. The result is that if this error
> occurs, nfs4_map_errors() will print:
>
> nfs4_map_errors could not handle NFSv4 error 10024
>
> It also looks like the handling of NFS4ERR_DELAY etc. may be wrong,
> since if nfs4_delay() returns without error, it falls through to the
> handling of NFS4ERR_OLD_STATEID.
>
> Based on the code in nfs4_async_handle_error(), it looks like it might
> be sufficient to set ret = 0 in addition to exception->retry = 1.
>
> Thanks for any thoughts
>
> Frank Filz
>
> /* This is the error handling routine for processes that are allowed
> * to sleep.
> */
> int nfs4_handle_exception(const struct nfs_server *server, int
> errorcode, struct nfs4_exception *exception)
> {
> struct nfs_client *clp = server->nfs_client;
> int ret = errorcode;
>
> exception->retry = 0;
> switch(errorcode) {
> case 0:
> return 0;
> case -NFS4ERR_STALE_CLIENTID:
> case -NFS4ERR_STALE_STATEID:
> case -NFS4ERR_EXPIRED:
> nfs4_schedule_state_recovery(clp);
> ret = nfs4_wait_clnt_recover(server->client, clp);
> if (ret == 0)
> exception->retry = 1;
> break;
> case -NFS4ERR_FILE_OPEN:
> case -NFS4ERR_GRACE:
> case -NFS4ERR_DELAY:
> ret = nfs4_delay(server->client, &exception->timeout);
> if (ret != 0)
> break;
> case -NFS4ERR_OLD_STATEID:
> exception->retry = 1;
> }
> /* We failed to handle the error */
> return nfs4_map_errors(ret);
> }
>
>

This looks pretty much correct to me as-is. If we set ret=0 on
-NFS4ERR_OLD_STATEID, then the caller won't get back an error code. This
makes an assumption that every caller of nfs4_handle_exception is
looping based on exception->retry. I'm not sure if that's a safe
assumption. A better idea *might* be to fix up nfs4_map_errors not to
throw the warning for some errors < -1000, but still return an error.

This sounds sort of like addressing the symptom and not the real
problem, however. The real question ought to be why you're getting
OLD_STATEID errors back from the server here. There can be legit
reasons, but these errors ought to be fairly rare. I generally only have
seen them when processes are signalled while RPC requests are in flight.

Also, it seems like when we hit -NFS4ERR_DELAY, we want to retry but
only if the delay didn't hit an error. It looks like it only returns
error if process was signalled while in nfs4_delay, and then we want to
pass an -ERESTARTSYS back up the call chain (and not retry). So I think
that's also correct as-is.

-- Jeff


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-04-12 17:08:35

by Frank Filz

[permalink] [raw]
Subject: Re: Missing handling for NFS4ERR_OLD_STATEID in nfs4_handle_exception?

On Thu, 2007-04-12 at 07:59 -0400, Jeff Layton wrote:
> This looks pretty much correct to me as-is. If we set ret=0 on
> -NFS4ERR_OLD_STATEID, then the caller won't get back an error code. This
> makes an assumption that every caller of nfs4_handle_exception is
> looping based on exception->retry. I'm not sure if that's a safe
> assumption. A better idea *might* be to fix up nfs4_map_errors not to
> throw the warning for some errors < -1000, but still return an error.

nfs4_map_errors should warn about errors, because it's a last defense
against leaking NFS4 error numbers to the rest of the kernel (that
doesn't recognize them). So before calling nfs4_map_errors(), the error
code should already be converted to an errno code.

It looked to me like every caller of nfs4_handle_exception() does loop
on exception-retry(), and in that case, does not look at the error
returned.

> This sounds sort of like addressing the symptom and not the real
> problem, however. The real question ought to be why you're getting
> OLD_STATEID errors back from the server here. There can be legit
> reasons, but these errors ought to be fairly rare. I generally only have
> seen them when processes are signalled while RPC requests are in flight.

Sure, understanding why we are getting them is important, but since it
appears that handling might be missing, we may be seeing more of them
than expected.

> Also, it seems like when we hit -NFS4ERR_DELAY, we want to retry but
> only if the delay didn't hit an error. It looks like it only returns
> error if process was signalled while in nfs4_delay, and then we want to
> pass an -ERESTARTSYS back up the call chain (and not retry). So I think
> that's also correct as-is.

I agree that's correct.

Frank



-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-04-12 17:31:04

by Trond Myklebust

[permalink] [raw]
Subject: Re: [NFS] Missing handling for NFS4ERR_OLD_STATEID in nfs4_handle_exception?

On Thu, 2007-04-12 at 07:59 -0400, Jeff Layton wrote:

> This sounds sort of like addressing the symptom and not the real
> problem, however. The real question ought to be why you're getting
> OLD_STATEID errors back from the server here. There can be legit
> reasons, but these errors ought to be fairly rare. I generally only have
> seen them when processes are signalled while RPC requests are in flight.

OLD_STATEID usually occurs if an OPEN or an OPEN_DOWNGRADE gets sent
that changes the current stateid while a READ or WRITE to the same file
is in flight.

It is generally a rare event, and so it tends to be much easier to deal
with by simply resending the READ/WRITEs with the updated stateid rather
than adding an expensive locking scheme for excluding READ/WRITE while
the OPEN/OPEN_DOWNGRADE is in progress.

Cheers,
Trond

2007-04-12 17:49:16

by Jeff Layton

[permalink] [raw]
Subject: Re: [NFS] Missing handling for NFS4ERR_OLD_STATEID in nfs4_handle_exception?

Trond Myklebust wrote:
> On Thu, 2007-04-12 at 07:59 -0400, Jeff Layton wrote:
>
>> This sounds sort of like addressing the symptom and not the real
>> problem, however. The real question ought to be why you're getting
>> OLD_STATEID errors back from the server here. There can be legit
>> reasons, but these errors ought to be fairly rare. I generally only have
>> seen them when processes are signalled while RPC requests are in flight.
>
> OLD_STATEID usually occurs if an OPEN or an OPEN_DOWNGRADE gets sent
> that changes the current stateid while a READ or WRITE to the same file
> is in flight.
>
> It is generally a rare event, and so it tends to be much easier to deal
> with by simply resending the READ/WRITEs with the updated stateid rather
> than adding an expensive locking scheme for excluding READ/WRITE while
> the OPEN/OPEN_DOWNGRADE is in progress.
>
> Cheers,
> Trond
>

Ahh that makes sense. I must have misunderstood the circumstances where
I was seeing it. Thanks for clarifying.

-- Jeff

2007-04-12 17:51:20

by Jeff Layton

[permalink] [raw]
Subject: Re: Missing handling for NFS4ERR_OLD_STATEID in nfs4_handle_exception?

Frank Filz wrote:
> On Thu, 2007-04-12 at 07:59 -0400, Jeff Layton wrote:
>> This looks pretty much correct to me as-is. If we set ret=0 on
>> -NFS4ERR_OLD_STATEID, then the caller won't get back an error code. This
>> makes an assumption that every caller of nfs4_handle_exception is
>> looping based on exception->retry. I'm not sure if that's a safe
>> assumption. A better idea *might* be to fix up nfs4_map_errors not to
>> throw the warning for some errors < -1000, but still return an error.
>
> nfs4_map_errors should warn about errors, because it's a last defense
> against leaking NFS4 error numbers to the rest of the kernel (that
> doesn't recognize them). So before calling nfs4_map_errors(), the error
> code should already be converted to an errno code.
>
> It looked to me like every caller of nfs4_handle_exception() does loop
> on exception-retry(), and in that case, does not look at the error
> returned.
>

Yes, that is the current case, but is it safe to assume that it will
always be that way (I'm not sure, which is why I'm asking)?

I guess I'm not sure what the problem is that you're trying to solve.
Are you simply attempting to avoid the printk's here? I'm not sure if
that would be a good thing. I'd think if we're seeing a lot of these
errors, then having the printk's might be good for helping to identify
the problem. If there aren't many, then the printks seem fairly harmless...

-- Jeff


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-04-12 18:08:14

by Trond Myklebust

[permalink] [raw]
Subject: Re: Missing handling for NFS4ERR_OLD_STATEID in nfs4_handle_exception?

On Tue, 2007-04-03 at 09:09 -0700, Frank Filz wrote:
> I'm looking at the following code, and wondering if something is missing
> in the handling of NFS4ERR_OLD_STATEID. The result is that if this error
> occurs, nfs4_map_errors() will print:
>
> nfs4_map_errors could not handle NFSv4 error 10024

Agreed. We don't really need to worry users with that printk.

> It also looks like the handling of NFS4ERR_DELAY etc. may be wrong,
> since if nfs4_delay() returns without error, it falls through to the
> handling of NFS4ERR_OLD_STATEID.

The correct way to deal with both NFS4ERR_DELAY and NFS4ERR_OLD_STATEID
is simply to retry the request.

As I said in an earlier mail, the NFS4ERR_OLD_STATEID is occurring
because some OPEN/OPEN_DOWNGRADE has caused the stateid to change. All
we can do is retry with the new stateid.

Cheers,
Trond


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs