2008-06-13 11:42:54

by NeilBrown

[permalink] [raw]
Subject: [PATCH] knfsd: nfsd: Handle ERESTARTSYS from syscalls.


OCFS2 can return -ERESTARTSYS from write requests (and possibly
elsewhere) if there is a signal pending.

If nfsd is shutdown (by sending a signal to each thread) while there
is still an IO load from the client, each thread could handle one last
request with a signal pending. This can result in -ERESTARTSYS
which is not understood by nfserrno() and so is reflected back to
the client as nfserr_io aka -EIO. This is wrong.

Instead, interpret ERESTARTSYS to mean "don't send a reply".
The client will resend and - if the server is restarted - the write will
(hopefully) be successful and everyone will be happy.

Signed-off-by: Neil Brown <[email protected]>

### Diffstat output
./fs/nfsd/nfsproc.c | 1 +
1 file changed, 1 insertion(+)

----
Funny how the shortest patches sometimes have the longest
descriptions.

The symptom that I narrowed down to this was:
copy a large file via NFS to an OCFS2 filesystem, and restart
the nfs server during the copy.
The 'cp' might get an -EIO, and the file will be corrupted -
presumably holes in the middle were writes appeared to fail.

diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c
--- .prev/fs/nfsd/nfsproc.c 2008-06-13 21:31:53.000000000 +1000
+++ ./fs/nfsd/nfsproc.c 2008-06-13 21:31:57.000000000 +1000
@@ -614,6 +614,7 @@ nfserrno (int errno)
#endif
{ nfserr_stale, -ESTALE },
{ nfserr_jukebox, -ETIMEDOUT },
+ { nfserr_dropit, -ERESTARTSYS },
{ nfserr_dropit, -EAGAIN },
{ nfserr_dropit, -ENOMEM },
{ nfserr_badname, -ESRCH },

### Diffstat output
./fs/nfsd/nfsproc.c | 1 +
1 file changed, 1 insertion(+)

diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c
--- .prev/fs/nfsd/nfsproc.c 2008-06-13 21:31:53.000000000 +1000
+++ ./fs/nfsd/nfsproc.c 2008-06-13 21:31:57.000000000 +1000
@@ -614,6 +614,7 @@ nfserrno (int errno)
#endif
{ nfserr_stale, -ESTALE },
{ nfserr_jukebox, -ETIMEDOUT },
+ { nfserr_dropit, -ERESTARTSYS },
{ nfserr_dropit, -EAGAIN },
{ nfserr_dropit, -ENOMEM },
{ nfserr_badname, -ESRCH },


2008-06-16 16:54:46

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH] knfsd: nfsd: Handle ERESTARTSYS from syscalls.

On Mon, 2008-06-16 at 11:09 -0400, Chuck Lever wrote:

> I think an error reply is much better than no reply in nearly every
> case. NFS3ERR_JUKEBOX/NFS4ERR_DELAY is an interesting idea, but
> something else again will probably be required for v4.1 with sessions.

NFS3ERR_JUKEBOX/NFS4ERR_DELAY may be inappropriate if the nfs daemon has
already started handling the RPC call, since you may be interrupting a
non-idempotent operation.

The only complete solution to this problem is NFSv4.1 with persistent
sessions.

Cheers
Trond

2008-06-16 18:14:09

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] knfsd: nfsd: Handle ERESTARTSYS from syscalls.

On Mon, 16 Jun 2008 12:54:46 -0400
Trond Myklebust <[email protected]> wrote:

> On Mon, 2008-06-16 at 11:09 -0400, Chuck Lever wrote:
>
> > I think an error reply is much better than no reply in nearly every
> > case. NFS3ERR_JUKEBOX/NFS4ERR_DELAY is an interesting idea, but
> > something else again will probably be required for v4.1 with sessions.
>
> NFS3ERR_JUKEBOX/NFS4ERR_DELAY may be inappropriate if the nfs daemon has
> already started handling the RPC call, since you may be interrupting a
> non-idempotent operation.
>

But if you drop the reply, the client will probably still end up
retransmitting the request. It seems like a JUKEBOX/DELAY error
is at least a defined error to the client instead of leaving it
guessing. Either way, the client could still end up on the wrong
side of a non-idempotent op.

Also, am I right that this should really only be happening if nfsd
catches a SIGKILL? All other signals should be masked off when
doing the local file operation. Or do we have a potential race if
we catch a signal just after svc_recv returns but before the new
sigmask is set and svc_process is called?

> The only complete solution to this problem is NFSv4.1 with persistent
> sessions.
>

That's probably the case, though we should probably try to do
best-effort for earlier versions.

--
Jeff Layton <[email protected]>

2008-06-14 17:11:11

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] knfsd: nfsd: Handle ERESTARTSYS from syscalls.

On Fri, Jun 13, 2008 at 09:42:15PM +1000, NeilBrown wrote:
>
> OCFS2 can return -ERESTARTSYS from write requests (and possibly
> elsewhere) if there is a signal pending.
>
> If nfsd is shutdown (by sending a signal to each thread) while there
> is still an IO load from the client, each thread could handle one last
> request with a signal pending. This can result in -ERESTARTSYS
> which is not understood by nfserrno() and so is reflected back to
> the client as nfserr_io aka -EIO. This is wrong.
>
> Instead, interpret ERESTARTSYS to mean "don't send a reply".
> The client will resend and - if the server is restarted - the write will
> (hopefully) be successful and everyone will be happy.

Thanks, applied--with a trivial change:

>
> Signed-off-by: Neil Brown <[email protected]>
>
> ### Diffstat output
> ./fs/nfsd/nfsproc.c | 1 +
> 1 file changed, 1 insertion(+)
>
> ----
> Funny how the shortest patches sometimes have the longest
> descriptions.

I added this to the body of the patch description:
>
> The symptom that I narrowed down to this was:
> copy a large file via NFS to an OCFS2 filesystem, and restart
> the nfs server during the copy.
> The 'cp' might get an -EIO, and the file will be corrupted -
> presumably holes in the middle were writes appeared to fail.

On the grounds that someone might run across the same problem or need to
reproduce it for some reason, so a good, simple description of the case
that found it might help.

--b.

>
> diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c
> --- .prev/fs/nfsd/nfsproc.c 2008-06-13 21:31:53.000000000 +1000
> +++ ./fs/nfsd/nfsproc.c 2008-06-13 21:31:57.000000000 +1000
> @@ -614,6 +614,7 @@ nfserrno (int errno)
> #endif
> { nfserr_stale, -ESTALE },
> { nfserr_jukebox, -ETIMEDOUT },
> + { nfserr_dropit, -ERESTARTSYS },
> { nfserr_dropit, -EAGAIN },
> { nfserr_dropit, -ENOMEM },
> { nfserr_badname, -ESRCH },
>
> ### Diffstat output
> ./fs/nfsd/nfsproc.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c
> --- .prev/fs/nfsd/nfsproc.c 2008-06-13 21:31:53.000000000 +1000
> +++ ./fs/nfsd/nfsproc.c 2008-06-13 21:31:57.000000000 +1000
> @@ -614,6 +614,7 @@ nfserrno (int errno)
> #endif
> { nfserr_stale, -ESTALE },
> { nfserr_jukebox, -ETIMEDOUT },
> + { nfserr_dropit, -ERESTARTSYS },
> { nfserr_dropit, -EAGAIN },
> { nfserr_dropit, -ENOMEM },
> { nfserr_badname, -ESRCH },

2008-06-16 12:39:57

by Peter Staubach

[permalink] [raw]
Subject: Re: [PATCH] knfsd: nfsd: Handle ERESTARTSYS from syscalls.

NeilBrown wrote:
> OCFS2 can return -ERESTARTSYS from write requests (and possibly
> elsewhere) if there is a signal pending.
>
> If nfsd is shutdown (by sending a signal to each thread) while there
> is still an IO load from the client, each thread could handle one last
> request with a signal pending. This can result in -ERESTARTSYS
> which is not understood by nfserrno() and so is reflected back to
> the client as nfserr_io aka -EIO. This is wrong.
>
> Instead, interpret ERESTARTSYS to mean "don't send a reply".
> The client will resend and - if the server is restarted - the write will
> (hopefully) be successful and everyone will be happy.
>
>

Why not handle -ERESTARTSYS in the same fashion as -ETIMEDOUT, ie.
leading to a EJUKEBOX sort of error being returned if possible?

Simply not returning is a bad thing to do for anything other than
NFSv2. It is especially bad for NFSv4.

ps

> Signed-off-by: Neil Brown <[email protected]>
>
> ### Diffstat output
> ./fs/nfsd/nfsproc.c | 1 +
> 1 file changed, 1 insertion(+)
>
> ----
> Funny how the shortest patches sometimes have the longest
> descriptions.
>
> The symptom that I narrowed down to this was:
> copy a large file via NFS to an OCFS2 filesystem, and restart
> the nfs server during the copy.
> The 'cp' might get an -EIO, and the file will be corrupted -
> presumably holes in the middle were writes appeared to fail.
>
> diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c
> --- .prev/fs/nfsd/nfsproc.c 2008-06-13 21:31:53.000000000 +1000
> +++ ./fs/nfsd/nfsproc.c 2008-06-13 21:31:57.000000000 +1000
> @@ -614,6 +614,7 @@ nfserrno (int errno)
> #endif
> { nfserr_stale, -ESTALE },
> { nfserr_jukebox, -ETIMEDOUT },
> + { nfserr_dropit, -ERESTARTSYS },
> { nfserr_dropit, -EAGAIN },
> { nfserr_dropit, -ENOMEM },
> { nfserr_badname, -ESRCH },
>
> ### Diffstat output
> ./fs/nfsd/nfsproc.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c
> --- .prev/fs/nfsd/nfsproc.c 2008-06-13 21:31:53.000000000 +1000
> +++ ./fs/nfsd/nfsproc.c 2008-06-13 21:31:57.000000000 +1000
> @@ -614,6 +614,7 @@ nfserrno (int errno)
> #endif
> { nfserr_stale, -ESTALE },
> { nfserr_jukebox, -ETIMEDOUT },
> + { nfserr_dropit, -ERESTARTSYS },
> { nfserr_dropit, -EAGAIN },
> { nfserr_dropit, -ENOMEM },
> { nfserr_badname, -ESRCH },
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


2008-06-16 15:09:57

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH] knfsd: nfsd: Handle ERESTARTSYS from syscalls.

On Mon, Jun 16, 2008 at 8:39 AM, Peter Staubach <[email protected]> wrote:
> NeilBrown wrote:
>>
>> OCFS2 can return -ERESTARTSYS from write requests (and possibly
>> elsewhere) if there is a signal pending.
>>
>> If nfsd is shutdown (by sending a signal to each thread) while there
>> is still an IO load from the client, each thread could handle one last
>> request with a signal pending. This can result in -ERESTARTSYS
>> which is not understood by nfserrno() and so is reflected back to
>> the client as nfserr_io aka -EIO. This is wrong.
>>
>> Instead, interpret ERESTARTSYS to mean "don't send a reply".
>> The client will resend and - if the server is restarted - the write will
>> (hopefully) be successful and everyone will be happy.
>>
>>
>
> Why not handle -ERESTARTSYS in the same fashion as -ETIMEDOUT, ie.
> leading to a EJUKEBOX sort of error being returned if possible?
>
> Simply not returning is a bad thing to do for anything other than
> NFSv2. It is especially bad for NFSv4.

Actually, the NFSv4 spec *requires* the server to reply to every request.

Not replying means an NFSv4 client connected via NFSv4 will have to
disconnect and retransmit. That should be avoided if at all possible.

I think an error reply is much better than no reply in nearly every
case. NFS3ERR_JUKEBOX/NFS4ERR_DELAY is an interesting idea, but
something else again will probably be required for v4.1 with sessions.

>> Signed-off-by: Neil Brown <[email protected]>
>>
>> ### Diffstat output
>> ./fs/nfsd/nfsproc.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> ----
>> Funny how the shortest patches sometimes have the longest
>> descriptions.
>>
>> The symptom that I narrowed down to this was:
>> copy a large file via NFS to an OCFS2 filesystem, and restart
>> the nfs server during the copy.
>> The 'cp' might get an -EIO, and the file will be corrupted -
>> presumably holes in the middle were writes appeared to fail.
>>
>> diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c
>> --- .prev/fs/nfsd/nfsproc.c 2008-06-13 21:31:53.000000000 +1000
>> +++ ./fs/nfsd/nfsproc.c 2008-06-13 21:31:57.000000000 +1000
>> @@ -614,6 +614,7 @@ nfserrno (int errno)
>> #endif
>> { nfserr_stale, -ESTALE },
>> { nfserr_jukebox, -ETIMEDOUT },
>> + { nfserr_dropit, -ERESTARTSYS },
>> { nfserr_dropit, -EAGAIN },
>> { nfserr_dropit, -ENOMEM },
>> { nfserr_badname, -ESRCH },
>>
>> ### Diffstat output
>> ./fs/nfsd/nfsproc.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c
>> --- .prev/fs/nfsd/nfsproc.c 2008-06-13 21:31:53.000000000 +1000
>> +++ ./fs/nfsd/nfsproc.c 2008-06-13 21:31:57.000000000 +1000
>> @@ -614,6 +614,7 @@ nfserrno (int errno)
>> #endif
>> { nfserr_stale, -ESTALE },
>> { nfserr_jukebox, -ETIMEDOUT },
>> + { nfserr_dropit, -ERESTARTSYS },
>> { nfserr_dropit, -EAGAIN },
>> { nfserr_dropit, -ENOMEM },
>> { nfserr_badname, -ESRCH },
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>



--
I am certain that these presidents will understand the cry of the
people of Bolivia, of the people of Latin America and the whole world,
which wants to have more food and not more cars. First food, then if
something's left over, more cars, more automobiles. I think that life
has to come first.
-- Evo Morales