2022-06-24 15:49:29

by Anna Schumaker

[permalink] [raw]
Subject: [PATCH] NFSD: Don't continue encoding if READ_PLUS gets confused

From: Anna Schumaker <[email protected]>

If we were in a HOLE segement, but vfs_llseek() claimed we were encoding
DATA then we would switch over to the DATA encoding function. This
conflicts with Chuck's latest xdr cleanup patches and can result in a
crash or silent hang. Let's just return nfserr_io if we find we are in
this situation, which will cause the encoder to return to the client
with the number of segments already encoded. The client can then try the
READ_PLUS call again.

Fxes: 6c254bf3b637 (SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer())
Signed-off-by: Anna Schumaker <[email protected]>
---
fs/nfsd/nfs4xdr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 61b2aae81abb..dc97d7c7e595 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4792,7 +4792,7 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
if (data_pos == -ENXIO)
data_pos = f_size;
else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE))
- return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size);
+ return nfserr_io;
count = data_pos - read->rd_offset;

/* Content type, offset, byte count */
--
2.36.1


2022-06-24 17:41:13

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH] NFSD: Don't continue encoding if READ_PLUS gets confused

Hi Anna!

> On Jun 24, 2022, at 11:47 AM, Anna Schumaker <[email protected]> wrote:
>
> From: Anna Schumaker <[email protected]>
>
> If we were in a HOLE segement, but vfs_llseek() claimed we were encoding
> DATA then we would switch over to the DATA encoding function. This
> conflicts with Chuck's latest xdr cleanup patches and can result in a
> crash or silent hang. Let's just return nfserr_io if we find we are in
> this situation, which will cause the encoder to return to the client
> with the number of segments already encoded. The client can then try the
> READ_PLUS call again.
>
> Fxes: 6c254bf3b637 (SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer())

checkpatch complains about this tag.

Also, Bruce said he wasn't able to reproduce the issue on
6c254bf3b637, only on the whole series. Were you able to hit
it with just this commit applied?


> Signed-off-by: Anna Schumaker <[email protected]>

I find this somewhat problematic. I can't apply this patch in
good conscience:

* The usual guideline for applying a workaround upstream is
that there's been a demonstration that it will be impossible
to find and fix the real problem. I don't see that here.

* We still don't understand the failure. The XDR code itself
might be broken? Therefore we don't understand if this
workaround is 100% effective

* Usually with a workaround, there's a hint of a long-term
fix... is this the long-term fix?

In other words, I might give this patch to a customer who
needed to get back on her feet quickly. I'm hesitant to take
it as an upstream change without further justification.

IMHO a fix in an XDR consumer (like READ_PLUS) needs to
demonstrate that the current XDR code is working as designed.
Otherwise, the XDR code itself is what needs to be fixed.


> ---
> fs/nfsd/nfs4xdr.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 61b2aae81abb..dc97d7c7e595 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -4792,7 +4792,7 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
> if (data_pos == -ENXIO)
> data_pos = f_size;
> else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE))
> - return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size);
> + return nfserr_io;
> count = data_pos - read->rd_offset;
>
> /* Content type, offset, byte count */
> --
> 2.36.1
>

--
Chuck Lever



2022-06-24 18:13:45

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH] NFSD: Don't continue encoding if READ_PLUS gets confused

On Fri, Jun 24, 2022 at 1:40 PM Chuck Lever III <[email protected]> wrote:
>
> Hi Anna!
>
> > On Jun 24, 2022, at 11:47 AM, Anna Schumaker <[email protected]> wrote:
> >
> > From: Anna Schumaker <[email protected]>
> >
> > If we were in a HOLE segement, but vfs_llseek() claimed we were encoding
> > DATA then we would switch over to the DATA encoding function. This
> > conflicts with Chuck's latest xdr cleanup patches and can result in a
> > crash or silent hang. Let's just return nfserr_io if we find we are in
> > this situation, which will cause the encoder to return to the client
> > with the number of segments already encoded. The client can then try the
> > READ_PLUS call again.
> >
> > Fxes: 6c254bf3b637 (SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer())
>
> checkpatch complains about this tag.

Okay. I'll look into that.

>
> Also, Bruce said he wasn't able to reproduce the issue on
> 6c254bf3b637, only on the whole series. Were you able to hit
> it with just this commit applied?

I've been having trouble hitting it in general. I think I only got an
oops once, but most of the time it just silently hangs. I'll try a bit
longer to see if I can hit it again.

>
>
> > Signed-off-by: Anna Schumaker <[email protected]>
>
> I find this somewhat problematic. I can't apply this patch in
> good conscience:
>
> * The usual guideline for applying a workaround upstream is
> that there's been a demonstration that it will be impossible
> to find and fix the real problem. I don't see that here.
>
> * We still don't understand the failure. The XDR code itself
> might be broken? Therefore we don't understand if this
> workaround is 100% effective
>
> * Usually with a workaround, there's a hint of a long-term
> fix... is this the long-term fix?

No, the long term fix is my new READ_PLUS code. I'm hoping to have a
v2 ready soon, but I tested v1 and didn't have any problem (no oops or
silent hang)

>
> In other words, I might give this patch to a customer who
> needed to get back on her feet quickly. I'm hesitant to take
> it as an upstream change without further justification.
>
> IMHO a fix in an XDR consumer (like READ_PLUS) needs to
> demonstrate that the current XDR code is working as designed.
> Otherwise, the XDR code itself is what needs to be fixed.
>
>
> > ---
> > fs/nfsd/nfs4xdr.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index 61b2aae81abb..dc97d7c7e595 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> > @@ -4792,7 +4792,7 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
> > if (data_pos == -ENXIO)
> > data_pos = f_size;
> > else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE))
> > - return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size);
> > + return nfserr_io;
> > count = data_pos - read->rd_offset;
> >
> > /* Content type, offset, byte count */
> > --
> > 2.36.1
> >
>
> --
> Chuck Lever
>
>
>