LinuxLists.cc - [PATCH] NFS: Always provide aligned buffers to the RPC read layers

[permalink] [raw]

Subject: Re: [PATCH] NFS: Always provide aligned buffers to the RPC read layers

On Fri, Aug 27, 2021 at 02:00:56PM -0400, [email protected] wrote:
> From: Trond Myklebust <[email protected]>
>
> Instead of messing around with XDR padding in the RDMA layer, we should
> just give the RPC layer an aligned buffer. Try to avoid creating extra
> RPC calls by aligning to the smaller value of ALIGN(len, rsize) and
> PAGE_SIZE.

I've bisected this change to be the cause of xfstests generic/525
getting stuck when tested against a standard Linux NFS server. It was
stuck in a repeated request loop in kernel mode, and killable.

I posted a patch with a full explanation in a reply.

--
Dan Aloni

2022-01-21 06:02:31

[permalink] [raw]

Subject: [PATCH] NFS: fix an infinite request retry in an off-by-one last page read

Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the
RPC read layers"), a read of 0xfff is aligned up to server rsize of
0x1000.

As a result, in a test where the server has a file of size
0x7fffffffffffffff, and the client tries to read from the offset
0x7ffffffffffff000, the read causes loff_t overflow in the server and it
returns an NFS code of EINVAL to the client. The client as a result
indefinitely retries the request.

This fixes the issue by cancelling the alignment for that case.

Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers")
Signed-off-by: Dan Aloni <[email protected]>
---
fs/nfs/read.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 08d6cc57cbc3..d6fac5e4d3f4 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -296,16 +296,19 @@ readpage_async_filler(void *data, struct page *page)
struct inode *inode = page_file_mapping(page)->host;
unsigned int rsize = NFS_SERVER(inode)->rsize;
struct nfs_page *new;
- unsigned int len, aligned_len;
+ unsigned int len, request_len;
int error;

len = nfs_page_length(page);
if (len == 0)
return nfs_return_empty_page(page);

- aligned_len = min_t(unsigned int, ALIGN(len, rsize), PAGE_SIZE);
+ if (likely(page_index(page) != (LLONG_MAX >> PAGE_SHIFT)))
+ request_len = min_t(unsigned int, ALIGN(len, rsize), PAGE_SIZE);
+ else
+ request_len = len;

- new = nfs_create_request(desc->ctx, page, 0, aligned_len);
+ new = nfs_create_request(desc->ctx, page, 0, request_len);
if (IS_ERR(new))
goto out_error;

--
2.23.0

2022-01-21 06:39:15

[permalink] [raw]

Subject: Re: [PATCH] NFS: fix an infinite request retry in an off-by-one last page read

On Tue, 2022-01-18 at 21:33 +0200, Dan Aloni wrote:
> Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to
> the
> RPC read layers"), a read of 0xfff is aligned up to server rsize of
> 0x1000.
>
> As a result, in a test where the server has a file of size
> 0x7fffffffffffffff, and the client tries to read from the offset
> 0x7ffffffffffff000, the read causes loff_t overflow in the server and
> it
> returns an NFS code of EINVAL to the client. The client as a result
> indefinitely retries the request.
>
> This fixes the issue by cancelling the alignment for that case.
>

NACK. This would be a server bug, not a client bug. The NFS protocol
has no notion of signed offsets, and doesn't use loff_t.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]

2022-01-22 02:11:25

[permalink] [raw]

Subject: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX

Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the
RPC read layers"), a read of 0xfff is aligned up to server rsize of
0x1000.

As a result, in a test where the server has a file of size
0x7fffffffffffffff, and the client tries to read from the offset
0x7ffffffffffff000, the read causes loff_t overflow in the server and it
returns an NFS code of EINVAL to the client. The client as a result
indefinitely retries the request.

This fixes the issue at server side by trimming reads past NFS_OFFSET_MAX.

Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers")
Signed-off-by: Dan Aloni <[email protected]>
---
fs/nfsd/vfs.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 738d564ca4ce..754f4e9ff4a2 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
__be32 err;

trace_nfsd_read_start(rqstp, fhp, offset, *count);
+
+ if (unlikely(offset + *count > NFS_OFFSET_MAX))
+ *count = NFS_OFFSET_MAX - offset;
+
err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
if (err)
return err;
--
2.23.0

2022-01-22 15:29:27

[permalink] [raw]

Subject: Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX

Hi Dan!

NFS server patches should be sent to me these days.

$ scripts/get_maintainer.pl fs/nfsd
Chuck Lever <[email protected]> (supporter:KERNEL NFSD, SUNRPC, AND LOCKD SERVERS)
[email protected] (open list:KERNEL NFSD, SUNRPC, AND LOCKD SERVERS)
[email protected] (open list)

> On Jan 21, 2022, at 1:50 PM, Dan Aloni <[email protected]> wrote:
>
> Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the
> RPC read layers"), a read of 0xfff is aligned up to server rsize of
> 0x1000.
>
> As a result, in a test where the server has a file of size
> 0x7fffffffffffffff, and the client tries to read from the offset
> 0x7ffffffffffff000, the read causes loff_t overflow in the server and it
> returns an NFS code of EINVAL to the client. The client as a result
> indefinitely retries the request.

An infinite loop in this case is a client bug.

Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
5661 permits the NFSv4 READ operation to return
NFS4ERR_INVAL.

Was the client side fix for this issue rejected?

> This fixes the issue at server side by trimming reads past NFS_OFFSET_MAX.

It's OK for the server to return a short READ in this case,
so I will indeed consider a change to make that happen. But
see below for comments specific to this patch.

> Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers")
> Signed-off-by: Dan Aloni <[email protected]>
> ---
> fs/nfsd/vfs.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 738d564ca4ce..754f4e9ff4a2 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> __be32 err;
>
> trace_nfsd_read_start(rqstp, fhp, offset, *count);
> +
> + if (unlikely(offset + *count > NFS_OFFSET_MAX))
> + *count = NFS_OFFSET_MAX - offset;

Can @offset ever be larger than NFS_OFFSET_MAX?

Does this check have any effect on NFSv4 READ operations?

> +
> err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
> if (err)
> return err;
> --
> 2.23.0
>

--
Chuck Lever

2022-01-23 15:11:26

[permalink] [raw]

Subject: Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX

On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote:
> NFS server patches should be sent to me these days.

Thanks, will remember this next time.

> > On Jan 21, 2022, at 1:50 PM, Dan Aloni <[email protected]> wrote:
> >
> > Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the
> > RPC read layers"), a read of 0xfff is aligned up to server rsize of
> > 0x1000.
> >
> > As a result, in a test where the server has a file of size
> > 0x7fffffffffffffff, and the client tries to read from the offset
> > 0x7ffffffffffff000, the read causes loff_t overflow in the server and it
> > returns an NFS code of EINVAL to the client. The client as a result
> > indefinitely retries the request.
>
> An infinite loop in this case is a client bug.
>
> Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
> to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
> 5661 permits the NFSv4 READ operation to return
> NFS4ERR_INVAL.
>
> Was the client side fix for this issue rejected?

Yeah, see Trond's response in

https://lore.kernel.org/linux-nfs/[email protected]/

So it is both a client and server bugs?

> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 738d564ca4ce..754f4e9ff4a2 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > __be32 err;
> >
> > trace_nfsd_read_start(rqstp, fhp, offset, *count);
> > +
> > + if (unlikely(offset + *count > NFS_OFFSET_MAX))
> > + *count = NFS_OFFSET_MAX - offset;
>
> Can @offset ever be larger than NFS_OFFSET_MAX?

We have this check in `nfsd4_read`, `(read->rd_offset >= OFFSET_MAX)`.
(should it have been `>` rather?).

Seems it is missing from NFSv3, should add.

> Does this check have any effect on NFSv4 READ operations?

Indeed it doesn't - my expanded testing shows it only fixed for NFSv3.
Will send an updated patch.

--
Dan Aloni

2022-01-23 15:11:27

[permalink] [raw]

Subject: [PATCH v2] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check

Due to change in client 8cfb9015280d ("NFS: Always provide aligned
buffers to the RPC read layers"), a read of 0xfff is aligned up to
server rsize of 0x0fff.

As a result, in a test where the server has a file of size
0x7fffffffffffffff, and the client tries to read from the offset
0x7ffffffffffff000, the read causes loff_t overflow in the server and it
returns an NFS code of EINVAL to the client. The client as a result
indefinitely retries the request.

This fixes the issue at server side by trimming reads past
NFS_OFFSET_MAX. It also adds a missing check for out of bound offset
in NFSv3.

Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers")
Signed-off-by: Dan Aloni <[email protected]>
---
fs/nfsd/nfs4proc.c | 3 +++
fs/nfsd/vfs.c | 11 +++++++++++
2 files changed, 14 insertions(+)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 486c5dba4b65..3b1e395a93b6 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -788,6 +788,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
trace_nfsd_read_start(rqstp, &cstate->current_fh,
read->rd_offset, read->rd_length);

+ if (unlikely(read->rd_offset + read->rd_length > NFS_OFFSET_MAX))
+ read->rd_length = NFS_OFFSET_MAX - read->rd_offset;
+
/*
* If we do a zero copy read, then a client will see read data
* that reflects the state of the file *after* performing the
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 738d564ca4ce..4a209f807466 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1046,6 +1046,16 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
__be32 err;

trace_nfsd_read_start(rqstp, fhp, offset, *count);
+
+ if (unlikely(offset > NFS_OFFSET_MAX)) {
+ /* We can return this according to Section 3.3.6 */
+ err = nfserr_inval;
+ goto error;
+ }
+
+ if (unlikely(offset + *count > NFS_OFFSET_MAX))
+ *count = NFS_OFFSET_MAX - offset;
+
err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
if (err)
return err;
@@ -1058,6 +1068,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,

nfsd_file_put(nf);

+error:
trace_nfsd_read_done(rqstp, fhp, offset, *count);

return err;
--
2.23.0

2022-01-23 15:20:28

[permalink] [raw]

Subject: Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX

> On Jan 22, 2022, at 7:47 AM, Dan Aloni <[email protected]> wrote:
>
> On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote:
>>> On Jan 21, 2022, at 1:50 PM, Dan Aloni <[email protected]> wrote:
>>>
>>> Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the
>>> RPC read layers"), a read of 0xfff is aligned up to server rsize of
>>> 0x1000.
>>>
>>> As a result, in a test where the server has a file of size
>>> 0x7fffffffffffffff, and the client tries to read from the offset
>>> 0x7ffffffffffff000, the read causes loff_t overflow in the server and it
>>> returns an NFS code of EINVAL to the client. The client as a result
>>> indefinitely retries the request.
>>
>> An infinite loop in this case is a client bug.
>>
>> Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
>> to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
>> 5661 permits the NFSv4 READ operation to return
>> NFS4ERR_INVAL.
>>
>> Was the client side fix for this issue rejected?
>
> Yeah, see Trond's response in
>
> https://lore.kernel.org/linux-nfs/[email protected]/
>
> So it is both a client and server bugs?

Splitting hairs, but yes there are issues on both sides
IMO. Bad behavior due to bugs on both sides is actually
not uncommon.

Trond is correct that the server is not dealing totally
correctly with the range of values in a READ request.

However, as I pointed out, the specification permits NFS
servers to return NFS[34]ERR_INVAL on READ. And in fact,
there is already code in the NFSv4 READ path that returns
INVAL, for example:

785 if (read->rd_offset >= OFFSET_MAX)
786 return nfserr_inval;

I'm not sure the specifications describe precisely when
the server /must/ return INVAL, but the client needs to
be prepared to handle it reasonably. If INVAL results in
an infinite loop, then that's a client bug.

IMO changing the alignment for that case is a band-aid.
The underlying looping behavior is what is the root
problem. (So... I agree with Trond's NACK, but for
different reasons).

>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>> index 738d564ca4ce..754f4e9ff4a2 100644
>>> --- a/fs/nfsd/vfs.c
>>> +++ b/fs/nfsd/vfs.c
>>> @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>> __be32 err;
>>>
>>> trace_nfsd_read_start(rqstp, fhp, offset, *count);
>>> +
>>> + if (unlikely(offset + *count > NFS_OFFSET_MAX))
>>> + *count = NFS_OFFSET_MAX - offset;
>>
>> Can @offset ever be larger than NFS_OFFSET_MAX?
>
> We have this check in `nfsd4_read`, `(read->rd_offset >= OFFSET_MAX)`.
> (should it have been `>` rather?).

Don't think so, a zero-byte READ should be valid.

However it's rather interesting that it does not use
NFS_OFFSET_MAX here. Does anyone know why NFSv3 uses
NFS_OFFSET_MAX but NFSv4 and NLM use OFFSET_MAX?

--
Chuck Lever

2022-01-23 15:23:20

[permalink] [raw]

Subject: Re: [PATCH v2] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check

Some additional comments on v2 below. We need to sort the
NFS_OFFSET_MAX v. OFFSET_MAX issue before you send a v3.

> On Jan 22, 2022, at 7:49 AM, Dan Aloni <[email protected]> wrote:
>
> Due to change in client 8cfb9015280d ("NFS: Always provide aligned
> buffers to the RPC read layers"), a read of 0xfff is aligned up to
> server rsize of 0x0fff.

scripts/checkpatch.pl will complain about the way you name the
commit here. It will want:

Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers
to the RPC read layers") on the client,

> As a result, in a test where the server has a file of size
> 0x7fffffffffffffff, and the client tries to read from the offset
> 0x7ffffffffffff000, the read causes loff_t overflow in the server and it
> returns an NFS code of EINVAL to the client. The client as a result
> indefinitely retries the request.
>
> This fixes the issue at server side by trimming reads past
> NFS_OFFSET_MAX. It also adds a missing check for out of bound offset
> in NFSv3.

RFC 1813 section 3.3.6 does say this:

>> It is possible for the server to return fewer than count
>> bytes of data. If the server returns less than the count
>> requested and eof set to FALSE, the client should issue
>> another READ to get the remaining data. A server may
>> return less data than requested under several
>> circumstances. The file may have been truncated by another
>> client or perhaps on the server itself, changing the file
>> size from what the requesting client believes to be the
>> case. This would reduce the actual amount of data
>> available to the client. It is possible that the server
>> may back off the transfer size and reduce the read request
>> return. Server resource exhaustion may also occur
>> necessitating a smaller read return.

Similar language in RFC 8881 section 18.22.4:

>> If the server returns a "short read" (i.e., fewer data than requested
>> and eof is set to FALSE), the client should send another READ to get
>> the remaining data. A server may return less data than requested
>> under several circumstances. The file may have been truncated by
>> another client or perhaps on the server itself, changing the file
>> size from what the requesting client believes to be the case. This
>> would reduce the actual amount of data available to the client. It
>> is possible that the server reduce the transfer size and so return a
>> short read result. Server resource exhaustion may also occur in a
>> short read.

So the server could be returning INVAL and leaving EOF set
to false. That might be triggering the client to retry the
READ. Does the server need to set the EOF flag if the READ
arguments cross the limit of the range that the server can
return? Does the client need to check for this case and
stop retrying? The specs aren't particularly clear on this
matter.

> Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers")

It's arguable that you are fixing 8cfb9015280d. I don't
think that commit is actually broken.

Also, the server behavior is wrong even before that commit,
and that commit is a client change... Mentioning this commit
at the top of the patch description is fine, since that is
how you discovered the problem, but I'd prefer if you drop
this line.

The patch does warrant a Cc: stable, though.

> Signed-off-by: Dan Aloni <[email protected]>
> ---
> fs/nfsd/nfs4proc.c | 3 +++
> fs/nfsd/vfs.c | 11 +++++++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 486c5dba4b65..3b1e395a93b6 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -788,6 +788,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> trace_nfsd_read_start(rqstp, &cstate->current_fh,
> read->rd_offset, read->rd_length);
>
> + if (unlikely(read->rd_offset + read->rd_length > NFS_OFFSET_MAX))
> + read->rd_length = NFS_OFFSET_MAX - read->rd_offset;
> +

rd_offset is range-checked before the trace point, so I think
this check needs to go before the trace point as well. That
way the adjusted argument values are recorded.

And we need to verify whether NFS_OFFSET_MAX is the correct
constant for this check.

> /*
> * If we do a zero copy read, then a client will see read data
> * that reflects the state of the file *after* performing the
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 738d564ca4ce..4a209f807466 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1046,6 +1046,16 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> __be32 err;
>
> trace_nfsd_read_start(rqstp, fhp, offset, *count);
> +
> + if (unlikely(offset > NFS_OFFSET_MAX)) {
> + /* We can return this according to Section 3.3.6 */

RFC 1813 section 3.3.6 says that READ is permitted to return
NFS3ERR_INVAL, but does not mandate that status code in this
particular case (it's provided for general issues similar to
this). So returning INVAL here is an implementation choice.

Thus mentioning the spec here is IMO perhaps misleading, so
I'd like you to drop this comment.

> + err = nfserr_inval;
> + goto error;
> + }
> +
> + if (unlikely(offset + *count > NFS_OFFSET_MAX))
> + *count = NFS_OFFSET_MAX - offset;
> +

Same as above: these range-checks need to go before the trace
point, IMO.

> err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
> if (err)
> return err;
> @@ -1058,6 +1068,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>
> nfsd_file_put(nf);
>
> +error:
> trace_nfsd_read_done(rqstp, fhp, offset, *count);
>
> return err;
> --
> 2.23.0

--
Chuck Lever

2022-01-23 15:24:13

[permalink] [raw]

Subject: Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX

On Sat, 2022-01-22 at 17:05 +0000, Chuck Lever III wrote:
>
> > On Jan 22, 2022, at 7:47 AM, Dan Aloni <[email protected]>
> > wrote:
> >
> > On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote:
> > > > On Jan 21, 2022, at 1:50 PM, Dan Aloni <[email protected]>
> > > > wrote:
> > > >
> > > > Due to change 8cfb9015280d ("NFS: Always provide aligned
> > > > buffers to the
> > > > RPC read layers"), a read of 0xfff is aligned up to server
> > > > rsize of
> > > > 0x1000.
> > > >
> > > > As a result, in a test where the server has a file of size
> > > > 0x7fffffffffffffff, and the client tries to read from the
> > > > offset
> > > > 0x7ffffffffffff000, the read causes loff_t overflow in the
> > > > server and it
> > > > returns an NFS code of EINVAL to the client. The client as a
> > > > result
> > > > indefinitely retries the request.
> > >
> > > An infinite loop in this case is a client bug.
> > >
> > > Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
> > > to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
> > > 5661 permits the NFSv4 READ operation to return
> > > NFS4ERR_INVAL.
> > >
> > > Was the client side fix for this issue rejected?
> >
> > Yeah, see Trond's response in
> >
> >
> > https://lore.kernel.org/linux-nfs/[email protected]/
> >
> > So it is both a client and server bugs?
>
> Splitting hairs, but yes there are issues on both sides
> IMO. Bad behavior due to bugs on both sides is actually
> not uncommon.
>
> Trond is correct that the server is not dealing totally
> correctly with the range of values in a READ request.
>
> However, as I pointed out, the specification permits NFS
> servers to return NFS[34]ERR_INVAL on READ. And in fact,
> there is already code in the NFSv4 READ path that returns
> INVAL, for example:
>
> 785 if (read->rd_offset >= OFFSET_MAX)
> 786 return nfserr_inval;
>
> I'm not sure the specifications describe precisely when
> the server /must/ return INVAL, but the client needs to
> be prepared to handle it reasonably. If INVAL results in
> an infinite loop, then that's a client bug.
>
> IMO changing the alignment for that case is a band-aid.
> The underlying looping behavior is what is the root
> problem. (So... I agree with Trond's NACK, but for
> different reasons).

If I'm reading Dan's test case correctly, the client is trying to read
a full page of 0x1000 bytes starting at offset 0x7fffffffffffff000.
That means the end offset for that read is 0x7fffffffffffff000 + 0x1000
- 1 = 0x7fffffffffffffff.

IOW: as far as the server is concerned, there is no loff_t overflow on
either the start or end offset and so there is no reason for it to
return NFS4ERR_INVAL.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]

2022-01-23 15:26:57

[permalink] [raw]

Subject: Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX

On Sat, Jan 22, 2022 at 05:05:49PM +0000, Chuck Lever III wrote:
> >>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> >>> index 738d564ca4ce..754f4e9ff4a2 100644
> >>> --- a/fs/nfsd/vfs.c
> >>> +++ b/fs/nfsd/vfs.c
> >>> @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >>> __be32 err;
> >>>
> >>> trace_nfsd_read_start(rqstp, fhp, offset, *count);
> >>> +
> >>> + if (unlikely(offset + *count > NFS_OFFSET_MAX))
> >>> + *count = NFS_OFFSET_MAX - offset;
> >>
> >> Can @offset ever be larger than NFS_OFFSET_MAX?
> >
> > We have this check in `nfsd4_read`, `(read->rd_offset >= OFFSET_MAX)`.
> > (should it have been `>` rather?).
>
> Don't think so, a zero-byte READ should be valid.

Make sense. BTW, we have a `(argp->offset > NFS_OFFSET_MAX)` check
resulting in EINVAL under `nfsd3_proc_commit`. Does it apply to writes
as well?

> However it's rather interesting that it does not use
> NFS_OFFSET_MAX here. Does anyone know why NFSv3 uses
> NFS_OFFSET_MAX but NFSv4 and NLM use OFFSET_MAX?

NFS_OFFSET_MAX introduced in v2.3.31, which is before `OFFSET_MAX` was
moved to a header file, which explains the comment on top of it,
outdated for quite awhile:

/*
* This is really a general kernel constant, but since nothing like
* this is defined in the kernel headers, I have to do it here.
*/
#define NFS_OFFSET_MAX ((__s64)((~(__u64)0) >> 1))

And `OFFSET_MAX` in linux/fs.h was introduced in v2.3.99pre4. Seems
`OFFSET_MAX` always corresponds to 64-bit loff_t, so they seem
inter-changeable to me.

--
Dan Aloni

2022-01-23 15:34:36

[permalink] [raw]

Subject: Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX

> On Jan 22, 2022, at 1:27 PM, Trond Myklebust <[email protected]> wrote:
>
> On Sat, 2022-01-22 at 17:05 +0000, Chuck Lever III wrote:
>>
>>> On Jan 22, 2022, at 7:47 AM, Dan Aloni <[email protected]>
>>> wrote:
>>>
>>> On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote:
>>>>> On Jan 21, 2022, at 1:50 PM, Dan Aloni <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Due to change 8cfb9015280d ("NFS: Always provide aligned
>>>>> buffers to the
>>>>> RPC read layers"), a read of 0xfff is aligned up to server
>>>>> rsize of
>>>>> 0x1000.
>>>>>
>>>>> As a result, in a test where the server has a file of size
>>>>> 0x7fffffffffffffff, and the client tries to read from the
>>>>> offset
>>>>> 0x7ffffffffffff000, the read causes loff_t overflow in the
>>>>> server and it
>>>>> returns an NFS code of EINVAL to the client. The client as a
>>>>> result
>>>>> indefinitely retries the request.
>>>>
>>>> An infinite loop in this case is a client bug.
>>>>
>>>> Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
>>>> to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
>>>> 5661 permits the NFSv4 READ operation to return
>>>> NFS4ERR_INVAL.
>>>>
>>>> Was the client side fix for this issue rejected?
>>>
>>> Yeah, see Trond's response in
>>>
>>>
>>> https://lore.kernel.org/linux-nfs/[email protected]/
>>>
>>> So it is both a client and server bugs?
>>
>> Splitting hairs, but yes there are issues on both sides
>> IMO. Bad behavior due to bugs on both sides is actually
>> not uncommon.
>>
>> Trond is correct that the server is not dealing totally
>> correctly with the range of values in a READ request.
>>
>> However, as I pointed out, the specification permits NFS
>> servers to return NFS[34]ERR_INVAL on READ. And in fact,
>> there is already code in the NFSv4 READ path that returns
>> INVAL, for example:
>>
>> 785 if (read->rd_offset >= OFFSET_MAX)
>> 786 return nfserr_inval;
>>
>> I'm not sure the specifications describe precisely when
>> the server /must/ return INVAL, but the client needs to
>> be prepared to handle it reasonably. If INVAL results in
>> an infinite loop, then that's a client bug.
>>
>> IMO changing the alignment for that case is a band-aid.
>> The underlying looping behavior is what is the root
>> problem. (So... I agree with Trond's NACK, but for
>> different reasons).
>
> If I'm reading Dan's test case correctly, the client is trying to read
> a full page of 0x1000 bytes starting at offset 0x7fffffffffffff000.
> That means the end offset for that read is 0x7fffffffffffff000 + 0x1000
> - 1 = 0x7fffffffffffffff.
>
> IOW: as far as the server is concerned, there is no loff_t overflow on
> either the start or end offset and so there is no reason for it to
> return NFS4ERR_INVAL.

Yep, I agree there's server misbehavior, and I think Dan's
server fix is on point.

I would like to know why the client is looping, though. INVAL
is a valid response the Linux server already uses in other
cases and by itself should not trigger a READ retry.

After checking the relevant XDR definitions, an NFS READ error
response doesn't include the EOF flag, so I'm a little mystified
why the client would need to retry after receiving INVAL.

--
Chuck Lever

2022-01-23 15:37:24

[permalink] [raw]

Subject: Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX

On Sat, 2022-01-22 at 20:15 +0000, Chuck Lever III wrote:
>
>
> > On Jan 22, 2022, at 1:27 PM, Trond Myklebust
> > <[email protected]> wrote:
> >
> > On Sat, 2022-01-22 at 17:05 +0000, Chuck Lever III wrote:
> > >
> > > > On Jan 22, 2022, at 7:47 AM, Dan Aloni <[email protected]>
> > > > wrote:
> > > >
> > > > On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III
> > > > wrote:
> > > > > > On Jan 21, 2022, at 1:50 PM, Dan Aloni
> > > > > > <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > Due to change 8cfb9015280d ("NFS: Always provide aligned
> > > > > > buffers to the
> > > > > > RPC read layers"), a read of 0xfff is aligned up to server
> > > > > > rsize of
> > > > > > 0x1000.
> > > > > >
> > > > > > As a result, in a test where the server has a file of size
> > > > > > 0x7fffffffffffffff, and the client tries to read from the
> > > > > > offset
> > > > > > 0x7ffffffffffff000, the read causes loff_t overflow in the
> > > > > > server and it
> > > > > > returns an NFS code of EINVAL to the client. The client as
> > > > > > a
> > > > > > result
> > > > > > indefinitely retries the request.
> > > > >
> > > > > An infinite loop in this case is a client bug.
> > > > >
> > > > > Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
> > > > > to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
> > > > > 5661 permits the NFSv4 READ operation to return
> > > > > NFS4ERR_INVAL.
> > > > >
> > > > > Was the client side fix for this issue rejected?
> > > >
> > > > Yeah, see Trond's response in
> > > >
> > > >
> > > > https://lore.kernel.org/linux-nfs/[email protected]/
> > > >
> > > > So it is both a client and server bugs?
> > >
> > > Splitting hairs, but yes there are issues on both sides
> > > IMO. Bad behavior due to bugs on both sides is actually
> > > not uncommon.
> > >
> > > Trond is correct that the server is not dealing totally
> > > correctly with the range of values in a READ request.
> > >
> > > However, as I pointed out, the specification permits NFS
> > > servers to return NFS[34]ERR_INVAL on READ. And in fact,
> > > there is already code in the NFSv4 READ path that returns
> > > INVAL, for example:
> > >
> > > 785 if (read->rd_offset >= OFFSET_MAX)
> > > 786 return nfserr_inval;
> > >
> > > I'm not sure the specifications describe precisely when
> > > the server /must/ return INVAL, but the client needs to
> > > be prepared to handle it reasonably. If INVAL results in
> > > an infinite loop, then that's a client bug.
> > >
> > > IMO changing the alignment for that case is a band-aid.
> > > The underlying looping behavior is what is the root
> > > problem. (So... I agree with Trond's NACK, but for
> > > different reasons).
> >
> > If I'm reading Dan's test case correctly, the client is trying to
> > read
> > a full page of 0x1000 bytes starting at offset 0x7fffffffffffff000.
> > That means the end offset for that read is 0x7fffffffffffff000 +
> > 0x1000
> > - 1 = 0x7fffffffffffffff.
> >
> > IOW: as far as the server is concerned, there is no loff_t overflow
> > on
> > either the start or end offset and so there is no reason for it to
> > return NFS4ERR_INVAL.
>
> Yep, I agree there's server misbehavior, and I think Dan's
> server fix is on point.
>
> I would like to know why the client is looping, though. INVAL
> is a valid response the Linux server already uses in other
> cases and by itself should not trigger a READ retry.
>
> After checking the relevant XDR definitions, an NFS READ error
> response doesn't include the EOF flag, so I'm a little mystified
> why the client would need to retry after receiving INVAL.

While we could certainly add that error to nfs_error_is_fatal(), the
question is why the client should need to handle NFS4ERR_INVAL if it is
sending valid arguments?

15.1.1.4. NFS4ERR_INVAL (Error Code 22)

The arguments for this operation are not valid for some reason, even
though they do match those specified in the XDR definition for the
request.

Sure... What does that mean, and what do I do?

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]

2022-01-23 15:37:32

[permalink] [raw]

Subject: Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX

> On Jan 22, 2022, at 2:01 PM, Dan Aloni <[email protected]> wrote:
>
> On Sat, Jan 22, 2022 at 05:05:49PM +0000, Chuck Lever III wrote:
>>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>>> index 738d564ca4ce..754f4e9ff4a2 100644
>>>>> --- a/fs/nfsd/vfs.c
>>>>> +++ b/fs/nfsd/vfs.c
>>>>> @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>> __be32 err;
>>>>>
>>>>> trace_nfsd_read_start(rqstp, fhp, offset, *count);
>>>>> +
>>>>> + if (unlikely(offset + *count > NFS_OFFSET_MAX))
>>>>> + *count = NFS_OFFSET_MAX - offset;
>>>>
>>>> Can @offset ever be larger than NFS_OFFSET_MAX?
>>>
>>> We have this check in `nfsd4_read`, `(read->rd_offset >= OFFSET_MAX)`.
>>> (should it have been `>` rather?).
>>
>> Don't think so, a zero-byte READ should be valid.
>
> Make sense. BTW, we have a `(argp->offset > NFS_OFFSET_MAX)` check
> resulting in EINVAL under `nfsd3_proc_commit`. Does it apply to writes
> as well?

Geez, that's whole 'nother can of worms.

RFC 1813 section 3.3.21 does not list NFS3ERR_INVAL, and does
not discuss what to do if the commit argument values are
outside the range which the server or local filesystem
supports.

RFC 8881 section 15.2 (Table 6) does not list NFS4ERR_INVAL
as a valid status code for the COMMIT operation, and likewise
section 18.3 does not discuss how the server should respond
when the commit argument values are invalid.

Aside from nfsd3_proc_commit, nfsd_commit() is used by NFSv3
and NFSv4, and it has:

1129 __be32 err = nfserr_inval;
1130
1131 if (offset < 0)
1132 goto out;
1133 if (count != 0) {
1134 end = offset + (loff_t)count - 1;
1135 if (end < offset)
1136 goto out;
1137 }
1138

which I think is going to be problematic. But no-one has
complained, so it's safe to defer changes here to another
patch, IMO.

>> However it's rather interesting that it does not use
>> NFS_OFFSET_MAX here. Does anyone know why NFSv3 uses
>> NFS_OFFSET_MAX but NFSv4 and NLM use OFFSET_MAX?
>
> NFS_OFFSET_MAX introduced in v2.3.31, which is before `OFFSET_MAX` was
> moved to a header file, which explains the comment on top of it,
> outdated for quite awhile:
>
> /*
> * This is really a general kernel constant, but since nothing like
> * this is defined in the kernel headers, I have to do it here.
> */
> #define NFS_OFFSET_MAX ((__s64)((~(__u64)0) >> 1))
>
> And `OFFSET_MAX` in linux/fs.h was introduced in v2.3.99pre4. Seems
> `OFFSET_MAX` always corresponds to 64-bit loff_t, so they seem
> inter-changeable to me.

For now, add OFFSET_MAX in the NFSv4 paths, and use NFS_OFFSET_MAX
in the NFSv3 paths, and at some point someone can propose a clean
up to replace NFS_OFFSET_MAX with OFFSET_MAX.

--
Chuck Lever

2022-01-24 06:36:27

[permalink] [raw]

Subject: Re: [PATCH v2] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check

On Sat, Jan 22, 2022 at 05:37:16PM +0000, Chuck Lever III wrote:
> Similar language in RFC 8881 section 18.22.4:
>
> >> If the server returns a "short read" (i.e., fewer data than requested
> >> and eof is set to FALSE), the client should send another READ to get
> >> the remaining data. A server may return less data than requested
> >> under several circumstances. The file may have been truncated by
> >> another client or perhaps on the server itself, changing the file
> >> size from what the requesting client believes to be the case. This
> >> would reduce the actual amount of data available to the client. It
> >> is possible that the server reduce the transfer size and so return a
> >> short read result. Server resource exhaustion may also occur in a
> >> short read.
>
> So the server could be returning INVAL and leaving EOF set
> to false. That might be triggering the client to retry the
> READ. Does the server need to set the EOF flag if the READ
> arguments cross the limit of the range that the server can
> return? Does the client need to check for this case and
> stop retrying? The specs aren't particularly clear on this
> matter.

But eof only in `resok` and not in `resfail`. For reference, here's the
reply I got:

Network File System, READ Reply Error: NFS3ERR_INVAL
[Program Version: 3]
[V3 Procedure: READ (6)]
Status: NFS3ERR_INVAL (22)
file_attributes Regular File mode: 0600 uid: 0 gid: 0
attributes_follow: value follows (1)
attributes Regular File mode: 0600 uid: 0 gid: 0
Type: Regular File (1)
Mode: 0600, S_IRUSR, S_IWUSR
nlink: 1
uid: 0
gid: 0
size: 9223372036854775807
used: 4096
rdev: 0,0
fsid: 0x0000000000000002 (2)
fileid: 69
atime: Jan 18, 2022 20:20:33.611267439 IST
seconds: 1642530033
nano seconds: 611267439
mtime: Jan 18, 2022 20:20:33.571266608 IST
seconds: 1642530033
nano seconds: 571266608
ctime: Jan 18, 2022 20:20:33.571266608 IST
seconds: 1642530033
nano seconds: 571266608

--
Dan Aloni

2022-01-24 07:17:19

[permalink] [raw]

Subject: [PATCH v3] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check

Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to the
RPC read layers") on the client, a read of 0xfff is aligned up to server
rsize of 0x1000.

As a result, in a test where the server has a file of size
0x7fffffffffffffff, and the client tries to read from the offset
0x7ffffffffffff000, the read causes loff_t overflow in the server and it
returns an NFS code of EINVAL to the client. The client as a result
indefinitely retries the request.

This fixes the issue at server side by trimming reads past
NFS_OFFSET_MAX. It also adds a missing check for out of bound offset in
NFSv3, copying a similar check from NFSv4.x.

Cc: <[email protected]>
Signed-off-by: Dan Aloni <[email protected]>
---
fs/nfsd/nfs4proc.c | 3 +++
fs/nfsd/vfs.c | 6 ++++++
2 files changed, 9 insertions(+)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 486c5dba4b65..816bdf212559 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -785,6 +785,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
if (read->rd_offset >= OFFSET_MAX)
return nfserr_inval;

+ if (unlikely(read->rd_offset + read->rd_length > OFFSET_MAX))
+ read->rd_length = OFFSET_MAX - read->rd_offset;
+
trace_nfsd_read_start(rqstp, &cstate->current_fh,
read->rd_offset, read->rd_length);

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 738d564ca4ce..ad4df374433e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1045,6 +1045,12 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
struct file *file;
__be32 err;

+ if (unlikely(offset >= NFS_OFFSET_MAX))
+ return nfserr_inval;
+
+ if (unlikely(offset + *count > NFS_OFFSET_MAX))
+ *count = NFS_OFFSET_MAX - offset;
+
trace_nfsd_read_start(rqstp, fhp, offset, *count);
err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
if (err)
--
2.23.0

2022-01-24 09:41:03

[permalink] [raw]

Subject: Re: [PATCH v3] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check

On Sun, 2022-01-23 at 11:50 +0200, Dan Aloni wrote:
> Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to
> the
> RPC read layers") on the client, a read of 0xfff is aligned up to
> server
> rsize of 0x1000.
>
> As a result, in a test where the server has a file of size
> 0x7fffffffffffffff, and the client tries to read from the offset
> 0x7ffffffffffff000, the read causes loff_t overflow in the server and
> it
> returns an NFS code of EINVAL to the client. The client as a result
> indefinitely retries the request.
>
> This fixes the issue at server side by trimming reads past
> NFS_OFFSET_MAX. It also adds a missing check for out of bound offset
> in
> NFSv3, copying a similar check from NFSv4.x.
>
> Cc: <[email protected]>
> Signed-off-by: Dan Aloni <[email protected]>
> ---
> fs/nfsd/nfs4proc.c | 3 +++
> fs/nfsd/vfs.c      | 6 ++++++
> 2 files changed, 9 insertions(+)
>
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index 486c5dba4b65..816bdf212559 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -785,6 +785,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct
> nfsd4_compound_state *cstate,
>         if (read->rd_offset >= OFFSET_MAX)
>                 return nfserr_inval;
>
> +       if (unlikely(read->rd_offset + read->rd_length > OFFSET_MAX))
> +               read->rd_length = OFFSET_MAX - read->rd_offset;
> +
>         trace_nfsd_read_start(rqstp, &cstate->current_fh,
>                               read->rd_offset, read->rd_length);
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 738d564ca4ce..ad4df374433e 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1045,6 +1045,12 @@ __be32 nfsd_read(struct svc_rqst *rqstp,
> struct svc_fh *fhp,
>         struct file *file;
>         __be32 err;
>
> +       if (unlikely(offset >= NFS_OFFSET_MAX))
> +               return nfserr_inval;

POSIX does allow you to lseek to the maximum filesize offset (sb-
>s_maxbytes), however any subsequent read will return 0 bytes (i.e.
eof), whereas a subsequent write will return EFBIG.

> +
> +       if (unlikely(offset + *count > NFS_OFFSET_MAX))
> +               *count = NFS_OFFSET_MAX - offset;

Can we please fix this to use the actual per-filesystem file size
limit, which is set as sb->s_maxbytes, instead of using NFS_OFFSET_MAX?

Ditto for 'read' above.

> +
>         trace_nfsd_read_start(rqstp, fhp, offset, *count);
>         err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
>         if (err)

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]

2022-01-24 10:29:18

[permalink] [raw]

Subject: Re: [PATCH v3] NFSD: trim reads past NFS_OFFSET_MAX and fix NFSv3 check

> On Jan 23, 2022, at 10:29 AM, Trond Myklebust <[email protected]> wrote:
>
> On Sun, 2022-01-23 at 11:50 +0200, Dan Aloni wrote:
>> Due to commit 8cfb9015280d ("NFS: Always provide aligned buffers to
>> the
>> RPC read layers") on the client, a read of 0xfff is aligned up to
>> server
>> rsize of 0x1000.
>>
>> As a result, in a test where the server has a file of size
>> 0x7fffffffffffffff, and the client tries to read from the offset
>> 0x7ffffffffffff000, the read causes loff_t overflow in the server and
>> it
>> returns an NFS code of EINVAL to the client. The client as a result
>> indefinitely retries the request.
>>
>> This fixes the issue at server side by trimming reads past
>> NFS_OFFSET_MAX. It also adds a missing check for out of bound offset
>> in
>> NFSv3, copying a similar check from NFSv4.x.
>>
>> Cc: <[email protected]>
>> Signed-off-by: Dan Aloni <[email protected]>
>> ---
>> fs/nfsd/nfs4proc.c | 3 +++
>> fs/nfsd/vfs.c | 6 ++++++
>> 2 files changed, 9 insertions(+)
>>
>> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
>> index 486c5dba4b65..816bdf212559 100644
>> --- a/fs/nfsd/nfs4proc.c
>> +++ b/fs/nfsd/nfs4proc.c
>> @@ -785,6 +785,9 @@ nfsd4_read(struct svc_rqst *rqstp, struct
>> nfsd4_compound_state *cstate,
>> if (read->rd_offset >= OFFSET_MAX)
>> return nfserr_inval;
>>
>> + if (unlikely(read->rd_offset + read->rd_length > OFFSET_MAX))
>> + read->rd_length = OFFSET_MAX - read->rd_offset;
>> +
>> trace_nfsd_read_start(rqstp, &cstate->current_fh,
>> read->rd_offset, read->rd_length);
>>
>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>> index 738d564ca4ce..ad4df374433e 100644
>> --- a/fs/nfsd/vfs.c
>> +++ b/fs/nfsd/vfs.c
>> @@ -1045,6 +1045,12 @@ __be32 nfsd_read(struct svc_rqst *rqstp,
>> struct svc_fh *fhp,
>> struct file *file;
>> __be32 err;
>>
>> + if (unlikely(offset >= NFS_OFFSET_MAX))
>> + return nfserr_inval;
>
> POSIX does allow you to lseek to the maximum filesize offset (sb-
>> s_maxbytes), however any subsequent read will return 0 bytes (i.e.
> eof), whereas a subsequent write will return EFBIG.

I'm simply trying to clarify your request.

You've stated that the Linux NFS client does not handle INVAL
responses, even though both RFC 1813 and 8881 permit it. So
are you suggesting (here) that the Linux NFS server should
not return INVAL on READs beyond the filesystem's supported
maximum file size but instead return a successful 0-byte
result with EOF set?

>> +
>> + if (unlikely(offset + *count > NFS_OFFSET_MAX))
>> + *count = NFS_OFFSET_MAX - offset;
>
> Can we please fix this to use the actual per-filesystem file size
> limit, which is set as sb->s_maxbytes, instead of using NFS_OFFSET_MAX?
>
> Ditto for 'read' above.

That sounds reasonable. But I wonder if there are some other
places that need the same treatment.

>> +
>> trace_nfsd_read_start(rqstp, fhp, offset, *count);
>> err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
>> if (err)

--
Chuck Lever

2022-01-24 11:20:19

[permalink] [raw]

Subject: Re: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX

> On Jan 22, 2022, at 3:30 PM, Trond Myklebust <[email protected]> wrote:
>
> On Sat, 2022-01-22 at 20:15 +0000, Chuck Lever III wrote:
>>
>>> On Jan 22, 2022, at 1:27 PM, Trond Myklebust
>>> <[email protected]> wrote:
>>>
>>> On Sat, 2022-01-22 at 17:05 +0000, Chuck Lever III wrote:
>>>>
>>>>> On Jan 22, 2022, at 7:47 AM, Dan Aloni <[email protected]>
>>>>> wrote:
>>>>>
>>>>> On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III
>>>>> wrote:
>>>>>>> On Jan 21, 2022, at 1:50 PM, Dan Aloni
>>>>>>> <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Due to change 8cfb9015280d ("NFS: Always provide aligned
>>>>>>> buffers to the
>>>>>>> RPC read layers"), a read of 0xfff is aligned up to server
>>>>>>> rsize of
>>>>>>> 0x1000.
>>>>>>>
>>>>>>> As a result, in a test where the server has a file of size
>>>>>>> 0x7fffffffffffffff, and the client tries to read from the
>>>>>>> offset
>>>>>>> 0x7ffffffffffff000, the read causes loff_t overflow in the
>>>>>>> server and it
>>>>>>> returns an NFS code of EINVAL to the client. The client as
>>>>>>> a
>>>>>>> result
>>>>>>> indefinitely retries the request.
>>>>>>
>>>>>> An infinite loop in this case is a client bug.
>>>>>>
>>>>>> Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
>>>>>> to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
>>>>>> 5661 permits the NFSv4 READ operation to return
>>>>>> NFS4ERR_INVAL.
>>>>>>
>>>>>> Was the client side fix for this issue rejected?
>>>>>
>>>>> Yeah, see Trond's response in
>>>>>
>>>>>
>>>>> https://lore.kernel.org/linux-nfs/[email protected]/
>>>>>
>>>>> So it is both a client and server bugs?
>>>>
>>>> Splitting hairs, but yes there are issues on both sides
>>>> IMO. Bad behavior due to bugs on both sides is actually
>>>> not uncommon.
>>>>
>>>> Trond is correct that the server is not dealing totally
>>>> correctly with the range of values in a READ request.
>>>>
>>>> However, as I pointed out, the specification permits NFS
>>>> servers to return NFS[34]ERR_INVAL on READ. And in fact,
>>>> there is already code in the NFSv4 READ path that returns
>>>> INVAL, for example:
>>>>
>>>> 785 if (read->rd_offset >= OFFSET_MAX)
>>>> 786 return nfserr_inval;
>>>>
>>>> I'm not sure the specifications describe precisely when
>>>> the server /must/ return INVAL, but the client needs to
>>>> be prepared to handle it reasonably. If INVAL results in
>>>> an infinite loop, then that's a client bug.
>>>>
>>>> IMO changing the alignment for that case is a band-aid.
>>>> The underlying looping behavior is what is the root
>>>> problem. (So... I agree with Trond's NACK, but for
>>>> different reasons).
>>>
>>> If I'm reading Dan's test case correctly, the client is trying to
>>> read
>>> a full page of 0x1000 bytes starting at offset 0x7fffffffffffff000.
>>> That means the end offset for that read is 0x7fffffffffffff000 +
>>> 0x1000
>>> - 1 = 0x7fffffffffffffff.
>>>
>>> IOW: as far as the server is concerned, there is no loff_t overflow
>>> on
>>> either the start or end offset and so there is no reason for it to
>>> return NFS4ERR_INVAL.
>>
>> Yep, I agree there's server misbehavior, and I think Dan's
>> server fix is on point.
>>
>> I would like to know why the client is looping, though. INVAL
>> is a valid response the Linux server already uses in other
>> cases and by itself should not trigger a READ retry.
>>
>> After checking the relevant XDR definitions, an NFS READ error
>> response doesn't include the EOF flag, so I'm a little mystified
>> why the client would need to retry after receiving INVAL.
>
> While we could certainly add that error to nfs_error_is_fatal(), the
> question is why the client should need to handle NFS4ERR_INVAL if it is
> sending valid arguments?

As I said:

I agree that Dan's test case is sending values in a range
that NFSD should handle without error. That does need to
be fixed.

However, there are other instances where NFSD returns INVAL
to a READ (and it has done so for a long while). Those cases
really mustn't trigger an unterminating loop, especially
since a ^C is not likely to unblock the application. That's
why I'm still concerned about behavior when a server returns
INVAL on a READ.

> 15.1.1.4. NFS4ERR_INVAL (Error Code 22)
>
> The arguments for this operation are not valid for some reason, even
> though they do match those specified in the XDR definition for the
> request.
>
>
> Sure... What does that mean, and what do I do?

Let me try to paraphrase:

A. RFC 1813 and 8881 permit servers to return INVAL on READ,
but do not specify under which conditions to use it. This
ambiguity might be reason for a server implementation to
avoid that status code with READ. Have you considered
filing an errata?

B. Though the RFCs permit servers to return INVAL on READ,
the Linux NFS client does not support it. The client is
not spec-compliant in this regard, but that's because of
the ambiguity described in A.

C. Therefore the Linux NFS client treats INVAL on READ as
unexpected input.

I claim that when confronted with unexpected input (of any
form) a "good quality" client implementation should avoid
pathological behavior like unterminating loops.... That
behavior is both an attack surface and potentially a
problem if the client has to be rebooted to fully recover.

The specific behavior of returning INVAL on READ is being
addressed in the Linux server, but not root-causing and
addressing the client's response to this behavior leaves a
large set of potential issues in this same class.

--
Chuck Lever

2022-01-26 22:16:31