2020-11-22 21:07:40

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH 5/8] SUNRPC: Don't truncate tail in xdr_inline_pages()

From: Trond Myklebust <[email protected]>

True that if the length of the pages[] array is not 4-byte aligned, then
we will need to store the padding in the tail, but there is no need to
truncate the total buffer length here.

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/xdr.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 3ce0a5daa9eb..5a450055469f 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -193,9 +193,6 @@ xdr_inline_pages(struct xdr_buf *xdr, unsigned int offset,

tail->iov_base = buf + offset;
tail->iov_len = buflen - offset;
- if ((xdr->page_len & 3) == 0)
- tail->iov_len -= sizeof(__be32);
-
xdr->buflen += len;
}
EXPORT_SYMBOL_GPL(xdr_inline_pages);
--
2.28.0


2020-11-22 21:07:43

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH 6/8] SUNRPC: Fix up xdr_set_page()

From: Trond Myklebust <[email protected]>

While we always want to align to the next page and/or the beginning of
the tail buffer when we call xdr_set_next_page(), the functions
xdr_align_data() and xdr_expand_hole() really want to align to the next
object in that next page or tail.

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/xdr.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 5a450055469f..ddd5cc2281ab 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -1019,8 +1019,10 @@ static unsigned int xdr_set_page_base(struct xdr_stream *xdr,
static void xdr_set_page(struct xdr_stream *xdr, unsigned int base,
unsigned int len)
{
- if (xdr_set_page_base(xdr, base, len) == 0)
- xdr_set_iov(xdr, xdr->buf->tail, 0, xdr_stream_remaining(xdr));
+ if (xdr_set_page_base(xdr, base, len) == 0) {
+ base -= xdr->buf->page_len;
+ xdr_set_iov(xdr, xdr->buf->tail, base, len);
+ }
}

static void xdr_set_next_page(struct xdr_stream *xdr)
@@ -1029,17 +1031,18 @@ static void xdr_set_next_page(struct xdr_stream *xdr)

newbase = (1 + xdr->page_ptr - xdr->buf->pages) << PAGE_SHIFT;
newbase -= xdr->buf->page_base;
-
- xdr_set_page(xdr, newbase, PAGE_SIZE);
+ if (newbase < xdr->buf->page_len)
+ xdr_set_page_base(xdr, newbase, xdr_stream_remaining(xdr));
+ else
+ xdr_set_iov(xdr, xdr->buf->tail, 0, xdr_stream_remaining(xdr));
}

static bool xdr_set_next_buffer(struct xdr_stream *xdr)
{
if (xdr->page_ptr != NULL)
xdr_set_next_page(xdr);
- else if (xdr->iov == xdr->buf->head) {
- xdr_set_page(xdr, 0, PAGE_SIZE);
- }
+ else if (xdr->iov == xdr->buf->head)
+ xdr_set_page(xdr, 0, xdr_stream_remaining(xdr));
return xdr->p != xdr->end;
}

@@ -1277,7 +1280,7 @@ uint64_t xdr_align_data(struct xdr_stream *xdr, uint64_t offset, uint32_t length
}

xdr->nwords -= XDR_QUADLEN(length);
- xdr_set_page(xdr, from + length, PAGE_SIZE);
+ xdr_set_page(xdr, from + length, xdr_stream_remaining(xdr));
return length;
}
EXPORT_SYMBOL_GPL(xdr_align_data);
@@ -1314,7 +1317,7 @@ uint64_t xdr_expand_hole(struct xdr_stream *xdr, uint64_t offset, uint64_t lengt
_zero_pages(buf->pages, buf->page_base + offset, length);

buf->len += length - (from - offset) - truncated;
- xdr_set_page(xdr, offset + length, PAGE_SIZE);
+ xdr_set_page(xdr, offset + length, xdr_stream_remaining(xdr));
return length;
}
EXPORT_SYMBOL_GPL(xdr_expand_hole);
--
2.28.0

2020-11-22 21:09:12

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH 7/8] SUNRPC: Fix open coded xdr_stream_remaining()

From: Trond Myklebust <[email protected]>

Signed-off-by: Trond Myklebust <[email protected]>
---
net/sunrpc/xdr.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index ddd5cc2281ab..c852d199c789 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -1261,7 +1261,7 @@ uint64_t xdr_align_data(struct xdr_stream *xdr, uint64_t offset, uint32_t length

xdr_realign_pages(xdr);
from = xdr_page_pos(xdr);
- bytes = xdr->nwords << 2;
+ bytes = xdr_stream_remaining(xdr);
if (length < bytes)
bytes = length;

@@ -1298,7 +1298,7 @@ uint64_t xdr_expand_hole(struct xdr_stream *xdr, uint64_t offset, uint64_t lengt

xdr_realign_pages(xdr);
from = xdr_page_pos(xdr);
- bytes = xdr->nwords << 2;
+ bytes = xdr_stream_remaining(xdr);

if (offset + length + bytes > buf->page_len) {
unsigned int shift = (offset + length + bytes) - buf->page_len;
--
2.28.0

2020-11-23 02:31:08

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 5/8] SUNRPC: Don't truncate tail in xdr_inline_pages()



> On Nov 22, 2020, at 3:52 PM, [email protected] wrote:
>
> From: Trond Myklebust <[email protected]>
>
> True that if the length of the pages[] array is not 4-byte aligned, then
> we will need to store the padding in the tail, but there is no need to
> truncate the total buffer length here.

This description confuses me. The existing code reduces the length of
the tail, not the "total buffer length." And what the removed logic is
doing is taking out the length of the XDR pad for the pages array when
it is not expected to be used.


> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> net/sunrpc/xdr.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
> index 3ce0a5daa9eb..5a450055469f 100644
> --- a/net/sunrpc/xdr.c
> +++ b/net/sunrpc/xdr.c
> @@ -193,9 +193,6 @@ xdr_inline_pages(struct xdr_buf *xdr, unsigned int offset,
>
> tail->iov_base = buf + offset;
> tail->iov_len = buflen - offset;
> - if ((xdr->page_len & 3) == 0)
> - tail->iov_len -= sizeof(__be32);
> -
> xdr->buflen += len;
> }
> EXPORT_SYMBOL_GPL(xdr_inline_pages);
> --
> 2.28.0
>

--
Chuck Lever



2020-11-23 04:45:40

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 5/8] SUNRPC: Don't truncate tail in xdr_inline_pages()

On Sun, 2020-11-22 at 20:24 -0500, Chuck Lever wrote:
>
>
> > On Nov 22, 2020, at 3:52 PM, [email protected] wrote:
> >
> > From: Trond Myklebust <[email protected]>
> >
> > True that if the length of the pages[] array is not 4-byte aligned,
> > then
> > we will need to store the padding in the tail, but there is no need
> > to
> > truncate the total buffer length here.
>
> This description confuses me. The existing code reduces the length of
> the tail, not the "total buffer length." And what the removed logic
> is
> doing is taking out the length of the XDR pad for the pages array
> when
> it is not expected to be used.

Why are we bothering to do that? There is nothing problematic with just
ignoring this test and leaving the tail length as it is, nor is there
anything to be gained by applying it.

> > Signed-off-by: Trond Myklebust <[email protected]>
> > ---
> > net/sunrpc/xdr.c | 3 ---
> > 1 file changed, 3 deletions(-)
> >
> > diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
> > index 3ce0a5daa9eb..5a450055469f 100644
> > --- a/net/sunrpc/xdr.c
> > +++ b/net/sunrpc/xdr.c
> > @@ -193,9 +193,6 @@ xdr_inline_pages(struct xdr_buf *xdr, unsigned
> > int offset,
> >
> >         tail->iov_base = buf + offset;
> >         tail->iov_len = buflen - offset;
> > -       if ((xdr->page_len & 3) == 0)
> > -               tail->iov_len -= sizeof(__be32);
> > -
> >         xdr->buflen += len;
> > }
> > EXPORT_SYMBOL_GPL(xdr_inline_pages);
> > --
> > 2.28.0
> >
>
> --
> Chuck Lever
>
>
>

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2020-11-23 15:11:11

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 5/8] SUNRPC: Don't truncate tail in xdr_inline_pages()



> On Nov 22, 2020, at 11:29 PM, Trond Myklebust <[email protected]> wrote:
>
> On Sun, 2020-11-22 at 20:24 -0500, Chuck Lever wrote:
>>
>>
>>> On Nov 22, 2020, at 3:52 PM, [email protected] wrote:
>>>
>>> From: Trond Myklebust <[email protected]>
>>>
>>> True that if the length of the pages[] array is not 4-byte aligned,
>>> then
>>> we will need to store the padding in the tail, but there is no need
>>> to
>>> truncate the total buffer length here.
>>
>> This description confuses me. The existing code reduces the length of
>> the tail, not the "total buffer length." And what the removed logic
>> is
>> doing is taking out the length of the XDR pad for the pages array
>> when
>> it is not expected to be used.
>
> Why are we bothering to do that? There is nothing problematic with just
> ignoring this test and leaving the tail length as it is, nor is there
> anything to be gained by applying it.

You are correct that leaving the buffer a little long is not going
to harm normal operation. After all, we lived with a wildly over-
estimated slack length for years.

The purpose of this code path is to prepare the receive buffer with
the memory resources and expected length of the Reply. The series
of patches that introduced this particular change was all about
ensuring that the estimated length of the reply message was exact.

If the reply message size is overestimated, that moves the end-of-
message sentinel that is later set by xdr_init_decode(). We then
miss subtle problems like our fixed size estimates are incorrect
or a man-in-the-middle is extending the RPC message or the server
is malfunctioning.

<scratches chin>

After moving the ->pages pad into ->pages, I'm wondering if you
should revert 02ef04e432ba ("NFS: Account for XDR pad of buf->pages") --
the maxsz macros don't need to account for the XDR pad of ->pages
any more. Then the below hunk makes sense. The patch description
still doesn't, though ;-)

And then you should confirm that we are still getting the receive
buffer size estimate right for krb5i and krb5p.


>>> Signed-off-by: Trond Myklebust <[email protected]>
>>> ---
>>> net/sunrpc/xdr.c | 3 ---
>>> 1 file changed, 3 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
>>> index 3ce0a5daa9eb..5a450055469f 100644
>>> --- a/net/sunrpc/xdr.c
>>> +++ b/net/sunrpc/xdr.c
>>> @@ -193,9 +193,6 @@ xdr_inline_pages(struct xdr_buf *xdr, unsigned
>>> int offset,
>>>
>>> tail->iov_base = buf + offset;
>>> tail->iov_len = buflen - offset;
>>> - if ((xdr->page_len & 3) == 0)
>>> - tail->iov_len -= sizeof(__be32);
>>> -
>>> xdr->buflen += len;
>>> }
>>> EXPORT_SYMBOL_GPL(xdr_inline_pages);
>>> --
>>> 2.28.0
>>>
>>
>> --
>> Chuck Lever
>>
>>
>>
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]

--
Chuck Lever



2020-11-23 15:39:00

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 5/8] SUNRPC: Don't truncate tail in xdr_inline_pages()

On Mon, 2020-11-23 at 09:52 -0500, Chuck Lever wrote:
>
>
> > On Nov 22, 2020, at 11:29 PM, Trond Myklebust <
> > [email protected]> wrote:
> >
> > On Sun, 2020-11-22 at 20:24 -0500, Chuck Lever wrote:
> > >
> > >
> > > > On Nov 22, 2020, at 3:52 PM, [email protected] wrote:
> > > >
> > > > From: Trond Myklebust <[email protected]>
> > > >
> > > > True that if the length of the pages[] array is not 4-byte
> > > > aligned,
> > > > then
> > > > we will need to store the padding in the tail, but there is no
> > > > need
> > > > to
> > > > truncate the total buffer length here.
> > >
> > > This description confuses me. The existing code reduces the
> > > length of
> > > the tail, not the "total buffer length." And what the removed
> > > logic
> > > is
> > > doing is taking out the length of the XDR pad for the pages array
> > > when
> > > it is not expected to be used.
> >
> > Why are we bothering to do that? There is nothing problematic with
> > just
> > ignoring this test and leaving the tail length as it is, nor is
> > there
> > anything to be gained by applying it.
>
> You are correct that leaving the buffer a little long is not going
> to harm normal operation. After all, we lived with a wildly over-
> estimated slack length for years.
>
> The purpose of this code path is to prepare the receive buffer with
> the memory resources and expected length of the Reply. The series
> of patches that introduced this particular change was all about
> ensuring that the estimated length of the reply message was exact.
>
> If the reply message size is overestimated, that moves the end-of-
> message sentinel that is later set by xdr_init_decode(). We then
> miss subtle problems like our fixed size estimates are incorrect
> or a man-in-the-middle is extending the RPC message or the server
> is malfunctioning.
>
> <scratches chin>
>
> After moving the ->pages pad into ->pages, I'm wondering if you
> should revert 02ef04e432ba ("NFS: Account for XDR pad of buf->pages")
> --
> the maxsz macros don't need to account for the XDR pad of ->pages
> any more. Then the below hunk makes sense. The patch description
> still doesn't, though ;-)
>

I don't think it needs to be reverted. I think you are right to include
the padding in the buffer size that we use to set the value of task-
>tk_rqstp->rq_rcvsize.

That said, it seems wrong to include that padding as part of the
'hdrsize' argument in rpc_prepare_reply_pages(). That just causes
confusion, because the padding is not part of the header in front of
the array of pages. It is part of the tail data after the array of
pages. So I think a cleanup there may be warranted.

The other thing that I'm considering is that we may want to optimise to
avoid setting up an RDMA SEND just for the padding if that is truly the
last word in the RPC call (it matters less if there is other data that
requires us to set up such a SEND anyway). Not sure how to do that in a
clean manner, though. Perhaps we'd have to pass in the padding size as
a separate argument to xdr_inline_pages() (and also to
rpc_prepare_reply_pages())?


> And then you should confirm that we are still getting the receive
> buffer size estimate right for krb5i and krb5p.
>
>
> > > > Signed-off-by: Trond Myklebust
> > > > <[email protected]>
> > > > ---
> > > > net/sunrpc/xdr.c | 3 ---
> > > > 1 file changed, 3 deletions(-)
> > > >
> > > > diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
> > > > index 3ce0a5daa9eb..5a450055469f 100644
> > > > --- a/net/sunrpc/xdr.c
> > > > +++ b/net/sunrpc/xdr.c
> > > > @@ -193,9 +193,6 @@ xdr_inline_pages(struct xdr_buf *xdr,
> > > > unsigned
> > > > int offset,
> > > >
> > > >         tail->iov_base = buf + offset;
> > > >         tail->iov_len = buflen - offset;
> > > > -       if ((xdr->page_len & 3) == 0)
> > > > -               tail->iov_len -= sizeof(__be32);
> > > > -
> > > >         xdr->buflen += len;
> > > > }
> > > > EXPORT_SYMBOL_GPL(xdr_inline_pages);
> > > > --
> > > > 2.28.0
>

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2020-11-23 15:50:10

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 5/8] SUNRPC: Don't truncate tail in xdr_inline_pages()



> On Nov 23, 2020, at 10:37 AM, Trond Myklebust <[email protected]> wrote:
>
> On Mon, 2020-11-23 at 09:52 -0500, Chuck Lever wrote:
>>
>>
>>> On Nov 22, 2020, at 11:29 PM, Trond Myklebust <
>>> [email protected]> wrote:
>>>
>>> On Sun, 2020-11-22 at 20:24 -0500, Chuck Lever wrote:
>>>>
>>>>
>>>>> On Nov 22, 2020, at 3:52 PM, [email protected] wrote:
>>>>>
>>>>> From: Trond Myklebust <[email protected]>
>>>>>
>>>>> True that if the length of the pages[] array is not 4-byte
>>>>> aligned,
>>>>> then
>>>>> we will need to store the padding in the tail, but there is no
>>>>> need
>>>>> to
>>>>> truncate the total buffer length here.
>>>>
>>>> This description confuses me. The existing code reduces the
>>>> length of
>>>> the tail, not the "total buffer length." And what the removed
>>>> logic
>>>> is
>>>> doing is taking out the length of the XDR pad for the pages array
>>>> when
>>>> it is not expected to be used.
>>>
>>> Why are we bothering to do that? There is nothing problematic with
>>> just
>>> ignoring this test and leaving the tail length as it is, nor is
>>> there
>>> anything to be gained by applying it.
>>
>> You are correct that leaving the buffer a little long is not going
>> to harm normal operation. After all, we lived with a wildly over-
>> estimated slack length for years.
>>
>> The purpose of this code path is to prepare the receive buffer with
>> the memory resources and expected length of the Reply. The series
>> of patches that introduced this particular change was all about
>> ensuring that the estimated length of the reply message was exact.
>>
>> If the reply message size is overestimated, that moves the end-of-
>> message sentinel that is later set by xdr_init_decode(). We then
>> miss subtle problems like our fixed size estimates are incorrect
>> or a man-in-the-middle is extending the RPC message or the server
>> is malfunctioning.
>>
>> <scratches chin>
>>
>> After moving the ->pages pad into ->pages, I'm wondering if you
>> should revert 02ef04e432ba ("NFS: Account for XDR pad of buf->pages")
>> --
>> the maxsz macros don't need to account for the XDR pad of ->pages
>> any more. Then the below hunk makes sense. The patch description
>> still doesn't, though ;-)
>>
>
> I don't think it needs to be reverted. I think you are right to include
> the padding in the buffer size that we use to set the value of task-
>> tk_rqstp->rq_rcvsize.
>
> That said, it seems wrong to include that padding as part of the
> 'hdrsize' argument in rpc_prepare_reply_pages(). That just causes
> confusion, because the padding is not part of the header in front of
> the array of pages. It is part of the tail data after the array of
> pages. So I think a cleanup there may be warranted.

Agreed, dealing with the tail size is confusing.


> The other thing that I'm considering is that we may want to optimise to
> avoid setting up an RDMA SEND just for the padding if that is truly the
> last word in the RPC call (it matters less if there is other data that
> requires us to set up such a SEND anyway). Not sure how to do that in a
> clean manner, though. Perhaps we'd have to pass in the padding size as
> a separate argument to xdr_inline_pages() (and also to
> rpc_prepare_reply_pages())?

In the current version of RPC/RDMA, there's always exactly one RDMA
Send per RPC message.

The Linux client implementation is also careful to exclude XDR padding
in both Read and Write chunks because the protocol makes the inclusion
of padding on the wire optional.

The only issue I see is that the upper layer needs to identify to the
transport the exact size of the data item that is being transferred
in a chunk so that the padding can be properly excluded. Currently
rpcrdma makes some assumptions about how the data items are laid out
in the xdr_buf when XDRBUF_READ/WRITE is set.


>> And then you should confirm that we are still getting the receive
>> buffer size estimate right for krb5i and krb5p.
>>
>>
>>>>> Signed-off-by: Trond Myklebust
>>>>> <[email protected]>
>>>>> ---
>>>>> net/sunrpc/xdr.c | 3 ---
>>>>> 1 file changed, 3 deletions(-)
>>>>>
>>>>> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
>>>>> index 3ce0a5daa9eb..5a450055469f 100644
>>>>> --- a/net/sunrpc/xdr.c
>>>>> +++ b/net/sunrpc/xdr.c
>>>>> @@ -193,9 +193,6 @@ xdr_inline_pages(struct xdr_buf *xdr,
>>>>> unsigned
>>>>> int offset,
>>>>>
>>>>> tail->iov_base = buf + offset;
>>>>> tail->iov_len = buflen - offset;
>>>>> - if ((xdr->page_len & 3) == 0)
>>>>> - tail->iov_len -= sizeof(__be32);
>>>>> -
>>>>> xdr->buflen += len;
>>>>> }
>>>>> EXPORT_SYMBOL_GPL(xdr_inline_pages);
>>>>> --
>>>>> 2.28.0
>>
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]

--
Chuck Lever



2020-11-23 17:27:20

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 5/8] SUNRPC: Don't truncate tail in xdr_inline_pages()

Hi Trond,

On Sun, Nov 22, 2020 at 4:07 PM <[email protected]> wrote:
>
> From: Trond Myklebust <[email protected]>
>
> True that if the length of the pages[] array is not 4-byte aligned, then
> we will need to store the padding in the tail, but there is no need to
> truncate the total buffer length here.

Just a heads up, after applying this patch there are a *lot* of
xfstests that fail when run on v4.2 against a server that supports
READ_PLUS

Anna

>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> net/sunrpc/xdr.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
> index 3ce0a5daa9eb..5a450055469f 100644
> --- a/net/sunrpc/xdr.c
> +++ b/net/sunrpc/xdr.c
> @@ -193,9 +193,6 @@ xdr_inline_pages(struct xdr_buf *xdr, unsigned int offset,
>
> tail->iov_base = buf + offset;
> tail->iov_len = buflen - offset;
> - if ((xdr->page_len & 3) == 0)
> - tail->iov_len -= sizeof(__be32);
> -
> xdr->buflen += len;
> }
> EXPORT_SYMBOL_GPL(xdr_inline_pages);
> --
> 2.28.0
>