2020-12-08 23:31:08

by Chuck Lever

Subject: [PATCH v5] xprtrdma: Fix XDRBUF_SPARSE_PAGES support

Olga K. observed that rpcrdma_marshal_req() allocates sparse pages
only when it has determined that a Reply chunk is necessary. There
are plenty of cases where no Reply chunk is needed, but the
XDRBUF_SPARSE_PAGES flag is set. The result would be a crash in
rpcrdma_inline_fixup() when it tries to copy parts of the received
Reply into a missing page.
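
To illustrate the failure mode, here is a simplified, hypothetical
sketch of the copy path (not the kernel's actual rpcrdma_inline_fixup()
code): when XDRBUF_SPARSE_PAGES is set but no Reply chunk was set up,
an entry in rq_rcv_buf.pages[] can still be NULL by the time inline
Reply bytes are copied in.

/* Hypothetical, simplified sketch of copying a received Reply into
 * the rcv_buf page list; the function and variable names are
 * illustrative, not the kernel's.
 */
static void copy_reply_into_pages(struct xdr_buf *buf, const u8 *src,
				  unsigned int len)
{
	struct page **ppages = buf->pages + (buf->page_base >> PAGE_SHIFT);
	unsigned int base = offset_in_page(buf->page_base);

	while (len) {
		unsigned int cur = min_t(unsigned int, PAGE_SIZE - base, len);

		/* With sparse pages and no Reply chunk, *ppages may
		 * still be NULL here, and this dereference crashes
		 * the client.
		 */
		memcpy(page_address(*ppages) + base, src, cur);
		src += cur;
		len -= cur;
		base = 0;
		++ppages;
	}
}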

To avoid crashing, handle sparse page allocation up front.

Until XATTR support was added, this issue did not appear often
because the only SPARSE_PAGES consumer always expected a reply large
enough to require a Reply chunk.
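
For context, a SPARSE_PAGES consumer sets the flag on its receive
buffer at encode time and leaves the page pointers unallocated. The
sketch below (hypothetical function and parameter names, loosely
modeled on the GETACL and LISTXATTRS encoders) shows the shape of
that pattern; on TCP the missing pages can be allocated during
receive processing, while with this patch RDMA allocates them up
front in rpcrdma_marshal_req().

/* Illustrative sketch only: reserve reply pages without allocating
 * them, and mark the receive buffer as sparse.
 */
static void encode_sparse_reply_pages(struct rpc_rqst *req,
				      struct page **pages,
				      unsigned int len,
				      unsigned int replen)
{
	/* pages[] entries may all be NULL at this point */
	rpc_prepare_reply_pages(req, pages, 0, len, replen);
	req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
}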

Reported-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Cc: <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 40 +++++++++++++++++++++++++++++++---------
1 file changed, 31 insertions(+), 9 deletions(-)

Hi-

v4 had a bug, which I've fixed. This version has been tested.

In kernels before 5.10-rc5, there are still problems with the way
LISTXATTRS and GETXATTR deal with the tail / XDR pad for the page
content that this patch does not address. So backporting this fix
alone is not enough to get those working again -- more surgery would
be required.

Since none of the other SPARSE_PAGES users have a problem, let's
leave this one on the cutting room floor. It's here in the mail
archive if anyone needs it.


Changes since v4:
- xdr_buf_pagecount() was simply the wrong thing to use.

Changes since v3:
- I swear I am not drunk. I forgot to commit the change before mailing it.

Changes since v2:
- Actually fix the xdr_buf_pagecount() problem

Changes since RFC:
- Ensure xdr_buf_pagecount() is defined in rpc_rdma.c
- noinline the sparse page allocator -- it's an uncommon path

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 0f5120c7668f..c48536f2121f 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -179,6 +179,31 @@ rpcrdma_nonpayload_inline(const struct rpcrdma_xprt *r_xprt,
r_xprt->rx_ep->re_max_inline_recv;
}

+/* ACL likes to be lazy in allocating pages. For TCP, these
+ * pages can be allocated during receive processing. Not true
+ * for RDMA, which must always provision receive buffers
+ * up front.
+ */
+static noinline int
+rpcrdma_alloc_sparse_pages(struct xdr_buf *buf)
+{
+ struct page **ppages;
+ int len;
+
+ len = buf->page_len;
+ ppages = buf->pages + (buf->page_base >> PAGE_SHIFT);
+ while (len > 0) {
+ if (!*ppages)
+ *ppages = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
+ if (!*ppages)
+ return -ENOBUFS;
+ ppages++;
+ len -= PAGE_SIZE;
+ }
+
+ return 0;
+}
+
/* Split @vec on page boundaries into SGEs. FMR registers pages, not
* a byte range. Other modes coalesce these SGEs into a single MR
* when they can.
@@ -233,15 +258,6 @@ rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct xdr_buf *xdrbuf,
ppages = xdrbuf->pages + (xdrbuf->page_base >> PAGE_SHIFT);
page_base = offset_in_page(xdrbuf->page_base);
while (len) {
- /* ACL likes to be lazy in allocating pages - ACLs
- * are small by default but can get huge.
- */
- if (unlikely(xdrbuf->flags & XDRBUF_SPARSE_PAGES)) {
- if (!*ppages)
- *ppages = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
- if (!*ppages)
- return -ENOBUFS;
- }
seg->mr_page = *ppages;
seg->mr_offset = (char *)page_base;
seg->mr_len = min_t(u32, PAGE_SIZE - page_base, len);
@@ -867,6 +883,12 @@ rpcrdma_marshal_req(struct rpcrdma_xprt *r_xprt, struct rpc_rqst *rqst)
__be32 *p;
int ret;

+ if (unlikely(rqst->rq_rcv_buf.flags & XDRBUF_SPARSE_PAGES)) {
+ ret = rpcrdma_alloc_sparse_pages(&rqst->rq_rcv_buf);
+ if (ret)
+ return ret;
+ }
+
rpcrdma_set_xdrlen(&req->rl_hdrbuf, 0);
xdr_init_encode(xdr, &req->rl_hdrbuf, rdmab_data(req->rl_rdmabuf),
rqst);



2020-12-09 16:49:38

by Olga Kornievskaia

Subject: Re: [PATCH v5] xprtrdma: Fix XDRBUF_SPARSE_PAGES support

On Tue, Dec 8, 2020 at 6:31 PM Chuck Lever <[email protected]> wrote:
>
> Olga K. observed that rpcrdma_marshal_req() allocates sparse pages
> only when it has determined that a Reply chunk is necessary. There
> are plenty of cases where no Reply chunk is needed, but the
> XDRBUF_SPARSE_PAGES flag is set. The result would be a crash in
> rpcrdma_inline_fixup() when it tries to copy parts of the received
> Reply into a missing page.
>
> To avoid crashing, handle sparse page allocation up front.
>
> Until XATTR support was added, this issue did not appear often
> because the only SPARSE_PAGES consumer always expected a reply large
> enough to require a Reply chunk.
>
> Reported-by: Olga Kornievskaia <[email protected]>
> Signed-off-by: Chuck Lever <[email protected]>
> Cc: <[email protected]>
> ---
> net/sunrpc/xprtrdma/rpc_rdma.c | 40 +++++++++++++++++++++++++++++++---------
> 1 file changed, 31 insertions(+), 9 deletions(-)
>
> Hi-
>
> v4 had a bug, which I've fixed. This version has been tested.

This version, applied on top of the same commit (5.10-rc4), passes
generic/013 without oopsing for me too.


2020-12-09 17:03:35

by Chuck Lever

Subject: Re: [PATCH v5] xprtrdma: Fix XDRBUF_SPARSE_PAGES support



> On Dec 9, 2020, at 11:47 AM, Olga Kornievskaia <[email protected]> wrote:
>
> On Tue, Dec 8, 2020 at 6:31 PM Chuck Lever <[email protected]> wrote:
>>
>> Olga K. observed that rpcrdma_marshal_req() allocates sparse pages
>> only when it has determined that a Reply chunk is necessary. There
>> are plenty of cases where no Reply chunk is needed, but the
>> XDRBUF_SPARSE_PAGES flag is set. The result would be a crash in
>> rpcrdma_inline_fixup() when it tries to copy parts of the received
>> Reply into a missing page.
>>
>> To avoid crashing, handle sparse page allocation up front.
>>
>> Until XATTR support was added, this issue did not appear often
>> because the only SPARSE_PAGES consumer always expected a reply large
>> enough to require a Reply chunk.
>>
>> Reported-by: Olga Kornievskaia <[email protected]>
>> Signed-off-by: Chuck Lever <[email protected]>
>> Cc: <[email protected]>
>> ---
>> net/sunrpc/xprtrdma/rpc_rdma.c | 40 +++++++++++++++++++++++++++++++---------
>> 1 file changed, 31 insertions(+), 9 deletions(-)
>>
>> Hi-
>>
>> v4 had a bug, which I've fixed. This version has been tested.
>
> This version on top of the same commit (rc4) passes generic/013
> without oopsing for me too.

Excellent, thanks for confirming!


--
Chuck Lever