2020-11-24 17:30:39

by Chuck Lever III

Subject: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation

By switching to an XFS-backed export, I am able to reproduce the
ibcomp worker crash on my client with xfstests generic/013.

For the failing LISTXATTRS operation, xdr_inline_pages() is called
with page_len=12 and buflen=128. Then:

- Because buflen is small, rpcrdma_marshal_req() will not set up a
Reply chunk, and rpcrdma's XDRBUF_SPARSE_PAGES logic does not
get invoked at all.

- Because page_len is non-zero, rpcrdma_inline_fixup() tries to
copy received data into rq_rcv_buf->pages, but they're missing.

The result is that the ibcomp worker faults and dies. Sometimes that
causes a visible crash, and sometimes it results in a transport
hang without other symptoms.
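
For context, a minimal sketch (simplified, not the actual code in
net/sunrpc/xprtrdma/rpc_rdma.c) of why the pull-up in
rpcrdma_inline_fixup() faults when the page array was never populated:

	/*
	 * Illustrative only: the copy loop trusts rcvbuf->pages[],
	 * which the XDRBUF_SPARSE_PAGES path was supposed to fill in.
	 * When it didn't, pages[i] is NULL, the memcpy destination is
	 * bogus, and the ibcomp worker oopses.
	 */
	static void inline_fixup_sketch(struct xdr_buf *rcvbuf,
					const char *srcp, unsigned int len)
	{
		unsigned int i, cur;

		for (i = 0; len; i++) {
			cur = min_t(unsigned int, len, PAGE_SIZE);
			memcpy(page_address(rcvbuf->pages[i]), srcp, cur);
			srcp += cur;
			len -= cur;
		}
	}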

RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and
should eventually be fixed or replaced. However, my preference is
that upper-layer operations should explicitly allocate their receive
buffers (using GFP_KERNEL) when possible, rather than relying on
XDRBUF_SPARSE_PAGES.

Reported-by: Olga Kornievskaia <[email protected]>
Suggested-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfs/nfs42proc.c | 17 ++++++++++-------
fs/nfs/nfs42xdr.c | 1 -
2 files changed, 10 insertions(+), 8 deletions(-)

Hi-

I like Olga's proposed approach. What do you think of this patch?


diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 2b2211d1234e..24810305ec1c 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -1241,7 +1241,7 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
 		.rpc_resp = &res,
 	};
 	u32 xdrlen;
-	int ret, np;
+	int ret, np, i;


 	res.scratch = alloc_page(GFP_KERNEL);
@@ -1253,10 +1253,14 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
 	xdrlen = server->lxasize;
 	np = xdrlen / PAGE_SIZE + 1;

+	ret = -ENOMEM;
 	pages = kcalloc(np, sizeof(struct page *), GFP_KERNEL);
-	if (pages == NULL) {
-		__free_page(res.scratch);
-		return -ENOMEM;
+	if (pages == NULL)
+		goto out_free;
+	for (i = 0; i < np; i++) {
+		pages[i] = alloc_page(GFP_KERNEL);
+		if (!pages[i])
+			goto out_free;
 	}

 	arg.xattr_pages = pages;
@@ -1271,14 +1275,13 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
 		*eofp = res.eof;
 	}

+out_free:
 	while (--np >= 0) {
 		if (pages[np])
 			__free_page(pages[np]);
 	}
-
-	__free_page(res.scratch);
 	kfree(pages);
-
+	__free_page(res.scratch);
 	return ret;

 }
diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index 6e060a88f98c..8432bd6b95f0 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -1528,7 +1528,6 @@ static void nfs4_xdr_enc_listxattrs(struct rpc_rqst *req,

 	rpc_prepare_reply_pages(req, args->xattr_pages, 0, args->count,
 			hdr.replen);
-	req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;

 	encode_nops(&hdr);
 }



2020-11-24 23:54:52

by Olga Kornievskaia

Subject: Re: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation

On Tue, Nov 24, 2020 at 12:30 PM Chuck Lever <[email protected]> wrote:
>
> [... patch description trimmed ...]
> Reported-by: Olga Kornievskaia <[email protected]>
> Suggested-by: Olga Kornievskaia <[email protected]>
> Signed-off-by: Chuck Lever <[email protected]>

Yes!
Reviewed-by && Tested-by.

> [... patch trimmed ...]

2020-11-24 23:55:14

by Chuck Lever III

Subject: Re: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation



> On Nov 24, 2020, at 3:06 PM, Frank van der Linden <[email protected]> wrote:
>
> On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
>> [... patch trimmed ...]
>
> I can see why this is the simplest and most pragmatic solution, so it's
> fine with me.

Thanks. I've added Olga's Reviewed/Tested-by to my local copy.
May I add a Reviewed-by from you?


> Why doesn't this happen with getxattr? Do we need to convert that too?

If a GETXATTR request can be generated such that buflen is less than
a page, but page_len > 0, then yes, it will happen there too. As Olga
says, NFS is unaware of the transport's inline threshold setting, so
there will always be some buflen value under which bad things happen.

Either way, I would prefer to see GETXATTR converted to avoid using
XDRBUF_SPARSE_PAGES. Does NFSv4 GETACL provide a good example?
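
Concretely, the shape I have in mind is something like this (a hedged
sketch with illustrative names, mirroring the LISTXATTRS hunk above,
not a finished patch):

	/* Allocate the receive pages with GFP_KERNEL up front instead
	 * of tagging the buffer XDRBUF_SPARSE_PAGES and hoping the
	 * transport fills in the holes.
	 */
	np = DIV_ROUND_UP(buflen, PAGE_SIZE);
	pages = kcalloc(np, sizeof(struct page *), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;
	for (i = 0; i < np; i++) {
		pages[i] = alloc_page(GFP_KERNEL);
		if (!pages[i])
			goto out_free;	/* release pages[0..i-1] */
	}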


> The basic issue here is that the RPC code does not deal with inlined data
> that exceeds PAGE_SIZE. That can only be done with raw pages.
>
> Since the upper layer has already allocated a buffer in the case of listxattr
> and getxattr, I would love to be able to just XDR code into that buffer,
> and avoid the whole alloc+copy situation.

Do you mean like the READDIR entry encoders and decoders?


> But sadly, it might be > PAGE_SIZE,
> so the XDR code doesn't allow it. It's not all bad, having to use pages
> allows them to be directly hooked into the cache in the case of getxattr,
> but for listxattr, decoding directly into the provided buffer would be nice.
>
> Hm, I wonder if that restriction actually holds for listxattr - the individual
> XDR items (xattr names) should never exceed PAGE_SIZE..

You'll have to worry about XDR data items crossing page boundaries.
Our XDR code uses a scratch buffer for that, so it can be handled
transparently.
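
Roughly what decode_listxattrs() already does; a sketch with error
handling elided, using the scratch page allocated in
_nfs42_proc_listxattrs():

	/* xdr_inline_decode() normally returns a pointer into the
	 * receive pages; when an item straddles a page boundary it is
	 * assembled in the scratch buffer first, so the caller always
	 * sees contiguous bytes.
	 */
	xdr_set_scratch_buffer(xdr, page_address(res->scratch), PAGE_SIZE);

	p = xdr_inline_decode(xdr, 4);		/* XDR string length */
	if (unlikely(!p))
		return -EIO;
	len = be32_to_cpup(p);
	p = xdr_inline_decode(xdr, len);	/* name bytes */
	if (unlikely(!p))
		return -EIO;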


--
Chuck Lever



2020-11-24 23:56:15

by Frank van der Linden

Subject: Re: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation

On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
> [... patch trimmed ...]

I can see why this is the simplest and most pragmatic solution, so it's
fine with me.

Why doesn't this happen with getxattr? Do we need to convert that too?

The basic issue here is that the RPC code does not deal with inlined data
that exceeds PAGE_SIZE. That can only be done with raw pages.

Since the upper layer has already allocated a buffer in the case of listxattr
and getxattr, I would love to be able to just XDR code into that buffer,
and avoid the whole alloc+copy situation. But sadly, it might be > PAGE_SIZE,
so the XDR code doesn't allow it. It's not all bad, having to use pages
allows them to be directly hooked into the cache in the case of getxattr,
but for listxattr, decoding directly into the provided buffer would be nice.

Hm, I wonder if that restriction actually holds for listxattr - the individual
XDR items (xattr names) should never exceed PAGE_SIZE..
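
The copy in question, sketched loosely from _nfs42_proc_getxattr()
(illustrative, with caching and error details elided):

	/* The reply was received into freshly allocated pages; a second
	 * copy moves it into the buffer the caller already allocated.
	 */
	if (res.xattr_len > buflen)
		return -ERANGE;
	_copy_from_pages(buf, pages, 0, res.xattr_len);	/* the extra copy */
	return res.xattr_len;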

- Frank

2020-11-24 23:57:30

by Kornievskaia, Olga

Subject: Re: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation



On 11/24/20, 3:06 PM, "Frank van der Linden" <[email protected]> wrote:

On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
> [... patch trimmed ...]

I can see why this is the simplest and most pragmatic solution, so it's
fine with me.

Why doesn't this happen with getxattr? Do we need to convert that too?

[olga] I don't know if GETXATTR/SETXATTR works. I'm not sure what tests exercise those operations. I just ran into the fact that generic/013 wasn't passing, and I don't see that it's an xattr-specific test. I'm not sure how it ends up triggering usage of xattrs.

What did you use to test this feature?


[... remainder of quoted message trimmed ...]

2020-11-24 23:58:08

by Frank van der Linden

Subject: Re: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation

On Tue, Nov 24, 2020 at 08:50:36PM +0000, Kornievskaia, Olga wrote:
>
>
> On 11/24/20, 3:06 PM, "Frank van der Linden" <[email protected]> wrote:
>
> On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
> > [... patch trimmed ...]
>
> I can see why this is the simplest and most pragmatic solution, so it's
> fine with me.
>
> Why doesn't this happen with getxattr? Do we need to convert that too?
>
> [olga] I don't know if GETXATTR/SETXATTR works. I'm not sure what tests exercise those operations. I just ran into the fact that generic/013 wasn't passing, and I don't see that it's an xattr-specific test. I'm not sure how it ends up triggering usage of xattrs.

I'm attaching the test program I used; it should give things a better workout.

- Frank


Attachments:
xattr.c (25.56 kB)

2020-11-24 23:59:46

by Frank van der Linden

Subject: Re: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation

On Tue, Nov 24, 2020 at 03:18:55PM -0500, Chuck Lever wrote:
> > On Nov 24, 2020, at 3:06 PM, Frank van der Linden <[email protected]> wrote:
> >
> > On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
> >> [... patch trimmed ...]
> >
> > I can see why this is the simplest and most pragmatic solution, so it's
> > fine with me.
>
> Thanks. I've added Olga's Reviewed/Tested-by to my local copy.
> May I add a Reviewed-by from you?

Yep, thanks.

> > Why doesn't this happen with getxattr? Do we need to convert that too?
>
> If a GETXATTR request can be generated such that buflen is less than
> a page, but page_len > 0, then yes, it will happen there too. As Olga
> says, NFS is unaware of the transport's inline threshold setting, so
> there will always be some buflen value under which bad things happen.
>
> Either way, I would prefer to see GETXATTR converted to avoid using
> XDRBUF_SPARSE_PAGES. Does NFSv4 GETACL provide a good example?

Hm, in that case, it should be converted. That's easy enough to do.
I'll send a small patch.

>
>
> > The basic issue here is that the RPC code does not deal with inlined data
> > that exceeds PAGE_SIZE. That can only be done with raw pages.
> >
> > Since the upper layer has already allocated a buffer in the case of listxattr
> > and getxattr, I would love to be able to just XDR code into that buffer,
> > and avoid the whole alloc+copy situation.
>
> Do you mean like the READDIR entry encoders and decoders?

It works out for READDIR - you use page cache fillers that do the XDR
decoding into pages that are part of i_mapping. For xattrs, that's
not a good fit - I looked at it, but it would have been messy.

The main thing I'm complaining about here is that the upper-layer xattr
code already kvallocs a buffer, but because of restrictions in the XDR
code, you always end up doing an extra copy for get/set, since you need
to allocate pages. But this doesn't apply to listxattr.


>
>
> > But sadly, it might be > PAGE_SIZE,
> > so the XDR code doesn't allow it. It's not all bad, having to use pages
> > allows them to be directly hooked into the cache in the case of getxattr,
> > but for listxattr, decoding directly into the provided buffer would be nice.
> >
> > Hm, I wonder if that restriction actually holds for listxattr - the individual
> > XDR items (xattr names) should never exceed PAGE_SIZE..
>
> You'll have to worry about XDR data items crossing page boundaries.
> Our XDR code uses a scratch buffer for that, so it can be handled
> transparently.

Yep, and this is what listxattr does. I was confusing the get/set and list
cases, sorry about the confusion.

Let me just quickly convert getxattr.

- Frank

2020-11-25 00:02:58

by Kornievskaia, Olga

Subject: Re: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation



On 11/24/20, 4:20 PM, "Frank van der Linden" <[email protected]> wrote:

On Tue, Nov 24, 2020 at 08:50:36PM +0000, Kornievskaia, Olga wrote:
>
>
> On 11/24/20, 3:06 PM, "Frank van der Linden" <[email protected]> wrote:
>
> On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
> > [... patch trimmed ...]
>
> I can see why this is the simplest and most pragmatic solution, so it's
> fine with me.
>
> Why doesn't this happen with getxattr? Do we need to convert that too?
>
> [olga] I don't know if GETXATTR/SETXATTR works. I'm not sure what tests exercise those operations. I just ran into the fact that generic/013 wasn't passing, and I don't see that it's an xattr-specific test. I'm not sure how it ends up triggering usage of xattrs.

> I'm attaching the test program I used; it should give things a better workout.

[olga] I'm not sure if I'm doing something wrong, but there are only 2 GETXATTR calls in the network trace from running this application, and both calls are returning an error (ERR_NOXATTR). Which btw might explain why no problems are seen, since no decoding of data is happening. There are lots of SETXATTRs and REMOVEXATTRs, and there is a LISTXATTR (which btw the network trace is marking as malformed, so there might be something bad there). Anyway...

This is my initial report: no real exercise of the GETXATTR code as far as I can tell.

- Frank

2020-11-26 00:25:26

by Frank van der Linden

Subject: Re: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation

On Tue, Nov 24, 2020 at 10:40:25PM +0000, Kornievskaia, Olga wrote:
>
>
> On 11/24/20, 4:20 PM, "Frank van der Linden" <[email protected]> wrote:
>
> On Tue, Nov 24, 2020 at 08:50:36PM +0000, Kornievskaia, Olga wrote:
> >
> >
> > On 11/24/20, 3:06 PM, "Frank van der Linden" <[email protected]> wrote:
> >
> > On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
> > > [... patch trimmed ...]
> >
> > I can see why this is the simplest and most pragmatic solution, so it's
> > fine with me.
> >
> > Why doesn't this happen with getxattr? Do we need to convert that too?
> >
> > [olga] I don't know if GETXATTR/SETXATTR works. I'm not sure what tests exercise those operations. I just ran into the fact that generic/013 wasn't passing, and I don't see that it's an xattr-specific test. I'm not sure how it ends up triggering usage of xattrs.
>
> I'm attaching the test program I used; it should give things a better workout.
>
> [olga] I'm not sure if I'm doing something wrong, but there are only 2 GETXATTR calls in the network trace from running this application, and both calls are returning an error (ERR_NOXATTR). Which btw might explain why no problems are seen, since no decoding of data is happening. There are lots of SETXATTRs and REMOVEXATTRs, and there is a LISTXATTR (which btw the network trace is marking as malformed, so there might be something bad there). Anyway...
>
> This is my initial report: no real exercise of the GETXATTR code as far as I can tell.

True, the test is heavier on the setxattr / listxattr side. And with caching,
you're not going to see a lot of GETXATTR calls. I used the same test program
with caching off, and it works fine, though.

In any case, after converting GETXATTR to pre-allocate pages, I noticed that,
when I disabled caching, I was getting EIO instead of ERANGE back from
calls that test the case of calling getxattr() with a buffer length that
is insufficient. The behavior is somewhat strange - if you, say, set an xattr
of length 59, then calls with lengths 56-59 get -ERANGE from decode_getxattr
(correct), but calls with lengths 53-55 get EIO (should be -ERANGE).

I.e., non-aligned values to rpc_prepare_reply_pages make the RPC call error
out early, even before it gets to decode_getxattr.

I noticed that all other code always seems to specify multiples of PAGE_SIZE
as the length to rpc_prepare_reply_pages. But the code itself suggests that
it certainly *intends* to be prepared to handle any length, aligned or not.

However, apparently, it at least doesn't deal with non-aligned lengths,
making things fail further along down the line.
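
Something like the following would sidestep it (a sketch; rplen and
the rounding are my placeholders, not a proposal for the final code):

	/* Every other caller passes a page-aligned length; rounding up
	 * here, and checking the decoded size against the caller's
	 * length later, avoids the unaligned case entirely.
	 */
	rplen = PAGE_ALIGN(args->len);
	rpc_prepare_reply_pages(req, args->xattr_pages, 0, rplen,
				hdr.replen);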

I need to look at this a bit more.

- Frank

2020-11-26 17:12:18

by Chuck Lever III

Subject: Re: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation



> On Nov 25, 2020, at 7:21 PM, Frank van der Linden <[email protected]> wrote:
>
> On Tue, Nov 24, 2020 at 10:40:25PM +0000, Kornievskaia, Olga wrote:
>>
>>
>> On 11/24/20, 4:20 PM, "Frank van der Linden" <[email protected]> wrote:
>>
>> On Tue, Nov 24, 2020 at 08:50:36PM +0000, Kornievskaia, Olga wrote:
>>>
>>>
>>> On 11/24/20, 3:06 PM, "Frank van der Linden" <[email protected]> wrote:
>>>
>>> On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
>>>>
>>>>
>>>>
>>>> By switching to an XFS-backed export, I am able to reproduce the
>>>> ibcomp worker crash on my client with xfstests generic/013.
>>>>
>>>> For the failing LISTXATTRS operation, xdr_inline_pages() is called
>>>> with page_len=12 and buflen=128. Then:
>>>>
>>>> - Because buflen is small, rpcrdma_marshal_req will not set up a
>>>> Reply chunk and the rpcrdma's XDRBUF_SPARSE_PAGES logic does not
>>>> get invoked at all.
>>>>
>>>> - Because page_len is non-zero, rpcrdma_inline_fixup() tries to
>>>> copy received data into rq_rcv_buf->pages, but they're missing.
>>>>
>>>> The result is that the ibcomp worker faults and dies. Sometimes that
>>>> causes a visible crash, and sometimes it results in a transport
>>>> hang without other symptoms.
>>>>
>>>> RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and
>>>> should eventually be fixed or replaced. However, my preference is
>>>> that upper-layer operations should explicitly allocate their receive
>>>> buffers (using GFP_KERNEL) when possible, rather than relying on
>>>> XDRBUF_SPARSE_PAGES.
>>>>
>>>> Reported-by: Olga kornievskaia <[email protected]>
>>>> Suggested-by: Olga kornievskaia <[email protected]>
>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>> ---
>>>> fs/nfs/nfs42proc.c | 17 ++++++++++-------
>>>> fs/nfs/nfs42xdr.c | 1 -
>>>> 2 files changed, 10 insertions(+), 8 deletions(-)
>>>>
>>>> Hi-
>>>>
>>>> I like Olga's proposed approach. What do you think of this patch?
>>>>
>>>>
>>>> diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
>>>> index 2b2211d1234e..24810305ec1c 100644
>>>> --- a/fs/nfs/nfs42proc.c
>>>> +++ b/fs/nfs/nfs42proc.c
>>>> @@ -1241,7 +1241,7 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
>>>> .rpc_resp = &res,
>>>> };
>>>> u32 xdrlen;
>>>> - int ret, np;
>>>> + int ret, np, i;
>>>>
>>>>
>>>> res.scratch = alloc_page(GFP_KERNEL);
>>>> @@ -1253,10 +1253,14 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
>>>> xdrlen = server->lxasize;
>>>> np = xdrlen / PAGE_SIZE + 1;
>>>>
>>>> + ret = -ENOMEM;
>>>> pages = kcalloc(np, sizeof(struct page *), GFP_KERNEL);
>>>> - if (pages == NULL) {
>>>> - __free_page(res.scratch);
>>>> - return -ENOMEM;
>>>> + if (pages == NULL)
>>>> + goto out_free;
>>>> + for (i = 0; i < np; i++) {
>>>> + pages[i] = alloc_page(GFP_KERNEL);
>>>> + if (!pages[i])
>>>> + goto out_free;
>>>> }
>>>>
>>>> arg.xattr_pages = pages;
>>>> @@ -1271,14 +1275,13 @@ static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
>>>> *eofp = res.eof;
>>>> }
>>>>
>>>> +out_free:
>>>> while (--np >= 0) {
>>>> if (pages[np])
>>>> __free_page(pages[np]);
>>>> }
>>>> -
>>>> - __free_page(res.scratch);
>>>> kfree(pages);
>>>> -
>>>> + __free_page(res.scratch);
>>>> return ret;
>>>>
>>>> }
>>>> diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
>>>> index 6e060a88f98c..8432bd6b95f0 100644
>>>> --- a/fs/nfs/nfs42xdr.c
>>>> +++ b/fs/nfs/nfs42xdr.c
>>>> @@ -1528,7 +1528,6 @@ static void nfs4_xdr_enc_listxattrs(struct rpc_rqst *req,
>>>>
>>>> rpc_prepare_reply_pages(req, args->xattr_pages, 0, args->count,
>>>> hdr.replen);
>>>> - req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
>>>>
>>>> encode_nops(&hdr);
>>>> }
>>>>
>>>>
>>>
>>> I can see why this is the simplest and most pragmatic solution, so it's
>>> fine with me.
>>>
>>> Why doesn't this happen with getxattr? Do we need to convert that too?
>>>
>>> [olga] I don't know if GETXATTR/SETXATTR works. I'm not sure what tests exercise those operations. I just ran into the fact that generic/013 wasn't passing, and I don't see that it's an xattr-specific test. I'm not sure how it ends up triggering usage of xattrs.
>>
>> I'm attaching the test program I used; it should give things a better workout.
>>
>> [olga] I'm not sure if I'm doing something wrong, but there are only 2 GETXATTR calls in the network trace from running this application, and both calls are returning an error (ERR_NOXATTR). Which btw might explain why no problems are seen, since no decoding of data is happening. There are lots of SETXATTRs and REMOVEXATTRs, and there is a LISTXATTR (which btw the network trace is marking as malformed, so there might be something bad there). Anyway...
>>
>> This is my initial report: no real exercise of the GETXATTR code as far as I can tell.
>
> True, the test is heavier on the setxattr / listxattr side. And with caching,
> you're not going to see a lot of GETXATTR calls. I used the same test program
> with caching off, and it works fine, though.

I unintentionally broke GETXATTR while developing the LISTXATTRS fix,
and generic/013 rather aggressively informed me that GETXATTR was no
longer working. There is some test coverage there, fwiw.


> In any case, after converting GETXATTR to pre-allocate pages, I noticed that,
> when I disabled caching, I was getting EIO instead of ERANGE back from
> calls that test the case of calling getxattr() with a buffer length that
> is insufficient.

Was TCP the underlying RPC transport?


> The behavior is somewhat strange - if you, say, set an xattr
> of length 59, then calls with lengths 56-59 get -ERANGE from decode_getxattr
> (correct), but calls with lengths 53-55 get EIO (should be -ERANGE).
>
> I.e., non-aligned values to rpc_prepare_reply_pages make the RPC call error
> out early, even before it gets to decode_getxattr.
>
> I noticed that all other code always seems to specify multiples of PAGE_SIZE
> as the length to rpc_prepare_reply_pages. But the code itself suggests that
> it certainly *intends* to be prepared to handle any length, aligned or not.
>
> However, apparently, it at least doesn't deal with non-aligned lengths,
> making things fail further along down the line.
>
> I need to look at this a bit more.
>
> - Frank

--
Chuck Lever



2020-11-27 08:40:41

by Frank van der Linden

Subject: Re: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation

On Thu, Nov 26, 2020 at 12:10:21PM -0500, Chuck Lever wrote:
>
>
> > On Nov 25, 2020, at 7:21 PM, Frank van der Linden <[email protected]> wrote:
> >
> > On Tue, Nov 24, 2020 at 10:40:25PM +0000, Kornievskaia, Olga wrote:
> >>
> >>
> >> On 11/24/20, 4:20 PM, "Frank van der Linden" <[email protected]> wrote:
> >>
> >> On Tue, Nov 24, 2020 at 08:50:36PM +0000, Kornievskaia, Olga wrote:
> >>>
> >>>
> >>> On 11/24/20, 3:06 PM, "Frank van der Linden" <[email protected]> wrote:
> >>>
> >>> On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
> >>>> [... patch trimmed ...]
> >>>
> >>> I can see why this is the simplest and most pragmatic solution, so it's
> >>> fine with me.
> >>>
> >>> Why doesn't this happen with getxattr? Do we need to convert that too?
> >>>
> >>> [olga] I don't know if GETXATTR/SETXATTR works. I'm not sure what tests exercise those operations. I just ran into the fact that generic/013 wasn't passing, and I don't see that it's an xattr-specific test. I'm not sure how it ends up triggering usage of xattrs.
> >>
> >> I'm attaching the test program I used; it should give things a better workout.
> >>
> >> [olga] I'm not sure if I'm doing something wrong, but there are only 2 GETXATTR calls in the network trace from running this application, and both calls are returning an error (ERR_NOXATTR). Which btw might explain why no problems are seen, since no decoding of data is happening. There are lots of SETXATTRs and REMOVEXATTRs, and there is a LISTXATTR (which btw the network trace is marking as malformed, so there might be something bad there). Anyway...
> >>
> >> This is my initial report: no real exercise of the GETXATTR code as far as I can tell.
> >
> > True, the test is heavier on the setxattr / listxattr side. And with caching,
> > you're not going to see a lot of GETXATTR calls. I used the same test program
> > with caching off, and it works fine, though.
>
> I unintentionally broke GETXATTR while developing the LISTXATTRS fix,
> and generic/013 rather aggressively informed me that GETXATTR was no
> longer working. There is some test coverage there, fwiw.

Oh, the coverage was good - in my testing I also used a collection of
small unit test programs, and I was the one who made the xattr tests
in xfstests work on NFS.

>
>
> > In any case, after converting GETXATTR to pre-allocate pages, I noticed that,
> > when I disabled caching, I was getting EIO instead of ERANGE back from
> > calls that test the case of calling getxattr() with a buffer length that
> > is insufficient.
>
> Was TCP the underlying RPC transport?

Yes, this was TCP. I have set up RDMA over rxe, which I'll test too if I
can get this figured out. It might be a long-standing bug in xdr_inline_pages,
as listxattr / getxattr seem to be simply the first ones to pass in a
length that is not page aligned.

It does make some sense to round the length up to PAGE_SIZE, and just check if
the received data fits when decoding, like other calls do. It improves your
chances of getting a result that you can still cache, even if it's too big for
the length that was asked for. E.g. if the result is > requested_length, but
< ROUND_UP(requested_length, PAGE_SIZE), you can cache it, even though you
have to return -ERANGE to the caller.
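
In decode terms, something like this (a sketch; the names are
illustrative only):

	/* req_len is what the caller asked for; the receive buffer was
	 * sized to PAGE_ALIGN(req_len).
	 */
	if (xattr_len > PAGE_ALIGN(req_len))
		return -EIO;	/* reply overran the receive buffer */
	/* the value fits the rounded-up buffer, so it can be cached */
	if (xattr_len > req_len)
		ret = -ERANGE;	/* but the caller's buffer is too small */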

- Frank

2020-11-27 17:08:30

by Chuck Lever III

Subject: Re: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation



> On Nov 26, 2020, at 2:32 PM, Frank van der Linden <[email protected]> wrote:
>
> On Thu, Nov 26, 2020 at 12:10:21PM -0500, Chuck Lever wrote:
>>
>>
>>> On Nov 25, 2020, at 7:21 PM, Frank van der Linden <[email protected]> wrote:
>>>
>>> On Tue, Nov 24, 2020 at 10:40:25PM +0000, Kornievskaia, Olga wrote:
>>>>
>>>>
>>>> On 11/24/20, 4:20 PM, "Frank van der Linden" <[email protected]> wrote:
>>>>
>>>> On Tue, Nov 24, 2020 at 08:50:36PM +0000, Kornievskaia, Olga wrote:
>>>>>
>>>>>
>>>>> On 11/24/20, 3:06 PM, "Frank van der Linden" <[email protected]> wrote:
>>>>>
>>>>> On Tue, Nov 24, 2020 at 12:26:32PM -0500, Chuck Lever wrote:
>>>>>> [... patch trimmed ...]
>>>>>
>>>>> I can see why this is the simplest and most pragmatic solution, so it's
>>>>> fine with me.
>>>>>
>>>>> Why doesn't this happen with getxattr? Do we need to convert that too?
>>>>>
>>>>> [olga] I don't know if GETXATTR/SETXATTR works. I'm not sure what tests exercise those operations. I just ran into the fact that generic/013 wasn't passing. And I don't see that it's an xattr specific tests. I'm not sure how it ends up triggering is usage of xattr.
>>>>
>>>> I'm attaching the test program I used, it should give things a better workout.
>>>>
>>>> [olga] I'm not sure if I'm doing something wrong but there are only 2 GETXATTR call on the network trace from running this application and both calls are returning an error (ERR_NOXATTR). Which btw might explain why no problems are seen since no decoding of data is happening. There are lots of SETXATTRs and REMOVEXATTR and there is a LISTXATTR (which btw network trace is marking as malformed so there might something bad there). Anyway...
>>>>
>>>> This is my initial report: no real exercise of the GETXATTR code as far as I can tell.
>>>
>>> True, the test is heavier on the setxattr / listxattr side. And with caching,
>>> you're not going to see a lot of GETXATTR calls. I used the same test program
>>> with caching off, and it works fine, though.
>>
>> I unintentionally broke GETXATTR while developing the LISTXATTRS fix,
>> and generic/013 rather aggressively informed me that GETXATTR was no
>> longer working. There is some test coverage there, fwiw.
>
> Oh, the coverage was good - in my testing I also used a collection of
> small unit test programs, and I was the one who made the xattr tests
> in xfstests work on NFS.
>
>>
>>
>>> In any case, after converting GETXATTR to pre-allocate pages, I noticed that,
>>> when I disabled caching, I was getting EIO instead of ERANGE back from
>>> calls that test the case of calling getxattr() with a buffer length that
>>> is insufficient.
>>
>> Was TCP the underlying RPC transport?
>
> Yes, this was TCP. I have set up RDMA over rxe, which I'll test too if I
> can get this figured out. It might be a long-standing bug in xdr_inline_pages,
> as listxattr / getxattr seem to be simply the first ones to pass in a
> length that is not page aligned.

Or your maxsz macro could be missing a "+ 1" for the XDR pad needed for
the unaligned length cases.
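
For illustration (hypothetical macro names, not the actual nfs42xdr.c
definitions): an XDR opaque<> of len bytes encodes as one 32-bit length
word plus len bytes padded to a 4-byte boundary. Counting the data words
as len / 4 undercounts unaligned lengths by one word; XDR_QUADLEN()
rounds up and covers the pad:

#define WRONG_maxsz(len)	(1 + (len) / 4)		/* misses the pad word */
#define RIGHT_maxsz(len)	(1 + XDR_QUADLEN(len))	/* includes it */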



--
Chuck Lever



2020-12-01 20:06:07

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation

On Fri, Nov 27, 2020 at 3:40 AM Frank van der Linden
<[email protected]> wrote:
>
> [ ... quoted patch and earlier thread trimmed ... ]
>
> Oh, the coverage was good - in my testing I also used a collection of
> small unit test programs, and I was the one who made the xattr tests
> in xfstests work on NFS.

I have just oopsed the kernel with a getxattr call where
userspace provided a small buffer.

A file with extended attributes was created using your application,
modified to leave the file behind. Then I coded up the following to
read the extended attributes back. The test coverage doesn't exercise
this case.

#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/xattr.h>

int main(int argc, char *argv[])
{
	int fd, len = 8;
	char buf[8];

	/* The file was left behind with a large xattr by the earlier run. */
	fd = open("/mnt/test_xattr_probeJxfiVU", O_RDWR | O_CREAT, S_IRWXU);
	if (fd < 0)
		exit(1);

	/* Fetch the xattr with a deliberately undersized 8-byte buffer. */
	if (getxattr("/mnt/test_xattr_probeJxfiVU", "user.test_xattr_probe",
		     buf, len) < 0)
		exit(1);

	return 0;
}

Which again produces the KASAN splat:
[ 5915.393103] BUG: KASAN: wild-memory-access in rpcrdma_complete_rqst+0x41b/0x680 [rpcrdma]


This is my proposed fix. Will send a proper patch if agreed:

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 4fc61e3d098d..720fbaddcb90 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -1189,12 +1189,17 @@ static ssize_t _nfs42_proc_getxattr(struct inode *inode, const char *name,
.rpc_argp = &arg,
.rpc_resp = &res,
};
- int ret, np;
+ int ret = -ENOMEM, np = NFS4XATTR_MAXPAGES, i;

+ for (i = 0; i < NFS4XATTR_MAXPAGES; i++) {
+ pages[i] = alloc_page(GFP_KERNEL);
+ if (!pages[i])
+ goto out_free_pages;
+ }
ret = nfs4_call_sync(server->client, server, &msg, &arg.seq_args,
&res.seq_res, 0);
if (ret < 0)
- return ret;
+ goto out_free_pages;

/*
* Normally, the caching is done one layer up, but for successful
@@ -1209,16 +1214,19 @@ static ssize_t _nfs42_proc_getxattr(struct inode *inode, const char *name,
nfs4_xattr_cache_add(inode, name, NULL, pages, res.xattr_len);

if (buflen) {
- if (res.xattr_len > buflen)
- return -ERANGE;
+ if (res.xattr_len > buflen) {
+ ret = -ERANGE;
+ goto out_free_pages;
+ }
_copy_from_pages(buf, pages, 0, res.xattr_len);
}

- np = DIV_ROUND_UP(res.xattr_len, PAGE_SIZE);
+ ret = res.xattr_len;
+out_free_pages:
while (--np >= 0)
__free_page(pages[np]);

- return res.xattr_len;
+ return ret;
}

static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,


2020-12-01 20:28:53

by Frank van der Linden

[permalink] [raw]
Subject: Re: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation

On Tue, Dec 01, 2020 at 03:01:57PM -0500, Olga Kornievskaia wrote:
>
> [ ... quoted patch and earlier thread trimmed ... ]
>
> This is my proposed fix. Will send a proper patch if agreed:

I was just about to send a patch that does the pre-allocation, and rounds
the inserted page_len up to the page allocation size so that more replies
can be cached.
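
Something like this on the encode side (a sketch of the idea, not the
patch itself):

	/* Reserve whole pages and advertise the rounded-up length, so a
	 * reply bigger than args->count but within the allocated pages
	 * can still be decoded and added to the cache.
	 */
	rpc_prepare_reply_pages(req, args->xattr_pages, 0,
				PAGE_ALIGN(args->count), hdr.replen);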

Let me send it.

- Frank

2020-12-01 21:50:05

by Frank van der Linden

[permalink] [raw]
Subject: Re: [UNVERIFIED SENDER] Re: [PATCH v1] NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation

On Tue, Dec 01, 2020 at 08:25:32PM +0000, Frank van der Linden wrote:
> On Tue, Dec 01, 2020 at 03:01:57PM -0500, Olga Kornievskaia wrote:
> >
> > [ ... quoted patch and earlier thread trimmed ... ]
> >
> > This is my proposed fix. Will send a proper patch if agreed:
>
> I was just about to send a patch that does the pre-allocation, and rounds
> the inserted page_len up to the page allocation size so that more replies
> can be cached.
>
> Let me send it.

Ok, just sent the patch. I tested it with TCP and RDMA.

I'll extend my test program a bit to take care of the uncached-
getxattr-with-short-length case.

Thanks for looking at this.

- Frank