2014-07-10 17:44:20

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH 0/2] NFS/RDMA server patches for 3.17

Hi Steve-

Can you include these two potential 3.17 candidate patches in
your testing branch?

---

Chuck Lever (2):
svcrdma: Remove extra writeargs sanity check for NFSv2/3
svcrdma: Increase credit limit to 32


fs/nfsd/nfs3xdr.c | 15 +--------------
fs/nfsd/nfsxdr.c | 16 +---------------
include/linux/sunrpc/svc_rdma.h | 3 +--
3 files changed, 3 insertions(+), 31 deletions(-)

--
Chuck Lever


2014-07-10 19:00:44

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH 0/2] NFS/RDMA server patches for 3.17

Done!

(my email is [email protected])

Steve.

On 7/10/2014 12:44 PM, Chuck Lever wrote:
> Hi Steve-
>
> Can you include these two potential 3.17 candidate patches in
> your testing branch?
>
> ---
>
> Chuck Lever (2):
> svcrdma: Remove extra writeargs sanity check for NFSv2/3
> svcrdma: Increase credit limit to 32
>
>
> fs/nfsd/nfs3xdr.c | 15 +--------------
> fs/nfsd/nfsxdr.c | 16 +---------------
> include/linux/sunrpc/svc_rdma.h | 3 +--
> 3 files changed, 3 insertions(+), 31 deletions(-)
>


2014-07-10 19:07:45

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 2/2] svcrdma: Remove extra writeargs sanity check for NFSv2/3

Hi Bruce-

On Jul 10, 2014, at 2:49 PM, J. Bruce Fields <[email protected]> wrote:

> On Thu, Jul 10, 2014 at 02:24:57PM -0400, Chuck Lever wrote:
>>
>> On Jul 10, 2014, at 2:19 PM, J. Bruce Fields <[email protected]> wrote:
>>
>>> On Thu, Jul 10, 2014 at 01:44:35PM -0400, Chuck Lever wrote:
>>>> The server should comply with RFC 5667,
>>>
>>> Where's the relevant language? (I took a quick look but didn't see it.)
>>
>> Sorry, I listed the wrong RFC when I wrote the description of bug 246.
>> It?s actually RFC 5666, section 3.7.
>
> Thanks.
>
>>> So I think you just want to drop the round-up of len, not the whole
>>> check.
>>
>> I?ll try that, thanks!

Works-as-expected.

> Actually, I'd really rather this get fixed up in the rpc layer. The
> padding is really required for correct xdr.

How so? All of NFSv4 and all other NFSv3 operations work as expected
without that padding present. There doesn?t seem to be any operational
dependency on the presence of padding. Help?

> The fact that RDMA doesn't
> require those zeroes to be literally transmitted across the network
> sounds like a transport-level detail that should be hidden from the xdr
> code.

The best I can think of is adding a false page array entry to the
xdr_buf if the last incoming page is short by a few bytes.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2014-07-10 18:49:49

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 2/2] svcrdma: Remove extra writeargs sanity check for NFSv2/3

On Thu, Jul 10, 2014 at 02:24:57PM -0400, Chuck Lever wrote:
>
> On Jul 10, 2014, at 2:19 PM, J. Bruce Fields <[email protected]> wrote:
>
> > On Thu, Jul 10, 2014 at 01:44:35PM -0400, Chuck Lever wrote:
> >> The server should comply with RFC 5667,
> >
> > Where's the relevant language? (I took a quick look but didn't see it.)
>
> Sorry, I listed the wrong RFC when I wrote the description of bug 246.
> It’s actually RFC 5666, section 3.7.

Thanks.

> > So I think you just want to drop the round-up of len, not the whole
> > check.
>
> I’ll try that, thanks!

Actually, I'd really rather this get fixed up in the rpc layer. The
padding is really required for correct xdr. The fact that RDMA doesn't
require those zeroes to be literally transmitted across the network
sounds like a transport-level detail that should be hidden from the xdr
code.

--b.

2014-07-10 17:44:29

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH 1/2] svcrdma: Increase credit limit to 32

Signed-off-by: Chuck Lever <[email protected]>
---

include/linux/sunrpc/svc_rdma.h | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 5cf99a0..975da75 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -174,8 +174,7 @@ struct svcxprt_rdma {
* page size of 4k, or 32k * 2 ops / 4k = 16 outstanding RDMA_READ. */
#define RPCRDMA_ORD (64/4)
#define RPCRDMA_SQ_DEPTH_MULT 8
-#define RPCRDMA_MAX_THREADS 16
-#define RPCRDMA_MAX_REQUESTS 16
+#define RPCRDMA_MAX_REQUESTS 32
#define RPCRDMA_MAX_REQ_SIZE 4096

/* svc_rdma_marshal.c */


2014-07-10 18:19:45

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 2/2] svcrdma: Remove extra writeargs sanity check for NFSv2/3

On Thu, Jul 10, 2014 at 01:44:35PM -0400, Chuck Lever wrote:
> The server should comply with RFC 5667,

Where's the relevant language? (I took a quick look but didn't see it.)

> and handle WRITE payloads
> that may not have a zero pad on the end (XDR length round-up).
>
> Fixes: f34b9568 (The NFSv2/NFSv3 server does not handle zero...)
> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=246
> Signed-off-by: Chuck Lever <[email protected]>
> ---
>
> fs/nfsd/nfs3xdr.c | 15 +--------------
> fs/nfsd/nfsxdr.c | 16 +---------------
> 2 files changed, 2 insertions(+), 29 deletions(-)
>
> diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> index e6c01e8..6074f6a 100644
> --- a/fs/nfsd/nfs3xdr.c
> +++ b/fs/nfsd/nfs3xdr.c
> @@ -361,7 +361,7 @@ int
> nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p,
> struct nfsd3_writeargs *args)
> {
> - unsigned int len, v, hdr, dlen;
> + unsigned int len, v, hdr;
> u32 max_blocksize = svc_max_payload(rqstp);
>
> p = decode_fh(p, &args->fh);
> @@ -383,19 +383,6 @@ nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p,
> * bytes.
> */
> hdr = (void*)p - rqstp->rq_arg.head[0].iov_base;
> - dlen = rqstp->rq_arg.head[0].iov_len + rqstp->rq_arg.page_len
> - - hdr;
> - /*
> - * Round the length of the data which was specified up to
> - * the next multiple of XDR units and then compare that
> - * against the length which was actually received.
> - * Note that when RPCSEC/GSS (for example) is used, the
> - * data buffer can be padded so dlen might be larger
> - * than required. It must never be smaller.
> - */
> - if (dlen < XDR_QUADLEN(len)*4)
> - return 0;
> -
> if (args->count > max_blocksize) {
> args->count = max_blocksize;
> len = args->len = max_blocksize;
And then:
}
rqstp->rq_vec[0].iov_base = (void*)p;
rqstp->rq_vec[0].iov_len = rqstp->rq_arg.head[0].iov_len - hdr;
v = 0;
while (len > rqstp->rq_vec[v].iov_len) {
len -= rqstp->rq_vec[v].iov_len;
v++;
rqstp->rq_vec[v].iov_base = page_address(rqstp->rq_pages[v]);
rqstp->rq_vec[v].iov_len = PAGE_SIZE;
}
rqstp->rq_vec[v].iov_len = len;
args->vlen = v + 1;
return 1;

So if len is allowed to be larger than the available space, then the rq_vec
will get filled with data beyond the end of the client's request.

I believe the max_blocksize check will at least ensure that rq_pages[v]
is always still a valid page, so we shouldn't crash. But it could
contain random data, and writing that data to a file could disclose
information we don't mean to.

Am I missing something? The v2 case looks similar.

So I think you just want to drop the round-up of len, not the whole
check.

--b.


> diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c
> index 1ac306b..1f5347e 100644
> --- a/fs/nfsd/nfsxdr.c
> +++ b/fs/nfsd/nfsxdr.c
> @@ -280,7 +280,7 @@ int
> nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p,
> struct nfsd_writeargs *args)
> {
> - unsigned int len, hdr, dlen;
> + unsigned int len, hdr;
> int v;
>
> p = decode_fh(p, &args->fh);
> @@ -302,20 +302,6 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p,
> * bytes.
> */
> hdr = (void*)p - rqstp->rq_arg.head[0].iov_base;
> - dlen = rqstp->rq_arg.head[0].iov_len + rqstp->rq_arg.page_len
> - - hdr;
> -
> - /*
> - * Round the length of the data which was specified up to
> - * the next multiple of XDR units and then compare that
> - * against the length which was actually received.
> - * Note that when RPCSEC/GSS (for example) is used, the
> - * data buffer can be padded so dlen might be larger
> - * than required. It must never be smaller.
> - */
> - if (dlen < XDR_QUADLEN(len)*4)
> - return 0;
> -
> rqstp->rq_vec[0].iov_base = (void*)p;
> rqstp->rq_vec[0].iov_len = rqstp->rq_arg.head[0].iov_len - hdr;
> v = 0;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
:set

> diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c
> index 1ac306b..1f5347e 100644
> --- a/fs/nfsd/nfsxdr.c
> +++ b/fs/nfsd/nfsxdr.c
> @@ -280,7 +280,7 @@ int
> nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p,
> struct nfsd_writeargs *args)
> {
> - unsigned int len, hdr, dlen;
> + unsigned int len, hdr;
> int v;
>
> p = decode_fh(p, &args->fh);
> @@ -302,20 +302,6 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p,
> * bytes.
> */
> hdr = (void*)p - rqstp->rq_arg.head[0].iov_base;
> - dlen = rqstp->rq_arg.head[0].iov_len + rqstp->rq_arg.page_len
> - - hdr;
> -
> - /*
> - * Round the length of the data which was specified up to
> - * the next multiple of XDR units and then compare that
> - * against the length which was actually received.
> - * Note that when RPCSEC/GSS (for example) is used, the
> - * data buffer can be padded so dlen might be larger
> - * than required. It must never be smaller.
> - */
> - if (dlen < XDR_QUADLEN(len)*4)
> - return 0;
> -
> rqstp->rq_vec[0].iov_base = (void*)p;
> rqstp->rq_vec[0].iov_len = rqstp->rq_arg.head[0].iov_len - hdr;
> v = 0;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-07-10 20:44:10

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 2/2] svcrdma: Remove extra writeargs sanity check for NFSv2/3

[ Correcting Steve?s e-mail address ]

On Jul 10, 2014, at 3:43 PM, J. Bruce Fields <[email protected]> wrote:

> On Thu, Jul 10, 2014 at 03:07:34PM -0400, Chuck Lever wrote:
>> Hi Bruce-
>>
>> On Jul 10, 2014, at 2:49 PM, J. Bruce Fields <[email protected]> wrote:
>>
>>> On Thu, Jul 10, 2014 at 02:24:57PM -0400, Chuck Lever wrote:
>>>>
>>>> On Jul 10, 2014, at 2:19 PM, J. Bruce Fields <[email protected]> wrote:
>>>>
>>>>> On Thu, Jul 10, 2014 at 01:44:35PM -0400, Chuck Lever wrote:
>>>>>> The server should comply with RFC 5667,
>>>>>
>>>>> Where's the relevant language? (I took a quick look but didn't see it.)
>>>>
>>>> Sorry, I listed the wrong RFC when I wrote the description of bug 246.
>>>> It?s actually RFC 5666, section 3.7.
>>>
>>> Thanks.
>>>
>>>>> So I think you just want to drop the round-up of len, not the whole
>>>>> check.
>>>>
>>>> I?ll try that, thanks!
>>
>> Works-as-expected.
>>
>>> Actually, I'd really rather this get fixed up in the rpc layer. The
>>> padding is really required for correct xdr.
>>
>> How so?
>
> Well, to be spec-lawyerly, rfc 1832 3.9 defines opaque data as including
> the zero-padding; a sequence of bytes isn't legal xdr if it just ends
> early.

RFC 5666 section 3.7 is talking about the situation where the transport
is dealing with a list of pages, not a stream of bytes. Each item in
the list is handled by a separate RDMA READ. The client isn?t sending
the payload bytes, the server is pulling them from the client, so the
integrity of the RPC request stream isn?t affected if the server doesn?t
read the last item in the page list, for example.

>
>> All of NFSv4 and all other NFSv3 operations work as expected
>> without that padding present. There doesn?t seem to be any operational
>> dependency on the presence of padding. Help?
>
> I can believe that the code deals with it now, I just wonder if this
> check may not be the only case where someone writing xdr code expects
> total length to be a multiple of four.

The exception in section 3.7 applies only to lists of pages. RPC/RDMA
requests sent via RDMA SEND or via RDMA_NOMSG would terminate the XDR
byte stream normally with a zero pad. Since such requests are transmitted
via a single RDMA operation anyway, there?s scant benefit to leaving
off the pad bytes.

Thus this exception applies only to the two operations that RFC 5667
allows to use a read chunk (page list): WRITE and SYMLINK.

> The drc code also depends on the length being right, see
> nfsd_cache_csum. I don't know whether that will cause a practical
> problem in this case.

OK. DRC is kind of important, at least for NFSv3.

> (What about the krb5i case?)
>
>>> The fact that RDMA doesn't
>>> require those zeroes to be literally transmitted across the network
>>> sounds like a transport-level detail that should be hidden from the xdr
>>> code.
>>
>> The best I can think of is adding a false page array entry to the
>> xdr_buf if the last incoming page is short by a few bytes.
>
> The padding just gets added to the end of whichever page the write ended
> on, and you only use another page if you run out of space, right?

I think I misunderstood the RDMA READ code. No extra page should be
necessary.

The only question I have is whether the receive pages are set up so
that the server itself can write pad bytes into them before the RDMA
READs are done.

> I don't know, if that's a huge pain then I guess we can live with this.

Fixing svcrdma was my first approach, but I abandoned it once I
realized that NFSv3 WRITE was the only failing case.

I think the sanity check you pointed out is strictly satisfied by
testing against the unaligned number of bytes. Is there a strong
reason to do the extra math for that check during each WRITE?

Otherwise, I?m looking closely at hooking this up in the transport
as you suggested, for comparison.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2014-07-10 17:44:37

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH 2/2] svcrdma: Remove extra writeargs sanity check for NFSv2/3

The server should comply with RFC 5667, and handle WRITE payloads
that may not have a zero pad on the end (XDR length round-up).

Fixes: f34b9568 (The NFSv2/NFSv3 server does not handle zero...)
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=246
Signed-off-by: Chuck Lever <[email protected]>
---

fs/nfsd/nfs3xdr.c | 15 +--------------
fs/nfsd/nfsxdr.c | 16 +---------------
2 files changed, 2 insertions(+), 29 deletions(-)

diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index e6c01e8..6074f6a 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -361,7 +361,7 @@ int
nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p,
struct nfsd3_writeargs *args)
{
- unsigned int len, v, hdr, dlen;
+ unsigned int len, v, hdr;
u32 max_blocksize = svc_max_payload(rqstp);

p = decode_fh(p, &args->fh);
@@ -383,19 +383,6 @@ nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p,
* bytes.
*/
hdr = (void*)p - rqstp->rq_arg.head[0].iov_base;
- dlen = rqstp->rq_arg.head[0].iov_len + rqstp->rq_arg.page_len
- - hdr;
- /*
- * Round the length of the data which was specified up to
- * the next multiple of XDR units and then compare that
- * against the length which was actually received.
- * Note that when RPCSEC/GSS (for example) is used, the
- * data buffer can be padded so dlen might be larger
- * than required. It must never be smaller.
- */
- if (dlen < XDR_QUADLEN(len)*4)
- return 0;
-
if (args->count > max_blocksize) {
args->count = max_blocksize;
len = args->len = max_blocksize;
diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c
index 1ac306b..1f5347e 100644
--- a/fs/nfsd/nfsxdr.c
+++ b/fs/nfsd/nfsxdr.c
@@ -280,7 +280,7 @@ int
nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p,
struct nfsd_writeargs *args)
{
- unsigned int len, hdr, dlen;
+ unsigned int len, hdr;
int v;

p = decode_fh(p, &args->fh);
@@ -302,20 +302,6 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p,
* bytes.
*/
hdr = (void*)p - rqstp->rq_arg.head[0].iov_base;
- dlen = rqstp->rq_arg.head[0].iov_len + rqstp->rq_arg.page_len
- - hdr;
-
- /*
- * Round the length of the data which was specified up to
- * the next multiple of XDR units and then compare that
- * against the length which was actually received.
- * Note that when RPCSEC/GSS (for example) is used, the
- * data buffer can be padded so dlen might be larger
- * than required. It must never be smaller.
- */
- if (dlen < XDR_QUADLEN(len)*4)
- return 0;
-
rqstp->rq_vec[0].iov_base = (void*)p;
rqstp->rq_vec[0].iov_len = rqstp->rq_arg.head[0].iov_len - hdr;
v = 0;


2014-07-10 19:43:50

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 2/2] svcrdma: Remove extra writeargs sanity check for NFSv2/3

On Thu, Jul 10, 2014 at 03:07:34PM -0400, Chuck Lever wrote:
> Hi Bruce-
>
> On Jul 10, 2014, at 2:49 PM, J. Bruce Fields <[email protected]> wrote:
>
> > On Thu, Jul 10, 2014 at 02:24:57PM -0400, Chuck Lever wrote:
> >>
> >> On Jul 10, 2014, at 2:19 PM, J. Bruce Fields <[email protected]> wrote:
> >>
> >>> On Thu, Jul 10, 2014 at 01:44:35PM -0400, Chuck Lever wrote:
> >>>> The server should comply with RFC 5667,
> >>>
> >>> Where's the relevant language? (I took a quick look but didn't see it.)
> >>
> >> Sorry, I listed the wrong RFC when I wrote the description of bug 246.
> >> It’s actually RFC 5666, section 3.7.
> >
> > Thanks.
> >
> >>> So I think you just want to drop the round-up of len, not the whole
> >>> check.
> >>
> >> I’ll try that, thanks!
>
> Works-as-expected.
>
> > Actually, I'd really rather this get fixed up in the rpc layer. The
> > padding is really required for correct xdr.
>
> How so?

Well, to be spec-lawyerly, rfc 1832 3.9 defines opaque data as including
the zero-padding; a sequence of bytes isn't legal xdr if it just ends
early.

> All of NFSv4 and all other NFSv3 operations work as expected
> without that padding present. There doesn’t seem to be any operational
> dependency on the presence of padding. Help?

I can believe that the code deals with it now, I just wonder if this
check may not be the only case where someone writing xdr code expects
total length to be a multiple of four.

The drc code also depends on the length being right, see
nfsd_cache_csum. I don't know whether that will cause a practical
problem in this case.

(What about the krb5i case?)

> > The fact that RDMA doesn't
> > require those zeroes to be literally transmitted across the network
> > sounds like a transport-level detail that should be hidden from the xdr
> > code.
>
> The best I can think of is adding a false page array entry to the
> xdr_buf if the last incoming page is short by a few bytes.

The padding just gets added to the end of whichever page the write ended
on, and you only use another page if you run out of space, right?

I don't know, if that's a huge pain then I guess we can live with this.

--b.

2014-07-10 18:25:07

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 2/2] svcrdma: Remove extra writeargs sanity check for NFSv2/3


On Jul 10, 2014, at 2:19 PM, J. Bruce Fields <[email protected]> wrote:

> On Thu, Jul 10, 2014 at 01:44:35PM -0400, Chuck Lever wrote:
>> The server should comply with RFC 5667,
>
> Where's the relevant language? (I took a quick look but didn't see it.)

Sorry, I listed the wrong RFC when I wrote the description of bug 246.
It?s actually RFC 5666, section 3.7.

>
>> and handle WRITE payloads
>> that may not have a zero pad on the end (XDR length round-up).
>>
>> Fixes: f34b9568 (The NFSv2/NFSv3 server does not handle zero...)
>> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=246
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>>
>> fs/nfsd/nfs3xdr.c | 15 +--------------
>> fs/nfsd/nfsxdr.c | 16 +---------------
>> 2 files changed, 2 insertions(+), 29 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
>> index e6c01e8..6074f6a 100644
>> --- a/fs/nfsd/nfs3xdr.c
>> +++ b/fs/nfsd/nfs3xdr.c
>> @@ -361,7 +361,7 @@ int
>> nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p,
>> struct nfsd3_writeargs *args)
>> {
>> - unsigned int len, v, hdr, dlen;
>> + unsigned int len, v, hdr;
>> u32 max_blocksize = svc_max_payload(rqstp);
>>
>> p = decode_fh(p, &args->fh);
>> @@ -383,19 +383,6 @@ nfs3svc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p,
>> * bytes.
>> */
>> hdr = (void*)p - rqstp->rq_arg.head[0].iov_base;
>> - dlen = rqstp->rq_arg.head[0].iov_len + rqstp->rq_arg.page_len
>> - - hdr;
>> - /*
>> - * Round the length of the data which was specified up to
>> - * the next multiple of XDR units and then compare that
>> - * against the length which was actually received.
>> - * Note that when RPCSEC/GSS (for example) is used, the
>> - * data buffer can be padded so dlen might be larger
>> - * than required. It must never be smaller.
>> - */
>> - if (dlen < XDR_QUADLEN(len)*4)
>> - return 0;
>> -
>> if (args->count > max_blocksize) {
>> args->count = max_blocksize;
>> len = args->len = max_blocksize;
> And then:
> }
> rqstp->rq_vec[0].iov_base = (void*)p;
> rqstp->rq_vec[0].iov_len = rqstp->rq_arg.head[0].iov_len - hdr;
> v = 0;
> while (len > rqstp->rq_vec[v].iov_len) {
> len -= rqstp->rq_vec[v].iov_len;
> v++;
> rqstp->rq_vec[v].iov_base = page_address(rqstp->rq_pages[v]);
> rqstp->rq_vec[v].iov_len = PAGE_SIZE;
> }
> rqstp->rq_vec[v].iov_len = len;
> args->vlen = v + 1;
> return 1;
>
> So if len is allowed to be larger than the available space, then the rq_vec
> will get filled with data beyond the end of the client's request.
>
> I believe the max_blocksize check will at least ensure that rq_pages[v]
> is always still a valid page, so we shouldn't crash. But it could
> contain random data, and writing that data to a file could disclose
> information we don't mean to.

> Am I missing something? The v2 case looks similar.
>
> So I think you just want to drop the round-up of len, not the whole
> check.

I?ll try that, thanks!

> --b.
>
>
>> diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c
>> index 1ac306b..1f5347e 100644
>> --- a/fs/nfsd/nfsxdr.c
>> +++ b/fs/nfsd/nfsxdr.c
>> @@ -280,7 +280,7 @@ int
>> nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p,
>> struct nfsd_writeargs *args)
>> {
>> - unsigned int len, hdr, dlen;
>> + unsigned int len, hdr;
>> int v;
>>
>> p = decode_fh(p, &args->fh);
>> @@ -302,20 +302,6 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p,
>> * bytes.
>> */
>> hdr = (void*)p - rqstp->rq_arg.head[0].iov_base;
>> - dlen = rqstp->rq_arg.head[0].iov_len + rqstp->rq_arg.page_len
>> - - hdr;
>> -
>> - /*
>> - * Round the length of the data which was specified up to
>> - * the next multiple of XDR units and then compare that
>> - * against the length which was actually received.
>> - * Note that when RPCSEC/GSS (for example) is used, the
>> - * data buffer can be padded so dlen might be larger
>> - * than required. It must never be smaller.
>> - */
>> - if (dlen < XDR_QUADLEN(len)*4)
>> - return 0;
>> -
>> rqstp->rq_vec[0].iov_base = (void*)p;
>> rqstp->rq_vec[0].iov_len = rqstp->rq_arg.head[0].iov_len - hdr;
>> v = 0;
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> :set
>
>> diff --git a/fs/nfsd/nfsxdr.c b/fs/nfsd/nfsxdr.c
>> index 1ac306b..1f5347e 100644
>> --- a/fs/nfsd/nfsxdr.c
>> +++ b/fs/nfsd/nfsxdr.c
>> @@ -280,7 +280,7 @@ int
>> nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p,
>> struct nfsd_writeargs *args)
>> {
>> - unsigned int len, hdr, dlen;
>> + unsigned int len, hdr;
>> int v;
>>
>> p = decode_fh(p, &args->fh);
>> @@ -302,20 +302,6 @@ nfssvc_decode_writeargs(struct svc_rqst *rqstp, __be32 *p,
>> * bytes.
>> */
>> hdr = (void*)p - rqstp->rq_arg.head[0].iov_base;
>> - dlen = rqstp->rq_arg.head[0].iov_len + rqstp->rq_arg.page_len
>> - - hdr;
>> -
>> - /*
>> - * Round the length of the data which was specified up to
>> - * the next multiple of XDR units and then compare that
>> - * against the length which was actually received.
>> - * Note that when RPCSEC/GSS (for example) is used, the
>> - * data buffer can be padded so dlen might be larger
>> - * than required. It must never be smaller.
>> - */
>> - if (dlen < XDR_QUADLEN(len)*4)
>> - return 0;
>> -
>> rqstp->rq_vec[0].iov_base = (void*)p;
>> rqstp->rq_vec[0].iov_len = rqstp->rq_arg.head[0].iov_len - hdr;
>> v = 0;
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2014-07-11 18:49:31

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 2/2] svcrdma: Remove extra writeargs sanity check for NFSv2/3

On Thu, Jul 10, 2014 at 04:43:54PM -0400, Chuck Lever wrote:
> I think the sanity check you pointed out is strictly satisfied by
> testing against the unaligned number of bytes. Is there a strong
> reason to do the extra math for that check during each WRITE?

I can't think of any good reason why the check should be against the
rounded-up length.

So my worry was just a more general "uh-oh, a non-multiple-of-4-length
is pretty weird, future me may not remember that's a possibility."

--b.