The GSS Message Integrity Check data for krb5i may lie partially in the XDR
reply buffer's pages and tail. If so, we try to copy the entire MIC into
free space in the tail. But as the estimations of the slack space required
for authentication and verification have improved there may be less free
space in the tail to complete this copy -- see commit 2c94b8eca1a2
("SUNRPC: Use au_rslack when computing reply buffer size"). In fact, there
may only be room in the tail for a single copy of the MIC, and not part of
the MIC and then another complete copy.
The real world failure reported is that `ls` of a directory on NFS may
sometimes return -EIO, which can be traced back to xdr_buf_read_netobj()
failing to find available free space in the tail to copy the MIC.
Fix this by checking for the case of the MIC crossing the boundaries of
head, pages, and tail. If so, shift the buffer until the MIC is contained
completely within the pages or tail. This allows the remainder of the
function to create a sub buffer that directly address the complete MIC.
Signed-off-by: Benjamin Coddington <[email protected]>
Cc: [email protected] # v5.1
---
net/sunrpc/xdr.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index 48c93b9e525e..a29ce73c3029 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -1237,16 +1237,29 @@ xdr_encode_word(struct xdr_buf *buf, unsigned int base, u32 obj)
EXPORT_SYMBOL_GPL(xdr_encode_word);
/* If the netobj starting offset bytes from the start of xdr_buf is contained
- * entirely in the head or the tail, set object to point to it; otherwise
- * try to find space for it at the end of the tail, copy it there, and
- * set obj to point to it. */
+ * entirely in the head, pages, or tail, set object to point to it; otherwise
+ * shift the buffer until it is contained entirely within the pages or tail.
+ */
int xdr_buf_read_netobj(struct xdr_buf *buf, struct xdr_netobj *obj, unsigned int offset)
{
struct xdr_buf subbuf;
+ unsigned int boundary;
if (xdr_decode_word(buf, offset, &obj->len))
return -EFAULT;
- if (xdr_buf_subsegment(buf, &subbuf, offset + 4, obj->len))
+ offset += 4;
+
+ /* Is the obj partially in the head? */
+ boundary = buf->head[0].iov_len;
+ if (offset < boundary && (offset + obj->len) > boundary)
+ xdr_shift_buf(buf, boundary - offset);
+
+ /* Is the obj partially in the pages? */
+ boundary += buf->page_len;
+ if (offset < boundary && (offset + obj->len) > boundary)
+ xdr_shrink_pagelen(buf, boundary - offset);
+
+ if (xdr_buf_subsegment(buf, &subbuf, offset, obj->len))
return -EFAULT;
/* Is the obj contained entirely in the head? */
@@ -1258,17 +1271,10 @@ int xdr_buf_read_netobj(struct xdr_buf *buf, struct xdr_netobj *obj, unsigned in
if (subbuf.tail[0].iov_len == obj->len)
return 0;
- /* use end of tail as storage for obj:
- * (We don't copy to the beginning because then we'd have
- * to worry about doing a potentially overlapping copy.
- * This assumes the object is at most half the length of the
- * tail.) */
+ /* obj is in the pages: move to end of the tail */
if (obj->len > buf->buflen - buf->len)
return -ENOMEM;
- if (buf->tail[0].iov_len != 0)
- obj->data = buf->tail[0].iov_base + buf->tail[0].iov_len;
- else
- obj->data = buf->head[0].iov_base + buf->head[0].iov_len;
+ obj->data = buf->tail[0].iov_base + buf->tail[0].iov_len;
__read_bytes_from_xdr_buf(&subbuf, obj->data, obj->len);
return 0;
}
--
2.20.1
Hi Ben-
> On Sep 15, 2019, at 11:41 AM, Benjamin Coddington <[email protected]> wrote:
>
> The GSS Message Integrity Check data for krb5i may lie partially in the XDR
> reply buffer's pages and tail. If so, we try to copy the entire MIC into
> free space in the tail. But as the estimations of the slack space required
> for authentication and verification have improved there may be less free
> space in the tail to complete this copy -- see commit 2c94b8eca1a2
> ("SUNRPC: Use au_rslack when computing reply buffer size"). In fact, there
> may only be room in the tail for a single copy of the MIC, and not part of
> the MIC and then another complete copy.
>
> The real world failure reported is that `ls` of a directory on NFS may
> sometimes return -EIO, which can be traced back to xdr_buf_read_netobj()
> failing to find available free space in the tail to copy the MIC.
>
> Fix this by checking for the case of the MIC crossing the boundaries of
> head, pages, and tail. If so, shift the buffer until the MIC is contained
> completely within the pages or tail. This allows the remainder of the
> function to create a sub buffer that directly address the complete MIC.
>
> Signed-off-by: Benjamin Coddington <[email protected]>
> Cc: [email protected] # v5.1
> ---
> net/sunrpc/xdr.c | 32 +++++++++++++++++++-------------
> 1 file changed, 19 insertions(+), 13 deletions(-)
>
> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
> index 48c93b9e525e..a29ce73c3029 100644
> --- a/net/sunrpc/xdr.c
> +++ b/net/sunrpc/xdr.c
> @@ -1237,16 +1237,29 @@ xdr_encode_word(struct xdr_buf *buf, unsigned int base, u32 obj)
> EXPORT_SYMBOL_GPL(xdr_encode_word);
>
> /* If the netobj starting offset bytes from the start of xdr_buf is contained
> - * entirely in the head or the tail, set object to point to it; otherwise
> - * try to find space for it at the end of the tail, copy it there, and
> - * set obj to point to it. */
> + * entirely in the head, pages, or tail, set object to point to it; otherwise
> + * shift the buffer until it is contained entirely within the pages or tail.
> + */
> int xdr_buf_read_netobj(struct xdr_buf *buf, struct xdr_netobj *obj, unsigned int offset)
> {
> struct xdr_buf subbuf;
> + unsigned int boundary;
>
> if (xdr_decode_word(buf, offset, &obj->len))
> return -EFAULT;
> - if (xdr_buf_subsegment(buf, &subbuf, offset + 4, obj->len))
> + offset += 4;
> +
> + /* Is the obj partially in the head? */
> + boundary = buf->head[0].iov_len;
> + if (offset < boundary && (offset + obj->len) > boundary)
> + xdr_shift_buf(buf, boundary - offset);
> +
> + /* Is the obj partially in the pages? */
> + boundary += buf->page_len;
> + if (offset < boundary && (offset + obj->len) > boundary)
> + xdr_shrink_pagelen(buf, boundary - offset);
> +
> + if (xdr_buf_subsegment(buf, &subbuf, offset, obj->len))
> return -EFAULT;
>
> /* Is the obj contained entirely in the head? */
> @@ -1258,17 +1271,10 @@ int xdr_buf_read_netobj(struct xdr_buf *buf, struct xdr_netobj *obj, unsigned in
> if (subbuf.tail[0].iov_len == obj->len)
> return 0;
>
> - /* use end of tail as storage for obj:
> - * (We don't copy to the beginning because then we'd have
> - * to worry about doing a potentially overlapping copy.
> - * This assumes the object is at most half the length of the
> - * tail.) */
> + /* obj is in the pages: move to end of the tail */
How about "/* Find a contiguous area in @buf to hold all of @obj */" ?
> if (obj->len > buf->buflen - buf->len)
> return -ENOMEM;
> - if (buf->tail[0].iov_len != 0)
> - obj->data = buf->tail[0].iov_base + buf->tail[0].iov_len;
> - else
> - obj->data = buf->head[0].iov_base + buf->head[0].iov_len;
> + obj->data = buf->tail[0].iov_base + buf->tail[0].iov_len;
Your new code assumes that when krb5i is in use, the upper layer will always
provide a non-NULL tail->iov_len. I wouldn't swear that will always be true:
The reason for the "if (buf->tail[0].iov_len)" check is to see whether the
upper layer indeed has set up a tail. iov_len will be non-zero only when the
upper layer has provided a tail buffer.
> we can definitely keep the check, but
> the second half of the statement also assumes a contiguous head/tail range.
Well, it assumes that there is space in the head buffer after its end. That's
not necessarily the tail. Are we sure that in the post-35e77d21baa0 world
there will always be enough space after head->iov_len?
A reasonable test here would be to enable SLUB poisoning and and try some
complex workloads on an NFSv4 krb5i mount.
> I think it's safe to just remove the test altogether and place the netobj at
> the end of the tail.
I'm not convinced :-) I'd like to see more justification for this claim.
This is why in the long run we are better off using a scratch buffer instead
of finding a spot in @buf. The rules about how the rq_rcv_buf is set up are
gray; this function makes a lot of "clever" assumptions about that, and that
makes this logic quite brittle.
Now that we have an xdr_stream in gss_unwrap_resp_integ(), I wonder if you
could use the stream's scratch xdr_buf. If not, a kmalloc should do the trick.
> __read_bytes_from_xdr_buf(&subbuf, obj->data, obj->len);
> return 0;
> }
> --
> 2.20.1
>
--
Chuck Lever
On Sun, 2019-09-15 at 12:43 -0400, Chuck Lever wrote:
> Hi Ben-
>
>
> > On Sep 15, 2019, at 11:41 AM, Benjamin Coddington <
> > [email protected]> wrote:
> >
> > The GSS Message Integrity Check data for krb5i may lie partially in
> > the XDR
> > reply buffer's pages and tail. If so, we try to copy the entire
> > MIC into
> > free space in the tail. But as the estimations of the slack space
> > required
> > for authentication and verification have improved there may be less
> > free
> > space in the tail to complete this copy -- see commit 2c94b8eca1a2
> > ("SUNRPC: Use au_rslack when computing reply buffer size"). In
> > fact, there
> > may only be room in the tail for a single copy of the MIC, and not
> > part of
> > the MIC and then another complete copy.
> >
> > The real world failure reported is that `ls` of a directory on NFS
> > may
> > sometimes return -EIO, which can be traced back to
> > xdr_buf_read_netobj()
> > failing to find available free space in the tail to copy the MIC.
> >
> > Fix this by checking for the case of the MIC crossing the
> > boundaries of
> > head, pages, and tail. If so, shift the buffer until the MIC is
> > contained
> > completely within the pages or tail. This allows the remainder of
> > the
> > function to create a sub buffer that directly address the complete
> > MIC.
> >
> > Signed-off-by: Benjamin Coddington <[email protected]>
> > Cc: [email protected] # v5.1
> > ---
> > net/sunrpc/xdr.c | 32 +++++++++++++++++++-------------
> > 1 file changed, 19 insertions(+), 13 deletions(-)
> >
> > diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
> > index 48c93b9e525e..a29ce73c3029 100644
> > --- a/net/sunrpc/xdr.c
> > +++ b/net/sunrpc/xdr.c
> > @@ -1237,16 +1237,29 @@ xdr_encode_word(struct xdr_buf *buf,
> > unsigned int base, u32 obj)
> > EXPORT_SYMBOL_GPL(xdr_encode_word);
> >
> > /* If the netobj starting offset bytes from the start of xdr_buf is
> > contained
> > - * entirely in the head or the tail, set object to point to it;
> > otherwise
> > - * try to find space for it at the end of the tail, copy it there,
> > and
> > - * set obj to point to it. */
> > + * entirely in the head, pages, or tail, set object to point to
> > it; otherwise
> > + * shift the buffer until it is contained entirely within the
> > pages or tail.
> > + */
> > int xdr_buf_read_netobj(struct xdr_buf *buf, struct xdr_netobj
> > *obj, unsigned int offset)
> > {
> > struct xdr_buf subbuf;
> > + unsigned int boundary;
> >
> > if (xdr_decode_word(buf, offset, &obj->len))
> > return -EFAULT;
> > - if (xdr_buf_subsegment(buf, &subbuf, offset + 4, obj->len))
> > + offset += 4;
> > +
> > + /* Is the obj partially in the head? */
> > + boundary = buf->head[0].iov_len;
> > + if (offset < boundary && (offset + obj->len) > boundary)
> > + xdr_shift_buf(buf, boundary - offset);
> > +
> > + /* Is the obj partially in the pages? */
> > + boundary += buf->page_len;
> > + if (offset < boundary && (offset + obj->len) > boundary)
> > + xdr_shrink_pagelen(buf, boundary - offset);
> > +
> > + if (xdr_buf_subsegment(buf, &subbuf, offset, obj->len))
> > return -EFAULT;
> >
> > /* Is the obj contained entirely in the head? */
> > @@ -1258,17 +1271,10 @@ int xdr_buf_read_netobj(struct xdr_buf
> > *buf, struct xdr_netobj *obj, unsigned in
> > if (subbuf.tail[0].iov_len == obj->len)
> > return 0;
> >
> > - /* use end of tail as storage for obj:
> > - * (We don't copy to the beginning because then we'd have
> > - * to worry about doing a potentially overlapping copy.
> > - * This assumes the object is at most half the length of the
> > - * tail.) */
> > + /* obj is in the pages: move to end of the tail */
>
> How about "/* Find a contiguous area in @buf to hold all of @obj */"
> ?
>
>
> > if (obj->len > buf->buflen - buf->len)
> > return -ENOMEM;
> > - if (buf->tail[0].iov_len != 0)
> > - obj->data = buf->tail[0].iov_base + buf-
> > >tail[0].iov_len;
> > - else
> > - obj->data = buf->head[0].iov_base + buf-
> > >head[0].iov_len;
> > + obj->data = buf->tail[0].iov_base + buf->tail[0].iov_len;
>
> Your new code assumes that when krb5i is in use, the upper layer will
> always
> provide a non-NULL tail->iov_len. I wouldn't swear that will always
> be true:
> The reason for the "if (buf->tail[0].iov_len)" check is to see
> whether the
> upper layer indeed has set up a tail. iov_len will be non-zero only
> when the
> upper layer has provided a tail buffer.
For the case where we only have kvec data, then xdr_buf_init() assigns
all the allocated buffer to the header, so the mic should be
contiguously buffered there.
If we have page data, then the fact that auth->au_rslack > auth-
>au_ralign should ensure that we have allocated a sufficiently large
tail buffer (see rpc_prepare_reply_pages()).
BTW: this also means we should really only need to move the mic in the
case where buf->page_len != 0.
It also means that we should probably change xdr_inline_pages() to be a
static function, since the assumption above fails if the user does not
call rpc_prepare_reply_pages() in order to calculate the header
alignment correctly.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On 15 Sep 2019, at 13:30, Trond Myklebust wrote:
> On Sun, 2019-09-15 at 12:43 -0400, Chuck Lever wrote:
>> Hi Ben-
>>
>>
>>> On Sep 15, 2019, at 11:41 AM, Benjamin Coddington <
>>> [email protected]> wrote:
>>>
>>> The GSS Message Integrity Check data for krb5i may lie partially in
>>> the XDR
>>> reply buffer's pages and tail. If so, we try to copy the entire
>>> MIC into
>>> free space in the tail. But as the estimations of the slack space
>>> required
>>> for authentication and verification have improved there may be less
>>> free
>>> space in the tail to complete this copy -- see commit 2c94b8eca1a2
>>> ("SUNRPC: Use au_rslack when computing reply buffer size"). In
>>> fact, there
>>> may only be room in the tail for a single copy of the MIC, and not
>>> part of
>>> the MIC and then another complete copy.
>>>
>>> The real world failure reported is that `ls` of a directory on NFS
>>> may
>>> sometimes return -EIO, which can be traced back to
>>> xdr_buf_read_netobj()
>>> failing to find available free space in the tail to copy the MIC.
>>>
>>> Fix this by checking for the case of the MIC crossing the
>>> boundaries of
>>> head, pages, and tail. If so, shift the buffer until the MIC is
>>> contained
>>> completely within the pages or tail. This allows the remainder of
>>> the
>>> function to create a sub buffer that directly address the complete
>>> MIC.
>>>
>>> Signed-off-by: Benjamin Coddington <[email protected]>
>>> Cc: [email protected] # v5.1
>>> ---
>>> net/sunrpc/xdr.c | 32 +++++++++++++++++++-------------
>>> 1 file changed, 19 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
>>> index 48c93b9e525e..a29ce73c3029 100644
>>> --- a/net/sunrpc/xdr.c
>>> +++ b/net/sunrpc/xdr.c
>>> @@ -1237,16 +1237,29 @@ xdr_encode_word(struct xdr_buf *buf,
>>> unsigned int base, u32 obj)
>>> EXPORT_SYMBOL_GPL(xdr_encode_word);
>>>
>>> /* If the netobj starting offset bytes from the start of xdr_buf is
>>> contained
>>> - * entirely in the head or the tail, set object to point to it;
>>> otherwise
>>> - * try to find space for it at the end of the tail, copy it there,
>>> and
>>> - * set obj to point to it. */
>>> + * entirely in the head, pages, or tail, set object to point to
>>> it; otherwise
>>> + * shift the buffer until it is contained entirely within the
>>> pages or tail.
>>> + */
>>> int xdr_buf_read_netobj(struct xdr_buf *buf, struct xdr_netobj
>>> *obj, unsigned int offset)
>>> {
>>> struct xdr_buf subbuf;
>>> + unsigned int boundary;
>>>
>>> if (xdr_decode_word(buf, offset, &obj->len))
>>> return -EFAULT;
>>> - if (xdr_buf_subsegment(buf, &subbuf, offset + 4, obj->len))
>>> + offset += 4;
>>> +
>>> + /* Is the obj partially in the head? */
>>> + boundary = buf->head[0].iov_len;
>>> + if (offset < boundary && (offset + obj->len) > boundary)
>>> + xdr_shift_buf(buf, boundary - offset);
>>> +
>>> + /* Is the obj partially in the pages? */
>>> + boundary += buf->page_len;
>>> + if (offset < boundary && (offset + obj->len) > boundary)
>>> + xdr_shrink_pagelen(buf, boundary - offset);
>>> +
>>> + if (xdr_buf_subsegment(buf, &subbuf, offset, obj->len))
>>> return -EFAULT;
>>>
>>> /* Is the obj contained entirely in the head? */
>>> @@ -1258,17 +1271,10 @@ int xdr_buf_read_netobj(struct xdr_buf
>>> *buf, struct xdr_netobj *obj, unsigned in
>>> if (subbuf.tail[0].iov_len == obj->len)
>>> return 0;
>>>
>>> - /* use end of tail as storage for obj:
>>> - * (We don't copy to the beginning because then we'd have
>>> - * to worry about doing a potentially overlapping copy.
>>> - * This assumes the object is at most half the length of the
>>> - * tail.) */
>>> + /* obj is in the pages: move to end of the tail */
>>
>> How about "/* Find a contiguous area in @buf to hold all of @obj */"
>> ?
>>
>>> if (obj->len > buf->buflen - buf->len)
>>> return -ENOMEM;
>>> - if (buf->tail[0].iov_len != 0)
>>> - obj->data = buf->tail[0].iov_base + buf-
>>>> tail[0].iov_len;
>>> - else
>>> - obj->data = buf->head[0].iov_base + buf-
>>>> head[0].iov_len;
>>> + obj->data = buf->tail[0].iov_base + buf->tail[0].iov_len;
>>
>> Your new code assumes that when krb5i is in use, the upper layer will
>> always provide a non-NULL tail->iov_len. I wouldn't swear that will
>> always be true: The reason for the "if (buf->tail[0].iov_len)" check is
>> to see whether the upper layer indeed has set up a tail. iov_len will be
>> non-zero only when the upper layer has provided a tail buffer.
Alright.. I'll bring the test back on v3.
> For the case where we only have kvec data, then xdr_buf_init() assigns
> all the allocated buffer to the header, so the mic should be
> contiguously buffered there.
>
> If we have page data, then the fact that auth->au_rslack > auth-
>> au_ralign should ensure that we have allocated a sufficiently large
> tail buffer (see rpc_prepare_reply_pages()).
>
> BTW: this also means we should really only need to move the mic in the
> case where buf->page_len != 0.
> It also means that we should probably change xdr_inline_pages() to be a
> static function, since the assumption above fails if the user does not
> call rpc_prepare_reply_pages() in order to calculate the header
> alignment correctly.
Sounds like a third patch would make this series even better.. thanks for
the review and advice! I'll send a v3 with Chuck's suggestions this morning
and put the optimizations Trond would like to see together in a third patch
when I have some slack time.
Ben