In some instances, the code in rpcrdma_convert_iovs was failing to
include a chunk for the xdr tail. This was caused by defensive code
that checked if the current xdr pos was greater than the total xdr
length before encoding the tail. However, the xdr->len is 0 for
non-write operations, and therefore this check was invalid for such
operations.
With this change, the tail will always be marshaled when present. This
makes the maintenance of the pos index unnecessary. The test of
xdrbuf->len to decide if a message should be logged is also incorrect.
This fixes an observed problem when reading certain file sizes over
NFS/RDMA. Please consider this for 2.6.24.
Signed-off-by: Tom Tucker <[email protected]>
Signed-off-by: Tom Talpey <[email protected]>
Signed-off-by: James Lentini <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -92,7 +92,6 @@
seg[n].mr_page = NULL;
seg[n].mr_offset = xdrbuf->head[0].iov_base;
seg[n].mr_len = xdrbuf->head[0].iov_len;
- pos += xdrbuf->head[0].iov_len;
++n;
}
@@ -104,7 +103,6 @@
seg[n].mr_len = min_t(u32,
PAGE_SIZE - xdrbuf->page_base, xdrbuf->page_len);
len = xdrbuf->page_len - seg[n].mr_len;
- pos += len;
++n;
p = 1;
while (len > 0) {
@@ -119,20 +117,15 @@
}
}
- if (pos < xdrbuf->len && xdrbuf->tail[0].iov_len) {
+ if (xdrbuf->tail[0].iov_len) {
if (n == nsegs)
return 0;
seg[n].mr_page = NULL;
seg[n].mr_offset = xdrbuf->tail[0].iov_base;
seg[n].mr_len = xdrbuf->tail[0].iov_len;
- pos += xdrbuf->tail[0].iov_len;
++n;
}
- if (pos < xdrbuf->len)
- dprintk("RPC: %s: marshaled only %d of %d\n",
- __func__, pos, xdrbuf->len);
-
return n;
}
Trond,
I wanted to make sure this didn't get lost in the shuffle. This fixes
an observed problem. Does this look appropriate for 2.6.24?
james
On Thu, 29 Nov 2007, James Lentini wrote:
>
> In some instances, the code in rpcrdma_convert_iovs was failing to
> include a chunk for the xdr tail. This was caused by defensive code
> that checked if the current xdr pos was greater than the total xdr
> length before encoding the tail. However, the xdr->len is 0 for
> non-write operations, and therefore this check was invalid for such
> operations.
>
> With this change, the tail will always be marshaled when present. This
> makes the maintenance of the pos index unnecessary. The test of
> xdrbuf->len to decide if a message should be logged is also incorrect.
>
> This fixes an observed problem when reading certain file sizes over
> NFS/RDMA. Please consider this for 2.6.24.
>
> Signed-off-by: Tom Tucker <[email protected]>
> Signed-off-by: Tom Talpey <[email protected]>
> Signed-off-by: James Lentini <[email protected]>
>
> ---
> net/sunrpc/xprtrdma/rpc_rdma.c | 9 +--------
> 1 file changed, 1 insertion(+), 8 deletions(-)
>
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -92,7 +92,6 @@
> seg[n].mr_page = NULL;
> seg[n].mr_offset = xdrbuf->head[0].iov_base;
> seg[n].mr_len = xdrbuf->head[0].iov_len;
> - pos += xdrbuf->head[0].iov_len;
> ++n;
> }
>
> @@ -104,7 +103,6 @@
> seg[n].mr_len = min_t(u32,
> PAGE_SIZE - xdrbuf->page_base, xdrbuf->page_len);
> len = xdrbuf->page_len - seg[n].mr_len;
> - pos += len;
> ++n;
> p = 1;
> while (len > 0) {
> @@ -119,20 +117,15 @@
> }
> }
>
> - if (pos < xdrbuf->len && xdrbuf->tail[0].iov_len) {
> + if (xdrbuf->tail[0].iov_len) {
> if (n == nsegs)
> return 0;
> seg[n].mr_page = NULL;
> seg[n].mr_offset = xdrbuf->tail[0].iov_base;
> seg[n].mr_len = xdrbuf->tail[0].iov_len;
> - pos += xdrbuf->tail[0].iov_len;
> ++n;
> }
>
> - if (pos < xdrbuf->len)
> - dprintk("RPC: %s: marshaled only %d of %d\n",
> - __func__, pos, xdrbuf->len);
> -
> return n;
> }
>
>
On Thu, 2007-11-29 at 15:46 -0500, James Lentini wrote:
> In some instances, the code in rpcrdma_convert_iovs was failing to
> include a chunk for the xdr tail. This was caused by defensive code
> that checked if the current xdr pos was greater than the total xdr
> length before encoding the tail. However, the xdr->len is 0 for
> non-write operations, and therefore this check was invalid for such
> operations.
Huh? xdr->page_len is 0 for non-write operations, but xdr->len had
better not be 0...
> With this change, the tail will always be marshaled when present. This
> makes the maintenance of the pos index unnecessary. The test of
> xdrbuf->len to decide if a message should be logged is also incorrect.
>
> This fixes an observed problem when reading certain file sizes over
> NFS/RDMA. Please consider this for 2.6.24.
>
> Signed-off-by: Tom Tucker <[email protected]>
> Signed-off-by: Tom Talpey <[email protected]>
> Signed-off-by: James Lentini <[email protected]>
>
> ---
> net/sunrpc/xprtrdma/rpc_rdma.c | 9 +--------
> 1 file changed, 1 insertion(+), 8 deletions(-)
>
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -92,7 +92,6 @@
> seg[n].mr_page = NULL;
> seg[n].mr_offset = xdrbuf->head[0].iov_base;
> seg[n].mr_len = xdrbuf->head[0].iov_len;
> - pos += xdrbuf->head[0].iov_len;
> ++n;
> }
>
> @@ -104,7 +103,6 @@
> seg[n].mr_len = min_t(u32,
> PAGE_SIZE - xdrbuf->page_base, xdrbuf->page_len);
> len = xdrbuf->page_len - seg[n].mr_len;
> - pos += len;
Alternatively, just a single line fix should fix the bug AFAICS:
- pos += len;
+ pos += xdrbuf->page_len
> ++n;
> p = 1;
> while (len > 0) {
> @@ -119,20 +117,15 @@
> }
> }
>
> - if (pos < xdrbuf->len && xdrbuf->tail[0].iov_len) {
> + if (xdrbuf->tail[0].iov_len) {
> if (n == nsegs)
> return 0;
> seg[n].mr_page = NULL;
> seg[n].mr_offset = xdrbuf->tail[0].iov_base;
> seg[n].mr_len = xdrbuf->tail[0].iov_len;
> - pos += xdrbuf->tail[0].iov_len;
> ++n;
> }
>
> - if (pos < xdrbuf->len)
> - dprintk("RPC: %s: marshaled only %d of %d\n",
> - __func__, pos, xdrbuf->len);
> -
> return n;
> }
>
On Wed, 2007-12-05 at 19:56 -0500, Trond Myklebust wrote:
> On Thu, 2007-11-29 at 15:46 -0500, James Lentini wrote:
> > In some instances, the code in rpcrdma_convert_iovs was failing to
> > include a chunk for the xdr tail. This was caused by defensive code
> > that checked if the current xdr pos was greater than the total xdr
> > length before encoding the tail. However, the xdr->len is 0 for
> > non-write operations, and therefore this check was invalid for such
> > operations.
>
> Huh? xdr->page_len is 0 for non-write operations, but xdr->len had
> better not be 0...
xdr->len is 0
> > With this change, the tail will always be marshaled when present. This
> > makes the maintenance of the pos index unnecessary. The test of
> > xdrbuf->len to decide if a message should be logged is also incorrect.
> >
> > This fixes an observed problem when reading certain file sizes over
> > NFS/RDMA. Please consider this for 2.6.24.
> >
> > Signed-off-by: Tom Tucker <[email protected]>
> > Signed-off-by: Tom Talpey <[email protected]>
> > Signed-off-by: James Lentini <[email protected]>
> >
> > ---
> > net/sunrpc/xprtrdma/rpc_rdma.c | 9 +--------
> > 1 file changed, 1 insertion(+), 8 deletions(-)
> >
> > --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> > +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> > @@ -92,7 +92,6 @@
> > seg[n].mr_page = NULL;
> > seg[n].mr_offset = xdrbuf->head[0].iov_base;
> > seg[n].mr_len = xdrbuf->head[0].iov_len;
> > - pos += xdrbuf->head[0].iov_len;
> > ++n;
> > }
> >
> > @@ -104,7 +103,6 @@
> > seg[n].mr_len = min_t(u32,
> > PAGE_SIZE - xdrbuf->page_base, xdrbuf->page_len);
> > len = xdrbuf->page_len - seg[n].mr_len;
> > - pos += len;
>
> Alternatively, just a single line fix should fix the bug AFAICS:
> - pos += len;
> + pos += xdrbuf->page_len
xdr->len notwithstanding, for the purpose of this code, the pos
manipulation isn't necessary because the condition being checked (no
tail) can be trivially determined just by looking at tail[0].iov_len.
>
> > ++n;
> > p = 1;
> > while (len > 0) {
> > @@ -119,20 +117,15 @@
> > }
> > }
> >
> > - if (pos < xdrbuf->len && xdrbuf->tail[0].iov_len) {
> > + if (xdrbuf->tail[0].iov_len) {
> > if (n == nsegs)
> > return 0;
> > seg[n].mr_page = NULL;
> > seg[n].mr_offset = xdrbuf->tail[0].iov_base;
> > seg[n].mr_len = xdrbuf->tail[0].iov_len;
> > - pos += xdrbuf->tail[0].iov_len;
> > ++n;
> > }
> >
> > - if (pos < xdrbuf->len)
> > - dprintk("RPC: %s: marshaled only %d of %d\n",
> > - __func__, pos, xdrbuf->len);
> > -
> > return n;
> > }
> >
On Wed, 2007-12-05 at 22:50 -0600, Tom Tucker wrote:
> xdr->len is 0
Mind explaining where we're missing a call to xdr_adjust_iovec?
> > Alternatively, just a single line fix should fix the bug AFAICS:
> > - pos += len;
> > + pos += xdrbuf->page_len
>
> xdr->len notwithstanding, for the purpose of this code, the pos
> manipulation isn't necessary because the condition being checked (no
> tail) can be trivially determined just by looking at tail[0].iov_len.
I know that, but we're in an -rc4 release.
Trond
On Wed, 5 Dec 2007, Trond Myklebust wrote:
>
> On Thu, 2007-11-29 at 15:46 -0500, James Lentini wrote:
> > In some instances, the code in rpcrdma_convert_iovs was failing to
> > include a chunk for the xdr tail. This was caused by defensive code
> > that checked if the current xdr pos was greater than the total xdr
> > length before encoding the tail. However, the xdr->len is 0 for
> > non-write operations, and therefore this check was invalid for such
> > operations.
>
> Huh? xdr->page_len is 0 for non-write operations, but xdr->len had
> better not be 0...
We are talking about different xdr bufs. I'm refering to the xdr rcv
buf.
For an NFS read request, RPC/RDMA's rpcrdma_convert_iovs() will create
a request header with pointers to the xdr recv buf into which the
response should be placed (depending on the NFS operation,
rpcrdma_convert_iovs() can be passed either the rpc_rqst's snd or rcv
xdr_buf).
Using an NFSv3 read as an example, nfs3_xdr_readargs() will initialize
the rpc_rqst's snd xdr_buf len value with the return of
xdr_adjust_iovec(). However, xdr_inline_pages()'s initialization of
the rpc_rqst's rcv xdr_buf does not set len.
In this case, a len of 0 for the rpc_rqst's rcv xdr_buf appears to be
correct to me. The comment describing the field states len is the
"Length of XDR encoded message"; 0 seems like a reasonable value for
this field before the request is sent.
> > With this change, the tail will always be marshaled when present. This
> > makes the maintenance of the pos index unnecessary. The test of
> > xdrbuf->len to decide if a message should be logged is also incorrect.
> >
> > This fixes an observed problem when reading certain file sizes over
> > NFS/RDMA. Please consider this for 2.6.24.
> >
> > Signed-off-by: Tom Tucker <[email protected]>
> > Signed-off-by: Tom Talpey <[email protected]>
> > Signed-off-by: James Lentini <[email protected]>
> >
> > ---
> > net/sunrpc/xprtrdma/rpc_rdma.c | 9 +--------
> > 1 file changed, 1 insertion(+), 8 deletions(-)
> >
> > --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> > +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> > @@ -92,7 +92,6 @@
> > seg[n].mr_page = NULL;
> > seg[n].mr_offset = xdrbuf->head[0].iov_base;
> > seg[n].mr_len = xdrbuf->head[0].iov_len;
> > - pos += xdrbuf->head[0].iov_len;
> > ++n;
> > }
> >
> > @@ -104,7 +103,6 @@
> > seg[n].mr_len = min_t(u32,
> > PAGE_SIZE - xdrbuf->page_base, xdrbuf->page_len);
> > len = xdrbuf->page_len - seg[n].mr_len;
> > - pos += len;
>
> Alternatively, just a single line fix should fix the bug AFAICS:
> - pos += len;
> + pos += xdrbuf->page_len
Since we are marshaling the rcv buf, the "pos < xdrbuf->len" test
would fail.
>
> > ++n;
> > p = 1;
> > while (len > 0) {
> > @@ -119,20 +117,15 @@
> > }
> > }
> >
> > - if (pos < xdrbuf->len && xdrbuf->tail[0].iov_len) {
> > + if (xdrbuf->tail[0].iov_len) {
> > if (n == nsegs)
> > return 0;
> > seg[n].mr_page = NULL;
> > seg[n].mr_offset = xdrbuf->tail[0].iov_base;
> > seg[n].mr_len = xdrbuf->tail[0].iov_len;
> > - pos += xdrbuf->tail[0].iov_len;
> > ++n;
> > }
> >
> > - if (pos < xdrbuf->len)
> > - dprintk("RPC: %s: marshaled only %d of %d\n",
> > - __func__, pos, xdrbuf->len);
> > -
> > return n;
> > }
> >
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
On Thu, 2007-12-06 at 13:20 -0500, James Lentini wrote:
>
> On Wed, 5 Dec 2007, Trond Myklebust wrote:
>
> >
> > On Thu, 2007-11-29 at 15:46 -0500, James Lentini wrote:
> > > In some instances, the code in rpcrdma_convert_iovs was failing to
> > > include a chunk for the xdr tail. This was caused by defensive code
> > > that checked if the current xdr pos was greater than the total xdr
> > > length before encoding the tail. However, the xdr->len is 0 for
> > > non-write operations, and therefore this check was invalid for such
> > > operations.
> >
> > Huh? xdr->page_len is 0 for non-write operations, but xdr->len had
> > better not be 0...
>
> We are talking about different xdr bufs. I'm refering to the xdr rcv
> buf.
>
> For an NFS read request, RPC/RDMA's rpcrdma_convert_iovs() will create
> a request header with pointers to the xdr recv buf into which the
> response should be placed (depending on the NFS operation,
> rpcrdma_convert_iovs() can be passed either the rpc_rqst's snd or rcv
> xdr_buf).
>
> Using an NFSv3 read as an example, nfs3_xdr_readargs() will initialize
> the rpc_rqst's snd xdr_buf len value with the return of
> xdr_adjust_iovec(). However, xdr_inline_pages()'s initialization of
> the rpc_rqst's rcv xdr_buf does not set len.
>
> In this case, a len of 0 for the rpc_rqst's rcv xdr_buf appears to be
> correct to me. The comment describing the field states len is the
> "Length of XDR encoded message"; 0 seems like a reasonable value for
> this field before the request is sent.
Eeewwww.... What an unholy mess!
Why are you trying to shoehorn reading and writing into
rpcrdma_create_chunks()? The way every action is split into 'if (read)
{} else {}' should be an obvious hint that none of this belongs
together. Please split this up for 2.6.25.
So yes, for read buffers, you should be looking at the storage buffer
length (xdr_buf->buflen) since the message length is obviously going to
be zero.
Trond
On Thu, 6 Dec 2007, Trond Myklebust wrote:
>
> On Thu, 2007-12-06 at 13:20 -0500, James Lentini wrote:
> >
> > On Wed, 5 Dec 2007, Trond Myklebust wrote:
> >
> > >
> > > On Thu, 2007-11-29 at 15:46 -0500, James Lentini wrote:
> > > > In some instances, the code in rpcrdma_convert_iovs was failing to
> > > > include a chunk for the xdr tail. This was caused by defensive code
> > > > that checked if the current xdr pos was greater than the total xdr
> > > > length before encoding the tail. However, the xdr->len is 0 for
> > > > non-write operations, and therefore this check was invalid for such
> > > > operations.
> > >
> > > Huh? xdr->page_len is 0 for non-write operations, but xdr->len had
> > > better not be 0...
> >
> > We are talking about different xdr bufs. I'm refering to the xdr rcv
> > buf.
> >
> > For an NFS read request, RPC/RDMA's rpcrdma_convert_iovs() will create
> > a request header with pointers to the xdr recv buf into which the
> > response should be placed (depending on the NFS operation,
> > rpcrdma_convert_iovs() can be passed either the rpc_rqst's snd or rcv
> > xdr_buf).
> >
> > Using an NFSv3 read as an example, nfs3_xdr_readargs() will initialize
> > the rpc_rqst's snd xdr_buf len value with the return of
> > xdr_adjust_iovec(). However, xdr_inline_pages()'s initialization of
> > the rpc_rqst's rcv xdr_buf does not set len.
> >
> > In this case, a len of 0 for the rpc_rqst's rcv xdr_buf appears to be
> > correct to me. The comment describing the field states len is the
> > "Length of XDR encoded message"; 0 seems like a reasonable value for
> > this field before the request is sent.
>
> Eeewwww.... What an unholy mess!
>
> Why are you trying to shoehorn reading and writing into
> rpcrdma_create_chunks()? The way every action is split into 'if
> (read) {} else {}' should be an obvious hint that none of this
> belongs together. Please split this up for 2.6.25.
ok.
> So yes, for read buffers, you should be looking at the storage
> buffer length (xdr_buf->buflen) since the message length is
> obviously going to be zero.
Agreed. This is why we removed the incorrect references to len in the
patch.
What is your opinion on including the fix in 2.6.24? It addresses an
observed problem for certain I/Os over NFS/RDMA. Would you like me to
resend the patch with a revised description?
At 06:11 PM 12/6/2007, James Lentini wrote:
>On Thu, 6 Dec 2007, Trond Myklebust wrote:
>> Eeewwww.... What an unholy mess!
>>
>> Why are you trying to shoehorn reading and writing into
>> rpcrdma_create_chunks()? The way every action is split into 'if
>> (read) {} else {}' should be an obvious hint that none of this
>> belongs together. Please split this up for 2.6.25.
>
>ok.
Agreed - well except for the "unholy mess" part. This code has been
in the nfs/rdma client basically since it was written (for 2.4.18). The
split version introduces quite a bit of redundancy, but clarity is good
too. I'll split it willingly.
>> So yes, for read buffers, you should be looking at the storage
>> buffer length (xdr_buf->buflen) since the message length is
>> obviously going to be zero.
>
>Agreed. This is why we removed the incorrect references to len in the
>patch.
>
>What is your opinion on including the fix in 2.6.24? It addresses an
>observed problem for certain I/Os over NFS/RDMA. Would you like me to
>resend the patch with a revised description?
The patch basically removes a debug-only assertion (the "failed to marshal"
message), but the single line that tested for buflen > pos has the side
effect of failing to marshal the tail buffer. So that one line is the key which
introduces the issue, and it should be removed for 2.6.24.
The other stuff is harmless, if wrong, and can be removed later.
Tom.
On Thu, 2007-12-06 at 18:11 -0500, James Lentini wrote:
> What is your opinion on including the fix in 2.6.24? It addresses an
> observed problem for certain I/Os over NFS/RDMA. Would you like me to
> resend the patch with a revised description?
Please do.
Trond
On Fri, 7 Dec 2007, Trond Myklebust wrote:
> On Thu, 2007-12-06 at 18:11 -0500, James Lentini wrote:
> > What is your opinion on including the fix in 2.6.24? It addresses an
> > observed problem for certain I/Os over NFS/RDMA. Would you like me to
> > resend the patch with a revised description?
>
> Please do.
rpcrdma_convert_iovs is passed an xdr_buf representing either an RPC
request or an RPC reply. In the case of a request, several
calculations and tests involving pos are unnecessary. In the case of a
reply, several calculations and tests involving pos are incorrect (the
code tests pos against the reply xdr buf's len field, which is always
0 at the time rpcrdma_convert_iovs is executed). This change removes
the incorrect/unnecessary calculations and tests involving pos.
This fixes an observed problem when reading certain file sizes over
NFS/RDMA.
Signed-off-by: Tom Tucker <[email protected]>
Signed-off-by: Tom Talpey <[email protected]>
Signed-off-by: James Lentini <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -92,7 +92,6 @@
seg[n].mr_page = NULL;
seg[n].mr_offset = xdrbuf->head[0].iov_base;
seg[n].mr_len = xdrbuf->head[0].iov_len;
- pos += xdrbuf->head[0].iov_len;
++n;
}
@@ -104,7 +103,6 @@
seg[n].mr_len = min_t(u32,
PAGE_SIZE - xdrbuf->page_base, xdrbuf->page_len);
len = xdrbuf->page_len - seg[n].mr_len;
- pos += len;
++n;
p = 1;
while (len > 0) {
@@ -119,20 +117,15 @@
}
}
- if (pos < xdrbuf->len && xdrbuf->tail[0].iov_len) {
+ if (xdrbuf->tail[0].iov_len) {
if (n == nsegs)
return 0;
seg[n].mr_page = NULL;
seg[n].mr_offset = xdrbuf->tail[0].iov_base;
seg[n].mr_len = xdrbuf->tail[0].iov_len;
- pos += xdrbuf->tail[0].iov_len;
++n;
}
- if (pos < xdrbuf->len)
- dprintk("RPC: %s: marshaled only %d of %d\n",
- __func__, pos, xdrbuf->len);
-
return n;
}