2017-01-23 20:52:40

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 0/5] Fix "support large inline thresholds"

Hi Anna-

I've received a number of reports that v4.9 commit 655fec6987be
("xprtrdma: Use gathered Send for large inline messages") causes
NFS/RDMA mounts to fail for devices that have a small max_sge.

This series addresses that problem. Would you consider this series
of bug fixes for v4.10-rc?


Available in the "nfs-rdma-for-4.10-rc" topic branch of this git repo:

git://git.linux-nfs.org/projects/cel/cel-2.6.git

And for online review:

http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-for-4.10-rc


Changes since v1:
- Rebased on v4.10-rc5
- Renamed some variables for clarity
- Clarified patch descriptions
- Added Tested-by and Reviewed-by tags

---

Chuck Lever (5):
xprtrdma: Fix Read chunk padding
xprtrdma: Per-connection pad optimization
xprtrdma: Disable pad optimization by default
xprtrdma: Reduce required number of send SGEs
xprtrdma: Shrink send SGEs array


net/sunrpc/xprtrdma/rpc_rdma.c | 62 +++++++++++++++++++++++++--------------
net/sunrpc/xprtrdma/transport.c | 2 +
net/sunrpc/xprtrdma/verbs.c | 15 ++++++---
net/sunrpc/xprtrdma/xprt_rdma.h | 14 ++++++---
4 files changed, 60 insertions(+), 33 deletions(-)

--
Chuck Lever


2017-01-23 20:52:48

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 1/5] xprtrdma: Fix Read chunk padding

When pad optimization is disabled, rpcrdma_convert_iovs still
does not add explicit XDR round-up padding to a Read chunk.

Commit 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling")
incorrectly short-circuited the test for whether round-up padding
is needed that appears later in rpcrdma_convert_iovs.

However, if this is indeed a Read chunk (and not a Position-Zero
Read chunk), the tail iovec _always_ contains the chunk's padding,
and never anything else.

So, it's easy to just skip the tail when padding optimization is
enabled, and add the tail in a subsequent Read chunk segment, if
disabled.

Fixes: 677eb17e94ed ("xprtrdma: Fix XDR tail buffer marshalling")
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index c52e0f2..a524d3c 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -226,8 +226,10 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
if (len && n == RPCRDMA_MAX_SEGS)
goto out_overflow;

- /* When encoding the read list, the tail is always sent inline */
- if (type == rpcrdma_readch)
+ /* When encoding a Read chunk, the tail iovec contains an
+ * XDR pad and may be omitted.
+ */
+ if (type == rpcrdma_readch && xprt_rdma_pad_optimize)
return n;

/* When encoding the Write list, some servers need to see an extra
@@ -238,10 +240,6 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
return n;

if (xdrbuf->tail[0].iov_len) {
- /* the rpcrdma protocol allows us to omit any trailing
- * xdr pad bytes, saving the server an RDMA operation. */
- if (xdrbuf->tail[0].iov_len < 4 && xprt_rdma_pad_optimize)
- return n;
n = rpcrdma_convert_kvec(&xdrbuf->tail[0], seg, n);
if (n == RPCRDMA_MAX_SEGS)
goto out_overflow;


2017-01-23 20:54:07

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 5/5] xprtrdma: Shrink send SGEs array

We no longer need to accommodate an xdr_buf whose pages start at an
offset and cross extra page boundaries. If there are more partial or
whole pages to send than there are available SGEs, the marshaling
logic is now smart enough to use a Read chunk instead of failing.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/xprt_rdma.h | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 3d7e9c9..852dd0a 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -305,16 +305,19 @@ struct rpcrdma_mr_seg { /* chunk descriptors */
char *mr_offset; /* kva if no page, else offset */
};

-/* Reserve enough Send SGEs to send a maximum size inline request:
+/* The Send SGE array is provisioned to send a maximum size
+ * inline request:
* - RPC-over-RDMA header
* - xdr_buf head iovec
- * - RPCRDMA_MAX_INLINE bytes, possibly unaligned, in pages
+ * - RPCRDMA_MAX_INLINE bytes, in pages
* - xdr_buf tail iovec
+ *
+ * The actual number of array elements consumed by each RPC
+ * depends on the device's max_sge limit.
*/
enum {
RPCRDMA_MIN_SEND_SGES = 3,
- RPCRDMA_MAX_SEND_PAGES = PAGE_SIZE + RPCRDMA_MAX_INLINE - 1,
- RPCRDMA_MAX_PAGE_SGES = (RPCRDMA_MAX_SEND_PAGES >> PAGE_SHIFT) + 1,
+ RPCRDMA_MAX_PAGE_SGES = RPCRDMA_MAX_INLINE >> PAGE_SHIFT,
RPCRDMA_MAX_SEND_SGES = 1 + 1 + RPCRDMA_MAX_PAGE_SGES + 1,
};



2017-01-23 20:54:07

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 4/5] xprtrdma: Reduce required number of send SGEs

The MAX_SEND_SGES check introduced in commit 655fec6987be
("xprtrdma: Use gathered Send for large inline messages") fails
for devices that have a small max_sge.

Instead of checking for a large fixed maximum number of SGEs,
check for a minimum small number. RPC-over-RDMA will switch to
using a Read chunk if an xdr_buf has more pages than can fit in
the device's max_sge limit. This is considerably better than
failing all together to mount the server.

This fix supports devices that have as few as three send SGEs
available.

Reported-By: Selvin Xavier <[email protected]>
Reported-By: Devesh Sharma <[email protected]>
Reported-by: Honggang Li <[email protected]>
Reported-by: Ram Amrani <[email protected]>
Fixes: 655fec6987be ("xprtrdma: Use gathered Send for large ...")
Tested-by: Honggang Li <[email protected]>
Tested-by: Ram Amrani <[email protected]>
Tested-by: Steve Wise <[email protected]>
Reviewed-by: Parav Pandit <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 26 +++++++++++++++++++++++---
net/sunrpc/xprtrdma/verbs.c | 13 +++++++------
net/sunrpc/xprtrdma/xprt_rdma.h | 2 ++
3 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index c634f0f..d889883 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -125,14 +125,34 @@ void rpcrdma_set_max_header_sizes(struct rpcrdma_xprt *r_xprt)
/* The client can send a request inline as long as the RPCRDMA header
* plus the RPC call fit under the transport's inline limit. If the
* combined call message size exceeds that limit, the client must use
- * the read chunk list for this operation.
+ * a Read chunk for this operation.
+ *
+ * A Read chunk is also required if sending the RPC call inline would
+ * exceed this device's max_sge limit.
*/
static bool rpcrdma_args_inline(struct rpcrdma_xprt *r_xprt,
struct rpc_rqst *rqst)
{
- struct rpcrdma_ia *ia = &r_xprt->rx_ia;
+ struct xdr_buf *xdr = &rqst->rq_snd_buf;
+ unsigned int count, remaining, offset;
+
+ if (xdr->len > r_xprt->rx_ia.ri_max_inline_write)
+ return false;

- return rqst->rq_snd_buf.len <= ia->ri_max_inline_write;
+ if (xdr->page_len) {
+ remaining = xdr->page_len;
+ offset = xdr->page_base & ~PAGE_MASK;
+ count = 0;
+ while (remaining) {
+ remaining -= min_t(unsigned int,
+ PAGE_SIZE - offset, remaining);
+ offset = 0;
+ if (++count > r_xprt->rx_ia.ri_max_send_sges)
+ return false;
+ }
+ }
+
+ return true;
}

/* The client can't know how large the actual reply will be. Thus it
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 23f4da4..61d16c3 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -488,18 +488,19 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
*/
int
rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
- struct rpcrdma_create_data_internal *cdata)
+ struct rpcrdma_create_data_internal *cdata)
{
struct rpcrdma_connect_private *pmsg = &ep->rep_cm_private;
+ unsigned int max_qp_wr, max_sge;
struct ib_cq *sendcq, *recvcq;
- unsigned int max_qp_wr;
int rc;

- if (ia->ri_device->attrs.max_sge < RPCRDMA_MAX_SEND_SGES) {
- dprintk("RPC: %s: insufficient sge's available\n",
- __func__);
+ max_sge = min(ia->ri_device->attrs.max_sge, RPCRDMA_MAX_SEND_SGES);
+ if (max_sge < RPCRDMA_MIN_SEND_SGES) {
+ pr_warn("rpcrdma: HCA provides only %d send SGEs\n", max_sge);
return -ENOMEM;
}
+ ia->ri_max_send_sges = max_sge - RPCRDMA_MIN_SEND_SGES;

if (ia->ri_device->attrs.max_qp_wr <= RPCRDMA_BACKWARD_WRS) {
dprintk("RPC: %s: insufficient wqe's available\n",
@@ -524,7 +525,7 @@ static void rpcrdma_destroy_id(struct rdma_cm_id *id)
ep->rep_attr.cap.max_recv_wr = cdata->max_requests;
ep->rep_attr.cap.max_recv_wr += RPCRDMA_BACKWARD_WRS;
ep->rep_attr.cap.max_recv_wr += 1; /* drain cqe */
- ep->rep_attr.cap.max_send_sge = RPCRDMA_MAX_SEND_SGES;
+ ep->rep_attr.cap.max_send_sge = max_sge;
ep->rep_attr.cap.max_recv_sge = 1;
ep->rep_attr.cap.max_inline_data = 0;
ep->rep_attr.sq_sig_type = IB_SIGNAL_REQ_WR;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index c137154..3d7e9c9 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -74,6 +74,7 @@ struct rpcrdma_ia {
unsigned int ri_max_frmr_depth;
unsigned int ri_max_inline_write;
unsigned int ri_max_inline_read;
+ unsigned int ri_max_send_sges;
bool ri_reminv_expected;
bool ri_implicit_roundup;
enum ib_mr_type ri_mrtype;
@@ -311,6 +312,7 @@ struct rpcrdma_mr_seg { /* chunk descriptors */
* - xdr_buf tail iovec
*/
enum {
+ RPCRDMA_MIN_SEND_SGES = 3,
RPCRDMA_MAX_SEND_PAGES = PAGE_SIZE + RPCRDMA_MAX_INLINE - 1,
RPCRDMA_MAX_PAGE_SGES = (RPCRDMA_MAX_SEND_PAGES >> PAGE_SHIFT) + 1,
RPCRDMA_MAX_SEND_SGES = 1 + 1 + RPCRDMA_MAX_PAGE_SGES + 1,


2017-01-23 21:00:27

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 2/5] xprtrdma: Per-connection pad optimization

Pad optimization is changed by echoing into
/proc/sys/sunrpc/rdma_pad_optimize. This is a global setting,
affecting all RPC-over-RDMA connections to all servers.

The marshaling code picks up that value and uses it for decisions
about how to construct each RPC-over-RDMA frame. Having it change
suddenly in mid-operation can result in unexpected failures. And
some servers a client mounts might need chunk round-up, while
others don't.

So instead, copy the pad_optimize setting into each connection's
rpcrdma_ia when the transport is created, and use the copy, which
can't change during the life of the connection, instead.

This also removes a hack: rpcrdma_convert_iovs was using
the remote-invalidation-expected flag to predict when it could leave
out Write chunk padding. This is because the Linux server handles
implicit XDR padding on Write chunks correctly, and only Linux
servers can set the connection's remote-invalidation-expected flag.

It's more sensible to use the pad optimization setting instead.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 28 ++++++++++++++--------------
net/sunrpc/xprtrdma/verbs.c | 1 +
net/sunrpc/xprtrdma/xprt_rdma.h | 1 +
3 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index a524d3c..c634f0f 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -186,9 +186,9 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
*/

static int
-rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
- enum rpcrdma_chunktype type, struct rpcrdma_mr_seg *seg,
- bool reminv_expected)
+rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct xdr_buf *xdrbuf,
+ unsigned int pos, enum rpcrdma_chunktype type,
+ struct rpcrdma_mr_seg *seg)
{
int len, n, p, page_base;
struct page **ppages;
@@ -229,14 +229,15 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
/* When encoding a Read chunk, the tail iovec contains an
* XDR pad and may be omitted.
*/
- if (type == rpcrdma_readch && xprt_rdma_pad_optimize)
+ if (type == rpcrdma_readch && r_xprt->rx_ia.ri_implicit_roundup)
return n;

- /* When encoding the Write list, some servers need to see an extra
- * segment for odd-length Write chunks. The upper layer provides
- * space in the tail iovec for this purpose.
+ /* When encoding a Write chunk, some servers need to see an
+ * extra segment for non-XDR-aligned Write chunks. The upper
+ * layer provides space in the tail iovec that may be used
+ * for this purpose.
*/
- if (type == rpcrdma_writech && reminv_expected)
+ if (type == rpcrdma_writech && r_xprt->rx_ia.ri_implicit_roundup)
return n;

if (xdrbuf->tail[0].iov_len) {
@@ -291,7 +292,8 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
if (rtype == rpcrdma_areadch)
pos = 0;
seg = req->rl_segments;
- nsegs = rpcrdma_convert_iovs(&rqst->rq_snd_buf, pos, rtype, seg, false);
+ nsegs = rpcrdma_convert_iovs(r_xprt, &rqst->rq_snd_buf, pos,
+ rtype, seg);
if (nsegs < 0)
return ERR_PTR(nsegs);

@@ -353,10 +355,9 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
}

seg = req->rl_segments;
- nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf,
+ nsegs = rpcrdma_convert_iovs(r_xprt, &rqst->rq_rcv_buf,
rqst->rq_rcv_buf.head[0].iov_len,
- wtype, seg,
- r_xprt->rx_ia.ri_reminv_expected);
+ wtype, seg);
if (nsegs < 0)
return ERR_PTR(nsegs);

@@ -421,8 +422,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
}

seg = req->rl_segments;
- nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf, 0, wtype, seg,
- r_xprt->rx_ia.ri_reminv_expected);
+ nsegs = rpcrdma_convert_iovs(r_xprt, &rqst->rq_rcv_buf, 0, wtype, seg);
if (nsegs < 0)
return ERR_PTR(nsegs);

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 11d0774..2a6a367 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -208,6 +208,7 @@

/* Default settings for RPC-over-RDMA Version One */
r_xprt->rx_ia.ri_reminv_expected = false;
+ r_xprt->rx_ia.ri_implicit_roundup = xprt_rdma_pad_optimize;
rsize = RPCRDMA_V1_DEF_INLINE_SIZE;
wsize = RPCRDMA_V1_DEF_INLINE_SIZE;

diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index e35efd4..c137154 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -75,6 +75,7 @@ struct rpcrdma_ia {
unsigned int ri_max_inline_write;
unsigned int ri_max_inline_read;
bool ri_reminv_expected;
+ bool ri_implicit_roundup;
enum ib_mr_type ri_mrtype;
struct ib_qp_attr ri_qp_attr;
struct ib_qp_init_attr ri_qp_init_attr;


2017-01-23 21:02:06

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v2 3/5] xprtrdma: Disable pad optimization by default

Commit d5440e27d3e5 ('xprtrdma: Enable pad optimization') made the
Linux client omit XDR round-up padding in normal Read and Write
chunks so that the client doesn't have to register and invalidate
3-byte memory regions that contain no real data.

Unfortunately, my cheery 2014 assessment that this optimization "is
supported now by both Linux and Solaris servers" was premature.
We've found bugs in Solaris in this area since commit d5440e27d3e5
was merged (SYMLINK is the main culprit).

So for maximum interoperability, I'm disabling this optimization
again. If a CM private message is exchanged when connecting, the
client recognizes that the server is Linux, and enables the
optimization for that connection.

Until now the Solaris server bugs did not impact common operations,
and were thus largely unnoticed. Soon, less capable devices on Linux
NFS/RDMA clients will make use of Read chunks more often, and these
Solaris bugs will prevent interoperation in more cases.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/xprtrdma/transport.c | 2 +-
net/sunrpc/xprtrdma/verbs.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 534c178..6990581 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -67,7 +67,7 @@
static unsigned int xprt_rdma_max_inline_write = RPCRDMA_DEF_INLINE;
static unsigned int xprt_rdma_inline_write_padding;
static unsigned int xprt_rdma_memreg_strategy = RPCRDMA_FRMR;
- int xprt_rdma_pad_optimize = 1;
+ int xprt_rdma_pad_optimize = 0;

#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 2a6a367..23f4da4 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -216,6 +216,7 @@
pmsg->cp_magic == rpcrdma_cmp_magic &&
pmsg->cp_version == RPCRDMA_CMP_VERSION) {
r_xprt->rx_ia.ri_reminv_expected = true;
+ r_xprt->rx_ia.ri_implicit_roundup = true;
rsize = rpcrdma_decode_buffer_size(pmsg->cp_send_size);
wsize = rpcrdma_decode_buffer_size(pmsg->cp_recv_size);
}


2017-01-24 19:13:03

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] xprtrdma: Per-connection pad optimization

Hi Chuck,

On 01/23/2017 03:52 PM, Chuck Lever wrote:
> Pad optimization is changed by echoing into
> /proc/sys/sunrpc/rdma_pad_optimize. This is a global setting,
> affecting all RPC-over-RDMA connections to all servers.
>
> The marshaling code picks up that value and uses it for decisions
> about how to construct each RPC-over-RDMA frame. Having it change
> suddenly in mid-operation can result in unexpected failures. And
> some servers a client mounts might need chunk round-up, while
> others don't.
>
> So instead, copy the pad_optimize setting into each connection's
> rpcrdma_ia when the transport is created, and use the copy, which
> can't change during the life of the connection, instead.

Does the rdma_pad_optimize option have documentation somewhere that needs to be updated to describe the new behavior?

Anna

>
> This also removes a hack: rpcrdma_convert_iovs was using
> the remote-invalidation-expected flag to predict when it could leave
> out Write chunk padding. This is because the Linux server handles
> implicit XDR padding on Write chunks correctly, and only Linux
> servers can set the connection's remote-invalidation-expected flag.
>
> It's more sensible to use the pad optimization setting instead.
>
> Signed-off-by: Chuck Lever <[email protected]>
> ---
> net/sunrpc/xprtrdma/rpc_rdma.c | 28 ++++++++++++++--------------
> net/sunrpc/xprtrdma/verbs.c | 1 +
> net/sunrpc/xprtrdma/xprt_rdma.h | 1 +
> 3 files changed, 16 insertions(+), 14 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index a524d3c..c634f0f 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -186,9 +186,9 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
> */
>
> static int
> -rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
> - enum rpcrdma_chunktype type, struct rpcrdma_mr_seg *seg,
> - bool reminv_expected)
> +rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct xdr_buf *xdrbuf,
> + unsigned int pos, enum rpcrdma_chunktype type,
> + struct rpcrdma_mr_seg *seg)
> {
> int len, n, p, page_base;
> struct page **ppages;
> @@ -229,14 +229,15 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
> /* When encoding a Read chunk, the tail iovec contains an
> * XDR pad and may be omitted.
> */
> - if (type == rpcrdma_readch && xprt_rdma_pad_optimize)
> + if (type == rpcrdma_readch && r_xprt->rx_ia.ri_implicit_roundup)
> return n;
>
> - /* When encoding the Write list, some servers need to see an extra
> - * segment for odd-length Write chunks. The upper layer provides
> - * space in the tail iovec for this purpose.
> + /* When encoding a Write chunk, some servers need to see an
> + * extra segment for non-XDR-aligned Write chunks. The upper
> + * layer provides space in the tail iovec that may be used
> + * for this purpose.
> */
> - if (type == rpcrdma_writech && reminv_expected)
> + if (type == rpcrdma_writech && r_xprt->rx_ia.ri_implicit_roundup)
> return n;
>
> if (xdrbuf->tail[0].iov_len) {
> @@ -291,7 +292,8 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
> if (rtype == rpcrdma_areadch)
> pos = 0;
> seg = req->rl_segments;
> - nsegs = rpcrdma_convert_iovs(&rqst->rq_snd_buf, pos, rtype, seg, false);
> + nsegs = rpcrdma_convert_iovs(r_xprt, &rqst->rq_snd_buf, pos,
> + rtype, seg);
> if (nsegs < 0)
> return ERR_PTR(nsegs);
>
> @@ -353,10 +355,9 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
> }
>
> seg = req->rl_segments;
> - nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf,
> + nsegs = rpcrdma_convert_iovs(r_xprt, &rqst->rq_rcv_buf,
> rqst->rq_rcv_buf.head[0].iov_len,
> - wtype, seg,
> - r_xprt->rx_ia.ri_reminv_expected);
> + wtype, seg);
> if (nsegs < 0)
> return ERR_PTR(nsegs);
>
> @@ -421,8 +422,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
> }
>
> seg = req->rl_segments;
> - nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf, 0, wtype, seg,
> - r_xprt->rx_ia.ri_reminv_expected);
> + nsegs = rpcrdma_convert_iovs(r_xprt, &rqst->rq_rcv_buf, 0, wtype, seg);
> if (nsegs < 0)
> return ERR_PTR(nsegs);
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 11d0774..2a6a367 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -208,6 +208,7 @@
>
> /* Default settings for RPC-over-RDMA Version One */
> r_xprt->rx_ia.ri_reminv_expected = false;
> + r_xprt->rx_ia.ri_implicit_roundup = xprt_rdma_pad_optimize;
> rsize = RPCRDMA_V1_DEF_INLINE_SIZE;
> wsize = RPCRDMA_V1_DEF_INLINE_SIZE;
>
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index e35efd4..c137154 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -75,6 +75,7 @@ struct rpcrdma_ia {
> unsigned int ri_max_inline_write;
> unsigned int ri_max_inline_read;
> bool ri_reminv_expected;
> + bool ri_implicit_roundup;
> enum ib_mr_type ri_mrtype;
> struct ib_qp_attr ri_qp_attr;
> struct ib_qp_init_attr ri_qp_init_attr;
>

2017-01-24 19:16:19

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] xprtrdma: Per-connection pad optimization


> On Jan 24, 2017, at 2:12 PM, Anna Schumaker <[email protected]> wrote:
>
> Hi Chuck,
>
> On 01/23/2017 03:52 PM, Chuck Lever wrote:
>> Pad optimization is changed by echoing into
>> /proc/sys/sunrpc/rdma_pad_optimize. This is a global setting,
>> affecting all RPC-over-RDMA connections to all servers.
>>
>> The marshaling code picks up that value and uses it for decisions
>> about how to construct each RPC-over-RDMA frame. Having it change
>> suddenly in mid-operation can result in unexpected failures. And
>> some servers a client mounts might need chunk round-up, while
>> others don't.
>>
>> So instead, copy the pad_optimize setting into each connection's
>> rpcrdma_ia when the transport is created, and use the copy, which
>> can't change during the life of the connection, instead.
>
> Does the rdma_pad_optimize option have documentation somewhere that needs to be updated to describe the new behavior?

I'm not aware of any. No documentation was changed by d5440e27d3e5.


> Anna
>
>>
>> This also removes a hack: rpcrdma_convert_iovs was using
>> the remote-invalidation-expected flag to predict when it could leave
>> out Write chunk padding. This is because the Linux server handles
>> implicit XDR padding on Write chunks correctly, and only Linux
>> servers can set the connection's remote-invalidation-expected flag.
>>
>> It's more sensible to use the pad optimization setting instead.
>>
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>> net/sunrpc/xprtrdma/rpc_rdma.c | 28 ++++++++++++++--------------
>> net/sunrpc/xprtrdma/verbs.c | 1 +
>> net/sunrpc/xprtrdma/xprt_rdma.h | 1 +
>> 3 files changed, 16 insertions(+), 14 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
>> index a524d3c..c634f0f 100644
>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>> @@ -186,9 +186,9 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
>> */
>>
>> static int
>> -rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
>> - enum rpcrdma_chunktype type, struct rpcrdma_mr_seg *seg,
>> - bool reminv_expected)
>> +rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct xdr_buf *xdrbuf,
>> + unsigned int pos, enum rpcrdma_chunktype type,
>> + struct rpcrdma_mr_seg *seg)
>> {
>> int len, n, p, page_base;
>> struct page **ppages;
>> @@ -229,14 +229,15 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
>> /* When encoding a Read chunk, the tail iovec contains an
>> * XDR pad and may be omitted.
>> */
>> - if (type == rpcrdma_readch && xprt_rdma_pad_optimize)
>> + if (type == rpcrdma_readch && r_xprt->rx_ia.ri_implicit_roundup)
>> return n;
>>
>> - /* When encoding the Write list, some servers need to see an extra
>> - * segment for odd-length Write chunks. The upper layer provides
>> - * space in the tail iovec for this purpose.
>> + /* When encoding a Write chunk, some servers need to see an
>> + * extra segment for non-XDR-aligned Write chunks. The upper
>> + * layer provides space in the tail iovec that may be used
>> + * for this purpose.
>> */
>> - if (type == rpcrdma_writech && reminv_expected)
>> + if (type == rpcrdma_writech && r_xprt->rx_ia.ri_implicit_roundup)
>> return n;
>>
>> if (xdrbuf->tail[0].iov_len) {
>> @@ -291,7 +292,8 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
>> if (rtype == rpcrdma_areadch)
>> pos = 0;
>> seg = req->rl_segments;
>> - nsegs = rpcrdma_convert_iovs(&rqst->rq_snd_buf, pos, rtype, seg, false);
>> + nsegs = rpcrdma_convert_iovs(r_xprt, &rqst->rq_snd_buf, pos,
>> + rtype, seg);
>> if (nsegs < 0)
>> return ERR_PTR(nsegs);
>>
>> @@ -353,10 +355,9 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
>> }
>>
>> seg = req->rl_segments;
>> - nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf,
>> + nsegs = rpcrdma_convert_iovs(r_xprt, &rqst->rq_rcv_buf,
>> rqst->rq_rcv_buf.head[0].iov_len,
>> - wtype, seg,
>> - r_xprt->rx_ia.ri_reminv_expected);
>> + wtype, seg);
>> if (nsegs < 0)
>> return ERR_PTR(nsegs);
>>
>> @@ -421,8 +422,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
>> }
>>
>> seg = req->rl_segments;
>> - nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf, 0, wtype, seg,
>> - r_xprt->rx_ia.ri_reminv_expected);
>> + nsegs = rpcrdma_convert_iovs(r_xprt, &rqst->rq_rcv_buf, 0, wtype, seg);
>> if (nsegs < 0)
>> return ERR_PTR(nsegs);
>>
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index 11d0774..2a6a367 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -208,6 +208,7 @@
>>
>> /* Default settings for RPC-over-RDMA Version One */
>> r_xprt->rx_ia.ri_reminv_expected = false;
>> + r_xprt->rx_ia.ri_implicit_roundup = xprt_rdma_pad_optimize;
>> rsize = RPCRDMA_V1_DEF_INLINE_SIZE;
>> wsize = RPCRDMA_V1_DEF_INLINE_SIZE;
>>
>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>> index e35efd4..c137154 100644
>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>> @@ -75,6 +75,7 @@ struct rpcrdma_ia {
>> unsigned int ri_max_inline_write;
>> unsigned int ri_max_inline_read;
>> bool ri_reminv_expected;
>> + bool ri_implicit_roundup;
>> enum ib_mr_type ri_mrtype;
>> struct ib_qp_attr ri_qp_attr;
>> struct ib_qp_init_attr ri_qp_init_attr;
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever




2017-01-24 19:23:19

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] xprtrdma: Per-connection pad optimization



On 01/24/2017 02:16 PM, Chuck Lever wrote:
>
>> On Jan 24, 2017, at 2:12 PM, Anna Schumaker <[email protected]> wrote:
>>
>> Hi Chuck,
>>
>> On 01/23/2017 03:52 PM, Chuck Lever wrote:
>>> Pad optimization is changed by echoing into
>>> /proc/sys/sunrpc/rdma_pad_optimize. This is a global setting,
>>> affecting all RPC-over-RDMA connections to all servers.
>>>
>>> The marshaling code picks up that value and uses it for decisions
>>> about how to construct each RPC-over-RDMA frame. Having it change
>>> suddenly in mid-operation can result in unexpected failures. And
>>> some servers a client mounts might need chunk round-up, while
>>> others don't.
>>>
>>> So instead, copy the pad_optimize setting into each connection's
>>> rpcrdma_ia when the transport is created, and use the copy, which
>>> can't change during the life of the connection, instead.
>>
>> Does the rdma_pad_optimize option have documentation somewhere that needs to be updated to describe the new behavior?
>
> I'm not aware of any. No documentation was changed by d5440e27d3e5.

Okay, I just wanted to make sure everything was covered :)

Thanks,
Anna

>
>
>> Anna
>>
>>>
>>> This also removes a hack: rpcrdma_convert_iovs was using
>>> the remote-invalidation-expected flag to predict when it could leave
>>> out Write chunk padding. This is because the Linux server handles
>>> implicit XDR padding on Write chunks correctly, and only Linux
>>> servers can set the connection's remote-invalidation-expected flag.
>>>
>>> It's more sensible to use the pad optimization setting instead.
>>>
>>> Signed-off-by: Chuck Lever <[email protected]>
>>> ---
>>> net/sunrpc/xprtrdma/rpc_rdma.c | 28 ++++++++++++++--------------
>>> net/sunrpc/xprtrdma/verbs.c | 1 +
>>> net/sunrpc/xprtrdma/xprt_rdma.h | 1 +
>>> 3 files changed, 16 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
>>> index a524d3c..c634f0f 100644
>>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>>> @@ -186,9 +186,9 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
>>> */
>>>
>>> static int
>>> -rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
>>> - enum rpcrdma_chunktype type, struct rpcrdma_mr_seg *seg,
>>> - bool reminv_expected)
>>> +rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct xdr_buf *xdrbuf,
>>> + unsigned int pos, enum rpcrdma_chunktype type,
>>> + struct rpcrdma_mr_seg *seg)
>>> {
>>> int len, n, p, page_base;
>>> struct page **ppages;
>>> @@ -229,14 +229,15 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
>>> /* When encoding a Read chunk, the tail iovec contains an
>>> * XDR pad and may be omitted.
>>> */
>>> - if (type == rpcrdma_readch && xprt_rdma_pad_optimize)
>>> + if (type == rpcrdma_readch && r_xprt->rx_ia.ri_implicit_roundup)
>>> return n;
>>>
>>> - /* When encoding the Write list, some servers need to see an extra
>>> - * segment for odd-length Write chunks. The upper layer provides
>>> - * space in the tail iovec for this purpose.
>>> + /* When encoding a Write chunk, some servers need to see an
>>> + * extra segment for non-XDR-aligned Write chunks. The upper
>>> + * layer provides space in the tail iovec that may be used
>>> + * for this purpose.
>>> */
>>> - if (type == rpcrdma_writech && reminv_expected)
>>> + if (type == rpcrdma_writech && r_xprt->rx_ia.ri_implicit_roundup)
>>> return n;
>>>
>>> if (xdrbuf->tail[0].iov_len) {
>>> @@ -291,7 +292,8 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
>>> if (rtype == rpcrdma_areadch)
>>> pos = 0;
>>> seg = req->rl_segments;
>>> - nsegs = rpcrdma_convert_iovs(&rqst->rq_snd_buf, pos, rtype, seg, false);
>>> + nsegs = rpcrdma_convert_iovs(r_xprt, &rqst->rq_snd_buf, pos,
>>> + rtype, seg);
>>> if (nsegs < 0)
>>> return ERR_PTR(nsegs);
>>>
>>> @@ -353,10 +355,9 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
>>> }
>>>
>>> seg = req->rl_segments;
>>> - nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf,
>>> + nsegs = rpcrdma_convert_iovs(r_xprt, &rqst->rq_rcv_buf,
>>> rqst->rq_rcv_buf.head[0].iov_len,
>>> - wtype, seg,
>>> - r_xprt->rx_ia.ri_reminv_expected);
>>> + wtype, seg);
>>> if (nsegs < 0)
>>> return ERR_PTR(nsegs);
>>>
>>> @@ -421,8 +422,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
>>> }
>>>
>>> seg = req->rl_segments;
>>> - nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf, 0, wtype, seg,
>>> - r_xprt->rx_ia.ri_reminv_expected);
>>> + nsegs = rpcrdma_convert_iovs(r_xprt, &rqst->rq_rcv_buf, 0, wtype, seg);
>>> if (nsegs < 0)
>>> return ERR_PTR(nsegs);
>>>
>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>> index 11d0774..2a6a367 100644
>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>> @@ -208,6 +208,7 @@
>>>
>>> /* Default settings for RPC-over-RDMA Version One */
>>> r_xprt->rx_ia.ri_reminv_expected = false;
>>> + r_xprt->rx_ia.ri_implicit_roundup = xprt_rdma_pad_optimize;
>>> rsize = RPCRDMA_V1_DEF_INLINE_SIZE;
>>> wsize = RPCRDMA_V1_DEF_INLINE_SIZE;
>>>
>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>>> index e35efd4..c137154 100644
>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>> @@ -75,6 +75,7 @@ struct rpcrdma_ia {
>>> unsigned int ri_max_inline_write;
>>> unsigned int ri_max_inline_read;
>>> bool ri_reminv_expected;
>>> + bool ri_implicit_roundup;
>>> enum ib_mr_type ri_mrtype;
>>> struct ib_qp_attr ri_qp_attr;
>>> struct ib_qp_init_attr ri_qp_init_attr;
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> Chuck Lever
>
>
>

2017-01-24 21:36:25

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] Fix "support large inline thresholds"

Hi Chuck,

On 01/23/2017 03:52 PM, Chuck Lever wrote:
> Hi Anna-
>
> I've received a number of reports that v4.9 commit 655fec6987be
> ("xprtrdma: Use gathered Send for large inline messages") causes
> NFS/RDMA mounts to fail for devices that have a small max_sge.
>
> This series addresses that problem. Would you consider this series
> of bug fixes for v4.10-rc?

I noticed that patches one and four are the only ones with a "Fixes:" tag. Do you need all five patches to fix the problem, or can the rest be deferred to 4.11?

Thanks,
Anna

>
>
> Available in the "nfs-rdma-for-4.10-rc" topic branch of this git repo:
>
> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>
> And for online review:
>
> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-for-4.10-rc
>
>
> Changes since v1:
> - Rebased on v4.10-rc5
> - Renamed some variables for clarity
> - Clarified patch descriptions
> - Added Tested-by and Reviewed-by tags
>
> ---
>
> Chuck Lever (5):
> xprtrdma: Fix Read chunk padding
> xprtrdma: Per-connection pad optimization
> xprtrdma: Disable pad optimization by default
> xprtrdma: Reduce required number of send SGEs
> xprtrdma: Shrink send SGEs array
>
>
> net/sunrpc/xprtrdma/rpc_rdma.c | 62 +++++++++++++++++++++++++--------------
> net/sunrpc/xprtrdma/transport.c | 2 +
> net/sunrpc/xprtrdma/verbs.c | 15 ++++++---
> net/sunrpc/xprtrdma/xprt_rdma.h | 14 ++++++---
> 4 files changed, 60 insertions(+), 33 deletions(-)
>
> --
> Chuck Lever
>

2017-01-24 21:46:08

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] Fix "support large inline thresholds"


> On Jan 24, 2017, at 4:35 PM, Anna Schumaker <[email protected]> wrote:
>
> Hi Chuck,
>
> On 01/23/2017 03:52 PM, Chuck Lever wrote:
>> Hi Anna-
>>
>> I've received a number of reports that v4.9 commit 655fec6987be
>> ("xprtrdma: Use gathered Send for large inline messages") causes
>> NFS/RDMA mounts to fail for devices that have a small max_sge.
>>
>> This series addresses that problem. Would you consider this series
>> of bug fixes for v4.10-rc?
>
> I noticed that patches one and four are the only ones with a "Fixes:" tag. Do you need all five patches to fix the problem, or can the rest be deferred to 4.11?

5/5 can be deferred if you want. The others are all required to
fix the problem. However, these were tested as a series. I'm
reticent to split them at this point.

Should 1 - 4 have a "Fixes: 655fec6987be ("xprtrdma: Use
gathered Send for large ...")" ?


> Thanks,
> Anna
>
>>
>>
>> Available in the "nfs-rdma-for-4.10-rc" topic branch of this git repo:
>>
>> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>>
>> And for online review:
>>
>> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-for-4.10-rc
>>
>>
>> Changes since v1:
>> - Rebased on v4.10-rc5
>> - Renamed some variables for clarity
>> - Clarified patch descriptions
>> - Added Tested-by and Reviewed-by tags
>>
>> ---
>>
>> Chuck Lever (5):
>> xprtrdma: Fix Read chunk padding
>> xprtrdma: Per-connection pad optimization
>> xprtrdma: Disable pad optimization by default
>> xprtrdma: Reduce required number of send SGEs
>> xprtrdma: Shrink send SGEs array
>>
>>
>> net/sunrpc/xprtrdma/rpc_rdma.c | 62 +++++++++++++++++++++++++--------------
>> net/sunrpc/xprtrdma/transport.c | 2 +
>> net/sunrpc/xprtrdma/verbs.c | 15 ++++++---
>> net/sunrpc/xprtrdma/xprt_rdma.h | 14 ++++++---
>> 4 files changed, 60 insertions(+), 33 deletions(-)
>>
>> --
>> Chuck Lever
>>

--
Chuck Lever




2017-02-01 18:16:00

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] Fix "support large inline thresholds"


> On Jan 24, 2017, at 4:46 PM, Chuck Lever <[email protected]> wrote:
>
>>
>> On Jan 24, 2017, at 4:35 PM, Anna Schumaker <[email protected]> wrote:
>>
>> Hi Chuck,
>>
>> On 01/23/2017 03:52 PM, Chuck Lever wrote:
>>> Hi Anna-
>>>
>>> I've received a number of reports that v4.9 commit 655fec6987be
>>> ("xprtrdma: Use gathered Send for large inline messages") causes
>>> NFS/RDMA mounts to fail for devices that have a small max_sge.
>>>
>>> This series addresses that problem. Would you consider this series
>>> of bug fixes for v4.10-rc?
>>
>> I noticed that patches one and four are the only ones with a "Fixes:" tag. Do you need all five patches to fix the problem, or can the rest be deferred to 4.11?
>
> 5/5 can be deferred if you want. The others are all required to
> fix the problem. However, these were tested as a series. I'm
> reticent to split them at this point.
>
> Should 1 - 4 have a "Fixes: 655fec6987be ("xprtrdma: Use
> gathered Send for large ...")" ?

Hi Anna, is there anything more you need from me? As above,
I was hoping these could be included in v4.10-rc, which is
likely to be closed in just a week or two.


>
>> Thanks,
>> Anna
>>
>>>
>>>
>>> Available in the "nfs-rdma-for-4.10-rc" topic branch of this git repo:
>>>
>>> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>>>
>>> And for online review:
>>>
>>> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-for-4.10-rc
>>>
>>>
>>> Changes since v1:
>>> - Rebased on v4.10-rc5
>>> - Renamed some variables for clarity
>>> - Clarified patch descriptions
>>> - Added Tested-by and Reviewed-by tags
>>>
>>> ---
>>>
>>> Chuck Lever (5):
>>> xprtrdma: Fix Read chunk padding
>>> xprtrdma: Per-connection pad optimization
>>> xprtrdma: Disable pad optimization by default
>>> xprtrdma: Reduce required number of send SGEs
>>> xprtrdma: Shrink send SGEs array
>>>
>>>
>>> net/sunrpc/xprtrdma/rpc_rdma.c | 62 +++++++++++++++++++++++++--------------
>>> net/sunrpc/xprtrdma/transport.c | 2 +
>>> net/sunrpc/xprtrdma/verbs.c | 15 ++++++---
>>> net/sunrpc/xprtrdma/xprt_rdma.h | 14 ++++++---
>>> 4 files changed, 60 insertions(+), 33 deletions(-)
>>>
>>> --
>>> Chuck Lever
>>>
>
> --
> Chuck Lever
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever




2017-02-08 16:48:03

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] Fix "support large inline thresholds"


> On Feb 1, 2017, at 1:15 PM, Chuck Lever <[email protected]> wrote:
>
>>
>> On Jan 24, 2017, at 4:46 PM, Chuck Lever <[email protected]> wrote:
>>
>>>
>>> On Jan 24, 2017, at 4:35 PM, Anna Schumaker <[email protected]> wrote:
>>>
>>> Hi Chuck,
>>>
>>> On 01/23/2017 03:52 PM, Chuck Lever wrote:
>>>> Hi Anna-
>>>>
>>>> I've received a number of reports that v4.9 commit 655fec6987be
>>>> ("xprtrdma: Use gathered Send for large inline messages") causes
>>>> NFS/RDMA mounts to fail for devices that have a small max_sge.
>>>>
>>>> This series addresses that problem. Would you consider this series
>>>> of bug fixes for v4.10-rc?
>>>
>>> I noticed that patches one and four are the only ones with a "Fixes:" tag. Do you need all five patches to fix the problem, or can the rest be deferred to 4.11?
>>
>> 5/5 can be deferred if you want. The others are all required to
>> fix the problem. However, these were tested as a series. I'm
>> reticent to split them at this point.
>>
>> Should 1 - 4 have a "Fixes: 655fec6987be ("xprtrdma: Use
>> gathered Send for large ...")" ?
>
> Hi Anna, is there anything more you need from me? As above,
> I was hoping these could be included in v4.10-rc, which is
> likely to be closed in just a week or two.

Hi Anna-

Since I haven't heard from you regarding this series, and v4.10
final is imminent, I will assume these fixes should be folded
into nfs-rdma-for-4.11. v3 of nfs-rdma-for-4.11 will include
these patches.


>>> Thanks,
>>> Anna
>>>
>>>>
>>>>
>>>> Available in the "nfs-rdma-for-4.10-rc" topic branch of this git repo:
>>>>
>>>> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>>>>
>>>> And for online review:
>>>>
>>>> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-for-4.10-rc
>>>>
>>>>
>>>> Changes since v1:
>>>> - Rebased on v4.10-rc5
>>>> - Renamed some variables for clarity
>>>> - Clarified patch descriptions
>>>> - Added Tested-by and Reviewed-by tags
>>>>
>>>> ---
>>>>
>>>> Chuck Lever (5):
>>>> xprtrdma: Fix Read chunk padding
>>>> xprtrdma: Per-connection pad optimization
>>>> xprtrdma: Disable pad optimization by default
>>>> xprtrdma: Reduce required number of send SGEs
>>>> xprtrdma: Shrink send SGEs array
>>>>
>>>>
>>>> net/sunrpc/xprtrdma/rpc_rdma.c | 62 +++++++++++++++++++++++++--------------
>>>> net/sunrpc/xprtrdma/transport.c | 2 +
>>>> net/sunrpc/xprtrdma/verbs.c | 15 ++++++---
>>>> net/sunrpc/xprtrdma/xprt_rdma.h | 14 ++++++---
>>>> 4 files changed, 60 insertions(+), 33 deletions(-)
>>>>
>>>> --
>>>> Chuck Lever
>>>>
>>
>> --
>> Chuck Lever
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> Chuck Lever
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever




2017-02-08 18:28:49

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH v2 0/5] Fix "support large inline thresholds"



On 02/08/2017 11:47 AM, Chuck Lever wrote:
>
>> On Feb 1, 2017, at 1:15 PM, Chuck Lever <[email protected]> wrote:
>>
>>>
>>> On Jan 24, 2017, at 4:46 PM, Chuck Lever <[email protected]> wrote:
>>>
>>>>
>>>> On Jan 24, 2017, at 4:35 PM, Anna Schumaker <[email protected]> wrote:
>>>>
>>>> Hi Chuck,
>>>>
>>>> On 01/23/2017 03:52 PM, Chuck Lever wrote:
>>>>> Hi Anna-
>>>>>
>>>>> I've received a number of reports that v4.9 commit 655fec6987be
>>>>> ("xprtrdma: Use gathered Send for large inline messages") causes
>>>>> NFS/RDMA mounts to fail for devices that have a small max_sge.
>>>>>
>>>>> This series addresses that problem. Would you consider this series
>>>>> of bug fixes for v4.10-rc?
>>>>
>>>> I noticed that patches one and four are the only ones with a "Fixes:" tag. Do you need all five patches to fix the problem, or can the rest be deferred to 4.11?
>>>
>>> 5/5 can be deferred if you want. The others are all required to
>>> fix the problem. However, these were tested as a series. I'm
>>> reticent to split them at this point.
>>>
>>> Should 1 - 4 have a "Fixes: 655fec6987be ("xprtrdma: Use
>>> gathered Send for large ...")" ?
>>
>> Hi Anna, is there anything more you need from me? As above,
>> I was hoping these could be included in v4.10-rc, which is
>> likely to be closed in just a week or two.
>
> Hi Anna-
>
> Since I haven't heard from you regarding this series, and v4.10
> final is imminent, I will assume these fixes should be folded
> into nfs-rdma-for-4.11. v3 of nfs-rdma-for-4.11 will include
> these patches.

Sorry that I dropped the ball on these :( I was considering sending them for v4.11 but with a "Cc stable" so that they get backported to v4.9 and v4.10 kernels. Rolling them into v3 sounds good to me.

Thanks,
Anna

>
>
>>>> Thanks,
>>>> Anna
>>>>
>>>>>
>>>>>
>>>>> Available in the "nfs-rdma-for-4.10-rc" topic branch of this git repo:
>>>>>
>>>>> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>>>>>
>>>>> And for online review:
>>>>>
>>>>> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-for-4.10-rc
>>>>>
>>>>>
>>>>> Changes since v1:
>>>>> - Rebased on v4.10-rc5
>>>>> - Renamed some variables for clarity
>>>>> - Clarified patch descriptions
>>>>> - Added Tested-by and Reviewed-by tags
>>>>>
>>>>> ---
>>>>>
>>>>> Chuck Lever (5):
>>>>> xprtrdma: Fix Read chunk padding
>>>>> xprtrdma: Per-connection pad optimization
>>>>> xprtrdma: Disable pad optimization by default
>>>>> xprtrdma: Reduce required number of send SGEs
>>>>> xprtrdma: Shrink send SGEs array
>>>>>
>>>>>
>>>>> net/sunrpc/xprtrdma/rpc_rdma.c | 62 +++++++++++++++++++++++++--------------
>>>>> net/sunrpc/xprtrdma/transport.c | 2 +
>>>>> net/sunrpc/xprtrdma/verbs.c | 15 ++++++---
>>>>> net/sunrpc/xprtrdma/xprt_rdma.h | 14 ++++++---
>>>>> 4 files changed, 60 insertions(+), 33 deletions(-)
>>>>>
>>>>> --
>>>>> Chuck Lever
>>>>>
>>>
>>> --
>>> Chuck Lever
>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>> --
>> Chuck Lever
>>
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> Chuck Lever
>
>
>