2015-07-09 20:45:02

by Chuck Lever III

Subject: [PATCH v1 0/5] NFS/RDMA server side for Linux 4.3

Some bug fixes and clean up patches, and one change to increase
the maximum NFS r/wsize to one megabyte.

These can be pulled from the "nfsd-rdma-for-4.3" branch in this repo:

git://git.linux-nfs.org/projects/cel/cel-2.6.git

Or can be browsed here:

http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfsd-rdma-for-4.3

---

Chuck Lever (4):
svcrdma: Boost NFS READ/WRITE payload size maximum
svcrdma: Remove svc_rdma_fastreg()
svcrdma: Clean up svc_rdma_get_reply_array()
svcrdma: Fix send_reply() scatter/gather set-up

Shirley Ma (1):
NFS/RDMA Release resources in svcrdma when device is removed


include/linux/sunrpc/svc_rdma.h | 84 +-----------------------------
net/sunrpc/xprtrdma/svc_rdma_sendto.c | 83 +++++++++++++++++++++++++++++-
net/sunrpc/xprtrdma/svc_rdma_transport.c | 35 -------------
3 files changed, 86 insertions(+), 116 deletions(-)

--
Chuck Lever


2015-07-09 20:45:21

by Chuck Lever III

Subject: [PATCH v1 2/5] svcrdma: Fix send_reply() scatter/gather set-up

The Linux NFS server returns garbage in the data payload of inline
NFS/RDMA READ replies. These are READs of under 1000 bytes or so
where the client has not provided either a reply chunk or a write
list.

The NFS server delivers the data payload for an NFS READ reply to
the transport in an xdr_buf page list. If the NFS client did not
provide a reply chunk or a write list, send_reply() is supposed to
set up a separate sge for the page containing the READ data, and
another sge for XDR padding if needed, then post all of the sges via
a single SEND Work Request.

The problem is send_reply() does not advance through the xdr_buf
when setting up scatter/gather entries for the SEND WR. It always calls
dma_map_xdr() with xdr_off set to zero. When there's more than one
sge, dma_map_xdr() sets up the SEND sge's so they all point to the
xdr_buf's head.
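
To illustrate the offset problem described above, here is a simplified
user-space sketch. It is not the kernel code: struct demo_sge and
demo_map_segments() are hypothetical names, and a plain byte offset
stands in for the DMA address that dma_map_xdr() would return. The point
is only that the running offset has to live outside the loop and advance
by each entry's length, otherwise every entry points at offset zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one scatter/gather entry: an offset into a
 * single linear reply buffer plus a length. Not the kernel's struct ib_sge.
 */
struct demo_sge {
	size_t off;
	size_t len;
};

/* Fill 'sge' from the segment lengths, covering at most byte_count bytes.
 * The running offset is declared outside the loop, so each entry starts
 * where the previous one ended. Returns the number of entries used.
 */
static int demo_map_segments(const size_t *seg_len, int nsegs,
			     size_t byte_count, struct demo_sge *sge)
{
	size_t xdr_off = 0;
	int i;

	assert(seg_len != NULL && sge != NULL);
	for (i = 0; byte_count && i < nsegs; i++) {
		size_t n = seg_len[i] < byte_count ? seg_len[i] : byte_count;

		sge[i].off = xdr_off;
		sge[i].len = n;
		xdr_off += n;		/* advance through the buffer */
		byte_count -= n;
	}
	return i;
}
```

If xdr_off were instead declared and zeroed inside the loop body, every
entry after the first would also report offset zero, which corresponds to
the failure mode described above.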

The current Linux NFS/RDMA client always provides a reply chunk or
a write list when performing an NFS READ over RDMA. Therefore, it
does not exercise this particular case. The Linux server has never
had to use more than one extra sge for building RPC/RDMA replies
with a Linux client.

However, an NFS/RDMA client _is_ allowed to send small NFS READs
without setting up a write list or reply chunk. The NFS READ reply
fits entirely within the inline reply buffer in this case. This is
perhaps a more efficient way of performing NFS READs that the Linux
NFS/RDMA client may some day adopt.

Fixes: b432e6b3d9c1 ('svcrdma: Change DMA mapping logic to . . .')
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=285
Signed-off-by: Chuck Lever <[email protected]>
---

net/sunrpc/xprtrdma/svc_rdma_sendto.c | 10 +++++++++-
1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index d25cd43..95412ab 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -384,6 +384,7 @@ static int send_reply(struct svcxprt_rdma *rdma,
int byte_count)
{
struct ib_send_wr send_wr;
+ u32 xdr_off;
int sge_no;
int sge_bytes;
int page_no;
@@ -418,8 +419,8 @@ static int send_reply(struct svcxprt_rdma *rdma,
ctxt->direction = DMA_TO_DEVICE;

/* Map the payload indicated by 'byte_count' */
+ xdr_off = 0;
for (sge_no = 1; byte_count && sge_no < vec->count; sge_no++) {
- int xdr_off = 0;
sge_bytes = min_t(size_t, vec->sge[sge_no].iov_len, byte_count);
byte_count -= sge_bytes;
ctxt->sge[sge_no].addr =
@@ -457,6 +458,13 @@ static int send_reply(struct svcxprt_rdma *rdma,
}
rqstp->rq_next_page = rqstp->rq_respages + 1;

+ /* The loop above bumps sc_dma_used for each sge. The
+ * xdr_buf.tail gets a separate sge, but resides in the
+ * same page as xdr_buf.head. Don't count it twice.
+ */
+ if (sge_no > ctxt->count)
+ atomic_dec(&rdma->sc_dma_used);
+
if (sge_no > rdma->sc_max_sge) {
pr_err("svcrdma: Too many sges (%d)\n", sge_no);
goto err;


2015-07-09 20:45:40

by Chuck Lever III

Subject: [PATCH v1 4/5] svcrdma: Remove svc_rdma_fastreg()

Commit 0bf4828983df ("svcrdma: refactor marshalling logic") removed
the last call site for svc_rdma_fastreg().

Signed-off-by: Chuck Lever <[email protected]>
---

include/linux/sunrpc/svc_rdma.h | 1 -
net/sunrpc/xprtrdma/svc_rdma_transport.c | 34 ------------------------------
2 files changed, 0 insertions(+), 35 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index ca4d86a..13af61b 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -227,7 +227,6 @@ extern void svc_rdma_put_context(struct svc_rdma_op_ctxt *, int);
extern void svc_rdma_unmap_dma(struct svc_rdma_op_ctxt *ctxt);
extern struct svc_rdma_req_map *svc_rdma_get_req_map(void);
extern void svc_rdma_put_req_map(struct svc_rdma_req_map *);
-extern int svc_rdma_fastreg(struct svcxprt_rdma *, struct svc_rdma_fastreg_mr *);
extern struct svc_rdma_fastreg_mr *svc_rdma_get_frmr(struct svcxprt_rdma *);
extern void svc_rdma_put_frmr(struct svcxprt_rdma *,
struct svc_rdma_fastreg_mr *);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index f4b9732..4054a9d 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -1202,40 +1202,6 @@ static int svc_rdma_secure_port(struct svc_rqst *rqstp)
return 1;
}

-/*
- * Attempt to register the kvec representing the RPC memory with the
- * device.
- *
- * Returns:
- * NULL : The device does not support fastreg or there were no more
- * fastreg mr.
- * frmr : The kvec register request was successfully posted.
- * <0 : An error was encountered attempting to register the kvec.
- */
-int svc_rdma_fastreg(struct svcxprt_rdma *xprt,
- struct svc_rdma_fastreg_mr *frmr)
-{
- struct ib_send_wr fastreg_wr;
- u8 key;
-
- /* Bump the key */
- key = (u8)(frmr->mr->lkey & 0x000000FF);
- ib_update_fast_reg_key(frmr->mr, ++key);
-
- /* Prepare FASTREG WR */
- memset(&fastreg_wr, 0, sizeof fastreg_wr);
- fastreg_wr.opcode = IB_WR_FAST_REG_MR;
- fastreg_wr.send_flags = IB_SEND_SIGNALED;
- fastreg_wr.wr.fast_reg.iova_start = (unsigned long)frmr->kva;
- fastreg_wr.wr.fast_reg.page_list = frmr->page_list;
- fastreg_wr.wr.fast_reg.page_list_len = frmr->page_list_len;
- fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
- fastreg_wr.wr.fast_reg.length = frmr->map_len;
- fastreg_wr.wr.fast_reg.access_flags = frmr->access_flags;
- fastreg_wr.wr.fast_reg.rkey = frmr->mr->lkey;
- return svc_rdma_send(xprt, &fastreg_wr);
-}
-
int svc_rdma_send(struct svcxprt_rdma *xprt, struct ib_send_wr *wr)
{
struct ib_send_wr *bad_wr, *n_wr;


2015-07-09 20:45:50

by Chuck Lever III

Subject: [PATCH v1 5/5] svcrdma: Boost NFS READ/WRITE payload size maximum

Increased to 1 megabyte.

Signed-off-by: Chuck Lever <[email protected]>
---

include/linux/sunrpc/svc_rdma.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 13af61b..1bca6dd 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -172,7 +172,7 @@ struct svcxprt_rdma {
#define RDMAXPRT_SQ_PENDING 2
#define RDMAXPRT_CONN_PENDING 3

-#define RPCRDMA_MAX_SVC_SEGS (64) /* server max scatter/gather */
+#define RPCRDMA_MAX_SVC_SEGS (256) /* server max scatter/gather */
#if RPCSVC_MAXPAYLOAD < (RPCRDMA_MAX_SVC_SEGS << PAGE_SHIFT)
#define RPCRDMA_MAXPAYLOAD RPCSVC_MAXPAYLOAD
#else


2015-07-09 20:45:31

by Chuck Lever III

Subject: [PATCH v1 3/5] svcrdma: Clean up svc_rdma_get_reply_array()

Kernel coding conventions frown upon having large nontrivial
functions in header files, and the preference these days is to
allow the compiler to make inlining decisions if possible.

As these functions are re-homed into a .c file, be sure that
comparisons with fields in struct rpcrdma_msg are with be32
constants.
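
As a rough illustration of the byte-order point, here is a user-space
sketch. It is not the kernel code: the demo_* names are hypothetical, and
htonl()/ntohl() stand in for the kernel's cpu_to_be32()/be32_to_cpu().
Fields taken from the wire are compared against constants in wire order,
and are converted to host order before being used as a count or index.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical wire-format field type: always stored big-endian. */
typedef uint32_t demo_be32;

/* Zero looks the same in either byte order, but spelling the constant in
 * wire order keeps both sides of the comparison the same kind of value.
 */
#define DEMO_XDR_ZERO	htonl(0)

/* Compare a wire-order field against a host-order constant by converting
 * the constant, not the field.
 */
static int demo_discrim_is(demo_be32 wire_value, uint32_t host_constant)
{
	return wire_value == htonl(host_constant);
}

/* Convert a wire-order count to host order before using it as an index. */
static uint32_t demo_chunk_index(demo_be32 wire_count, uint32_t max_chunks)
{
	uint32_t n = ntohl(wire_count);

	assert(n <= max_chunks);
	return n;
}
```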

This is a refactoring change; no behavior change is intended.

Signed-off-by: Chuck Lever <[email protected]>
---

include/linux/sunrpc/svc_rdma.h | 81 +--------------------------------
net/sunrpc/xprtrdma/svc_rdma_sendto.c | 73 ++++++++++++++++++++++++++++++
2 files changed, 75 insertions(+), 79 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index cb94ee4..ca4d86a 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -213,6 +213,8 @@ extern int rdma_read_chunk_frmr(struct svcxprt_rdma *, struct svc_rqst *,

/* svc_rdma_sendto.c */
extern int svc_rdma_sendto(struct svc_rqst *);
+extern struct rpcrdma_read_chunk *
+ svc_rdma_get_read_chunk(struct rpcrdma_msg *);

/* svc_rdma_transport.c */
extern int svc_rdma_send(struct svcxprt_rdma *, struct ib_send_wr *);
@@ -238,83 +240,4 @@ extern void svc_rdma_prep_reply_hdr(struct svc_rqst *);
extern int svc_rdma_init(void);
extern void svc_rdma_cleanup(void);

-/*
- * Returns the address of the first read chunk or <nul> if no read chunk is
- * present
- */
-static inline struct rpcrdma_read_chunk *
-svc_rdma_get_read_chunk(struct rpcrdma_msg *rmsgp)
-{
- struct rpcrdma_read_chunk *ch =
- (struct rpcrdma_read_chunk *)&rmsgp->rm_body.rm_chunks[0];
-
- if (ch->rc_discrim == 0)
- return NULL;
-
- return ch;
-}
-
-/*
- * Returns the address of the first read write array element or <nul> if no
- * write array list is present
- */
-static inline struct rpcrdma_write_array *
-svc_rdma_get_write_array(struct rpcrdma_msg *rmsgp)
-{
- if (rmsgp->rm_body.rm_chunks[0] != 0
- || rmsgp->rm_body.rm_chunks[1] == 0)
- return NULL;
-
- return (struct rpcrdma_write_array *)&rmsgp->rm_body.rm_chunks[1];
-}
-
-/*
- * Returns the address of the first reply array element or <nul> if no
- * reply array is present
- */
-static inline struct rpcrdma_write_array *
-svc_rdma_get_reply_array(struct rpcrdma_msg *rmsgp)
-{
- struct rpcrdma_read_chunk *rch;
- struct rpcrdma_write_array *wr_ary;
- struct rpcrdma_write_array *rp_ary;
-
- /* XXX: Need to fix when reply list may occur with read-list and/or
- * write list */
- if (rmsgp->rm_body.rm_chunks[0] != 0 ||
- rmsgp->rm_body.rm_chunks[1] != 0)
- return NULL;
-
- rch = svc_rdma_get_read_chunk(rmsgp);
- if (rch) {
- while (rch->rc_discrim)
- rch++;
-
- /* The reply list follows an empty write array located
- * at 'rc_position' here. The reply array is at rc_target.
- */
- rp_ary = (struct rpcrdma_write_array *)&rch->rc_target;
-
- goto found_it;
- }
-
- wr_ary = svc_rdma_get_write_array(rmsgp);
- if (wr_ary) {
- rp_ary = (struct rpcrdma_write_array *)
- &wr_ary->
- wc_array[ntohl(wr_ary->wc_nchunks)].wc_target.rs_length;
-
- goto found_it;
- }
-
- /* No read list, no write list */
- rp_ary = (struct rpcrdma_write_array *)
- &rmsgp->rm_body.rm_chunks[2];
-
- found_it:
- if (rp_ary->wc_discrim == 0)
- return NULL;
-
- return rp_ary;
-}
#endif
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 95412ab..1dfae83 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -136,6 +136,79 @@ static dma_addr_t dma_map_xdr(struct svcxprt_rdma *xprt,
return dma_addr;
}

+/* Returns the address of the first read chunk or <nul> if no read chunk
+ * is present
+ */
+struct rpcrdma_read_chunk *
+svc_rdma_get_read_chunk(struct rpcrdma_msg *rmsgp)
+{
+ struct rpcrdma_read_chunk *ch =
+ (struct rpcrdma_read_chunk *)&rmsgp->rm_body.rm_chunks[0];
+
+ if (ch->rc_discrim == xdr_zero)
+ return NULL;
+ return ch;
+}
+
+/* Returns the address of the first read write array element or <nul>
+ * if no write array list is present
+ */
+static struct rpcrdma_write_array *
+svc_rdma_get_write_array(struct rpcrdma_msg *rmsgp)
+{
+ if (rmsgp->rm_body.rm_chunks[0] != xdr_zero ||
+ rmsgp->rm_body.rm_chunks[1] == xdr_zero)
+ return NULL;
+ return (struct rpcrdma_write_array *)&rmsgp->rm_body.rm_chunks[1];
+}
+
+/* Returns the address of the first reply array element or <nul> if no
+ * reply array is present
+ */
+static struct rpcrdma_write_array *
+svc_rdma_get_reply_array(struct rpcrdma_msg *rmsgp)
+{
+ struct rpcrdma_read_chunk *rch;
+ struct rpcrdma_write_array *wr_ary;
+ struct rpcrdma_write_array *rp_ary;
+
+ /* XXX: Need to fix when reply chunk may occur with read list
+ * and/or write list.
+ */
+ if (rmsgp->rm_body.rm_chunks[0] != xdr_zero ||
+ rmsgp->rm_body.rm_chunks[1] != xdr_zero)
+ return NULL;
+
+ rch = svc_rdma_get_read_chunk(rmsgp);
+ if (rch) {
+ while (rch->rc_discrim != xdr_zero)
+ rch++;
+
+ /* The reply chunk follows an empty write array located
+ * at 'rc_position' here. The reply array is at rc_target.
+ */
+ rp_ary = (struct rpcrdma_write_array *)&rch->rc_target;
+ goto found_it;
+ }
+
+ wr_ary = svc_rdma_get_write_array(rmsgp);
+ if (wr_ary) {
+ int chunk = be32_to_cpu(wr_ary->wc_nchunks);
+
+ rp_ary = (struct rpcrdma_write_array *)
+ &wr_ary->wc_array[chunk].wc_target.rs_length;
+ goto found_it;
+ }
+
+ /* No read list, no write list */
+ rp_ary = (struct rpcrdma_write_array *)&rmsgp->rm_body.rm_chunks[2];
+
+ found_it:
+ if (rp_ary->wc_discrim == xdr_zero)
+ return NULL;
+ return rp_ary;
+}
+
/* Assumptions:
* - The specified write_len can be represented in sc_max_sge * PAGE_SIZE
*/


2015-07-09 20:45:12

by Chuck Lever III

Subject: [PATCH v1 1/5] NFS/RDMA Release resources in svcrdma when device is removed

From: Shirley Ma <[email protected]>

When the underlying RDMA device is removed, rmmod hangs forever if there
are any outstanding NFS/RDMA client mounts. The outstanding NFS/RDMA counts
can also prevent the server from shutting down. Further debugging shows
that the existing connections are not torn down and their resources are not
released when an RDMA_CM_EVENT_DEVICE_REMOVAL event is received. It appears
that the original code is missing a svc_xprt_put() call in the
RDMA_CM_EVENT_DEVICE_REMOVAL event handler, so svc_xprt_free() is never
invoked to release the existing connection resources.
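
As a toy model of why one missing put matters, consider the following
user-space sketch. It is not the sunrpc implementation: struct demo_xprt,
demo_get() and demo_put() are hypothetical, and the 'freed' flag stands in
for the real release path. An object is released only when its last
reference is dropped, so a reference that is never put keeps it alive
indefinitely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct demo_xprt {
	int refcount;
	bool freed;
};

/* Take an additional reference. */
static void demo_get(struct demo_xprt *x)
{
	assert(!x->freed);
	x->refcount++;
}

/* Drop one reference; the object is released when the count hits zero. */
static void demo_put(struct demo_xprt *x)
{
	assert(x->refcount > 0);
	if (--x->refcount == 0)
		x->freed = true;	/* stands in for the free callback */
}
```

With an initial count of one plus one extra demo_get(), a single
demo_put() leaves the object allocated; only the second demo_put()
releases it.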

The patch has been tested by repeatedly removing and re-adding the device
without stopping the NFS/RDMA service. It also allows a device to be
unplugged and swapped out without shutting down the NFS service.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=252
Signed-off-by: Shirley Ma <[email protected]>
Reviewed-by: Chuck Lever <[email protected]>
---

net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 6b36279..f4b9732 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -659,6 +659,7 @@ static int rdma_cma_handler(struct rdma_cm_id *cma_id,
if (xprt) {
set_bit(XPT_CLOSE, &xprt->xpt_flags);
svc_xprt_enqueue(xprt);
+ svc_xprt_put(xprt);
}
break;
default:


2015-07-10 14:18:15

by J. Bruce Fields

Subject: Re: [PATCH v1 5/5] svcrdma: Boost NFS READ/WRITE payload size maximum

On Thu, Jul 09, 2015 at 04:45:47PM -0400, Chuck Lever wrote:
> Increased to 1 megabyte.

Why not more or less?

Why do we even have this constant, why shouldn't we just use
RPCSVC_MAXPAYLOAD?

--b.

>
> Signed-off-by: Chuck Lever <[email protected]>
> ---
>
> include/linux/sunrpc/svc_rdma.h | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
> index 13af61b..1bca6dd 100644
> --- a/include/linux/sunrpc/svc_rdma.h
> +++ b/include/linux/sunrpc/svc_rdma.h
> @@ -172,7 +172,7 @@ struct svcxprt_rdma {
> #define RDMAXPRT_SQ_PENDING 2
> #define RDMAXPRT_CONN_PENDING 3
>
> -#define RPCRDMA_MAX_SVC_SEGS (64) /* server max scatter/gather */
> +#define RPCRDMA_MAX_SVC_SEGS (256) /* server max scatter/gather */
> #if RPCSVC_MAXPAYLOAD < (RPCRDMA_MAX_SVC_SEGS << PAGE_SHIFT)
> #define RPCRDMA_MAXPAYLOAD RPCSVC_MAXPAYLOAD
> #else
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2015-07-10 14:18:47

by J. Bruce Fields

Subject: Re: [PATCH v1 5/5] svcrdma: Boost NFS READ/WRITE payload size maximum

On Fri, Jul 10, 2015 at 10:18:14AM -0400, J. Bruce Fields wrote:
> On Thu, Jul 09, 2015 at 04:45:47PM -0400, Chuck Lever wrote:
> > Increased to 1 megabyte.
>
> Why not more or less?
>
> Why do we even have this constant, why shouldn't we just use
> RPCSVC_MAXPAYLOAD?

(That one question aside these look fine, I'll apply unless I hear
otherwise.)

--b.

2015-07-10 14:59:52

by Chuck Lever III

Subject: Re: [PATCH v1 5/5] svcrdma: Boost NFS READ/WRITE payload size maximum


On Jul 10, 2015, at 10:18 AM, [email protected] wrote:

> On Thu, Jul 09, 2015 at 04:45:47PM -0400, Chuck Lever wrote:
>> Increased to 1 megabyte.
>
> Why not more or less?
>
> Why do we even have this constant, why shouldn't we just use
> RPCSVC_MAXPAYLOAD?

The payload size maximum for RDMA is based on RPCRDMA_MAX_SVC_SEGS.
We could reverse the relationship and make RPCRDMA_MAX_SVC_SEGS
based on RPCSVC_MAXPAYLOAD divided by the platform’s page size.
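
The arithmetic behind the current relationship can be sketched as
follows. This is a user-space illustration under stated assumptions, not
the header itself: DEMO_RPCSVC_MAXPAYLOAD is assumed to be 1 MiB, the
#else branch of the macro (not shown in the quoted hunk) is assumed to
select the segment-based value, and demo_rdma_maxpayload() is a
hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, for illustration only. */
#define DEMO_RPCSVC_MAXPAYLOAD	(1024u * 1024u)

/* Payload limit implied by a segment count and page size, capped at the
 * generic RPC server maximum. Mirrors the shape of the #if in the patch.
 */
static uint64_t demo_rdma_maxpayload(unsigned int max_segs,
				     unsigned int page_shift)
{
	uint64_t by_segs;

	assert(page_shift < 32);
	by_segs = (uint64_t)max_segs << page_shift;
	return by_segs < DEMO_RPCSVC_MAXPAYLOAD ?
		by_segs : DEMO_RPCSVC_MAXPAYLOAD;
}
```

With 4 KiB pages (page_shift of 12), 64 segments give 256 KiB and 256
segments give 1 MiB. With 64 KiB pages (page_shift of 16), 256 segments
would cover 16 MiB, so under these assumptions the cap applies instead.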


> --b.
>
>>
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>>
>> include/linux/sunrpc/svc_rdma.h | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
>> index 13af61b..1bca6dd 100644
>> --- a/include/linux/sunrpc/svc_rdma.h
>> +++ b/include/linux/sunrpc/svc_rdma.h
>> @@ -172,7 +172,7 @@ struct svcxprt_rdma {
>> #define RDMAXPRT_SQ_PENDING 2
>> #define RDMAXPRT_CONN_PENDING 3
>>
>> -#define RPCRDMA_MAX_SVC_SEGS (64) /* server max scatter/gather */
>> +#define RPCRDMA_MAX_SVC_SEGS (256) /* server max scatter/gather */
>> #if RPCSVC_MAXPAYLOAD < (RPCRDMA_MAX_SVC_SEGS << PAGE_SHIFT)
>> #define RPCRDMA_MAXPAYLOAD RPCSVC_MAXPAYLOAD
>> #else
>>

--
Chuck Lever




2015-07-10 15:54:24

by J. Bruce Fields

Subject: Re: [PATCH v1 5/5] svcrdma: Boost NFS READ/WRITE payload size maximum

On Fri, Jul 10, 2015 at 10:59:48AM -0400, Chuck Lever wrote:
>
> On Jul 10, 2015, at 10:18 AM, [email protected] wrote:
>
> > On Thu, Jul 09, 2015 at 04:45:47PM -0400, Chuck Lever wrote:
> >> Increased to 1 megabyte.
> >
> > Why not more or less?
> >
> > Why do we even have this constant, why shouldn't we just use
> > RPCSVC_MAXPAYLOAD?
>
> The payload size maximum for RDMA is based on RPCRDMA_MAX_SVC_SEGS.
> We could reverse the relationship and make RPCRDMA_MAX_SVC_SEGS
> based on RPCSVC_MAXPAYLOAD divided by the platform’s page size.

But there'd be no reason to do that, because we're not using
RPCRDMA_MAX_SVC_SEGS anywhere. Should we be?

--b.

>
>
> > --b.
> >
> >>
> >> Signed-off-by: Chuck Lever <[email protected]>
> >> ---
> >>
> >> include/linux/sunrpc/svc_rdma.h | 2 +-
> >> 1 files changed, 1 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
> >> index 13af61b..1bca6dd 100644
> >> --- a/include/linux/sunrpc/svc_rdma.h
> >> +++ b/include/linux/sunrpc/svc_rdma.h
> >> @@ -172,7 +172,7 @@ struct svcxprt_rdma {
> >> #define RDMAXPRT_SQ_PENDING 2
> >> #define RDMAXPRT_CONN_PENDING 3
> >>
> >> -#define RPCRDMA_MAX_SVC_SEGS (64) /* server max scatter/gather */
> >> +#define RPCRDMA_MAX_SVC_SEGS (256) /* server max scatter/gather */
> >> #if RPCSVC_MAXPAYLOAD < (RPCRDMA_MAX_SVC_SEGS << PAGE_SHIFT)
> >> #define RPCRDMA_MAXPAYLOAD RPCSVC_MAXPAYLOAD
> >> #else
> >>
>
> --
> Chuck Lever
>
>

2015-07-10 15:59:25

by Chuck Lever III

Subject: Re: [PATCH v1 5/5] svcrdma: Boost NFS READ/WRITE payload size maximum


On Jul 10, 2015, at 11:54 AM, J. Bruce Fields <[email protected]> wrote:

> On Fri, Jul 10, 2015 at 10:59:48AM -0400, Chuck Lever wrote:
>>
>> On Jul 10, 2015, at 10:18 AM, [email protected] wrote:
>>
>>> On Thu, Jul 09, 2015 at 04:45:47PM -0400, Chuck Lever wrote:
>>>> Increased to 1 megabyte.
>>>
>>> Why not more or less?
>>>
>>> Why do we even have this constant, why shouldn't we just use
>>> RPCSVC_MAXPAYLOAD?
>>
>> The payload size maximum for RDMA is based on RPCRDMA_MAX_SVC_SEGS.
>> We could reverse the relationship and make RPCRDMA_MAX_SVC_SEGS
>> based on RPCSVC_MAXPAYLOAD divided by the platform’s page size.
>
> But there'd be no reason to do that, because we're not using
> RPCRDMA_MAX_SVC_SEGS anywhere. Should we be?

Let me try using RPCSVC_MAXPAYLOAD. That is based on RPCSVC_MAXPAGES,
which is actually used in svc_rdma_*.c


> --b.
>
>>
>>
>>> --b.
>>>
>>>>
>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>> ---
>>>>
>>>> include/linux/sunrpc/svc_rdma.h | 2 +-
>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
>>>> index 13af61b..1bca6dd 100644
>>>> --- a/include/linux/sunrpc/svc_rdma.h
>>>> +++ b/include/linux/sunrpc/svc_rdma.h
>>>> @@ -172,7 +172,7 @@ struct svcxprt_rdma {
>>>> #define RDMAXPRT_SQ_PENDING 2
>>>> #define RDMAXPRT_CONN_PENDING 3
>>>>
>>>> -#define RPCRDMA_MAX_SVC_SEGS (64) /* server max scatter/gather */
>>>> +#define RPCRDMA_MAX_SVC_SEGS (256) /* server max scatter/gather */
>>>> #if RPCSVC_MAXPAYLOAD < (RPCRDMA_MAX_SVC_SEGS << PAGE_SHIFT)
>>>> #define RPCRDMA_MAXPAYLOAD RPCSVC_MAXPAYLOAD
>>>> #else
>>>>
>>
>> --
>> Chuck Lever
>>
>>

--
Chuck Lever




2015-07-10 16:05:07

by J. Bruce Fields

Subject: Re: [PATCH v1 5/5] svcrdma: Boost NFS READ/WRITE payload size maximum

On Fri, Jul 10, 2015 at 11:59:20AM -0400, Chuck Lever wrote:
>
> On Jul 10, 2015, at 11:54 AM, J. Bruce Fields <[email protected]> wrote:
>
> > On Fri, Jul 10, 2015 at 10:59:48AM -0400, Chuck Lever wrote:
> >>
> >> On Jul 10, 2015, at 10:18 AM, [email protected] wrote:
> >>
> >>> On Thu, Jul 09, 2015 at 04:45:47PM -0400, Chuck Lever wrote:
> >>>> Increased to 1 megabyte.
> >>>
> >>> Why not more or less?
> >>>
> >>> Why do we even have this constant, why shouldn't we just use
> >>> RPCSVC_MAXPAYLOAD?
> >>
> >> The payload size maximum for RDMA is based on RPCRDMA_MAX_SVC_SEGS.
> >> We could reverse the relationship and make RPCRDMA_MAX_SVC_SEGS
> >> based on RPCSVC_MAXPAYLOAD divided by the platform’s page size.
> >
> > But there'd be no reason to do that, because we're not using
> > RPCRDMA_MAX_SVC_SEGS anywhere. Should we be?
>
> Let me try using RPCSVC_MAXPAYLOAD. That is based on RPCSVC_MAXPAGES,
> which is actually used in svc_rdma_*.c

OK, thanks, that sounds like that would make more sense.

--b.

>
>
> > --b.
> >
> >>
> >>
> >>> --b.
> >>>
> >>>>
> >>>> Signed-off-by: Chuck Lever <[email protected]>
> >>>> ---
> >>>>
> >>>> include/linux/sunrpc/svc_rdma.h | 2 +-
> >>>> 1 files changed, 1 insertions(+), 1 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
> >>>> index 13af61b..1bca6dd 100644
> >>>> --- a/include/linux/sunrpc/svc_rdma.h
> >>>> +++ b/include/linux/sunrpc/svc_rdma.h
> >>>> @@ -172,7 +172,7 @@ struct svcxprt_rdma {
> >>>> #define RDMAXPRT_SQ_PENDING 2
> >>>> #define RDMAXPRT_CONN_PENDING 3
> >>>>
> >>>> -#define RPCRDMA_MAX_SVC_SEGS (64) /* server max scatter/gather */
> >>>> +#define RPCRDMA_MAX_SVC_SEGS (256) /* server max scatter/gather */
> >>>> #if RPCSVC_MAXPAYLOAD < (RPCRDMA_MAX_SVC_SEGS << PAGE_SHIFT)
> >>>> #define RPCRDMA_MAXPAYLOAD RPCSVC_MAXPAYLOAD
> >>>> #else
> >>>>
> >>
> >> --
> >> Chuck Lever
> >>
> >>
>
> --
> Chuck Lever
>
>

2015-07-13 16:40:28

by Chuck Lever III

Subject: Re: [PATCH v1 5/5] svcrdma: Boost NFS READ/WRITE payload size maximum


On Jul 10, 2015, at 12:05 PM, J. Bruce Fields <[email protected]> wrote:

> On Fri, Jul 10, 2015 at 11:59:20AM -0400, Chuck Lever wrote:
>>
>> On Jul 10, 2015, at 11:54 AM, J. Bruce Fields <[email protected]> wrote:
>>
>>> On Fri, Jul 10, 2015 at 10:59:48AM -0400, Chuck Lever wrote:
>>>>
>>>> On Jul 10, 2015, at 10:18 AM, [email protected] wrote:
>>>>
>>>>> On Thu, Jul 09, 2015 at 04:45:47PM -0400, Chuck Lever wrote:
>>>>>> Increased to 1 megabyte.
>>>>>
>>>>> Why not more or less?
>>>>>
>>>>> Why do we even have this constant, why shouldn't we just use
>>>>> RPCSVC_MAXPAYLOAD?
>>>>
>>>> The payload size maximum for RDMA is based on RPCRDMA_MAX_SVC_SEGS.
>>>> We could reverse the relationship and make RPCRDMA_MAX_SVC_SEGS
>>>> based on RPCSVC_MAXPAYLOAD divided by the platform’s page size.
>>>
>>> But there'd be no reason to do that, because we're not using
>>> RPCRDMA_MAX_SVC_SEGS anywhere. Should we be?
>>
>> Let me try using RPCSVC_MAXPAYLOAD. That is based on RPCSVC_MAXPAGES,
>> which is actually used in svc_rdma_*.c
>
> OK, thanks, that sounds like that would make more sense.

The change to use RPCSVC_MAXPAYLOAD is clean enough, but it may
re-expose a PPC64-x86_64 interop bug. We need to dig up some
testing resources to see if that bug is still a problem. Stay
tuned.

Since you’ve taken the other patches in this series, I’ll hold
off on reposting a v2 until I have this straightened out.


> --b.
>
>>
>>
>>> --b.
>>>
>>>>
>>>>
>>>>> --b.
>>>>>
>>>>>>
>>>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>>>> ---
>>>>>>
>>>>>> include/linux/sunrpc/svc_rdma.h | 2 +-
>>>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
>>>>>> index 13af61b..1bca6dd 100644
>>>>>> --- a/include/linux/sunrpc/svc_rdma.h
>>>>>> +++ b/include/linux/sunrpc/svc_rdma.h
>>>>>> @@ -172,7 +172,7 @@ struct svcxprt_rdma {
>>>>>> #define RDMAXPRT_SQ_PENDING 2
>>>>>> #define RDMAXPRT_CONN_PENDING 3
>>>>>>
>>>>>> -#define RPCRDMA_MAX_SVC_SEGS (64) /* server max scatter/gather */
>>>>>> +#define RPCRDMA_MAX_SVC_SEGS (256) /* server max scatter/gather */
>>>>>> #if RPCSVC_MAXPAYLOAD < (RPCRDMA_MAX_SVC_SEGS << PAGE_SHIFT)
>>>>>> #define RPCRDMA_MAXPAYLOAD RPCSVC_MAXPAYLOAD
>>>>>> #else
>>>>>>
>>>>
>>>> --
>>>> Chuck Lever
>>>>
>>>>
>>
>> --
>> Chuck Lever
>>
>>

--
Chuck Lever