2008-05-20 00:08:05

by Tom Tucker

[permalink] [raw]
Subject: [PATCH REPOST2] svcrdma: Limit ORD based on client's advertised IRD

When adapters have differing IRD limits, the RDMA transport will fail to
connect properly. The RDMA transport should use the client's advertised
inbound read limit when computing its outbound read limit. For iWARP
transports, there is currently no standard for exchanging IRD/ORD
during connection establishment so the 'responder_resources' field in the
connect event is the local device's limit. The RDMA transport can be
configured to use a smaller ORD by writing the desired number to the
/proc/sys/sunrpc/svc_rdma/max_outbound_read_requests file.

Signed-off-by: Tom Tucker <[email protected]>
---

This one includes the min_t change. This is nicer. Thanks Tom.

net/sunrpc/xprtrdma/svc_rdma_transport.c | 16 ++++++++++++----
1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 68908b5..54b2126 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -580,7 +580,7 @@ int svc_rdma_post_recv(struct svcxprt_rdma *xprt)
* will call the recvfrom method on the listen xprt which will accept the new
* connection.
*/
-static void handle_connect_req(struct rdma_cm_id *new_cma_id)
+static void handle_connect_req(struct rdma_cm_id *new_cma_id, size_t client_ird)
{
struct svcxprt_rdma *listen_xprt = new_cma_id->context;
struct svcxprt_rdma *newxprt;
@@ -597,6 +597,9 @@ static void handle_connect_req(struct rdma_cm_id *new_cma_id)
dprintk("svcrdma: Creating newxprt=%p, cm_id=%p, listenxprt=%p\n",
newxprt, newxprt->sc_cm_id, listen_xprt);

+ /* Save client advertised inbound read limit for use later in accept. */
+ newxprt->sc_ord = client_ird;
+
/* Set the local and remote addresses in the transport */
sa = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.dst_addr;
svc_xprt_set_remote(&newxprt->sc_xprt, sa, svc_addr_len(sa));
@@ -633,7 +636,8 @@ static int rdma_listen_handler(struct rdma_cm_id *cma_id,
case RDMA_CM_EVENT_CONNECT_REQUEST:
dprintk("svcrdma: Connect request on cma_id=%p, xprt = %p, "
"event=%d\n", cma_id, cma_id->context, event->event);
- handle_connect_req(cma_id);
+ handle_connect_req(cma_id,
+ event->param.conn.responder_resources);
break;

case RDMA_CM_EVENT_ESTABLISHED:
@@ -807,8 +811,12 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
(size_t)svcrdma_max_requests);
newxprt->sc_sq_depth = RPCRDMA_SQ_DEPTH_MULT * newxprt->sc_max_requests;

- newxprt->sc_ord = min((size_t)devattr.max_qp_rd_atom,
- (size_t)svcrdma_ord);
+ /*
+ * Limit ORD based on client limit, local device limit, and
+ * configured svcrdma limit.
+ */
+ newxprt->sc_ord = min_t(size_t, devattr.max_qp_rd_atom, newxprt->sc_ord);
+ newxprt->sc_ord = min_t(size_t, svcrdma_ord, newxprt->sc_ord);

newxprt->sc_pd = ib_alloc_pd(newxprt->sc_cm_id->device);
if (IS_ERR(newxprt->sc_pd)) {



2008-05-20 21:24:38

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH REPOST2] svcrdma: Limit ORD based on client's advertised IRD

On Mon, May 19, 2008 at 07:11:35PM -0500, Tom Tucker wrote:
> When adapters have differing IRD limits, the RDMA transport will fail to
> connect properly.

How will this look to the user? Are they going to get some sort of
mysterious failure when they try to mount, and then have to go echo
lower values into /proc/sys/sunrpc/svc_rdma/max_outbound_read_requests
until it works?

> The RDMA transport should use the client's advertised
> inbound read limit when computing its outbound read limit. For iWARP
> transports, there is currently no standard for exchanging IRD/ORD
> during connection establishment so the 'responder_resources' field in the
> connect event is the local device's limit. The RDMA transport can be
> configured to use a smaller ORD by writing the desired number to the
> /proc/sys/sunrpc/svc_rdma/max_outbound_read_requests file.

Do we have any documentation on using these various sysctls?

--b.

>
> Signed-off-by: Tom Tucker <[email protected]>
> ---
>
> This one includes the min_t change. This is nicer. Thanks Tom.
>
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 16 ++++++++++++----
> 1 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 68908b5..54b2126 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -580,7 +580,7 @@ int svc_rdma_post_recv(struct svcxprt_rdma *xprt)
> * will call the recvfrom method on the listen xprt which will accept the new
> * connection.
> */
> -static void handle_connect_req(struct rdma_cm_id *new_cma_id)
> +static void handle_connect_req(struct rdma_cm_id *new_cma_id, size_t client_ird)
> {
> struct svcxprt_rdma *listen_xprt = new_cma_id->context;
> struct svcxprt_rdma *newxprt;
> @@ -597,6 +597,9 @@ static void handle_connect_req(struct rdma_cm_id *new_cma_id)
> dprintk("svcrdma: Creating newxprt=%p, cm_id=%p, listenxprt=%p\n",
> newxprt, newxprt->sc_cm_id, listen_xprt);
>
> + /* Save client advertised inbound read limit for use later in accept. */
> + newxprt->sc_ord = client_ird;
> +
> /* Set the local and remote addresses in the transport */
> sa = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.dst_addr;
> svc_xprt_set_remote(&newxprt->sc_xprt, sa, svc_addr_len(sa));
> @@ -633,7 +636,8 @@ static int rdma_listen_handler(struct rdma_cm_id *cma_id,
> case RDMA_CM_EVENT_CONNECT_REQUEST:
> dprintk("svcrdma: Connect request on cma_id=%p, xprt = %p, "
> "event=%d\n", cma_id, cma_id->context, event->event);
> - handle_connect_req(cma_id);
> + handle_connect_req(cma_id,
> + event->param.conn.responder_resources);
> break;
>
> case RDMA_CM_EVENT_ESTABLISHED:
> @@ -807,8 +811,12 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
> (size_t)svcrdma_max_requests);
> newxprt->sc_sq_depth = RPCRDMA_SQ_DEPTH_MULT * newxprt->sc_max_requests;
>
> - newxprt->sc_ord = min((size_t)devattr.max_qp_rd_atom,
> - (size_t)svcrdma_ord);
> + /*
> + * Limit ORD based on client limit, local device limit, and
> + * configured svcrdma limit.
> + */
> + newxprt->sc_ord = min_t(size_t, devattr.max_qp_rd_atom, newxprt->sc_ord);
> + newxprt->sc_ord = min_t(size_t, svcrdma_ord, newxprt->sc_ord);
>
> newxprt->sc_pd = ib_alloc_pd(newxprt->sc_cm_id->device);
> if (IS_ERR(newxprt->sc_pd)) {
>

2008-05-20 22:23:13

by Tom Tucker

[permalink] [raw]
Subject: Re: [PATCH REPOST2] svcrdma: Limit ORD based on client's advertised IRD


On Tue, 2008-05-20 at 17:24 -0400, J. Bruce Fields wrote:
> On Mon, May 19, 2008 at 07:11:35PM -0500, Tom Tucker wrote:
> > When adapters have differing IRD limits, the RDMA transport will fail to
> > connect properly.
>
> How will this look to the user? Are they going to get some sort of
> mysterious failure when they try to mount, and then have to go echo
> lower values into /proc/sys/sunrpc/svc_rdma/max_outbound_read_requests
> until it works?
>

Yes, the mount will fail with a connection error. It's disgusting.
However, there's not much we can do until we get this negotiated in the
iWARP CM (Steve Wise is working on this in the OpenFabrics working
group). When that work is finished, this issue goes away.

> > The RDMA transport should use the client's advertised
> > inbound read limit when computing its outbound read limit. For iWARP
> > transports, there is currently no standard for exchanging IRD/ORD
> > during connection establishment so the 'responder_resources' field in the
> > connect event is the local device's limit. The RDMA transport can be
> > configured to use a smaller ORD by writing the desired number to the
> > /proc/sys/sunrpc/svc_rdma/max_outbound_read_requests file.
>
> Do we have any documentation on using these various sysctls?
>

At this point, not in Linux. I can put something together and post it
here.

> --b.
>
> >
> > Signed-off-by: Tom Tucker <[email protected]>
> > ---
> >
> > This one includes the min_t change. This is nicer. Thanks Tom.
> >
> > net/sunrpc/xprtrdma/svc_rdma_transport.c | 16 ++++++++++++----
> > 1 files changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > index 68908b5..54b2126 100644
> > --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > @@ -580,7 +580,7 @@ int svc_rdma_post_recv(struct svcxprt_rdma *xprt)
> > * will call the recvfrom method on the listen xprt which will accept the new
> > * connection.
> > */
> > -static void handle_connect_req(struct rdma_cm_id *new_cma_id)
> > +static void handle_connect_req(struct rdma_cm_id *new_cma_id, size_t client_ird)
> > {
> > struct svcxprt_rdma *listen_xprt = new_cma_id->context;
> > struct svcxprt_rdma *newxprt;
> > @@ -597,6 +597,9 @@ static void handle_connect_req(struct rdma_cm_id *new_cma_id)
> > dprintk("svcrdma: Creating newxprt=%p, cm_id=%p, listenxprt=%p\n",
> > newxprt, newxprt->sc_cm_id, listen_xprt);
> >
> > + /* Save client advertised inbound read limit for use later in accept. */
> > + newxprt->sc_ord = client_ird;
> > +
> > /* Set the local and remote addresses in the transport */
> > sa = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.dst_addr;
> > svc_xprt_set_remote(&newxprt->sc_xprt, sa, svc_addr_len(sa));
> > @@ -633,7 +636,8 @@ static int rdma_listen_handler(struct rdma_cm_id *cma_id,
> > case RDMA_CM_EVENT_CONNECT_REQUEST:
> > dprintk("svcrdma: Connect request on cma_id=%p, xprt = %p, "
> > "event=%d\n", cma_id, cma_id->context, event->event);
> > - handle_connect_req(cma_id);
> > + handle_connect_req(cma_id,
> > + event->param.conn.responder_resources);
> > break;
> >
> > case RDMA_CM_EVENT_ESTABLISHED:
> > @@ -807,8 +811,12 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
> > (size_t)svcrdma_max_requests);
> > newxprt->sc_sq_depth = RPCRDMA_SQ_DEPTH_MULT * newxprt->sc_max_requests;
> >
> > - newxprt->sc_ord = min((size_t)devattr.max_qp_rd_atom,
> > - (size_t)svcrdma_ord);
> > + /*
> > + * Limit ORD based on client limit, local device limit, and
> > + * configured svcrdma limit.
> > + */
> > + newxprt->sc_ord = min_t(size_t, devattr.max_qp_rd_atom, newxprt->sc_ord);
> > + newxprt->sc_ord = min_t(size_t, svcrdma_ord, newxprt->sc_ord);
> >
> > newxprt->sc_pd = ib_alloc_pd(newxprt->sc_cm_id->device);
> > if (IS_ERR(newxprt->sc_pd)) {
> >


2008-05-20 22:26:19

by Tom Tucker

[permalink] [raw]
Subject: Re: [PATCH REPOST2] svcrdma: Limit ORD based on client's advertised IRD


On Tue, 2008-05-20 at 17:24 -0400, J. Bruce Fields wrote:
> On Mon, May 19, 2008 at 07:11:35PM -0500, Tom Tucker wrote:
> > When adapters have differing IRD limits, the RDMA transport will fail to
> > connect properly.
>
> How will this look to the user? Are they going to get some sort of
> mysterious failure when they try to mount, and then have to go echo
> lower values into /proc/sys/sunrpc/svc_rdma/max_outbound_read_requests
> until it works?

BTW, this is a hypothetical for iWARP. The last I checked, the current
spate of adapters seemed to work fine (Chelsio/Neteffect) because the
NFSRDMA default is supported by both. I'll check again and get back to
you.

Tom

>
> > The RDMA transport should use the client's advertised
> > inbound read limit when computing its outbound read limit. For iWARP
> > transports, there is currently no standard for exchanging IRD/ORD
> > during connection establishment so the 'responder_resources' field in the
> > connect event is the local device's limit. The RDMA transport can be
> > configured to use a smaller ORD by writing the desired number to the
> > /proc/sys/sunrpc/svc_rdma/max_outbound_read_requests file.
>
> Do we have any documentation on using these various sysctls?
>
> --b.
>
> >
> > Signed-off-by: Tom Tucker <[email protected]>
> > ---
> >
> > This one includes the min_t change. This is nicer. Thanks Tom.
> >
> > net/sunrpc/xprtrdma/svc_rdma_transport.c | 16 ++++++++++++----
> > 1 files changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > index 68908b5..54b2126 100644
> > --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> > @@ -580,7 +580,7 @@ int svc_rdma_post_recv(struct svcxprt_rdma *xprt)
> > * will call the recvfrom method on the listen xprt which will accept the new
> > * connection.
> > */
> > -static void handle_connect_req(struct rdma_cm_id *new_cma_id)
> > +static void handle_connect_req(struct rdma_cm_id *new_cma_id, size_t client_ird)
> > {
> > struct svcxprt_rdma *listen_xprt = new_cma_id->context;
> > struct svcxprt_rdma *newxprt;
> > @@ -597,6 +597,9 @@ static void handle_connect_req(struct rdma_cm_id *new_cma_id)
> > dprintk("svcrdma: Creating newxprt=%p, cm_id=%p, listenxprt=%p\n",
> > newxprt, newxprt->sc_cm_id, listen_xprt);
> >
> > + /* Save client advertised inbound read limit for use later in accept. */
> > + newxprt->sc_ord = client_ird;
> > +
> > /* Set the local and remote addresses in the transport */
> > sa = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.dst_addr;
> > svc_xprt_set_remote(&newxprt->sc_xprt, sa, svc_addr_len(sa));
> > @@ -633,7 +636,8 @@ static int rdma_listen_handler(struct rdma_cm_id *cma_id,
> > case RDMA_CM_EVENT_CONNECT_REQUEST:
> > dprintk("svcrdma: Connect request on cma_id=%p, xprt = %p, "
> > "event=%d\n", cma_id, cma_id->context, event->event);
> > - handle_connect_req(cma_id);
> > + handle_connect_req(cma_id,
> > + event->param.conn.responder_resources);
> > break;
> >
> > case RDMA_CM_EVENT_ESTABLISHED:
> > @@ -807,8 +811,12 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
> > (size_t)svcrdma_max_requests);
> > newxprt->sc_sq_depth = RPCRDMA_SQ_DEPTH_MULT * newxprt->sc_max_requests;
> >
> > - newxprt->sc_ord = min((size_t)devattr.max_qp_rd_atom,
> > - (size_t)svcrdma_ord);
> > + /*
> > + * Limit ORD based on client limit, local device limit, and
> > + * configured svcrdma limit.
> > + */
> > + newxprt->sc_ord = min_t(size_t, devattr.max_qp_rd_atom, newxprt->sc_ord);
> > + newxprt->sc_ord = min_t(size_t, svcrdma_ord, newxprt->sc_ord);
> >
> > newxprt->sc_pd = ib_alloc_pd(newxprt->sc_cm_id->device);
> > if (IS_ERR(newxprt->sc_pd)) {
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2008-05-21 02:02:26

by Talpey, Thomas

[permalink] [raw]
Subject: Re: [PATCH REPOST2] svcrdma: Limit ORD based on client's advertised IRD

At 06:26 PM 5/20/2008, Tom Tucker wrote:
>
>On Tue, 2008-05-20 at 17:24 -0400, J. Bruce Fields wrote:
>> On Mon, May 19, 2008 at 07:11:35PM -0500, Tom Tucker wrote:
>> > When adapters have differing IRD limits, the RDMA transport will fail to
>> > connect properly.
>>
>> How will this look to the user? Are they going to get some sort of
>> mysterious failure when they try to mount, and then have to go echo
>> lower values into /proc/sys/sunrpc/svc_rdma/max_outbound_read_requests
>> until it works?
>>
>
>Yes, the mount will fail with a connection error. It's disgusting.

The client error is ECONNREFUSED, which mount.nfs ever-so-helpfully
reports as "mount: internal error". Chuck and I went over some options
at cthon to make this more useful, I'm fishing around, it's a simple
error code choice.

Tom.