2008-05-19 16:30:33

by Tom Tucker

[permalink] [raw]
Subject: [PATCH] svcrdma: Limit ORD based on client's advertised IRD

When adapters have differing IRD limits, the RDMA transport will fail to
connect properly. The RDMA transport should use the client's advertised
inbound read limit when computing it's outbound read limit. For iWARP
transports, there is no exchange of the IRD/ORD during connect request and
so the only way to ensure compatability is to use the
/proc/sys/sunrpc/svc_rdma/max_outbound_read_requests file to control
server ORD.

Signed-off-by: Tom Tucker <[email protected]>
---

net/sunrpc/xprtrdma/svc_rdma_transport.c | 16 ++++++++++++----
1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 68908b5..b961dc5 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -580,7 +580,7 @@ int svc_rdma_post_recv(struct svcxprt_rdma *xprt)
* will call the recvfrom method on the listen xprt which will accept the new
* connection.
*/
-static void handle_connect_req(struct rdma_cm_id *new_cma_id)
+static void handle_connect_req(struct rdma_cm_id *new_cma_id, u8 client_ird)
{
struct svcxprt_rdma *listen_xprt = new_cma_id->context;
struct svcxprt_rdma *newxprt;
@@ -597,6 +597,9 @@ static void handle_connect_req(struct rdma_cm_id *new_cma_id)
dprintk("svcrdma: Creating newxprt=%p, cm_id=%p, listenxprt=%p\n",
newxprt, newxprt->sc_cm_id, listen_xprt);

+ /* Save client advertised inbound read limit for use later in accept. */
+ newxprt->sc_ord = client_ird;
+
/* Set the local and remote addresses in the transport */
sa = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.dst_addr;
svc_xprt_set_remote(&newxprt->sc_xprt, sa, svc_addr_len(sa));
@@ -633,7 +636,8 @@ static int rdma_listen_handler(struct rdma_cm_id *cma_id,
case RDMA_CM_EVENT_CONNECT_REQUEST:
dprintk("svcrdma: Connect request on cma_id=%p, xprt = %p, "
"event=%d\n", cma_id, cma_id->context, event->event);
- handle_connect_req(cma_id);
+ handle_connect_req(cma_id,
+ event->param.conn.responder_resources);
break;

case RDMA_CM_EVENT_ESTABLISHED:
@@ -807,8 +811,12 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
(size_t)svcrdma_max_requests);
newxprt->sc_sq_depth = RPCRDMA_SQ_DEPTH_MULT * newxprt->sc_max_requests;

- newxprt->sc_ord = min((size_t)devattr.max_qp_rd_atom,
- (size_t)svcrdma_ord);
+ /*
+ * Limit ORD based on client limit, local device limit, and
+ * configured svcrdma limit.
+ */
+ newxprt->sc_ord = min((size_t)devattr.max_qp_rd_atom, newxprt->sc_ord);
+ newxprt->sc_ord = min((size_t)svcrdma_ord, newxprt->sc_ord);

newxprt->sc_pd = ib_alloc_pd(newxprt->sc_cm_id->device);
if (IS_ERR(newxprt->sc_pd)) {



2008-05-19 17:18:06

by Talpey, Thomas

[permalink] [raw]
Subject: Re: [PATCH] svcrdma: Limit ORD based on client's advertised IRD

At 12:34 PM 5/19/2008, Tom Tucker wrote:
>When adapters have differing IRD limits, the RDMA transport will fail to
>connect properly. The RDMA transport should use the client's advertised
>inbound read limit when computing it's outbound read limit. For iWARP
>transports, there is no exchange of the IRD/ORD during connect request and
>so the only way to ensure compatability is to use the
>/proc/sys/sunrpc/svc_rdma/max_outbound_read_requests file to control
>server ORD.

Well, technically the iWARP connection manager *currently* does not
exchange these values. I think the comment would be more accurate
if it said "in the absence of"...

>-static void handle_connect_req(struct rdma_cm_id *new_cma_id)
>+static void handle_connect_req(struct rdma_cm_id *new_cma_id, u8 client_ird)

While "u8" might be the limit of the implementations today, wouldn't "int" be
easier to understand, and more portable?

>+ /* Save client advertised inbound read limit for use later in accept. */
>+ newxprt->sc_ord = client_ird;
>+
and...
>+ /*
>+ * Limit ORD based on client limit, local device limit, and
>+ * configured svcrdma limit.
>+ */
>+ newxprt->sc_ord = min((size_t)devattr.max_qp_rd_atom, newxprt->sc_ord);
>+ newxprt->sc_ord = min((size_t)svcrdma_ord, newxprt->sc_ord);

Looks good, but it's a little odd that the value goes into sc_ord in one
routine, only to be changed before being used. I guess the device attrs
aren't handy at the time the connection is notified though, so ok.

Tom.


2008-05-19 17:39:37

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] svcrdma: Limit ORD based on client's advertised IRD

I got dropped of the cc: somehow--please try not to prune cc's. I have
enough trouble keeping up without the additional confusion to my mail
filters....

--b.

On Mon, May 19, 2008 at 01:17:54PM -0400, Talpey, Thomas wrote:
> At 12:34 PM 5/19/2008, Tom Tucker wrote:
> >When adapters have differing IRD limits, the RDMA transport will fail to
> >connect properly. The RDMA transport should use the client's advertised
> >inbound read limit when computing it's outbound read limit. For iWARP
> >transports, there is no exchange of the IRD/ORD during connect request and
> >so the only way to ensure compatability is to use the
> >/proc/sys/sunrpc/svc_rdma/max_outbound_read_requests file to control
> >server ORD.
>
> Well, technically the iWARP connection manager *currently* does not
> exchange these values. I think the comment would be more accurate
> if it said "in the absence of"...
>
> >-static void handle_connect_req(struct rdma_cm_id *new_cma_id)
> >+static void handle_connect_req(struct rdma_cm_id *new_cma_id, u8 client_ird)
>
> While "u8" might be the limit of the implementations today, wouldn't "int" be
> easier to understand, and more portable?
>
> >+ /* Save client advertised inbound read limit for use later in accept. */
> >+ newxprt->sc_ord = client_ird;
> >+
> and...
> >+ /*
> >+ * Limit ORD based on client limit, local device limit, and
> >+ * configured svcrdma limit.
> >+ */
> >+ newxprt->sc_ord = min((size_t)devattr.max_qp_rd_atom, newxprt->sc_ord);
> >+ newxprt->sc_ord = min((size_t)svcrdma_ord, newxprt->sc_ord);
>
> Looks good, but it's a little odd that the value goes into sc_ord in one
> routine, only to be changed before being used. I guess the device attrs
> aren't handy at the time the connection is notified though, so ok.
>
> Tom.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2008-05-19 18:28:43

by Tom Tucker

[permalink] [raw]
Subject: Re: [PATCH] svcrdma: Limit ORD based on client's advertised IRD


On Mon, 2008-05-19 at 13:17 -0400, Talpey, Thomas wrote:
> At 12:34 PM 5/19/2008, Tom Tucker wrote:
> >When adapters have differing IRD limits, the RDMA transport will fail to
> >connect properly. The RDMA transport should use the client's advertised
> >inbound read limit when computing it's outbound read limit. For iWARP
> >transports, there is no exchange of the IRD/ORD during connect request and
> >so the only way to ensure compatability is to use the
> >/proc/sys/sunrpc/svc_rdma/max_outbound_read_requests file to control
> >server ORD.
>
> Well, technically the iWARP connection manager *currently* does not
> exchange these values. I think the comment would be more accurate
> if it said "in the absence of"...

I'm not sure it really has anything to do with where it is implemented.
The iWARP protocol(s) do not specify how or if IRD/ORD will be exchanged
so this commit doesn't help them. The note is meant to be a hint to
users of the transport how to get around potential connection issues
between iWARP adapters with differing capabilities.

The two spelling errors should be fixed though ;-)

>
> >-static void handle_connect_req(struct rdma_cm_id *new_cma_id)
> >+static void handle_connect_req(struct rdma_cm_id *new_cma_id, u8 client_ird)
>
> While "u8" might be the limit of the implementations today, wouldn't "int" be
> easier to understand, and more portable?
>

I think that 'int' might provoke the "wrath of Chuck". Unsigned
something else --

> >+ /* Save client advertised inbound read limit for use later in accept. */
> >+ newxprt->sc_ord = client_ird;
> >+
> and...
> >+ /*
> >+ * Limit ORD based on client limit, local device limit, and
> >+ * configured svcrdma limit.
> >+ */
> >+ newxprt->sc_ord = min((size_t)devattr.max_qp_rd_atom, newxprt->sc_ord);
> >+ newxprt->sc_ord = min((size_t)svcrdma_ord, newxprt->sc_ord);
>
> Looks good, but it's a little odd that the value goes into sc_ord in one
> routine, only to be changed before being used. I guess the device attrs
> aren't handy at the time the connection is notified though, so ok.
>

Yes. It's the initial candidate for ORD that may be updated later based
on configuration and/or device capabilities.

> Tom.


2008-05-19 18:36:22

by Talpey, Thomas

[permalink] [raw]
Subject: Re: [PATCH] svcrdma: Limit ORD based on client's advertised IRD

At 02:32 PM 5/19/2008, Tom Tucker wrote:
>> Well, technically the iWARP connection manager *currently* does not
>> exchange these values. I think the comment would be more accurate
>> if it said "in the absence of"...
>
>I'm not sure it really has anything to do with where it is implemented.

That wasn't my point - I meant, in the absence of information what the
peer's maximum might be, the server has to make something up. But, future
iwarp cm's could pass it.

What value bubbles up in this field from current iWARP connections? Is
it total garbage, or at least, recognizably useless (-1)?

>> >-static void handle_connect_req(struct rdma_cm_id *new_cma_id)
>> >+static void handle_connect_req(struct rdma_cm_id *new_cma_id, u8
>client_ird)
>>
>> While "u8" might be the limit of the implementations today, wouldn't "int" be
>> easier to understand, and more portable?
>>
>
>I think that 'int' might provoke the "wrath of Chuck". Unsigned
>something else --

Okay, well if not "int", then at least some type which is not a specific
storage size. u{8,16,32} etc should be reserved for describing hardware
and on-wire stuff, and handled as native when used locally.

Tom.


2008-05-19 18:50:52

by Tom Tucker

[permalink] [raw]
Subject: Re: [PATCH] svcrdma: Limit ORD based on client's advertised IRD


On Mon, 2008-05-19 at 14:34 -0400, Talpey, Thomas wrote:
> At 02:32 PM 5/19/2008, Tom Tucker wrote:
> >> Well, technically the iWARP connection manager *currently* does not
> >> exchange these values. I think the comment would be more accurate
> >> if it said "in the absence of"...
> >
> >I'm not sure it really has anything to do with where it is implemented.
>
> That wasn't my point - I meant, in the absence of information what the
> peer's maximum might be, the server has to make something up. But, future
> iwarp cm's could pass it.
>
> What value bubbles up in this field from current iWARP connections? Is
> it total garbage, or at least, recognizably useless (-1)?

For iWARP it is currently the local device's max. When iWARP (hopefully)
gets a standard for this -- it will be the remote peer's IRD. This code
won't need to change.

>
> >> >-static void handle_connect_req(struct rdma_cm_id *new_cma_id)
> >> >+static void handle_connect_req(struct rdma_cm_id *new_cma_id, u8
> >client_ird)
> >>
> >> While "u8" might be the limit of the implementations today, wouldn't "int" be
> >> easier to understand, and more portable?
> >>
> >
> >I think that 'int' might provoke the "wrath of Chuck". Unsigned
> >something else --
>
> Okay, well if not "int", then at least some type which is not a specific
> storage size. u{8,16,32} etc should be reserved for describing hardware
> and on-wire stuff, and handled as native when used locally.
>
> Tom.