Return-Path: Received: from aserp1040.oracle.com ([141.146.126.69]:46984 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752463AbbKXBom convert rfc822-to-8bit (ORCPT ); Mon, 23 Nov 2015 20:44:42 -0500 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: [PATCH v1 1/9] xprtrdma: Add a safety margin for receive buffers From: Chuck Lever In-Reply-To: <5653BBCB.8010107@talpey.com> Date: Mon, 23 Nov 2015 20:44:38 -0500 Cc: linux-rdma@vger.kernel.org, Linux NFS Mailing List Message-Id: References: <20151123220627.32702.62667.stgit@manet.1015granger.net> <20151123221357.32702.59922.stgit@manet.1015granger.net> <5653B586.705@talpey.com> <5653BBCB.8010107@talpey.com> To: Tom Talpey Sender: linux-nfs-owner@vger.kernel.org List-ID: > On Nov 23, 2015, at 8:22 PM, Tom Talpey wrote: > > On 11/23/2015 8:16 PM, Chuck Lever wrote: >> >>> On Nov 23, 2015, at 7:55 PM, Tom Talpey wrote: >>> >>> On 11/23/2015 5:13 PM, Chuck Lever wrote: >>>> Rarely, senders post a Send that is larger than the client's inline >>>> threshold. That can be due to a bug, or the client and server may >>>> not have communicated about their inline limits. RPC-over-RDMA >>>> currently doesn't specify any particular limit on inline size, so >>>> peers have to guess what it is. >>>> >>>> It is fatal to the connection if the size of a Send is larger than >>>> the client's receive buffer. The sender is likely to retry with the >>>> same message size, so the workload is stuck at that point. >>>> >>>> Follow Postel's robustness principal: Be conservative in what you >>>> do, be liberal in what you accept from others. Increase the size of >>>> client receive buffers by a safety margin, and add a warning when >>>> the inline threshold is exceeded during receive. >>> >>> Safety is good, but how do know the chosen value is enough? >>> Isn't it better to fail the badly-composed request and be done >>> with it? Even if the stupid sender loops, which it will do >>> anyway. >> >> It’s good enough to compensate for the most common sender bug, >> which is that the sender did not account for the 28 bytes of >> the RPC-over-RDMA header when it built the send buffer. The >> additional 100 byte margin is gravy. > > I think it's good to have sympathy and resilience to differing > designs on the other end of the wire, but I fail to have it for > stupid bugs. Unless this can take down the receiver, fail it fast. > > MHO. See above: the client can’t tell why it’s failed. Again, the Send on the server side fails with LOCAL_LEN_ERR and the connection is terminated. The client sees only the connection loss. There’s no distinction between this and a cable pull or a server crash, where you do want the client to retransmit. I agree it’s a dumb server bug. But the idea is to preserve the connection, since NFS retransmits are a hazard. Just floating this idea here, this is v1. This one can be dropped. >> The loop occurs because the server gets a completion error. >> The client just sees a connection loss. There’s no way for it >> to know it should fail the RPC, so it keeps trying. >> >> Perhaps the server could remember that the reply failed, and >> when the client retransmits, it can simply return that XID >> with an RDMA_ERROR. >> >> >>>> Note the Linux server's receive buffers are already page-sized. >>>> >>>> Signed-off-by: Chuck Lever >>>> --- >>>> net/sunrpc/xprtrdma/rpc_rdma.c | 7 +++++++ >>>> net/sunrpc/xprtrdma/verbs.c | 6 +++++- >>>> net/sunrpc/xprtrdma/xprt_rdma.h | 5 +++++ >>>> 3 files changed, 17 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c >>>> index c10d969..a169252 100644 >>>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c >>>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c >>>> @@ -776,6 +776,7 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep) >>>> int rdmalen, status; >>>> unsigned long cwnd; >>>> u32 credits; >>>> + RPC_IFDEBUG(struct rpcrdma_create_data_internal *cdata); >>>> >>>> dprintk("RPC: %s: incoming rep %p\n", __func__, rep); >>>> >>>> @@ -783,6 +784,12 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep) >>>> goto out_badstatus; >>>> if (rep->rr_len < RPCRDMA_HDRLEN_MIN) >>>> goto out_shortreply; >>>> +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG) >>>> + cdata = &r_xprt->rx_data; >>>> + if (rep->rr_len > cdata->inline_rsize) >>>> + pr_warn("RPC: %u byte reply exceeds inline threshold\n", >>>> + rep->rr_len); >>>> +#endif >>>> >>>> headerp = rdmab_to_msg(rep->rr_rdmabuf); >>>> if (headerp->rm_vers != rpcrdma_version) >>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c >>>> index eadd1655..e3f12e2 100644 >>>> --- a/net/sunrpc/xprtrdma/verbs.c >>>> +++ b/net/sunrpc/xprtrdma/verbs.c >>>> @@ -924,7 +924,11 @@ rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt) >>>> if (rep == NULL) >>>> goto out; >>>> >>>> - rep->rr_rdmabuf = rpcrdma_alloc_regbuf(ia, cdata->inline_rsize, >>>> + /* The actual size of our receive buffers is increased slightly >>>> + * to prevent small receive overruns from killing our connection. >>>> + */ >>>> + rep->rr_rdmabuf = rpcrdma_alloc_regbuf(ia, cdata->inline_rsize + >>>> + RPCRDMA_RECV_MARGIN, >>>> GFP_KERNEL); >>>> if (IS_ERR(rep->rr_rdmabuf)) { >>>> rc = PTR_ERR(rep->rr_rdmabuf); >>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h >>>> index ac7f8d4..1b72ab1 100644 >>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h >>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h >>>> @@ -337,6 +337,11 @@ struct rpcrdma_create_data_internal { >>>> #define RPCRDMA_INLINE_PAD_VALUE(rq)\ >>>> rpcx_to_rdmad(rq->rq_xprt).padding >>>> >>>> +/* To help prevent spurious connection shutdown, allow senders to >>>> + * overrun our receive inline threshold by a small bit. >>>> + */ >>>> +#define RPCRDMA_RECV_MARGIN (128) >>>> + >>>> /* >>>> * Statistics for RPCRDMA >>>> */ >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>>> >> >> -- >> Chuck Lever >> >> >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> -- Chuck Lever