Return-Path:
Received: from userp1040.oracle.com ([156.151.31.81]:28304 "EHLO
	userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752211AbbKXBQp convert rfc822-to-8bit (ORCPT );
	Mon, 23 Nov 2015 20:16:45 -0500
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\))
Subject: Re: [PATCH v1 1/9] xprtrdma: Add a safety margin for receive buffers
From: Chuck Lever
In-Reply-To: <5653B586.705@talpey.com>
Date: Mon, 23 Nov 2015 20:16:42 -0500
Cc: linux-rdma@vger.kernel.org, Linux NFS Mailing List
Message-Id:
References: <20151123220627.32702.62667.stgit@manet.1015granger.net>
	<20151123221357.32702.59922.stgit@manet.1015granger.net>
	<5653B586.705@talpey.com>
To: Tom Talpey
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

> On Nov 23, 2015, at 7:55 PM, Tom Talpey wrote:
>
> On 11/23/2015 5:13 PM, Chuck Lever wrote:
>> Rarely, senders post a Send that is larger than the client's inline
>> threshold. That can be due to a bug, or the client and server may
>> not have communicated about their inline limits. RPC-over-RDMA
>> currently doesn't specify any particular limit on inline size, so
>> peers have to guess what it is.
>>
>> It is fatal to the connection if the size of a Send is larger than
>> the client's receive buffer. The sender is likely to retry with the
>> same message size, so the workload is stuck at that point.
>>
>> Follow Postel's robustness principle: Be conservative in what you
>> do, be liberal in what you accept from others. Increase the size of
>> client receive buffers by a safety margin, and add a warning when
>> the inline threshold is exceeded during receive.
>
> Safety is good, but how do we know the chosen value is enough?
> Isn't it better to fail the badly-composed request and be done
> with it? Even if the stupid sender loops, which it will do
> anyway.
It’s good enough to compensate for the most common sender bug, which
is that the sender did not account for the 28 bytes of the
RPC-over-RDMA header when it built the send buffer. The additional
100-byte margin is gravy.

The loop occurs because the server gets a completion error. The
client just sees a connection loss. There’s no way for it to know it
should fail the RPC, so it keeps trying.

Perhaps the server could remember that the reply failed, and when the
client retransmits, it could simply return that XID with an
RDMA_ERROR.

>> Note the Linux server's receive buffers are already page-sized.
>>
>> Signed-off-by: Chuck Lever
>> ---
>>  net/sunrpc/xprtrdma/rpc_rdma.c  |    7 +++++++
>>  net/sunrpc/xprtrdma/verbs.c     |    6 +++++-
>>  net/sunrpc/xprtrdma/xprt_rdma.h |    5 +++++
>>  3 files changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
>> index c10d969..a169252 100644
>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>> @@ -776,6 +776,7 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
>>  	int rdmalen, status;
>>  	unsigned long cwnd;
>>  	u32 credits;
>> +	RPC_IFDEBUG(struct rpcrdma_create_data_internal *cdata);
>>
>>  	dprintk("RPC: %s: incoming rep %p\n", __func__, rep);
>>
>> @@ -783,6 +784,12 @@ rpcrdma_reply_handler(struct rpcrdma_rep *rep)
>>  		goto out_badstatus;
>>  	if (rep->rr_len < RPCRDMA_HDRLEN_MIN)
>>  		goto out_shortreply;
>> +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
>> +	cdata = &r_xprt->rx_data;
>> +	if (rep->rr_len > cdata->inline_rsize)
>> +		pr_warn("RPC: %u byte reply exceeds inline threshold\n",
>> +			rep->rr_len);
>> +#endif
>>
>>  	headerp = rdmab_to_msg(rep->rr_rdmabuf);
>>  	if (headerp->rm_vers != rpcrdma_version)
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index eadd1655..e3f12e2 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -924,7 +924,11 @@ rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt)
>>  	if (rep == NULL)
>>  		goto out;
>>
>> -	rep->rr_rdmabuf = rpcrdma_alloc_regbuf(ia, cdata->inline_rsize,
>> +	/* The actual size of our receive buffers is increased slightly
>> +	 * to prevent small receive overruns from killing our connection.
>> +	 */
>> +	rep->rr_rdmabuf = rpcrdma_alloc_regbuf(ia, cdata->inline_rsize +
>> +					       RPCRDMA_RECV_MARGIN,
>>  					       GFP_KERNEL);
>>  	if (IS_ERR(rep->rr_rdmabuf)) {
>>  		rc = PTR_ERR(rep->rr_rdmabuf);
>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>> index ac7f8d4..1b72ab1 100644
>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>> @@ -337,6 +337,11 @@ struct rpcrdma_create_data_internal {
>>  #define RPCRDMA_INLINE_PAD_VALUE(rq)\
>>  	rpcx_to_rdmad(rq->rq_xprt).padding
>>
>> +/* To help prevent spurious connection shutdown, allow senders to
>> + * overrun our receive inline threshold by a small bit.
>> + */
>> +#define RPCRDMA_RECV_MARGIN (128)
>> +
>>  /*
>>   * Statistics for RPCRDMA
>>   */

--
Chuck Lever