Subject: Re: [PATCH v3] SUNRPC: Fix a race in the receive code path
From: Chuck Lever
Date: Thu, 14 Dec 2017 21:16:52 -0500
To: Trond Myklebust, Anna Schumaker
Cc: Linux NFS Mailing List
In-Reply-To: <1513287230.15001.15.camel@primarydata.com>
Sender: linux-nfs-owner@vger.kernel.org

> On Dec 14, 2017, at 4:33 PM, Trond Myklebust wrote:
>
> On Thu, 2017-12-14 at 15:59 -0500, Chuck Lever wrote:
>>> On Dec 14, 2017, at 3:37 PM, Trond Myklebust wrote:
>>>
>>> On Thu, 2017-12-14 at 14:22 -0500, Chuck Lever wrote:
>>>>> On Dec 14, 2017, at 2:03 PM, Trond Myklebust wrote:
>>>>>
>>>>> Does the RDMA code update the connect cookie when the connection
>>>>> breaks? It looks to me as if it only does that when the connection
>>>>> is re-established. We really want both.
>>>>>
>>>>>> I imagine this issue could also impact write buffer exhaustion
>>>>>> on TCP.
>>>>>
>>>>> See net/sunrpc/xprtsock.c:xs_tcp_state_change()
>>>>
>>>> xprtrdma manipulates the connect_cookie in its connect worker,
>>>> see rpcrdma_connect_worker.
>>>> This was added by:
>>>>
>>>> commit 575448bd36208f99fe0dd554a43518d798966740
>>>> Author:     Tom Talpey
>>>> AuthorDate: Thu Oct 9 15:00:40 2008 -0400
>>>> Commit:     Trond Myklebust
>>>> CommitDate: Fri Oct 10 15:10:36 2008 -0400
>>>>
>>>>     RPC/RDMA: suppress retransmit on RPC/RDMA clients.
>>>>
>>>> Would it be more correct to bump the cookie in rpcrdma_conn_upcall,
>>>> which is the equivalent of xs_tcp_state_change? (If so, why, so I
>>>> can compose a reasonable patch description.)
>>>>
>>>> It could be bumped in the RDMA_CM_EVENT_ESTABLISHED and the
>>>> RDMA_CM_EVENT_DISCONNECTED cases, for example. I'm not sure
>>>> RDMA provides a distinction between "server disconnected"
>>>> and "client disconnected", although that probably does not
>>>> matter for this purpose.
>>>>
>>>> But why would the additional cookie update help? The transport
>>>> is not disconnecting before the deadlock.
>>>>
>>>
>>> The connection cookie's purpose is twofold:
>>>
>>> 1) It tracks whether or not a request has been transmitted on the
>>> current connection.
>>
>> That's broken by setting the cookie unconditionally outside
>> the transport_lock, isn't it?
>>
>>> 2) It ensures that when several requests with the same connection
>>> cookie all call xprt_conditional_disconnect(), that results in a
>>> single disconnection event. To do so, it assumes that
>>> xprt_autoclose() will change the cookie if the disconnection
>>> attempt is successful.
>>>
>>> In TCP we do so in xs_tcp_state_change(). If the RDMA transport
>>> can guarantee that the call to xprt->ops->close(xprt) is always
>>> successful, then you could do so there.
>>
>> I don't mind moving the cookie bump to rpcrdma_conn_upcall,
>> but I'm not sure I understand the locking requirements.
>>
>> Currently, xprt_transmit sets the connect_cookie while holding
>> the transport_lock.
>>
>> xprt_conditional_disconnect compares the cookie while holding
>> the transport_lock.
>>
>> For TCP, the transport_lock is held when bumping the cookie
>> in the ESTABLISHED case, but _not_ in the two CLOSE cases?
>
> That should be OK. The networking layer should provide sufficient
> serialisation that we don't have to worry about collisions.
>
>>
>> xprtrdma holds the transport_lock when bumping the cookie,
>> which it does in its connect worker. It has to hold the lock
>> because it skips the value 0. xprtrdma needs to guarantee
>> that an RPC is never transmitted on the same connection
>> twice (and maybe it could use rq_connect_cookie instead of
>> its own cookie).
>>
>> xprt_reserve_init is holding the reserve_lock but not the
>> transport_lock when it grabs the cookie. Maybe it should
>> not be initializing the rqst's cookie there?
>>
>> Seems to me that xprt_transmit needs to update the rqst's
>> cookie while holding the transport_lock, especially if
>> xprtrdma needs to skip a cookie value? I'm sure I'm missing
>> something.
>>
>
> It should be OK, given that the connection is a state machine.
> However, I missed something that you said earlier about
> xprt_prepare_transmit().
>
> OK. How about the following fixup patch instead of the earlier one?
>
> 8<---------------------------------------------------
> From 21cdb2802d9d8b71553998e6be5aafeff0742142 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust
> Date: Thu, 14 Dec 2017 07:05:27 -0500
> Subject: [PATCH] fixup!
SUNRPC: Fix a race in the receive code path
>
> ---
> net/sunrpc/xprt.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index 5e4278e9ce37..33b74fd84051 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -1001,6 +1001,7 @@ void xprt_transmit(struct rpc_task *task)
>  {
>  	struct rpc_rqst *req = task->tk_rqstp;
>  	struct rpc_xprt *xprt = req->rq_xprt;
> +	unsigned int connect_cookie;
>  	int status, numreqs;
>
>  	dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen);
> @@ -1024,7 +1025,7 @@ void xprt_transmit(struct rpc_task *task)
>  	} else if (!req->rq_bytes_sent)
>  		return;
>
> -	req->rq_connect_cookie = xprt->connect_cookie;
> +	connect_cookie = xprt->connect_cookie;
>  	req->rq_xtime = ktime_get();
>  	status = xprt->ops->send_request(task);
>  	trace_xprt_transmit(xprt, req->rq_xid, status);
> @@ -1050,6 +1051,7 @@ void xprt_transmit(struct rpc_task *task)
>  	xprt->stat.pending_u += xprt->pending.qlen;
>  	spin_unlock_bh(&xprt->transport_lock);
>
> +	req->rq_connect_cookie = connect_cookie;
>  	if (rpc_reply_expected(task) && !READ_ONCE(req->rq_reply_bytes_recvd)) {
>  		/*
>  		 * Sleep on the pending queue if we're expecting a reply.
> --
> 2.14.3

No problems here; this passed basic testing with NFSv4.0 on a client with
extra send_request fault injection. I hope we can get the recv race fix
(as updated here) and the queue-work-on patch [1] into v4.15-rc.

--
Chuck Lever

[1] https://marc.info/?l=linux-nfs&m=151241427912572&w=2