Return-Path: Received: from aserp1040.oracle.com ([141.146.126.69]:37863 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750923AbbJASbw convert rfc822-to-8bit (ORCPT ); Thu, 1 Oct 2015 14:31:52 -0400 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2104\)) Subject: Re: [PATCH v1 03/18] xprtrdma: Remove completion polling budgets From: Chuck Lever In-Reply-To: <20151001181548.GA8670@obsidianresearch.com> Date: Thu, 1 Oct 2015 14:31:41 -0400 Cc: linux-rdma , Devesh Sharma , Sagi Grimberg , Linux NFS Mailing List Message-Id: References: <20150917204435.19671.56195.stgit@manet.1015granger.net> <55FE8C0F.1050706@dev.mellanox.co.il> <0804C887-9E32-4257-96D2-6C1FBC9CB271@oracle.com> <20151001171310.GA8428@obsidianresearch.com> <20151001181548.GA8670@obsidianresearch.com> To: Jason Gunthorpe Sender: linux-nfs-owner@vger.kernel.org List-ID: > On Oct 1, 2015, at 2:15 PM, Jason Gunthorpe wrote: > > On Thu, Oct 01, 2015 at 01:36:26PM -0400, Chuck Lever wrote: > >>>> A missed WC will result in an RPC/RDMA transport deadlock. In >>>> fact that is the reason for this particular patch (although >>>> it addresses only one source of missed WCs). So I would like >>>> to see that there are no windows here. >>> >>> WCs are never missed. >> >> The review comment earlier in this thread suggested there is >> a race condition where a WC can be “delayed” resulting in, >> well, I’m still not certain what the consequences are. > > Yes. The consequence would typically be lockup of CQ processing. > >>> while (1) { >>> struct ib_wc wcs[100]; >>> int rc = ib_poll_cq(cw, NELEMS(wcs), wcs); >>> >>> .. process rc wcs .. >>> >>> if (rc != NELEMS(wcs)) >>> if (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP | >>> IB_CQ_REPORT_MISSED_EVENTS) == 0) >>> break; >>> } >>> >>> API wise, we should probably look at forcing >>> IB_CQ_REPORT_MISSED_EVENTS on and dropping the flag. >> >> It’s been suggested that it’s not clear what a positive >> return value from ib_req_notify_cq() means when the >> REPORT_MISSED_EVENTS flags is set: does it mean that >> the CQ has been re-armed? I had assumed that a positive >> RC meant both missed events and a successful re-arm, >> but the pseudo-code above suggests that is not the >> case. > > The ULP must assume the CQ has NOT been armed after a positive return. OK, I will fix this when I revise 03/18. > What the driver does to the arm state is undefined - for instance the > driver may trigger a callback and still return 1 here. > > However, the driver must make this guarentee: > > If ib_req_notify_cq(IB_CQ_REPORT_MISSED_EVENTS) returns 0 then > the call back will always be called when the CQ is non-empty. > > The ULP must loop doing polling until the above happens, otherwise the > event notification may be missed. > > ie the above is guarnteed to close the WC delay/lockup race. > > Again, if there has been confusion on the driver side, drivers that > don't implement the above are broken. > > Review Roland's original commit comments on this feature. > > https://github.com/jgunthorpe/linux/commit/ed23a72778f3dbd465e55b06fe31629e7e1dd2f3 > > I'm not sure where we are at on the 'significant overhead for some > low-level drivers' issue, but assuming that is still the case, then > the recommendation is this: > > bool exiting = false; > while (1) { > struct ib_wc wcs[100]; > int rc = ib_poll_cq(cq, NELEMS(wcs), wcs); > if (rc == 0 && exiting) > break; > > .. process rc wcs .. > > if (rc != NELEMS(wcs)) { > ib_req_notify_cq(cq, IB_CQ_NEXT_COMP) > exiting = true; > } else > exiting = false; > } > > ie a double poll. > AFAIK, this is a common pattern in the ULPs.. Perhaps we should > implement this as a core API: > > struct ib_wc wcs[100]; > while ((rc = ib_poll_cq_and_arm(cq, NELEMS(wcs), wcs)) != 0) { > .. process rc wcs .. > > ib_poll_cq_and_arm reads wcs off the CQ. If it returns 0 then the > callback is guarenteed to happen when the CQ is non empty. This makes sense to me, especially if it means fewer times grabbing and releasing the CQ’s spinlock. Does a ULP want to continue polling if ib_poll_cq{_and_arm) returns a negative RC? — Chuck Lever