Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752158AbdIFIsy (ORCPT ); Wed, 6 Sep 2017 04:48:54 -0400 Received: from bombadil.infradead.org ([65.50.211.133]:36261 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750832AbdIFIsu (ORCPT ); Wed, 6 Sep 2017 04:48:50 -0400 Date: Wed, 6 Sep 2017 10:48:41 +0200 From: Peter Zijlstra To: Andrea Parri Cc: Jens Axboe , linux-kernel@vger.kernel.org, stern@rowland.harvard.edu, will.deacon@arm.com, tom.leiming@gmail.com, boqun.feng@gmail.com, paulmck@linux.vnet.ibm.com, hch@lst.de, bart.vanassche@wdc.com Subject: Re: [PATCH] blk-mq: Start to fix memory ordering... Message-ID: <20170906084841.upozocusxerkpqdp@hirez.programming.kicks-ass.net> References: <20170904090932.ngfefw2clot3sbqi@hirez.programming.kicks-ass.net> <20170906071304.GA3519@andrea> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170906071304.GA3519@andrea> User-Agent: NeoMutt/20170609 (1.8.3) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2611 Lines: 62 On Wed, Sep 06, 2017 at 09:13:04AM +0200, Andrea Parri wrote: > > + smp_mb__before_atomic(); > > I am wondering whether we should be using smp_wmb() instead: this would > provide the above guarantee and save a full barrier on powerpc/arm64. Right, did that. > > + set_bit(REQ_ATOM_STARTED, &rq->atomic_flags); > > + if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags)) { > > + /* > > + * Coherence order guarantees these consequtive stores to a > > + * singe variable propagate in the specified order. Thus the > > + * clear_bit() is ordered _after_ the set bit. See > > + * blk_mq_check_expired(). > > + */ > > clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags); > > It could be useful to stress that set_bit(), clear_bit() must "act" on > the same subword of the unsigned long (whatever "subword" means at this > level...) to rely on the coherence order (c.f., alpha's implementation). As I wrote to your initial reply (which I now saw was private) I went through the architectures again and found h8300 to use byte ops to implement all the bitops. So subword here means byte :/ The last time we looked at this was for PG_waiter and back then I think we settled on u32 (with Alpha for example using 32bit ll/sc ops). Linus moved PG_waiters to the same byte though, so that is all fine. b91e1302ad9b ("mm: optimize PageWaiters bit use for unlock_page()") > > + if (time_after_eq(jiffies, deadline)) { > > + if (!blk_mark_rq_complete(rq)) { > > + /* > > + * Relies on the implied MB from test_and_clear() to > > + * order the COMPLETE load against the STARTED load. > > + * Orders against the coherence order in > > + * blk_mq_start_request(). > > I understand "from test_and_set_bit()" (in blk_mark_rq_complete()) and > that the interested cycle is: > > /* in blk_mq_start_request() */ > [STORE STARTED bit = 1 into atomic_flags] > -->co [STORE COMPLETE bit = 0 into atomic_flags] > /* in blk_mq_check_expired() */ > -->rf [LOAD COMPLETE bit = 0 from atomic_flags] > -->po-loc [LOAD STARTED bit = 0 from atomic_flags] > /* in blk_mq_start_request() again */ > -->fr [STORE STARTED bit = 1 into atomic_flags] > > (N.B. Assume all accesses happen to/from the same subword.) > > This cycle being forbidden by the "coherence check", I'd say we do not > need to rely on the MB mentioned by the comment; what am I missing? Nothing, I forgot about the read-after-read thing but did spot the MB. Either one suffices to guarantee the order we need. It just needs to be documented as being relied upon.