Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932540AbdIFOTo (ORCPT ); Wed, 6 Sep 2017 10:19:44 -0400 Received: from mail-pf0-f193.google.com ([209.85.192.193]:37185 "EHLO mail-pf0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932286AbdIFOTk (ORCPT ); Wed, 6 Sep 2017 10:19:40 -0400 X-Google-Smtp-Source: ADKCNb5l0PZwFx3fNgQwBkaJyRBRVqVnVO1SF1XtPNE6KwsBjO06vdoJhmSgF+CYU+AicVUKMYWYtw== Date: Wed, 6 Sep 2017 22:19:57 +0800 From: Boqun Feng To: Alan Stern Cc: Peter Zijlstra , Jens Axboe , parri.andrea@gmail.com, linux-kernel@vger.kernel.org, tom.leiming@gmail.com, hch@lst.de, paulmck@linux.vnet.ibm.com, will.deacon@arm.com Subject: Re: [PATCH -v2] blk-mq: Start to fix memory ordering... Message-ID: <20170906141957.GV11771@tardis> References: <20170906080022.sotl4pvihs2nvg4m@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="G2d+m75Tjo442KmM" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.8.3 (2017-05-23) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5091 Lines: 135 --G2d+m75Tjo442KmM Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Sep 06, 2017 at 10:02:00AM -0400, Alan Stern wrote: > On Wed, 6 Sep 2017, Peter Zijlstra wrote: >=20 > > Attempt to untangle the ordering in blk-mq. The patch introducing the > > single smp_mb__before_atomic() is obviously broken in that it doesn't > > clearly specify a pairing barrier and an obtained guarantee. > >=20 > > The comment is further misleading in that it hints that the > > deadline store and the COMPLETE store also need to be ordered, but > > AFAICT there is no such dependency. However what does appear to be > > important is the clear happening _after_ the store, and that worked by > > pure accident. > >=20 > > This clarifies blk_mq_start_request() -- we should not get there with > > STARTING set -- this simplifies the code and makes the barrier usage > > sane (the old code could be read to allow not having _any_ atomic after > > the barrier, in which case the barrier hasn't got anything to order). We > > then also introduce the missing pairing barrier for it. > >=20 > > Also down-grade the barrier to smp_wmb(), this is cheaper for > > PowerPC/ARM and doesn't cost anything extra on x86. > >=20 > > And it documents the STARTING vs COMPLETE ordering. Although I've not > > been entirely successful in reverse engineering the blk-mq state > > machine so there might still be more funnies around timeout vs > > requeue. > >=20 > > If I got anything wrong, feel free to educate me by adding comments to > > clarify things ;-) > >=20 > > Cc: Alan Stern > > Cc: Will Deacon > > Cc: Ming Lei > > Cc: Christoph Hellwig > > Cc: Jens Axboe > > Cc: Andrea Parri > > Cc: Boqun Feng > > Cc: Bart Van Assche > > Cc: "Paul E. McKenney" > > Fixes: 538b75341835 ("blk-mq: request deadline must be visible before m= arking rq as started") > > Signed-off-by: Peter Zijlstra (Intel) > > --- > > - spelling; Andrea and Bart > > - compiles (urgh!) > > - smp_wmb(); Adrea > >=20 > >=20 > > block/blk-mq.c | 52 ++++++++++++++++++++++++++++++++++++++++-----= ------- > > block/blk-timeout.c | 2 +- > > 2 files changed, 41 insertions(+), 13 deletions(-) > >=20 > > diff --git a/block/blk-mq.c b/block/blk-mq.c > > index 4603b115e234..506a0f355117 100644 > > --- a/block/blk-mq.c > > +++ b/block/blk-mq.c > > @@ -558,22 +558,32 @@ void blk_mq_start_request(struct request *rq) > > =20 > > blk_add_timer(rq); > > =20 > > - /* > > - * Ensure that ->deadline is visible before set the started > > - * flag and clear the completed flag. > > - */ > > - smp_mb__before_atomic(); > > + WARN_ON_ONCE(test_bit(REQ_ATOM_STARTED, &rq->atomic_flags)); > > =20 > > /* > > * Mark us as started and clear complete. Complete might have been > > * set if requeue raced with timeout, which then marked it as > > * complete. So be sure to clear complete again when we start > > * the request, otherwise we'll ignore the completion event. > > + * > > + * Ensure that ->deadline is visible before we set STARTED, such that > > + * blk_mq_check_expired() is guaranteed to observe our ->deadline when > > + * it observes STARTED. > > */ > > - if (!test_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) > > - set_bit(REQ_ATOM_STARTED, &rq->atomic_flags); > > - if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags)) > > + smp_wmb(); > > + set_bit(REQ_ATOM_STARTED, &rq->atomic_flags); > > + if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags)) { > > + /* > > + * Coherence order guarantees these consecutive stores to a > > + * single variable propagate in the specified order. Thus the > > + * clear_bit() is ordered _after_ the set bit. See > > + * blk_mq_check_expired(). > > + * > > + * (the bits must be part of the same byte for this to be > > + * true). >=20 > Adding this comment is well and good, but for more security you should=20 > also add a comment (maybe even a compile-time check) to the place where= =20 > REQ_ATOM_STARTED and REQ_ATOM_COMPLETE are defined. Otherwise they=20 > might eventually get moved into separate bytes. >=20 How about adding: BUILD_BUG_ON((REQ_ATOM_STARTED / BITS_PER_BYTE) !=3D (REQ_ATOM_COMPLETE / = BITS_PER_BYTE)); here? Regards, Boqun > Alan Stern >=20 --G2d+m75Tjo442KmM Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEj5IosQTPz8XU1wRHSXnow7UH+rgFAlmwBAkACgkQSXnow7UH +rjLQggAsxyMod+JfYZgDWG8jP0eWGc3YwdG5hGqMPLRMcEOTiW/wAdF89DsABxS Z8kyaptLrU6wtAA8sGESzO9uL4j6+iJSo8Bi87SFmbdDBsOdkuL3WnZN9cjvzw0w gob1UHr/66WkaMU1aYaDWEvA6Ux3xDX/jidhUf1De772+a0XaRLXd/gx5ezK2kdU WpShePQiZO6SdWz16k5QLwYWgzemQdLE4bQrlv8c986vltiNmnVWlj/YquxjA6Of 4a0r4CIcLJvgshiUwkAhae0jhmUfqkvXFH7urI3Svl2OGD1x7n8UCM7EPPxuMVX9 G1R484v1CJeh1AH9Ky2RIB6G9CitDw== =4Ryd -----END PGP SIGNATURE----- --G2d+m75Tjo442KmM--