Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754027AbbERPg5 (ORCPT ); Mon, 18 May 2015 11:36:57 -0400 Received: from cantor2.suse.de ([195.135.220.15]:54248 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752344AbbERPgt (ORCPT ); Mon, 18 May 2015 11:36:49 -0400 Date: Mon, 18 May 2015 17:36:44 +0200 From: Jan Kara To: Mike Snitzer Cc: Jan Kara , dm-devel@redhat.com, Jens Axboe , linux-kernel@vger.kernel.org, Chris Mason Subject: Re: [PATCH for-4.2 01/14] block: remove management of bi_remaining when restoring original bi_end_io Message-ID: <20150518153644.GA10187@quack.suse.cz> References: <1431637512-64245-1-git-send-email-snitzer@redhat.com> <1431637512-64245-2-git-send-email-snitzer@redhat.com> <20150518072237.GE26351@quack.suse.cz> <20150518131358.GA13998@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150518131358.GA13998@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4905 Lines: 87 On Mon 18-05-15 09:13:59, Mike Snitzer wrote: > On Mon, May 18 2015 at 3:22am -0400, > Jan Kara wrote: > > > On Thu 14-05-15 17:04:59, Mike Snitzer wrote: > > > Commit c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for > > > non-chains") regressed all existing callers that followed this pattern: > > > 1) saving a bio's original bi_end_io > > > 2) wiring up an intermediate bi_end_io > > > 3) restoring the original bi_end_io from intermediate bi_end_io > > > 4) calling bio_endio() to execute the restored original bi_end_io > > > > > > The regression was due to BIO_CHAIN only ever getting set if > > > bio_inc_remaining() is called. For the above pattern it isn't set until > > > step 3 above (step 2 would've needed to establish BIO_CHAIN). As such > > > the first bio_endio(), in step 2 above, never decremented __bi_remaining > > > before calling the intermediate bi_end_io -- leaving __bi_remaining with > > > the value 1 instead of 0. When bio_inc_remaining() occurred during step > > > 3 it brought it to a value of 2. When the second bio_endio() was > > > called, in step 4 above, it should've called the original bi_end_io but > > > it didn't because there was an extra reference that wasn't dropped (due > > > to atomic operations being optimized away since BIO_CHAIN wasn't set > > > upfront). > > > > > > Fix this issue by removing the __bi_remaining management complexity for > > > all callers that use the above pattern -- bio_chain() is the only > > > interface that _needs_ to be concerned with __bi_remaining. For the > > > above pattern callers just expect the bi_end_io they set to get called! > > > Remove bio_endio_nodec() and also remove all bio_inc_remaining() calls > > > that aren't associated with the bio_chain() interface. > > > > > > The bio_inc_remaining() interface has been left exported because it is > > > still useful for more elaborate uses of bio_chain() -- it will be used > > > in an upcoming DM commit "dm thin: range discard support". > > > > > > Fixes: c4cf5261 ("bio: skip atomic inc/dec of ->bi_remaining for non-chains") > > > Signed-off-by: Mike Snitzer > > > Cc: Jan Kara > > > Cc: Chris Mason > > One question: What happens if you stack dm-thin on top of e.g. dm-linear? > > dm-thin will do it's thing to a bio and passes it to dm-linear. That will > > split & chain the bio so BIO_CHAIN will be set. And on IO completion you > > will have troubles in dm-thinp as now bi_remaining gets decremented in > > bio_endio(). That's the reason why I suggested that we should clear > > BIO_CHAIN once bi_remaining hits zero... > > I think you need to be more precise in explaining the scenario you're > concerned about. Could be there is an issue but I'm not seeing it yet. > > Are you referring to the patch that makes DM thinp use the proposed > blkdev_issue_discard_async() interface? The bios issued to DM linear > are generated by blkdev_issue_discard_async(). By using bio_chain() > they establish ancestory with the parent DM thinp bio (which has > had BIO_CHAIN set even before calling blkdev_issue_discard_async because > there is potential for DM thinp to complete the parent bio before all N > blkdev_issue_discard_async() generated bios complete -- so that is why > DM thinp itself takes an extra reference on the parent bio using > bio_inc_remaining() before calling blkdev_issue_discard_async) No, I'm not referring to your proposed interface. I'm referring to current kernel + your patch to remove bio_inc_remaining() from all the dm targets. Ah, after checking again I see where misunderstanding may have come from - the device below has to be handled by drivers/md/linear.c which is MD linear driver, not DM one. I confused those two. Anyway here is the failure I envision (and frankly, I don't understand dm details much so I may be just completely wrong but I'd like to understand what prevents the following from happening): * We have dm-thin stacked on top of drivers/dm/linear.c * FS issues bio to dm-thin. remap_and_issue_overwrite() sets bi_end_io to overwrite_endio. dm-thin eventually calls generic_make_request(bio). * Now linear_make_request() gets called and it ends up calling bio_chain(split, bio). This sets BIO_CHAIN on bio. * IO for all chained bios is completed. So bio->bi_remaining is now zero, bio still has BIO_CHAIN set and overwrite_endio gets called. * process_prepared_mapping() will eventually try to call original bi_end_io callback but that never happens because bi_remaining is 0 and BIO_CHAIN remained set. Honza -- Jan Kara SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/