In-Reply-To: <20170307165233.GB30230@redhat.com>
References: <87h93blz6g.fsf@notabene.neil.brown.name> <71562c2c-97f4-9a0a-32ec-30e0702ca575@profitbricks.com> <87lgsjj9w8.fsf@notabene.neil.brown.name> <20170307165233.GB30230@redhat.com>
Date: Wed, 8 Mar 2017 12:46:33 +0100
Subject: Re: blk: improve order of bio handling in generic_make_request()
From: Lars Ellenberg
To: Mike Snitzer
Cc: Jens Axboe, Jack Wang, NeilBrown, LKML, Kent Overstreet, Pavel Machek, Mikulas Patocka

On 7 March 2017 at 17:52, Mike Snitzer wrote:

> > On 06.03.2017 21:18, Jens Axboe wrote:
> > > I like the change, and thanks for tackling this. It's been a pending
> > > issue for way too long. I do think we should squash Jack's patch
> > > into the original, as it does clean up the code nicely.
> > >
> > > Do we have a proper test case for this, so we can verify that it
> > > does indeed also work in practice?
> > >
> > Hi Jens,
> >
> > I can trigger deadlock with in RAID1 with test below:
> >
> > I create one md with one local loop device and one remote scsi
> > exported by SRP. running fio with mix rw on top of md, force_close
> > session on storage side. mdx_raid1 is wait on free_array in D state,
> > and a lot of fio also in D state in wait_barrier.
> >
> > With the patch from Neil above, I can no longer trigger it anymore.
> >
> > The discussion was in link below:
> > http://www.spinics.net/lists/raid/msg54680.html
>
> In addition to Jack's MD raid test there is a DM snapshot deadlock test,
> albeit unpolished/needy to get running, see:
> https://www.redhat.com/archives/dm-devel/2017-January/msg00064.html
>
> But to actually test block core's ability to handle this, upstream
> commit d67a5f4b5947aba4bfe9a80a2b86079c215ca755 ("dm: flush queued bios
> when process blocks to avoid deadlock") would need to be reverted.
>
> Also, I know Lars had a drbd deadlock too. Not sure if Jack's MD test
> is sufficient to coverage for drbd. Lars?
>

As this is just a slightly different implementation, trading some bytes
of stack for more local, self-contained, "obvious" code changes (good job!),
but follows the same basic idea as my original RFC [*]
(see the "inspired-by" tag)
I have no doubt it fixes the issues we are able to provoke with DRBD.

[*] https://lkml.org/lkml/2016/7/19/263
(where I also already suggest to fix the device-mapper issues
by losing the in-device-mapper loop,
relying on the loop in generic_make_request())

Cheers,
Lars