From: NeilBrown
To: Mike Snitzer, Jens Axboe
Cc: Jack Wang, LKML, Lars Ellenberg, Kent Overstreet, Pavel Machek, Mikulas Patocka
Date: Wed, 08 Mar 2017 07:29:55 +1100
Subject: Re: blk: improve order of bio handling in generic_make_request()
In-Reply-To: <20170307171436.GA2109@redhat.com>
References: <87h93blz6g.fsf@notabene.neil.brown.name> <71562c2c-97f4-9a0a-32ec-30e0702ca575@profitbricks.com> <87lgsjj9w8.fsf@notabene.neil.brown.name> <20170307165233.GB30230@redhat.com> <5cfbdc6b-9ba7-605a-642b-7f625cf5f5b7@kernel.dk> <20170307171436.GA2109@redhat.com>
Message-ID: <87tw74j0e4.fsf@notabene.neil.brown.name>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 07 2017, Mike Snitzer wrote:

> On Tue, Mar 07 2017 at 12:05pm -0500,
> Jens Axboe wrote:
>
>> On 03/07/2017 09:52 AM, Mike Snitzer wrote:
>> > On Tue, Mar 07 2017 at 3:49am -0500,
>> > Jack Wang wrote:
>> >
>> >>
>> >>
>> >> On 06.03.2017 21:18, Jens Axboe wrote:
>> >>> On 03/05/2017 09:40 PM, NeilBrown wrote:
>> >>>> On Fri, Mar 03 2017, Jack Wang wrote:
>> >>>>>
>> >>>>> Thanks Neil for pushing the fix.
>> >>>>>
>> >>>>> We can optimize generic_make_request a little bit:
>> >>>>> - assign bio_list struct hold directly instead init and merge
>> >>>>> - remove duplicate code
>> >>>>>
>> >>>>> I think better to squash into your fix.
>> >>>>
>> >>>> Hi Jack,
>> >>>> I don't object to your changes, but I'd like to see a response from
>> >>>> Jens first.
>> >>>> My preference would be to get the original patch in, then other changes
>> >>>> that build on it, such as this one, can be added. Until the core
>> >>>> changes lands, any other work is pointless.
>> >>>>
>> >>>> Of course if Jens wants a this merged before he'll apply it, I'll
>> >>>> happily do that.
>> >>>
>> >>> I like the change, and thanks for tackling this. It's been a pending
>> >>> issue for way too long. I do think we should squash Jack's patch
>> >>> into the original, as it does clean up the code nicely.
>> >>>
>> >>> Do we have a proper test case for this, so we can verify that it
>> >>> does indeed also work in practice?
>> >>>
>> >> Hi Jens,
>> >>
>> >> I can trigger deadlock with in RAID1 with test below:
>> >>
>> >> I create one md with one local loop device and one remote scsi
>> >> exported by SRP. running fio with mix rw on top of md, force_close
>> >> session on storage side. mdx_raid1 is wait on free_array in D state,
>> >> and a lot of fio also in D state in wait_barrier.
>> >>
>> >> With the patch from Neil above, I can no longer trigger it anymore.
>> >>
>> >> The discussion was in link below:
>> >> http://www.spinics.net/lists/raid/msg54680.html
>> >
>> > In addition to Jack's MD raid test there is a DM snapshot deadlock test,
>> > albeit unpolished/needy to get running, see:
>> > https://www.redhat.com/archives/dm-devel/2017-January/msg00064.html
>>
>> Can you run this patch with that test, reverting your DM workaround?
>
> Yeap, will do. Last time Mikulas tried a similar patch it still
> deadlocked. But I'll give it a go (likely tomorrow).

I don't think this will fix the DM snapshot deadlock by itself.
Rather, it makes it possible for some internal changes to DM to fix it.
The DM change might be something vaguely like:

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3086da5664f3..06ee0960e415 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1216,6 +1216,14 @@ static int __split_and_process_non_flush(struct clone_info *ci)
 
 	len = min_t(sector_t, max_io_len(ci->sector, ti), ci->sector_count);
 
+	if (len < ci->sector_count) {
+		struct bio *split = bio_split(bio, len, GFP_NOIO, fs_bio_set);
+		bio_chain(split, bio);
+		generic_make_request(bio);
+		bio = split;
+		ci->sector_count = len;
+	}
+
 	r = __clone_and_map_data_bio(ci, ti, ci->sector, &len);
 	if (r < 0)
 		return r;

Instead of looping inside DM, this change causes the remainder to be
passed to generic_make_request(), and DM only handles one region at a time.
So there is only one loop, in the top generic_make_request().
That loop will now reliably handle bios in the "right" order.

Thanks,
NeilBrown
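
To make the single-loop point a bit more concrete, here is a toy
user-space model in C. It is only a sketch under stated assumptions:
toy_bio, toy_queue, toy_log and all the toy_* functions are made-up
names for illustration, not kernel APIs, and a fixed region size stands
in for max_io_len(). It does not model the ordering rules of the patch
under discussion; it only shows a driver handing the remainder back to
the top-level loop rather than looping over regions itself.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a bio: a run of sectors [sector, sector + count). */
struct toy_bio {
	unsigned long sector;
	unsigned long count;
};

#define TOY_MAX 64

/* FIFO of pending bios, owned by the single top-level loop. */
struct toy_queue {
	struct toy_bio items[TOY_MAX];
	size_t head, tail;
};

/* Pieces handled so far, in the order they were handled. */
struct toy_log {
	struct toy_bio handled[TOY_MAX];
	size_t n;
};

/* Queue a bio for the top-level loop; returns -1 if the toy queue is full. */
static int toy_submit(struct toy_queue *q, struct toy_bio bio)
{
	if (q->tail >= TOY_MAX)
		return -1;
	q->items[q->tail++] = bio;
	return 0;
}

/* Sectors from 'sector' to the next region boundary (hypothetical max_io_len()). */
static unsigned long toy_max_io_len(unsigned long sector, unsigned long region)
{
	assert(region != 0);
	return region - (sector % region);
}

/*
 * One driver call: handle at most one region. If the bio crosses a
 * region boundary, hand the remainder back to the top-level loop
 * instead of looping here.
 */
static int toy_make_request(struct toy_queue *q, struct toy_bio bio,
			    unsigned long region, struct toy_log *log)
{
	unsigned long len = toy_max_io_len(bio.sector, region);

	if (len < bio.count) {
		struct toy_bio rest = { bio.sector + len, bio.count - len };

		if (toy_submit(q, rest) < 0)
			return -1;
		bio.count = len;	/* keep only the front piece */
	}
	if (log->n >= TOY_MAX)
		return -1;
	log->handled[log->n++] = bio;
	return 0;
}

/* The only loop: drain the queue until no bios remain. */
static int toy_generic_make_request(struct toy_bio bio, unsigned long region,
				    struct toy_log *log)
{
	struct toy_queue q = { .head = 0, .tail = 0 };

	if (region == 0 || toy_submit(&q, bio) < 0)
		return -1;
	while (q.head != q.tail) {
		if (toy_make_request(&q, q.items[q.head++], region, log) < 0)
			return -1;
	}
	return 0;
}
```

Calling toy_generic_make_request() with a bio of 20 sectors starting at
sector 5 and a region size of 8 should leave four pieces in the log,
each within one region, and toy_make_request() never loops.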