From: Sasha Levin
To: "linux-kernel@vger.kernel.org", "stable@vger.kernel.org"
CC: NeilBrown, Mike Snitzer, Sasha Levin
Subject: [PATCH AUTOSEL for 4.14 08/67] dm: ensure bio submission follows a depth-first tree walk
Date: Thu, 8 Mar 2018 04:57:34 +0000
Message-ID: <20180308045641.7814-8-alexander.levin@microsoft.com>
References: <20180308045641.7814-1-alexander.levin@microsoft.com>
In-Reply-To: <20180308045641.7814-1-alexander.levin@microsoft.com>

From: NeilBrown

[ Upstream commit 18a25da84354c6bb655320de6072c00eda6eb602 ]

A dm device can, in general, represent a tree of targets, each of which
handles a sub-range of the range of blocks handled by the parent.

The bio sequencing managed by generic_make_request() requires that bios
are generated and handled in a depth-first manner.  Each call to a
make_request_fn() may submit bios to a single member device, and may
submit bios for a reduced region of the same device as the
make_request_fn.

In particular, any bios submitted to member devices must be expected to
be processed in order, so a later one must never wait for an earlier
one.

This ordering is usually achieved by using bio_split() to reduce a bio
to a size that can be completely handled by one target, and resubmitting
the remainder to the originating device.  blk_queue_split() shows the
canonical approach.

dm doesn't follow this approach, largely because it has needed to split
bios since long before bio_split() was available.  It currently can
submit bios to separate targets within a single dm_make_request() call.
Dependencies between these targets, as can happen with dm-snap, can
cause deadlocks if either bio gets stuck behind the other in the queues
managed by generic_make_request().  This requires the 'rescue'
functionality provided by dm_offload_{start,end}.

Some of this requirement can be removed by changing the order of bio
submission to follow the canonical approach.  That is, if dm finds that
it needs to split a bio, the remainder should be sent to
generic_make_request() rather than being handled immediately.  This
delays the handling until the first part is completely processed, so the
deadlock problems do not occur.

__split_and_process_bio() can be called both from dm_make_request() and
from dm_wq_work().  When called from dm_wq_work() the current approach
is perfectly satisfactory as each bio will be processed immediately.
When called from dm_make_request(), current->bio_list will be non-NULL,
and in this case it is best to create a separate "clone" bio for the
remainder.

When we use bio_clone_bioset() to split off the front part of a bio,
chain the two together and submit the remainder to
generic_make_request(), it is important that the newly allocated bio is
used as the head to be processed immediately, and the original bio gets
"bio_advance()"d and sent to generic_make_request() as the remainder.
Otherwise, if the newly allocated bio is used as the remainder, and if
it then needs to be split again, then the next bio_clone_bioset() call
will be made while holding a reference to a bio (the result of the first
clone) from the same bioset.  This can potentially exhaust the bioset
mempool and result in a memory allocation deadlock.

Note that there is no race caused by reassigning ci.io->bio after
already calling __map_bio().  This bio will only be dereferenced again
after dec_pending() has found io->io_count to be zero, and this cannot
happen before the dec_pending() call at the end of
__split_and_process_bio().

To provide the clone bio when splitting, we use q->bio_split.  This was
previously being freed by bio-based dm to avoid having excess rescuer
threads.  As bio_split bio sets no longer create rescuer threads, there
is little cost and much gain from restoring the q->bio_split bio set.

Signed-off-by: NeilBrown
Signed-off-by: Mike Snitzer
Signed-off-by: Sasha Levin
---
 drivers/md/dm.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1dfc855ac708..902b6a5d3a4e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1497,8 +1497,29 @@ static void __split_and_process_bio(struct mapped_device *md,
 	} else {
 		ci.bio = bio;
 		ci.sector_count = bio_sectors(bio);
-		while (ci.sector_count && !error)
+		while (ci.sector_count && !error) {
 			error = __split_and_process_non_flush(&ci);
+			if (current->bio_list && ci.sector_count && !error) {
+				/*
+				 * Remainder must be passed to generic_make_request()
+				 * so that it gets handled *after* bios already submitted
+				 * have been completely processed.
+				 * We take a clone of the original to store in
+				 * ci.io->bio to be used by end_io_acct() and
+				 * for dec_pending to use for completion handling.
+				 * As this path is not used for REQ_OP_ZONE_REPORT,
+				 * the usage of io->bio in dm_remap_zone_report()
+				 * won't be affected by this reassignment.
+				 */
+				struct bio *b = bio_clone_bioset(bio, GFP_NOIO,
+								 md->queue->bio_split);
+				ci.io->bio = b;
+				bio_advance(bio, (bio_sectors(bio) - ci.sector_count) << 9);
+				bio_chain(b, bio);
+				generic_make_request(bio);
+				break;
+			}
+		}
 	}
 
 	/* drop the extra reference count */
@@ -1509,8 +1530,8 @@ static void __split_and_process_bio(struct mapped_device *md,
  *---------------------------------------------------------------*/
 
 /*
- * The request function that just remaps the bio built up by
- * dm_merge_bvec.
+ * The request function that remaps the bio to one target and
+ * splits off any remainder.
  */
 static blk_qc_t dm_make_request(struct request_queue *q, struct bio *bio)
 {
@@ -2044,12 +2065,6 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
 	case DM_TYPE_DAX_BIO_BASED:
 		dm_init_normal_md_queue(md);
 		blk_queue_make_request(md->queue, dm_make_request);
-		/*
-		 * DM handles splitting bios as needed. Free the bio_split bioset
-		 * since it won't be used (saves 1 process per bio-based DM device).
-		 */
-		bioset_free(md->queue->bio_split);
-		md->queue->bio_split = NULL;
 
 		if (type == DM_TYPE_DAX_BIO_BASED)
 			queue_flag_set_unlocked(QUEUE_FLAG_DAX, md->queue);
-- 
2.14.1
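
Illustrative note, not part of the patch: the submission order described in the commit message can be mimicked in user space. The sketch below is not kernel code and makes several assumptions. struct range, split_head() and submit_all() are hypothetical names, every target is assumed to span the same fixed number of sectors, and a plain loop stands in for generic_make_request() picking the remainder back up from current->bio_list. It only shows the "process the head now, advance the original, resubmit the remainder" ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio: a contiguous run of sectors. */
struct range {
	unsigned long start;	/* first sector */
	unsigned long count;	/* number of sectors */
};

/*
 * Split off the part of *r that fits inside the target containing
 * r->start, assuming (for illustration only) that every target spans
 * target_len sectors.  The head is returned as a new value, loosely
 * like the clone in the patch; *r is advanced in place and becomes the
 * remainder, loosely like bio_advance() on the original bio.
 */
static struct range split_head(struct range *r, unsigned long target_len)
{
	unsigned long room = target_len - (r->start % target_len);
	struct range head = { r->start, r->count < room ? r->count : room };

	r->start += head.count;
	r->count -= head.count;
	return head;
}

/*
 * Handle one head per pass and defer the remainder to the next pass.
 * Each loop iteration plays the role of the remainder being picked up
 * again after the head was fully handled; there is no real queue here.
 * The heads are recorded in out[] in the order they were handled, and
 * the number of heads is returned.
 */
static size_t submit_all(struct range r, unsigned long target_len,
			 struct range *out, size_t max_out)
{
	size_t n = 0;

	assert(target_len > 0);
	while (r.count && n < max_out)
		out[n++] = split_head(&r, target_len);
	return n;
}
```

With 8-sector targets, a range that starts at sector 6 and covers 20 sectors is handled as four pieces, (6, 2), (8, 8), (16, 8) and (24, 2), in that order. No piece is handled before the one in front of it has been split off, which is the ordering property the patch relies on.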