Date: Wed, 16 Feb 2022 14:33:11 +0100
From: Christoph Hellwig
To: Markus Blöchl
Cc: Christoph Hellwig, Hannes Reinecke, Keith Busch, Jens Axboe,
    Sagi Grimberg, linux-nvme@lists.infradead.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    Stefan Roese
Subject: Re: [RFC PATCH] nvme: prevent hang on surprise removal of NVMe disk
Message-ID: <20220216133311.GA16154@lst.de>
References: <20220214095107.3t5en5a3tosaeoo6@ipetronik.com>
    <20220215201738.GA26945@lst.de>
    <20220216125937.d6brzu7labgywxcg@ipetronik.com>
In-Reply-To: <20220216125937.d6brzu7labgywxcg@ipetronik.com>
X-Mailing-List: linux-kernel@vger.kernel.org

I'd do something like this, which gets us a properly documented
interface (the del_gendisk change will be split into a separate patch):

diff --git a/block/blk-core.c b/block/blk-core.c
index d93e3bb9a769b..15d5c5ba5bbe5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -284,12 +284,19 @@ void blk_queue_start_drain(struct request_queue *q)
 	wake_up_all(&q->mq_freeze_wq);
 }
 
-void blk_set_queue_dying(struct request_queue *q)
+/**
+ * blk_mark_disk_dead - mark a disk as dead
+ * @disk: disk to mark as dead
+ *
+ * Mark a disk as dead (e.g. surprise removed) and don't accept any new I/O
+ * to this disk.
+ */
+void blk_mark_disk_dead(struct gendisk *disk)
 {
-	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
-	blk_queue_start_drain(q);
+	set_bit(GD_DEAD, &disk->state);
+	blk_queue_start_drain(disk->queue);
 }
-EXPORT_SYMBOL_GPL(blk_set_queue_dying);
+EXPORT_SYMBOL_GPL(blk_mark_disk_dead);
 
 /**
  * blk_cleanup_queue - shutdown a request queue
@@ -308,7 +315,8 @@ void blk_cleanup_queue(struct request_queue *q)
 	WARN_ON_ONCE(blk_queue_registered(q));
 
 	/* mark @q DYING, no new request or merges will be allowed afterwards */
-	blk_set_queue_dying(q);
+	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
+	blk_queue_start_drain(q);
 
 	blk_queue_flag_set(QUEUE_FLAG_NOMERGES, q);
 	blk_queue_flag_set(QUEUE_FLAG_NOXMERGES, q);
diff --git a/block/genhd.c b/block/genhd.c
index 626c8406f21a6..35e49b7f1112a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -584,7 +584,8 @@ void del_gendisk(struct gendisk *disk)
 	blk_drop_partitions(disk);
 	mutex_unlock(&disk->open_mutex);
 
-	fsync_bdev(disk->part0);
+	if (!test_bit(GD_DEAD, &disk->state))
+		fsync_bdev(disk->part0);
 	__invalidate_device(disk->part0, true);
 
 	/*
diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index e6005c2323281..2b588b62cbbb2 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -4112,7 +4112,7 @@ static void mtip_pci_remove(struct pci_dev *pdev)
 			"Completion workers still active!\n");
 	}
 
-	blk_set_queue_dying(dd->queue);
+	blk_mark_disk_dead(dd->disk);
 	set_bit(MTIP_DDF_REMOVE_PENDING_BIT, &dd->dd_flag);
 
 	/* Clean up the block layer. */
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 4203cdab8abfd..b844432bad20b 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -7185,7 +7185,7 @@ static ssize_t do_rbd_remove(struct bus_type *bus,
 		 * IO to complete/fail.
 		 */
 		blk_mq_freeze_queue(rbd_dev->disk->queue);
-		blk_set_queue_dying(rbd_dev->disk->queue);
+		blk_mark_disk_dead(rbd_dev->disk);
 	}
 
 	del_gendisk(rbd_dev->disk);
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index ccd0dd0c6b83c..ca71a0585333f 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2126,7 +2126,7 @@ static void blkfront_closing(struct blkfront_info *info)
 
 	/* No more blkif_request(). */
 	blk_mq_stop_hw_queues(info->rq);
-	blk_set_queue_dying(info->rq);
+	blk_mark_disk_dead(info->gd);
 	set_capacity(info->gd, 0);
 
 	for_each_rinfo(info, rinfo, i) {
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index dcbd6d201619d..997ace47bbd54 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2077,7 +2077,7 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
 	set_bit(DMF_FREEING, &md->flags);
 	spin_unlock(&_minor_lock);
 
-	blk_set_queue_dying(md->queue);
+	blk_mark_disk_dead(md->disk);
 
 	/*
 	 * Take suspend_lock so that presuspend and postsuspend methods
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 79005ea1a33e3..469f23186159c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4574,7 +4574,7 @@ static void nvme_set_queue_dying(struct nvme_ns *ns)
 	if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
 		return;
 
-	blk_set_queue_dying(ns->queue);
+	blk_mark_disk_dead(ns->disk);
 	nvme_start_ns_queue(ns);
 
 	set_capacity_and_notify(ns->disk, 0);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index f8bf6606eb2fc..ff775235534cf 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -848,7 +848,7 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 {
 	if (!head->disk)
 		return;
-	blk_set_queue_dying(head->disk->queue);
+	blk_mark_disk_dead(head->disk);
 	/* make sure all pending bios are cleaned up */
 	kblockd_schedule_work(&head->requeue_work);
 	flush_work(&head->requeue_work);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f35aea98bc351..16b47035e4b06 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -748,7 +748,8 @@ extern bool blk_queue_can_use_dma_map_merging(struct request_queue *q,
 
 bool __must_check blk_get_queue(struct request_queue *);
 extern void blk_put_queue(struct request_queue *);
-extern void blk_set_queue_dying(struct request_queue *);
+
+void blk_mark_disk_dead(struct gendisk *disk);
 
 #ifdef CONFIG_BLOCK
 /*
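
For readers who want to see the intended behaviour in isolation, here is a
rough userspace sketch of the semantics the patch above aims for. It is not
kernel code: struct model_disk, struct model_queue and the model_* functions
are hypothetical stand-ins invented for illustration. The refusal of new I/O
in model_submit_io() is an assumption based only on the kernel-doc comment
("don't accept any new I/O to this disk"), not on code shown in the patch.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_queue {
	bool draining;		/* models the effect of blk_queue_start_drain() */
};

struct model_disk {
	bool dead;		/* models the GD_DEAD bit in disk->state */
	int flushes;		/* counts how many flushes were attempted */
	struct model_queue queue;
};

/* Models blk_mark_disk_dead(): flag the disk, then start draining its queue. */
static void model_mark_disk_dead(struct model_disk *disk)
{
	disk->dead = true;
	disk->queue.draining = true;
}

/*
 * Models the submission side (assumption): new I/O is accepted only while
 * the disk has not been marked dead.
 */
static bool model_submit_io(const struct model_disk *disk)
{
	return !disk->dead;
}

/* Models the del_gendisk() hunk: skip the flush when the disk is dead. */
static void model_del_disk(struct model_disk *disk)
{
	if (!disk->dead)
		disk->flushes++;	/* stands in for fsync_bdev(disk->part0) */
}
```

A caller would mark the disk dead on surprise removal and only then tear it
down. In this model the teardown path then never tries to flush to hardware
that is already gone, which is presumably where the reported hang came from.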