From: Xianting Tian
To: axboe@kernel.dk
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] block: don't read block device if it's invalid
Date: Tue, 11 Aug 2020 09:43:06 -0400
Message-Id: <1597153386-87954-1-git-send-email-xianting_tian@126.com>
We found several processes stuck in 'D' state after an nvme device was
hot-removed; the call traces are shown below. Process 848 acquired the
lock 'bdev->bd_mutex' in blkdev_reread_part() and was then scheduled out
waiting for I/O to complete. That I/O can never complete because the
device has been hot-removed, so 'bdev->bd_mutex' is never unlocked. As a
result, every other process that needs the same 'bdev->bd_mutex' blocks
on that lock.

When an nvme device is hot-removed, the kernel starts a thread to handle
the device removal, as the call trace of process 1111504 shows below.
The relevant part of the nvme_kill_queues() path is listed in detail
here: 'NVME_NS_DEAD' is set, and when nvme_revalidate_disk() later runs
and finds 'NVME_NS_DEAD' set, it calls 'set_capacity(disk, 0)' to set
the disk capacity to 0.

nvme_kill_queues()
  if (test_and_set_bit(NVME_NS_DEAD, &ns->flags))
    return;
  revalidate_disk(disk)
    disk->fops->revalidate_disk(disk)  <== for nvme, this is nvme_revalidate_disk()
    mutex_lock(&bdev->bd_mutex)

This patch reduces the probability of this problem. Before taking
'bdev->bd_mutex' in blkdev_reread_part(), check whether the capacity of
the disk is 0 and, if so, return immediately. This avoids the scenario
above: once the nvme device has been hot-removed and its capacity has
already been set to 0, a process like 848 that wants to re-read the
device returns early from blkdev_reread_part() and never takes
'bdev->bd_mutex', a lock it could never release because its I/O cannot
complete.
cat /proc/848/stack
[] io_schedule+0x16/0x40
[] do_read_cache_page+0x3ee/0x5e0
[] read_cache_page+0x15/0x20
[] read_dev_sector+0x2d/0xa0
[] read_lba+0x104/0x1c0
[] find_valid_gpt+0xfa/0x720
[] efi_partition+0x89/0x430
[] check_partition+0x100/0x1f0
[] rescan_partitions+0xb4/0x360
[] __blkdev_reread_part+0x64/0x70
[] blkdev_reread_part+0x23/0x40  <<== mutex_lock(&bdev->bd_mutex);
[] blkdev_ioctl+0x44b/0x8e0
[] block_ioctl+0x41/0x50
[] do_vfs_ioctl+0xa7/0x5e0
[] SyS_ioctl+0x79/0x90
[] entry_SYSCALL_64_fastpath+0x1f/0xb9
[] 0xffffffffffffffff

cat /proc/1111504/stack
[] revalidate_disk+0x49/0x80  <<== mutex_lock(&bdev->bd_mutex);
[] nvme_kill_queues+0x52/0x80 [nvme_core]
[] nvme_remove_namespaces+0x44/0x50 [nvme_core]
[] nvme_remove+0x85/0x130 [nvme]
[] pci_device_remove+0x39/0xc0
[] device_release_driver_internal+0x141/0x210
[] device_release_driver+0x12/0x20
[] pci_stop_bus_device+0x8c/0xa0
[] pci_stop_and_remove_bus_device+0x12/0x20
[] pciehp_unconfigure_device+0x7a/0x1e0
[] pciehp_disable_slot+0x52/0xd0
[] pciehp_power_thread+0x8a/0xb0
[] process_one_work+0x14e/0x370
[] worker_thread+0x4d/0x3f0
[] kthread+0x109/0x140
[] ret_from_fork+0x2a/0x40
[] 0xffffffffffffffff

cat /proc/1197767/stack
[] __blkdev_get+0x6e/0x450  <<== mutex_lock_nested(&bdev->bd_mutex, for_part);
[] blkdev_get+0x1a4/0x300
[] blkdev_open+0x7a/0xa0
[] do_dentry_open+0x20f/0x330
[] vfs_open+0x50/0x70
[] path_openat+0x548/0x13b0
[] do_filp_open+0x91/0x100
[] do_sys_open+0x124/0x210
[] SyS_open+0x1e/0x20
[] do_syscall_64+0x6c/0x1b0
[] entry_SYSCALL64_slow_path+0x25/0x25
[] 0xffffffffffffffff

ps -eo pid,comm,state | grep ' D'
    848 systemd-udevd  D
1111504 kworker/10:1   D
1197767 isdct          D
1198830 isdct          D
1580322 xxd            D
1616804 kworker/10:0   D
1626264 isdct          D
1734726 kworker/10:2   D
2197993 isdct          D
2662117 xxd            D
3083718 xxd            D
3189834 xxd            D

Signed-off-by: Xianting Tian
---
 block/ioctl.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/ioctl.c b/block/ioctl.c
index bdb3bbb..159bceb 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -94,6 +94,9 @@ static int blkdev_reread_part(struct block_device *bdev)
 {
 	int ret;
 
+	if (unlikely(!get_capacity(bdev->bd_disk)))
+		return -EIO;
+
 	if (!disk_part_scan_enabled(bdev->bd_disk) || bdev != bdev->bd_contains)
 		return -EINVAL;
 	if (!capable(CAP_SYS_ADMIN))
-- 
1.8.3.1