From: Yu Kuai
Subject: [PATCH -next v4 4/6] nbd: fix io hung while disconnecting device
Date: Sat, 28 May 2022 10:12:33 +0800
Message-ID:
 <20220528021235.2120995-5-yukuai3@huawei.com>
In-Reply-To: <20220528021235.2120995-1-yukuai3@huawei.com>
References: <20220528021235.2120995-1-yukuai3@huawei.com>
X-Mailer: git-send-email 2.31.1
X-Mailing-List: linux-kernel@vger.kernel.org

In our tests, "qemu-nbd" triggers an I/O hang:

INFO: task qemu-nbd:11445 blocked for more than 368 seconds.
      Not tainted 5.18.0-rc3-next-20220422-00003-g2176915513ca #884
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:qemu-nbd state:D stack: 0 pid:11445 ppid: 1 flags:0x00000000
Call Trace:
 __schedule+0x480/0x1050
 ? _raw_spin_lock_irqsave+0x3e/0xb0
 schedule+0x9c/0x1b0
 blk_mq_freeze_queue_wait+0x9d/0xf0
 ? ipi_rseq+0x70/0x70
 blk_mq_freeze_queue+0x2b/0x40
 nbd_add_socket+0x6b/0x270 [nbd]
 nbd_ioctl+0x383/0x510 [nbd]
 blkdev_ioctl+0x18e/0x3e0
 __x64_sys_ioctl+0xac/0x120
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fd8ff706577
RSP: 002b:00007fd8fcdfebf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000040000000 RCX: 00007fd8ff706577
RDX: 000000000000000d RSI: 000000000000ab00 RDI: 000000000000000f
RBP: 000000000000000f R08: 000000000000fbe8 R09: 000055fe497c62b0
R10: 00000002aff20000 R11: 0000000000000246 R12: 000000000000006d
R13: 0000000000000000 R14: 00007ffe82dc5e70 R15: 00007fd8fcdff9c0

"qemu-nbd -d" first calls ioctl 'NBD_DISCONNECT'; however, the following
message was found:

block nbd0: Send disconnect failed -32

which indicates that something is wrong with the server.

Then "qemu-nbd -d" calls ioctl 'NBD_CLEAR_SOCK'; however, that ioctl can
no longer clear requests since commit 2516ab1543fd ("nbd: only clear the
queue on device teardown"). In the meantime, the requests can't complete
through timeout either, because nbd_xmit_timeout() will always return
'BLK_EH_RESET_TIMER', which means such requests will never be completed
in this situation.

Now that the flag 'NBD_CMD_INFLIGHT' makes sure requests won't be
completed multiple times, switch back to calling nbd_clear_sock() in
nbd_clear_sock_ioctl(), so that inflight requests can be cleared.

Signed-off-by: Yu Kuai
Reviewed-by: Josef Bacik
---
 drivers/block/nbd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index a673a97b9b6b..d536bca0a9a2 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1424,7 +1424,7 @@ static int nbd_start_device_ioctl(struct nbd_device *nbd)
 static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
 				 struct block_device *bdev)
 {
-	sock_shutdown(nbd);
+	nbd_clear_sock(nbd);
 	__invalidate_device(bdev, true);
 	nbd_bdev_reset(nbd);
 	if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF,
-- 
2.31.1
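
Illustration (not part of the patch): the commit message relies on
'NBD_CMD_INFLIGHT' to keep a request from being completed twice once
nbd_clear_sock() can again race with the normal reply path. The
user-space sketch below models that idea only, assuming the flag is
consumed with a test-and-clear style operation. All names in it
(fake_cmd, CMD_INFLIGHT, claim_inflight, complete_once,
simulate_double_completion) are hypothetical stand-ins, not the
driver's definitions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an "in flight" flag bit; not the kernel's. */
#define CMD_INFLIGHT 0x1u

/* Hypothetical model of a request; not struct nbd_cmd. */
struct fake_cmd {
	atomic_uint flags;
	int completions;	/* how many times the request was completed */
};

/*
 * Atomically clear the in-flight bit and report whether this caller
 * was the one that cleared it. Only one caller can ever see "true".
 */
static bool claim_inflight(struct fake_cmd *cmd)
{
	unsigned int old = atomic_fetch_and(&cmd->flags, ~CMD_INFLIGHT);

	return (old & CMD_INFLIGHT) != 0;
}

/* Complete the request only if we won the claim above. */
static void complete_once(struct fake_cmd *cmd)
{
	if (claim_inflight(cmd))
		cmd->completions++;
}

/*
 * Two completion attempts on the same request, standing in for a
 * "clear socket" path followed by a late reply. Returns the number
 * of completions that actually happened.
 */
static int simulate_double_completion(void)
{
	struct fake_cmd cmd = { .completions = 0 };

	atomic_init(&cmd.flags, CMD_INFLIGHT);
	complete_once(&cmd);	/* first path wins and completes */
	complete_once(&cmd);	/* second path sees the bit already clear */
	return cmd.completions;
}
```

In this model the second attempt is a no-op because the bit is already
clear, which is the property that makes it safe for the ioctl path to
clear inflight requests again.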