From: "Tangchen (UVP)"
To: Bart Van Assche, "lduncan@suse.com", "cleech@redhat.com", "axboe@kernel.dk"
CC: "linux-scsi@vger.kernel.org", "linux-kernel@vger.kernel.org", guijianfeng, zhengchuan, "Tangchen (UVP)"
Subject: Re: [iscsi] Deadlock occurred when network is in error
Date: Tue, 15 Aug 2017 02:16:11 +0000
Message-ID: <22E823DBB7698E489DC113638F7470729C1AF0@DGGEMM506-MBX.china.huawei.com>
In-Reply-To: <1502723836.2333.3.camel@wdc.com>

Hi, Bart,

Thank you very much for the quick response.

But I'm not using mq, and I ran into these two problems on a non-mq system.
The patch you pointed out is a fix for mq, so I don't think it can resolve this problem.

IIUC, mq is for SSDs? I'm not using an SSD, so mq is disabled.

On Mon, 2017-08-14 at 11:23 +0000, Tangchen (UVP) wrote:
> Problem 2:
>
> ***************
> [What it looks like]
> ***************
> When removing a SCSI device while a network error occurs,
> __blk_drain_queue() could hang forever.
>
> # cat /proc/19160/stack
> [] msleep+0x1d/0x30
> [] __blk_drain_queue+0xe4/0x160
> [] blk_cleanup_queue+0x106/0x2e0
> [] __scsi_remove_device+0x52/0xc0 [scsi_mod]
> [] scsi_remove_device+0x2b/0x40 [scsi_mod]
> [] sdev_store_delete_callback+0x10/0x20 [scsi_mod]
> [] sysfs_schedule_callback_work+0x15/0x80
> [] process_one_work+0x169/0x340
> [] worker_thread+0x183/0x490
> [] kthread+0x96/0xa0
> [] kernel_thread_helper+0x4/0x10
> [] 0xffffffffffffffff
>
> The request queue of this device was stopped, so the following check
> will be true forever:
>
> __blk_run_queue()
> {
>         if (unlikely(blk_queue_stopped(q)))
>                 return;
>
>         __blk_run_queue_uncond(q);
> }
>
> So __blk_run_queue_uncond() will never be called, and the process hangs.
>
> [ ... ]
>
> ****************
> [How to reproduce]
> ****************
> Unfortunately I cannot reproduce it in the latest kernel.
> The script below will help to reproduce, but not very often.
>
> # create network error
> tc qdisc add dev eth1 root netem loss 60%
>
> # restart iscsid and rescan scsi bus again and again
> while [ 1 ]
> do
>         systemctl restart iscsid
>         rescan-scsi-bus  # http://manpages.ubuntu.com/manpages/trusty/man8/rescan-scsi-bus.8.html
> done

This should have been fixed by commit 36e3cf273977 ("scsi: Avoid that SCSI queues get stuck"). The first mainline kernel that includes this commit is kernel v4.11.
> void __blk_run_queue(struct request_queue *q)
> {
> -       if (unlikely(blk_queue_stopped(q)))
> +       if (unlikely(blk_queue_stopped(q)) &&
> +           unlikely(!blk_queue_dying(q)))
>                 return;
>
>         __blk_run_queue_uncond(q);

Are you aware that the single queue block layer is on its way out and will be removed sooner or later? Please focus your testing on scsi-mq.

Regarding the above patch: it is wrong because it will cause lockups during path removal for other block drivers. Please drop this patch.

Bart.
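
For readers following the thread, the hang described under "Problem 2" can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct model_queue, model_run_queue(), model_run_queue_patched() and model_drain() are hypothetical names invented for illustration, not kernel APIs, and the drain loop is bounded by max_iters so that the model terminates where the reported kernel thread would keep sleeping. It does not model the lockups in other block drivers that Bart refers to, and it says nothing about how commit 36e3cf273977 addresses the problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct request_queue. */
struct model_queue {
    bool stopped;   /* models blk_queue_stopped(q) */
    bool dying;     /* models blk_queue_dying(q) */
    int pending;    /* requests still waiting to be dispatched */
};

/* Models the quoted __blk_run_queue(): a stopped queue is never run. */
static void model_run_queue(struct model_queue *q)
{
    assert(q != NULL);
    if (q->stopped)
        return;
    if (q->pending > 0)
        q->pending--;   /* dispatch one request */
}

/* Models the proposed (and rejected) change: run a stopped queue if dying. */
static void model_run_queue_patched(struct model_queue *q)
{
    assert(q != NULL);
    if (q->stopped && !q->dying)
        return;
    if (q->pending > 0)
        q->pending--;   /* dispatch one request */
}

/*
 * Models the drain loop: keep running the queue until nothing is pending.
 * max_iters is an assumption of this model; it replaces the unbounded
 * sleep-and-retry loop so the example can report a hang instead of hanging.
 * Returns true if the queue drained, false if it would never drain.
 */
static bool model_drain(struct model_queue *q,
                        void (*run)(struct model_queue *), int max_iters)
{
    assert(q != NULL);
    for (int i = 0; i < max_iters; i++) {
        if (q->pending == 0)
            return true;
        run(q);
    }
    return q->pending == 0;
}
```

In this model, a queue that is both stopped and dying with requests pending never drains through model_run_queue(), which mirrors the reported stack stuck in __blk_drain_queue(). The same queue drains through model_run_queue_patched(), which is the effect the proposed patch was aiming for in the single-queue path.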