Subject: Re: [PATCH] scsi: core: Run queue first after running device.
References: <20210805143231.1713299-1-lijinlin3@huawei.com>
From: John Garry
Message-ID: <908bb2bb-c511-06a4-e0b6-577d90bb9b57@huawei.com>
Date: Fri, 6 Aug 2021 09:58:29 +0100
In-Reply-To: <20210805143231.1713299-1-lijinlin3@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/08/2021 15:32, lijinlin3@huawei.com wrote:
> From: Li Jinlin
>
> We found a hang issue, the test steps are as follows:
> 1. echo "blocked" >/sys/block/sda/device/state
> 2. dd if=/dev/sda of=/mnt/t.log bs=1M count=10
> 3. echo none > /sys/block/sda/queue/scheduler
> 4. echo "running" >/sys/block/sda/device/state
>
> Step3 and Step4 should finish this work after Step4, but them hangs.
>
> CPU#0 CPU#1 CPU#2
> --------------- ---------------- ----------------
> Step1: blocking device
>
> Step2: dd xxxx
> ^^^^^^ get request
> q_usage_counter++
>
> Step3: switching scheculer
> elv_iosched_store
> elevator_switch
> blk_mq_freeze_queue
> blk_freeze_queue
> > blk_freeze_queue_start
> ^^^^^^ mq_freeze_depth++
>
> > blk_mq_run_hw_queues
> ^^^^^^ can't run queue when dev blocked
>
> > blk_mq_freeze_queue_wait
> ^^^^^^ Hang here!!!
> wait q_usage_counter==0
>
> Step4: running device
> store_state_field
> scsi_rescan_device
> scsi_attach_vpd
> scsi_vpd_inquiry
> __scsi_execute
> blk_get_request
> blk_mq_alloc_request
> blk_queue_enter
> ^^^^^^ Hang here!!!
> wait mq_freeze_depth==0
>
> blk_mq_run_hw_queues
> ^^^^^^ dispatch IO, q_usage_counter will reduce to zero
>
> blk_mq_unfreeze_queue
> ^^^^^ mq_freeze_depth--
>
> Step3 and Step4 wait for each other, caused hangs.
>
> This requires run queue frist to fix this issue when the device state

frist ?

> changes to SDEV_RUNNING.
>
> Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after offlinining device")
> Signed-off-by: Li Jinlin
> Signed-off-by: Qiu Laibin
> Signed-off-by: Wu Bo

What kind of SoB is this?

> ---
> drivers/scsi/scsi_sysfs.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index c3a710bceba0..aa701582c950 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -809,12 +809,12 @@ store_state_field(struct device *dev, struct device_attribute *attr,
>      ret = scsi_device_set_state(sdev, state);
>      /*
>       * If the device state changes to SDEV_RUNNING, we need to
> -     * rescan the device to revalidate it, and run the queue to
> -     * avoid I/O hang.
> +     * run the queue to avoid I/O hang, and rescan the device
> +     * to revalidate it.

A bit more description of the I/O hang would be useful here.

>       */
>      if (ret == 0 && state == SDEV_RUNNING) {
> -        scsi_rescan_device(dev);
>          blk_mq_run_hw_queues(sdev->request_queue, true);
> +        scsi_rescan_device(dev);

This would not have happened if scsi_rescan_device() was run outside the mutex lock region, as I suggested originally. Indeed, I doubt blk_mq_run_hw_queues() needs to be run with the sdev state_mutex held either.

>      }
>      mutex_unlock(&sdev->state_mutex);
>
> --