Subject: Re: [PATCH] scsi: core: Run queue first after running device.
From: John Garry
Date: Fri, 6 Aug 2021 10:00:48 +0100
Message-ID: <0d27db3f-4236-4a30-97a0-ad1dcbf4bcfa@huawei.com>
In-Reply-To: <908bb2bb-c511-06a4-e0b6-577d90bb9b57@huawei.com>
References: <20210805143231.1713299-1-lijinlin3@huawei.com> <908bb2bb-c511-06a4-e0b6-577d90bb9b57@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/08/2021 09:58, John Garry wrote:

And the patch subject is ambiguous

> On 05/08/2021 15:32, lijinlin3@huawei.com wrote:
>> From: Li Jinlin
>>
>> We found a hang issue, the test steps are as follows:
>>    1. echo "blocked" >/sys/block/sda/device/state
>>    2. dd if=/dev/sda of=/mnt/t.log bs=1M count=10
>>    3. echo none > /sys/block/sda/queue/scheduler
>>    4. echo "running" >/sys/block/sda/device/state
>>
>> Step3 and Step4 should finish this work after Step4, but them hangs.
>>
>>    CPU#0               CPU#1                CPU#2
>>    ---------------     ----------------     ----------------
>>                                             Step1: blocking device
>>
>>                                             Step2: dd xxxx
>>                                                    ^^^^^^ get request
>>
>> q_usage_counter++
>>
>>                        Step3: switching scheculer
>>                        elv_iosched_store
>>                          elevator_switch
>>                            blk_mq_freeze_queue
>>                              blk_freeze_queue
>>                                > blk_freeze_queue_start
>>                                  ^^^^^^ mq_freeze_depth++
>>
>>                                > blk_mq_run_hw_queues
>>                                  ^^^^^^ can't run queue when dev blocked
>>
>>                                > blk_mq_freeze_queue_wait
>>                                  ^^^^^^ Hang here!!!
>>                                         wait q_usage_counter==0
>>
>>    Step4: running device
>>    store_state_field
>>      scsi_rescan_device
>>        scsi_attach_vpd
>>          scsi_vpd_inquiry
>>            __scsi_execute
>>              blk_get_request
>>                blk_mq_alloc_request
>>                  blk_queue_enter
>>                  ^^^^^^ Hang here!!!
>>                         wait mq_freeze_depth==0
>>
>>      blk_mq_run_hw_queues
>>      ^^^^^^ dispatch IO, q_usage_counter will reduce to zero
>>
>>                              blk_mq_unfreeze_queue
>>                              ^^^^^ mq_freeze_depth--
>>
>> Step3 and Step4 wait for each other, caused hangs.
>>
>> This requires run queue frist to fix this issue when the device state
>
> frist ?
>
>> changes to SDEV_RUNNING.
>>
>> Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
>> offlinining device")
>> Signed-off-by: Li Jinlin
>> Signed-off-by: Qiu Laibin
>> Signed-off-by: Wu Bo
>
> what kind of SoB is this?
>
>> ---
>>   drivers/scsi/scsi_sysfs.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index c3a710bceba0..aa701582c950 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -809,12 +809,12 @@ store_state_field(struct device *dev, struct
>> device_attribute *attr,
>>       ret = scsi_device_set_state(sdev, state);
>>       /*
>>        * If the device state changes to SDEV_RUNNING, we need to
>> -     * rescan the device to revalidate it, and run the queue to
>> -     * avoid I/O hang.
>> +     * run the queue to avoid I/O hang, and rescan the device
>> +     * to revalidate it.
>
> A bit more description of the IO hang would be useful
>
>>        */
>>       if (ret == 0 && state == SDEV_RUNNING) {
>> -        scsi_rescan_device(dev);
>>           blk_mq_run_hw_queues(sdev->request_queue, true);
>> +        scsi_rescan_device(dev);
>
> This would not have happened if scsi_rescan_device() was ran outside the
> mutex lock region, like I suggested originally.
>
> Indeed, I doubt blk_mq_run_hw_queues() needs to be run with the sdev
> state_mutex held either.
>
>>       }
>>       mutex_unlock(&sdev->state_mutex);
>> --
>
> .
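
For anyone following the ordering argument, the dependency between Step3 and Step4 can be sketched as a small userspace model. This is not kernel code and makes several simplifying assumptions: the struct queue_model type, the enum step4_order values and the helpers run_queue(), try_rescan() and steps_complete() are all hypothetical names invented for this illustration; running the queue on an unblocked device is assumed to complete all pending I/O immediately; and the rescan is reduced to the single check that blk_queue_enter() makes on mq_freeze_depth.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum step4_order { RESCAN_THEN_RUN_QUEUE, RUN_QUEUE_THEN_RESCAN };

struct queue_model {
	int q_usage_counter;  /* requests holding a reference on the queue */
	int mq_freeze_depth;  /* non-zero while a freeze is in progress */
};

/* Models blk_mq_run_hw_queues() on an unblocked device: pending I/O
 * is dispatched and completes, dropping the queue references. */
static void run_queue(struct queue_model *q)
{
	q->q_usage_counter = 0;
}

/* Models the blk_queue_enter() check made by the rescan: a new
 * request can only be allocated while the queue is not frozen. */
static bool try_rescan(const struct queue_model *q)
{
	return q->mq_freeze_depth == 0;
}

/* Returns true if Step3 and Step4 both finish, false if neither
 * side can make progress any more (the reported hang). */
static bool steps_complete(enum step4_order order)
{
	struct queue_model q = { 0, 0 };
	bool freeze_done = false, queue_run = false, rescan_done = false;

	q.q_usage_counter++;  /* Step2: dd got a request on the blocked device */
	q.mq_freeze_depth++;  /* Step3: blk_freeze_queue_start() */
	/* Step4: the device state is now SDEV_RUNNING */

	for (;;) {
		bool progress = false;

		/* Step4 thread: two actions, strictly in the given order. */
		if (order == RUN_QUEUE_THEN_RESCAN) {
			if (!queue_run) {
				run_queue(&q);
				queue_run = true;
				progress = true;
			} else if (!rescan_done && try_rescan(&q)) {
				rescan_done = true;
				progress = true;
			}
		} else {
			if (!rescan_done) {
				if (try_rescan(&q)) {
					rescan_done = true;
					progress = true;
				}
			} else if (!queue_run) {
				run_queue(&q);
				queue_run = true;
				progress = true;
			}
		}

		/* Step3 thread: blk_mq_freeze_queue_wait(), then unfreeze. */
		if (!freeze_done && q.q_usage_counter == 0) {
			q.mq_freeze_depth--;
			freeze_done = true;
			progress = true;
		}

		if (freeze_done && queue_run && rescan_done)
			return true;
		if (!progress)
			return false;
	}
}
```

Under these assumptions steps_complete(RESCAN_THEN_RUN_QUEUE) returns false, because the rescan waits for mq_freeze_depth==0 while the freeze waits for q_usage_counter==0, and steps_complete(RUN_QUEUE_THEN_RESCAN) returns true. The model says nothing about whether either call needs to be made with state_mutex held.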