Message-ID: <374f2f3c-e0f0-cd28-4b43-fa46a1fd5002@huawei.com>
Date: Wed, 27 Sep 2023 17:39:38 +0800
Subject: Re: [RFC PATCH v2 00/18] scsi: scsi_error: Introduce new error handle mechanism
To: Mike Christie, Christoph Hellwig
CC: "James E . J . Bottomley", "Martin K . Petersen", Hannes Reinecke
References: <20230901094127.2010873-1-haowenchao2@huawei.com> <47bed3cb-f307-ec55-5c28-051687dab1ea@huawei.com> <06268327-cfed-f266-34a7-fda69411ef2a@huawei.com> <27eb28b9-46e9-489f-9826-5e8f9a9a662f@oracle.com>
From: Wenchao Hao
In-Reply-To: <27eb28b9-46e9-489f-9826-5e8f9a9a662f@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2023/9/27 1:37, Mike Christie wrote:
> On 9/26/23 7:57 AM, Wenchao Hao wrote:
>> On 2023/9/26 1:54, Mike Christie wrote:
>>> On 9/25/23 10:07 AM, Wenchao Hao wrote:
>>>> On 2023/9/25 22:55, Christoph Hellwig wrote:
>>>>> Before we add another new error handling mechanism we need to fix the
>>>>> old one first.  Hannes' work on not passing the scsi_cmnd to the various
>>>>> reset handlers hasn't made a lot of progress in the last five years and
>>>>> we'll need to urgently fix that first before adding even more
>>>>> complexity.
>>>>>
>>>> I saw that Hannes's patches were posted about a year ago, but they have
>>>> not been applied yet.
>>>> I don't know if he is still working on it.
>>>>
>>>> My patches do not depend much on that work, and I think conflicts between
>>>> the two changes can be resolved quickly.
>>>
>>> I think we want to figure out Hannes's patches first.
>>>
>>> For a new EH design we will want to be able to do multiple TMFs in parallel
>>> on the same host/target right?
>>>
>>
>> It's not necessary to do multiple TMFs in parallel; it's OK as long as the
>> TMFs do not affect each other.
>>
>> For example, we have two devices: 0:0:0:0 and 0:0:0:1
>>
>> Both of them request a device reset. The resets do not happen in parallel
>> but in serial: if 0:0:0:0 has a device reset in progress, 0:0:0:1
>> just waits for 0:0:0:0 to finish.
>
> I see. I guess we still get the benefit of not having to stop other devices
> when doing TMFs.
>

Yes, it's better to support multiple TMFs in parallel than to just run them in
serial. I will wait for Hannes's changes to be applied and then send my changes
again.

> I think we still want a common way to allocate/free and manage resources
> drivers will use during this time. Maybe have an init_device/target/cmd/eh_priv
> and exit_device/target/eh_priv (I'm not sure of the name, but something similar
> to the init_cmd_priv/exit_cmd_priv we have for normal commands).
>
> scsi-ml then calls into the new eh with the priv data. Drivers don't have to
> do the preallocation and worry if it's per device/target/host.
>
> I'm not 100% sure about the low level details. Check out how Hannes is
> handling tag management for TMFs as well.
>
>
>>
>>> The problem is that we need to be able to make forward progress in the EH
>>> path and not fail just because we can't allocate memory for a TMF related
>>> struct. To accomplish this now, drivers will use mempools, preallocate TMF
>>> related structs/mem/tags with their scsi_cmnd related structs, preallocate
>>> per host/target/device related structs or ignore what I wrote above and just
>>> fail.
>>>
>>> Hannes's patches fix up the eh callouts so they don't pass in a scsi_cmnd
>>> when it's not needed. That seems nice because after that, then for your new
>>> EH we can begin to standardize on how to handle preallocation of driver
>>> resources needed to perform TMFs for your new EH. It could be a per
>>> device/target/host callout to allow drivers to preallocate, then scsi-ml calls
>>> into the drivers with that data. It doesn't have to be exactly like that or
>>> anything close. It would be nice for drivers to not have to think about this
>>> type of thing and for scsi-ml to just handle the resource management for us
>>> when there are multiple TMFs in progress.
>>>
>>
>
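
To make the preallocation idea above concrete, here is a rough userspace sketch
of how a per-device callout could look. It is only an illustration under
assumptions: struct eh_ops, init_eh_priv/exit_eh_priv, struct mock_device and
the mock_* helpers are all hypothetical names made up for this mail, not an
existing scsi-ml interface, and calloc() stands in for whatever allocator the
midlayer would really use. The point is that the memory is allocated at device
setup, where failing is still acceptable, so the reset path itself never has to
allocate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-device EH callouts (illustrative names only). */
struct eh_ops {
	size_t eh_priv_size;              /* driver data per device, assumed > 0 */
	int  (*init_eh_priv)(void *priv); /* optional, called once at setup */
	void (*exit_eh_priv)(void *priv); /* optional, called once at teardown */
	int  (*device_reset)(void *priv); /* must not allocate memory */
};

/* Stand-in for a scsi_device; not the real structure. */
struct mock_device {
	const struct eh_ops *ops;
	void *eh_priv;  /* preallocated by the "midlayer" at setup */
	bool eh_busy;   /* at most one TMF per device at a time */
};

/* Setup path: allocation may fail here without hurting error handling. */
int mock_device_setup(struct mock_device *dev, const struct eh_ops *ops)
{
	dev->ops = ops;
	dev->eh_busy = false;
	dev->eh_priv = calloc(1, ops->eh_priv_size);
	if (!dev->eh_priv)
		return -1;
	if (ops->init_eh_priv && ops->init_eh_priv(dev->eh_priv) != 0) {
		free(dev->eh_priv);
		dev->eh_priv = NULL;
		return -1;
	}
	return 0;
}

void mock_device_teardown(struct mock_device *dev)
{
	assert(!dev->eh_busy);
	if (dev->ops->exit_eh_priv)
		dev->ops->exit_eh_priv(dev->eh_priv);
	free(dev->eh_priv);
	dev->eh_priv = NULL;
}

/*
 * EH path: only touches preallocated data, so it cannot fail for lack of
 * memory. Real code would need locking around eh_busy; this sketch is
 * single-threaded. Other devices are not stopped or even looked at.
 */
int mock_eh_device_reset(struct mock_device *dev)
{
	int ret;

	if (dev->eh_busy)
		return -2; /* a TMF is already in progress on this device */
	dev->eh_busy = true;
	ret = dev->ops->device_reset(dev->eh_priv);
	dev->eh_busy = false;
	return ret;
}

/* Example "driver": counts the resets it was asked to perform. */
struct demo_tmf {
	int resets_done;
};

static int demo_init(void *priv)
{
	((struct demo_tmf *)priv)->resets_done = 0;
	return 0;
}

static int demo_reset(void *priv)
{
	((struct demo_tmf *)priv)->resets_done++;
	return 0;
}

static const struct eh_ops demo_ops = {
	.eh_priv_size = sizeof(struct demo_tmf),
	.init_eh_priv = demo_init,
	.exit_eh_priv = NULL,
	.device_reset = demo_reset,
};
```

With this shape a driver only fills in the ops table; the caller sets up each
device once with mock_device_setup(), calls mock_eh_device_reset() from the
error path, and calls mock_device_teardown() when the device goes away. Whether
the private data should be per device, per target or per host is left open
here, as it is in the discussion above.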