Received: by 2002:a05:7412:f584:b0:e2:908c:2ebd with SMTP id eh4csp1931658rdb; Tue, 5 Sep 2023 09:07:49 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFDVVFEL4ymHlqlfQYiY+skHpyagFIzmdIynJNNlOY2ebe6ICbKTHhqvrr2JMO6XQGpeqas X-Received: by 2002:a05:6402:12d7:b0:52b:c980:43f3 with SMTP id k23-20020a05640212d700b0052bc98043f3mr206249edx.28.1693930069502; Tue, 05 Sep 2023 09:07:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1693930069; cv=none; d=google.com; s=arc-20160816; b=T1O3bRSvtVK4x0I8zYaBP0KGl0fNT7sRx9qonXFGG724OnO8uQY8XpkiUV6u9Q9KVz FWx/4Kzjw89fy8ilg+1Dec3JUBMNsWRN7U7B9SnYy8qtQvmLDgaihZlFEI+H4PCQg8nu hgedJpXhgiocGSn7whW3njqv7ClRpRguc7zoDhoV5K+kmcNT2isoiDSux8i008deQ3Dc pzMJ9g2q3G+7wzbJlpN2O+xiVr082IfWEU7RjQERZ1KK/Mbc0ElSmJTFRxvTIVMimYUY l+swpj9bXumTLfTCs/EsS0vhogmcqLNyIEsOIHheZS7YVFohegQMEe+5vmMqUfob5vcr 40Fw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:subject:user-agent:mime-version:date:message-id; bh=1BKqk4RYObYkQrma17CqXMMX/3vcxgMdivdj0xh40HM=; fh=nIVZeCjXorAtieqNYTeJNKMWI5wB8Yukpag5M8y9TD4=; b=iu7l1yQQOMuotN/N2Ces0RtR0NJeMjLm0gfQvyCWRUmkXTCBV7/PpzOE1z9ZTbGwUx jbcPhRxB5xjTUNvAaI+PpNyM6vyn3SR7HcsK2RQJ9Ab/3OGZavzpwRdKDSXTaex3ixF/ 4f2hZbTx8g7G07dODu2RMQdza+iMvp84PtbEzPZtXH7hwI7UfpnAy3NlbVBMZb9a6GS9 CVANvI6iu/x7Tfv1iQFt6/H5Gd6rVDrfkR8nvUSy0ad61uuOgYF3dakMBpRNzQolCCPK sHVhb22pzUGUKvSnJfSE+yUUZM48Vvf+Ktz01sGUgWVVhTYylzWkjoiBjNwU6mkTfGOW CtlQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id er5-20020a056402448500b005223c340277si1659543edb.426.2023.09.05.09.07.45; Tue, 05 Sep 2023 09:07:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349977AbjIDLqD (ORCPT + 19 others); Mon, 4 Sep 2023 07:46:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39922 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232492AbjIDLqC (ORCPT ); Mon, 4 Sep 2023 07:46:02 -0400 Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7836F1AB; Mon, 4 Sep 2023 04:45:58 -0700 (PDT) Received: from mail02.huawei.com (unknown [172.30.67.143]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4RfRfT6zsNz4f3n5t; Mon, 4 Sep 2023 19:45:53 +0800 (CST) Received: from [10.174.179.247] (unknown [10.174.179.247]) by APP4 (Coremail) with SMTP id gCh0CgA3x6lvw_VkE_JgCQ--.36724S3; Mon, 04 Sep 2023 19:45:55 +0800 (CST) Message-ID: <5d37add3-41ce-e2af-b45a-d701eaf36a6c@huaweicloud.com> Date: Mon, 4 Sep 2023 19:45:51 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.10.0 Subject: Re: [PATCH] scsi: ata: Fix a race condition between scsi error handler and ahci interrupt To: Niklas Cassel Cc: "dlemoal@kernel.org" , "linux-ide@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "yukuai3@huawei.com" , "yi.zhang@huawei.com" , "houtao1@huawei.com" , "yangerkun@huawei.com" References: <20230810014848.2148316-1-linan666@huaweicloud.com> From: Li Nan In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-CM-TRANSID: gCh0CgA3x6lvw_VkE_JgCQ--.36724S3 X-Coremail-Antispam: 1UD129KBjvJXoWxJFWfKryUJr4UJry5Kr45Jrb_yoW5ZFW5pF WkJayqkr1DXr40yr4vqa1Fva4Fqan7Kry7ZryDW3s7ZF1qg34rtr4kCFZ8WFnagw1kGw4a vw4jgr9rAF4UXrUanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUvjb4IE77IF4wAFF20E14v26r4j6ryUM7CY07I20VC2zVCF04k2 6cxKx2IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M28lY4IEw2IIxxk0rwA2F7IY1VAKz4 vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7Cj xVAFwI0_Cr0_Gr1UM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv6xkF7I 0E14v26rxl6s0DM2vYz4IE04k24VAvwVAKI4IrM2AIxVAIcxkEcVAq07x20xvEncxIr21l 5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r1j6r18McIj6I8E87Iv67 AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IY64vIr41lc7I2V7IY0VAS07Al zVAYIcxG8wCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F4 0E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_Jw0_GFyl IxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY6xIIjxv20xvEc7CjxV AFwI0_Jr0_Gr1lIxAIcVCF04k26cxKx2IYs7xG6rW3Jr0E3s1lIxAIcVC2z280aVAFwI0_ Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVWUJVW8JbIYCTnIWIevJa73UjIFyTuYvjxUFf HjUUUUU X-CM-SenderInfo: polqt0awwwqx5xdzvxpfor3voofrz/ X-CFilter-Loop: Reflected X-Spam-Status: No, score=-3.4 required=5.0 tests=BAYES_00,NICE_REPLY_A, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 在 2023/8/22 18:30, Niklas Cassel 写道: > On Tue, Aug 22, 2023 at 05:20:33PM +0800, Li Nan wrote: >> Thanks for your reply, Niklas. >> >> 在 2023/8/21 21:51, Niklas Cassel 写道: >>> On Thu, Aug 10, 2023 at 09:48:48AM +0800, linan666@huaweicloud.com wrote: >> >> [snip] >> >>> >>> Hello Li Nan, >>> >>> I do not understand why the code in: >>> https://github.com/torvalds/linux/blob/v6.5-rc7/drivers/ata/libata-eh.c#L722-L731 >>> >>> does not kick in, and repeats EH. >>> >>> >>> EH_PENDING is cleared before ->error_handler() is called: >>> https://github.com/torvalds/linux/blob/v6.5-rc7/drivers/ata/libata-eh.c#L697 >>> >>> So ahci_error_intr() from the second error interrupt, which is called after >>> thawing the port, should have called ata_std_sched_eh(), which calls >>> ata_eh_set_pending(), which should have set EH_PENDING: >>> https://github.com/torvalds/linux/blob/v6.5-rc7/drivers/ata/libata-eh.c#L884 >>> >>> >>> >>> My only guess is that after thawing the port: >>> https://github.com/torvalds/linux/blob/v6.5-rc7/drivers/ata/libata-eh.c#L2807 >>> >>> The second error irq comes, and sets EH_PENDING, >>> but then this silly code might clear it: >>> https://github.com/torvalds/linux/blob/v6.5-rc7/drivers/ata/libata-eh.c#L2825-L2837 >>> >> >> Yeah, I think so. >> >>> I think the best way would be if we could improve this "spurious error >>> condition check"... because if this is indeed the code that clears EH_PENDING >>> for you, then this code basically makes the "goto repeat" code in >>> ata_scsi_port_error_handler() useless... >>> >>> >>> An alternative to improving the "spurious error condition check" might be for >>> you to try something like: >>> >> >> We have used this solution before, but it will case WARN_ON in >> ata_eh_finish() as below: >> >> WARNING: CPU: 1 PID: 118 at ../drivers/ata/libata-eh.c:4016 >> ata_eh_finish+0x15a/0x170 > > Ok. > > How about if you simply move the WARN_ON to ata_scsi_port_error_handler() > as well: > > diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c > index 35e03679b0bf..5be2fc651131 100644 > --- a/drivers/ata/libata-eh.c > +++ b/drivers/ata/libata-eh.c > @@ -741,6 +741,12 @@ void ata_scsi_port_error_handler(struct Scsi_Host *host, struct ata_port *ap) > */ > ap->ops->end_eh(ap); > > + if (!ap->scsi_host->host_eh_scheduled) { > + /* make sure nr_active_links is zero after EH */ > + WARN_ON(ap->nr_active_links); > + ap->nr_active_links = 0; > + } > + > spin_unlock_irqrestore(ap->lock, flags); > ata_eh_release(ap); > } else { > @@ -962,7 +968,7 @@ void ata_std_end_eh(struct ata_port *ap) > { > struct Scsi_Host *host = ap->scsi_host; > > - host->host_eh_scheduled = 0; > + host->host_eh_scheduled--; > } > EXPORT_SYMBOL(ata_std_end_eh); > > @@ -3948,10 +3954,6 @@ void ata_eh_finish(struct ata_port *ap) > } > } > } > - > - /* make sure nr_active_links is zero after EH */ > - WARN_ON(ap->nr_active_links); > - ap->nr_active_links = 0; > } > > /** > > > > Kind regards, > Niklas We have tested this patch and it can fix the bug. Thank you so much. :) Feel free to add: Tested-by: Li Nan -- Thanks, Nan