Received: by 2002:a05:7412:f584:b0:e2:908c:2ebd with SMTP id eh4csp1380219rdb; Mon, 4 Sep 2023 11:48:05 -0700 (PDT) X-Google-Smtp-Source: AGHT+IHTgPaQTKadR+K7I+NXQ0vv5c9+pQDLLB+rAU+Ih5hKgPSS83/mMinq0V8KTJ34B8fgKj/K X-Received: by 2002:a17:906:224b:b0:9a1:bee8:9252 with SMTP id 11-20020a170906224b00b009a1bee89252mr8316103ejr.29.1693853285230; Mon, 04 Sep 2023 11:48:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1693853285; cv=none; d=google.com; s=arc-20160816; b=TeEzsv6HHTkHeabtf7nKXIy9BnuuPymu4YNMqBFxy+5WQf3L4iQGS4BEoXSKai+M4N 44AHDNRoTYC/gEeh+7OofS8Ch1XW4mFGbVKAMkt89qZT/dVeZoBNVJw2oNHPw4tg46jL zV58Dzl/jT8dQcUEbCcKxYDUDiTRNcUC4eWSwhxj+2w/w9BB9Tb1g4UWFDgCHqNam6cN QwtJmT7H9E6b6VJJ6UQGBKTliZLzkaJs+/ypjJI7YtRmtPDRXzkX8ilUwZoQ+1MN577U RIO5GC4BHdDokmzgWxu1DCSydVLr2gPd9caxZQ+4Ue8hZjyoYvZ4jzlEUIwWVnb2t4gR IUdg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:subject:user-agent:mime-version:date:message-id; bh=LkcsJYd0UoZL9Z5N/L8nWQqMhVFtzNLqBNsKPWj/uPo=; fh=nIVZeCjXorAtieqNYTeJNKMWI5wB8Yukpag5M8y9TD4=; b=W3qXkTFz7+EOgv8Pqlzz0jHsXVZcsSz1VCccK2r1UJ/t6yowJTQpVP8yOSIIfkf9Kr KENbhgk/y/y3Fop0tqJ2JmjbJEPo/ccFRcZL30Gtw0wwg103PZUm0R9toXT/+Z/qRTXg +uxf2rahKoLAILq4+o/CuILjv6kkpSyWl3hJWTbF7kfSaVVKUWgJ5QBLxRLjCxpvjqsZ Pg9VdV0ldKiMdGlpMoJ1iNwAuTuSSFRMO/oijCQDukSZ5Es0tLPuUJs1lpJrcw1D9RzK DmoMR5fkDhRVl1B8kkapS4ZvotB3VfofTO6++qiuWRlnUVKnEls03ybJKzkYqi8UyYSc s/6A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s13-20020a1709064d8d00b0098e41199c5dsi6503790eju.417.2023.09.04.11.47.37; Mon, 04 Sep 2023 11:48:05 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242539AbjIDNAT (ORCPT + 99 others); Mon, 4 Sep 2023 09:00:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59754 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235732AbjIDNAT (ORCPT ); Mon, 4 Sep 2023 09:00:19 -0400 Received: from dggsgout11.his.huawei.com (unknown [45.249.212.51]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4A63B90; Mon, 4 Sep 2023 06:00:15 -0700 (PDT) Received: from mail02.huawei.com (unknown [172.30.67.143]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTP id 4RfTJB5BzZz4f3l7y; Mon, 4 Sep 2023 21:00:10 +0800 (CST) Received: from [10.174.179.247] (unknown [10.174.179.247]) by APP4 (Coremail) with SMTP id gCh0CgCXc6bX1PVkZC5lCQ--.16578S3; Mon, 04 Sep 2023 21:00:11 +0800 (CST) Message-ID: <033cb727-35b5-3845-64b1-e698891d70b1@huaweicloud.com> Date: Mon, 4 Sep 2023 21:00:07 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.10.0 Subject: Re: [PATCH] scsi: ata: Fix a race condition between scsi error handler and ahci interrupt To: Niklas Cassel Cc: "dlemoal@kernel.org" , "linux-ide@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "yukuai3@huawei.com" , "yi.zhang@huawei.com" , "houtao1@huawei.com" , "yangerkun@huawei.com" References: <20230810014848.2148316-1-linan666@huaweicloud.com> <5d37add3-41ce-e2af-b45a-d701eaf36a6c@huaweicloud.com> From: Li Nan In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-CM-TRANSID: gCh0CgCXc6bX1PVkZC5lCQ--.16578S3 X-Coremail-Antispam: 1UD129KBjvJXoWxXFyUKr1rKrWkWw43ZrW3GFg_yoWrJFyrpF WUJa1qkr1DXrW8tr4qqa1F9F1Fqan7Kry7ZryDJ3s7Zr1qq34rtr1DCFZ8WFn29wn7Gw1I vw4jgr9rAFWUXrUanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUv0b4IE77IF4wAFF20E14v26r4j6ryUM7CY07I20VC2zVCF04k2 6cxKx2IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M28lY4IEw2IIxxk0rwA2F7IY1VAKz4 vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_Ar0_tr1l84ACjcxK6xIIjxv20xvEc7Cj xVAFwI0_Cr0_Gr1UM28EF7xvwVC2z280aVAFwI0_GcCE3s1l84ACjcxK6I8E87Iv6xkF7I 0E14v26rxl6s0DM2vYz4IE04k24VAvwVAKI4IrM2AIxVAIcxkEcVAq07x20xvEncxIr21l 5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r1j6r18McIj6I8E87Iv67 AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IY64vIr41lc7I2V7IY0VAS07Al zVAYIcxG8wCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkEbVWUJVW8JwC20s026c02F4 0E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67AF67kF1VAFwI0_Jw0_GFyl IxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI42IY6xIIjxv20xvEc7CjxV AFwI0_Jr0_Gr1lIxAIcVCF04k26cxKx2IYs7xG6rW3Jr0E3s1lIxAIcVC2z280aVAFwI0_ Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7IU1 VOJ7UUUUU== X-CM-SenderInfo: polqt0awwwqx5xdzvxpfor3voofrz/ X-CFilter-Loop: Reflected X-Spam-Status: No, score=-3.4 required=5.0 tests=BAYES_00,NICE_REPLY_A, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 在 2023/9/4 19:57, Niklas Cassel 写道: > On Mon, Sep 04, 2023 at 07:45:51PM +0800, Li Nan wrote: >> >> >> 在 2023/8/22 18:30, Niklas Cassel 写道: >>> On Tue, Aug 22, 2023 at 05:20:33PM +0800, Li Nan wrote: >>>> Thanks for your reply, Niklas. >>>> >>>> 在 2023/8/21 21:51, Niklas Cassel 写道: >>>>> On Thu, Aug 10, 2023 at 09:48:48AM +0800, linan666@huaweicloud.com wrote: >>>> >>>> [snip] >>>> >>>>> >>>>> Hello Li Nan, >>>>> >>>>> I do not understand why the code in: >>>>> https://github.com/torvalds/linux/blob/v6.5-rc7/drivers/ata/libata-eh.c#L722-L731 >>>>> >>>>> does not kick in, and repeats EH. >>>>> >>>>> >>>>> EH_PENDING is cleared before ->error_handler() is called: >>>>> https://github.com/torvalds/linux/blob/v6.5-rc7/drivers/ata/libata-eh.c#L697 >>>>> >>>>> So ahci_error_intr() from the second error interrupt, which is called after >>>>> thawing the port, should have called ata_std_sched_eh(), which calls >>>>> ata_eh_set_pending(), which should have set EH_PENDING: >>>>> https://github.com/torvalds/linux/blob/v6.5-rc7/drivers/ata/libata-eh.c#L884 >>>>> >>>>> >>>>> >>>>> My only guess is that after thawing the port: >>>>> https://github.com/torvalds/linux/blob/v6.5-rc7/drivers/ata/libata-eh.c#L2807 >>>>> >>>>> The second error irq comes, and sets EH_PENDING, >>>>> but then this silly code might clear it: >>>>> https://github.com/torvalds/linux/blob/v6.5-rc7/drivers/ata/libata-eh.c#L2825-L2837 >>>>> >>>> >>>> Yeah, I think so. >>>> >>>>> I think the best way would be if we could improve this "spurious error >>>>> condition check"... because if this is indeed the code that clears EH_PENDING >>>>> for you, then this code basically makes the "goto repeat" code in >>>>> ata_scsi_port_error_handler() useless... >>>>> >>>>> >>>>> An alternative to improving the "spurious error condition check" might be for >>>>> you to try something like: >>>>> >>>> >>>> We have used this solution before, but it will case WARN_ON in >>>> ata_eh_finish() as below: >>>> >>>> WARNING: CPU: 1 PID: 118 at ../drivers/ata/libata-eh.c:4016 >>>> ata_eh_finish+0x15a/0x170 >>> >>> Ok. >>> >>> How about if you simply move the WARN_ON to ata_scsi_port_error_handler() >>> as well: >>> >>> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c >>> index 35e03679b0bf..5be2fc651131 100644 >>> --- a/drivers/ata/libata-eh.c >>> +++ b/drivers/ata/libata-eh.c >>> @@ -741,6 +741,12 @@ void ata_scsi_port_error_handler(struct Scsi_Host *host, struct ata_port *ap) >>> */ >>> ap->ops->end_eh(ap); >>> + if (!ap->scsi_host->host_eh_scheduled) { >>> + /* make sure nr_active_links is zero after EH */ >>> + WARN_ON(ap->nr_active_links); >>> + ap->nr_active_links = 0; >>> + } >>> + >>> spin_unlock_irqrestore(ap->lock, flags); >>> ata_eh_release(ap); >>> } else { >>> @@ -962,7 +968,7 @@ void ata_std_end_eh(struct ata_port *ap) >>> { >>> struct Scsi_Host *host = ap->scsi_host; >>> - host->host_eh_scheduled = 0; >>> + host->host_eh_scheduled--; >>> } >>> EXPORT_SYMBOL(ata_std_end_eh); >>> @@ -3948,10 +3954,6 @@ void ata_eh_finish(struct ata_port *ap) >>> } >>> } >>> } >>> - >>> - /* make sure nr_active_links is zero after EH */ >>> - WARN_ON(ap->nr_active_links); >>> - ap->nr_active_links = 0; >>> } >>> /** >>> >>> >>> >>> Kind regards, >>> Niklas >> >> We have tested this patch and it can fix the bug. Thank you so much. :) > > Awesome! :) > > Please send out a real patch, so that it is easier for the maintainer to > apply. > > No need to give any credit to me. > > > Kind regards, > Niklas It is my pleasure. I will send v2 later. -- Thanks, Nan