Message-ID: <437abe43-7ddd-6f49-9386-d8ed04c659bf@huawei.com>
Date: Fri, 12 Aug 2022 09:06:42 +0100
Subject: Re: [PATCH 0/6] libsas and drivers: NCQ error handling
To: Damien Le Moal
References: <1658489049-232850-1-git-send-email-john.garry@huawei.com>
From: John Garry
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/08/2022 19:54, Damien Le Moal wrote:
> On 2022/07/22 4:24, John Garry wrote:
>> As reported in [0], the pm8001 driver NCQ error handling more or less
>> duplicates what libata does in link error handling, as follows:
>> - abort all commands
>> - do autopsy with read log ext 10 command
>> - reset the target to recover
>>
>> Indeed for the hisi_sas driver we want to add similar handling for NCQ
>> errors.
>>
>> This series adds a new libsas API - sas_ata_link_abort() - to handle host
>> NCQ errors, and fixes up pm8001 and hisi_sas drivers to use it. As
>> mentioned in the pm8001 changeover patch, I would prefer a better place to
>> locate the SATA ABORT command (rather than the nexus reset callback).
>>
>> I would appreciate some testing of the pm8001 change as the read log ext 10
>> command mostly hangs on my arm64 machine - these arm64 hangs are a known
>> issue.
>
> Thanks for this!
> I applied this series on top of the current Linus tree and ran some tests: a
> bunch of fio runs and also ran libzbc test suites on a SATA SMR drive as that
> generates many command failures. No problems detected, the tests all pass.
> FYI, messages for failed commands look like this:
>
> pm80xx0:: mpi_sata_event 2685: SATA EVENT 0x23
> sas: Enter sas_scsi_recover_host busy: 1 failed: 1
> sas: sas_scsi_find_task: aborting task 0x00000000ba62a907
> pm80xx0:: mpi_sata_completion 2292: task null, freeing CCB tag 2
> sas: sas_scsi_find_task: task 0x00000000ba62a907 is aborted
> sas: sas_eh_handle_sas_errors: task 0x00000000ba62a907 is aborted
> ata21.00: exception Emask 0x0 SAct 0x20000000 SErr 0x0 action 0x0
> ata21.00: failed command: WRITE FPDMA QUEUED
> ata21.00: cmd 61/02:00:ff:ff:ea/00:00:02:00:00/40 tag 29 ncq dma 8192 out
>          res 43/04:02:ff:ff:ea/00:00:02:00:00/00 Emask 0x400 (NCQ error)
> ata21.00: status: { DRDY SENSE ERR }
> ata21.00: error: { ABRT }
> ata21.00: configured for UDMA/133
> ata21: EH complete
> sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1 tries: 1
>

For this specific test we don't seem to run a hardreset after the autopsy,
but we do seem to be getting an NCQ error. That's interesting.

We have noticed this scenario for hisi_sas NCQ error, whereby the autopsy
decided a reset is not required or useful, such as a medium error.

Anyway the pm8001 driver relies on the reset being run always for the NCQ
error. So I am thinking of tweaking sas_ata_link_abort() as follows:

void sas_ata_link_abort(struct domain_device *device)
{
	struct ata_port *ap = device->sata_dev.ap;
	struct ata_link *link = &ap->link;

	link->eh_info.err_mask |= AC_ERR_DEV;
+	link->eh_info.action |= ATA_EH_RESET;
	ata_link_abort(link);
}

This should force a reset.

Thanks,
John

> Seems all good to me.
>
>>
>> Finally with these changes we can make the libsas task alloc/free APIs
>> private, which they should always have been.
>>
>> Based on v5.19-rc6
>>
>> [0] https://lore.kernel.org/linux-scsi/8fb3b093-55f0-1fab-81f4-e8519810a978@huawei.com/
>>
>> John Garry (5):
>>   scsi: pm8001: Modify task abort handling for SATA task
>>   scsi: libsas: Add sas_ata_link_abort()
>>   scsi: pm8001: Use sas_ata_link_abort() to handle NCQ errors
>>   scsi: hisi_sas: Don't issue ATA softreset in hisi_sas_abort_task()
>>   scsi: libsas: Make sas_{alloc, alloc_slow, free}_task() private
>>
>> Xingui Yang (1):
>>   scsi: hisi_sas: Add SATA_DISK_ERR bit handling for v3 hw
>>
>>  drivers/scsi/hisi_sas/hisi_sas_main.c  |   5 +-
>>  drivers/scsi/hisi_sas/hisi_sas_v3_hw.c |  22 ++-
>>  drivers/scsi/libsas/sas_ata.c          |  10 ++
>>  drivers/scsi/libsas/sas_init.c         |   3 -
>>  drivers/scsi/libsas/sas_internal.h     |   4 +
>>  drivers/scsi/pm8001/pm8001_hwi.c       | 194 +++++++-------------------
>>  drivers/scsi/pm8001/pm8001_sas.c       |  13 ++
>>  drivers/scsi/pm8001/pm8001_sas.h       |   8 +-
>>  drivers/scsi/pm8001/pm80xx_hwi.c       | 177 ++---------------------
>>  include/scsi/libsas.h                  |   4 -
>>  include/scsi/sas_ata.h                 |   5 +
>>  11 files changed, 132 insertions(+), 313 deletions(-)
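
To illustrate the sas_ata_link_abort() tweak proposed in the reply, here is a
minimal userspace sketch, not kernel code. The mock_* types and functions and
the MOCK_* flag values are hypothetical stand-ins for libata's ata_link,
eh_info, AC_ERR_DEV and ATA_EH_RESET. The recovery decision is reduced to a
single flag test, on the assumption that error handling resets whenever a
reset action has been requested by either the caller or the autopsy. It only
shows why OR-ing a reset flag into the requested actions would make the reset
unconditional.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for libata flags; values are illustrative only. */
#define MOCK_AC_ERR_DEV   (1u << 0)  /* device reported an error */
#define MOCK_EH_SOFTRESET (1u << 1)
#define MOCK_EH_HARDRESET (1u << 2)
#define MOCK_EH_RESET     (MOCK_EH_SOFTRESET | MOCK_EH_HARDRESET)

/* Simplified model of the per-link error-handling request. */
struct mock_eh_info {
	unsigned int err_mask; /* accumulated error bits */
	unsigned int action;   /* recovery actions requested by the caller */
};

struct mock_link {
	struct mock_eh_info eh_info;
};

/* Models the function as posted: flag a device error, request no action. */
static void mock_link_abort_current(struct mock_link *link)
{
	link->eh_info.err_mask |= MOCK_AC_ERR_DEV;
}

/* Models the proposed tweak: additionally request a reset up front. */
static void mock_link_abort_proposed(struct mock_link *link)
{
	link->eh_info.err_mask |= MOCK_AC_ERR_DEV;
	link->eh_info.action |= MOCK_EH_RESET;
}

/*
 * Assumed, simplified recovery decision: reset if the caller requested
 * one or if the autopsy asked for one.
 */
static bool mock_eh_will_reset(const struct mock_link *link,
			       unsigned int autopsy_action)
{
	assert(link);
	return ((link->eh_info.action | autopsy_action) & MOCK_EH_RESET) != 0;
}
```

In this model, when the autopsy requests no action, mock_eh_will_reset()
returns false after mock_link_abort_current() and true after
mock_link_abort_proposed(), which mirrors the missing hardreset in the log
above and the intent of the one-line change.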