Date: Thu, 6 Oct 2022 09:33:23 +0100
From: John Garry
Subject: Re: [PATCH v5 0/7] libsas and drivers: NCQ error handling
To: Damien Le Moal, Niklas Cassel
CC: "jejb@linux.ibm.com", "martin.petersen@oracle.com",
 "jinpu.wang@cloud.ionos.com", "linux-scsi@vger.kernel.org",
 "linux-kernel@vger.kernel.org", Linuxarm, yangxingui, yanaijie
References: <1664262298-239952-1-git-send-email-john.garry@huawei.com>
 <27148ec5-d1ae-d9a2-1b00-a4c34d2da198@huawei.com>
 <5db6a7bc-dfeb-76e1-6899-7041daa934cf@opensource.wdc.com>
 <64ab35a7-f1ff-92ee-890e-89a5aee935a4@opensource.wdc.com>
In-Reply-To: <64ab35a7-f1ff-92ee-890e-89a5aee935a4@opensource.wdc.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/10/2022 23:42, Damien Le Moal wrote:
>> Hello Damien,
>>
>> John explained that he got a timeout from EH when reading the log:
>> [ 350.281581] ata1: failed to read log page 10h (errno=-5)
>> [ 350.577181] ata1.00: exception Emask 0x1 SAct 0xffffffff SErr 0x0 action 0x6 frozen
>>
>> ata_eh_read_log_10h() uses ata_read_log_page(), which will first try to read
>> the log using READ LOG DMA EXT. If that fails, it will retry using READ LOG EXT.
>>
>> Therefore, to see if this is a driver specific bug, I suggested to try to read
>> the NCQ Command Error log using ATA16 passthrough commands:
>>
>> $ sudo sg_sat_read_gplog -d --log=0x10 /dev/sdc
>> will read the log using READ LOG DMA EXT.
>>
>> $ sudo sg_sat_read_gplog --log=0x10 /dev/sdc
>> will read the log using READ LOG EXT.

Note that I can't get a distro to boot on this system from the HDD
because of the same timeout problem (so no tools easily available).

>>
>> Neither of these two suggested commands are NCQ commands.
>> (Neither command is encapsulated in a RECEIVE FPDMA QUEUED,
>> so I'm not sure what you mean.)
>>
>>
>> Garry, I now see that:
>> [ 350.577181] ata1.00: exception Emask 0x1 SAct 0xffffffff SErr 0x0 action 0x6 frozen
>> Your port is frozen.
>>
>> ata_read_log_page() calls ata_exec_internal() which calls ata_exec_internal_sg(),
>> which will simply return an error without sending down the command to the drive,
>> if the port is frozen.
>>
>> Not sure why your port is frozen, mine is obviously not.

I think that it gets frozen when the internal command for read log ext
times out. More below about that timeout.

>>
>> ata_do_link_abort() calls ata_eh_set_pending() without activating fast drain:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-eh.c?h=v6.0#n989
>>
>> So I'm not sure why your port is frozen.
>> (The fast drain timer does freeze the port, but it shouldn't be enabled.)
>> It might be worthwhile to see who freezes the port in your case.
> Might come from the command timeout. John has had many problems with the
> pm80xx HBA in his Arm machine from a while back. Likely not a driver issue
> but a hw one... No-one seems to be able to recreate the same problem.
>
> We need to try the HBA on our Arm board to see what happens.
>

Yeah, it just looks to be the longstanding issue of using this card on
my arm64 machine - that is, I get IO timeouts quite regularly. I should
have mentioned that yesterday. This just seems to be a driver issue.

Interestingly this read log ext always seems to time out, so maybe I
could see if there is anything specific about this command which could
give a clue to the underlying issue. But I have spent much time trying
to debug this issue, so not too motivated any more if I'm completely
honest ...

Thanks,
John