From: John Garry
To: Damien Le Moal
Subject: Re: [PATCH 00/16] scsi: libsas and users: Factor out LLDD TMF code
Date: Mon, 31 Jan 2022 15:58:50 +0000
Message-ID: <49da4d80-5cc3-35c3-ccaa-6def8165eb65@huawei.com>
References: <1643110372-85470-1-git-send-email-john.garry@huawei.com> <1893d9ef-042b-af3b-74ea-dd4d0210c493@opensource.wdc.com> <14df160f-c0f2-cc9f-56d4-8eda67969e0b@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 28/01/2022 09:09, John Garry wrote:
>> I ran some more tests. In particular, I ran libzbc compliance tests on a
>> 20TB SMR drives. All tests pass with 5.17-rc1, but after applying your
>> series, I see command timeout that take forever to recover from, with
>> the drive revalidation failing after that.
>>
>> [  385.102073] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
>> [  385.108026] sas: sas_scsi_find_task: aborting task 0x000000007068ed73
>> [  405.561099] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task timeout.
>> [  405.568236] sas: sas_scsi_find_task: task 0x000000007068ed73 is aborted
>> [  405.574930] sas: sas_eh_handle_sas_errors: task 0x000000007068ed73 is aborted
>> [  411.192602] ata21.00: qc timeout (cmd 0xec)
>> [  431.672122] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task timeout.
>> [  431.679282] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [  431.685544] ata21.00: revalidation failed (errno=-5)
>> [  441.911948] ata21.00: qc timeout (cmd 0xec)
>> [  462.391545] pm80xx0:: pm8001_exec_internal_task_abort  757:TMF task timeout.
>> [  462.398696] ata21.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [  462.404992] ata21.00: revalidation failed (errno=-5)
>> [  492.598769] ata21.00: qc timeout (cmd 0xec)
>> ...
>>
>> So there is a problem. Need to dig into this. I see this issue only with
>> libzbc passthrough tests. fio runs with libaio are fine.
>
> Thanks for the notice.
> I think that I also saw a hang, but, IIRC, it
> happened on mainline for me - but it's hard to know if I broke something
> if it is already broke in another way. That is why I wanted this card
> working properly...

Hi Damien,

From testing mainline, I can see a hang on my arm64 system for SAS disks.

I think the reason is that we don't finish some commands in EH properly for pm8001:
- In EH, we attempt to abort the task in sas_scsi_find_task() -> lldd_abort_task().
  The default return from pm8001_exec_internal_tmf_task() is -TMF_RESP_FUNC_FAILED,
  so if the TMF does not execute properly, we return this value.
- sas_scsi_find_task() cannot handle -TMF_RESP_FUNC_FAILED and returns it
  directly to sas_eh_handle_sas_errors(), which, again, does not handle
  -TMF_RESP_FUNC_FAILED.

So we never make progress towards finishing the command.

This looks like the correct fix for mainline:

--- a/drivers/scsi/pm8001/pm8001_sas.c
+++ b/drivers/scsi/pm8001/pm8001_sas.c
@@ -766,7 +766,7 @@ static int pm8001_exec_internal_tmf_task(struct domain_device *dev,
 				pm8001_dev, DS_OPERATIONAL);
 			wait_for_completion(&completion_setstate);
 		}
-		res = -TMF_RESP_FUNC_FAILED;
+		res = TMF_RESP_FUNC_FAILED;

That's effectively the same as what I have in this series in sas_execute_tmf().

However, your testing is with a SATA device, which I'll check further.

Thanks,
John
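The failure mode described above (a negated TMF response code that no error-handling case matches) can be illustrated with a minimal user-space sketch in C. This is not libsas or pm8001 code: the enum, sketch_classify_abort() and sketch_eh_can_finish() are hypothetical, and the TMF_RESP_* values are assumed to mirror the positive codes in include/scsi/libsas.h. The sketch shows only that a switch written against positive response codes never matches -TMF_RESP_FUNC_FAILED, so the command is never classified and never finished.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed to mirror the positive TMF response codes in
 * include/scsi/libsas.h; used here for illustration only.
 */
#define TMF_RESP_FUNC_COMPLETE 0x00
#define TMF_RESP_FUNC_FAILED   0x05

/* C11: TMF response codes are positive, so a negated one is a distinct value. */
static_assert(TMF_RESP_FUNC_FAILED > 0, "TMF response codes are positive");

/* Hypothetical, simplified stand-in for a libsas-style task disposition. */
enum sketch_disposition {
	SKETCH_TASK_IS_ABORTED,
	SKETCH_TASK_ABORT_FAILED,
	SKETCH_UNHANDLED, /* no EH case matches: the command is never finished */
};

/* Hypothetical: map an LLDD abort result to a disposition, as EH would. */
enum sketch_disposition sketch_classify_abort(int res)
{
	switch (res) {
	case TMF_RESP_FUNC_COMPLETE:
		return SKETCH_TASK_IS_ABORTED;
	case TMF_RESP_FUNC_FAILED:
		return SKETCH_TASK_ABORT_FAILED;
	default:
		/* -TMF_RESP_FUNC_FAILED (-5) falls through to here. */
		return SKETCH_UNHANDLED;
	}
}

/* Hypothetical: EH can only make progress on a recognised disposition. */
bool sketch_eh_can_finish(int res)
{
	return sketch_classify_abort(res) != SKETCH_UNHANDLED;
}
```

With the old default, sketch_classify_abort(-TMF_RESP_FUNC_FAILED) returns SKETCH_UNHANDLED; with the patched default, the same failed TMF maps to SKETCH_TASK_ABORT_FAILED, a disposition the sketch's error handling recognises.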