Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp2030365ybt; Sun, 28 Jun 2020 05:59:59 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzUIEBveNgdiECI/UbFShYispRNBSVHLpRzS/hv3wHb7BKsUMTIC/xgirgHU9jLL3tWSM8r X-Received: by 2002:a50:afe1:: with SMTP id h88mr12584579edd.295.1593349199018; Sun, 28 Jun 2020 05:59:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1593349199; cv=none; d=google.com; s=arc-20160816; b=xfwSZ4ph+XheQZbyF6YuFXkJDNpf5W+JWMARhUnfgwroCPfxwoN5TI9B2unwbEvlkN 8Y6OWYHxQyHTJzwTnqNmOVSNGg58RYXdwh/a021ZGBQhz9qBaFvA6unc9RYVHmL+Iejc SarG09sbzrRbcug7QP034awp1icnu+xOmitXppl71JztMtxzMJ+4+jN9hqgSshqy3zcS iysNs14HntWwYmLNf0GQrKxknfGNjqE4vE3g/QlB22I1ACIc/hgj/x2xP1X7NtFX7mWM yIerh0fO7h1uT3tzJwCxpUb6V0Egt6v5PphQxp6ehnWh5S52AdDqE29YJUuICWr7Dvvf FgYQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:in-reply-to :mime-version:user-agent:date:message-id:from:cc:references:to :subject; bh=431CtACeWy847dnPPvsIKKw9XqNkAVaiIse5ZX0DGHM=; b=kdsr3OPAI7FhKeEK4Kx0buEt9zBSpmG7kaAc3XJn1DfZJsxfnOECwDldY/1RLyTRMG NGwhuh7RB6Y4f4/H1Pi5MwxDmAalmq1yUpdDwt5/vSqMZFndaPUe0xZhS+VVIX/vdHtl 1UHYH+HonygqJV+r4rTPcSRhvjH97ZIjfhblbMLVZmobsYJmM3vZJiAcyXN412rOYOBN Pbp9/y9Bv+j9wwDIXAE9JmtunC5at67rIbMpByHmFoHoBmLsW24SsDyb5ObyDc4Gexax X/fNrLtuYgUOTbuLpyXEkBrzAqRP3K9j7pc9Dy5QbejUnbOD8TCJlmAYX1ahP55M01Hp cfog== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id j8si15797123ejs.237.2020.06.28.05.59.35; Sun, 28 Jun 2020 05:59:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726419AbgF1M7F (ORCPT + 99 others); Sun, 28 Jun 2020 08:59:05 -0400 Received: from szxga06-in.huawei.com ([45.249.212.32]:59820 "EHLO huawei.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726316AbgF1M7F (ORCPT ); Sun, 28 Jun 2020 08:59:05 -0400 Received: from DGGEMS403-HUB.china.huawei.com (unknown [172.30.72.58]) by Forcepoint Email with ESMTP id 71F73490504BFFE6F6BC; Sun, 28 Jun 2020 20:59:03 +0800 (CST) Received: from [10.65.58.147] (10.65.58.147) by DGGEMS403-HUB.china.huawei.com (10.3.19.203) with Microsoft SMTP Server id 14.3.487.0; Sun, 28 Jun 2020 20:58:59 +0800 Subject: Re: [PATCH v2 1/2] PCI/ERR: Fix fatal error recovery for non-hotplug capable devices To: Jay Vosburgh , References: <25283.1591332444@famine> <3400.1593024778@famine> CC: , , , From: Yicong Yang Message-ID: <25d4ec0f-71a3-a3d8-a4e8-b44ea4326e82@hisilicon.com> Date: Sun, 28 Jun 2020 20:59:09 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.7.1 MIME-Version: 1.0 In-Reply-To: <3400.1593024778@famine> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.65.58.147] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Jay, I've tested the patches on my board, and they work well. Thanks, Yicong On 2020/6/25 2:52, Jay Vosburgh wrote: > Jay Vosburgh wrote: > >> sathyanarayanan.kuppuswamy@linux.intel.com wrote: >> >> From: Kuppuswamy Sathyanarayanan >>> Fatal (DPC) error recovery is currently broken for non-hotplug >>> capable devices. With current implementation, after successful >>> fatal error recovery, non-hotplug capable device state won't be >>> restored properly. You can find related issues in following links. >>> >>> https://lkml.org/lkml/2020/5/27/290 >>> https://lore.kernel.org/linux-pci/12115.1588207324@famine/ >>> https://lkml.org/lkml/2020/3/28/328 >>> >>> Current fatal error recovery implementation relies on hotplug handler >>> for detaching/re-enumerating the affected devices/drivers on DLLSC >>> state changes. So when dealing with non-hotplug capable devices, >>> recovery code does not restore the state of the affected devices >>> correctly. Correct implementation should call report_slot_reset() >>> function after resetting the link to restore the state of the >>> device/driver. >>> >>> So use PCI_ERS_RESULT_NEED_RESET as error status for successful >>> reset_link() operation and use PCI_ERS_RESULT_DISCONNECT for failure >>> case. PCI_ERS_RESULT_NEED_RESET error state will ensure slot_reset() >>> is called after reset link operation which will also fix the above >>> mentioned issue. >>> >>> [original patch is from jay.vosburgh@canonical.com] >>> [original patch link https://lore.kernel.org/linux-pci/12115.1588207324@famine/] >>> Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") >>> Signed-off-by: Jay Vosburgh >>> Signed-off-by: Kuppuswamy Sathyanarayanan >> I've tested this patch set on one of our test machines, and it >> resolves the issue. I plan to test with other systems tomorrow. > I've done testing on two different systems that exhibit the > original issue and this patch set appears to behave as expected. > > Has anyone else (Yicong?) had an opportunity to test this? > > Can this be considered for acceptance, or is additional feedback > or review needed? > > -J > >>> --- >>> drivers/pci/pcie/err.c | 24 ++++++++++++++++++++++-- >>> 1 file changed, 22 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >>> index 14bb8f54723e..5fe8561c7185 100644 >>> --- a/drivers/pci/pcie/err.c >>> +++ b/drivers/pci/pcie/err.c >>> @@ -165,8 +165,28 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>> pci_dbg(dev, "broadcast error_detected message\n"); >>> if (state == pci_channel_io_frozen) { >>> pci_walk_bus(bus, report_frozen_detected, &status); >>> - status = reset_link(dev); >>> - if (status != PCI_ERS_RESULT_RECOVERED) { >>> + /* >>> + * After resetting the link using reset_link() call, the >>> + * possible value of error status is either >>> + * PCI_ERS_RESULT_DISCONNECT (failure case) or >>> + * PCI_ERS_RESULT_NEED_RESET (success case). >>> + * So ignore the return value of report_error_detected() >>> + * call for fatal errors. Instead use >>> + * PCI_ERS_RESULT_NEED_RESET as initial status value. >>> + * >>> + * Ignoring the status return value of report_error_detected() >>> + * call will also help in case of EDR mode based error >>> + * recovery. In EDR mode AER and DPC Capabilities are owned by >>> + * firmware and hence report_error_detected() call will possibly >>> + * return PCI_ERS_RESULT_NO_AER_DRIVER. So if we don't ignore >>> + * the return value of report_error_detected() then >>> + * pcie_do_recovery() would report incorrect status after >>> + * successful recovery. Ignoring PCI_ERS_RESULT_NO_AER_DRIVER >>> + * in non EDR case should not have any functional impact. >>> + */ >>> + status = PCI_ERS_RESULT_NEED_RESET; >>> + if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED) { >>> + status = PCI_ERS_RESULT_DISCONNECT; >>> pci_warn(dev, "link reset failed\n"); >>> goto failed; >>> } >>> -- >>> 2.17.1 > --- > -Jay Vosburgh, jay.vosburgh@canonical.com > . >