Received: by 2002:a25:ef43:0:0:0:0:0 with SMTP id w3csp753963ybm; Thu, 28 May 2020 14:21:53 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx3p157kSXglylqhswn+uFYhh//sdx3bU2q4tjW40yAbHK82sIFNU1dSEQDbYoTKRjZPxNj X-Received: by 2002:aa7:dc57:: with SMTP id g23mr5007580edu.352.1590700913198; Thu, 28 May 2020 14:21:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1590700913; cv=none; d=google.com; s=arc-20160816; b=un7he5jY5pVpZOa7ha2u2Ys7Ar7VSwsa0+4jFGMgCJzOsNxeIQZtPCisTqUBOyrQBD CQUbOMxfSWdSTIA7RxfSRgh/+K6NMtc+sjjy/GB7XovnH/k2PCcQjzIcCCEq6ZxCsuf+ s81dGazGrZkwtVpZlCQGYEOAYduwX5N/nq8wZ0pybsDV9RYKw7PyrJTUKN4nOcMll0+3 KjQRlt17txIhusD6zAw2hC84I5/UjdK4FMqCkvJHbfK+ATrZRIPvdhjTKWV6vrSPIyNT H1/+JIDuRcghvRYLomwVPPmmrB5CUpshyRD1ROhLuXi6cG/gzibSFLawuDdhtbFrhFe4 XUFA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:to:subject:ironport-sdr:ironport-sdr; bh=1x5ipmAoMglu/0dg5Zf5cfEitjVQUK57SAtADegtP+U=; b=C6iAcju82bBK8X93YkK52V5LZoh7uUfNGvXidxjwqVdLuwI6rfgUGFD2fy0uylhHWT G5h+9ujtV3I65eHPbYCLhT4B7nbJB1Wn4tbIA7tvOWoEvebBqwE0+EYle8f0e//+NtnP F5Flc/IuoqEgMtQRRK0u8zVx6UQ6jc0Wma1p10Lv5ppQ9s7gYGneimREfz/axwYHoDQN gzUe9OgiB8GiIc9RLVdhGRbPYPba3oX3WR7x++gxvSDcMR9OBuvV1C+6+pI/1KwMKLpR S8FFpUM9/R33ueAjBIGg/YpFlxBLxWMlsjM3bs2sffyuOCU3ubOcpzkjnuUrpd8FTiXA 63ZQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id f22si3979586edx.79.2020.05.28.14.21.29; Thu, 28 May 2020 14:21:53 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2436551AbgE1VSv (ORCPT + 99 others); Thu, 28 May 2020 17:18:51 -0400 Received: from mga11.intel.com ([192.55.52.93]:61069 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2436543AbgE1VSn (ORCPT ); Thu, 28 May 2020 17:18:43 -0400 IronPort-SDR: BC49y7K5BgYLTB0PeyRL5FEzNMFM674uOqPUmJoo5GOQoHQnabDEY6cZE45lSblC38aaRGYCbG Q6KCJOWBQDCQ== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 May 2020 14:18:42 -0700 IronPort-SDR: 4AScpdRfM/Qz6G0HI8poSM3B9H5Mqkq5vrBTOmecqUhhBSB1772DdFAAsj45uU0XmppsLIFVUS 1ihVdZLx2rfw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.73,446,1583222400"; d="scan'208";a="311045002" Received: from vvhadaga-mobl.amr.corp.intel.com (HELO [10.254.98.146]) ([10.254.98.146]) by FMSMGA003.fm.intel.com with ESMTP; 28 May 2020 14:18:42 -0700 Subject: Re: [PATCH] PCI: ERR: Don't override the status returned by error_detect() To: Zhiqiang Hou , linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, ruscur@russell.cc, sbobroff@linux.ibm.com, oohall@gmail.com, bhelgaas@google.com References: <20200527083130.4137-1-Zhiqiang.Hou@nxp.com> From: "Kuppuswamy, Sathyanarayanan" Message-ID: <84a2bc7e-7556-96ff-6cd5-988d432ad8e3@linux.intel.com> Date: Thu, 28 May 2020 14:18:41 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 MIME-Version: 1.0 In-Reply-To: <20200527083130.4137-1-Zhiqiang.Hou@nxp.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, On 5/27/20 1:31 AM, Zhiqiang Hou wrote: > From: Hou Zhiqiang > > The commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") > overrode the 'status' returned by the error_detect() call back function, > which is depended on by the next step. This overriding makes the Endpoint > driver's required info (kept in the var status) lost, so it results in the > fatal errors' recovery failed and then kernel panic. Can you explain why updating status affects the recovery ? > > In the e1000e case, the error logs: > pcieport 0002:00:00.0: AER: Uncorrected (Fatal) error received: 0002:01:00.0 > e1000e 0002:01:00.0: AER: PCIe Bus Error: severity=Uncorrected (Fatal), type=Inaccessible, (Unregistered Agent ID) > pcieport 0002:00:00.0: AER: Root Port link has been reset As per above commit log, it looks like link is reset correctly. > SError Interrupt on CPU0, code 0xbf000002 -- SError > CPU: 0 PID: 111 Comm: irq/76-aerdrv Not tainted 5.7.0-rc7-next-20200526 #8 > Hardware name: LS1046A RDB Board (DT) > pstate: 80000005 (Nzcv daif -PAN -UAO BTYPE=--) > pc : __pci_enable_msix_range+0x4c8/0x5b8 > lr : __pci_enable_msix_range+0x480/0x5b8 > sp : ffff80001116bb30 > x29: ffff80001116bb30 x28: 0000000000000003 > x27: 0000000000000003 x26: 0000000000000000 > x25: ffff00097243e0a8 x24: 0000000000000001 > x23: ffff00097243e2d8 x22: 0000000000000000 > x21: 0000000000000003 x20: ffff00095bd46080 > x19: ffff00097243e000 x18: ffffffffffffffff > x17: 0000000000000000 x16: 0000000000000000 > x15: ffffb958fa0e9948 x14: ffff00095bd46303 > x13: ffff00095bd46302 x12: 0000000000000038 > x11: 0000000000000040 x10: ffffb958fa101e68 > x9 : ffffb958fa101e60 x8 : 0000000000000908 > x7 : 0000000000000908 x6 : ffff800011600000 > x5 : ffff00095bd46800 x4 : ffff00096e7f6080 > x3 : 0000000000000000 x2 : 0000000000000000 > x1 : 0000000000000000 x0 : 0000000000000000 > Kernel panic - not syncing: Asynchronous SError Interrupt > CPU: 0 PID: 111 Comm: irq/76-aerdrv Not tainted 5.7.0-rc7-next-20200526 #8 > > I think it's the expected result that "if the initial value of error > status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER > then even after successful recovery (using reset_link()) pcie_do_recovery() > will report the recovery result as failure" which is described in > commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()"). > > Refer to the Documentation/PCI/pci-error-recovery.rst. > As the error_detect() is mandatory callback if the pci_err_handlers is > implemented, if it return the PCI_ERS_RESULT_DISCONNECT, it means the > driver doesn't want to recover at all; > For the case PCI_ERS_RESULT_NO_AER_DRIVER, if the pci_err_handlers is not > implemented, the failure is more expected. > > Fixes: commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") > Signed-off-by: Hou Zhiqiang > --- > drivers/pci/pcie/err.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > index 14bb8f54723e..84f72342259c 100644 > --- a/drivers/pci/pcie/err.c > +++ b/drivers/pci/pcie/err.c > @@ -165,8 +165,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_dbg(dev, "broadcast error_detected message\n"); > if (state == pci_channel_io_frozen) { > pci_walk_bus(bus, report_frozen_detected, &status); > - status = reset_link(dev); > - if (status != PCI_ERS_RESULT_RECOVERED) { > + if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED) { > pci_warn(dev, "link reset failed\n"); > goto failed; > } > -- Sathyanarayanan Kuppuswamy Linux Kernel Developer