From: Ben Hutchings
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
CC: akpm@linux-foundation.org, "Bjorn Helgaas", "Dongdong Liu", "Gabriele Paoloni"
Date: Sun, 11 Feb 2018 04:20:06 +0000
Subject: [PATCH 3.2 04/79] PCI/AER: Report non-fatal errors only to the affected endpoint

3.2.99-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Gabriele Paoloni

commit 86acc790717fb60fb51ea3095084e331d8711c74 upstream.

Previously, if a non-fatal error was reported by an endpoint, we called
report_error_detected() for the endpoint, every sibling on the bus, and
their descendants. If any of them did not implement the
.error_detected() method, do_recovery() failed, leaving all these
devices unrecovered.

For example, the system described in the bugzilla below has two
devices:

  0000:74:02.0 [19e5:a230] SAS controller, driver has .error_detected()
  0000:74:03.0 [19e5:a235] SATA controller, driver lacks .error_detected()

When a device such as 74:02.0 reported a non-fatal error, do_recovery()
failed because 74:03.0 lacked an .error_detected() method. But per PCIe
r3.1, sec 6.2.2.2.2, such an error does not compromise the Link and
does not affect 74:03.0:

  Non-fatal errors are uncorrectable errors which cause a particular
  transaction to be unreliable but the Link is otherwise fully
  functional. Isolating Non-fatal from Fatal errors provides
  Requester/Receiver logic in a device or system management software
  the opportunity to recover from the error without resetting the
  components on the Link and disturbing other transactions in
  progress. Devices not associated with the transaction in error are
  not impacted by the error.

Report non-fatal errors only to the endpoint that reported them.

We really want to check for AER_NONFATAL here, but the current code
structure doesn't allow that. Looking for pci_channel_io_normal is the
best we can do now.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=197055
Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Signed-off-by: Gabriele Paoloni
Signed-off-by: Dongdong Liu
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas
Signed-off-by: Ben Hutchings
---
 drivers/pci/pcie/aer/aerdrv_core.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -367,7 +367,14 @@ static pci_ers_result_t broadcast_error_
 		 * If the error is reported by an end point, we think this
 		 * error is related to the upstream link of the end point.
 		 */
-		pci_walk_bus(dev->bus, cb, &result_data);
+		if (state == pci_channel_io_normal)
+			/*
+			 * the error is non fatal so the bus is ok, just invoke
+			 * the callback for the function that logged the error.
+			 */
+			cb(dev, &result_data);
+		else
+			pci_walk_bus(dev->bus, cb, &result_data);
 	}
 
 	return result_data.result;
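
The dispatch change in the hunk above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: the sim_* types and functions are hypothetical, the bus is flat (descendants of siblings are not modelled), and the rule that one device lacking .error_detected() fails the whole recovery is taken from the changelog rather than from the kernel's actual result-merging logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of the dispatch decision in this patch.
 * None of these types are the kernel's; sim_* names are illustrative only.
 * The bus is flat: descendants of siblings are not modelled.
 */
enum sim_state { SIM_IO_NORMAL, SIM_IO_FROZEN }; /* ~ pci_channel_io_normal / _frozen */
enum sim_result { SIM_CAN_RECOVER, SIM_FAILED };

struct sim_dev {
	const char *name;
	bool has_error_detected; /* driver implements .error_detected()? */
};

struct sim_bus {
	const struct sim_dev *devs;
	size_t ndevs;
};

typedef void (*sim_report_cb)(const struct sim_dev *dev,
			      enum sim_result *result);

/*
 * Per-device callback. Assumption taken from the changelog: a device
 * whose driver lacks .error_detected() makes the whole recovery fail.
 */
static void sim_report_error_detected(const struct sim_dev *dev,
				      enum sim_result *result)
{
	assert(dev != NULL && result != NULL);
	if (!dev->has_error_detected)
		*result = SIM_FAILED;
}

/* Visit every device on the bus, like the pre-patch pci_walk_bus() call. */
static void sim_walk_bus(const struct sim_bus *bus, sim_report_cb cb,
			 enum sim_result *result)
{
	for (size_t i = 0; i < bus->ndevs; i++)
		cb(&bus->devs[i], result);
}

/*
 * Dispatch for an error reported by endpoint 'dev'. With 'patched' set,
 * an error that left the channel in the normal state (non-fatal) is
 * reported only to 'dev'; otherwise every device on the bus is visited.
 */
static enum sim_result sim_broadcast_error(const struct sim_bus *bus,
					   const struct sim_dev *dev,
					   enum sim_state state, bool patched)
{
	enum sim_result result = SIM_CAN_RECOVER;

	if (patched && state == SIM_IO_NORMAL)
		sim_report_error_detected(dev, &result);
	else
		sim_walk_bus(bus, sim_report_error_detected, &result);
	return result;
}
```

Using the two devices from the bugzilla example with 74:02.0 as the reporter, the unpatched walk returns SIM_FAILED because 74:03.0 lacks .error_detected(), while the patched path returns SIM_CAN_RECOVER for a non-fatal error and still walks the whole bus when the channel is frozen.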