Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751416AbeABTIa (ORCPT + 1 other); Tue, 2 Jan 2018 14:08:30 -0500 Received: from mga01.intel.com ([192.55.52.88]:30383 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751315AbeABTI2 (ORCPT ); Tue, 2 Jan 2018 14:08:28 -0500 X-Amp-Result: UNKNOWN X-Amp-Original-Verdict: FILE UNKNOWN X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.45,498,1508828400"; d="scan'208";a="189010365" Date: Tue, 2 Jan 2018 12:12:18 -0700 From: Keith Busch To: Bjorn Helgaas Cc: Oza Pawandeep , Bjorn Helgaas , Philippe Ombredanne , Thomas Gleixner , Greg Kroah-Hartman , Kate Stewart , linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Dongdong Liu , Gabriele Paoloni , Wei Zhang , Sinan Kaya , Timur Tabi Subject: Re: [PATCH v2 0/4] Address error and recovery for AER and DPC Message-ID: <20180102191218.GD24386@localhost.localdomain> References: <1514532259-19383-1-git-send-email-poza@codeaurora.org> <20180102190215.GC6211@bhelgaas-glaptop.roam.corp.google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180102190215.GC6211@bhelgaas-glaptop.roam.corp.google.com> User-Agent: Mutt/1.9.1 (2017-09-22) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Tue, Jan 02, 2018 at 01:02:15PM -0600, Bjorn Helgaas wrote: > On Fri, Dec 29, 2017 at 12:54:15PM +0530, Oza Pawandeep wrote: > > This patch set brings in support for DPC and AER to co-exist and not to > > race for recovery. > > > > The current implementation of AER and error message broadcasting to the > > EP driver is tightly coupled and limited to AER service driver. > > It is important to factor out broadcasting and other link handling > > callbacks. So that not only when AER gets triggered, but also when DPC get > > triggered, or both get triggered simultaneously (for e.g. ERR_FATAL), > > callbacks are handled appropriately. > > having modularized the code, the race between AER and DPC is handled > > gracefully. > > for e.g. when DPC is active and kicked in, AER should not attempt to do > > recovery, because DPC takes care of it. > > High-level question: > > We have some convoluted code in negotiate_os_control() and > aer_service_init() that (I think) essentially disables AER unless the > platform firmware grants us permission to use it. > > The last implementation note in PCIe r3.1, sec 6.2.10 says > > DPC may be controlled in some configurations by platform firmware > and in other configurations by the operating system. DPC > functionality is strongly linked with the functionality in Advanced > Error Reporting. To avoid conflicts over whether platform firmware > or the operating system have control of DPC, it is recommended that > platform firmware and operating systems always link the control of > DPC to the control of Advanced Error Reporting. > > I read that as suggesting that we should enable DPC support in Linux > if and only if we also enable AER. But I don't see anything in DPC > that looks like that. Should there be something there? Should DPC be > restructured so it's enabled and handled inside the AER driver instead > of being a separate driver? Yes, I agree the two should be linked. I submitted a patch for that here, though driver responsibilities are still separate in this series: https://marc.info/?l=linux-pci&m=151371742225111&w=2