Subject: Re: [PATCH v2 2/4] PCI/DPC/AER: Address Concurrency between AER and DPC
From: Sinan Kaya
To: Keith Busch, Oza Pawandeep
Cc: Bjorn Helgaas, Philippe Ombredanne, Thomas Gleixner, Greg Kroah-Hartman,
    Kate Stewart, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
    Dongdong Liu, Gabriele Paoloni, Wei Zhang, Timur Tabi
Date: Tue, 2 Jan 2018 08:25:08 -0500
Message-ID: <5e9ffecf-2da7-0014-9a62-e2ae10323ce3@codeaurora.org>
In-Reply-To: <20171229172324.GF16407@localhost.localdomain>
References: <1514532259-19383-1-git-send-email-poza@codeaurora.org>
    <1514532259-19383-3-git-send-email-poza@codeaurora.org>
    <20171229172324.GF16407@localhost.localdomain>

Hi Keith,

On 12/29/2017 12:23 PM, Keith Busch wrote:
> On Fri, Dec 29, 2017 at 12:54:17PM +0530, Oza Pawandeep wrote:
>> This patch addresses the race condition between AER and DPC for recovery.
>>
>> Current DPC driver does not do recovery, e.g. calling end-point's driver's
>> callbacks, which sanitize the device.
>> DPC driver implements link_reset callback, and calls pci_do_recovery.
>
> I'm not sure I see why any of this is necessary for two reasons:
>
> 1. A downstream port containment event disables the link. How can a driver
> sanitize an end device when all the end devices below the containment are
> physically inaccessible? Any attempt to access such devices will just
> end with either CA or UR (depending on DPC control settings). Since we
> already know the failed outcome from attempting to access such devices,
> why do you want the drivers to do anything?

The reset callback to the endpoint driver has a status field indicating
whether the IO is frozen or not.

If IO is not frozen, an endpoint driver can potentially recover from the
error by reissuing the failed request.

If IO is frozen, then the endpoint driver needs to clean up outstanding
resources. It is not safe to just shut down the driver while there are
transactions in flight. This is the reason for the status field: it gives
the driver a chance to clean up any state machines and resources.

Also note that the error callback has a result return value. An endpoint
driver uses it to indicate whether it recovered successfully or not.

>
> 2. A DPC event suppresses the error message required for the Linux
> AER driver to run. How can AER and DPC run concurrently?
>

As we briefly discussed in previous email exchanges, I think you are
looking at a use case with a switch that supports DPC functionality.

Oza and I are looking at root port functionality with the DPC feature.

As you already know, AER errors are logged to the AER capability register
independent of the DPC driver's presence. A root port is also allowed to
share the MSI interrupts across DPC and AER. Therefore, when a DPC
interrupt fires, both the AER driver and the DPC driver start recovery
work. This is the issue we are trying to deal with.

In the end, the driver needs to work for both root ports and switches.
I think you verified it against a switch.
We are doing the same for a root port and submitting the plumbing code.

--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
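
To make the status field and the result return value discussed above a
bit more concrete, here is a minimal userspace sketch of how an endpoint
driver's error callback might branch on the channel state. It is not
taken from the patch series. The enum names mirror the kernel's
pci_channel_state_t and pci_ers_result_t from <linux/pci.h>, but they are
redeclared here as stand-ins so the example builds outside the kernel.
struct demo_dev, its fields, and demo_error_detected() are hypothetical;
a real callback takes a struct pci_dev * and is registered through
struct pci_error_handlers.

```c
#include <stdbool.h>

/* Stand-ins mirroring kernel names; the real types live in <linux/pci.h>. */
typedef enum {
	pci_channel_io_normal,		/* IO still works */
	pci_channel_io_frozen,		/* IO to the device is blocked */
	pci_channel_io_perm_failure,	/* device is gone for good */
} pci_channel_state_t;

typedef enum {
	PCI_ERS_RESULT_CAN_RECOVER,	/* driver can recover without a reset */
	PCI_ERS_RESULT_NEED_RESET,	/* driver wants the slot/link reset */
	PCI_ERS_RESULT_DISCONNECT,	/* driver gives up on the device */
} pci_ers_result_t;

/* Hypothetical per-device driver state, for illustration only. */
struct demo_dev {
	int inflight;	/* requests submitted but not yet completed */
	int requeued;	/* requests put back on the queue for retry */
	bool stopped;	/* true once new submissions are refused */
};

/*
 * Hypothetical error callback. The state argument plays the role of the
 * "status field" and the return value reports the driver's verdict.
 */
pci_ers_result_t demo_error_detected(struct demo_dev *dev,
				     pci_channel_state_t state)
{
	switch (state) {
	case pci_channel_io_normal:
		/* IO not frozen: reissue whatever failed. */
		dev->requeued += dev->inflight;
		dev->inflight = 0;
		return PCI_ERS_RESULT_CAN_RECOVER;
	case pci_channel_io_frozen:
		/* IO frozen: quiesce and drop in-flight work, then ask for a reset. */
		dev->stopped = true;
		dev->inflight = 0;
		return PCI_ERS_RESULT_NEED_RESET;
	case pci_channel_io_perm_failure:
	default:
		/* Nothing left to talk to: release resources and detach. */
		dev->stopped = true;
		dev->inflight = 0;
		return PCI_ERS_RESULT_DISCONNECT;
	}
}
```

Calling demo_error_detected() with pci_channel_io_normal moves the
in-flight requests to the retry count and leaves the device running,
while pci_channel_io_frozen stops the device and drops the in-flight
work before any reset happens. The point of the sketch is only that the
driver is given the chance to do this cleanup itself.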