Date: Mon, 13 May 2019 11:56:00 -0300
From: Mauro Carvalho Chehab
To: Changbin Du
Cc: bhelgaas@google.com, corbet@lwn.net, linux-pci@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 07/12] Documentation: PCI: convert pci-error-recovery.txt to reST
Message-ID: <20190513115600.60e59e5e@coco.lan>
In-Reply-To: <20190513142000.3524-8-changbin.du@gmail.com>
References: <20190513142000.3524-1-changbin.du@gmail.com> <20190513142000.3524-8-changbin.du@gmail.com>

On Mon, 13 May 2019 22:19:55 +0800
Changbin Du wrote:

> This converts the plain text documentation to reStructuredText format and
> add it to Sphinx TOC tree. No essential content change.
>
> Signed-off-by: Changbin Du
> Acked-by: Bjorn Helgaas
> Cc: Mauro Carvalho Chehab

Reviewed-by: Mauro Carvalho Chehab

> ---
>  Documentation/PCI/index.rst                   |   1 +
>  ...or-recovery.txt => pci-error-recovery.rst} | 287 +++++++++---------
>  MAINTAINERS                                   |   4 +-
>  3 files changed, 152 insertions(+), 140 deletions(-)
>  rename Documentation/PCI/{pci-error-recovery.txt => pci-error-recovery.rst} (67%)
>
> diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst
> index 6f573f3df993..92e62d0fc9e6 100644
> --- a/Documentation/PCI/index.rst
> +++ b/Documentation/PCI/index.rst
> @@ -13,3 +13,4 @@ Linux PCI Bus Subsystem
>     pci-iov-howto
>     msi-howto
>     acpi-info
> +   pci-error-recovery
> diff --git a/Documentation/PCI/pci-error-recovery.txt b/Documentation/PCI/pci-error-recovery.rst
> similarity index 67%
> rename from Documentation/PCI/pci-error-recovery.txt
> rename to Documentation/PCI/pci-error-recovery.rst
> index 0b6bb3ef449e..83db42092935 100644
> --- a/Documentation/PCI/pci-error-recovery.txt
> +++ b/Documentation/PCI/pci-error-recovery.rst
> @@ -1,12 +1,13 @@
> +.. SPDX-License-Identifier: GPL-2.0
>
> -                       PCI Error Recovery
> -                       ------------------
> -                        February 2, 2006
> +==================
> +PCI Error Recovery
> +==================
>
> -                 Current document maintainer:
> -             Linas Vepstas
> -          updated by Richard Lary
> -       and Mike Mason on 27-Jul-2009
> +
> +:Authors: - Linas Vepstas
> +          - Richard Lary
> +          - Mike Mason
>
>
>  Many PCI bus controllers are able to detect a variety of hardware
> @@ -63,7 +64,8 @@ mechanisms for dealing with SCSI bus errors and SCSI bus resets.
>
>
>  Detailed Design
> ----------------
> +===============
> +
>  Design and implementation details below, based on a chain of
>  public email discussions with Ben Herrenschmidt, circa 5 April 2005.
>
> @@ -73,30 +75,33 @@ pci_driver. A driver that fails to provide the structure is "non-aware",
>  and the actual recovery steps taken are platform dependent. The
>  arch/powerpc implementation will simulate a PCI hotplug remove/add.
>
> -This structure has the form:
> -struct pci_error_handlers
> -{
> -	int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
> -	int (*mmio_enabled)(struct pci_dev *dev);
> -	int (*slot_reset)(struct pci_dev *dev);
> -	void (*resume)(struct pci_dev *dev);
> -};
> -
> -The possible channel states are:
> -enum pci_channel_state {
> -	pci_channel_io_normal, /* I/O channel is in normal state */
> -	pci_channel_io_frozen, /* I/O to channel is blocked */
> -	pci_channel_io_perm_failure, /* PCI card is dead */
> -};
> -
> -Possible return values are:
> -enum pci_ers_result {
> -	PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */
> -	PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
> -	PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
> -	PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
> -	PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
> -};
> +This structure has the form::
> +
> +	struct pci_error_handlers
> +	{
> +		int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
> +		int (*mmio_enabled)(struct pci_dev *dev);
> +		int (*slot_reset)(struct pci_dev *dev);
> +		void (*resume)(struct pci_dev *dev);
> +	};
> +
> +The possible channel states are::
> +
> +	enum pci_channel_state {
> +		pci_channel_io_normal, /* I/O channel is in normal state */
> +		pci_channel_io_frozen, /* I/O to channel is blocked */
> +		pci_channel_io_perm_failure, /* PCI card is dead */
> +	};
> +
> +Possible return values are::
> +
> +	enum pci_ers_result {
> +		PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */
> +		PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
> +		PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
> +		PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
> +		PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
> +	};
>
>  A driver does not have to implement all of these callbacks; however,
>  if it implements any, it must implement error_detected(). If a callback
> @@ -134,16 +139,17 @@ shouldn't do any new IOs. Called in task context. This is sort of a
>
>  All drivers participating in this system must implement this call.
>  The driver must return one of the following result codes:
> -		- PCI_ERS_RESULT_CAN_RECOVER:
> -		  Driver returns this if it thinks it might be able to recover
> -		  the HW by just banging IOs or if it wants to be given
> -		  a chance to extract some diagnostic information (see
> -		  mmio_enable, below).
> -		- PCI_ERS_RESULT_NEED_RESET:
> -		  Driver returns this if it can't recover without a
> -		  slot reset.
> -		- PCI_ERS_RESULT_DISCONNECT:
> -		  Driver returns this if it doesn't want to recover at all.
> +
> +  - PCI_ERS_RESULT_CAN_RECOVER
> +      Driver returns this if it thinks it might be able to recover
> +      the HW by just banging IOs or if it wants to be given
> +      a chance to extract some diagnostic information (see
> +      mmio_enable, below).
> +  - PCI_ERS_RESULT_NEED_RESET
> +      Driver returns this if it can't recover without a
> +      slot reset.
> +  - PCI_ERS_RESULT_DISCONNECT
> +      Driver returns this if it doesn't want to recover at all.
>
>  The next step taken will depend on the result codes returned by the
>  drivers.
> @@ -159,25 +165,27 @@ then recovery proceeds to STEP 4 (Slot Reset).
>  If the platform is unable to recover the slot, the next step
>  is STEP 6 (Permanent Failure).
>
> ->>> The current powerpc implementation assumes that a device driver will
> ->>> *not* schedule or semaphore in this routine; the current powerpc
> ->>> implementation uses one kernel thread to notify all devices;
> ->>> thus, if one device sleeps/schedules, all devices are affected.
> ->>> Doing better requires complex multi-threaded logic in the error
> ->>> recovery implementation (e.g. waiting for all notification threads
> ->>> to "join" before proceeding with recovery.) This seems excessively
> ->>> complex and not worth implementing.
> -
> ->>> The current powerpc implementation doesn't much care if the device
> ->>> attempts I/O at this point, or not. I/O's will fail, returning
> ->>> a value of 0xff on read, and writes will be dropped. If more than
> ->>> EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
> ->>> assumes that the device driver has gone into an infinite loop
> ->>> and prints an error to syslog. A reboot is then required to
> ->>> get the device working again.
> +.. note::
> +
> +   The current powerpc implementation assumes that a device driver will
> +   *not* schedule or semaphore in this routine; the current powerpc
> +   implementation uses one kernel thread to notify all devices;
> +   thus, if one device sleeps/schedules, all devices are affected.
> +   Doing better requires complex multi-threaded logic in the error
> +   recovery implementation (e.g. waiting for all notification threads
> +   to "join" before proceeding with recovery.) This seems excessively
> +   complex and not worth implementing.
> +
> +   The current powerpc implementation doesn't much care if the device
> +   attempts I/O at this point, or not. I/O's will fail, returning
> +   a value of 0xff on read, and writes will be dropped. If more than
> +   EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
> +   assumes that the device driver has gone into an infinite loop
> +   and prints an error to syslog. A reboot is then required to
> +   get the device working again.
>
>  STEP 2: MMIO Enabled
> --------------------
> +--------------------
>  The platform re-enables MMIO to the device (but typically not the
>  DMA), and then calls the mmio_enabled() callback on all affected
>  device drivers.
> @@ -192,34 +200,36 @@ link reset was performed by the HW. If the platform can't just re-enable IOs
>  without a slot reset or a link reset, it will not call this callback, and
>  instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
>
> ->>> The following is proposed; no platform implements this yet:
> ->>> Proposal: All I/O's should be done _synchronously_ from within
> ->>> this callback, errors triggered by them will be returned via
> ->>> the normal pci_check_whatever() API, no new error_detected()
> ->>> callback will be issued due to an error happening here. However,
> ->>> such an error might cause IOs to be re-blocked for the whole
> ->>> segment, and thus invalidate the recovery that other devices
> ->>> on the same segment might have done, forcing the whole segment
> ->>> into one of the next states, that is, link reset or slot reset.
> +.. note::
> +
> +   The following is proposed; no platform implements this yet:
> +   Proposal: All I/O's should be done _synchronously_ from within
> +   this callback, errors triggered by them will be returned via
> +   the normal pci_check_whatever() API, no new error_detected()
> +   callback will be issued due to an error happening here. However,
> +   such an error might cause IOs to be re-blocked for the whole
> +   segment, and thus invalidate the recovery that other devices
> +   on the same segment might have done, forcing the whole segment
> +   into one of the next states, that is, link reset or slot reset.
>
>  The driver should return one of the following result codes:
> -		- PCI_ERS_RESULT_RECOVERED
> -		  Driver returns this if it thinks the device is fully
> -		  functional and thinks it is ready to start
> -		  normal driver operations again. There is no
> -		  guarantee that the driver will actually be
> -		  allowed to proceed, as another driver on the
> -		  same segment might have failed and thus triggered a
> -		  slot reset on platforms that support it.
> -
> -		- PCI_ERS_RESULT_NEED_RESET
> -		  Driver returns this if it thinks the device is not
> -		  recoverable in its current state and it needs a slot
> -		  reset to proceed.
> -
> -		- PCI_ERS_RESULT_DISCONNECT
> -		  Same as above. Total failure, no recovery even after
> -		  reset driver dead. (To be defined more precisely)
> +  - PCI_ERS_RESULT_RECOVERED
> +      Driver returns this if it thinks the device is fully
> +      functional and thinks it is ready to start
> +      normal driver operations again. There is no
> +      guarantee that the driver will actually be
> +      allowed to proceed, as another driver on the
> +      same segment might have failed and thus triggered a
> +      slot reset on platforms that support it.
> +
> +  - PCI_ERS_RESULT_NEED_RESET
> +      Driver returns this if it thinks the device is not
> +      recoverable in its current state and it needs a slot
> +      reset to proceed.
> +
> +  - PCI_ERS_RESULT_DISCONNECT
> +      Same as above. Total failure, no recovery even after
> +      reset driver dead. (To be defined more precisely)
>
>  The next step taken depends on the results returned by the drivers.
>  If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
> @@ -293,31 +303,33 @@ device will be considered "dead" in this case.
>  Drivers for multi-function cards will need to coordinate among
>  themselves as to which driver instance will perform any "one-shot"
>  or global device initialization. For example, the Symbios sym53cxx2
> -driver performs device init only from PCI function 0:
> +driver performs device init only from PCI function 0::
>
> -+       if (PCI_FUNC(pdev->devfn) == 0)
> -+               sym_reset_scsi_bus(np, 0);
> +	+       if (PCI_FUNC(pdev->devfn) == 0)
> +	+               sym_reset_scsi_bus(np, 0);
>
> -	Result codes:
> -		- PCI_ERS_RESULT_DISCONNECT
> -		Same as above.
> +Result codes:
> +	- PCI_ERS_RESULT_DISCONNECT
> +	  Same as above.
>
>  Drivers for PCI Express cards that require a fundamental reset must
>  set the needs_freset bit in the pci_dev structure in their probe function.
>  For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
> -PCI card types:
> +PCI card types::
>
> -+	/* Set EEH reset type to fundamental if required by hba */
> -+	if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
> -+		pdev->needs_freset = 1;
> -+
> +	+	/* Set EEH reset type to fundamental if required by hba */
> +	+	if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
> +	+		pdev->needs_freset = 1;
> +	+
>
>  Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
>  Failure).
>
> ->>> The current powerpc implementation does not try a power-cycle
> ->>> reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
> ->>> However, it probably should.
> +.. note::
> +
> +   The current powerpc implementation does not try a power-cycle
> +   reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
> +   However, it probably should.
>
>
>  STEP 5: Resume Operations
> @@ -370,44 +382,43 @@ The current policy is to turn this into a platform policy.
>  That is, the recovery API only requires that:
>
>   - There is no guarantee that interrupt delivery can proceed from any
> -device on the segment starting from the error detection and until the
> -slot_reset callback is called, at which point interrupts are expected
> -to be fully operational.
> +   device on the segment starting from the error detection and until the
> +   slot_reset callback is called, at which point interrupts are expected
> +   to be fully operational.
>
>   - There is no guarantee that interrupt delivery is stopped, that is,
> -a driver that gets an interrupt after detecting an error, or that detects
> -an error within the interrupt handler such that it prevents proper
> -ack'ing of the interrupt (and thus removal of the source) should just
> -return IRQ_NOTHANDLED. It's up to the platform to deal with that
> -condition, typically by masking the IRQ source during the duration of
> -the error handling. It is expected that the platform "knows" which
> -interrupts are routed to error-management capable slots and can deal
> -with temporarily disabling that IRQ number during error processing (this
> -isn't terribly complex). That means some IRQ latency for other devices
> -sharing the interrupt, but there is simply no other way. High end
> -platforms aren't supposed to share interrupts between many devices
> -anyway :)
> -
> ->>> Implementation details for the powerpc platform are discussed in
> ->>> the file Documentation/powerpc/eeh-pci-error-recovery.txt
> -
> ->>> As of this writing, there is a growing list of device drivers with
> ->>> patches implementing error recovery. Not all of these patches are in
> ->>> mainline yet. These may be used as "examples":
> ->>>
> ->>> drivers/scsi/ipr
> ->>> drivers/scsi/sym53c8xx_2
> ->>> drivers/scsi/qla2xxx
> ->>> drivers/scsi/lpfc
> ->>> drivers/next/bnx2.c
> ->>> drivers/next/e100.c
> ->>> drivers/net/e1000
> ->>> drivers/net/e1000e
> ->>> drivers/net/ixgb
> ->>> drivers/net/ixgbe
> ->>> drivers/net/cxgb3
> ->>> drivers/net/s2io.c
> ->>> drivers/net/qlge
> -
> -The End
> --------
> +   a driver that gets an interrupt after detecting an error, or that detects
> +   an error within the interrupt handler such that it prevents proper
> +   ack'ing of the interrupt (and thus removal of the source) should just
> +   return IRQ_NOTHANDLED. It's up to the platform to deal with that
> +   condition, typically by masking the IRQ source during the duration of
> +   the error handling. It is expected that the platform "knows" which
> +   interrupts are routed to error-management capable slots and can deal
> +   with temporarily disabling that IRQ number during error processing (this
> +   isn't terribly complex). That means some IRQ latency for other devices
> +   sharing the interrupt, but there is simply no other way. High end
> +   platforms aren't supposed to share interrupts between many devices
> +   anyway :)
> +
> +.. note::
> +
> +   Implementation details for the powerpc platform are discussed in
> +   the file Documentation/powerpc/eeh-pci-error-recovery.txt
> +
> +   As of this writing, there is a growing list of device drivers with
> +   patches implementing error recovery. Not all of these patches are in
> +   mainline yet. These may be used as "examples":
> +
> +   - drivers/scsi/ipr
> +   - drivers/scsi/sym53c8xx_2
> +   - drivers/scsi/qla2xxx
> +   - drivers/scsi/lpfc
> +   - drivers/next/bnx2.c
> +   - drivers/next/e100.c
> +   - drivers/net/e1000
> +   - drivers/net/e1000e
> +   - drivers/net/ixgb
> +   - drivers/net/ixgbe
> +   - drivers/net/cxgb3
> +   - drivers/net/s2io.c
> +   - drivers/net/qlge
> diff --git a/MAINTAINERS b/MAINTAINERS
> index fb9f9d71f7a2..6e5ec5d3987e 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -12101,7 +12101,7 @@ M:	Sam Bobroff
>  M:	Oliver O'Halloran
>  L:	linuxppc-dev@lists.ozlabs.org
>  S:	Supported
> -F:	Documentation/PCI/pci-error-recovery.txt
> +F:	Documentation/PCI/pci-error-recovery.rst
>  F:	drivers/pci/pcie/aer.c
>  F:	drivers/pci/pcie/dpc.c
>  F:	drivers/pci/pcie/err.c
> @@ -12114,7 +12114,7 @@ PCI ERROR RECOVERY
>  M:	Linas Vepstas
>  L:	linux-pci@vger.kernel.org
>  S:	Supported
> -F:	Documentation/PCI/pci-error-recovery.txt
> +F:	Documentation/PCI/pci-error-recovery.rst
>
>  PCI MSI DRIVER FOR ALTERA MSI IP
>  M:	Ley Foon Tan

Thanks,
Mauro
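
For anyone reading the converted document, the way the platform picks the next step from the error_detected() result codes can be sketched in userspace C. This is only an illustration, not kernel code: the enum is a local stand-in that mirrors the one quoted in the patch, and `enum recovery_step` and `next_step_after_error_detected()` are hypothetical names. Treating any DISCONNECT as permanent failure, and any result other than CAN_RECOVER as a reset request, are assumptions made here for the sketch.

```c
#include <stddef.h>

/* Local stand-in mirroring the enum quoted in the patch (not the kernel header). */
enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

/* Hypothetical labels for the steps named in the document. */
enum recovery_step {
	STEP_MMIO_ENABLED,      /* STEP 2 */
	STEP_SLOT_RESET,        /* STEP 4 */
	STEP_PERMANENT_FAILURE, /* STEP 6 */
};

/*
 * Combine the error_detected() results of all drivers on a slot.
 * Assumption: one DISCONNECT ends recovery; otherwise any result that
 * is not CAN_RECOVER forces a slot reset; only a unanimous CAN_RECOVER
 * moves on to the MMIO Enabled step.
 */
enum recovery_step
next_step_after_error_detected(const enum pci_ers_result *results, size_t n)
{
	int need_reset = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (results[i] == PCI_ERS_RESULT_DISCONNECT)
			return STEP_PERMANENT_FAILURE;
		if (results[i] != PCI_ERS_RESULT_CAN_RECOVER)
			need_reset = 1;
	}
	return need_reset ? STEP_SLOT_RESET : STEP_MMIO_ENABLED;
}
```

The point of the sketch is that the decision is per slot rather than per driver: a single driver asking for a reset overrides the others.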