2005-03-11 18:49:47

by Nguyen, Tom L

Subject: RE: [PATCH] PCI Express Advanced Error Reporting Driver


On Thursday, March 10, 2005 9:23 PM Greg KH wrote:
> On Thu, Mar 10, 2005 at 03:04:18PM -0800, long wrote:
>> PCI Express error signaling can occur on the PCI Express link itself
>> or on behalf of transactions initiated on the link. PCI Express
>> defines the Advanced Error Reporting capability, which is implemented
>> with a PCI Express advanced error reporting extended capability
>> structure, to provide more robust error reporting. With the Advanced
>> Error Reporting capability a PCI Express component, which detects an
>> error, can send an error message to the Root Port associated with
>> its hierarchy.

>This patch was too big for lkml, and should also be sent to the
>linux-pci list. Care to split it up and resend it?

Will split it up and resend.

> Also, how does this tie into the recent discussion about pci error
> recovery?

The standard PCI Specification calls out SERR and PERR. I am not sure
about the recent discussion of PCI error recovery. Perhaps it concerns
the possibility of recovering from a PERR or SERR. However, PCI Express
errors occur on the PCI Express link itself or on behalf of
transactions initiated on the link. A PCI Express component that
implements the PCI Express Advanced Error Reporting capability sends an
error message to the Root Port to indicate that an error occurred on
the PCI Express link to which it is connected. PCI Express error
recovery is about attempting PCI Express link recovery, not PCI error
recovery. It appears that PCI Express AER is disjoint from PCI error
recovery.

Thanks,
Long


2005-03-11 23:10:58

by Paul Mackerras

Subject: RE: [PATCH] PCI Express Advanced Error Reporting Driver

Nguyen, Tom L writes:

> The standard PCI Specification calls out SERR and PERR. I am not sure
> about the recent discussion of PCI error recovery. Perhaps it concerns
> the possibility of recovering from a PERR or SERR. However, PCI Express
> errors occur on the PCI Express link itself or on behalf of
> transactions initiated on the link. A PCI Express component that
> implements the PCI Express Advanced Error Reporting capability sends an
> error message to the Root Port to indicate that an error occurred on
> the PCI Express link to which it is connected. PCI Express error
> recovery is about attempting PCI Express link recovery, not PCI error
> recovery. It appears that PCI Express AER is disjoint from PCI error
> recovery.

To give you some context, the recent discussion was about how we could
give a unified interface to drivers for both PCI-Express error
reporting and for the "Enhanced Error Handling" (EEH) facilities we
have on IBM PPC64 boxes. EEH includes not only the detection and
reporting of errors (for PCI, PCI-X and PCI-Express buses) but also
hardware support for isolating devices when an error is detected, plus
means for resetting individual bus segments or slots, to assist in
recovering a device which has got into a bad state.

Does PCI Express provide any facilities for recovering from errors,
beyond just "try that transaction again"?

Paul.