2007-01-18 19:46:23

by Allexio Ju

Subject: Questions on PCI express AER support in HBA driver

Hi,

I have some questions about supporting PCI Express AER in Linux HBA drivers.
BTW, I'm developing a SCSI HBA driver.

What changes are expected in a SCSI LLD to support PCIe AER? As I
understand it, the driver needs to call the following API during
probing:
---
if (pci_find_aer_capability(dev)) {
    pci_enable_pcie_error_reporting(dev);
}
---
What else does a SCSI LLD need to change?

Thanks in advance.

Allexio


2007-01-18 20:46:34

by Allexio Ju

Subject: Re: Questions on PCI express AER support in HBA driver

> What are the expected changes on SCSI LLD driver in regards to PCIE
> AER supporting? I understood that the driver need to call following
> APIs during probing to enable AER support for the device,
> ---
> if (pci_find_aer_capability(dev)) {
> pci_enable_pcie_error_reporting(dev);
> }
> ---
> What else does SCSI LLD driver need to changed?
Can anyone comment?

Allexio

2007-01-18 23:56:36

by linas

Subject: Re: Questions on PCI express AER support in HBA driver

On Thu, Jan 18, 2007 at 11:46:21AM -0800, Allexio Ju wrote:
> Hi,
>
> I've got some questions on supporting PCI Express AER in Linux HBA drivers.
> BTW, I'm developing SCSI HBA driver.
[...]

> What else does SCSI LLD driver need to changed?

There are several SCSI controller drivers that handle PCI error recovery.
For example, look at drivers/scsi/ipr.c and search for
struct pci_error_handlers. The callback routines there deal
with resetting the driver after an error has been found.

I've posted patches in the past for the symbios driver;
I suppose it is time to clean them up and resubmit them,
since they are not in the kernel yet.

I recently posted patches for the Emulex lpfc Fibre Channel
SCSI card; google for the subject line
"[PATCH] lpfc: add PCI error recovery support"

to see what the patch looks like.

All of this work was done on powerpc systems. I have only
a vague idea of how this works on PC-class Intel platforms.

--linas

2007-01-19 02:30:49

by Yanmin Zhang

Subject: Re: Questions on PCI express AER support in HBA driver

On Thu, 2007-01-18 at 12:46 -0800, Allexio Ju wrote:
> > What are the expected changes on SCSI LLD driver in regards to PCIE
> > AER supporting? I understood that the driver need to call following
> > APIs during probing to enable AER support for the device,
> > ---
> > if (pci_find_aer_capability(dev)) {
> > pci_enable_pcie_error_reporting(dev);
> > }
This only enables error reporting for the device. One of the important
parts of AER is error recovery.

When an AER error happens, the device sends an error message to the root port.
The root port then notifies the kernel with an interrupt. The kernel prints an
error message and runs error recovery on the affected devices. That recovery
needs device drivers to provide a few callbacks. Documentation/pci-error-recovery.txt
has the detailed callback definitions and the recovery steps.
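
The sequence described above can be sketched as a small userspace
simulation. This is only an illustration under stated assumptions, not
the kernel's implementation: the enum names echo <linux/pci.h>, but they
are redefined here as stand-ins, and struct sim_dev, struct sim_handlers,
sim_recover() and the demo_* callbacks are hypothetical names invented
for the sketch. It also skips the optional mmio_enabled and link_reset
steps.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins: the names echo <linux/pci.h>, but none of this
 * is kernel code. */
enum pci_channel_state {
    pci_channel_io_normal = 1,
    pci_channel_io_frozen,
    pci_channel_io_perm_failure
};

typedef enum {
    PCI_ERS_RESULT_NONE = 1,
    PCI_ERS_RESULT_CAN_RECOVER,
    PCI_ERS_RESULT_NEED_RESET,
    PCI_ERS_RESULT_DISCONNECT,
    PCI_ERS_RESULT_RECOVERED
} pci_ers_result_t;

/* Hypothetical device record; counts how often each callback ran. */
struct sim_dev {
    int detected;
    int resets;
    int resumed;
};

/* Hypothetical callback table with the three callbacks from this thread. */
struct sim_handlers {
    pci_ers_result_t (*error_detected)(struct sim_dev *dev,
                                       enum pci_channel_state state);
    pci_ers_result_t (*slot_reset)(struct sim_dev *dev);
    void (*resume)(struct sim_dev *dev);
};

/* Simplified recovery walk. Returns 1 if the device ends up resumed,
 * 0 if recovery was not possible. */
static int sim_recover(struct sim_dev *dev, const struct sim_handlers *h,
                       enum pci_channel_state state)
{
    pci_ers_result_t res;

    assert(dev != NULL);
    if (h == NULL || h->error_detected == NULL)
        return 0;                   /* driver has no recovery support */

    res = h->error_detected(dev, state);
    if (res == PCI_ERS_RESULT_DISCONNECT)
        return 0;                   /* driver gave up on the device */

    if (res == PCI_ERS_RESULT_NEED_RESET) {
        if (h->slot_reset == NULL)
            return 0;
        /* the platform would reset the slot or link at this point */
        res = h->slot_reset(dev);
        if (res != PCI_ERS_RESULT_RECOVERED)
            return 0;
    }

    if (h->resume != NULL)
        h->resume(dev);             /* normal operation may restart */
    return 1;
}

/* Demo callbacks that only record that they were called. */
static pci_ers_result_t demo_error_detected(struct sim_dev *dev,
                                            enum pci_channel_state state)
{
    dev->detected++;
    return state == pci_channel_io_perm_failure ?
           PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_NEED_RESET;
}

static pci_ers_result_t demo_slot_reset(struct sim_dev *dev)
{
    dev->resets++;
    return PCI_ERS_RESULT_RECOVERED;
}

static void demo_resume(struct sim_dev *dev)
{
    dev->resumed++;
}

static const struct sim_handlers demo_handlers = {
    demo_error_detected, demo_slot_reset, demo_resume
};
```

Calling sim_recover(&dev, &demo_handlers, pci_channel_io_frozen) on a
zeroed struct sim_dev runs each of the three callbacks once; passing
pci_channel_io_perm_failure stops after error_detected.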


> > ---
> > What else does SCSI LLD driver need to changed?
Usually three callbacks are enough; error_detected is required.
    int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
    int (*slot_reset)(struct pci_dev *dev);
    void (*resume)(struct pci_dev *dev);
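
As a rough sketch of what the driver side of these three callbacks
might look like for an HBA: everything named my_hba_* is hypothetical,
and struct pci_dev, struct pci_error_handlers and the enums are reduced
stand-ins so the example builds outside the kernel (a real driver gets
them from <linux/pci.h>). The listing above spells the return type as
int; the kernel header declares it as pci_ers_result_t, which the
stand-in typedef imitates. What a given adapter must do in each step is
hardware specific, so the bodies only flip flags.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch builds in userspace; not the real kernel
 * definitions. */
enum pci_channel_state {
    pci_channel_io_normal = 1,
    pci_channel_io_frozen,
    pci_channel_io_perm_failure
};

typedef enum {
    PCI_ERS_RESULT_NONE = 1,
    PCI_ERS_RESULT_CAN_RECOVER,
    PCI_ERS_RESULT_NEED_RESET,
    PCI_ERS_RESULT_DISCONNECT,
    PCI_ERS_RESULT_RECOVERED
} pci_ers_result_t;

/* Hypothetical per-adapter state. */
struct my_hba {
    int io_blocked;     /* 1 while no new commands may be issued */
    int hw_ready;       /* 1 once the adapter is initialized */
};

/* Stub: the real struct pci_dev is far larger. */
struct pci_dev {
    struct my_hba *drvdata;
};

struct pci_error_handlers {
    pci_ers_result_t (*error_detected)(struct pci_dev *dev,
                                       enum pci_channel_state state);
    pci_ers_result_t (*slot_reset)(struct pci_dev *dev);
    void (*resume)(struct pci_dev *dev);
};

/* Step 1: an error was reported. Stop touching the hardware and tell
 * the core what kind of recovery is wanted. */
static pci_ers_result_t my_hba_error_detected(struct pci_dev *dev,
                                              enum pci_channel_state state)
{
    struct my_hba *hba = dev->drvdata;

    assert(hba != NULL);    /* userspace-only check; not kernel style */
    hba->io_blocked = 1;
    hba->hw_ready = 0;

    if (state == pci_channel_io_perm_failure)
        return PCI_ERS_RESULT_DISCONNECT;   /* device is gone for good */
    return PCI_ERS_RESULT_NEED_RESET;       /* ask for a slot reset */
}

/* Step 2: the slot has been reset. A real driver would re-enable the
 * device and reinitialize the adapter here. */
static pci_ers_result_t my_hba_slot_reset(struct pci_dev *dev)
{
    struct my_hba *hba = dev->drvdata;

    hba->hw_ready = 1;
    return PCI_ERS_RESULT_RECOVERED;
}

/* Step 3: recovery finished; allow new commands again. */
static void my_hba_resume(struct pci_dev *dev)
{
    struct my_hba *hba = dev->drvdata;

    if (hba->hw_ready)
        hba->io_blocked = 0;
}

static const struct pci_error_handlers my_hba_err_handler = {
    .error_detected = my_hba_error_detected,
    .slot_reset     = my_hba_slot_reset,
    .resume         = my_hba_resume,
};
```

In a real driver the table is attached through the err_handler member
of struct pci_driver, next to the probe and remove callbacks.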

You could refer to the e1000 driver patches written by Linas, or just read
the source code of the three callbacks in the e1000 driver in the latest kernel.

Yanmin