2004-04-06 11:09:17

by Hidetoshi Seto

[permalink] [raw]
Subject: [RFC] readX_check() - Interface for PCI-X error recovery

Hi, all,


This is a request for comments.

We are considering about PCI-X error recovery.

What we deal as a problem here is that there is some cases that
a PCI error which is recoverable by a device driver is only passed to
a kernel but not to the driver, and that it results a system down
because the kernel cannot determine a relevant driver which should
recover the error.
For example, in the case of PCI-X on ia64, PCI write error is recoverable
because the error is recorded in a status register, and the proper driver
can read the register after its transaction. However, PCI read error
become a system down because the error cause a MCA(Machine Check Abort)
interruption to the kernel instantly, but the kernel has no idea.


We have post our ideas for this recovery before now.
This time, we accede to Linus's idea and reconsider our interfaces.

Seto: "[RFC] How drivers notice a HW error?" (readX_check() I/F)
http://marc.theaimsgroup.com/?l=linux-kernel&m=106992207709400&w=2

Ishii: "[RFC/PATCH, 1/4] readX_check() performance evaluation"
http://marc.theaimsgroup.com/?l=linux-kernel&m=107525559029806&w=2

Linus: "Re: [RFC/PATCH, 2/4] readX_check() performance evaluation"
http://marc.theaimsgroup.com/?l=linux-kernel&m=107525904702681&w=2


We are supposing the use of these interfaces will be in some fundamental
drivers such as for SCSI or NIC, and now we are working for implementation.

We would appreciate if you point us any problem of this or something you
noticed about this.


Thanks,

H.Seto

-----

Abstract is as following:


- Resources newly required:

on struct device:
error flag
list of recoverable physical address regions
pointer to host bridge of the device

on per_cpu:
list of currently checking devices


- Interfaces newly required:

clear_pcix_errors(dev)
Clear the error flag of the dev, and start to check the device.
This also clears the status register of its host bridge.

readX_check(dev,vaddr)
Read a register of the device mapped to vaddr, and check errors
if possible(This is depending on its architecture. In the case of
ia64, we can generate a MCA from an error by simple operation to
test the read data.)
If any error happen on the recoverable region, set the error flag.

read_pcix_errors(dev)
Return whether here was an error or not referring the error flag in
struct device and the status register of the host bridge, and stop
checking of the device.

register_pcix_region(dev,paddr,len)
unregister_pcix_region(dev)
Register/Unregister a physical address region that the driver can
recover. This function would be in the init/cleanup of the driver.


- Flow

xx_probe(dev, ...)
{
..
register_pcix_region(dev,paddr,len);
..
}

xx_intr()
{
spin_lock_irqsave(); /* disable interrupt & preemption */

clear_pcix_errors(dev); /* clear error flag */
{
..
x = readX_check(dev, vaddr);
..
}
error = read_pcix_errors(dev); /* read error flag */
if (error)
recovery_or_offline(dev);

spin_lock_irqrestore(); /* enable interrupt & preemption */
}

xx_remove()
{
..
unregister_pcix_region(dev) ;
..
}


- Note

on clear_pcix_errors():
If the status of the host bridge indicates an error when it let be
clear, it should be cleared after setting each flags of all devices
under the bridge. This is for other clear-read block on other cpu.

on ia64:
We can generate a MCA from a poisoned data synchronously in
readX_check(), so it is required a MCA handler to check the error
and deal error flags in struct device.

on ia32:
Since ia32's MCE(Machine Check Exception) is asynchronous, it's
hard to check errors in readX_check(). So we check it by reading
the status register of host bridge in read_pcix_errors().

on other arch:
if it is not possible on the arch, do like:
#ifndef CONFIG_PCI_RECOVERY
#define clear_pcix_errors(dev) do { } while (0)
#define read_pcix_errors(dev) (0)
#endif



2004-04-06 11:52:21

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC] readX_check() - Interface for PCI-X error recovery

On Tue, Apr 06, 2004 at 08:04:49PM +0900, Hidetoshi Seto wrote:
> - Resources newly required:
>
> on struct device:
> error flag
> list of recoverable physical address regions

Can't you just use the pci_dev->resource regions for this?

> pointer to host bridge of the device

Again, there's already ways of getting to this from the pci_dev.
Simply wander up through pdev->bus->self until you get to a self that
is NULL and you've found a root bus. Alternatively, you might just look
at PCI_CONTROLLER() on ia64.

> on per_cpu:
> list of currently checking devices
>
>
> - Interfaces newly required:
>
> clear_pcix_errors(dev)
> Clear the error flag of the dev, and start to check the device.
> This also clears the status register of its host bridge.

For consistency, how about naming these functions pci_clear_errors()
and pci_check_errors()? PCI-Express has similar error-checking abilities
and I'd hate to see two extremely similar capabilities at war with each
other over unacceoptable names ;-)

> readX_check(dev,vaddr)
> Read a register of the device mapped to vaddr, and check errors
> if possible(This is depending on its architecture. In the case of
> ia64, we can generate a MCA from an error by simple operation to
> test the read data.)
> If any error happen on the recoverable region, set the error flag.

I really don't think we want another readX variant. Do we then also
add readX_check_relaxed()? Can't we just pretend the MCA is asynchronous
on ia64? I'm sure we'd get better performance.

--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain

2004-04-06 16:17:09

by Jesse Barnes

[permalink] [raw]
Subject: Re: [RFC] readX_check() - Interface for PCI-X error recovery

> > readX_check(dev,vaddr)
> > Read a register of the device mapped to vaddr, and check errors
> > if possible(This is depending on its architecture. In the case of
> > ia64, we can generate a MCA from an error by simple operation to
> > test the read data.)
> > If any error happen on the recoverable region, set the error flag.
>
> I really don't think we want another readX variant. Do we then also
> add readX_check_relaxed()? Can't we just pretend the MCA is asynchronous
> on ia64? I'm sure we'd get better performance.

Hmm.. I wonder if we could get away with not having a new readX interface by
registering each PCI resource either at driver init time or in arch code with
the MCA hander. Then we could just make the read routines use the variable
that was just read to try to flush out the MCA (there may be better ways to
do this). E.g.

arch_pci_scan()
{
...
for_each_pci_resource(dev, res) {
check_region(res);
}
...
}

...

unsigned char readb(unsigned long addr)
{
unsigned char val = *(volatile unsigned char *)addr;
#ifdef CONFIG_PCI_CHECK
/* try to flush out the MCA by doing something with val */
#endif
return val;
}

...

Then presumably the MCA error handler would see that an MCA occurred in a
region registered during PCI initialization and return an error for
pci_read_errors(dev);

Jesse

2004-04-07 02:09:19

by Grant Grundler

[permalink] [raw]
Subject: Re: [RFC] readX_check() - Interface for PCI-X error recovery

On Tue, Apr 06, 2004 at 08:04:49PM +0900, Hidetoshi Seto wrote:
...
> - Resources newly required:
>
> on struct device:
> error flag
> list of recoverable physical address regions

I think willy already pointed out the struct pci_dev->resources[] should point
to MMIO adress ranges the PCI(-X) device "owns".

> pointer to host bridge of the device

ditto. I think willy meant walk pci_dev->bus->parent until bus->self
is NULL normally should work.

> on per_cpu:
> list of currently checking devices

Why wouldn't this be per PCI host bus controller instead of per CPU?

I'm thinking SMP - any CPU could access any MMIO region.
How those MMIO addresses get routed depends on how PCI Host Bus
controllers are programmed. Ie Bridges have to route MMIO transactions
to children (PCI devices).

> - Interfaces newly required:
>
> clear_pcix_errors(dev)
> Clear the error flag of the dev, and start to check the device.
> This also clears the status register of its host bridge.
>
> readX_check(dev,vaddr)
> Read a register of the device mapped to vaddr, and check errors
> if possible(This is depending on its architecture. In the case of
> ia64, we can generate a MCA from an error by simple operation to
> test the read data.)
> If any error happen on the recoverable region, set the error flag.
>
> read_pcix_errors(dev)
> Return whether here was an error or not referring the error flag in
> struct device and the status register of the host bridge, and stop
> checking of the device.
>
> register_pcix_region(dev,paddr,len)
> unregister_pcix_region(dev)
> Register/Unregister a physical address region that the driver can
> recover. This function would be in the init/cleanup of the driver.

Could register/unregister functionality be part of the clear/read errors?

>
>
> - Flow
>
> xx_probe(dev, ...)
> {
> ..
> register_pcix_region(dev,paddr,len);
> ..
> }
>
> xx_intr()
> {
> spin_lock_irqsave(); /* disable interrupt & preemption */
>
> clear_pcix_errors(dev); /* clear error flag */
> {
> ..
> x = readX_check(dev, vaddr);
> ..
> }
> error = read_pcix_errors(dev); /* read error flag */
> if (error)
> recovery_or_offline(dev);
>
> spin_lock_irqrestore(); /* enable interrupt & preemption */

Holding the spinlock might not be realistic for most drivers
during a recovery - ie re-init the HW. I understand this is a
generic example but it's a bit too simple...


...
> on other arch:
> if it is not possible on the arch, do like:
> #ifndef CONFIG_PCI_RECOVERY
> #define clear_pcix_errors(dev) do { } while (0)
> #define read_pcix_errors(dev) (0)
> #endif

And need similar for the register/unregister/readX_check() calls.

thanks,
grant

2004-04-07 09:34:56

by Hidetoshi Seto

[permalink] [raw]
Subject: Re: [RFC] readX_check() - Interface for PCI-X error recovery

Matthew Wilcox <[email protected]> wrote:

> > list of recoverable physical address regions
>
> Can't you just use the pci_dev->resource regions for this?

If there is a list separated usual one, driver can choice which region
to check or not. However, in general, maybe almost all driver will regist
all region it has.
Agree, I'll use the pci_dev->resource. This is a realistic approach.

> > pointer to host bridge of the device
>
> Again, there's already ways of getting to this from the pci_dev.

I see, this wouldn't cost so much. I'll use.

> > clear_pcix_errors(dev)
>
> For consistency, how about naming these functions pci_clear_errors()
> and pci_check_errors()? PCI-Express has similar error-checking abilities
> and I'd hate to see two extremely similar capabilities at war with each
> other over unacceoptable names ;-)

Sounds good. I think these naming are more better for peace porpose :-)

> > readX_check(dev,vaddr)
>
> I really don't think we want another readX variant. Do we then also
> add readX_check_relaxed()? Can't we just pretend the MCA is asynchronous
> on ia64? I'm sure we'd get better performance.

On ia64, readX_check() will be used to make sure that there has no MCA before
read_pcix_errors(). I think we could use original plain readX() in place of
readX_check() if we ignore checking accuracy or if the driver takes care to
all loaded data. But I don't know how about this on other architecture.

I think inside of readX_check() would be mostly arch specified codes.


Thanks,

H.Seto