LinuxLists.cc - [PATCH/RFC] I/O-check interface for driver's error handling

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Tuesday, March 1, 2005 8:59 am, Matthew Wilcox wrote:
> The MCA handler has to go and figure out what the hell just happened
> (was it a DIMM error, PCI bus error, etc). OK, fine, it finds that it
> was an error on PCI bus 73. At this point, I think the architecture
> error handler needs to call into the PCI subsystem and say "Hey, there
> was an error, you deal with it".
>
> If we're lucky, we get all the information that allows us to figure
> out which device it was (eg a destination address that matches a BAR),
> then we could have a ->error method in the pci_driver that handles it.
> If there's no ->error method, at leat call ->remove so one device only
> takes itself down.
>
> Does this make sense?

This was my thought too last time we had this discussion. A completely
asynchronous call is probably needed in addition to Hidetoshi's proposed API,
since as you point out, the driver may not be running when an error occurs
(e.g. in the case of a DMA error or more general bus problem). The async
->error callback could do a total reset of the card, or something along those
lines as Jeff suggests, while the inline ioerr_clear/ioerr_check API could
potentially deal with errors as they happen (probably in the case of PIO
related errors), when the additional context may allow us to be smarter about
recovery.

Jesse

2005-03-01 17:19:54

by Andi Kleen

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

Hidetoshi Seto <[email protected]> writes:

>
> int sample_read_with_iochk(struct pci_dev *dev, u32 *buf, int words)
> {
> unsigned long ofs = pci_resource_start(dev, 0) + DATA_OFFSET;
> int i;
>
> /* Create magical cookie on the stack */
> iocookie cookie;
>
> /* Critical section start */
> iochk_clear(&dev, &cookie);
> {
> /* Get the whole packet of data */
> for (i = 0; i < words; i++)
> *buf++ = ioread32(dev, ofs);
> }
> /* Critical section end. Did we have any trouble? */
> if ( iochk_read(&cookie) ) return -1;

Looks good for handling PCI-Express errors.

But what would the default handling be? It would be nice if there
was a simple way for a driver to say "just shut me down on an error"
without adding iochk_* to each function. Ideally this would be just
a standard callback that knows how to clean up the driver.

> +void iochk_clear(iocookie *cookie, struct pci_dev *dev)
> +{
> + local_irq_save(*cookie);
> +}
> +
> +int iochk_read(iocookie *cookie)
> +{
> + local_irq_restore(*cookie);
> + return 0;
> +}

These should be inlined.

> +EXPORT_SYMBOL(iochk_init);

This doesn't need to be exported.

-Andi

2005-03-01 18:08:10

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Tue, 1 Mar 2005, Andi Kleen wrote:
>
> But what would the default handling be? It would be nice if there
> was a simple way for a driver to say "just shut me down on an error"
> without adding iochk_* to each function. Ideally this would be just
> a standard callback that knows how to clean up the driver.

There can't be any.

The thing is, IO errors just will be very architecture-dependent. Some
might have exceptions happening, without the exception handler really
having much of an idea of who caused it, unless that driver had prepared
it some way, and gotten the proper locks.

A non-converted driver just doesn't _do_ any of that. It doesn't guarantee
that it's the only one accessing that bus, since it doesn't do the
"iocheck_clear()/iocheck_read()" things that imply all the locking etc.

So the default handling for iochecks pretty much _has_ to be "report them
to the user", and then letting the user decide what to do if the hardware
is going bad.

Shutting down the hardware by default might be a horribly bad thing to do
even _if_ you could pinpoint the driver that caused the problem in the
first place (and that's a big if, and probably depends on the details of
what the actual hw architecture support ends up being). So don't even try.
The sysadmin may have different preferences than some driver default.

In fact, I'd argue that even a driver that _uses_ the interface should not
necessarily shut itself down on error. Obviously, it should always log the
error, but outside of that it might be good if the operator can decide and
set a flag whether it should try to re-try (which may not always be
possible, of course), shut down, or just continue.

Linus

2005-03-01 18:33:46

by linas

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Tue, Mar 01, 2005 at 09:10:29AM -0800, Jesse Barnes was heard to remark:
> On Tuesday, March 1, 2005 8:59 am, Matthew Wilcox wrote:
> > The MCA handler has to go and figure out what the hell just happened
> > (was it a DIMM error, PCI bus error, etc).

I assume "MCA" stands for machine check architecture .. except that
is not how it currently works, at least not on the systems I work with.

The PCI bridge chips on many IBM ppc64 pSereies boxes can detect
PCI errors and handle them "cleanly", without causing machine checks;
so can some PCI-Express chips (Seto is the expert on PCI-Express,
I'm not).

On ppc64, after a PCI error, the pci slot is "isolated": all i/o
to and from the device is cut off (including dma). I/O reads return
all 0xff's (which is what an empty pci/pcmcia slot returns).

There are three low-level firmware api's:
-- ask if a slot is "isolated" (returns yes/no)
-- reset the pci card (assert the #RST pci signal)
-- un-isolate the pci slot

The current ppc64 code doesn't use Seto's API, but it could, that
is the direction I'm moving towards.

I don't know if PCI-Express "isolates" the slot; Seto, can you provide
an overview of what the PCI-Express spec says?

> > If we're lucky, we get all the information that allows us to figure
> > out which device it was (eg a destination address that matches a BAR),

Seto's API allows drivers to find out is thier PCI slot is isolated.
So it works for me.

> > then we could have a ->error method in the pci_driver that handles it.
> > If there's no ->error method, at leat call ->remove so one device only
> > takes itself down.

The tricky part is what to do with multi-function cards/slots
(e.g. a pci bus error on a bridge that has multiple devices under it).
In this case, multiple device drivers are affected. Thus, no single
device driver can handle the recovery of the bridge chip.

The current proposal (and prototype) has a "master recovery thread"
to handle the coordinated reset of the pci controller. This master
recovery thyread makes three calls in struct pci_driver:

void (*frozen) (struct pci_dev *); /* called when dev is first frozen */
void (*thawed) (struct pci_dev *); /* called after card is reset */
void (*perm_failure) (struct pci_dev *); /* called if card is dead */

The master recovery thread runs in the kernel. Earlier suggestions said
"run it in user space, use pci hotplug, use udev, etc." However, if
you get a pci error on a scsi card, you can't shell script
"umount /dev/sdX; rmmod scsi; clear_pci_error; insmod scsi; mount /dev/sdX"
beacuse you can't umount an open filesystem, and you can't really close
it (I fiddled with prototyping some of this, but its ugly and painful
and bizarre and outside my area of expertise :)

FWIW, the current prototype tries to do a pci hotplug if the above
routines aren't implemented in struct pci_driver. It can recover
from pci errors on ethernet cards, and I have one scsi driver that
successfully recovers with above API, and am working on adding recovery
to the symbios driver.

--linas

2005-03-01 18:45:29

by Andi Kleen

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Tue, Mar 01, 2005 at 10:08:48AM -0800, Linus Torvalds wrote:
> The thing is, IO errors just will be very architecture-dependent. Some
> might have exceptions happening, without the exception handler really
> having much of an idea of who caused it, unless that driver had prepared
> it some way, and gotten the proper locks.

A lot of architectures will move towards PCI Express over the next years,
and it has nice standardized error handling. It won't work
on a lot of older chipsets, but having such a feature only
on new hardware is ok.

> A non-converted driver just doesn't _do_ any of that. It doesn't guarantee
> that it's the only one accessing that bus, since it doesn't do the
> "iocheck_clear()/iocheck_read()" things that imply all the locking etc.

It just reads 0xffffffff all the time and moves on. That will
not be nice, but work for a short time until a higher level
error handler can take over.

>
> So the default handling for iochecks pretty much _has_ to be "report them
> to the user", and then letting the user decide what to do if the hardware
> is going bad.

Not with PCI Express. There is a standard way now to figure out which
device went bad and you can get interrupts for it.

>
> Shutting down the hardware by default might be a horribly bad thing to do
> even _if_ you could pinpoint the driver that caused the problem in the
> first place (and that's a big if, and probably depends on the details of
> what the actual hw architecture support ends up being). So don't even try.
> The sysadmin may have different preferences than some driver default.

There are already architectures that do it (e.g. IBM ppc64 or HP zx*).
It doesn't work too badly for them.

>
> In fact, I'd argue that even a driver that _uses_ the interface should not
> necessarily shut itself down on error. Obviously, it should always log the
> error, but outside of that it might be good if the operator can decide and
> set a flag whether it should try to re-try (which may not always be
> possible, of course), shut down, or just continue.

Ok, maybe an /sbin/hotplug like interface may make sense for it.
However shutdown as default is not too bad.

-Andi

2005-03-01 18:59:46

by linas

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Tue, Mar 01, 2005 at 10:08:48AM -0800, Linus Torvalds was heard to remark:
>
> On Tue, 1 Mar 2005, Andi Kleen wrote:
> >
> > But what would the default handling be? It would be nice if there
> > was a simple way for a driver to say "just shut me down on an error"
> > without adding iochk_* to each function. Ideally this would be just
> > a standard callback that knows how to clean up the driver.
>
> There can't be any.
>
> The thing is, IO errors just will be very architecture-dependent. Some

:)
FWIW, I've got a working prototype that is ppc64-architecture specific
and I've been yelled at for not proposing an architecture-generic API
and so my current goal is to hash out enough commonality with Seto to
come up with a generic API that's acceptable for the mainline kernel.

(FYI, the 'prototype' currently ships with Novell/SUSE SLES9, but
I haven't been sucessful in getting the patches in upstream.)

> might have exceptions happening, without the exception handler really
> having much of an idea of who caused it, unless that driver had prepared
> it some way, and gotten the proper locks.

Most hotplug-capable drivers are "most of the way there", since they
can deal with the sudden loss of i/o to the pci card, and know how
to clean themselves up.

> A non-converted driver just doesn't _do_ any of that. It doesn't guarantee
> that it's the only one accessing that bus, since it doesn't do the
> "iocheck_clear()/iocheck_read()" things that imply all the locking etc.

Yes; for example, the pci error might affect multiple device drivers.
The proposal is that there should be a "master cleanup thread" to
deal with this (see my other email).

> Shutting down the hardware by default might be a horribly bad thing to do

The current ppc64 prototype code does a pci-hotplug-remove/hotplug-add
if we've detected a pci error, and the affected device driver doesn't
know what to do. This works for ethernet cards, but can't work for
anything with a file system on it (because a pci-hotplug-remove
on a scsi card trickles up to the block device, which trickles up to
the file system, which can't be unmounted post-facto.)

> In fact, I'd argue that even a driver that _uses_ the interface should not
> necessarily shut itself down on error. Obviously, it should always log the
> error, but outside of that it might be good if the operator can decide and
> set a flag whether it should try to re-try (which may not always be
> possible, of course), shut down, or just continue.

On ppc64, "just continue" is not an option; the pci slot is "isolated"
all i/o is blocked, including dma.

The current prototype code tells the device driver that the pci slot is
hung, then it resets the slot, then it tells the device driver that the
pci slot is good-to-go again.

My goal is to negotiate a standard set of interfaces in struct pci_driver
to do the above. (see other email).

--linas

2005-03-01 19:18:07

by linas

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Tue, Mar 01, 2005 at 11:37:24AM -0500, Jeff Garzik was heard to remark:
>
> A new API handles none of this.

Seto is propsing an API that solves a different problem than what
you are thinking about.

In my case, the hardware (pci controller) will shut down a pci
slot(s) in the case of a pci error (parity or otherwise).
There's nothing that the software can do except to reset
the pci controller (and the cards underneath it).

Seto's API solves 1/2 the problem for me: it allows errors
to be detected. The other 1/2 (to be discussed) is how
to coordinate all the affected device drivers while the
pci controller is being reset.

--linas

2005-03-01 19:28:26

by linas

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Tue, Mar 01, 2005 at 02:42:11PM +0000, Matthew Wilcox was heard to remark:
> On Tue, Mar 01, 2005 at 05:33:48PM +0900, Hidetoshi Seto wrote:
> > Today's patch is 3rd one - iochk_clear/read() interface.
> > - This also adds pair-interface, but not to sandwich only readX().
> > Depends on platform, starting with ioreadX(), inX(), writeX()
> > if possible... and so on could be target of error checking.
>
> I'd prefer to see it as ioerr_clear(), ioerr_read() ...

I'd prefer pci_io_start() and pci_io_check_err()

The names should have "pci" in them.

I don't like "ioerr_clear" because it implies we are clearing the
io error; we are not; we are clearing the checker for io errors.

> > - Additionally adds special token - abstract "iocookie" structure
> > to control/identifies/manage I/Os, by passing it to OS.
> > Actual type of "iocookie" could be arch-specific. Device drivers
> > could use the iocookie structure without knowing its detail.
>
> Fine.

Do we really need a cookie?

> > If arch doesn't(or cannot) have its io-checking strategy, these
> > interfaces could be used as a replacement of local_irq_save/restore
> > pair. Therefore, driver maintainer can write their driver code with
> > these interfaces for all arch, even where checking is not implemented.
>
> But many drivers don't need to save/restore interrupts around IO accesses.
> I think defaulting these to disable and restore interrupts is a very bad idea.
> They should probably be no-ops in the generic case.

Yes, they should be no-ops. save/resotre interrupts would be a bad idea.

--linas

2005-03-01 19:36:43

by Linus Torvalds

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Tue, 1 Mar 2005, Linas Vepstas wrote:
>
> > > - Additionally adds special token - abstract "iocookie" structure
> > > to control/identifies/manage I/Os, by passing it to OS.
> > > Actual type of "iocookie" could be arch-specific. Device drivers
> > > could use the iocookie structure without knowing its detail.
> >
> > Fine.
>
> Do we really need a cookie?

I think you do.

That pair might have to disable interrupts (if there are any issues about
concurrent accesses through a shared error bus). In that case, the cooke
might be the old "flags" value.

> > But many drivers don't need to save/restore interrupts around IO accesses.
> > I think defaulting these to disable and restore interrupts is a very bad idea.
> > They should probably be no-ops in the generic case.
>
> Yes, they should be no-ops. save/resotre interrupts would be a bad idea.

But they may be part of that the architecture wants to do (imagine a
spinlock protecting a sub-segment of a bus - you need to disable
interrupts to avoid deadlocks).

Linus

2005-03-01 22:18:34

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

> I have been thinking about PCI system and parity errors, and how to
> handle them. I do not think this is the correct approach.
>
> A simple retry is... too simple. If you are having a massive problem on
> your PCI bus, more action should be taken than a retry.

It goes beyond that, see below.

> In my opinion each driver needs to be aware of PCI sys/parity errs, and
> handle them. For network drivers, this is rather simple -- check the
> hardware, then restart the DMA engine. Possibly turning off
> TSO/checksum to guarantee that bad packets are not accepted. For SATA
> and SCSI drivers, this is more complex, as one must retry a number of
> queued disk commands, after resetting the hardware.
>
> A new API handles none of this.

On IBM pSeries machine (and I'm trying to figure out an API to deal with
that generically for drivers), upon a PCI error (either MMIO error or
DMA error), the slot is put in isolation automatically.

>From this point, we can instruct the firmware to 1) re-enable MMIO, 2)
re-enable DMA, 3) proceed to a slot reset and re-enable MMIO & DMA.

That allows all sort of recovery strategies. However, obviously, not all
architectures provide those facilities.

So I'm looking into a way to expose a generic API to drivers that would
allow them to use those facilities when present, and/or fallback to
whatever they can do when not (or just retry or even no recovery).

I have some ideas, but am not fully happy with them yet. But part of the
problem is the notification of the driver.

Checking IOs is one thing, what to do once a failure is detected is
another. Also, we need asynchronous notification, since a driver may
well be idle, not doing any IO, while the bus segment on which it's
sitting is getting isolated because another card on the same segment (or
another function on the same card) triggered an error.

Then, we need at least several back-and-forth callbacks. I'm thinking
about an additional callback in pci_driver() with a message and a state
indicating what happened, and returning wether to proceed or not, I'll
try to write down the details in a later email.

Another issue finally is the type of error informations. Various systems
may provide various details, like some systems, upon a DMA error, can
provide you with the actual address that faulted. Those infos can be
very useful for diagnosing the issue (since some errors are actual bugs,
for example, we spent a lot of time chasing issues with e1000 vs.
barriers). An "error cookie" is I think a good idea, with eventually
various accessors to extract data from it, and maybe a function to dump
the content in ascii form in some buffer...

Ben.

2005-03-01 22:23:14

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Tue, 2005-03-01 at 08:49 -0800, Linus Torvalds wrote:
>
> On Tue, 1 Mar 2005, Jeff Garzik wrote:
> >
> > A new API handles none of this.
>
> Ehh?
>
> The new API is what _allows_ a driver to care. It doesn't handle DMA, but
> I think that's because nobody knows how to handle it (ie it's probably
> hw-dependent and all existign implementations would thus be
> driver-specific anyway).

We do on pSeries. Additionally, another device on the same physical
segment can trigger an error and cause a slot isolation on us even when
we do no IOs, so we also need asynchronous notification.

> And in the sense of "any general new api handles none of it", your
> argument doesn't make sense. The _old_ IO API's clearly don't handle it.
> So if you seem to say that "A new API" can't handle it either, then that
> translates to "no API can ever handle it". Fair enough, if you think it's
> impossible, but clearly you can handle some things.
>
> And yes, CLEARLY drivers will have to do all the heavy lifting.

I think Seto has a good start tho. But it's not enough. See my other
mail to Jeff for my other thought on the issue, I'm still trying to get
some API I'm happy myself with however, since it's not a simple issue.

> I don't expect most drivers to care. In fact, I expect about five to ten
> drivers to be converted to really care, and then for some forseeable time
> you'll have to be very picky about your hardware if you care about PCI
> parity errors etc. Most people don't, and most drivers won't be written in
> environments where they can be reasonably tested.

On pSeries, we'll default, for drivers that don't care, to triggering a
PCI unplug, slot reset, PCI re-plug. This is perfect for ethernet
drivers for example. But it's a real pain for block devices since they
won't be able to recover (and we can't even force unmount the dangling
filesystem afaik).

For drivers like IPR SCSI, we want to expose a richer API so they can
make use of the features provided by the platform, like resetting the
slot.

But the above also need to be able to differenciate a driver that cares
from a driver that doesn't. I think an additional callback in pci_driver
for notifying of async events (error happened, slot has been reset, IOs
are re-enabled, whatever we define ...) is a good idea, since the
presence of a callback is a good enough indication to the core of wether
the driver has it's own recovery strategy or not.

> That's just a fact. Anybody who expects all drivers to suddenly start
> doing IO checks is just living in a dream world.
>
> Linus
--
Benjamin Herrenschmidt <[email protected]>

2005-03-01 22:26:45

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Tue, 2005-03-01 at 09:10 -0800, Jesse Barnes wrote:
> On Tuesday, March 1, 2005 8:59 am, Matthew Wilcox wrote:
> > The MCA handler has to go and figure out what the hell just happened
> > (was it a DIMM error, PCI bus error, etc). OK, fine, it finds that it
> > was an error on PCI bus 73. At this point, I think the architecture
> > error handler needs to call into the PCI subsystem and say "Hey, there
> > was an error, you deal with it".
> >
> > If we're lucky, we get all the information that allows us to figure
> > out which device it was (eg a destination address that matches a BAR),
> > then we could have a ->error method in the pci_driver that handles it.
> > If there's no ->error method, at leat call ->remove so one device only
> > takes itself down.
> >
> > Does this make sense?
>
> This was my thought too last time we had this discussion. A completely
> asynchronous call is probably needed in addition to Hidetoshi's proposed API,
> since as you point out, the driver may not be running when an error occurs
> (e.g. in the case of a DMA error or more general bus problem). The async
> ->error callback could do a total reset of the card, or something along those
> lines as Jeff suggests, while the inline ioerr_clear/ioerr_check API could
> potentially deal with errors as they happen (probably in the case of PIO
> related errors), when the additional context may allow us to be smarter about
> recovery.

What I think we need is an async call that takes:

- an opaque blob with the error informations and accessors (see my
reply to Jeff)

- a slot state (slot isolated, slot has been reset, slot has IOs /DMA
enabled/disabled, must probably be a bitmask)

- a bit mask of possible actions the driver can request (nothing, reset
slot, re-enable IOs, ...)

I'm afraid tho that the combinatory explosion will make it difficult to
drivers to deal with the right thing. Maybe we can simplify the mecanism
to archs that can just 1) re-enable IOs, 2) reset slot.

Note that the reason to re-enable IOs before trying to reset the slot is
that some devices will allow you to extract diagnostic informations, and
it's always useful to gather as much informations as possible upon an
error of this type.

Ben.

2005-03-01 22:28:16

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Tue, 2005-03-01 at 18:19 +0100, Andi Kleen wrote:
> Hidetoshi Seto <[email protected]> writes:
>
> >
> > int sample_read_with_iochk(struct pci_dev *dev, u32 *buf, int words)
> > {
> > unsigned long ofs = pci_resource_start(dev, 0) + DATA_OFFSET;
> > int i;
> >
> > /* Create magical cookie on the stack */
> > iocookie cookie;
> >
> > /* Critical section start */
> > iochk_clear(&dev, &cookie);
> > {
> > /* Get the whole packet of data */
> > for (i = 0; i < words; i++)
> > *buf++ = ioread32(dev, ofs);
> > }
> > /* Critical section end. Did we have any trouble? */
> > if ( iochk_read(&cookie) ) return -1;
>
> Looks good for handling PCI-Express errors.
>
> But what would the default handling be? It would be nice if there
> was a simple way for a driver to say "just shut me down on an error"
> without adding iochk_* to each function. Ideally this would be just
> a standard callback that knows how to clean up the driver.

I think that would be the lack of a callback, see other messages.

> > +void iochk_clear(iocookie *cookie, struct pci_dev *dev)
> > +{
> > + local_irq_save(*cookie);
> > +}
> > +
> > +int iochk_read(iocookie *cookie)
> > +{
> > + local_irq_restore(*cookie);
> > + return 0;
> > +}
>
> These should be inlined.
>
> > +EXPORT_SYMBOL(iochk_init);
>
> This doesn't need to be exported.
>
> -Andi
--
Benjamin Herrenschmidt <[email protected]>

2005-03-01 22:30:05

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Tue, 2005-03-01 at 12:33 -0600, Linas Vepstas wrote:

> The current proposal (and prototype) has a "master recovery thread"
> to handle the coordinated reset of the pci controller. This master
> recovery thyread makes three calls in struct pci_driver:
>
> void (*frozen) (struct pci_dev *); /* called when dev is first frozen */
> void (*thawed) (struct pci_dev *); /* called after card is reset */
> void (*perm_failure) (struct pci_dev *); /* called if card is dead */

See my other emails. I think only one callback is enough, and I think we
need more parameters.

> The master recovery thread runs in the kernel. Earlier suggestions said
> "run it in user space, use pci hotplug, use udev, etc." However, if
> you get a pci error on a scsi card, you can't shell script
> "umount /dev/sdX; rmmod scsi; clear_pci_error; insmod scsi; mount /dev/sdX"
> beacuse you can't umount an open filesystem, and you can't really close
> it (I fiddled with prototyping some of this, but its ugly and painful
> and bizarre and outside my area of expertise :)
>
> FWIW, the current prototype tries to do a pci hotplug if the above
> routines aren't implemented in struct pci_driver. It can recover
> from pci errors on ethernet cards, and I have one scsi driver that
> successfully recovers with above API, and am working on adding recovery
> to the symbios driver.
>
> --linas
--
Benjamin Herrenschmidt <[email protected]>

2005-03-01 22:33:17

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

> In fact, I'd argue that even a driver that _uses_ the interface should not
> necessarily shut itself down on error. Obviously, it should always log the
> error, but outside of that it might be good if the operator can decide and
> set a flag whether it should try to re-try (which may not always be
> possible, of course), shut down, or just continue.

In lots of case, you don't have an operator smart enough to make this
decision nowadays, and even if you had, for things like your SCSI
adapter, you just can't expect userland to be operational. The error
recovery policy should be buildable in the driver. If it's not, however,
then I agree that userland intervention is probably a good idea.

Note that on pSeries, we have no choice. On error, the slot is isolated.
So we can't let the driver continue anyway.

Ben.

2005-03-02 02:26:11

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

Matthew Wilcox wrote:
> I think what Jeff meant was "this new API handles none of this".
> And that's true, it doesn't handle DMA errors. But I think that's just
> something that hasn't been written/designed yet.

Yes, this API just supports drivers wanting to be more RAS-aware.
It would be happy if how implement it could be separate in two part:
- arch-specific part
Capability would depend on arch, can only generic thing but couldn't
be device specific. Device/bus isolation could be(with help of hotplug
and so on), but re-enable them would not be easily.
- generic part
Capability would depend on drivers, should be more device specific.
How divide and connect them is now in discussion and consideration.

> So how should we handle it? Obviously the driver may not be executing
> when a PCI parity error occurs, so we probably get to find out about
> this through some architecture-specific whole-system error, let's call
> it an MCA.
>
> The MCA handler has to go and figure out what the hell just happened
> (was it a DIMM error, PCI bus error, etc). OK, fine, it finds that it
> was an error on PCI bus 73. At this point, I think the architecture
> error handler needs to call into the PCI subsystem and say "Hey, there
> was an error, you deal with it".
>
> If we're lucky, we get all the information that allows us to figure
> out which device it was (eg a destination address that matches a BAR),
> then we could have a ->error method in the pci_driver that handles it.
> If there's no ->error method, at leat call ->remove so one device only
> takes itself down.
>
> Does this make sense?

Note that here is a difficulty: the MCA handler on some arch would run on
special context - MCA environment. In other words, since some MCA handler
would be called by non-maskable interrupt(e.g. NMI), so it's difficult to
call some driver's callback using protected kernel locks from MCA context.

Therefore what MCA handler could do is just indicates a error was there,
by something like status flag which drivers can refer. And after possible
deley, we would be able to call callbacks.

Thanks,
H.Seto

2005-03-02 03:12:46

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

Jesse Barnes wrote:
> This was my thought too last time we had this discussion. A completely
> asynchronous call is probably needed in addition to Hidetoshi's proposed API,
> since as you point out, the driver may not be running when an error occurs
> (e.g. in the case of a DMA error or more general bus problem). The async
> ->error callback could do a total reset of the card, or something along those
> lines as Jeff suggests, while the inline ioerr_clear/ioerr_check API could
> potentially deal with errors as they happen (probably in the case of PIO
> related errors), when the additional context may allow us to be smarter about
> recovery.

Depend on the bridge implementation, special error handling of PCI-X would be
available in the case of a DMA error.

PCI-X Command register has Uncorrectable Data Error Recovery Enable bit to
avoid asserting SERR on error. Some bridge generates poisoned data and pass
it to destination instead of asserting error or passing broken data.

The device driver would be interrupted on the completion of DMA, and check
status register of controlling device to find a error during the DMA.
If there was a error, driver could attempt to recover from the error.

I don't know whether this is actually possible or not, and also there are
upcoming drivers implementing such special handling.

Though, when and how we should call drivers to do device specific staff is
one of the problem. My API would provide "a chance" which could be defined by
driver, at least.

Thanks,
H.Seto

2005-03-02 06:11:10

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

Linas Vepstas wrote:
>> I'd prefer to see it as ioerr_clear(), ioerr_read() ...
>
> I'd prefer pci_io_start() and pci_io_check_err()
>
> The names should have "pci" in them.
>
> I don't like "ioerr_clear" because it implies we are clearing the io error; we are not; we are clearing the checker
for io errors.

My intention was "clear/read checker(called iochk) to check my I/O."
(bitmask would be better for error flag, but bits are not defined yet.)
So I agree that ioerr_clear/read() would be one of good alternatives.
But still I'd prefer iochk_*, because it doesn't clear error but checker.
iochecker_* would be bit long.

And then, I don't think it need to have "pci" ... limitation of this
API's target. It would not be match if there are a recoverable device
over some PCI to XXX bridge, or if there are some special arch where
don't have PCI but other recoverable bus system, or if future bus system
doesn't called pci...
Currently we would deal only pci, but in future possibly not.

> Do we really need a cookie?

Some do, some not.
For example, if arch has only a counter of error exception, saving value
of the counter to the cookie would be make sense.

> Yes, they should be no-ops. save/restore interrupts would be a bad idea.

I expect that we should not do any operation requires enabled interrupt
between iochk_clear and iochk_read. If their defaults are no-ops, device
maintainers who develops their driver on not-implemented arch should be
more careful. Or are there any bad thing other than waste of steps?

Thanks,
H.Seto

2005-03-02 17:49:07

by linas

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Wed, Mar 02, 2005 at 11:28:01AM +0900, Hidetoshi Seto was heard to remark:
>
> Note that here is a difficulty: the MCA handler on some arch would run on
> special context - MCA environment. In other words, since some MCA handler
> would be called by non-maskable interrupt(e.g. NMI), so it's difficult to
> call some driver's callback using protected kernel locks from MCA context.
>
> Therefore what MCA handler could do is just indicates a error was there,
> by something like status flag which drivers can refer. And after possible
> deley, we would be able to call callbacks.

FWIW, many device drivers do I/O in an interrupt context (e.g. from
timer interrupts), which is why error recovery needs to run in a
separate thread from error detection.

On ppc64, here's what I currently do:
-- check for pci error (call firmware & ask it)
-- if (error) put event on a workqueue
-- workqueue handler sends out a notifier_call_chain
to anyone who cares.

Thus, the error recovery thread runs in its own thread.
Right now, the above code is arch-specific, but I don't see any
reason we couldn't make the event, the workqueue and the
notifier_call chain arch-generic.

Below is some "pseudocode" version (mentally substitute
"pci error event" for every occurance of "eeh"). Its got some
ppc64-specific crud in there that we have to fix to make it
truly generic (I just cut and pasted from current code).

Would a cleaned up version of this code be suitable for a
arch-generic pci error recovery framework? Seto, would
this be useful to you?

--linas

===================================================
Header file:

/**
* EEH Notifier event flags.
* Freeze -- pci slot is frozen, no i/o is possible
*/
#define EEH_NOTIFY_FREEZE 1

/** EEH event -- structure holding pci slot data that describes
* a change in the isolation status of a PCI slot. A pointer
* to this struct is passed as the data pointer in a notify callback.
*/
struct eeh_event {
struct list_head list;
struct pci_dev *dev;
int reset_state;
int time_unavail;
};

/** Register to find out about EEH events. */
int eeh_register_notifier(struct notifier_block *nb);
int eeh_unregister_notifier(struct notifier_block *nb);

===================================================
C file:

/* EEH event workqueue setup. */
static spinlock_t eeh_eventlist_lock = SPIN_LOCK_UNLOCKED;
LIST_HEAD(eeh_eventlist);
static void eeh_event_handler(void *);
DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL);

static struct notifier_block *eeh_notifier_chain;

/**
* eeh_register_notifier - Register to find out about EEH events.
* @nb: notifier block to callback on events
*/
int eeh_register_notifier(struct notifier_block *nb)
{
return notifier_chain_register(&eeh_notifier_chain, nb);
}

/**
* eeh_unregister_notifier - Unregister to an EEH event notifier.
* @nb: notifier block to callback on events
*/
int eeh_unregister_notifier(struct notifier_block *nb)
{
return notifier_chain_unregister(&eeh_notifier_chain, nb);
}

/**
* queue up a pci error event to be dispatched to all listeners
* of the pci error notifier call chain. This routine is safe to call
* within an interrupt context. The actual event delivery
* will be from a workque thread.
*/

void eeh_queue_failure(struct pci_dev *dev)
{
struct eeh_event *event;

event = kmalloc(sizeof(*event), GFP_ATOMIC);
if (event == NULL) {
printk (KERN_ERR "EEH: out of memory, event not handled\n");
return 1;
}

event->dev = dev;
event->reset_state = rets[0];
event->time_unavail = rets[2];

/* We may be called in an interrupt context */
spin_lock_irqsave(&eeh_eventlist_lock, flags);
list_add(&event->list, &eeh_eventlist);
spin_unlock_irqrestore(&eeh_eventlist_lock, flags);

/* Most EEH events are due to device driver bugs. Having
* a stack trace will help the device-driver authors figure
* out what happened. So print that out. */
if (rets[0] != 5) dump_stack();
schedule_work(&eeh_event_wq);
}

/**
* eeh_event_handler - dispatch EEH events. The detection of a frozen
* slot can occur inside an interrupt, where it can be hard to do
* anything about it. The goal of this routine is to pull these
* detection events out of the context of the interrupt handler, and
* re-dispatch them for processing at a later time in a normal context.
*
* @dummy - unused
*/
static void eeh_event_handler(void *dummy)
{
unsigned long flags;
struct eeh_event *event;

while (1) {
spin_lock_irqsave(&eeh_eventlist_lock, flags);
event = NULL;
if (!list_empty(&eeh_eventlist)) {
event = list_entry(eeh_eventlist.next, struct eeh_event, list);
list_del(&event->list);
}
spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
if (event == NULL)
break;

if (event->reset_state != 5) {
printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device "
"%s %s\n", event->reset_state,
pci_name(event->dev), pci_pretty_name(event->dev));
}

__get_cpu_var(slot_resets)++;
notifier_call_chain (&eeh_notifier_chain,
EEH_NOTIFY_FREEZE, event);

pci_dev_put(event->dev);
kfree(event);
}
}

2005-03-02 18:09:57

by linux-os (Dick Johnson)

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Wed, 2 Mar 2005, Linas Vepstas wrote:

> On Wed, Mar 02, 2005 at 11:28:01AM +0900, Hidetoshi Seto was heard to remark:
>>
>> Note that here is a difficulty: the MCA handler on some arch would run on
>> special context - MCA environment. In other words, since some MCA handler
[SNIPPED...]

>
> /**
> * queue up a pci error event to be dispatched to all listeners
> * of the pci error notifier call chain. This routine is safe to call
> * within an interrupt context. The actual event delivery
> * will be from a workque thread.
> */
>
> void eeh_queue_failure(struct pci_dev *dev)
> {
> struct eeh_event *event;
>
> event = kmalloc(sizeof(*event), GFP_ATOMIC);
> if (event == NULL) {
> printk (KERN_ERR "EEH: out of memory, event not handled\n");
> return 1;
> }
>
> event->dev = dev;
> event->reset_state = rets[0];
> event->time_unavail = rets[2];
>
> /* We may be called in an interrupt context */
> spin_lock_irqsave(&eeh_eventlist_lock, flags);
^^^^^^^^^^^^^^^^^^
> list_add(&event->list, &eeh_eventlist);
> spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
^^^^^^^^^^^^^^^^^^^^^

I don't think this is SMP safe from interrupt-context.
You need the lock when you are building the event-list,
not just when you queue it.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.

2005-03-02 18:24:15

by linas

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Tue, Mar 01, 2005 at 08:49:45AM -0800, Linus Torvalds was heard to remark:
>
> The new API is what _allows_ a driver to care. It doesn't handle DMA, but
> I think that's because nobody knows how to handle it (ie it's probably
> hw-dependent and all existign implementations would thus be
> driver-specific anyway).

?
We could add a call

int pci_was_there_an_error_during_dma (struct pci_dev);

right? And it could return true/false, right? I can certainly
do that today with ppc64. I just can't tell you which dma triggered
the problem.

> And yes, CLEARLY drivers will have to do all the heavy lifting.

well .. maybe. On ppc64, we have one hack-ish solution for
hotplug-capable but pci-error-unaware device drivers, and that
is to hot unplug the driver, clear the pci error condition, and
and replug the driver. Works great for ethernet; haven't tested
USB.

I'm getting greif from the guys over here because my hack-ish code
is hackish, and isn't arch-generic, and Paul Mackerras doesn't like it,
which is why Benh is threatening to re-write it, and etc. ...
which is why Seto is involved, and we're having this conversation ...

--linas

2005-03-02 18:41:52

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Wednesday, March 2, 2005 10:22 am, Linas Vepstas wrote:
> On Tue, Mar 01, 2005 at 08:49:45AM -0800, Linus Torvalds was heard to
remark:
> > The new API is what _allows_ a driver to care. It doesn't handle DMA, but
> > I think that's because nobody knows how to handle it (ie it's probably
> > hw-dependent and all existign implementations would thus be
> > driver-specific anyway).
>
> ?
> We could add a call
>
> int pci_was_there_an_error_during_dma (struct pci_dev);
>
> right? And it could return true/false, right? I can certainly
> do that today with ppc64. I just can't tell you which dma triggered
> the problem.

One issue with that is how to notify drivers that they need to make this call.
In may cases, DMA completion will be signalled by an interrupt, but if the
DMA failed, that interrupt may never happen, which means the call to
pci_unmap or the above function from the interrupt handler may never occur.

Some I/O errors are by nature asynchronous and unpredictable, so I think we
need a separate thread and callback interface to deal with errors where the
beginning of the transaction and the end of the transaction occur in
different contexts, in addition to the PIO transaction interface already
proposed.

Jesse

2005-03-02 19:25:04

by linas

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Wed, Mar 02, 2005 at 03:13:05PM +0900, Hidetoshi Seto was heard to remark:

[ .. iochk_clear() and iochk_read() ...]

> And then, I don't think it need to have "pci" ... limitation of this
> API's target. It would not be match if there are a recoverable device
> over some PCI to XXX bridge, or if there are some special arch where
> don't have PCI but other recoverable bus system, or if future bus system
> doesn't called pci...
> Currently we would deal only pci, but in future possibly not.

OK, in that case, I like the names you picked.

> > Yes, they should be no-ops. save/restore interrupts would be a bad idea.
>
> I expect that we should not do any operation requires enabled interrupt
> between iochk_clear and iochk_read.

why? Maybe some specific pci chipset might need this, but in general,
I don't see why this should be assumed.

> If their defaults are no-ops, device
> maintainers who develops their driver on not-implemented arch should be
> more careful.

Why? People who write device drivers already know if/when they need to
disable interrupts, and so they already disable if they need it.

If a specific arch is using a specific pci chipset that needs interrupts
disabled in order to get io error info, then the arch-specific
implementation can disable interrupts in iock_clear() ... But I can't
imagine why one would want to make this the default behaviour for
all arches that *don't* support io error checking ...

(on ppc64, iochk_clear() would be a no-op, and iochk_read() would
check a flag and maybe call firmware, but would not otherwise have to
fiddle with interrupts).

--linas

p.s. I would like to have iochk_read() take struct pci_dev * as an
argument. (I could store a pointer to pci_dev in the "cookie" but
that seems odd).

2005-03-02 19:48:25

by linas

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Wed, Mar 02, 2005 at 10:41:06AM -0800, Jesse Barnes was heard to remark:
> On Wednesday, March 2, 2005 10:22 am, Linas Vepstas wrote:
> > On Tue, Mar 01, 2005 at 08:49:45AM -0800, Linus Torvalds was heard to
> remark:
> > > The new API is what _allows_ a driver to care. It doesn't handle DMA, but
> > > I think that's because nobody knows how to handle it (ie it's probably
> > > hw-dependent and all existign implementations would thus be
> > > driver-specific anyway).
> >
> > ?
> > We could add a call
> >
> > int pci_was_there_an_error_during_dma (struct pci_dev);
> >
> > right? And it could return true/false, right? I can certainly
> > do that today with ppc64. I just can't tell you which dma triggered
> > the problem.
>
> One issue with that is how to notify drivers that they need to make this call.
> In may cases, DMA completion will be signalled by an interrupt, but if the
> DMA failed, that interrupt may never happen, which means the call to
> pci_unmap or the above function from the interrupt handler may never occur.

Hmm. Well, I notice that e100/e1000 has a heartbeat built into it, so
that if the card doesn't respond, it resets the card. So this would
be a natural place for them. And I suspect that *all* scsi drivers
have timeouts; they initiate a cascade of reset sequences if they
don't get data back after X seconds.

I see nothing wrong with adding a requirement, something that says
"If a device driver wants to be pci-error aware, then yea-verily it
must use a dma-timeout timer (say 15 seconds) and check for
pci errors in the dma-timeout handler".

> Some I/O errors are by nature asynchronous and unpredictable, so I think we
> need a separate thread and callback interface to deal with errors where the
> beginning of the transaction and the end of the transaction occur in
> different contexts, in addition to the PIO transaction interface already
> proposed.

I don't know what the pci-express implementation is like, but the
ppc64 implementation is *not* transactional, so I don't have that issue.

If some pci chipsets only report dma errors on a transactional basis,
then we need to modify pci_map_sg() and pci_map_single() and etc. to
take a cookie as well. It would be up to the device driver to alloc
and retire cookies as the dma's complete (and make sure it can find
its cookies in any context). Is this or something like this that
is needed?

--linas

2005-03-02 20:07:26

by linas

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Wed, Mar 02, 2005 at 09:27:27AM +1100, Benjamin Herrenschmidt was heard to remark:
> On Tue, 2005-03-01 at 12:33 -0600, Linas Vepstas wrote:
>
> > The current proposal (and prototype) has a "master recovery thread"
> > to handle the coordinated reset of the pci controller. This master
> > recovery thyread makes three calls in struct pci_driver:
> >
> > void (*frozen) (struct pci_dev *); /* called when dev is first frozen */
> > void (*thawed) (struct pci_dev *); /* called after card is reset */
> > void (*perm_failure) (struct pci_dev *); /* called if card is dead */
>
> See my other emails. I think only one callback is enough, and I think we
> need more parameters.

That's a style issue. Propose an API, I'll code it. We can have
the master recovery thread be a state machine, and so every device
driver gets notified of state changes:

typedef enum pci_bus_state {
DEVICE_IO_FROZEN=1,
DEVICE_IO_THAWED,
DEVICE_PERM_FAILURE,
};

struct pci_driver {
....
void (*io_state_change) (struct pci_dev *dev, pci_bus_state);
};

would that work?

--linas

2005-03-02 22:50:34

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Wed, 2005-03-02 at 14:02 -0600, Linas Vepstas wrote:
> On Wed, Mar 02, 2005 at 09:27:27AM +1100, Benjamin Herrenschmidt was heard to remark:

> That's a style issue. Propose an API, I'll code it. We can have
> the master recovery thread be a state machine, and so every device
> driver gets notified of state changes:
>
> typedef enum pci_bus_state {
> DEVICE_IO_FROZEN=1,
> DEVICE_IO_THAWED,
> DEVICE_PERM_FAILURE,
> };
>
> struct pci_driver {
> ....
> void (*io_state_change) (struct pci_dev *dev, pci_bus_state);
> };
>
> would that work?

Too much ppc64-centric.

Also, we want to use the re-enable IOs facility of EEH to give the
driver a chance to extract diagnostic infos from the HW.

Ben.

2005-03-02 22:49:45

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Wed, 2005-03-02 at 12:22 -0600, Linas Vepstas wrote:
> On Tue, Mar 01, 2005 at 08:49:45AM -0800, Linus Torvalds was heard to remark:
> >
> > The new API is what _allows_ a driver to care. It doesn't handle DMA, but
> > I think that's because nobody knows how to handle it (ie it's probably
> > hw-dependent and all existign implementations would thus be
> > driver-specific anyway).
>
> ?
> We could add a call
>
> int pci_was_there_an_error_during_dma (struct pci_dev);
>
> right? And it could return true/false, right? I can certainly
> do that today with ppc64. I just can't tell you which dma triggered
> the problem.

That's ugly. I prefer asynchronous notification by far.

> > And yes, CLEARLY drivers will have to do all the heavy lifting.
>
> well .. maybe. On ppc64, we have one hack-ish solution for
> hotplug-capable but pci-error-unaware device drivers, and that
> is to hot unplug the driver, clear the pci error condition, and
> and replug the driver. Works great for ethernet; haven't tested
> USB.
>
> I'm getting greif from the guys over here because my hack-ish code
> is hackish, and isn't arch-generic, and Paul Mackerras doesn't like it,
> which is why Benh is threatening to re-write it, and etc. ...
> which is why Seto is involved, and we're having this conversation ...
>
>
> --linas
--
Benjamin Herrenschmidt <[email protected]>

2005-03-02 22:54:10

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

> One issue with that is how to notify drivers that they need to make this call.
> In may cases, DMA completion will be signalled by an interrupt, but if the
> DMA failed, that interrupt may never happen, which means the call to
> pci_unmap or the above function from the interrupt handler may never occur.
>
> Some I/O errors are by nature asynchronous and unpredictable, so I think we
> need a separate thread and callback interface to deal with errors where the
> beginning of the transaction and the end of the transaction occur in
> different contexts, in addition to the PIO transaction interface already
> proposed.

Yup. As I wrote already. We need a new callback in pci_driver, that's
the cleanest way. It would take an "event" argument plus some still to
be defined flags to inform the driver about platform capabilities.

The driver based on the result code can then request a slot reset if
the platform supports it, or just try to recover, or just giveup.

Another callback would come once the slot has been reset & reconfigure,
etc...

I'm going to write something down real soon now (after I had breakfast
hopefully :)

Ben.

2005-03-02 22:49:44

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Wed, 2005-03-02 at 13:03 -0500, linux-os wrote:

> > event->dev = dev;
> > event->reset_state = rets[0];
> > event->time_unavail = rets[2];
> >
> > /* We may be called in an interrupt context */
> > spin_lock_irqsave(&eeh_eventlist_lock, flags);
> ^^^^^^^^^^^^^^^^^^
> > list_add(&event->list, &eeh_eventlist);
> > spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
> ^^^^^^^^^^^^^^^^^^^^^
>
> I don't think this is SMP safe from interrupt-context.
> You need the lock when you are building the event-list,
> not just when you queue it.

Go buy a clue, they are cheap these days.

Ben.

2005-03-02 23:39:37

by linas

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Thu, Mar 03, 2005 at 09:46:12AM +1100, Benjamin Herrenschmidt was heard to remark:
> On Wed, 2005-03-02 at 14:02 -0600, Linas Vepstas wrote:
> > On Wed, Mar 02, 2005 at 09:27:27AM +1100, Benjamin Herrenschmidt was heard to remark:
>
> > That's a style issue. Propose an API, I'll code it. We can have
> > the master recovery thread be a state machine, and so every device
> > driver gets notified of state changes:
> >
> > typedef enum pci_bus_state {
> > DEVICE_IO_FROZEN=1,
> > DEVICE_IO_THAWED,
> > DEVICE_PERM_FAILURE,
> > };
> >
> > struct pci_driver {
> > ....
> > void (*io_state_change) (struct pci_dev *dev, pci_bus_state);
> > };
> >
> > would that work?
>
> Too much ppc64-centric.

Ah Ben, you are hard to make happy.

> Also, we want to use the re-enable IOs facility of EEH to give the
> driver a chance to extract diagnostic infos from the HW.

Yes, of course. Recovery would have to happen through multiple steps,
one of which is re-enabling i/o. (Another one would be to ask all of
the device drivers affected by the outage if any of them need a
device reset.)

Right now, my goal was not to specify the final interface with all the
bits correct, but to get agreement that this particular approach, of
modifying struct pci_driver, would make everyone happy.

--linas

2005-03-02 23:48:33

by linas

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Thu, Mar 03, 2005 at 09:41:43AM +1100, Benjamin Herrenschmidt was heard to remark:
> On Wed, 2005-03-02 at 12:22 -0600, Linas Vepstas wrote:
> > On Tue, Mar 01, 2005 at 08:49:45AM -0800, Linus Torvalds was heard to remark:
> > >
> > > The new API is what _allows_ a driver to care. It doesn't handle DMA, but
> > > I think that's because nobody knows how to handle it (ie it's probably
> > > hw-dependent and all existign implementations would thus be
> > > driver-specific anyway).
> >
> > ?
> > We could add a call
> >
> > int pci_was_there_an_error_during_dma (struct pci_dev);
> >
> > right? And it could return true/false, right? I can certainly
> > do that today with ppc64. I just can't tell you which dma triggered
> > the problem.
>
> That's ugly. I prefer asynchronous notification by far.

Well, we've got that, I think the goal was to figure out what the
PCI-Express folks think they need, and what the dev driver folks want.

Put it another way: a device driver author should have the opportunity
to poll the pci bus status if they so desire. Polling for bus status
on ppc64 is real easy. Given what Jesse Barnes was saying, it sounded
like a simple (optional, the dev driver doesn't have to use it) poll
is not enough, because some errors might be transactional.

--linas

2005-03-02 23:48:30

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Wednesday, March 2, 2005 3:30 pm, Linas Vepstas wrote:
> Put it another way: a device driver author should have the opportunity
> to poll the pci bus status if they so desire. Polling for bus status
> on ppc64 is real easy. Given what Jesse Barnes was saying, it sounded
> like a simple (optional, the dev driver doesn't have to use it) poll
> is not enough, because some errors might be transactional.

Yeah, I'm not arguing against your call, it could be useful for polling for
errors or for use in an error handling callback. What I was trying to say
earlier (maybe I wasn't very clear) was that the idea of creating
transactions for certain types of I/O (even if those transactions are
artificial and purely in software) can be useful since it creates boundaries
and context, making it easier to figure out what went wrong, hopefully making
it easier to fix things and carry on.

IOW, using Seto-san's iochk_clear/iochk_read interface makes certain types of
errors much easier to deal with since you *know* where an error occurred and
can presumably deal with it right away. The problem comes in for things that
aren't well encapsulated, like DMA, for which error polling or some sort of
callback is needed (and even with polling you'll need to poll in an error
handling thread as you mentioned since the driver may start DMA, return, and
the error can happen later when we're not actually in driver code). So I
think we mostly agree on what things need to be done, you and benh just have
to fight it out over the details. :)

Jesse

2005-03-04 02:09:33

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

Linas Vepstas wrote:
>>If their defaults are no-ops, device
>>maintainers who develops their driver on not-implemented arch should be
>>more careful.
>
> Why? People who write device drivers already know if/when they need to
> disable interrupts, and so they already disable if they need it.

OK, I'll remake them as no-ops.
Nothing will start unless trust in driver folks.

> p.s. I would like to have iochk_read() take struct pci_dev * as an
> argument. (I could store a pointer to pci_dev in the "cookie" but
> that seems odd).

I'd like to store the pointer and handle all only with the cookie...
Or is it needed to pass different device to iochk_clear() and iochk_read()?

Thanks,
H.Seto

2005-03-04 02:27:04

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

Linas Vepstas wrote:
> Below is some "pseudocode" version (mentally substitute
> "pci error event" for every occurance of "eeh"). Its got some
> ppc64-specific crud in there that we have to fix to make it
> truly generic (I just cut and pasted from current code).
>
> Would a cleaned up version of this code be suitable for a
> arch-generic pci error recovery framework? Seto, would
> this be useful to you?

Yes, it would. I'm looking forward to see your generic one.

Thanks,
H.Seto

2005-03-04 12:51:14

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

Thanks for all comments!

OK, I'd like to sort our situation:

################

$ Here are 2 features:
- iochk_clear/read() interface for error "detection"
by Seto ... me :-)
- callback, thread, and event notification for error "recovery"
by Linas ... expert in PPC64

$ What will "detection" interface provides?
- allow drivers to get error information
- device/bus was isolated/going-reset/re-enabled/etc.
- error status which hardware and PCI subsystem provides
- allow drivers to do "simple retry" easily
- major soft errors(*1) would be recovered by a simple retry
- in cases that device/bus was re-enabled but a retry is required

$ What will "recovery" infrastructure provides?
- allow drivers to help OS's recovery
- usually OS cannot re-enable affected devices by itself
- allow drivers to respond asynchronous error event
- allow drivers to implement "device specific recovery"

$ Difference of stance
- "detection"
- Assume that the number of soft error is far more than that of
hard error. (PCI-Express has ECC, but traditional PCI does not.)
- Assume that it isn't too late that attempt of device isolation
and/or recovery comes after a simple retry(*2), and that a retry
would be required even if the recovery had go well.
- It isn't matter whether device isolation is actually possible or
not for the arch. The fundamental intention of this interface is
prevent user applications from data pollution.
- Currently DMA and asynchronous I/O is not target.
- "recovery"
- (I'd appreciate it if Linas could fill here by his suitable words.)
- (Maybe,) it is based on assuming that erroneous device should be
isolated immediately irrespective of type of the error.
- (I guess that) once a device was isolated, it become harder to
re-enable it. It seems like a kind of hotplug feature.
- Currently there are few platform which can isolate devices and
attempt to recover from the I/O error.

$ How to use
- "detection" ... easy.
- clip I/Os by iochk_clear() and iochk_read()
- if iochk_read() returns non-0, retry once and/or notify the error
to user application.
- "recovery" ... rather hard.
- (I'd appreciate it if Linas could fill here by his suitable words.)
- write callback function for each event(*3)

-----

*1:
Traditionally, there are 2 types of error:
- soft error:
data was broken (ex. due to low voltage, natural radiation etc.)
temporary error
- hard error:
device or bus was physically broken (i.e. uncorrectable)
permanent error

*2:
it's difficult to distinguish hard errors from soft errors, without
any retry.

*3:
Linas, how many stages/events would you prefer to be there? is 3 enough?

ex. IMHO:

IOERR_DETECTED
- An error was detected, so error logging or device isolation would be
major request. On PPC64, isolation would be already done by hardware.
IOERR_PREPARE_RECOVERY
- Require preparation before attempting error recovery by OS.
IOERR_DO_RECOVERY
- Require device specific recovery and result of the recovery.
OS will gather all results and will decide recovered or not.
IOERR_RECOVERED
- OS recovery was succeeded.
IOERR_DEAD
- OS recovery was failed.

And as Ben said and as you already proposed, I also think only one callback
is enough and better, like:
int pci_emergency_callback(pci_dev *dev, err_event event, void *extra)

It allows us to add new event if desired.

################

Thanks,
H.Seto

2005-03-04 14:40:38

by Pavel Machek

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

Hi!

> > If there's no ->error method, at leat call ->remove so one device only
> > takes itself down.
> >
> > Does this make sense?
>
> This was my thought too last time we had this discussion. A completely
> asynchronous call is probably needed in addition to Hidetoshi's proposed API,
> since as you point out, the driver may not be running when an error occurs
> (e.g. in the case of a DMA error or more general bus problem). The async

Hmm, before we go async way (nasty locking, no?) could driver simply
ask "did something bad happen while I was sleeping?" at begining of each
function?

For DMA problems, driver probably has its own, timer-based,
"something is wrong" timer, anyway, no?
Pavel

--
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms

2005-03-04 16:56:17

by linas

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Fri, Mar 04, 2005 at 11:03:29AM +0900, Hidetoshi Seto was heard to remark:
> >p.s. I would like to have iochk_read() take struct pci_dev * as an
> >argument. (I could store a pointer to pci_dev in the "cookie" but
> >that seems odd).
>
> I'd like to store the pointer and handle all only with the cookie...

OK then.

> Or is it needed to pass different device to iochk_clear() and iochk_read()?

No.

--linas

2005-03-04 17:57:08

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Friday, March 4, 2005 5:54 am, Pavel Machek wrote:
> Hi!
>
> > > If there's no ->error method, at leat call ->remove so one device only
> > > takes itself down.
> > >
> > > Does this make sense?
> >
> > This was my thought too last time we had this discussion. A completely
> > asynchronous call is probably needed in addition to Hidetoshi's proposed
> > API, since as you point out, the driver may not be running when an error
> > occurs (e.g. in the case of a DMA error or more general bus problem).
> > The async
>
> Hmm, before we go async way (nasty locking, no?) could driver simply
> ask "did something bad happen while I was sleeping?" at begining of each
> function?

This is what Seto is proposing, aiui. I.e. calls around I/O so you can
gracefully handle errors during that I/O.

> For DMA problems, driver probably has its own, timer-based,
> "something is wrong" timer, anyway, no?

The idea is to allow them to do something like that, or consolidate such
threads in a platform specific error handling thread or interrupt handler
that can call a driver's ->dma_error(dev) routine (or ->error(dev, ERROR_DMA)
or whatever) routine.

Jesse

2005-03-05 00:00:27

by Pavel Machek

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

Hi!

> > Hmm, before we go async way (nasty locking, no?) could driver simply
> > ask "did something bad happen while I was sleeping?" at begining of each
> > function?
> >
> > For DMA problems, driver probably has its own, timer-based,
> > "something is wrong" timer, anyway, no?
>
> No, there is no nasty locking, when the callback happens, pretty much
> all IOs have stopped anyway due to errors, and we aren't on a critical
> code path.

What prevents driver from being run on another CPU, maybe just doing
mdelay() between hardware accesses?

Pavel
--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-03-05 00:36:42

by Pavel Machek

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On So 05-03-05 10:03:37, Benjamin Herrenschmidt wrote:
> On Fri, 2005-03-04 at 23:57 +0100, Pavel Machek wrote:
>
> > What prevents driver from being run on another CPU, maybe just doing
> > mdelay() between hardware accesses?
>
> Almost all drivers that I know have some sort of locking. Nothing nasty
> about it. Besides, you can't expect everything to be as simple as
> putting two bit of lego together, the problem isn't simple.

If error() is allowed to sleep, then yes, its probably easy enough. If
it is not allowed to sleep, it will just postpone work to context that
is allowed to sleep, and it will probably be okay, too.

=> there are some locking issues, but they are probably easy
enough. Sorry for noise.
Pavel

--
People were complaining that M$ turns users into beta-testers...
...jr ghea gurz vagb qrirybcref, naq gurl frrz gb yvxr vg gung jnl!

2005-03-05 00:36:39

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Sat, 2005-03-05 at 00:18 +0100, Pavel Machek wrote:
> On So 05-03-05 10:03:37, Benjamin Herrenschmidt wrote:
> > On Fri, 2005-03-04 at 23:57 +0100, Pavel Machek wrote:
> >
> > > What prevents driver from being run on another CPU, maybe just doing
> > > mdelay() between hardware accesses?
> >
> > Almost all drivers that I know have some sort of locking. Nothing nasty
> > about it. Besides, you can't expect everything to be as simple as
> > putting two bit of lego together, the problem isn't simple.
>
> If error() is allowed to sleep, then yes, its probably easy enough. If
> it is not allowed to sleep, it will just postpone work to context that
> is allowed to sleep, and it will probably be okay, too.

Yes, it's my itend that the notification callback is to be called in a
task context where it can sleep.

> => there are some locking issues, but they are probably easy
> enough. Sorry for noise.
> Pavel
>
--
Benjamin Herrenschmidt <[email protected]>

2005-03-04 23:52:29

[permalink] [raw]

Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling

On Fri, 2005-03-04 at 14:54 +0100, Pavel Machek wrote:
> Hi!
>
> > > If there's no ->error method, at leat call ->remove so one device only
> > > takes itself down.
> > >
> > > Does this make sense?
> >
> > This was my thought too last time we had this discussion. A completely
> > asynchronous call is probably needed in addition to Hidetoshi's proposed API,
> > since as you point out, the driver may not be running when an error occurs
> > (e.g. in the case of a DMA error or more general bus problem). The async
>
> Hmm, before we go async way (nasty locking, no?) could driver simply
> ask "did something bad happen while I was sleeping?" at begining of each
> function?
>
> For DMA problems, driver probably has its own, timer-based,
> "something is wrong" timer, anyway, no?

No, there is no nasty locking, when the callback happens, pretty much
all IOs have stopped anyway due to errors, and we aren't on a critical
code path.

Polling for error might be possible, but async notification is the way
to go because whatever does error management need to be able to
separately:

- notify all drivers on the affected bus segment
- one the above is done, and based on system/driver capabilities (API
to be defined) eventually re-enable IO access and do a new round of
notifications
- based on system/driver capabilities, eventually reset the slot and
notify drivers to re-initialize themselves.

Ben.

2005-03-05 05:50:02