2005-01-06 20:31:32

by linas

[permalink] [raw]
Subject: [PATCH] PPC64: EEH Recovery


Hi Paul,

The patch below implements hotplug style EEH error recovery.
Its split into two pieces: a part that needs to be applied to the
PPC64 arch tree, and a part that needs to be applied to the
RPA PHP hotplug tree. The PPC64 part needs to go in first.

Assuming this doesn't generate a round of discussion, please
forward upstream to akpm/torvalds.

Signed-off-by: Linas Vepstas <[email protected]>



Attachments:
(No filename) (406.00 B)
eeh-recovery-bk-ppc64-3.patch (18.96 kB)
eeh-recovery-bk-hotplug-3.patch (8.65 kB)
Download all attachments

2005-01-17 20:15:59

by linas

[permalink] [raw]
Subject: Re: [PATCH] PPC64: EEH Recovery


Andrew,

The attached file describes PCI bus EEH "Extended Error Handling"
concepts and operation; could you drop this into the kernel
documentation tree, at
linux-2.6/Documentation/powerpc/eeh-pci-error-recovery.txt ?

Signed-off-by: Linas Vepstas <[email protected]>

--linas

p.s. It was not clear to me if the EEH patch previously sent
(6 January 2005, same subject line) will be wending its way into
the main Torvalds kernel tree, or not. I hadn't really gotten
confirmation one way or another.



Attachments:
(No filename) (506.00 B)
eeh-pci-error-recovery.txt (14.92 kB)
Download all attachments

2005-01-19 06:01:22

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH] PPC64: EEH Recovery

Linas Vepstas writes:

> p.s. It was not clear to me if the EEH patch previously sent
> (6 January 2005, same subject line) will be wending its way into
> the main Torvalds kernel tree, or not. I hadn't really gotten
> confirmation one way or another.

I'm not really totally happy with it yet, on a number of fronts:

1. You're adding more PCI-specific stuff to the device_node struct,
which I don't like. I would prefer that the device_node tree
contains basically just what we get from OF, and that we have a
separate struct for storing ppc64-specific information for each PCI
device. Fixing that is outside the scope of your patch, though.

2. I don't see why the device nodes for the PCI subtree being reset
would go away, and thus I don't see the need for your eeh_cfg_tree
struct.

3. Is there a good reason why we can't use the assigned-addresses
property on the relevant device tree nodes to tell us what to set
the BARs to?

4. I think the 5 second sleep is quite bogus, and shows that we have
the flow of control wrong. In particular I think it should be a
userland write to a sysfs file that kicks off the restart process
rather than it just happening after 5 seconds. Anyway, what
process or thread is executing that 5 second sleep? Is it keventd
or something?

5. AFAICS userland will get an unplug notification for the device, but
nothing to indicate that is due to an EEH slot isolation event. I
think userland should be told about EEH events.

Regards,
Paul.

2005-01-19 16:03:34

by Nathan Fontenot

[permalink] [raw]
Subject: Re: [PATCH] PPC64: EEH Recovery


Paul Mackerras wrote:

> 5. AFAICS userland will get an unplug notification for the device, but
> nothing to indicate that is due to an EEH slot isolation event. I
> think userland should be told about EEH events.
>

Currently there is a way for userland to determine if a hotplug event
they receive is due to an EEH slot isolation event. It's not very
pretty and requires the rtas_errd daemon to be running.

The RTAS event generated from the EEH event is logged to
/var/log/platform by rtas_errd. Userland scripts would have to search
the file for a recent EEH event matching their device to make this
determination. This isn't as nice as a direct notification but is what
we have at this point.

--
Nathan Fontenot

2005-01-20 22:39:47

by linas

[permalink] [raw]
Subject: Re: [PATCH] PPC64: EEH Recovery


On Wed, Jan 19, 2005 at 05:06:05PM +1100, Paul Mackerras was heard to remark:
> Linas Vepstas writes:
>
> > p.s. It was not clear to me if the EEH patch previously sent
> > (6 January 2005, same subject line) will be wending its way into
> > the main Torvalds kernel tree, or not. I hadn't really gotten
> > confirmation one way or another.
>
> I'm not really totally happy with it yet, on a number of fronts:
>
> 1. You're adding more PCI-specific stuff to the device_node struct,
> which I don't like. I would prefer that the device_node tree
> contains basically just what we get from OF, and that we have a
> separate struct for storing ppc64-specific information for each PCI
> device. Fixing that is outside the scope of your patch, though.

I wrote this down on my to-do list. Its the sort of thing that
evaporates from my consciousness when other things come along,
but I'll give it a shot.

> 2. I don't see why the device nodes for the PCI subtree being reset
> would go away, and thus I don't see the need for your eeh_cfg_tree
> struct.

Its not the reset, its the hot-plug remove. The hot plug code assumes
that you are going to physically remove the device from the slot, so
it removes the device_node as part of the "unconfig".

Of course, I found this out only after performing a null-pointer deref.
Note only does the node go away, but all of the various pointers it holds
are zeroed in the process.

The cfg tree holds on to those pointers, so that I wouldn't have to
muck with the device_node removal code to do something tricky.

> 3. Is there a good reason why we can't use the assigned-addresses
> property on the relevant device tree nodes to tell us what to set
> the BARs to?

Yes, the reason is that after a reset, that property doesn't hold any
decent data. I discussed this with the firmware developers, and thier
response was that it is the kernel's responsibility to compute
(or save/restore) such values. (Except for bridges, which they will do for us).

> 4. I think the 5 second sleep is quite bogus, and shows that we have
> the flow of control wrong.

:) Yes, well, indeed it is. Don't look at me, not my idea.

> In particular I think it should be a
> userland write to a sysfs file that kicks off the restart process
> rather than it just happening after 5 seconds. Anyway, what
> process or thread is executing that 5 second sleep? Is it keventd
> or something?

Its a workqueue.

> 5. AFAICS userland will get an unplug notification for the device, but
> nothing to indicate that is due to an EEH slot isolation event. I
> think userland should be told about EEH events.

In principle, I'd agree. In practice, this would seem to require changes
or additions or enhancements to udev that I don't quite understand, as
well as potential changes to udev scripts. Maybe I don't understand
sysfs sufficiently well. I am very tempted to punt on this, and wait
for the Intel-backed PCI-E code to get to this point, and then do whatever
they're doing.

--linas

2005-01-20 22:48:31

by linas

[permalink] [raw]
Subject: Re: [PATCH] PPC64: EEH Recovery

On Wed, Jan 19, 2005 at 05:06:05PM +1100, Paul Mackerras was heard to remark:
> Linas Vepstas writes:
>
> > p.s. It was not clear to me if the EEH patch previously sent
> > (6 January 2005, same subject line) will be wending its way into
> > the main Torvalds kernel tree, or not. I hadn't really gotten
> > confirmation one way or another.
>
> I'm not really totally happy with it yet, on a number of fronts:

[...]

I forgot to mention: while I agree with some/many of these points,
especially with regards to recovery, I'd also like to note that the
patch was mailed in two independent parts:

-- a number of generic infrastructure routines, all in a ppc64 patch, and
-- the code that actually performs the recovery, as a patch to
the drivers/pci/hotplug subsystem.

While the actual recovery code is controversial (e.g. no support of
scsi recovery), I'd like to at least get in the the generic
infrastructure pieces.

--linas

2005-01-21 02:45:23

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH] PPC64: EEH Recovery

Linas Vepstas writes:

> > 2. I don't see why the device nodes for the PCI subtree being reset
> > would go away, and thus I don't see the need for your eeh_cfg_tree
> > struct.
>
> Its not the reset, its the hot-plug remove. The hot plug code assumes
> that you are going to physically remove the device from the slot, so
> it removes the device_node as part of the "unconfig".

OK, I missed that. It seems a bit bogus to me. Could you point me at
where in the code this happens?

> > 3. Is there a good reason why we can't use the assigned-addresses
> > property on the relevant device tree nodes to tell us what to set
> > the BARs to?
>
> Yes, the reason is that after a reset, that property doesn't hold any
> decent data. I discussed this with the firmware developers, and thier
> response was that it is the kernel's responsibility to compute
> (or save/restore) such values. (Except for bridges, which they will do for us).

The not holding any decent data is a consequence of the device nodes
getting thrown away, isn't it? I fail to see how resetting the device
can of itself affect our copy of the device tree.

> > In particular I think it should be a
> > userland write to a sysfs file that kicks off the restart process
> > rather than it just happening after 5 seconds. Anyway, what
> > process or thread is executing that 5 second sleep? Is it keventd
> > or something?
>
> Its a workqueue.

Which get run in keventd's context. In other words no other
workqueues will get run during the 5 second sleep, or at least not on
that cpu.

Paul.