2003-02-11 01:26:21

by Kenneth Sumrall

Subject: Re: Kexec, DMA, and SMP

"Eric W. Biederman" wrote:
>
> Suparna Bhattacharya <[email protected]> writes:
>
> > On Sun, Feb 09, 2003 at 11:39:27AM -0700, Eric W. Biederman wrote:
> > > Corey Minyard <[email protected]> writes:
> > >
> > > With respect to DMA and SMP handling for kexec on panic that case is
> > > much trickier. A lot of the normal methods simply don't apply because
> > > by definition in a panic something is broken, and that something may
> > > be the code we need to cleanly shut down the hardware. But I am not
> > > ready to sacrifice a method that works well in a properly working
> > > kernel just because the panic case can't use it.
> > >
> > > In getting it working I suggest we start with the easy cases, where
> > > DMA and SMP are not big issues. And then we can have a working
> > > framework.
> >
> > I'd agree. That was also the idea behind the patch we'd just posted
> > for LKCD. With a basic working framework in hand that works for
> > simpler cases, we can now keep working on addressing more and harder
> > situations bit by bit.
>
> Agreed. I guess the primary question is can we trust the current
> device shutdown + reboot notifier path or do we need to make some
> large changes to avoid it.
>
So are the functions registered on the reboot notifier path guaranteed
to be non-blocking? In the kexec on panic case, calls that can block
would obviously be a bad thing. If they can block, perhaps we could add
a new flag SYS_PANIC or something like that to tell the driver to only
do a non-blocking shutdown of the chip.
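
A rough userspace model of that idea, in Python rather than kernel C so it
can be run standalone. Everything here is an assumption made for
illustration: SYS_PANIC is only the flag proposed above, not an existing
kernel constant, the numeric values are arbitrary, and ToyNic and
call_reboot_chain are hypothetical names. The point is only that one
callback can choose a sleeping or a non-sleeping quiesce based on the event.

```python
# Userspace sketch only; none of this is kernel code.
SYS_RESTART = 0x0001  # illustrative value
SYS_HALT = 0x0002     # illustrative value
SYS_PANIC = 0x0004    # hypothetical: the flag proposed above
NOTIFY_OK = 0x0001    # illustrative return value


class ToyNic:
    """Stand-in for a network driver with two ways to quiesce the chip."""

    def __init__(self):
        self.dma_enabled = True
        self.log = []

    def quiesce_blocking(self):
        # Normal reboot path: allowed to sleep, e.g. while transmits drain.
        self.log.append("drain tx queue (may sleep)")
        self.dma_enabled = False

    def quiesce_nonblocking(self):
        # Panic path: only stop the DMA engine, never sleep.
        self.log.append("disable DMA engine (no sleeping)")
        self.dma_enabled = False

    def reboot_notify(self, event):
        # One callback, two behaviours, selected by the event code.
        if event == SYS_PANIC:
            self.quiesce_nonblocking()
        else:
            self.quiesce_blocking()
        return NOTIFY_OK


def call_reboot_chain(chain, event):
    """Call every registered callback in order and collect the results."""
    return [callback(event) for callback in chain]


if __name__ == "__main__":
    nic = ToyNic()
    call_reboot_chain([nic.reboot_notify], SYS_PANIC)
    print(nic.log, nic.dma_enabled)
```

A driver that has no safe non-blocking shutdown would simply do nothing for
SYS_PANIC in this model, which is the case the rest of this thread worries
about.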


> > Are you trying to address the possibility that DMA is overwriting
> > memory we are using in the recovery code, due to a runaway driver
> > or other code passing a wrong memory address to a device (e.g. in
> > a corrupted command area) ?
>
> Not primarily. Instead I am trying to address the possibility that
> DMA is overwriting the recovery code due to a device not being shut down
> properly. Though it would happen to cover many cases of the wrong
> memory address being passed to a device.
>
The problem we were seeing was that rogue DMA from a network interface
chip was corrupting dentries in the dentry cache when the rebooted
kernel was coming back up. This caused a whole new set of panics. :-(

Ken Sumrall
[email protected]


2003-02-11 04:58:37

by Eric W. Biederman

Subject: Re: Kexec, DMA, and SMP

Kenneth Sumrall <[email protected]> writes:

> > Suparna Bhattacharya <[email protected]> writes:

> > Agreed. I guess the primary question is can we trust the current
> > device shutdown + reboot notifier path or do we need to make some
> > large changes to avoid it.
> >
> So are the functions registered on the reboot notifier path guaranteed
> to be non-blocking? In the kexec on panic case, calls that can block
> would obviously be a bad thing. If they can block, perhaps we could add
> a new flag SYS_PANIC or something like that to tell the driver to only
> do a non-blocking shutdown of the chip.

I think some amount of blocking is allowed, but that has not been
clearly defined. Note that in 2.5.x there is a specific subset of the
reboot notifiers, the shutdown() device method, which you don't need
to register a notifier for. The rules are the same; it is just a
little bit cleaner.

> > Not primarily. Instead I am trying to address the possibility that
> > DMA is overwriting the recovery code due to a device not being shut down
> > properly. Though it would happen to cover many cases of the wrong
> > memory address being passed to a device.
> >
> The problem we were seeing was that rogue DMA from a network interface
> chip was corrupting dentries in the dentry cache when the rebooted
> kernel was coming back up. This caused a whole new set of panics. :-(

And this is something a reserved hunk of memory from, say, 16MB to 20MB
would handle. As DMA could never have been set up at that address,
the region obviously will never be used as a DMA target...
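
The reserved-region argument above can be illustrated with a toy
allocator, again as a Python userspace model and not kernel code. The
16MB-20MB window is the example from this message; alloc_dma_buffers,
overlaps, and the bump-allocation policy are hypothetical and exist only
to show that buffers handed out while skipping the window can never
overlap it.

```python
MB = 1 << 20
RESERVED = (16 * MB, 20 * MB)  # half-open [start, end), the example window


def overlaps(a, b):
    """True if the half-open ranges a and b share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]


def alloc_dma_buffers(sizes, mem_end, reserved=RESERVED):
    """Toy bump allocator that never hands out memory in the reserved window."""
    buffers = []
    cursor = 0
    for size in sizes:
        start = cursor
        if overlaps((start, start + size), reserved):
            # Skip past the window instead of allocating inside it.
            start = reserved[1]
        if start + size > mem_end:
            raise MemoryError("out of toy memory")
        buffers.append((start, start + size))
        cursor = start + size
    return buffers


if __name__ == "__main__":
    for start, end in alloc_dma_buffers([8 * MB, 6 * MB, 4 * MB, 2 * MB], 64 * MB):
        print(start // MB, end // MB, overlaps((start, end), RESERVED))
```

This only covers DMA that was programmed with addresses the old kernel
handed out; a device given a corrupted address could still hit the window.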

Eric



2003-02-11 16:59:37

by Stephen Hemminger

Subject: Re: Kexec, DMA, and SMP

On Mon, 2003-02-10 at 21:08, Eric W. Biederman wrote:
> Kenneth Sumrall <[email protected]> writes:
>
> > > Suparna Bhattacharya <[email protected]> writes:
>
> > > Agreed. I guess the primary question is can we trust the current
> > > device shutdown + reboot notifier path or do we need to make some
> > > large changes to avoid it.
> > >
> > So are the functions registered on the reboot notifier path guaranteed
> > to be non-blocking? In the kexec on panic case, calls that can block
> > would obviously be a bad thing. If they can block, perhaps we could add
> > a new flag SYS_PANIC or something like that to tell the driver to only
> > do a non-blocking shutdown of the chip.
>
Some of the network shutdown reboot notifiers can block.
I found this out the hard way when I tried to convert notifiers to use RCU
and got many warnings, so many that the effort was abandoned.