2002-10-13 19:18:48

by Adam J. Richter

[permalink] [raw]
Subject: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

linux-2.5.42 had an annoying new behavior. When I would
try to do a warm reboot, it would spin down the hard drives, which
just made the reboot take longer and gave the impression that a
halt or poweroff was in progress.

At first, I suspected IDE, but I think the new behavior in IDE
of spinning down the hard drives on suspend is correct. The problem
is that the warm reboot system call is trying to suspend all of the
devices before a warm reboot for no reason. We already have a reboot
notifier chain that drivers can use to register code that has to be
run in order to safely reboot or halt. I am not talking about
eliminating that. I am only talking about the soft reboot putting
devices into a power saving mode that is allowed to take a long
recovery time, especially given that the reboot is likely to want to
talk to every hardware device connected to the system.

Anyhow, here is the patch. As far as I can tell, there is no
delegated mainainer for kernel/sys.c, so I am sending this to
linux-kernel and I will resend it to Linus later if nobody points me
to another maintainer to go through and there are no complaints.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

--- linux-2.5.42/kernel/sys.c 2002-10-11 21:21:31.000000000 -0700
+++ linux/kernel/sys.c 2002-10-13 11:57:45.000000000 -0700
@@ -365,7 +365,6 @@
case LINUX_REBOOT_CMD_RESTART:
notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
system_running = 0;
- device_shutdown();
printk(KERN_EMERG "Restarting system.\n");
machine_restart(NULL);
break;


2002-10-13 19:51:36

by Eric Blade

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

On Sun, 2002-10-13 at 15:24, Adam J. Richter wrote:
> linux-2.5.42 had an annoying new behavior. When I would
> try to do a warm reboot, it would spin down the hard drives, which
> just made the reboot take longer and gave the impression that a
> halt or poweroff was in progress.
>
> At first, I suspected IDE, but I think the new behavior in IDE
> of spinning down the hard drives on suspend is correct. The problem
> is that the warm reboot system call is trying to suspend all of the
> devices before a warm reboot for no reason. We already have a reboot
> notifier chain that drivers can use to register code that has to be
> run in order to safely reboot or halt. I am not talking about
> eliminating that. I am only talking about the soft reboot putting
> devices into a power saving mode that is allowed to take a long
> recovery time, especially given that the reboot is likely to want to
> talk to every hardware device connected to the system.
>
> Anyhow, here is the patch. As far as I can tell, there is no
> delegated mainainer for kernel/sys.c, so I am sending this to
> linux-kernel and I will resend it to Linus later if nobody points me
> to another maintainer to go through and there are no complaints.

Adam,
I'm not sure the proper thing to do is necessarily remove the
device_shutdown() call. I did the changes to the device_shutdown()
function, but as far as I can tell, it should not have changed any
behavior like that - all I did was re-work the logic a bit. In any
case, what I did submit to the mailing list was absent a small piece of
code (a change to device.h), and the person who forwarded it onto Linus
(thank you!) did make a change to make it compile without that.

Please try this patch to the base 2.5.42 code, and let me know if this
returns it to the previous behavior?

--- a/drivers/base/power.c Sat Oct 12 00:22:11 2002
+++ linux/drivers/base/power.c Sun Oct 13 15:42:46 2002
@@ -31,7 +31,7 @@
struct device * prev = NULL;
int error = 0;

- if(level == SUSPEND_POWER_DOWN)
+ if(level == SUSPEND_SHUT_DOWN)
printk(KERN_EMERG "Shutting down devices\n");
else
printk(KERN_EMERG "Suspending devices\n");
@@ -42,7 +42,7 @@
if (dev) {
spin_unlock(&device_lock);
if(dev->driver) {
- if(level == SUSPEND_POWER_DOWN) {
+ if(level == SUSPEND_SHUT_DOWN) {
if(dev->driver->remove)

dev->driver->remove(dev);
} else if(dev->driver->suspend)
@@ -96,7 +96,7 @@
*/
void device_shutdown(void)
{
- device_suspend(4, SUSPEND_POWER_DOWN);
+ device_suspend(4, SUSPEND_SHUT_DOWN);
}

EXPORT_SYMBOL(device_suspend);
--- a/include/linux/device.h Sat Oct 12 00:22:19 2002
+++ linux/include/linux/device.h Sun Oct 13 15:43:03 2002
@@ -40,6 +40,7 @@
SUSPEND_SAVE_STATE,
SUSPEND_DISABLE,
SUSPEND_POWER_DOWN,
+ SUSPEND_SHUT_DOWN,
};

enum {



2002-10-13 21:23:19

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Eric Blade <[email protected]> writes:

> On Sun, 2002-10-13 at 15:24, Adam J. Richter wrote:
> > linux-2.5.42 had an annoying new behavior. When I would
> > try to do a warm reboot, it would spin down the hard drives, which
> > just made the reboot take longer and gave the impression that a
> > halt or poweroff was in progress.
> >
> > At first, I suspected IDE, but I think the new behavior in IDE
> > of spinning down the hard drives on suspend is correct. The problem
> > is that the warm reboot system call is trying to suspend all of the
> > devices before a warm reboot for no reason.


There is a very good reason for calling the driver->remove() function
on a reboot so that we place the devices in a consistent state. And
stop any on going DMA etc. If the ide module spins down disks when
you remove it that is a little odd.

> We already have a reboot
> > notifier chain that drivers can use to register code that has to be
> > run in order to safely reboot or halt.

The reboot notifier call chain, is highly underused, and all of the drivers
already have the code they need in their remove function.

> I am not talking about
> > eliminating that. I am only talking about the soft reboot putting
> > devices into a power saving mode that is allowed to take a long
> > recovery time, especially given that the reboot is likely to want to
> > talk to every hardware device connected to the system.

If
driver->remove()
does that it is an issue with that device driver.

> > Anyhow, here is the patch. As far as I can tell, there is no
> > delegated mainainer for kernel/sys.c, so I am sending this to
> > linux-kernel and I will resend it to Linus later if nobody points me
> > to another maintainer to go through and there are no complaints.
>
> Adam,
> I'm not sure the proper thing to do is necessarily remove the
> device_shutdown() call.

I am certain it is.

> I did the changes to the device_shutdown()
> function, but as far as I can tell, it should not have changed any
> behavior like that - all I did was re-work the logic a bit. In any
> case, what I did submit to the mailing list was absent a small piece of
> code (a change to device.h), and the person who forwarded it onto Linus
> (thank you!) did make a change to make it compile without that.

Hmm. Does SUSPEND_POWER_DOWN have noticeably different behavior?

> Please try this patch to the base 2.5.42 code, and let me know if this
> returns it to the previous behavior?

How can this affect device_shutdown()?
If POWER_DOWN is used in different places I can see the desire for the change.
But I don't even see why we would want this.

Oh, and thank you for the code by the way. I can finally write a correct
version of kexec, and expect the device drivers to clean themselves up
properly.

Eric

2002-10-13 22:08:44

by Adam J. Richter

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Eric W. Biederman writes:
>Eric Blade <[email protected]> writes:

>> On Sun, 2002-10-13 at 15:24, Adam J. Richter wrote:
>> > At first, I suspected IDE, but I think the new behavior in IDE
>> > of spinning down the hard drives on suspend is correct. The problem
>> > is that the warm reboot system call is trying to suspend all of the
>> > devices before a warm reboot for no reason.


>There is a very good reason for calling the driver->remove() function
>on a reboot so that we place the devices in a consistent state. And
>stop any on going DMA etc.

A warm reboot is supposed to do this by the platform dependent
machine_restart(), and whatever processes machine_restart sets off
(e.g., by making a RESET signal active). If a device driver needs
some special processing prior to that, that is what the reboot
notifier chain is for.

>The reboot notifier call chain, is highly underused, and all of the drivers
>already have the code they need in their remove function.

What specific drivers are you referring to that meet all three
of the criteria you have stated?:

1. They need to use reboot_notifier because the
machine_restart() is not enough to reset them,

2. They do not currently use reboot_notifier,

3. They have a device->remove() that does what the
reboot_notifier should do (shuts down ongoing DMA,
because they have a some device that ignores a reset
signal or something like that).

In general I would expective that a driver->remove(device)
function would not be able to touch much of the hardware at all,
because it would have to assume that the device has been or is being
physically yanked out of the computer, so it should do software cleanup
and maybe talk to the PCMCIA socket, USB hub port, etc. that previously
held the device.

If you have a platform where, for example, somehow PCI devices
are able to continue jabbering away after the computer has been reset,
then that could probably be done more consistently for most drivers by
having machine_restart on that platform walk the PCI bus and shut down
everything (drivers that need to do something really special would
still use the reboot notifier).

I could even see calling device_shutdown from machine_restart
on that platform only, and I could even image
"if(buggy_motherboard) device_shutdown();", but I would be really
surprised to learn that sort of thorough resetting is necessary for
most hardware on ordinary x86 PC's.

I could also understand something like your system call for
booting Linux from Linux needing to do more thorough shutdowns and
resets, but not something that uses machine_restart().

>> I am not talking about
>> > eliminating that. I am only talking about the soft reboot putting
>> > devices into a power saving mode that is allowed to take a long
>> > recovery time, especially given that the reboot is likely to want to
>> > talk to every hardware device connected to the system.

>If
>driver->remove()
>does that it is an issue with that device driver.

device_shutdown and device_suspend are for power management.
It is important to turn the device off as soon as possible if a power
management routine has told you that the device is not going to be
used any more.

If you've identified a bunch of devices that need
reboot_notifier, please list them so we can fix them rather than
taxing the users with unnecessarily slow reboots or poor battery life
(due to device not being shut down when they are no longer being used).

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-10-13 22:25:33

by Russell King

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

On Sun, Oct 13, 2002 at 03:14:27PM -0700, Adam J. Richter wrote:
> >There is a very good reason for calling the driver->remove() function
> >on a reboot so that we place the devices in a consistent state. And
> >stop any on going DMA etc.
>
> A warm reboot is supposed to do this by the platform dependent
> machine_restart(), and whatever processes machine_restart sets off
> (e.g., by making a RESET signal active). If a device driver needs
> some special processing prior to that, that is what the reboot
> notifier chain is for.

I'd imagine the reboot notifier chain is going away - especially as
stuff would need to be shut down in the right order, which is one of
the reasons we have the device model. It already knows the right
order.

> If you have a platform where, for example, somehow PCI devices
> are able to continue jabbering away after the computer has been reset,
> then that could probably be done more consistently for most drivers by
> having machine_restart on that platform walk the PCI bus and shut down
> everything (drivers that need to do something really special would
> still use the reboot notifier).

x86, I believe, is one example of such a platform that can leave PCI
devices jabbering over a warm reboot.

> device_shutdown and device_suspend are for power management.
> It is important to turn the device off as soon as possible if a power
> management routine has told you that the device is not going to be
> used any more.

I'd think the sane solution would be a device_reboot() call, and kill
off the reboot notifier. That's just my opinion though, given that I
have some devices that definitely need this type of treatment.

I'd rather have one registration system for this type of device
management, not multiple random lists to register stuff with all over
the place.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-10-13 22:46:33

by Andries Brouwer

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

On Sun, Oct 13, 2002 at 12:24:26PM -0700, Adam J. Richter wrote:

> linux-2.5.42 had an annoying new behavior. When I would
> try to do a warm reboot, it would spin down the hard drives, which
> just made the reboot take longer and gave the impression that a
> halt or poweroff was in progress.

Yes. In my case worse than annoying:
The drives spin down, but have not yet completed spindown when
the machine is started again. LILO fails (prints a single 's'
where I would have expected "uncompressing kernel" and dies).
Pressing reset results in a strange garbled BIOS screen, and a hang.
After a power cycle all is well again.

So, my hardware is very unhappy with the new 2.5.42 behaviour.

Andries


[2.5.33 works for me, but no later kernel does.]

2002-10-13 23:04:21

by Adam J. Richter

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Russell King writes:
>x86, I believe, is one example of such a platform that can leave PCI
>devices jabbering over a warm reboot.

The standards on pcisig.com are apparently proprietary, so I'm
afraid I can only quote a proprietary book I have handy:

Reset Signal (RST#):

When asserted, the reset signal forces all PCI configuration
registers, master and target state machines and output drivers to an
initialized state. RST# may be asserted or deasserted asynchronously
to the PCI CLK edge. The assertion of RST# also initializes other,
device-specific functions, but this subject is beyond the scope of the
PCI specification. All PCI output signals must be driver to their
bening states. In general, this means they must be tri-stated [...]

Chapter 4, page 37
_PCI System Architecture, 4th Edition_
Tom Shanley and Don Anderson


So, you must be talking about a PC that does not ground RST#
during a warm reboot or out of spec (according to this book) PCI devices,
which would not be specific to x86 unless we're talking about motherboard
chipset devices.

I understand the benefits of being conservative, but let's not
be taken in by urban legend, or, more likely, some quirkly hardware
that we can set a flag for while we can reboot more quickly with most
other hardware. Anyhow, if you or anyone can give me specifics about
devices jabbering away after reboot, that would be great

I have no objection to replacing or supplementing the reboot
notifier chain with a method in struct device_driver, but let's not
overload these methods with ambiguous semantics. I do not want to
call thirty functions that primarily return memory to various memory
allocators, mark a bunch of inodes as invalid, and otherwise arrange
things so that the kernel can smoothly continue to run user level
programs when, in fact, we just want to pull the reset line on the
computer.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."


2002-10-13 23:10:04

by Russell King

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

On Sun, Oct 13, 2002 at 04:10:01PM -0700, Adam J. Richter wrote:
> I have no objection to replacing or supplementing the reboot
> notifier chain with a method in struct device_driver, but let's not
> overload these methods with ambiguous semantics. I do not want to
> call thirty functions that primarily return memory to various memory
> allocators, mark a bunch of inodes as invalid, and otherwise arrange
> things so that the kernel can smoothly continue to run user level
> programs when, in fact, we just want to pull the reset line on the
> computer.

And what about setups where you can't pull the reset line from software.
I have several machines here like that. And one of them needs software
to talk to the cards to put them back into a sane state before rebooting.

"rebooting" in this particular case is "turn MMU off, jump to location 0"

And I never said anything about needing to allocate memory to do this.
I agree with you that suspending devices on reboot _is_ silly. However,
that's not what I was proposing.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-10-13 23:44:44

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices


O.k. After having been away from active development on the kernel
for a while I have now reviewed what you are seeing and I have a much
firmer grasp of what is going on.

The change that merged device_shutdown and device_suspend was buggy.
There really should have been a second state added so no behaviour
would change. This does not affect the reboot case, but it does
affect the other users of SUSPEND_POWER_DOWN which does request
devices be powered off. Which is noticeably from what you want in
device_shutdown() which is the preparation for reboot/halt.

The change that went into 2.5.42 was the behavior of the ide
driver changed. device_shutdown has always called device->remove,
and it has been in the reboot path for a very long time. The ide
driver just implemented a remove method, replacing it's reboot
notifier, and in that change the ide driver started spinning down disk
on reboot as well as shutdown. So please talk to the author of the
ide change if you do not like the current ide behaviour.


machine_restart returns control to the BIOS.
It works this way on both x86 and alpha. And the BIOS does
not always toggle the RESET line on the machine. The frequent case
of devices not working in Linux after rebooting from windows, and
visa versa is evidence of this. On alpha the SRM doesn't even
pretend to reset the machine.

Therefore the reboot code needs to put devices in a sane state, and
given the general perversity of hardware and that there is not a
defined way to do this generically except resetting the pci bus
the device drivers need to be involved. A reboot notifier is an ugly
way to do this, and walking the power management tree is a much
cleaner way to accomplish this. Additionally it was decided quite a
while ago that calling device->remove() was the correct way to
accomplish this.

Reboot problems show up more clearly when booting linux from linux but
that just makes it more explicit. It does not introduce any new
potentials for problems, in this area.

Eric

2002-10-13 23:49:41

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

"Adam J. Richter" <[email protected]> writes:

> Russell King writes:
> >x86, I believe, is one example of such a platform that can leave PCI
> >devices jabbering over a warm reboot.
>
> The standards on pcisig.com are apparently proprietary, so I'm
> afraid I can only quote a proprietary book I have handy:

Rebooting on x86 is returning control to the BIOS not pressing the external
reset line.

> So, you must be talking about a PC that does not ground RST#
> during a warm reboot or out of spec (according to this book) PCI devices,
> which would not be specific to x86 unless we're talking about motherboard
> chipset devices.

Exactly an in spec, PC does not need to ground RST# on reboot.

> I understand the benefits of being conservative, but let's not
> be taken in by urban legend, or, more likely, some quirkly hardware
> that we can set a flag for while we can reboot more quickly with most
> other hardware. Anyhow, if you or anyone can give me specifics about
> devices jabbering away after reboot, that would be great

On 2.4.x don't down a network interface, before you reboot.

> I have no objection to replacing or supplementing the reboot
> notifier chain with a method in struct device_driver, but let's not
> overload these methods with ambiguous semantics. I do not want to
> call thirty functions that primarily return memory to various memory
> allocators, mark a bunch of inodes as invalid, and otherwise arrange
> things so that the kernel can smoothly continue to run user level
> programs when, in fact, we just want to pull the reset line on the
> computer.

As soon as you start tracking the code and complaining about the correct
pieces I think it will start digging up a list. Currently I do not
see that a productive piece of conversation.

Eric

2002-10-13 23:53:33

by Adam J. Richter

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Russell King wrote:
>On Sun, Oct 13, 2002 at 04:10:01PM -0700, Adam J. Richter wrote:
>> I have no objection to replacing or supplementing the reboot
>> notifier chain with a method in struct device_driver, but let's not
>> overload these methods with ambiguous semantics. I do not want to
>> call thirty functions that primarily return memory to various memory
>> allocators, mark a bunch of inodes as invalid, and otherwise arrange
>> things so that the kernel can smoothly continue to run user level
>> programs when, in fact, we just want to pull the reset line on the
>> computer.
>
>And what about setups where you can't pull the reset line from software.
>I have several machines here like that. And one of them needs software
>to talk to the cards to put them back into a sane state before rebooting.
>
>"rebooting" in this particular case is "turn MMU off, jump to location 0"

As I send in my response Eric Biederman,

| If you have a platform where, for example, somehow PCI devices
| are able to continue jabbering away after the computer has been reset,
| then that could probably be done more consistently for most drivers by
| having machine_restart on that platform walk the PCI bus and shut down
| everything (drivers that need to do something really special would
| still use the reboot notifier).
|
| I could even see calling device_shutdown from machine_restart
| on that platform only, [...]


>And I never said anything about needing to allocate memory to do this.
>I agree with you that suspending devices on reboot _is_ silly. However,
>that's not what I was proposing.

Then you've started a new thread of discussion, because
device_shutdown is defined in drivers/base/power.c as:

void device_shutdown(void)
{
device_suspend(4, SUSPEND_POWER_DOWN);
}


Perhaps device_suspend ought to be renamed device_power_down.

However, I'm not trying to quash what you want to discuss.
I'd be interested in hearing about clarifications and perhaps
extensions of the struct device_driver methods, which I think is what
you're getting at, perhaps here or on linux-hotplug. It's just that,
for this thread, I'm trying to focus on my patch that eliminates the
software suspend on reboot (pros and cons, alternatives to it, etc.).

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-10-13 23:59:00

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Russell King <[email protected]> writes:

> On Sun, Oct 13, 2002 at 04:10:01PM -0700, Adam J. Richter wrote:
> > I have no objection to replacing or supplementing the reboot
> > notifier chain with a method in struct device_driver, but let's not
> > overload these methods with ambiguous semantics. I do not want to
> > call thirty functions that primarily return memory to various memory
> > allocators, mark a bunch of inodes as invalid, and otherwise arrange
> > things so that the kernel can smoothly continue to run user level
> > programs when, in fact, we just want to pull the reset line on the
> > computer.
>
> And what about setups where you can't pull the reset line from software.
> I have several machines here like that. And one of them needs software
> to talk to the cards to put them back into a sane state before rebooting.
>
> "rebooting" in this particular case is "turn MMU off, jump to location 0"
And for x86 it is turn MMU off, jump to location 0xffff0.

> And I never said anything about needing to allocate memory to do this.
> I agree with you that suspending devices on reboot _is_ silly. However,
> that's not what I was proposing.
>

Documentation/driver-mode/devices.txt says:

> int (*remove) (struct device * dev);
>
> remove is called to dissociate a driver with a device. This may be
> called if a device is physically removed from the system, if the
> driver module is being unloaded, or during a reboot sequence.
>
> It is up to the driver to determine if the device is present or
> not. It should free any resources allocated specifically for the
> device; i.e. anything in the device's driver_data field.
>
> If the device is still present, it should quiesce the device and place
> it into a supported low-power state.

So there is a little bit that deals with a low power state.

At any rate until someone changes the kernel API again ->remove()
is the proper method to be calling in this case.

Eric

2002-10-14 00:02:45

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

"Adam J. Richter" <[email protected]> writes:

> Russell King wrote:
> >On Sun, Oct 13, 2002 at 04:10:01PM -0700, Adam J. Richter wrote:
> >> I have no objection to replacing or supplementing the reboot
> >> notifier chain with a method in struct device_driver, but let's not
> >> overload these methods with ambiguous semantics. I do not want to
> >> call thirty functions that primarily return memory to various memory
> >> allocators, mark a bunch of inodes as invalid, and otherwise arrange
> >> things so that the kernel can smoothly continue to run user level
> >> programs when, in fact, we just want to pull the reset line on the
> >> computer.
> >
> >And what about setups where you can't pull the reset line from software.
> >I have several machines here like that. And one of them needs software
> >to talk to the cards to put them back into a sane state before rebooting.
> >
> >"rebooting" in this particular case is "turn MMU off, jump to location 0"
>
> As I send in my response Eric Biederman,
>
> | If you have a platform where, for example, somehow PCI devices
> | are able to continue jabbering away after the computer has been reset,
> | then that could probably be done more consistently for most drivers by
> | having machine_restart on that platform walk the PCI bus and shut down
> | everything (drivers that need to do something really special would
> | still use the reboot notifier).
> |
> | I could even see calling device_shutdown from machine_restart
> | on that platform only, [...]
>
>
> >And I never said anything about needing to allocate memory to do this.
> >I agree with you that suspending devices on reboot _is_ silly. However,
> >that's not what I was proposing.
>
> Then you've started a new thread of discussion, because
> device_shutdown is defined in drivers/base/power.c as:
>
> void device_shutdown(void)
> {
> device_suspend(4, SUSPEND_POWER_DOWN);
> }
>
>
> Perhaps device_suspend ought to be renamed device_power_down.
>
> However, I'm not trying to quash what you want to discuss.
> I'd be interested in hearing about clarifications and perhaps
> extensions of the struct device_driver methods, which I think is what
> you're getting at, perhaps here or on linux-hotplug. It's just that,
> for this thread, I'm trying to focus on my patch that eliminates the
> software suspend on reboot (pros and cons, alternatives to it, etc.).

The 2.5.41 variant is below. The bug is reusing the old enumeration value
as was previously mentioned.

/**
* device_shutdown - queisce all the devices before reboot/shutdown
*
* Do depth first iteration over device tree, calling ->remove() for each
* device. This should ensure the devices are put into a sane state before
* we reboot the system.
*
* device_shutdown - call device_suspend with status set to shutdown, to
* cause all devices to remove themselves cleanly
*/
void device_shutdown(void)
{
struct list_head * node, * next;
struct device * prev = NULL;

printk(KERN_EMERG "Shutting down devices\n");

spin_lock(&device_lock);
list_for_each_safe(node,next,&global_device_list) {
struct device * dev = get_device_locked(to_dev(node));
if (dev) {
spin_unlock(&device_lock);
if (dev->driver && dev->driver->remove)
dev->driver->remove(dev);
if (prev)
put_device(prev);
prev = dev;
spin_lock(&device_lock);
}
}
spin_unlock(&device_lock);
if (prev)
put_device(prev);
}

Eric
k

2002-10-14 00:25:51

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Andries Brouwer <[email protected]> writes:

> On Sun, Oct 13, 2002 at 12:24:26PM -0700, Adam J. Richter wrote:
>
> > linux-2.5.42 had an annoying new behavior. When I would
> > try to do a warm reboot, it would spin down the hard drives, which
> > just made the reboot take longer and gave the impression that a
> > halt or poweroff was in progress.
>
> Yes. In my case worse than annoying:
> The drives spin down, but have not yet completed spindown when
> the machine is started again. LILO fails (prints a single 's'
> where I would have expected "uncompressing kernel" and dies).
> Pressing reset results in a strange garbled BIOS screen, and a hang.
> After a power cycle all is well again.
>
> So, my hardware is very unhappy with the new 2.5.42 behaviour.

>From ChangeLog-2.5.42

<[email protected]>
IDE: Add generic remove() method for drives; remove reboot notifier.

The remove() method is generic for all drives, and set in ide_driver_t::gen_driver.
The call simply forwards the call to ide_driver_t::standby().

This obviates the need for IDE reboot notifier. The core iterates over all present
devices in device_shutdown() and unregisters each one.

<[email protected]>
IDE: make ide_drive_remove() call driver's ->cleanup().

This was accidentally dropped before, but re-added now to completely mimic
behavior of the reboot notifier IDE used to have.

And if you look at the changes you will notice ->suspend used to be called
only on a halt, but now it is also called on a reboot.

Eric



2002-10-14 05:38:17

by Eric Blade

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Here is where I realize that I've forgotten to CC: the list on all the
traffic that I keep sending between Eric and Adam. D'oh.


On Sun, 2002-10-13 at 20:07, Eric W. Biederman wrote:
> >
> > However, I'm not trying to quash what you want to discuss.
> > I'd be interested in hearing about clarifications and perhaps
> > extensions of the struct device_driver methods, which I think is what
> > you're getting at, perhaps here or on linux-hotplug. It's just that,
> > for this thread, I'm trying to focus on my patch that eliminates the
> > software suspend on reboot (pros and cons, alternatives to it, etc.).
>
> The 2.5.41 variant is below. The bug is reusing the old enumeration value
> as was previously mentioned.
>

I tried to submit a fix to this, but the only response I've gotten back
is that it failed to apply. My original patch excluded the bit from
device.h that added a new state to the enumeration, and when it got into
the tree, it got into the tree using the current states that were
available. My bad.

Eric has already indicated earlier, that Adam's issue is, however, not
with the changes to drivers/base/power.c but to the changes to the IDE
driver.

- Eric



2002-10-14 15:19:40

by Adam J. Richter

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Eric W. Biederman wrote:
>Russell King <[email protected]> writes:

>> "rebooting" in this particular case is "turn MMU off, jump to location 0"
>And for x86 it is turn MMU off, jump to location 0xffff0.


The default reboot method of linux-2.5.42 on x86 is to have the
keyboard controller do a reset. If you want to reboot by jumping to ffff00,
you have to pass "reboot=b" unless you are running on one of three specific
models of computers: the Dell PowerEdge 300, 1300 or 2400
(linux-2.5.42/arch/i386/kernel/dmi-scan.c line 522). Here is the
relevant excerpt from machine_restart in linux-2.5.42/arch/i386/reboot.c:

if(!reboot_thru_bios) {
/* rebooting needs to touch the page at absolute addr 0 */
*((unsigned short *)__va(0x472)) = reboot_mode;
for (;;) {
int i;
for (i=0; i<100; i++) {
kb_wait();
udelay(50);
outb(0xfe,0x64); /* pulse reset low */
udelay(50);
}
/* That didn't work - force a triple fault.. */
__asm__ __volatile__("lidt %0": :"m" (no_idt));
__asm__ __volatile__("int3");
}
}

In another message, you wrote:
>Exactly an in spec, PC does not need to ground RST# on reboot.

By "as in spec", were you referring to my quotation from _PCI
System Architecture, 4th Ed._ by Shanley and Anderson or are saying
that there is a specification that supports your statement that "a PC
does not need to ground RST# on reboot." If the latter, I would
appreciate it if you would identify that specification and the
appropriate section with it for verification.

Elsewhere you wrote:
>Additionally it was decided quite a
>while ago that calling device->remove() was the correct way to
>accomplish this.

Do you remember where this "was decided?" If it was on an
archived mailing list, do you remember approximately when and what the
subject line might be? I'd like to look at this discussion both to
verify you statement and to avoid repetition.

In another message you wrote:
>machine_restart returns control to the BIOS.
>It works this way on both x86 and alpha. And the BIOS does
>not always toggle the RESET line on the machine. The frequent case
>of devices not working in Linux after rebooting from windows, and
>visa versa is evidence of this. On alpha the SRM doesn't even
>pretend to reset the machine.

Here you are talk first about doing a warm reboot from Windows
to Linux and the device does not work. That sounds like a quirky
device. Also, you have failed to show that this is a device where the
Linux driver's ->remove() function solves the problem and that there
are so many of these devices that (where the remove function *does*
solve the problem and there is no reboot notifier) that it would be
impractical change those drivers.

If the problem is that the warm reboot procedure failed to
"stop any ongoing DMA, etc." as you put it, then failure mode that you
would more typically expect would be memory corruption, manifesting
itself as random of other programs, not getting through the reboot
process half of the time, etc.

The underlying trade-off is that if I want to reboot fast, I
should not have to call a bunch of hotplug routines that have to do a
lot of book keeping because they have to assume that the computer may
continue to run user level programs indefinitely after they return
(things like deallocating memory, unregistering devfs inodes, etc.).
On the other hand, if I want to halt, I really do want the disks to be
spun down even if my motherboard does not do a master power-off from
software. Doing that requires that the power management request that
reboot makes differ from the one that halt makes. So they could
not both be just "device_shtudown();".

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-10-14 15:24:12

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Eric Blade <[email protected]> writes:

> Here is where I realize that I've forgotten to CC: the list on all the
> traffic that I keep sending between Eric and Adam. D'oh.
>
>
> On Sun, 2002-10-13 at 20:07, Eric W. Biederman wrote:
> > >
> > > However, I'm not trying to quash what you want to discuss.
> > > I'd be interested in hearing about clarifications and perhaps
> > > extensions of the struct device_driver methods, which I think is what
> > > you're getting at, perhaps here or on linux-hotplug. It's just that,
> > > for this thread, I'm trying to focus on my patch that eliminates the
> > > software suspend on reboot (pros and cons, alternatives to it, etc.).
> >
> > The 2.5.41 variant is below. The bug is reusing the old enumeration value
> > as was previously mentioned.
> >
>
> I tried to submit a fix to this, but the only response I've gotten back
> is that it failed to apply. My original patch excluded the bit from
> device.h that added a new state to the enumeration, and when it got into
> the tree, it got into the tree using the current states that were
> available. My bad.
>
> Eric has already indicated earlier, that Adam's issue is, however, not
> with the changes to drivers/base/power.c but to the changes to the IDE
> driver.

Correct. But when people try to use power management they will see
the bug. SUSPEND_POWER_DOWN (a request for the driver to remove power
from the device) is a very different state from SUSPEND_DETACH (a
request to disassociate the driver from the device). I am not fully
convinced the two cases should be merged.

SUSPEND_POWER_DOWN as defined should actually cause the behavior Adam
was seeing. Which made it doubly interesting.

Will you submit a patch detangling this mess? I believe Adams
only problem was that it did not obviously fix his problem.

Eric



2002-10-14 16:40:32

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

"Adam J. Richter" <[email protected]> writes:

> Eric W. Biederman wrote:
> >Russell King <[email protected]> writes:
>
> >> "rebooting" in this particular case is "turn MMU off, jump to location 0"
> >And for x86 it is turn MMU off, jump to location 0xffff0.
>
>
> The default reboot method of linux-2.5.42 on x86 is to have the
> keyboard controller do a reset.

Resetting the cpu != resetting the system. And the keyboard controller
only does a cpu level reset. Which is basically a convoluted way to jump
to: 0xfffffff0.

> If you want to reboot by jumping to ffff00,
> you have to pass "reboot=b" unless you are running on one of three specific
> models of computers: the Dell PowerEdge 300, 1300 or 2400
> (linux-2.5.42/arch/i386/kernel/dmi-scan.c line 522). Here is the
> relevant excerpt from machine_restart in linux-2.5.42/arch/i386/reboot.c:
>
> if(!reboot_thru_bios) {
> /* rebooting needs to touch the page at absolute addr 0 */
> *((unsigned short *)__va(0x472)) = reboot_mode;
> for (;;) {
> int i;
> for (i=0; i<100; i++) {
> kb_wait();
> udelay(50);
> outb(0xfe,0x64); /* pulse reset low */
> udelay(50);
> }
> /* That didn't work - force a triple fault.. */
> __asm__ __volatile__("lidt %0": :"m" (no_idt));
> __asm__ __volatile__("int3");
> }
> }

Please note the write to: 0x40:0x72. If this was a full machine reset
this would not make sense as memory would need to be reinitialized
making it unsafe to keep values in ram.

> In another message, you wrote:
> >Exactly an in spec, PC does not need to ground RST# on reboot.
>
> By "as in spec", were you referring to my quotation from _PCI
> System Architecture, 4th Ed._ by Shanley and Anderson or are saying
> that there is a specification that supports your statement that "a PC
> does not need to ground RST# on reboot." If the latter, I would
> appreciate it if you would identify that specification and the
> appropriate section with it for verification.

I'd love to but mostly I was referring to the de facto standard for
pc architecture. Which I don't know if anyone ever wrote down, all
in one place.

> Elsewhere you wrote:
> >Additionally it was decided quite a
> >while ago that calling device->remove() was the correct way to
> >accomplish this.
>
> Do you remember where this "was decided?" If it was on an
> archived mailing list, do you remember approximately when and what the
> subject line might be? I'd like to look at this discussion both to
> verify you statement and to avoid repetition.

No. I was not in on that conversation. However given the fact it
is all over the documentation. And has been in the kernel for quite
a while now. I tracked it as early as 2.5.30.

This is both device_shutdown in the reboot path,
and device_shutdown calling ->remove()

> In another message you wrote:
> >machine_restart returns control to the BIOS.
> >It works this way on both x86 and alpha. And the BIOS does
> >not always toggle the RESET line on the machine. The frequent case
> >of devices not working in Linux after rebooting from windows, and
> >visa versa is evidence of this. On alpha the SRM doesn't even
> >pretend to reset the machine.
>
> Here you are talk first about doing a warm reboot from Windows
> to Linux and the device does not work. That sounds like a quirky
> device.

No it sounds like a driver that either can or cannot handle devices
in an arbitrary state.

> Also, you have failed to show that this is a device where the
> Linux driver's ->remove() function solves the problem and that there
> are so many of these devices that (where the remove function *does*
> solve the problem and there is no reboot notifier) that it would be
> impractical change those drivers.

I do not feel I need to defend the status quo. It is you who are proposing
a change that needs to support why using ->remove() is a bad way to go.

I will just say that shutting devices in a depth first manner is a lot safer,
and more reliable than a random walk, you will get with a reboot notifier.

> If the problem is that the warm reboot procedure failed to
> "stop any ongoing DMA, etc." as you put it, then failure mode that you
> would more typically expect would be memory corruption, manifesting
> itself as random of other programs, not getting through the reboot
> process half of the time, etc.

You would only see it on the boot up side. Not placing devices in
a consistent state so another os can talk to them is significant as well.

> The underlying trade-off is that if I want to reboot fast, I
> should not have to call a bunch of hotplug routines that have to do a
> lot of book keeping because they have to assume that the computer may
> continue to run user level programs indefinitely after they return
> (things like deallocating memory, unregistering devfs inodes, etc.).

I would be very surprised if this showed up in practice. If Russell King
makes the argument about excess book keeping slowing things down. I
will buy it because he runs on some very slow machines. Personally I
doubt anything will be user noticeable.

> On the other hand, if I want to halt, I really do want the disks to be
> spun down even if my motherboard does not do a master power-off from
> software. Doing that requires that the power management request that
> reboot makes differ from the one that halt makes. So they could
> not both be just "device_shtudown();".

So you want to send device_suspend(?, SUSPEND_POWER_DOWN) on a halt.
Or do a software shutdown thing.

Whoever mapped the SUSPEND_SHUT_DOWN to SUSPEND_POWER_DOWN very much
had it wrong when that code was merged with 2.5.42.

Mostly it looks to me that you have a problem with IDE driver spinning
down disks on reboot. That change also is new in 2.5.41.

Eric

2002-10-14 17:40:52

by Richard B. Johnson

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

On 14 Oct 2002, Eric W. Biederman wrote:

> "Adam J. Richter" <[email protected]> writes:
>
> > Eric W. Biederman wrote:
> > >Russell King <[email protected]> writes:
> >
> > >> "rebooting" in this particular case is "turn MMU off, jump to location 0"
> > >And for x86 it is turn MMU off, jump to location 0xffff0.
> >
> >
> > The default reboot method of linux-2.5.42 on x86 is to have the
> > keyboard controller do a reset.
>
> Resetting the cpu != resetting the system. And the keyboard controller
> only does a cpu level reset. Which is basically a convoluted way to jump
> to: 0xfffffff0.
>

If you really do get to 0xfffffff0, in real mode, you reset the CPU and
the BIOS will perform a complete reinitialization of everything unless
data at absolute address 0x0472 tells the BIOS not to. If the entry
is 0x1234, the BIOS will bypass the memory test, but this does not
guarantee that memory contents are saved. It just speeds up the boot.
If the entry is 0x4321, the BIOS will not initialize memory nor its
controller. RAM contents will be saved.

This is all documented in "System BIOS for IBM PC/XT/AT Computers
and Compatibles", Phoenix Technologies, Ltd. Addison-Wesley Publishing
Company, Inc. ISBN 0-201-51806-6. This is the "bible" of the BIOS
industry because Phoenix was the first to document what IBM BIOS
did, and then, using "clean-room" techniques, write a BIOS from that
documentation.
[SNIPPED...]

>
> Please note the write to: 0x40:0x72. If this was a full machine reset
> this would not make sense as memory would need to be reinitialized
> making it unsafe to keep values in ram.
>

This effectively hits the reset button. This will reset anything that
is connected to the reset line which includes the memory controller.
You don't want to do this if you intend to preserve RAM contents.

[SNIPPED...]


>
> I'd love to but mostly I was referring to the de facto standard for
> pc architecture. Which I don't know if anyone ever wrote down, all
> in one place.

See above. It's documented well enough so you can "roll your own"
from the specification alone.

[SNIPPED...]
>
> No it sounds like a driver that either can or cannot handle devices
> in an arbitrary state.
>

Some drivers don't work after a "warm-boot" because the I/O resources
shown in the PCI initialization have changed. It's only after they
get hit with a reset that they put their default resource type and
length back into their registers. I am told that some lap-tops have
semi-documented methods of sending a hardware reset to the PCI bridge(s)
only. This is needed to get them into a restartable state.

[SNIPPED...]
>
> I will just say that shutting devices in a depth first manner is a lot safer,
> and more reliable than a random walk, you will get with a reboot notifier.
>

Shutting down devices seldom (perhaps never) restores the reset-entries
of the PCI devices. If this was done, it might make warm-boot restarts
more reliable. Emphasis on the 'might'.


> > If the problem is that the warm reboot procedure failed to
> > "stop any ongoing DMA, etc." as you put it, then failure mode that you
> > would more typically expect would be memory corruption, manifesting
> > itself as random of other programs, not getting through the reboot
> > process half of the time, etc.
>

A processor reset will get the processor onto the bus even if there is
an ongoing DMA operation. Since the first of many instructions are
fetched from ROM, it is quite likely that any DMA activity would have
stopped before the ROM is shadowed by the BIOS. I don't see "ongoing"
DMA as being a problem, which you can verify by forcing a reset in
the FDC code (easiest to do) while waiting for read DMA to complete.
FDC DMA is slow, so you can catch it 100% of the time.


> You would only see it on the boot up side. Not placing devices in
> a consistent state so another os can talk to them is significant as well.
>

The BIOS is the only thing that can really put the devices into
a consistant state because, in the case of PCI initialization, if
the device requests no resources upon startup, it's not going to
get any. The only "known" way to get the devices to write their
true resource requests into their registers is a hardware reset.

[SNIPPED...]



Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.

2002-10-14 18:36:08

by Adam J. Richter

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Eric W. Biederman wrote:
>"Adam J. Richter" <[email protected]> writes:

>> Eric W. Biederman wrote:
>> >Russell King <[email protected]> writes:
>>
>> >> "rebooting" in this particular case is "turn MMU off, jump to location 0"
>> >And for x86 it is turn MMU off, jump to location 0xffff0.
>>
>> The default reboot method of linux-2.5.42 on x86 is to have the
>> keyboard controller do a reset.

>Resetting the cpu != resetting the system. And the keyboard controller
>only does a cpu level reset. Which is basically a convoluted way to jump
>to: 0xfffffff0.

I would be quite surprised if that reset was not wired
to eventually ground RST# on the PCI bus.


[...]
>> if(!reboot_thru_bios) {
>> /* rebooting needs to touch the page at absolute addr 0 */
>> *((unsigned short *)__va(0x472)) = reboot_mode;
>> for (;;) {
>> int i;
>> for (i=0; i<100; i++) {
>> kb_wait();
>> udelay(50);
>> outb(0xfe,0x64); /* pulse reset low */
>> udelay(50);
>> }
>> /* That didn't work - force a triple fault.. */
>> __asm__ __volatile__("lidt %0": :"m" (no_idt));
>> __asm__ __volatile__("int3");
>> }
>> }

>Please note the write to: 0x40:0x72. If this was a full machine reset
>this would not make sense as memory would need to be reinitialized
>making it unsafe to keep values in ram.

The issue is not how you define the term "full machine reset."
The issue is whether the PCI bus is eventually reset as a result of
this procedure before by BIOS Power On Self Test starts executing at
0xffff00. If it is, then most of the device cleanup is unnecessary.
Even if it does not, it would make a lot more sense to just have
machine_reset assert RST# after those few devices that really need
cleanup are done.

>> In another message, you wrote:
>> >Exactly an in spec, PC does not need to ground RST# on reboot.
>>
>> By "as in spec", were you referring to my quotation from _PCI
>> System Architecture, 4th Ed._ by Shanley and Anderson or are saying
>> that there is a specification that supports your statement that "a PC
>> does not need to ground RST# on reboot." If the latter, I would
>> appreciate it if you would identify that specification and the
>> appropriate section with it for verification.

>I'd love to but mostly I was referring to the de facto standard for
>pc architecture. Which I don't know if anyone ever wrote down, all
>in one place.

Please do not say "as in spec" if you are not actually
referrring to a specification document (electronic form is OK). It is
extremely misleading as to how reliable your claims are. Instead,
please indicate what you remember about where you learned this
information.


>> Elsewhere you wrote:
>> >Additionally it was decided quite a
>> >while ago that calling device->remove() was the correct way to
>> >accomplish this.
>>
>> Do you remember where this "was decided?" If it was on an
>> archived mailing list, do you remember approximately when and what the
>> subject line might be? I'd like to look at this discussion both to
>> verify you statement and to avoid repetition.

>No. I was not in on that conversation. However given the fact it
>is all over the documentation. And has been in the kernel for quite
>a while now. I tracked it as early as 2.5.30.

Making stuff up just to win an argument is unlikely to produce
the best Linux kernel. If you don't remember things, if you're not
sure of something, please just admit it instead of wasting an
iteration of email by faking authoratitive references.


>This is both device_shutdown in the reboot path,
>and device_shutdown calling ->remove()
>
>> In another message you wrote:
>> >machine_restart returns control to the BIOS.
>> >It works this way on both x86 and alpha. And the BIOS does
>> >not always toggle the RESET line on the machine. The frequent case
>> >of devices not working in Linux after rebooting from windows, and
>> >visa versa is evidence of this. On alpha the SRM doesn't even
>> >pretend to reset the machine.
>>
>> Here you are talk first about doing a warm reboot from Windows
>> to Linux and the device does not work. That sounds like a quirky
>> device.

>No it sounds like a driver that either can or cannot handle devices
>in an arbitrary state.

>> Also, you have failed to show that this is a device where the
>> Linux driver's ->remove() function solves the problem and that there
>> are so many of these devices that (where the remove function *does*
>> solve the problem and there is no reboot notifier) that it would be
>> impractical change those drivers.

>I do not feel I need to defend the status quo. It is you who are proposing
>a change that needs to support why using ->remove() is a bad way to go.

sys_reboot does not call device_shutdown in 2.4.19 and it was
added to 2.5 in 2.5.8, so calling it status quo to the point where you
think there is no need to defend it is pretty iffy. If most Linux
systems are running 2.4 these days, then it appears that most are
running without the ->remove() call just fine.

Also, it's not just a question of whether you should feel a
need to defend the status quo. The reality is that you made some
factual claims ("frequent case of devices not working in Linux after
rebooting from windows") and you seem to be unable to support,
espeically with regard to the question of how many of these devices
actually started to work correctly with ->remove() and which do not
work without it (i.e., do not work in 2.4.19 but do work in 2.5.8+).
I wouldn't have a problem with it if you would just admit that you
don't remember or that you may have accidentally made this up or that
you were repeating something you heard elsewhere but never checked,
and we could weight that information accordingly.

>I will just say that shutting devices in a depth first manner is a lot safer,
>and more reliable than a random walk, you will get with a reboot notifier.

I already said that I don't have a problem with having a
reboot notifier call in struct device for those devices that need some
sort of specific shutdown code that cannot be handled by resetting
their parent bus or something similar.

>> If the problem is that the warm reboot procedure failed to
>> "stop any ongoing DMA, etc." as you put it, then failure mode that you
>> would more typically expect would be memory corruption, manifesting
>> itself as random of other programs, not getting through the reboot
>> process half of the time, etc.

>You would only see it on the boot up side. Not placing devices in
>a consistent state so another os can talk to them is significant as well.

I quite don't understand what you're saying here. Are you
saying people usually cannot boot from linux-2.4.19 into Windows?

>> The underlying trade-off is that if I want to reboot fast, I
>> should not have to call a bunch of hotplug routines that have to do a
>> lot of book keeping because they have to assume that the computer may
>> continue to run user level programs indefinitely after they return
>> (things like deallocating memory, unregistering devfs inodes, etc.).

>I would be very surprised if this showed up in practice. If Russell King
>makes the argument about excess book keeping slowing things down. I
>will buy it because he runs on some very slow machines. Personally I
>doubt anything will be user noticeable.

When you want to optimize for resources like speed and space,
one is generally happy if an optimizations by a percent or two at a
time. These calls can do IO. It could add up. Also, for code
compiled into the kernel (or non-unloadable modules if anyone
implements them, I suppose), it means that otherwise could be compiled
out with __exit, so you have a bigger kernel foot print. Finally,
from a reliability standpoint, in the case of rebooting because of a
kernel bug, it means that the reboot process wants to execute code and
walk through data structures in many more drivers.


>> On the other hand, if I want to halt, I really do want the disks to be
>> spun down even if my motherboard does not do a master power-off from
>> software. Doing that requires that the power management request that
>> reboot makes differ from the one that halt makes. So they could
>> not both be just "device_shtudown();".

>So you want to send device_suspend(?, SUSPEND_POWER_DOWN) on a halt.
>Or do a software shutdown thing.

>Whoever mapped the SUSPEND_SHUT_DOWN to SUSPEND_POWER_DOWN very much
>had it wrong when that code was merged with 2.5.42.

>Mostly it looks to me that you have a problem with IDE driver spinning
>down disks on reboot. That change also is new in 2.5.41.

You do not understand. The fundamental problem is that
sys_reboot in linux-2.5.42/kernel/sys.c makes exactly the same power
management call for a reboot and for a halt (device_shutdown()). I
want my IDE, SCSI, USB, and FireWire hard disks to spin down if I do a
halt. I do not want them to do that when I reboot. I also do not
want my reboot to wait for an interrupt from a sound card because
remove() needs to determine if it is being called because the device
had just been unplugged. It is not just about IDE disk drives.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-10-14 19:23:57

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices


So in summary.
- Asserting the system level reset line is the most reliable way to
reboot.

Note: This is not always true, I have seen machines with hardware
bugs where asserting the reset line destabilizes things. And they do
not come back reliably.

- The kernel does not assert this line but politely requests the x86
BIOS to do so.

- There is at least some BIOS code that runs before the reset line is
toggled. And the reset line may not actually be toggled.

If we want reliable reboots what can the Linux kernel do.
1) Include chipset+motherboard specific logic that toggles the reset
line directly.
2) Walk the driver tree and ask the drivers politely to place their
devices in a reasonable state.

The dangers of not doing this are:
- Network cards can DMA random packets into memory. And if the
network is quiet this can take a while.

- Disks can have multiple outstanding disk requests and may
DMA them into memory as well.

- Drivers may not be able to initialize a device in a random state.
I doubt you can do much with a floppy or ide device that is
transfer data via pio and the transaction is left half finished.

- On some platforms if we reassign device resources the firmware
may fail to track where the new resources are. This is arguably
a firmware bug.

Additionally there is one extra nasty case. On some platforms the
firmware cannot reliably reboot in a reasonable amount of time. On
those platforms we need some kind of soft reboot that does the
Linux booting linux thing to get a reliable reboot in a reasonable
amount of time.

I am all for reboots being fast. But until I observe that walking
the device tree and placing devices in a quiescent state is a major
time waster I do not see that we should skip it. And given the recent
problems with the IDE driver I think it is good that we are running
this code on every reboot so we can see the drivers that take way to
much time placing their devices in a quiescent state.

With respect to the IDE code if we have to spin down the disk to get
a reliable sync we should do this whenever the sync command is issued.
Or we could have data loss from an unplanned power outage. When I ask
the device driver to let go of the device, turning it off sounds quite
excessive to me.

"Richard B. Johnson" <[email protected]> writes:

> Some drivers don't work after a "warm-boot" because the I/O resources
> shown in the PCI initialization have changed. It's only after they
> get hit with a reset that they put their default resource type and
> length back into their registers. I am told that some lap-tops have
> semi-documented methods of sending a hardware reset to the PCI bridge(s)
> only. This is needed to get them into a restartable state.

I have only seen this on Alpha, with the SRM. Or with drivers that
think they are dealing with ISA devices when they are not.

In a standard pci base address register, you can always derive
the resource type, and length. In fact the kernel does this when
it loads. You do not need a pci reset to be able to get the
pci base address, and resource size out of a pci bar.

Actually I can confirm that pci devices do not place anything special
in their bars after a pci reset. Since a random bar is safe they save
hardware circuits not setting it to anything during a reset. I have
actually had this byte me, in conjunction with detecting read-only BARs.

> A processor reset will get the processor onto the bus even if there is
> an ongoing DMA operation. Since the first of many instructions are
> fetched from ROM, it is quite likely that any DMA activity would have
> stopped before the ROM is shadowed by the BIOS. I don't see "ongoing"
> DMA as being a problem, which you can verify by forcing a reset in
> the FDC code (easiest to do) while waiting for read DMA to complete.
> FDC DMA is slow, so you can catch it 100% of the time.

I have seen it as a problem in real life. With a NIC several seconds
after a reboot. The network was quiet and than an arp request came
in. So baring a reset of the device in question this can happen.


So in summary:
I am firmly convinced that:
- device_shutdown should walk the device tree asking drivers to place
their devices in a quiescent state. ->remove() is likely the right
method for this, and it what the device-mode people have selected.
I personally don't care so long as there is an appropriate method.

- device_shutdown should be fast.

- The kernel has more general hardware knowledge than the BIOS so
it is more likely to get the case of a soft reboot (no reset line
asserted) correct.

- The x86 BIOS does not always toggle the reset line on the
motherboard, and device_shutdown should have a negligible cost.
So we should always calling device_shutdown, to so it does not
regress from being under used.

Eric

2002-10-14 20:01:27

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices


For why 2.5.xx does as it does talk to the people who
introduced are working on the driver model.

Docs are in Documentation/driver-model.
I am satisfied that there is some uniform policy in place
that device drivers can uniformly tap into.

As for just using the reboot notifier. There are other times
the same code should be called, like in the rmmod path so I don't
think it should have a special case just for reboot.

For myself I am happy that someone introduced
device_shutdown and it has reasonable semantics.

As for halting the system, we currently have two cases:


case LINUX_REBOOT_CMD_HALT:
notifier_call_chain(&reboot_notifier_list, SYS_HALT, NULL);
system_running = 0;
device_shutdown();
printk(KERN_EMERG "System halted.\n");
machine_halt();
do_exit(0);
break;

case LINUX_REBOOT_CMD_POWER_OFF:
notifier_call_chain(&reboot_notifier_list, SYS_POWER_OFF, NULL);
system_running = 0;
device_shutdown();
printk(KERN_EMERG "Power down.\n");
machine_power_off();
do_exit(0);
break;

And quite possibly we should have a third:
case LINUX_REBOOT_CMD_SOFTWARE_POWER_OFF:
notifier_call_chain(&reboot_notifier_list, SYS_POWER_OFF, NULL);
system_running = 0;
suspend_device(SUSPEND_POWER_OFF);
device_shutdown();
printk(KERN_EMERG "Power down.\n");
machine_halt();
do_exit(0);
break;

Yes there is a bug in 2.5.42 that maps SUSPEND_POWER_OFF to
SUSPEND_SHUT_DOWN, but with that fixed you get a case that does what
you want trivially.

More capable halt functions have been in the kernel. Please
let's enhance the halt path instead of denuding the reboot path.

> You do not understand. The fundamental problem is that
> sys_reboot in linux-2.5.42/kernel/sys.c makes exactly the same power
> management call for a reboot and for a halt (device_shutdown()). I
> want my IDE, SCSI, USB, and FireWire hard disks to spin down if I do a
> halt. I do not want them to do that when I reboot. I also do not
> want my reboot to wait for an interrupt from a sound card because
> remove() needs to determine if it is being called because the device
> had just been unplugged. It is not just about IDE disk drives.

If detecting hot-plug removal is non-trivial that makes the
work remove does not suitable for a reboot. However all of the hot-plug
layers I am aware of give an interrupt when a device is removed so
that should be trivially fixable by simply passing that information.
Though I would expect a simple pci_read_config_byte to be just as good.

Do you seriously have a hot-plug sound card, that waits for an
interrupt to be certain the card is plugged in during remove?

If there is non-trivial work to detect if a card is present it
probably makes sense to factor remove into
->quiet() and ->remove()
Where quiet would put the device into a quiescent state, and
remove would simply clean up the driver state.

But for the most part my impression is that we need to get devices
drivers behaving properly in the 2.5.x And that the basic model
is o.k. It just needs some pounding on the rough spots.

Eric

2002-10-14 20:09:36

by Richard B. Johnson

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

On 14 Oct 2002, Eric W. Biederman wrote:

>
> So in summary.
> - Asserting the system level reset line is the most reliable way to
> reboot.
[SNIPPED...to save space]

I think you can get all the PCI Bus-Masters shut off simply
by writing PCI device 0 (The bridge itself) command register
to 0.

This is what I do in my BIOS (Intel dest<--source):

;-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
;
; The PCI bus may be on from a previous boot or from a failure to
; reset. This causes a lot of problems that took me weeks to find.
; Make sure the damn thing is dead, dead, dead before we continue.
; It is not sufficient to just clear SYSARBMEMB as the documentation
; shows. PCI devices will "jabber", corrupting data if the host
; bridge is not turned off.
;
MOV DX,PCICFGADR ; PCI configuration address
MOV EAX,80000004H ; Enable, index 1 (command)
OUT DX,EAX ; Set index register
MOV DX,PCICFGDATA ; Data address
IN EAX,DX ; Get status/command
AND EAX,0FFFF0000H ; Really turn it OFF
OUT DX,EAX ; This should do it.
;
MOV WORD PTR DS:[SYSARBMEMB],00H ; Now disable PCI access
MOV DX,PCICFGADR ; PCI configuration address
MOV EAX,80000000H ; Enable, index 0
OUT DX,EAX ; Set index register
MOV DX,PCICFGDATA ; Data address
IN EAX,DX ; Get, throw away



Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.

2002-10-15 02:47:58

by Adam J. Richter

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Eric W. Biederman writes:
>> You do not understand. The fundamental problem is that
>> sys_reboot in linux-2.5.42/kernel/sys.c makes exactly the same power
>> management call for a reboot and for a halt (device_shutdown()). I
>> want my IDE, SCSI, USB, and FireWire hard disks to spin down if I do a
>> halt. I do not want them to do that when I reboot. I also do not
>> want my reboot to wait for an interrupt from a sound card because
>> remove() needs to determine if it is being called because the device
>> had just been unplugged. It is not just about IDE disk drives.
[...]
>Do you seriously have a hot-plug sound card, that waits for an
>interrupt to be certain the card is plugged in during remove?

Although I was being hypothetical, come to think of it, I do
some USB speakers that I have not been using, and detecting their
removal is done by the host contoller polling the hub for an
"interrupt", typically once per millisecond.

>If there is non-trivial work to detect if a card is present it
>probably makes sense to factor remove into
>->quiet() and ->remove()
>Where quiet would put the device into a quiescent state, and
>remove would simply clean up the driver state.

Splitting into ->quiet() and ->removed() would be helpful
in any case, where removed() would normally not touch the hardware,
since it is quite possible the device has already been removed,
since the callers of these routines generally know if they are
calling because the device has been removed or because they want
just want to turn it off, while ->remove() currently has to guess,
which not only wastes time but also can be difficult to do safely
when you don't know if the device that you're talking to is even
present anymore.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America

2002-10-15 04:33:56

by Eric Blade

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

On Mon, 2002-10-14 at 11:28, Eric W. Biederman wrote:
>
> SUSPEND_POWER_DOWN as defined should actually cause the behavior Adam
> was seeing. Which made it doubly interesting.
>
> Will you submit a patch detangling this mess? I believe Adams
> only problem was that it did not obviously fix his problem.
>

Submitted earlier last night.

subj: Patch: linux-2.5.42/drivers/base/power.c - add state for
SHUT_DOWN

Cheers,
- Eric




2002-10-15 04:55:27

by Eric Blade

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

On Mon, 2002-10-14 at 16:05, Eric W. Biederman wrote:
> For myself I am happy that someone introduced
> device_shutdown and it has reasonable semantics.
>
> As for halting the system, we currently have two cases:
>
>
> case LINUX_REBOOT_CMD_HALT:
> notifier_call_chain(&reboot_notifier_list, SYS_HALT, NULL);
> system_running = 0;
> device_shutdown();
> printk(KERN_EMERG "System halted.\n");
> machine_halt();
> do_exit(0);
> break;
>
> case LINUX_REBOOT_CMD_POWER_OFF:
> notifier_call_chain(&reboot_notifier_list, SYS_POWER_OFF, NULL);
> system_running = 0;
> device_shutdown();
> printk(KERN_EMERG "Power down.\n");
> machine_power_off();
> do_exit(0);
> break;
>

i see four occurences in my 2.5.42: LINUX_REBOOT_CMD_RESTART,
LINUX_REBOOT_CMD_HALT, LINUX_REBOOT_CMD_POWER_OFF, and
LINUX_REBOOT_CMD_RESTART2

>
> But for the most part my impression is that we need to get devices
> drivers behaving properly in the 2.5.x And that the basic model
> is o.k. It just needs some pounding on the rough spots.

I agree on this point :)




2002-10-15 16:26:40

by Patrick Mochel

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices


> device_shutdown and device_suspend are for power management.
> It is important to turn the device off as soon as possible if a power
> management routine has told you that the device is not going to be
> used any more.

device_suspend() is for power management. device_shutdown() is for
quiescing devices before a system reboot or power off.

It's true that the same function is called when the device is physically
removed from the system as when the system is shutting down, and that
might be kinda bad. If it gets to the point where it's really difficult to
deal with for drivers, we can create another callback: ->shutdown() for
struct device_driver.

> If you've identified a bunch of devices that need
> reboot_notifier, please list them so we can fix them rather than
> taxing the users with unnecessarily slow reboots or poor battery life
> (due to device not being shut down when they are no longer being used).

No, reboot notifiers are the completely wrong way to go, for one reason
alone: ordering. device_shutdown() does a depth-first walk of the tree to
shut down children devices before ancestors. You cannot guarantee that
with reboot notifiers.

Please don't try and convolute the code because you're worried about a few
microseconds. It's about correctness first; then we can worry about
micro-optimizing the hell out of it.

-pat

2002-10-15 16:56:25

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

"Adam J. Richter" <[email protected]> writes:

> Eric W. Biederman writes:

> >If there is non-trivial work to detect if a card is present it
> >probably makes sense to factor remove into
> >->quiet() and ->remove()
> >Where quiet would put the device into a quiescent state, and
> >remove would simply clean up the driver state.
>
> Splitting into ->quiet() and ->removed() would be helpful
> in any case, where removed() would normally not touch the hardware,
> since it is quite possible the device has already been removed,
> since the callers of these routines generally know if they are
> calling because the device has been removed or because they want
> just want to turn it off, while ->remove() currently has to guess,
> which not only wastes time but also can be difficult to do safely
> when you don't know if the device that you're talking to is even
> present anymore.

Except for the case when a device is physically swapped before ->remove()
which is really, really, nasty. But it is quiet unlikely anyone will
actually be that fast. Whatever talks to the hardware has to check to
see if it's device is present. But if usb is anything like PCI it
should be a very inexpensive check to see if a driver is present. And
writes to a non-existent device should be safe.

Splitting ->remove() into quiet() and ->removed() will be racy. So
unless there is shown to be a compelling reason to do the split I
don't want to go there. It appears easier to write correct code
with a single ->remove() method than two separate methods in the
hot-plug case.

Eric

2002-10-15 18:49:46

by Adam J. Richter

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Patrick Mochel writes:
>> If you've identified a bunch of devices that need
>> reboot_notifier, please list them so we can fix them rather than
>> taxing the users with unnecessarily slow reboots or poor battery life
>> (due to device not being shut down when they are no longer being used).

>No, reboot notifiers are the completely wrong way to go, for one reason
>alone: ordering. device_shutdown() does a depth-first walk of the tree to
>shut down children devices before ancestors. You cannot guarantee that
>with reboot notifiers.

The question is not whether reboot notifiers should also be
part of struct device for sequencing purposes. Nobody is arguing
against that.

The question is whether it is impractical to have a distinct
device->reboot_notifier() for those devices that truely need it so
that we can avoid calling ->remove() for most devices. To answer that,
I think you need list a large number of devices that match ALL THREE
of the following criteria:

| 1. They need to use reboot_notifier because the
| machine_restart() is not enough to reset them,
|
| 2. They do not currently use reboot_notifier,
|
| 3. They have a device->remove() that does what the
| reboot_notifier should do (shuts down ongoing DMA,
| because they have a some device that ignores a reset
| signal or something like that).

>Please don't try and convolute the code because you're worried about a few
>microseconds. It's about correctness first; then we can worry about
>micro-optimizing the hell out of it.

1. When something has gone wrong with a device driver, you generally
gather what information you can and then try a warm reboot,
especially when one is working remotely. Data structures in
device drivers can often corrupted under these circumstances.
So it is important that warm reboots reboot the system as
simply as they can with reliability. That means only executing
the special shutdown code that really needs to be executed.

2. For small platforms, I can about keeping the __devexit code out
of the kernel footprint. I don't want Linux to lose the
competition small embedded devices (grated such decisions do not
pivot on __devexit alone, but a cultrue of wastefulness leads
creates bloat in steps such like this one).

3. I do care that reboot goes fast. Optimizations for speed and
space generally come about by making lots of little clean ups.
The standard power on self-test takes ages on most PCs, but
it doesn't have to be that way, and it isn't necessarily that
way on other platforms. I'd linux not to be an impedement
to having systems where you don't even see the screen go black
when you reboot.

Also, while your post is a very mild example, circular
arguments that implicitly attempt to define a term like "correctness"
waste everyone's time at best. At worst, such arguments lead to
software that unnecessarily biggers, less functional, slower to run,
or slower to be developed, harder to maintain or inferior by some
other real metric because someone made a trade-off against a real
benefit in favor of something that sounded beneficial but was actally
just a a circular definition.

Instead, please show underlying benefits. Show me how calling
->remove() for every device to do a warm reboot will make the kernel
smaller or will make reboots go faster. If your change improves
reliability, then you should be able show real examples where reboots
work in 2.5.8+ and fail in 2.4, so that we might try to quantify the
probable trade-off between that scenario (where we still have reboot
notifiers available) and potentially running into problems by calling
clean-up code in each device driver. In comparison, I frequently get
into trouble with the IDE drivers by inserting and removing
CompactFlash cards (haven't tried in 2.4.42), so I really do not want
any more clean-up code called that is necessarily if the kernel is
confused. If you cannot show some benefit, it looks like calling all
those remove() routines will be a lose on speed, size and even
reliability.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-10-15 19:46:30

by Adam J. Richter

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Eric W. Biederman writes:

>> >If there is non-trivial work to detect if a card is present it
>> >probably makes sense to factor remove into
>> >->quiet() and ->remove()
>> >Where quiet would put the device into a quiescent state, and
>> >remove would simply clean up the driver state.
>>
>> Splitting into ->quiet() and ->removed() would be helpful
>> in any case, where removed() would normally not touch the hardware,
>> since it is quite possible the device has already been removed,
>> since the callers of these routines generally know if they are
>> calling because the device has been removed or because they want
>> just want to turn it off, while ->remove() currently has to guess,
>> which not only wastes time but also can be difficult to do safely
>> when you don't know if the device that you're talking to is even
>> present anymore.

>Except for the case when a device is physically swapped before ->remove()
>which is really, really, nasty. But it is quiet unlikely anyone will
>actually be that fast. Whatever talks to the hardware has to check to
>see if it's device is present. But if usb is anything like PCI it
>should be a very inexpensive check to see if a driver is present. And
>writes to a non-existent device should be safe.

I am not aware of any hotplug bus where removing a device and
inserting a new one leads to the new device being in a state where the
driver for the old device could accidentally talk to it without the
kernel actively doing something in the meantime like turning the
affected socket back on.

For example, for USB, the hub port will the new device will be
plugged in (and the computer will be notified of this by the hub) but
the hub port will not remain connected. The computer has to turn the
hub port back on. The whole USB initialization process is documented
extremely well in the USB 2.0 standards which you can download for
free at http://www.usb.org, and it is also documented in the MindShare
books (_Universal Serail Bus System Architecture_ by Don Anderson and
Dave Dzatko, both 1st and 2nd Edition), although it is just as clear in
the standards document.

For FireWire, both the removal and the insertion cause cause
all addresses of all devices to be reassigned, and the new device will
not have an address before that is complete, at least if I am
correclty reading _FireWire Sytem Architecure, 2nd Edition_ by Don
Anderson, as I haven't read the standard.

I know that PCMCIA can power down a socket. I assume that the
interrupt that detects card removal does this or otherwise disconnects
the socket when a card is removed and that the kernel does not turn
the socket back on until the remove routine for the card has
completed. [From the manaul page for cardctl and looking at the
CardServices interface described in _PCMCIA System Architecture_,
2nd Edition by Don Anderson.]

For CardBus, I assume that PCMCIA's protections apply, and I
suspect the PCI base address registers are defined to be clear at
insertion, so the device-specific IO ports and memory regions will not
be mapped. In addition, the kernel does book keeping on which IO
ports and memory regions are currently allocated with
request_region(), so I assume it would assign the newly plugged in
card's IO and memory elsewhere if even if it were, so it may even be
safe for the kernel to map the new card and start its initialization
before the clean up for the old card has completed.

For hotplug PCI, CompactPCI and PCI Mezzanine Cards, it seems
the user actually has to ask permission to insert and remove cards,
but, even so, a newly inserted card is initially electrically isolated
from the PCI bus, and the computer uses special hardware that allows
it to provide power and assert the PCI RST# signal for that card only
before deasserting RST# and then connecting it to the rest of the bus.
This clears the newly inserted card's PCI base address
registers. [_PCI System Architecture_, 4th edition by Tom Shanley and
Don Anderson, Chapter 22: Hot-Plug PCI, pages 464-465, 753-754. Alas,
the pcisig.org standards are propritary.] What I said about the
request_region() bookkeeping for CardBus applies here too.

Besides, you have not identified a safe way that a combined
->remove() function can detect such situations more reliably than
separate ->quiet() and ->removed(), which at least have the benefit of
knowing what the kernel currently thinks the situation is. So, you
really have no basis for saying "Splitting ->remove() into quiet() and
->removed() will be racy."

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-10-15 19:58:33

by Mikael Pettersson

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Patrick Mochel writes:
> Please don't try and convolute the code because you're worried about a few
> microseconds. It's about correctness first; then we can worry about
> micro-optimizing the hell out of it.

5 seconds is quite a bit more than "a few microseconds". That's
approximately how much longer it takes for my P4 to reboot, due
to 2.5.42's "oh lets spin down the disks on reboot" change.

It's even slower to reboot than to do a cold boot because on a cold
boot the disks start spinning up directly, but on a warm boot after
2.5.42 the disks don't start spinning up until the BIOS starts to
identify them and look for a bootable device.

2.5.42 is a PITA.

2002-10-16 12:10:05

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

"Adam J. Richter" <[email protected]> writes:

[snip details on a lot of hot plug code]

No major comment except that I know for compact pci you can just
walk up and unplug it. As we have several compact pci boards here
at work.

But thank you for the clarification that a only on badly designed
hardware a newly plugged in device will

> Besides, you have not identified a safe way that a combined
> ->remove() function can detect such situations more reliably than
> separate ->quiet() and ->removed(), which at least have the benefit of
> knowing what the kernel currently thinks the situation is. So, you
> really have no basis for saying "Splitting ->remove() into quiet() and
> ->removed() will be racy."

O.k. Let me attempt a clarification, of what I was thinking.
Currently pseudo code for remove does:

remove() {
if (device_present()) {
device_be_quiet()
}
device_free(device_strucutres);
}

The way I imagined the split up:
if (device->present()) {
device->be_quiet();
}
device->free_resources();

Doing device_be_quiet() without the device_present() check is racy,
because devices can be physically removed at arbitrary times. It
is unreasonable for the generic code to make the strong assumption
about being able to tell a device driver if the device is still
present. So any splitting of the device_present() check and the
device_be_quiet() code would have to be done, very carefully, and
would require bus specific code. And it would be complexity without a
real payoff.

So if things are split it can at most be:
device->be_quiet(); /* With the possibility the device has been removed */
device->free_resources();

I cannot imagine freeing data structures will noticeably slow a reboot
down, so unless we actually need to tell a device to be quiet for
other reasons besides driver removal, and machine reboot I do not see
a point in adding complexity by changing the interface. There may be
a valid argument in the suspend to swap case, but I will cross that
bridge when i come to it.

For my thinking on how this should be handled:

Except in some very select special instances, I do not know
of a single device where it is safe to assume the hardware is in a
sane state at any random given moment when we decide to reboot. And
in no common case I know of does linux immediately trigger a machine
level reset which would render this issue moot. Therefore asking all
of the device drivers to be quiet on a reboot sounds very reasonable.

Further the only way I know to make solid code is to put all of your
eggs in one basket, and just make certain it is a good basket.
Meaning the more often a function is run with the same requirements
the more likely it is to be correct. So running the device ->remove()
method on reboot and module remove will greatly enhance the chance the
method is both fast and correct, because it is run often enough that
people will complain if it is not.

As for the semantic change in ->remove() from 2.4 it feels to me like
more of a clarification than a real change, in particular ->remove() is
the opposite of ->probe(), and ->probe() does both allocation and
device state initialization, and any code that works correctly with
the 2.5 clarification should also work in 2.4. Having to check to
see if the device is already present has to be done for correctness,
and that was the only thing that felt to me like the ->remove()
routine was being really overloaded.

For 2.5 calling remove on reboot may not fix any issues, but it is
certainly a correct thing to do. And even if it does not fix issues
today, it allows bugs to be fixed, as the come up. And it allows
the code to be tested by just doing a modular build and inserting
and removing the module.

Eric

2002-10-16 20:09:16

by Pavel Machek

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Hi!
> >Resetting the cpu != resetting the system. And the keyboard controller
> >only does a cpu level reset. Which is basically a convoluted way to jump
> >to: 0xfffffff0.
>
> I would be quite surprised if that reset was not wired
> to eventually ground RST# on the PCI bus.

Surprise for you, then. That reset was used to return to
real mode by win31 & similar. Resetting PCI would
screw that.
PavelEnd_of_mail_magic_4669

2002-10-17 01:45:11

by Adam J. Richter

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Eric Beiderman wrote:
>[snip details on a lot of hot plug code]
>
>No major comment except that I know for compact pci you can just
>walk up and unplug it. As we have several compact pci boards here
>at work.
>
>But thank you for the clarification that a only on badly designed
>hardware a newly plugged in device will

From the previously referenced MindShare book, it looks like
CompactPCI has pins of different lengths to ensure that the interrupt
indicating card removal will never later than the other contacts being
broken, to avoid races in code like this:

control = inb(dev->base + CONTROL_PORT);
data = inb(dev->base + CONTROL_PORT);
if (dev->was_removed)
return -ENODEV;
...

static void remove_interrupt(struct device *dev)
{
dev->was_removed = 1;
unref_hardware(dev);
}
static inline void unref_hardware(struct device *dev)
{
/* hw_ref_count is:
- set 1 when driver is attached
- incremented when a device or network interface is opened
- decremented when a device or network interface is closed
- decremented when hardware is removed or driver is detached
*/
if (atomic_dec(&dev->hw_ref_count))
(*dev->driver->removed)(dev);
}


>> Besides, you have not identified a safe way that a combined
>> ->remove() function can detect such situations more reliably than
>> separate ->quiet() and ->removed(), which at least have the benefit of
>> knowing what the kernel currently thinks the situation is. So, you
>> really have no basis for saying "Splitting ->remove() into quiet() and
>> ->removed() will be racy."

>O.k. Let me attempt a clarification, of what I was thinking.
>Currently pseudo code for remove does:

>remove() {
> if (device_present()) {
XXXXXXXXXXXX
> device_be_quiet()
> }
> device_free(device_strucutres);
>}

The device could still be removed where I put the "XXXXXXXXXX"
or in the midst of device_be_quiet(). You appear to be trying to
narrow the window of vulnerability because you apparently don't know
how to eliminate it. That's fine if you cannot eliminate it, but, for
almost all if not all busses designed for hot plugging I believe you
can eliminate it entirely.

You should not just be bracketing code that attempts
transactions with a device inside of a test of the device's presence.
Instead, you should look understand the behavior that occurs when
device is gone and program for that. For example, if removal of a
device that is mapped to IO and memory space (PCMCIA, PCI, etc.) an
initial notification interrupt followed by writes to the previously
mapped areas being ignored and reads returning garbage, then you can
write code like:

static int foodev_send(struct foodev *dev, dma_addr_t ptr, int len)
{
outl(dev->dma_addr, ptr);
outl(dev->send_len, len);
outw(dev->control_port, GET_READY_TO_SEND);
/* Notice we did not check if the device is present until now. */
return (dev->device.is_removed) ? -ENODEV : 0;
}

In general, as long as you know that the addresses that your
driver is using will not be reassigned until your driver releases
them, you can do operations and just be prepared to handle the defined
error that occurs when you send do a transaction to a non-existant
device. The driver typically only has to check for the device's
presence for things like taking some externally visible action or is
in some kind of wait loop awaiting an event that won't occur if the
device has been removed. This is true device not mapped to the memory
and IO busses as well, such as USB, FireWire or SCSI (SCSI devices
that support hot plugging have assignable addresses).


>The way I imagined the split up:
>if (device->present()) {
> device->be_quiet();
>}
>device->free_resources();


device->present() is usually not specific to a device, but
rather to the parent bus, and that usually involves just checking the
internal information that it already has as to whether it has received
some kind of interrupt. More importantly, the generators of requests
to power down a device and notification that the device is removed are
usually separate and they usually one want one of those functions.
There is really very little reason to group them into the same
function. Typically, the callers will look like this:


void
foobus_interrupt(void *arg;)
{
struct foobus *foobus = arg;
event = inb(foobus_intr_register) & EVENT_MASK;
arg = inb(foobus_slow_register);
switch(event) {
case FOOBUS_REMOVE:
slot = arg & SLOT_MASK;
unref_hardware(foobus->devs[slot]);
break;
...
}
outb(foobus_intr_acknlowledge, 1);
}

foobus_powerdown(usb_device *dev)
{
(*dev->device->driver->quiet)(dev);
}

>Doing device_be_quiet() without the device_present() check is racy,
>because devices can be physically removed at arbitrary times.

Your code definitely is racy. I can see how to write these
things without races, which is something that I don't think you really
understand.

Engineers think about these things when they design busses for
hot plugging. I notice that programmers often have trouble really
grasping that other people might be much better than them in a
particular skill area. Here I think you are really assuming a too low
of a competence level among those who design these busses. I'm not
asking you to assume no mistakes have been made in the support of hot
plugging in these busses, but it's clear that you're assuming that
obvious mistakes have been made, and that assumption doesn't check
out. If you think there is a race in a particular bus, show me a
clear example.


[...]
>I cannot imagine freeing data structures will noticeably slow a reboot
>down, so unless we actually need to tell a device to be quiet for
>other reasons besides driver removal, and machine reboot I do not see
>a point in adding complexity by changing the interface.

Changing the interface will reduce complexity as only the code
that needs to be executed will be called. Since your current
->remove() function can potentially do blocking IO to turn off a
device, it can potentially take a long time. Also, you apparently
did not read this part of my previous message to Eric Blade in this thread:

| 1. When something has gone wrong with a device driver, you generally
| gather what information you can and then try a warm reboot,
| especially when one is working remotely. Data structures in
| device drivers can often corrupted under these circumstances.
| So it is important that warm reboots reboot the system as
| simply as they can with reliability. That means only executing
| the special shutdown code that really needs to be executed.
|
| 2. For small platforms, I can about keeping the __devexit code out
| of the kernel footprint. I don't want Linux to lose the
| competition small embedded devices (grated such decisions do not
| pivot on __devexit alone, but a cultrue of wastefulness leads
| creates bloat in steps such like this one).
|
| 3. I do care that reboot goes fast. Optimizations for speed and
| space generally come about by making lots of little clean ups.
| The standard power on self-test takes ages on most PCs, but
| it doesn't have to be that way, and it isn't necessarily that
| way on other platforms. I'd linux not to be an impedement
| to having systems where you don't even see the screen go black
| when you reboot.


>There may be
>a valid argument in the suspend to swap case, but I will cross that
>bridge when i come to it.

>For my thinking on how this should be handled:

>Except in some very select special instances, I do not know
>of a single device where it is safe to assume the hardware is in a
>sane state at any random given moment when we decide to reboot.

I don't know what you mean by "sane state." If you mean that
it's in a state that the reboot code can handle, I think almost all
hardware typically is. If that were not the case, warm reboots from
2.4.x would not work.

[...]
>For 2.5 calling remove on reboot may not fix any issues, but it is
>certainly a correct thing to do.

Here you go again, trying to make a circular argument by
arguing a definition of "correct" instead of show concrete advantages.
You're just wasting your time and any other reader's when you do that.

>And even if it does not fix issues
>today, it allows bugs to be fixed, as the come up. And it allows
>the code to be tested by just doing a modular build and inserting
>and removing the module.

You are confused. I never said that the module removal code
should not quiet the hardware that it is talking to.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."
`h

2002-10-17 09:04:34

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

"Adam J. Richter" <[email protected]> writes:

After having looked at it, device_shutdown is correct enough and
close enough to what I need, that I no longer care about the exact details.
Just so long as sys_reboot calls always calls it, and kexec can call it.

If you can talk someone into improving it's implementation that is fine
with me. As I find drivers that are causing problems problems because they
won't shut up I will point the driver authors in the direction of whatever
the current kernel api is.

Eric

2002-10-19 18:26:19

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Patrick Mochel <[email protected]> writes:

> > device_shutdown and device_suspend are for power management.
> > It is important to turn the device off as soon as possible if a power
> > management routine has told you that the device is not going to be
> > used any more.
>
> device_suspend() is for power management. device_shutdown() is for
> quiescing devices before a system reboot or power off.
>
> It's true that the same function is called when the device is physically
> removed from the system as when the system is shutting down, and that
> might be kinda bad. If it gets to the point where it's really difficult to
> deal with for drivers, we can create another callback: ->shutdown() for
> struct device_driver.

Obviously you have decided this was worth it.

My feedback.
* dev->shutdown is not called on module removal.
* dev->shutdown does not update dev->state to anything like
DEVICE_SHUTDOWN, that could be used as a sanity check, to not
use a device after it has been shut down.

* The semantics of dev->shutdown() and are not clear, in their
interactions with other parts of the system. Can the code from
dev->remove() that shuts down a device be removed?

I care because getting devices to properly shutdown on when
sys_kexec is what I need to do to get sys_kexec stable.

Would it be worth it for me to split out my fixes to smp shutdown,
and i8259 shutdown and to send them along to you?

The smp shutdown case on x86 probably cannot be handled by the device
model because it must be the very last code to run. When you stop
interrupts from coming in several drivers get cranky and do not
shutdown cleanly.

Hopefully I am not coming off harsh but I am a little cranky this
morning, As before this change I thought things on the device side
were pretty much under control they just needed a little stabilization
and testing. And now someone possibly me has to go through every
driver and write a shutdown method because someone figured calling
free was expensive.

That was the point of calling dev->shutdown instead of dev->remove()
wasn't it? If there was a way to reawaken a device after calling
dev -> shutdown I might see it. But to do that you still need to call
dev -> remove() followed by dev ->probe()

Eric

2002-10-20 06:55:29

by Adam J. Richter

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Eric W. Biederman wrote:
>Hopefully I am not coming off harsh but I am a little cranky this
>morning, As before this change I thought things on the device side
>were pretty much under control they just needed a little stabilization
>and testing. And now someone possibly me has to go through every
>driver and write a shutdown method because someone figured calling
>free was expensive.

I think I've essentially refuted this in parts of previous
messages on this thread that Eric has basically ignored in his
responses.

Interested readers should check the previous lkml messages
with this subject line. Of particular relevance to this issue, see my
list of three advantages of the 2.4.x approach of not calling
unnecessary device-specific shutdown sequences in my message at
http://marc.theaimsgroup.com/?l=linux-kernel&m=103481960911183&w=2,
where I pasted in a response on the same issue that I originally
wroted in this thread to Eric Blade (after the paragraph that begins
"Changing the interface will reduce coplexity as only the code that
needs to be executed will be called."). Also of interest is Richard
B. Johnson's example of BIOS reset code that shuts off the PCI bus
before touching any RAM at
http://marc.theaimsgroup.com/?l=linux-kernel&m=103462697923792&w=2.

I'm willing to get into this in detail again upon request.
Otherwise, I think it would be a better use of everyone's time not to
subject linux-kernel readers to an infinite loop.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-10-20 09:13:29

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

"Adam J. Richter" <[email protected]> writes:

> Eric W. Biederman wrote:
> >Hopefully I am not coming off harsh but I am a little cranky this
> >morning, As before this change I thought things on the device side
> >were pretty much under control they just needed a little stabilization
> >and testing. And now someone possibly me has to go through every
> >driver and write a shutdown method because someone figured calling
> >free was expensive.
>
> I think I've essentially refuted this in parts of previous
> messages on this thread that Eric has basically ignored in his
> responses.

About free, and the interface I don't much care. Except that I care
that it is clean, and that device_shutdown does not get neutered. My
current complaint is now that the interface in 2.5.44 changed it is
ill specified, and setup in a way so that the code will rarely get
tested, and when it is tested it will be tested in a way that is
likely to not expose problems.

But judging from my experience with etherboot, and by the bug reports
I have coming in there is a lot of work where I need to get drivers to
shut themselves down.

There are two cases on reboot.
A) eventually the BIOS toggles the machine Reset line,
common but not guaranteed to happen.
B) Whatever we call on reboot does not toggle the reset line.

> Interested readers should check the previous lkml messages
> with this subject line. Of particular relevance to this issue, see my
> list of three advantages of the 2.4.x approach of not calling
> unnecessary device-specific shutdown sequences in my message at

Breaking it up is fine, but a real pain.

> http://marc.theaimsgroup.com/?l=linux-kernel&m=103481960911183&w=2,
> where I pasted in a response on the same issue that I originally
> wroted in this thread to Eric Blade (after the paragraph that begins
> "Changing the interface will reduce coplexity as only the code that
> needs to be executed will be called.").

Not calling the same code in rmmod is wrong because then the is hard
to test. Without that the code only gets tested when someone is
likely to toggle the reset line on the device anyway, trivially hiding
bugs. running rmmod and insmod a bunch of times makes a good test for
this kind of thing, and is likely to happen during driver development.

With respect to the embedded case, embedded machines are more likely
to see this problem than desktop machines, on an ordinary reboot. But
for code bloat I don't have a problem if there is a compile option
that totally removes device_shutdown, to save space. So long as it
works when it is enabled.

And I do agree that running the bare minimum can increase reliability
in some cases as Eric Blade pointed. But I don't want an interface
that is optimized for buggy drivers. We have the source so we can fix
them.

> Also of interest is Richard
> B. Johnson's example of BIOS reset code that shuts off the PCI bus
> before touching any RAM at
> http://marc.theaimsgroup.com/?l=linux-kernel&m=103462697923792&w=2.

Yes the BIOS work around because some OS's don't shut down their
devices.

> I'm willing to get into this in detail again upon request.
> Otherwise, I think it would be a better use of everyone's time not to
> subject linux-kernel readers to an infinite loop.

Mostly I want a comment from Patrick Mochel why he made the change,
and roughly what he was thinking. So I have a good idea about which
code I need to dig into and send patches to fix. If he makes a good
case for an independent shutdown, method I am fine with that, just
every driver in the kernel needs to change, and that is a heck of a
lot of work before 2.6. Otherwise we can go back to calling remove.

Eric

2002-10-20 09:45:53

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices


Assuming it is worth it to split remove into 2 parts, we need
the following so rmmod calls ->shutdown. Otherwise we will get
code duplication in the drivers.

And then we need all of the patches that split remove into 2 parts,
in the drivers.

Eric

diff -uNr linux-2.5.44/drivers/base/bus.c linux-2.5.44.shutdown/drivers/base/bus.c
--- linux-2.5.44/drivers/base/bus.c Sat Oct 19 00:57:58 2002
+++ linux-2.5.44.shutdown/drivers/base/bus.c Sun Oct 20 03:44:46 2002
@@ -164,6 +164,8 @@
if (drv) {
list_del_init(&dev->driver_list);
devclass_remove_device(dev);
+ if (drv->shutdown && device_present(drv))
+ drv->shutdown(dev);
if (drv->remove)
drv->remove(dev);
dev->driver = NULL;

2002-10-20 20:34:30

by Patrick Mochel

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices


> Mostly I want a comment from Patrick Mochel why he made the change,
> and roughly what he was thinking. So I have a good idea about which
> code I need to dig into and send patches to fix. If he makes a good
> case for an independent shutdown, method I am fine with that, just
> every driver in the kernel needs to change, and that is a heck of a
> lot of work before 2.6. Otherwise we can go back to calling remove.

The main problem is locking and refcounting on the device objects.
->remove() is removing objects from the device tree and freeing them. This
is not good when we expect the list to remain intact while we iterate over
it.

This is fine when a device is unplugged or a module is removed, but
completely unnecessary during a power transition. Nothing is going away;
we're just turning everything off. And, we don't we don't have to mess
with getting the list traversal right, since we can assume it's intact.

In short, it's about the data structures, not the hardware. It is going to
require modification to drivers, but the changes should be small and make
the code cleaner. It can also happen gradually. There is going to be a lot
of cleanup of drivers in the coming months as more things are converted to
exploit the driver model, anyway.

In general, I agree with the patch that you sent later in the thread. I'll
apply it, at least for now.

-pat

2002-10-20 23:52:57

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Patrick Mochel <[email protected]> writes:

> > Mostly I want a comment from Patrick Mochel why he made the change,
> > and roughly what he was thinking. So I have a good idea about which
> > code I need to dig into and send patches to fix. If he makes a good
> > case for an independent shutdown, method I am fine with that, just
> > every driver in the kernel needs to change, and that is a heck of a
> > lot of work before 2.6. Otherwise we can go back to calling remove.
>
> The main problem is locking and refcounting on the device objects.
> ->remove() is removing objects from the device tree and freeing them. This
> is not good when we expect the list to remain intact while we iterate over
> it.
>
> This is fine when a device is unplugged or a module is removed, but
> completely unnecessary during a power transition. Nothing is going away;
> we're just turning everything off. And, we don't we don't have to mess
> with getting the list traversal right, since we can assume it's intact.

O.k. That is very good reason for making the change.

> In short, it's about the data structures, not the hardware. It is going to
> require modification to drivers, but the changes should be small and make
> the code cleaner. It can also happen gradually. There is going to be a lot
> of cleanup of drivers in the coming months as more things are converted to
> exploit the driver model, anyway.
>>
> In general, I agree with the patch that you sent later in the thread. I'll
> apply it, at least for now.

My big concern is with getting the shutdown path setup in a manner
that works, and gets testing. When booting linux from linux with
sys_kexec a lot of my problems come back to some device driver not
getting shutdown.

Question, is there a method from the class shutdown code that we
can/should call, during reboot. I just have this memory that for
network interfaces simply downing the interface tends to put it in
a quiescent state. And I am wondering if that might be a general
thing we can take advantage of. Though if the class remove methods
modify the data structures I guess that is out.

Eric

2002-10-21 17:03:55

by Patrick Mochel

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices


> Question, is there a method from the class shutdown code that we
> can/should call, during reboot. I just have this memory that for
> network interfaces simply downing the interface tends to put it in
> a quiescent state. And I am wondering if that might be a general
> thing we can take advantage of. Though if the class remove methods
> modify the data structures I guess that is out.

No. Bringing down the network interface, at least, can be done from
userspace.

-pat

2002-10-21 20:51:04

by Adam J. Richter

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

Eric Biederman writes:
>My big concern is with getting the shutdown path setup in a manner
>that works, and gets testing.

Rebooting without traversing the device tree seems to have
essentially worked fine for 2.4.x.

>When booting linux from linux with
>sys_kexec a lot of my problems come back to some device driver not
>getting shutdown.

kmonte and sys_kexec skip the BIOS reset code and therefore
may need to do more elaborate shutdown, but please do not saddle the
normal reboot case with the reliability risk of calling code in each
driver when a user might be rebooting a remote machine precisely
because of a a confused device driver or the potential slow down
(especially since you want an interface where the function that gets
called before reboot may need to do blocking IO). For
kmonte/sys_kexec, this high cost might be necessary, but for the
normal reboot the cost is not worth the benefit.

By way, given the ability to register reboot notifiers in the
device tree, I would be happy to see one registered at the top of the
PCI bus tree (so it would be called last) that would shut off the PCI
bus before reboot, along the lines of what Richard B. Johnson posted.
That would not involve walking a lot of data structures in many
different device drivers and it would be just a few instrutions.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."

2002-10-21 22:20:58

by Adam J. Richter

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

I wrote to Eric Biederman:
> kmonte and sys_kexec skip the BIOS reset code and therefore
>may need to do more elaborate shutdown, but please do not saddle the
>normal reboot case with [that].

Let me add that I would be quite happy to see the common code
in sys_reboot moved into a shared exported routine that sys_kexec and
kmonte could use, so that they would automatically track changes to
that procedure rather than requiring maintenance of a duplicate copy
of that code.

Adam J. Richter __ ______________ 575 Oroville Road
[email protected] \ / Milpitas, California 95035
+1 408 309-6081 | g g d r a s i l United States of America
"Free Software For The Rest Of Us."


2002-10-22 04:24:34

by Eric W. Biederman

[permalink] [raw]
Subject: Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices

"Adam J. Richter" <[email protected]> writes:

> Eric Biederman writes:
> >My big concern is with getting the shutdown path setup in a manner
> >that works, and gets testing.
>
> Rebooting without traversing the device tree seems to have
> essentially worked fine for 2.4.x.

Yes. I expect that most of the shutdown routines will be made conditional
on something like, like CONFIG_HOTPLUG so you can disable the cleanly.
Or perhaps, device_shutdown will be made a compile time conditional.

But please note a number of 2.4.x drivers are starting to grow reboot
notifiers. So it appears that some people have needed code to shut
down their driver at reboot time in 2.4.x

> >When booting linux from linux with
> >sys_kexec a lot of my problems come back to some device driver not
> >getting shutdown.
>
> kmonte and sys_kexec skip the BIOS reset code and therefore
> may need to do more elaborate shutdown, but please do not saddle the
> normal reboot case with the reliability risk of calling code in each
> driver when a user might be rebooting a remote machine precisely
> because of a a confused device driver or the potential slow down
> (especially since you want an interface where the function that gets
> called before reboot may need to do blocking IO). For
> kmonte/sys_kexec, this high cost might be necessary, but for the
> normal reboot the cost is not worth the benefit.

In general if a routine takes a long time, that is a bug.

> By way, given the ability to register reboot notifiers in the
> device tree, I would be happy to see one registered at the top of the
> PCI bus tree (so it would be called last) that would shut off the PCI
> bus before reboot, along the lines of what Richard B. Johnson posted.
> That would not involve walking a lot of data structures in many
> different device drivers and it would be just a few instrutions.

That code was chipset specific, so it may be a good thing for a host bridge
driver to do that but it is by no means generally possible.

Eric