Subject: [patch] call reboot notifier list when doing an emergency reboot
From: Ingo Molnar <[email protected]>
my laptop (Lenovo T60) hangs during reboot if the shutdown notifiers are
not called. So the following command, which on other systems i use as a
quick way to reboot into a new kernel:
echo b > /proc/sysrq-trigger
just hangs indefinitely after the kernel prints "Restarting system".
we dont call the reboot notifiers during emergency reboot mainly because
it could be called from atomic context and reboot notifiers are a
blocking notifier list. But actually the kernel is often perfectly
reschedulable in this stage, so we could as well process the
reboot_notifier_list.
(furthermore, on -rt kernels this place is preemptable even during
SysRq-b)
So just process the reboot notifier list if we are preemptable. This
will shut disk caches and chipsets off.
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sys.c | 7 +++++++
1 file changed, 7 insertions(+)
Index: linux/kernel/sys.c
===================================================================
--- linux.orig/kernel/sys.c
+++ linux/kernel/sys.c
@@ -710,6 +710,13 @@ out_unlock:
*/
void emergency_restart(void)
{
+ /*
+ * Call the notifier chain if we are not in an
+ * atomic context:
+ */
+ if (!preempt_count() && !irqs_disabled())
+ blocking_notifier_call_chain(&reboot_notifier_list,
+ SYS_RESTART, NULL);
machine_emergency_restart();
}
EXPORT_SYMBOL_GPL(emergency_restart);
On Wed, Jan 17, 2007 at 10:13:19AM +0100, Ingo Molnar wrote:
> we dont call the reboot notifiers during emergency reboot mainly because
> it could be called from atomic context and reboot notifiers are a
> blocking notifier list. But actually the kernel is often perfectly
> reschedulable in this stage, so we could as well process the
> reboot_notifier_list.
My experience has been that when there has been the need to use this
facility, the kernel hasn't been reschedulable. (If it were then I'd
use "reboot -f" instead.)
If we're going to do this, can we make the new behaviour have a different
key combination so the original way remains?
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
* Russell King <[email protected]> wrote:
> On Wed, Jan 17, 2007 at 10:13:19AM +0100, Ingo Molnar wrote:
> > we dont call the reboot notifiers during emergency reboot mainly because
> > it could be called from atomic context and reboot notifiers are a
> > blocking notifier list. But actually the kernel is often perfectly
> > reschedulable in this stage, so we could as well process the
> > reboot_notifier_list.
>
> My experience has been that when there has been the need to use this
> facility, the kernel hasn't been reschedulable. [...]
this decision is totally automatic - so if your situation happens and
the kernel isnt reschedulable, then the notifier chain wont be called
and nothing changes from your perspective. Hm, perhaps this should be
dependent on CONFIG_PREEMPT, to make sure preempt_count() is reliable?
but from my perspective this patch fixes a real regression.
updated patch attached below.
Ingo
-------------------->
Subject: [patch] call reboot notifier list when doing an emergency reboot
From: Ingo Molnar <[email protected]>
my laptop does not reboot unless the shutdown notifiers are called
first. So the following command, which i use as a fast way to reboot
into a new kernel:
echo b > /proc/sysrq-trigger
just hangs indefinitely after the kernel prints "System rebooting".
the thing is, that the kernel is actually reschedulable in this stage,
so we could as well process the reboot_notifier_list. (furthermore,
on -rt kernels this place is preemptable even during SysRq-b)
So just process the reboot notifier list if we are preemptable. This
will shut disk caches and chipsets off.
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sys.c | 10 ++++++++++
1 file changed, 10 insertions(+)
Index: linux/kernel/sys.c
===================================================================
--- linux.orig/kernel/sys.c
+++ linux/kernel/sys.c
@@ -29,6 +29,7 @@
#include <linux/signal.h>
#include <linux/cn_proc.h>
#include <linux/getcpu.h>
+#include <linux/hardirq.h>
#include <linux/compat.h>
#include <linux/syscalls.h>
@@ -710,6 +711,15 @@ out_unlock:
*/
void emergency_restart(void)
{
+ /*
+ * Call the notifier chain if we are not in an
+ * atomic context:
+ */
+#ifdef CONFIG_PREEMPT
+ if (!in_atomic() && !irqs_disabled())
+ blocking_notifier_call_chain(&reboot_notifier_list,
+ SYS_RESTART, NULL);
+#endif
machine_emergency_restart();
}
EXPORT_SYMBOL_GPL(emergency_restart);
* Ingo Molnar <[email protected]> wrote:
> Subject: [patch] call reboot notifier list when doing an emergency reboot
> From: Ingo Molnar <[email protected]>
>
> my laptop (Lenovo T60) hangs during reboot if the shutdown notifiers are
> not called. So the following command, which on other systems i use as a
> quick way to reboot into a new kernel:
>
> echo b > /proc/sysrq-trigger
>
> just hangs indefinitely after the kernel prints "Restarting system".
i also figured out which one of the many reboot notifiers makes the
crutial difference: it's kvm_reboot().
So i think we should do the patch below - this makes reboot work even in
atomic contexts. (My previous patch makes sense nevertheless, as reboot
notifiers are quite useful in general, even during SysRq-b or panic
reboots. For example the SATA disk caches are flushed.)
I have tested this without the other kernel/sys.c patch, and this solves
the hung reboot problem equally well.
Ingo
--------------------->
Subject: [patch] KVM: do VMXOFF upon reboot
From: Ingo Molnar <[email protected]>
my laptop's BIOS apparently gets confused if the kernel tries to
reboot without first turning VT context off, which results in a
hung (emergency-)reboot. So make sure this happens, right before
we reboot.
( NOTE: this is a dual-core system, but only the core where the
BIOS executes seems to be affected - the other core can have an
active VT context just fine - so we dont have to risk reboot
robustness by doing a CPU cross-call in the emergency reboot
handler. )
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/i386/kernel/reboot.c | 8 ++++++++
arch/x86_64/kernel/reboot.c | 7 +++++++
2 files changed, 15 insertions(+)
Index: linux/arch/i386/kernel/reboot.c
===================================================================
--- linux.orig/arch/i386/kernel/reboot.c
+++ linux/arch/i386/kernel/reboot.c
@@ -318,6 +318,14 @@ void machine_shutdown(void)
void machine_emergency_restart(void)
{
+ unsigned long ecx = cpuid_ecx(1);
+
+ /*
+ * Disable any possibly active VT context (if VT supported):
+ */
+ if (test_bit(5, &ecx)) /* has VT support */
+ asm volatile (".byte 0x0f, 0x01, 0xc4"); /* vmxoff */
+
if (!reboot_thru_bios) {
if (efi_enabled) {
efi.reset_system(EFI_RESET_COLD, EFI_SUCCESS, 0, NULL);
Index: linux/arch/x86_64/kernel/reboot.c
===================================================================
--- linux.orig/arch/x86_64/kernel/reboot.c
+++ linux/arch/x86_64/kernel/reboot.c
@@ -114,8 +114,15 @@ void machine_shutdown(void)
void machine_emergency_restart(void)
{
+ unsigned long ecx = cpuid_ecx(1);
int i;
+ /*
+ * Disable any possibly active VT context (if VT supported):
+ */
+ if (test_bit(5, &ecx)) /* has VT support */
+ asm volatile (".byte 0x0f, 0x01, 0xc4"); /* vmxoff */
+
/* Tell the BIOS if we want cold or warm reboot */
*((unsigned short *)__va(0x472)) = reboot_mode;
* Ingo Molnar <[email protected]> wrote:
> So i think we should do the patch below - this makes reboot work even
> in atomic contexts. [...]
hm, this causes problems if KVM is not active on a VT-capable CPU: even
on CPUs with VT supported, if a VT context is not actually activated, a
vmxoff causes an invalid opcode exception. So the updated patch below
uses a slightly more sophisticated approach to avoid that problem.
Ingo
-------------------->
Subject: [patch] kvm: do VMXOFF upon reboot
From: Ingo Molnar <[email protected]>
my laptop's BIOS apparently gets confused if the kernel tries to
reboot without first turning VT context off, which results in a
hung (emergency-)reboot. So make sure this happens, right before
we reboot.
( NOTE: this is a dual-core system, but only the core where the
BIOS executes seems to be affected - the other core can have an
active VT context just fine - so we dont have to risk reboot
robustness by doing a CPU cross-call in the emergency reboot
handler. )
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/i386/kernel/reboot.c | 16 ++++++++++++++++
arch/x86_64/kernel/reboot.c | 15 +++++++++++++++
2 files changed, 31 insertions(+)
Index: linux/arch/i386/kernel/reboot.c
===================================================================
--- linux.orig/arch/i386/kernel/reboot.c
+++ linux/arch/i386/kernel/reboot.c
@@ -318,6 +318,22 @@ void machine_shutdown(void)
void machine_emergency_restart(void)
{
+ unsigned long ecx = cpuid_ecx(1);
+
+ /*
+ * Disable any possibly active VT context (if VT supported):
+ */
+ if (test_bit(5, &ecx)) { /* has VT support */
+ asm volatile (
+ "1: .byte 0x0f, 0x01, 0xc4 \n" /* vmxoff */
+ "2: \n"
+ ".section __ex_table,\"a\" \n"
+ " .align 4 \n"
+ " .long 1b,2b \n"
+ ".previous \n"
+ );
+ }
+
if (!reboot_thru_bios) {
if (efi_enabled) {
efi.reset_system(EFI_RESET_COLD, EFI_SUCCESS, 0, NULL);
Index: linux/arch/x86_64/kernel/reboot.c
===================================================================
--- linux.orig/arch/x86_64/kernel/reboot.c
+++ linux/arch/x86_64/kernel/reboot.c
@@ -114,8 +114,23 @@ void machine_shutdown(void)
void machine_emergency_restart(void)
{
+ unsigned long ecx = cpuid_ecx(1);
int i;
+ /*
+ * Disable any possibly active VT context (if VT supported):
+ */
+ if (test_bit(5, &ecx)) { /* has VT support */
+ asm volatile (
+ "1: .byte 0x0f, 0x01, 0xc4 \n" /* vmxoff */
+ "2: \n"
+ ".section __ex_table,\"a\" \n"
+ " .align 8 \n"
+ " .quad 1b,2b \n"
+ ".previous \n"
+ );
+ }
+
/* Tell the BIOS if we want cold or warm reboot */
*((unsigned short *)__va(0x472)) = reboot_mode;
> On Wed, 17 Jan 2007 10:39:17 +0100 Ingo Molnar <[email protected]> wrote:
>
> * Russell King <[email protected]> wrote:
>
> > On Wed, Jan 17, 2007 at 10:13:19AM +0100, Ingo Molnar wrote:
> > > we dont call the reboot notifiers during emergency reboot mainly because
> > > it could be called from atomic context and reboot notifiers are a
> > > blocking notifier list. But actually the kernel is often perfectly
> > > reschedulable in this stage, so we could as well process the
> > > reboot_notifier_list.
> >
> > My experience has been that when there has been the need to use this
> > facility, the kernel hasn't been reschedulable. [...]
>
> this decision is totally automatic - so if your situation happens and
> the kernel isnt reschedulable, then the notifier chain wont be called
> and nothing changes from your perspective. Hm, perhaps this should be
> dependent on CONFIG_PREEMPT, to make sure preempt_count() is reliable?
>
> but from my perspective this patch fixes a real regression.
>
> updated patch attached below.
>
Making it dependent upon CONFIG_PREEMPT seems a bit sucky. Perhaps pass in
some "you were called from /proc/sysrq-trigger" notification?
Also, there are ways of telling if the kernel has oopsed (oops counter,
oops_in_progress, etc) which should perhaps be tested.
Or just learn to type `reboot -fn' ;)
* Andrew Morton <[email protected]> wrote:
> Making it dependent upon CONFIG_PREEMPT seems a bit sucky. Perhaps
> pass in some "you were called from /proc/sysrq-trigger" notification?
looks quite invasive to the whole sysrq interfaces, it trickles all the
way down into sysrq.c's handler prototype, affecting 20 prototypes.
Worth the trouble?
> Also, there are ways of telling if the kernel has oopsed (oops
> counter, oops_in_progress, etc) which should perhaps be tested.
i'm not sure. Should we perhaps forget this patch and only do the
i386/x86_64 VMX patch i sent?
> Or just learn to type `reboot -fn' ;)
well, emergency_reboot() might also be called from panic(), if someone
sets panic_timeout, resulting in a similar hang. It might be called from
a serial console on a soft-locked-up system, having no physical access
to the system. Having a hung reboot in that scenario is not really
productive.
Ingo
Ingo Molnar wrote:
> * Ingo Molnar <[email protected]> wrote:
>
>
>> So i think we should do the patch below - this makes reboot work even
>> in atomic contexts. [...]
>>
>
> hm, this causes problems if KVM is not active on a VT-capable CPU: even
> on CPUs with VT supported, if a VT context is not actually activated, a
> vmxoff causes an invalid opcode exception. So the updated patch below
> uses a slightly more sophisticated approach to avoid that problem.
>
>
There is already code to that effect. Any idea why it is not called?
> static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
> void *v)
> {
> if (val == SYS_RESTART) {
> /*
> * Some (well, at least mine) BIOSes hang on reboot if
> * in vmx root mode.
> */
> printk(KERN_INFO "kvm: exiting hardware virtualization\n");
> on_each_cpu(kvm_arch_ops->hardware_disable, 0, 0, 1);
> }
> return NOTIFY_OK;
> }
>
> static struct notifier_block kvm_reboot_notifier = {
> .notifier_call = kvm_reboot,
> .priority = 0,
> };
>
Note that it performs the vmxoff on all cpus, not just one, and that it
is svm friendly too. Maybe it should check for values other than
SYS_RESTART?
--
error compiling committee.c: too many arguments to function
Hi!
> > > On Wed, Jan 17, 2007 at 10:13:19AM +0100, Ingo Molnar wrote:
> > > > we dont call the reboot notifiers during emergency reboot mainly because
> > > > it could be called from atomic context and reboot notifiers are a
> > > > blocking notifier list. But actually the kernel is often perfectly
> > > > reschedulable in this stage, so we could as well process the
> > > > reboot_notifier_list.
> > >
> > > My experience has been that when there has been the need to use this
> > > facility, the kernel hasn't been reschedulable. [...]
> >
> > this decision is totally automatic - so if your situation happens and
> > the kernel isnt reschedulable, then the notifier chain wont be called
> > and nothing changes from your perspective. Hm, perhaps this should be
> > dependent on CONFIG_PREEMPT, to make sure preempt_count() is reliable?
> >
> > but from my perspective this patch fixes a real regression.
> >
> > updated patch attached below.
> >
>
> Making it dependent upon CONFIG_PREEMPT seems a bit sucky. Perhaps pass in
> some "you were called from /proc/sysrq-trigger" notification?
What about adding 'B' with 'reboot with notifications' meaning?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Tue, 23 Jan 2007 19:57:12 +0000 Pavel Machek wrote:
> Hi!
>
> > > > On Wed, Jan 17, 2007 at 10:13:19AM +0100, Ingo Molnar wrote:
> > > > > we dont call the reboot notifiers during emergency reboot mainly because
> > > > > it could be called from atomic context and reboot notifiers are a
> > > > > blocking notifier list. But actually the kernel is often perfectly
> > > > > reschedulable in this stage, so we could as well process the
> > > > > reboot_notifier_list.
> > > >
> > > > My experience has been that when there has been the need to use this
> > > > facility, the kernel hasn't been reschedulable. [...]
> > >
> > > this decision is totally automatic - so if your situation happens and
> > > the kernel isnt reschedulable, then the notifier chain wont be called
> > > and nothing changes from your perspective. Hm, perhaps this should be
> > > dependent on CONFIG_PREEMPT, to make sure preempt_count() is reliable?
> > >
> > > but from my perspective this patch fixes a real regression.
> > >
> > > updated patch attached below.
> > >
> >
> > Making it dependent upon CONFIG_PREEMPT seems a bit sucky. Perhaps pass in
> > some "you were called from /proc/sysrq-trigger" notification?
>
> What about adding 'B' with 'reboot with notifications' meaning?
if I am reading 'B' vs. 'b' correctly:
sysrq key codes are currently all lower case only.
Of course, that could be changed to support upper case.
That would require sysrq help messages in a different format
(currently the keycode for the action is Capitalized, e.g.,
b => reBoot). And I suspect that it would allow some users to
enter b vs. B or B vs. b etc. unintentionally.
In any case, the sysrq_key_table could easily fill up soon,
so we may be force to support UpperCase codes.
---
~Randy
On Wed, 24 Jan 2007, Randy Dunlap wrote:
> On Tue, 23 Jan 2007 19:57:12 +0000 Pavel Machek wrote:
>
>> Hi!
>>
>>>>> On Wed, Jan 17, 2007 at 10:13:19AM +0100, Ingo Molnar wrote:
>>>>>> we dont call the reboot notifiers during emergency reboot mainly because
>>>>>> it could be called from atomic context and reboot notifiers are a
>>>>>> blocking notifier list. But actually the kernel is often perfectly
>>>>>> reschedulable in this stage, so we could as well process the
>>>>>> reboot_notifier_list.
>>>>>
>>>>> My experience has been that when there has been the need to use this
>>>>> facility, the kernel hasn't been reschedulable. [...]
>>>>
>>>> this decision is totally automatic - so if your situation happens and
>>>> the kernel isnt reschedulable, then the notifier chain wont be called
>>>> and nothing changes from your perspective. Hm, perhaps this should be
>>>> dependent on CONFIG_PREEMPT, to make sure preempt_count() is reliable?
>>>>
>>>> but from my perspective this patch fixes a real regression.
>>>>
>>>> updated patch attached below.
>>>>
>>>
>>> Making it dependent upon CONFIG_PREEMPT seems a bit sucky. Perhaps pass in
>>> some "you were called from /proc/sysrq-trigger" notification?
>>
>> What about adding 'B' with 'reboot with notifications' meaning?
>
> if I am reading 'B' vs. 'b' correctly:
>
> sysrq key codes are currently all lower case only.
> Of course, that could be changed to support upper case.
> That would require sysrq help messages in a different format
> (currently the keycode for the action is Capitalized, e.g.,
> b => reBoot). And I suspect that it would allow some users to
> enter b vs. B or B vs. b etc. unintentionally.
> In any case, the sysrq_key_table could easily fill up soon,
> so we may be force to support UpperCase codes.
>
> ---
> ~Randy
The you can take the lower scan-code and AND it with 95 (decimal)
before the test. That will allow all A-z codes to work (not numbers,
though). They get compared with the upper-case.
Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.59 BogoMips).
New book: http://www.AbominableFirebug.com/
_
****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.
Thank you.