2001-12-23 17:59:12

by Russell King

Subject: Total system lockup with Alt-SysRQ-L

Ok, alt-sysrq-l is a pretty major thing to do, as it has the effect of
killing everything, including init.

When pid1 exits (maybe due to a kill signal), we lock up hard in (iirc)
exit_notify. I don't remember the details, I'm afraid.

Back in 2.3, I had a go at fixing this, but Linus rejected the patch, saying
that it was doing the wrong thing. To this day, the kernel still suffers
from this, and I've not had the inclination to spend any more time on it.

So, I'm just letting people know that alt-sysrq-l is rather fatal,
especially if you want to do the following sequence to avoid a fsck:

alt-sysrq-l
alt-sysrq-s
alt-sysrq-u
alt-sysrq-b

IMHO either alt-sysrq-l should be removed, or someone who knows the logic
behind the linking of tasks together needs to fix exit_notify so it doesn't
enter an infinite loop when init exits.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html


2001-12-24 02:25:00

by Alan

Subject: Re: Total system lockup with Alt-SysRQ-L

> When pid1 exits (maybe due to a kill signal), we lockup hard in (iirc)
> exit_notify. I don't remember the details I'm afraid.

pid1 ends up trying to kill pid1 and it goes deeply down the toilet from
that point onwards. The Unix traditional world reboots when pid 1 dies.

2001-12-24 08:38:24

by Russell King

Subject: Re: Total system lockup with Alt-SysRQ-L

On Mon, Dec 24, 2001 at 02:34:20AM +0000, Alan Cox wrote:
> > When pid1 exits (maybe due to a kill signal), we lockup hard in (iirc)
> > exit_notify. I don't remember the details I'm afraid.
>
> pid1 ends up trying to kill pid1 and it goes deeply down the toilet from
> that point onwards. The Unix traditional world reboots when pid 1 dies.

The problem was definitely in the exit_notify code, where it manipulated
the task links indefinitely. (I think it was cptr never becomes null,
so the loop never terminates).
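
As a rough illustration of how such a loop can fail to terminate, here is a
toy user-space model in C. It is a sketch under assumptions, not the kernel's
exit_notify: the names toy_task, toy_add_child and toy_reparent_children are
made up, the fields only borrow the 2.4-style p_pptr/p_cptr/p_osptr links, and
it assumes the exiting task's children are handed to a reaper that may be the
exiting task itself. The limit parameter exists only so the model can report
the non-terminating case.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the task links; NOT the kernel's task_struct. */
struct toy_task {
	struct toy_task *p_pptr;	/* parent */
	struct toy_task *p_cptr;	/* youngest child */
	struct toy_task *p_osptr;	/* next older sibling */
};

/* Link child in as the youngest child of parent. */
static void toy_add_child(struct toy_task *parent, struct toy_task *child)
{
	child->p_pptr = parent;
	child->p_osptr = parent->p_cptr;
	parent->p_cptr = child;
}

/*
 * Hand every child of `dying` over to `reaper`.
 * Returns the number of children moved, or -1 if the child list is
 * still non-empty after `limit` iterations.
 */
static int toy_reparent_children(struct toy_task *dying,
				 struct toy_task *reaper, int limit)
{
	int n = 0;

	assert(dying != NULL && reaper != NULL);

	while (dying->p_cptr != NULL) {
		struct toy_task *p;

		if (n++ == limit)
			return -1;	/* an unbounded loop has no such escape */
		p = dying->p_cptr;
		dying->p_cptr = p->p_osptr;	/* unlink from the dying task */
		toy_add_child(reaper, p);	/* relink under the reaper */
	}
	return n;
}
```

With a separate reaper, toy_reparent_children() empties the list in one pass
per child. When the dying task is its own reaper, each child is unlinked and
immediately relinked at the head of the same list, so p_cptr never becomes
NULL and the call returns -1.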

However, if we're saying that "pid1 must not die" then maybe we should get
rid of the 'killall' sysrq option since it serves no useful purpose, and
add a suitable panic in the do_exit path?

I'll generate a patch for that if there's interest.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-12-24 11:50:18

by Denis Oliver Kropp

Subject: Re: Total system lockup with Alt-SysRQ-L

Quoting Russell King ([email protected]):
> On Mon, Dec 24, 2001 at 02:34:20AM +0000, Alan Cox wrote:
> > > When pid1 exits (maybe due to a kill signal), we lockup hard in (iirc)
> > > exit_notify. I don't remember the details I'm afraid.
> >
> > pid1 ends up trying to kill pid1 and it goes deeply down the toilet from
> > that point onwards. The Unix traditional world reboots when pid 1 dies.
>
> The problem was definitely in the exit_notify code, where it manipulated
> the task links indefinitely. (I think it was cptr never becomes null,
> so the loop never terminates).
>
> However, if we're saying that "pid1 must not die" then maybe we should get
> rid of the 'killall' sysrq option since it serves no useful purpose, and
> add a suitable panic in the do_exit path?

Another annoying thing that happens sometimes is that I accidentally
press 'L' or 'E' instead of 'K' or 'R', the SysRQs I use most.

An additional modifier for the harmful actions would be useful, e.g. Shift.
So pressing Alt-SysRQ-E would do nothing until Shift is pressed, too.
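
A minimal sketch of that idea in C, assuming a hypothetical helper that the
keyboard handler would consult before dispatching a SysRq key. The name
toy_sysrq_permitted and the guarded key list are illustrative only; which
keys count as harmful is an assumption here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical list of keys treated as destructive. The message above
 * only names 'e' and 'l' as the ones pressed by accident. */
static const char toy_guarded_keys[] = "el";

/*
 * Returns true if the SysRq action for `key` should run.
 * `key` is the lower-case key letter, `shift_down` the Shift state.
 */
static bool toy_sysrq_permitted(char key, bool shift_down)
{
	/* strchr() would also match the terminating NUL, so exclude it. */
	if (key != '\0' && strchr(toy_guarded_keys, key) != NULL)
		return shift_down;	/* guarded key: needs Shift as well */
	return true;			/* all other keys behave as before */
}
```

With this check, Alt-SysRQ-E without Shift is ignored, while 'K' and 'R'
behave as before.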

--
Best regards,
Denis Oliver Kropp

.------------------------------------------.
| DirectFB - Hardware accelerated graphics |
| http://www.directfb.org/ |
"------------------------------------------"

convergence integrated media GmbH

2001-12-24 12:26:28

by Russell King

Subject: Re: Total system lockup with Alt-SysRQ-L

On Mon, Dec 24, 2001 at 08:37:52AM +0000, Russell King wrote:
> The problem was definitely in the exit_notify code, where it manipulated
> the task links indefinitely. (I think it was cptr never becomes null,
> so the loop never terminates).
>
> However, if we're saying that "pid1 must not die" then maybe we should get
> rid of the 'killall' sysrq option since it serves no useful purpose, and
> add a suitable panic in the do_exit path?

Ok, can someone explain *why* it is desirable to attempt to kill pid1
given that doing so will completely lock up the machine? (Should we
rename it to "Lockup" instead of "killalL"? 8)

We do have some tests in the do_exit() path to panic if/when init dies,
which rely on the init PID being '1'. Unfortunately, these don't trigger
because of the following bogosity in drivers/char/sysrq.c:

	if (p->pid == 1 && even_init)
		/* Ugly hack to kill init */
		p->pid = 0x8000;

So, I propose we get rid of this "ugly hack", and the alt-sysrq-l
option altogether - it would appear to serve no useful purpose.

Here is a patch that does just this. It should apply to 2.4.17 and 2.5.1
kernels fine (generated on 2.5.1).

--- orig/drivers/char/sysrq.c Wed Dec 12 11:37:40 2001
+++ linux/drivers/char/sysrq.c Mon Dec 24 12:19:58 2001
@@ -284,24 +284,20 @@

 /* signal sysrq helper function
  * Sends a signal to all user processes */
-static void send_sig_all(int sig, int even_init)
+static void send_sig_all(int sig)
 {
 	struct task_struct *p;

 	for_each_task(p) {
-		if (p->mm) { /* Not swapper nor kernel thread */
-			if (p->pid == 1 && even_init)
-				/* Ugly hack to kill init */
-				p->pid = 0x8000;
-			if (p->pid != 1)
-				force_sig(sig, p);
-		}
+		if (p->mm && p->pid != 1)
+			/* Not swapper, init nor kernel thread */
+			force_sig(sig, p);
 	}
 }

 static void sysrq_handle_term(int key, struct pt_regs *pt_regs,
 	struct kbd_struct *kbd, struct tty_struct *tty) {
-	send_sig_all(SIGTERM, 0);
+	send_sig_all(SIGTERM);
 	console_loglevel = 8;
 }
 static struct sysrq_key_op sysrq_term_op = {
@@ -312,7 +308,7 @@

 static void sysrq_handle_kill(int key, struct pt_regs *pt_regs,
 	struct kbd_struct *kbd, struct tty_struct *tty) {
-	send_sig_all(SIGKILL, 0);
+	send_sig_all(SIGKILL);
 	console_loglevel = 8;
 }
 static struct sysrq_key_op sysrq_kill_op = {
@@ -321,17 +317,6 @@
 	action_msg: "Kill All Tasks",
 };

-static void sysrq_handle_killall(int key, struct pt_regs *pt_regs,
-	struct kbd_struct *kbd, struct tty_struct *tty) {
-	send_sig_all(SIGKILL, 1);
-	console_loglevel = 8;
-}
-static struct sysrq_key_op sysrq_killall_op = {
-	handler: sysrq_handle_killall,
-	help_msg: "killalL",
-	action_msg: "Kill All Tasks (even init)",
-};
-
 /* END SIGNAL SYSRQ HANDLERS BLOCK */


@@ -366,7 +351,7 @@
 #else
 /* k */ NULL,
 #endif
-/* l */ &sysrq_killall_op,
+/* l */ NULL,
 /* m */ &sysrq_showmem_op,
 /* n */ NULL,
 /* o */ NULL, /* This will often be registered
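
The interaction between the pid rewrite and the do_exit() check can be
modelled in a few lines of user-space C. This is only a sketch: toy_proc and
the toy_* functions are hypothetical names, and toy_exit_would_panic() assumes
the check is a plain comparison of the pid against 1, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_HACK_PID 0x8000	/* value the sysrq hack writes into init's pid */

/* Toy process record; NOT the kernel's task_struct. */
struct toy_proc {
	int pid;
	bool has_mm;	/* false for swapper and kernel threads */
};

/* Models the exit-path test: only a task whose pid is still 1
 * is recognised as init. */
static bool toy_exit_would_panic(const struct toy_proc *p)
{
	return p->pid == 1;
}

/* Old send_sig_all() logic: returns true if the task gets the signal. */
static bool toy_old_signals(struct toy_proc *p, bool even_init)
{
	assert(p != NULL);
	if (!p->has_mm)
		return false;
	if (p->pid == 1 && even_init)
		p->pid = TOY_HACK_PID;	/* init stops looking like init */
	return p->pid != 1;
}

/* Patched logic: init is never signalled and its pid is never touched. */
static bool toy_new_signals(const struct toy_proc *p)
{
	return p->has_mm && p->pid != 1;
}
```

Once the old logic has rewritten the pid, toy_exit_would_panic() returns false
for init, which mirrors why the panic never triggers. With the patched logic
the pid stays 1 and init is simply skipped.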

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-12-24 14:28:10

by M. Edward Borasky

Subject: Re: Total system lockup with Alt-SysRQ-L

On Mon, 24 Dec 2001, Russell King wrote:

> The problem was definitely in the exit_notify code, where it
> manipulated the task links indefinitely. (I think it was cptr never
> becomes null, so the loop never terminates).
>
> However, if we're saying that "pid1 must not die" then maybe we should
> get rid of the 'killall' sysrq option since it serves no useful
> purpose, and add a suitable panic in the do_exit path?
>
> I'll generate a patch for that if there's interest.

What would be even better, and I think there may already be such an
option, would be a one-button "sync up all the disks, forbid any more
writes, save as much state as possible (registers, memory) to a swap
partition, set a flag for crash dump processing and reboot" capability.

--
M. Edward Borasky

[email protected]
http://www.borasky-research.net

If God had meant carrots to be eaten cooked, He would have given rabbits
fire.

2001-12-24 16:58:28

by Alan

Subject: Re: Total system lockup with Alt-SysRQ-L

> option, would be a one-button "sync up all the disks, forbid any more
> writes, save as much state as possible (registers, memory) to a swap
> partition, set a flag for crash dump processing and reboot" capability.

Very hard to do - you can't trust the I/O system's state, so the dump code
has to verify it hasn't been corrupted, reconfigure the drive it wishes to
write to, write the data out using its own non-interrupt-driven code and
then halt the box.

There are folks with patches that do a lot of that (lkcd)

2001-12-25 19:59:02

by Pavel Machek

Subject: Re: Total system lockup with Alt-SysRQ-L


Hi!

> We do have some tests in the do_exit() path to panic if/when init dies,
> which rely on the init PID being '1'. Unfortunately, these don't trigger
> because of the following bogosity in drivers/char/sysrq.c:
>
>	if (p->pid == 1 && even_init)
>		/* Ugly hack to kill init */
>		p->pid = 0x8000;
>
> So, I propose we get rid of this "ugly hack", and the alt-sysrq-l
> option altogether - it would appear to serve no useful purpose.

Ask mj if it was ever useful... But I guess it was not. Kill it.


--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

2001-12-25 20:00:32

by Pavel Machek

Subject: Re: Total system lockup with Alt-SysRQ-L

Hi!

> > option, would be a one-button "sync up all the disks, forbid any more
> > writes, save as much state as possible (registers, memory) to a swap
> > partition, set a flag for crash dump processing and reboot" capability.
>
> Very hard to do - you can't trust the I/O systems state so the dump code

Actually... swsusp should be usable for most of this... But swsusp will
not work in a bad state and I guess that's a showstopper.

--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

2001-12-28 01:00:41

by David Woodhouse

Subject: Re: Total system lockup with Alt-SysRQ-L


[email protected] said:
> Ok, can someone explain *why* it is desirable to attempt to kill pid1
> given that doing so will completely lockup the machine? (should we
> rename it to "Lockup" instead of "killalL"? 8)

It's not. I believe SysRq-L was implemented while Linux would still exhibit
sane behaviour upon pid1 dying, and was never removed when the current
brokenness was introduced.

--
dwmw2