LinuxLists.cc - [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

2009-07-20 12:45:20

Subject: [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

1) This fix breaks our tools.
This fix changes the ABI. panic_on_oops is default 0,
and a lots system do not specify the boot option "panic",
thus, Sysrq-c will not cause CrashDump(Kdump) as expected.

2) When CONFIG_KEXEC=n, Sysrq-c should become an invalid
command like Sysrq-D(CONFIG_LOCKDEP, show-all-locks).
But this fix makes it a valid command and let it do a
hazard thing: cause a page fault(NULL dereference) in kernel.

So, we revert this fix.

|commit d6580a9f15238b87e618310c862231ae3f352d2d
|Author: Neil Horman <[email protected]>
|Date: Wed Jun 17 16:28:17 2009 -0700

| kexec: sysrq: simplify sysrq-c handler

| Currently the sysrq-c handler is bit over-engineered. Its behavior is
| dependent on a few compile time and run time factors that alter its
| behavior which is really unnecessecary.

| If CONFIG_KEXEC is not configured, sysrq-c, crashes the system with a NULL
| pointer dereference. If CONFIG_KEXEC is configured, it calls crash_kexec
| directly, which implies that the kexec kernel will either be booted (if
| its been previously loaded), or it will simply do nothing (the no kexec
| kernel has been loaded).

| It would be much easier to just simplify the whole thing to dereference a
| NULL pointer all the time regardless of configuration. That way, it will
| always try to crash the system, and if a kexec kernel has been loaded into
| reserved space, it will still boot from the page fault trap handler
| (assuming panic_on_oops is set appropriately).

Signed-off-by: Lai Jiangshan <[email protected]>
---
diff --git a/drivers/char/sysrq.c b/drivers/char/sysrq.c
index 0db3585..39a05b5 100644
--- a/drivers/char/sysrq.c
+++ b/drivers/char/sysrq.c
@@ -121,17 +121,20 @@ static struct sysrq_key_op sysrq_unraw_op = {
#define sysrq_unraw_op (*(struct sysrq_key_op *)0)
#endif /* CONFIG_VT */

-static void sysrq_handle_crash(int key, struct tty_struct *tty)
+#ifdef CONFIG_KEXEC
+static void sysrq_handle_crashdump(int key, struct tty_struct *tty)
{
- char *killer = NULL;
- *killer = 1;
+ crash_kexec(get_irq_regs());
}
static struct sysrq_key_op sysrq_crashdump_op = {
- .handler = sysrq_handle_crash,
- .help_msg = "Crash",
- .action_msg = "Trigger a crash",
+ .handler = sysrq_handle_crashdump,
+ .help_msg = "Crashdump",
+ .action_msg = "Trigger a crashdump",
.enable_mask = SYSRQ_ENABLE_DUMP,
};
+#else
+#define sysrq_crashdump_op (*(struct sysrq_key_op *)0)
+#endif

static void sysrq_handle_reboot(int key, struct tty_struct *tty)
{

2009-07-20 19:23:06

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

Lai Jiangshan <[email protected]> writes:

> 1) This fix breaks our tools.
> This fix changes the ABI. panic_on_oops is default 0,
> and a lots system do not specify the boot option "panic",
> thus, Sysrq-c will not cause CrashDump(Kdump) as expected.

How does it break your tools?

> 2) When CONFIG_KEXEC=n, Sysrq-c should become an invalid
> command like Sysrq-D(CONFIG_LOCKDEP, show-all-locks).
> But this fix makes it a valid command and let it do a
> hazard thing: cause a page fault(NULL dereference) in kernel.
>
> So, we revert this fix.

The idea was to extend sysrq-d to also be a way of testing NULL
pointer dereferences. How is that a bad idea?

Eric

2009-07-20 21:17:20

by Neil Horman

[permalink] [raw]

Subject: Re: [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

On Mon, Jul 20, 2009 at 12:22:57PM -0700, Eric W. Biederman wrote:
> Lai Jiangshan <[email protected]> writes:
>
> > 1) This fix breaks our tools.
> > This fix changes the ABI. panic_on_oops is default 0,
> > and a lots system do not specify the boot option "panic",
> > thus, Sysrq-c will not cause CrashDump(Kdump) as expected.
>
> How does it break your tools?
>
Well, upstream doesn't really care about ABI stability, particularly in the
sysrq space. That aside however, it seems like sysrq-c is doing the right thing
with my patch in place, namely, attempting to crash the system. If the
panic_on_oops sysctl isn't set, then a crash fails, as expected (unlike the
prior behavior, which forced a kexec reboot of the system while ignoring the
sysctl, which seems like it would be labeled the unexpected behavior to me.
Regardless, it seems like the right thing to do if you want sysrq-c to do the
right thing from the start is set panic on the kernel command line. Not sure
what the problem is there.

> > 2) When CONFIG_KEXEC=n, Sysrq-c should become an invalid
> > command like Sysrq-D(CONFIG_LOCKDEP, show-all-locks).
> > But this fix makes it a valid command and let it do a
> > hazard thing: cause a page fault(NULL dereference) in kernel.
> >
> > So, we revert this fix.
>
> The idea was to extend sysrq-d to also be a way of testing NULL
> pointer dereferences. How is that a bad idea?
>
Agreed, about the only thing that I see as wrong with my change is that I
neglected to change the documentation. Prior to my change, the behavior was
completely muddled. Sysrq-c would do one of 3 things:

1) If kexec wasn't built into the kernel, it would do nothing
2) If kexec was built into the kernel but not enabled, it would try and fail to
execute a kdump
3) if kdump was enabled and configured, it would crash

Under the current implementation, you can always crash the kernel (assuming
you've enabled sysrq, and have permission to use it), which will trigger a kdump
(or just crash the system, which is usefull for development in and of its own
right), or will simply record and oops (if panic_on_oops is clear). The only
case that left open is booting a kdump kernel without handling a bad page fault,
which you can do from user space anyway via the kexec -e command. I fail to see
how the previous implementation is superior.
Neil

> Eric
>

2009-07-21 06:00:23

by Lai Jiangshan

[permalink] [raw]

Subject: Re: [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

Eric W. Biederman wrote:
> Lai Jiangshan <[email protected]> writes:
>
>> 1) This fix breaks our tools.
>> This fix changes the ABI. panic_on_oops is default 0,
>> and a lots system do not specify the boot option "panic",
>> thus, Sysrq-c will not cause CrashDump(Kdump) as expected.
>
> How does it break your tools?

Sysrq-c is known for causing a CrashDump.
This fix make Sysrq-c just causing an oops.
An oops in process context just kills current task and
does nothing. (panic_on_oops=0)

Why we let a cleanup patch changes the kernel behavior so much?

>
>> 2) When CONFIG_KEXEC=n, Sysrq-c should become an invalid
>> command like Sysrq-D(CONFIG_LOCKDEP, show-all-locks).
>> But this fix makes it a valid command and let it do a
>> hazard thing: cause a page fault(NULL dereference) in kernel.
>>
>> So, we revert this fix.
>
> The idea was to extend sysrq-d to also be a way of testing NULL
> pointer dereferences. How is that a bad idea?
>

When CONFIG_KEXEC=n, Crashdump is not available,
Sysrq-c should become an invalid command.

2009-07-21 06:46:08

by Lai Jiangshan

[permalink] [raw]

Subject: Re: [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

Neil Horman wrote:
> On Mon, Jul 20, 2009 at 12:22:57PM -0700, Eric W. Biederman wrote:
>> Lai Jiangshan <[email protected]> writes:
>>
>>> 1) This fix breaks our tools.
>>> This fix changes the ABI. panic_on_oops is default 0,
>>> and a lots system do not specify the boot option "panic",
>>> thus, Sysrq-c will not cause CrashDump(Kdump) as expected.
>> How does it break your tools?
>>
> Well, upstream doesn't really care about ABI stability, particularly in the
> sysrq space.

"simplify sysrq-c handler" sounds like a cleanup patch,
why we let a cleanup patch changes the ABI?

> That aside however, it seems like sysrq-c is doing the right thing
> with my patch in place, namely, attempting to crash the system. If the
> panic_on_oops sysctl isn't set, then a crash fails, as expected

The original name is "crashdump", so crash should be performed as expected.

(unlike the prior behavior, which forced a kexec reboot of the system while ignoring the
> sysctl, which seems like it would be labeled the unexpected behavior to me.
> Regardless, it seems like the right thing to do if you want sysrq-c to do the
> right thing from the start is set panic on the kernel command line. Not sure
> what the problem is there.
>
>>> 2) When CONFIG_KEXEC=n, Sysrq-c should become an invalid
>>> command like Sysrq-D(CONFIG_LOCKDEP, show-all-locks).
>>> But this fix makes it a valid command and let it do a
>>> hazard thing: cause a page fault(NULL dereference) in kernel.
>>>
>>> So, we revert this fix.
>> The idea was to extend sysrq-d to also be a way of testing NULL
>> pointer dereferences. How is that a bad idea?
>>
> Agreed, about the only thing that I see as wrong with my change is that I

also the naming.

> neglected to change the documentation. Prior to my change, the behavior was
> completely muddled. Sysrq-c would do one of 3 things:
>
> 1) If kexec wasn't built into the kernel, it would do nothing
> 2) If kexec was built into the kernel but not enabled, it would try and fail to
> execute a kdump
> 3) if kdump was enabled and configured, it would crash

I don't think it's muddled.
1) If kexec wasn't built into the kernel, IT'S NOT A VALID COMMAND.
2),3) It always try to crashdump. not oops, not normal panic.

>
> Under the current implementation, you can always crash the kernel (assuming
> you've enabled sysrq, and have permission to use it), which will trigger a kdump
> (or just crash the system, which is usefull for development in and of its own
> right), or will simply record and oops (if panic_on_oops is clear). The only
> case that left open is booting a kdump kernel without handling a bad page fault,
> which you can do from user space anyway via the kexec -e command. I fail to see
> how the previous implementation is superior.

Even I agreed your fix, I don't agreed your naming,
For your fix, the correct naming should be:

.help_msg = "oops(C)",
.action_msg = "Trigger an oops"

And document it:
Sysrq-c always causes an oops by an indirect way. It'll do one of 4 things:
1) panic_on_oops=0, it is just kill the current task.
2) panic_on_oops=1, but CONFIG_KEXEC=n, just normal panic
3) panic_on_oops=1, CONFIG_KEXEC=y, but not enabled, just normal panic
4) panic_on_oops=1, CONFIG_KEXEC=y, kdump was enabled, CrashDump.

Lai.

2009-07-21 06:49:53

by Hidetoshi Seto

[permalink] [raw]

Subject: Re: [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

Lai Jiangshan wrote:
> 1) This fix breaks our tools.
> This fix changes the ABI. panic_on_oops is default 0,
> and a lots system do not specify the boot option "panic",
> thus, Sysrq-c will not cause CrashDump(Kdump) as expected.
>
> 2) When CONFIG_KEXEC=n, Sysrq-c should become an invalid
> command like Sysrq-D(CONFIG_LOCKDEP, show-all-locks).
> But this fix makes it a valid command and let it do a
> hazard thing: cause a page fault(NULL dereference) in kernel.
>
> So, we revert this fix.
>
> |commit d6580a9f15238b87e618310c862231ae3f352d2d
> |Author: Neil Horman <[email protected]>
> |Date: Wed Jun 17 16:28:17 2009 -0700
>
> | kexec: sysrq: simplify sysrq-c handler
>
> | Currently the sysrq-c handler is bit over-engineered. Its behavior is
> | dependent on a few compile time and run time factors that alter its
> | behavior which is really unnecessecary.
>
> | If CONFIG_KEXEC is not configured, sysrq-c, crashes the system with a NULL
> | pointer dereference. If CONFIG_KEXEC is configured, it calls crash_kexec
> | directly, which implies that the kexec kernel will either be booted (if
> | its been previously loaded), or it will simply do nothing (the no kexec
> | kernel has been loaded).
>
> | It would be much easier to just simplify the whole thing to dereference a
> | NULL pointer all the time regardless of configuration. That way, it will
> | always try to crash the system, and if a kexec kernel has been loaded into
> | reserved space, it will still boot from the page fault trap handler
> | (assuming panic_on_oops is set appropriately).
>
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---

FYI, this problem has already pointed by Ohmichi-san and this will be an
another patch for the following discussion:
http://lists.infradead.org/pipermail/kexec/2009-July/003433.html
You can find my sloppy memo in:
http://lists.infradead.org/pipermail/kexec/2009-July/003443.html

I agree with you that SysRq-'c' is well known as for 'C'rashdump, and it is
not expected as 'C'rash without dump.

Thanks,
H.Seto

2009-07-21 06:56:32

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

Lai Jiangshan <[email protected]> writes:

> Eric W. Biederman wrote:
>> Lai Jiangshan <[email protected]> writes:
>>
>>> 1) This fix breaks our tools.
>>> This fix changes the ABI. panic_on_oops is default 0,
>>> and a lots system do not specify the boot option "panic",
>>> thus, Sysrq-c will not cause CrashDump(Kdump) as expected.
>>
>> How does it break your tools?
>
> Sysrq-c is known for causing a CrashDump.
> This fix make Sysrq-c just causing an oops.
> An oops in process context just kills current task and
> does nothing. (panic_on_oops=0)
>
> Why we let a cleanup patch changes the kernel behavior so much?

Because it seemed reasonable to be able to test the oops path
as well.

>>> 2) When CONFIG_KEXEC=n, Sysrq-c should become an invalid
>>> command like Sysrq-D(CONFIG_LOCKDEP, show-all-locks).
>>> But this fix makes it a valid command and let it do a
>>> hazard thing: cause a page fault(NULL dereference) in kernel.
>>>
>>> So, we revert this fix.
>>
>> The idea was to extend sysrq-d to also be a way of testing NULL
>> pointer dereferences. How is that a bad idea?
>>
>
> When CONFIG_KEXEC=n, Crashdump is not available,
> Sysrq-c should become an invalid command.

Why should sysrq-c become an invalid command?

What problem does this cause for you?

Eric

2009-07-21 11:09:28

by Neil Horman

[permalink] [raw]

Subject: Re: [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

On Tue, Jul 21, 2009 at 03:49:22PM +0900, Hidetoshi Seto wrote:
> Lai Jiangshan wrote:
> > 1) This fix breaks our tools.
> > This fix changes the ABI. panic_on_oops is default 0,
> > and a lots system do not specify the boot option "panic",
> > thus, Sysrq-c will not cause CrashDump(Kdump) as expected.
> >
> > 2) When CONFIG_KEXEC=n, Sysrq-c should become an invalid
> > command like Sysrq-D(CONFIG_LOCKDEP, show-all-locks).
> > But this fix makes it a valid command and let it do a
> > hazard thing: cause a page fault(NULL dereference) in kernel.
> >
> > So, we revert this fix.
> >
> > |commit d6580a9f15238b87e618310c862231ae3f352d2d
> > |Author: Neil Horman <[email protected]>
> > |Date: Wed Jun 17 16:28:17 2009 -0700
> >
> > | kexec: sysrq: simplify sysrq-c handler
> >
> > | Currently the sysrq-c handler is bit over-engineered. Its behavior is
> > | dependent on a few compile time and run time factors that alter its
> > | behavior which is really unnecessecary.
> >
> > | If CONFIG_KEXEC is not configured, sysrq-c, crashes the system with a NULL
> > | pointer dereference. If CONFIG_KEXEC is configured, it calls crash_kexec
> > | directly, which implies that the kexec kernel will either be booted (if
> > | its been previously loaded), or it will simply do nothing (the no kexec
> > | kernel has been loaded).
> >
> > | It would be much easier to just simplify the whole thing to dereference a
> > | NULL pointer all the time regardless of configuration. That way, it will
> > | always try to crash the system, and if a kexec kernel has been loaded into
> > | reserved space, it will still boot from the page fault trap handler
> > | (assuming panic_on_oops is set appropriately).
> >
> >
> > Signed-off-by: Lai Jiangshan <[email protected]>
> > ---
>
> FYI, this problem has already pointed by Ohmichi-san and this will be an
> another patch for the following discussion:
> http://lists.infradead.org/pipermail/kexec/2009-July/003433.html
> You can find my sloppy memo in:
> http://lists.infradead.org/pipermail/kexec/2009-July/003443.html
>
> I agree with you that SysRq-'c' is well known as for 'C'rashdump, and it is
> not expected as 'C'rash without dump.
>
>
> Thanks,
> H.Seto
>
>
None of this answers Erics question, what is it that you could do before, that
you couldn't do now? There are reasons to want to have a convienient way to
crash the kernel, other than to test kdump (several distributions have augmented
sysrq-c to do this for some time to test other previous dump mechanisms and
features), so while its not been upstream, saying that its well known to test
kdump without causing an oops is a bit of a misleading statement. It seems to
me that right now your major complaint is that the documentation is out of date,
and you're having to do things slightly differently to get the same behavioral
results. Would it solve your issue, if we simply updated the documentation to
illustrate how it works now?

Neil

2009-07-21 12:16:17

by Lai Jiangshan

[permalink] [raw]

Subject: Re: [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

Neil Horman wrote:
> On Tue, Jul 21, 2009 at 03:49:22PM +0900, Hidetoshi Seto wrote:
>> Lai Jiangshan wrote:
>>> 1) This fix breaks our tools.
>>> This fix changes the ABI. panic_on_oops is default 0,
>>> and a lots system do not specify the boot option "panic",
>>> thus, Sysrq-c will not cause CrashDump(Kdump) as expected.
>>>
>>> 2) When CONFIG_KEXEC=n, Sysrq-c should become an invalid
>>> command like Sysrq-D(CONFIG_LOCKDEP, show-all-locks).
>>> But this fix makes it a valid command and let it do a
>>> hazard thing: cause a page fault(NULL dereference) in kernel.
>>>
>>> So, we revert this fix.
>>>
>>> |commit d6580a9f15238b87e618310c862231ae3f352d2d
>>> |Author: Neil Horman <[email protected]>
>>> |Date: Wed Jun 17 16:28:17 2009 -0700
>>>
>>> | kexec: sysrq: simplify sysrq-c handler
>>>
>>> | Currently the sysrq-c handler is bit over-engineered. Its behavior is
>>> | dependent on a few compile time and run time factors that alter its
>>> | behavior which is really unnecessecary.
>>>
>>> | If CONFIG_KEXEC is not configured, sysrq-c, crashes the system with a NULL
>>> | pointer dereference. If CONFIG_KEXEC is configured, it calls crash_kexec
>>> | directly, which implies that the kexec kernel will either be booted (if
>>> | its been previously loaded), or it will simply do nothing (the no kexec
>>> | kernel has been loaded).
>>>
>>> | It would be much easier to just simplify the whole thing to dereference a
>>> | NULL pointer all the time regardless of configuration. That way, it will
>>> | always try to crash the system, and if a kexec kernel has been loaded into
>>> | reserved space, it will still boot from the page fault trap handler
>>> | (assuming panic_on_oops is set appropriately).
>>>
>>>
>>> Signed-off-by: Lai Jiangshan <[email protected]>
>>> ---
>> FYI, this problem has already pointed by Ohmichi-san and this will be an
>> another patch for the following discussion:
>> http://lists.infradead.org/pipermail/kexec/2009-July/003433.html
>> You can find my sloppy memo in:
>> http://lists.infradead.org/pipermail/kexec/2009-July/003443.html
>>
>> I agree with you that SysRq-'c' is well known as for 'C'rashdump, and it is
>> not expected as 'C'rash without dump.
>>
>>
>> Thanks,
>> H.Seto
>>
>>
> None of this answers Erics question, what is it that you could do before, that
> you couldn't do now? There are reasons to want to have a convienient way to
> crash the kernel, other than to test kdump (several distributions have augmented
> sysrq-c to do this for some time to test other previous dump mechanisms and
> features), so while its not been upstream, saying that its well known to test
> kdump without causing an oops is a bit of a misleading statement. It seems to
> me that right now your major complaint is that the documentation is out of date,
> and you're having to do things slightly differently to get the same behavioral
> results.

I totally agreed that we just do things slightly differently
to get the same behavioral results. And we did it as you said.

So, Actually, our problem has been solved as you said
before I send the patch.

I just thought your fix is not proper for the kernel.

> Would it solve your issue, if we simply updated the documentation to
> illustrate how it works now?
>

Also the naming should be updated.
Your Sysrq-c triggers an oops if I didn't misunderstand your fix.

Lai

---------
Quote from Documentation/sysrq.txt
"""
It is a 'magical' key combo you can hit which the kernel will
respond to regardless of whatever else it is doing, unless
it is completely locked up.
"""

We can image this condition, the kernel is close to dead,
we can't type any command to set panic_on_oops to 1.
so we try to get the kernel's core file and analyze it some day,
but sysrq-c just print something without crashdown.

How can we do now? SysRq becomes not so magical.
----------

2009-07-21 22:18:41

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

Lai Jiangshan <[email protected]> writes:

> Even I agreed your fix, I don't agreed your naming,
> For your fix, the correct naming should be:
>
> .help_msg = "oops(C)",
> .action_msg = "Trigger an oops"
>
> And document it:
> Sysrq-c always causes an oops by an indirect way. It'll do one of 4 things:
> 1) panic_on_oops=0, it is just kill the current task.
> 2) panic_on_oops=1, but CONFIG_KEXEC=n, just normal panic
> 3) panic_on_oops=1, CONFIG_KEXEC=y, but not enabled, just normal panic
> 4) panic_on_oops=1, CONFIG_KEXEC=y, kdump was enabled, CrashDump.

That sounds like a great way to sort out the understanding.

Acked-by: "Eric W. Biederman" <[email protected]>

Could you turn that into a proper patch?

Eric

2009-07-22 02:02:10

by Hidetoshi Seto

[permalink] [raw]

Subject: Re: [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

Neil Horman wrote:
> None of this answers Erics question, what is it that you could do before, that
> you couldn't do now?

One is, as Ohmichi-san pointed, triggering kdump via echo c > /proc/sysrq-trigger.

In contrast to oops via SysRq-c from keyboard interrupt which results in
panic due to in_interrupt(), oops via echo-c will not become panic unless
panic_on_oops.

So in other words, we could expect same effect in both of echo-c and SysRq-c
before, but now we cannot because it depends on the panic_on_oops.
Isn't it a regression?

Whether kdump should be executed on oops (which is not panic) or not is a
separate thing.

> There are reasons to want to have a convenient way to
> crash the kernel, other than to test kdump (several distributions have augmented
> sysrq-c to do this for some time to test other previous dump mechanisms and
> features), so while its not been upstream, saying that its well known to test
> kdump without causing an oops is a bit of a misleading statement.

Let make me sure the difference between 'crash', 'oops', and 'panic'.
At least 'oops' is not panic, as is obvious from the name of panic_on_oops.
And it seems you are using 'crash' and 'oops' in mixture.

If you mean 'crash' as 'panic', my complaint is echo-c does not panic while
SysRq-c does panic. So if possible I'd like to suggest a change like:

static void sysrq_handle_crash(int key, struct tty_struct *tty)
{
- char *killer = NULL;
- *killer = 1;
+ panic("SysRq-triggered panic!\n");
}

I agree that causing a real crash(panic) is better way to test crashdump than
calling the entry function of the crashdump directly, and also that opening
the path for other dump mechanisms is welcomed.

> It seems to
> me that right now your major complaint is that the documentation is out of date,
> and you're having to do things slightly differently to get the same behavioral
> results. Would it solve your issue, if we simply updated the documentation to
> illustrate how it works now?

Of course the documentation should be updated asap.
But I think the major complaint is about a difference in the behaviors of SysRq-c
and "echo c > /proc/sysrq-trigger".

Thanks,
H.Seto

2009-07-22 11:11:27

by Neil Horman

[permalink] [raw]

Subject: Re: [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

On Wed, Jul 22, 2009 at 11:01:29AM +0900, Hidetoshi Seto wrote:
> Neil Horman wrote:
> > None of this answers Erics question, what is it that you could do before, that
> > you couldn't do now?
>
> One is, as Ohmichi-san pointed, triggering kdump via echo c > /proc/sysrq-trigger.
>
> In contrast to oops via SysRq-c from keyboard interrupt which results in
> panic due to in_interrupt(), oops via echo-c will not become panic unless
> panic_on_oops.
>
> So in other words, we could expect same effect in both of echo-c and SysRq-c
> before, but now we cannot because it depends on the panic_on_oops.
> Isn't it a regression?
>
Only if you blindly consider a change in behavior to be a regression. consider
that previously executing a sysrq-c did the same thing if you did a echo c >
/proc/sysrq-trigger on a keyboard sysrq-c, but did different things based on
weather or not your had a kexec kernel loaded.

> Whether kdump should be executed on oops (which is not panic) or not is a
> separate thing.
>
> > There are reasons to want to have a convenient way to
> > crash the kernel, other than to test kdump (several distributions have augmented
> > sysrq-c to do this for some time to test other previous dump mechanisms and
> > features), so while its not been upstream, saying that its well known to test
> > kdump without causing an oops is a bit of a misleading statement.
>
> Let make me sure the difference between 'crash', 'oops', and 'panic'.
> At least 'oops' is not panic, as is obvious from the name of panic_on_oops.
> And it seems you are using 'crash' and 'oops' in mixture.
>
I'm perfectly well aware of the difference, I just assert theres value to having
sysrq-c be able to test both paths, especially given that we already have the
sysrq-c sysctl available to toggle behavior for just this case.

> If you mean 'crash' as 'panic', my complaint is echo-c does not panic while
> SysRq-c does panic. So if possible I'd like to suggest a change like:
>
See above, I think theres value to having sysrq-c be able to do both, although I
agree the method by which it triggers both is a bit muddled.

> static void sysrq_handle_crash(int key, struct tty_struct *tty)
> {
> - char *killer = NULL;
> - *killer = 1;
> + panic("SysRq-triggered panic!\n");
> }
>
Well, this removes the ability from sysrq-c to test the oops handling path, but
I suppose it does buy us consistent behavior between the keyboard and proc
interfaces, which is likely more important. I can agree to that. Perhaps we
can create another sysctl to test the oops path later.

> I agree that causing a real crash(panic) is better way to test crashdump than
> calling the entry function of the crashdump directly, and also that opening
> the path for other dump mechanisms is welcomed.
>
Ok, so we're in line there :)

> > It seems to
> > me that right now your major complaint is that the documentation is out of date,
> > and you're having to do things slightly differently to get the same behavioral
> > results. Would it solve your issue, if we simply updated the documentation to
> > illustrate how it works now?
>
> Of course the documentation should be updated asap.
> But I think the major complaint is about a difference in the behaviors of SysRq-c
> and "echo c > /proc/sysrq-trigger".
>
Ok, I can agree with that. I'd support a change like what you have above to
bring the keyboard and proc interface behavior in line.

Regards
Neil

>
> Thanks,
> H.Seto
>
>

2009-07-22 13:42:30

by Vivek Goyal

[permalink] [raw]

Subject: Re: [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

On Wed, Jul 22, 2009 at 07:10:49AM -0400, Neil Horman wrote:
> On Wed, Jul 22, 2009 at 11:01:29AM +0900, Hidetoshi Seto wrote:
> > Neil Horman wrote:
> > > None of this answers Erics question, what is it that you could do before, that
> > > you couldn't do now?
> >
> > One is, as Ohmichi-san pointed, triggering kdump via echo c > /proc/sysrq-trigger.
> >
> > In contrast to oops via SysRq-c from keyboard interrupt which results in
> > panic due to in_interrupt(), oops via echo-c will not become panic unless
> > panic_on_oops.
> >
> > So in other words, we could expect same effect in both of echo-c and SysRq-c
> > before, but now we cannot because it depends on the panic_on_oops.
> > Isn't it a regression?
> >
> Only if you blindly consider a change in behavior to be a regression. consider
> that previously executing a sysrq-c did the same thing if you did a echo c >
> /proc/sysrq-trigger on a keyboard sysrq-c, but did different things based on
> weather or not your had a kexec kernel loaded.
>
> > Whether kdump should be executed on oops (which is not panic) or not is a
> > separate thing.
> >
> > > There are reasons to want to have a convenient way to
> > > crash the kernel, other than to test kdump (several distributions have augmented
> > > sysrq-c to do this for some time to test other previous dump mechanisms and
> > > features), so while its not been upstream, saying that its well known to test
> > > kdump without causing an oops is a bit of a misleading statement.
> >
> > Let make me sure the difference between 'crash', 'oops', and 'panic'.
> > At least 'oops' is not panic, as is obvious from the name of panic_on_oops.
> > And it seems you are using 'crash' and 'oops' in mixture.
> >
> I'm perfectly well aware of the difference, I just assert theres value to having
> sysrq-c be able to test both paths, especially given that we already have the
> sysrq-c sysctl available to toggle behavior for just this case.
>
> > If you mean 'crash' as 'panic', my complaint is echo-c does not panic while
> > SysRq-c does panic. So if possible I'd like to suggest a change like:
> >
> See above, I think theres value to having sysrq-c be able to do both, although I
> agree the method by which it triggers both is a bit muddled.
>
> > static void sysrq_handle_crash(int key, struct tty_struct *tty)
> > {
> > - char *killer = NULL;
> > - *killer = 1;
> > + panic("SysRq-triggered panic!\n");
> > }
> >

> Well, this removes the ability from sysrq-c to test the oops handling path, but
> I suppose it does buy us consistent behavior between the keyboard and proc
> interfaces, which is likely more important. I can agree to that. Perhaps we
> can create another sysctl to test the oops path later.
>

Can't we just set panic_on_oops = 1 in sysrq_handle_crash()? This will
make sure that we test oops path as well as have consistent behavior
between two methods of sysrc-c invocation.

Thanks
Vivek

> > I agree that causing a real crash(panic) is better way to test crashdump than
> > calling the entry function of the crashdump directly, and also that opening
> > the path for other dump mechanisms is welcomed.
> >
> Ok, so we're in line there :)
>
> > > It seems to
> > > me that right now your major complaint is that the documentation is out of date,
> > > and you're having to do things slightly differently to get the same behavioral
> > > results. Would it solve your issue, if we simply updated the documentation to
> > > illustrate how it works now?
> >
> > Of course the documentation should be updated asap.
> > But I think the major complaint is about a difference in the behaviors of SysRq-c
> > and "echo c > /proc/sysrq-trigger".
> >
> Ok, I can agree with that. I'd support a change like what you have above to
> bring the keyboard and proc interface behavior in line.
>
> Regards
> Neil
>
>
> >
> > Thanks,
> > H.Seto
> >
> >

2009-07-22 19:39:05

by Neil Horman

[permalink] [raw]

Subject: Re: [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

On Wed, Jul 22, 2009 at 09:42:11AM -0400, Vivek Goyal wrote:
> On Wed, Jul 22, 2009 at 07:10:49AM -0400, Neil Horman wrote:
> > On Wed, Jul 22, 2009 at 11:01:29AM +0900, Hidetoshi Seto wrote:
> > > Neil Horman wrote:
> > > > None of this answers Erics question, what is it that you could do before, that
> > > > you couldn't do now?
> > >
> > > One is, as Ohmichi-san pointed, triggering kdump via echo c > /proc/sysrq-trigger.
> > >
> > > In contrast to oops via SysRq-c from keyboard interrupt which results in
> > > panic due to in_interrupt(), oops via echo-c will not become panic unless
> > > panic_on_oops.
> > >
> > > So in other words, we could expect same effect in both of echo-c and SysRq-c
> > > before, but now we cannot because it depends on the panic_on_oops.
> > > Isn't it a regression?
> > >
> > Only if you blindly consider a change in behavior to be a regression. consider
> > that previously executing a sysrq-c did the same thing if you did a echo c >
> > /proc/sysrq-trigger on a keyboard sysrq-c, but did different things based on
> > weather or not your had a kexec kernel loaded.
> >
> > > Whether kdump should be executed on oops (which is not panic) or not is a
> > > separate thing.
> > >
> > > > There are reasons to want to have a convenient way to
> > > > crash the kernel, other than to test kdump (several distributions have augmented
> > > > sysrq-c to do this for some time to test other previous dump mechanisms and
> > > > features), so while its not been upstream, saying that its well known to test
> > > > kdump without causing an oops is a bit of a misleading statement.
> > >
> > > Let make me sure the difference between 'crash', 'oops', and 'panic'.
> > > At least 'oops' is not panic, as is obvious from the name of panic_on_oops.
> > > And it seems you are using 'crash' and 'oops' in mixture.
> > >
> > I'm perfectly well aware of the difference, I just assert theres value to having
> > sysrq-c be able to test both paths, especially given that we already have the
> > sysrq-c sysctl available to toggle behavior for just this case.
> >
> > > If you mean 'crash' as 'panic', my complaint is echo-c does not panic while
> > > SysRq-c does panic. So if possible I'd like to suggest a change like:
> > >
> > See above, I think theres value to having sysrq-c be able to do both, although I
> > agree the method by which it triggers both is a bit muddled.
> >
> > > static void sysrq_handle_crash(int key, struct tty_struct *tty)
> > > {
> > > - char *killer = NULL;
> > > - *killer = 1;
> > > + panic("SysRq-triggered panic!\n");
> > > }
> > >
>
> > Well, this removes the ability from sysrq-c to test the oops handling path, but
> > I suppose it does buy us consistent behavior between the keyboard and proc
> > interfaces, which is likely more important. I can agree to that. Perhaps we
> > can create another sysctl to test the oops path later.
> >
>
> Can't we just set panic_on_oops = 1 in sysrq_handle_crash()? This will
> make sure that we test oops path as well as have consistent behavior
> between two methods of sysrc-c invocation.
>
Thats a good point too, seems simpler than the other approach.
Neil

> Thanks
> Vivek
>
> > > I agree that causing a real crash(panic) is better way to test crashdump than
> > > calling the entry function of the crashdump directly, and also that opening
> > > the path for other dump mechanisms is welcomed.
> > >
> > Ok, so we're in line there :)
> >
> > > > It seems to
> > > > me that right now your major complaint is that the documentation is out of date,
> > > > and you're having to do things slightly differently to get the same behavioral
> > > > results. Would it solve your issue, if we simply updated the documentation to
> > > > illustrate how it works now?
> > >
> > > Of course the documentation should be updated asap.
> > > But I think the major complaint is about a difference in the behaviors of SysRq-c
> > > and "echo c > /proc/sysrq-trigger".
> > >
> > Ok, I can agree with that. I'd support a change like what you have above to
> > bring the keyboard and proc interface behavior in line.
> >
> > Regards
> > Neil
> >
> >
> > >
> > > Thanks,
> > > H.Seto
> > >
> > >
>

2009-07-23 01:10:03

by Hidetoshi Seto

[permalink] [raw]

Subject: Re: [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

Neil Horman wrote:
> On Wed, Jul 22, 2009 at 11:01:29AM +0900, Hidetoshi Seto wrote:
>> Neil Horman wrote:
>> static void sysrq_handle_crash(int key, struct tty_struct *tty)
>> {
>> - char *killer = NULL;
>> - *killer = 1;
>> + panic("SysRq-triggered panic!\n");
>> }
>>
> Well, this removes the ability from sysrq-c to test the oops handling path, but
> I suppose it does buy us consistent behavior between the keyboard and proc
> interfaces, which is likely more important. I can agree to that. Perhaps we
> can create another sysctl to test the oops path later.
>
>> I agree that causing a real crash(panic) is better way to test crashdump than
>> calling the entry function of the crashdump directly, and also that opening
>> the path for other dump mechanisms is welcomed.
>>
> Ok, so we're in line there :)
>
>>> It seems to
>>> me that right now your major complaint is that the documentation is out of date,
>>> and you're having to do things slightly differently to get the same behavioral
>>> results. Would it solve your issue, if we simply updated the documentation to
>>> illustrate how it works now?
>> Of course the documentation should be updated asap.
>> But I think the major complaint is about a difference in the behaviors of SysRq-c
>> and "echo c > /proc/sysrq-trigger".
>>
> Ok, I can agree with that. I'd support a change like what you have above to
> bring the keyboard and proc interface behavior in line.

Thank you for your understanding!

I'll write & post a patch soon. Please review it.

Thanks,
H.Seto

2009-07-23 01:10:50

by Hidetoshi Seto

[permalink] [raw]

Subject: Re: [PATCH] sysrq, kdump: fix regression, revert "simplify sysrq-c handler"

Neil Horman wrote:
> On Wed, Jul 22, 2009 at 09:42:11AM -0400, Vivek Goyal wrote:
>> Can't we just set panic_on_oops = 1 in sysrq_handle_crash()? This will
>> make sure that we test oops path as well as have consistent behavior
>> between two methods of sysrc-c invocation.
>>
> Thats a good point too, seems simpler than the other approach.

OK, I'll take this way. Please wait for a moment for a patch.

Thanks,
H.Seto