2014-11-18 12:09:33

by Kiran Kumar Raparthy

Subject: [RFC] debug: add parameters to prevent entering debug mode on errors

From: Colin Cross <[email protected]>

debug: add parameters to prevent entering debug mode on errors

On non-developer devices kgdb prevents CONFIG_PANIC_TIMEOUT from rebooting the
device after a panic. Add module parameters debug_core.break_on_exception and
debug_core.break_on_panic to allow skipping debug on exceptions and panics
respectively. Both default to true to preserve existing behavior.

Cc: Jason Wessel <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Android Kernel Team <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Sumit Semwal <[email protected]>
Signed-off-by: Colin Cross <[email protected]>
[Kiran: Added context to commit message]
Signed-off-by: Kiran Raparthy <[email protected]>
---
This is one of a number of patches from the Android AOSP common.git tree,
which is used on almost all Android devices. I wanted to submit it for review
to see if it should go upstream.

kernel/debug/debug_core.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 1adf62b..af06122 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -87,6 +87,10 @@ static int kgdb_use_con;
 bool dbg_is_early = true;
 /* Next cpu to become the master debug core */
 int dbg_switch_cpu;
+/* Flag for entering kdb when a panic occurs */
+static bool break_on_panic = true;
+/* Flag for entering kdb when an exception occurs */
+static bool break_on_exception = true;

 /* Use kdb or gdbserver mode */
 int dbg_kdb_mode = 1;
@@ -101,6 +105,8 @@ early_param("kgdbcon", opt_kgdb_con);

 module_param(kgdb_use_con, int, 0644);
 module_param(kgdbreboot, int, 0644);
+module_param(break_on_panic, bool, 0644);
+module_param(break_on_exception, bool, 0644);

 /*
  * Holds information about breakpoints in a kernel. These breakpoints are
@@ -690,6 +696,9 @@ kgdb_handle_exception(int evector, int signo, int ecode, struct pt_regs *regs)
 	if (arch_kgdb_ops.enable_nmi)
 		arch_kgdb_ops.enable_nmi(0);

+	if (unlikely(signo != SIGTRAP && !break_on_exception))
+		return 1;
+
 	memset(ks, 0, sizeof(struct kgdb_state));
 	ks->cpu = raw_smp_processor_id();
 	ks->ex_vector = evector;
@@ -821,6 +830,9 @@ static int kgdb_panic_event(struct notifier_block *self,
 			    unsigned long val,
 			    void *data)
 {
+	if (!break_on_panic)
+		return NOTIFY_DONE;
+
 	if (dbg_kdb_mode)
 		kdb_printf("PANIC: %s\n", (char *)data);
 	kgdb_breakpoint();
--
1.8.2.1


2014-11-18 17:13:17

by Daniel Thompson

Subject: Re: [RFC] debug: add parameters to prevent entering debug mode on errors

On 18/11/14 12:08, Kiran Kumar Raparthy wrote:
> From: Colin Cross <[email protected]>
>
> debug: add parameters to prevent entering debug mode on errors
>
> On non-developer devices kgdb prevents CONFIG_PANIC_TIMEOUT from rebooting the
> device after a panic. Add module parameters debug_core.break_on_exception and
> debug_core.break_on_panic to allow skipping debug on exceptions and panics
> respectively. Both default to true to preserve existing behavior.

I am a little unsure about break_on_panic.

It ought to be possible for kgdb/kdb to honour CONFIG_PANIC_TIMEOUT by
tracking how long it takes for the user to attach a debugger (or to run
the first kdb command after the panic). As it happens the timeout value
is already an exported kernel symbol so all the info is there for us to
use...

Doing so would save us imposing further configuration burden on the user
(although it would be a good deal more code).

Note that I can't think of an automatic way to handle break_on_exception
so I'm less worried about that one.
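
As a rough illustration of the timeout idea above, here is a userspace sketch in C (not kernel code). The struct, the enum and panic_wait_tick() are hypothetical names invented for this example; only the meaning of the timeout value (0 = never reboot, positive = reboot after that many seconds, negative = reboot immediately) is taken from how panic_timeout behaves.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct panic_wait {
	int timeout_s;    /* stands in for panic_timeout */
	int waited_s;     /* seconds spent waiting so far */
	bool user_active; /* set once a debugger attaches or a kdb command runs */
};

enum panic_action { KEEP_WAITING, STAY_IN_DEBUGGER, REBOOT };

/* Decide what to do after one more second of waiting for the user. */
static enum panic_action panic_wait_tick(struct panic_wait *w)
{
	if (w->user_active)
		return STAY_IN_DEBUGGER; /* someone is debugging: do not reboot */
	if (w->timeout_s == 0)
		return KEEP_WAITING;     /* 0 means no automatic reboot */
	if (w->timeout_s < 0)
		return REBOOT;           /* negative means reboot immediately */
	if (++w->waited_s >= w->timeout_s)
		return REBOOT;           /* nobody showed up in time */
	return KEEP_WAITING;
}
```

With a timeout of 2, the first tick keeps waiting and the second one requests a reboot, unless user_active was set in between.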


> Cc: Jason Wessel <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: Android Kernel Team <[email protected]>
> Cc: John Stultz <[email protected]>
> Cc: Sumit Semwal <[email protected]>
> Signed-off-by: Colin Cross <[email protected]>
> [Kiran: Added context to commit message]
> Signed-off-by: Kiran Raparthy <[email protected]>
> ---
> This is one of a number of patches from the Android AOSP common.git tree,
> which is used on almost all Android devices. I wanted to submit it for review
> to see if it should go upstream.
>
> kernel/debug/debug_core.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index 1adf62b..af06122 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -87,6 +87,10 @@ static int kgdb_use_con;
> bool dbg_is_early = true;
> /* Next cpu to become the master debug core */
> int dbg_switch_cpu;
> +/* Flag for entering kdb when a panic occurs */
> +static bool break_on_panic = true;
> +/* Flag for entering kdb when an exception occurs */
> +static bool break_on_exception = true;
>
> /* Use kdb or gdbserver mode */
> int dbg_kdb_mode = 1;
> @@ -101,6 +105,8 @@ early_param("kgdbcon", opt_kgdb_con);
>
> module_param(kgdb_use_con, int, 0644);
> module_param(kgdbreboot, int, 0644);
> +module_param(break_on_panic, bool, 0644);
> +module_param(break_on_exception, bool, 0644);

kgdbreboot, which controls whether or not to trap into kgdb during
reboot, has a similar purpose to these new parameters. Perhaps any new
symbols should follow a similar naming scheme.

> /*
> * Holds information about breakpoints in a kernel. These breakpoints are
> @@ -690,6 +696,9 @@ kgdb_handle_exception(int evector, int signo, int ecode, struct pt_regs *regs)
> if (arch_kgdb_ops.enable_nmi)
> arch_kgdb_ops.enable_nmi(0);
>
> + if (unlikely(signo != SIGTRAP && !break_on_exception))

This is nit picking but...

There should be no need to predict branch results here. It's not
performance critical and anyway the "fast" path implied by the
unlikely() results in all CPUs screaming to a halt and burning up
millions of cycles waiting for user input.
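
For reference, the condition without the branch hint could be modelled like this. It is a userspace sketch only: skip_debugger() is a hypothetical helper name, and the flag is passed as an argument here instead of being read from the module parameter.

```c
#include <signal.h>
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the patch's condition, minus unlikely().
 * Returns true when the exception should bypass the debugger.
 */
static bool skip_debugger(int signo, bool break_on_exception)
{
	/* SIGTRAP (breakpoints, single-step) always enters the debugger. */
	return signo != SIGTRAP && !break_on_exception;
}
```

The behaviour is unchanged: only non-SIGTRAP exceptions are skipped, and only when break_on_exception is false.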

> + return 1;
> +
> memset(ks, 0, sizeof(struct kgdb_state));
> ks->cpu = raw_smp_processor_id();
> ks->ex_vector = evector;
> @@ -821,6 +830,9 @@ static int kgdb_panic_event(struct notifier_block *self,
> unsigned long val,
> void *data)
> {
> + if (!break_on_panic)
> + return NOTIFY_DONE;
> +
> if (dbg_kdb_mode)
> kdb_printf("PANIC: %s\n", (char *)data);
> kgdb_breakpoint();
>

2014-11-20 08:18:12

by Kiran Kumar Raparthy

Subject: Re: [RFC] debug: add parameters to prevent entering debug mode on errors

Hi Daniel,

On 18 November 2014 22:43, Daniel Thompson <[email protected]> wrote:
> On 18/11/14 12:08, Kiran Kumar Raparthy wrote:
>> From: Colin Cross <[email protected]>
>>
>> debug: add parameters to prevent entering debug mode on errors
>>
>> On non-developer devices kgdb prevents CONFIG_PANIC_TIMEOUT from rebooting the
>> device after a panic. Add module parameters debug_core.break_on_exception and
>> debug_core.break_on_panic to allow skipping debug on exceptions and panics
>> respectively. Both default to true to preserve existing behavior.
>
> I am a little unsure about break_on_panic.
>
> It ought to be possible for kgdb/kdb to honour CONFIG_PANIC_TIMEOUT by
> tracking how long it takes for the user to attach a debugger (or to run
> the first kdb command after the panic). As it happens the timeout value
> is already an exported kernel symbol so all the info is there for us to
> use...
>
> Doing so would save us imposing further configuration burden on the user
> (although it would be a good deal more code).
>
> Note that I can't think of an automatic way to handle break_on_exception
> so I'm less worried about that one.
Alright, so is it okay if we limit this mechanism to "skip debug
on exceptions"?
Please let me know if I have misunderstood your point.
>
>
>> Cc: Jason Wessel <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: Android Kernel Team <[email protected]>
>> Cc: John Stultz <[email protected]>
>> Cc: Sumit Semwal <[email protected]>
>> Signed-off-by: Colin Cross <[email protected]>
>> [Kiran: Added context to commit message]
>> Signed-off-by: Kiran Raparthy <[email protected]>
>> ---
>> This is one of a number of patches from the Android AOSP common.git tree,
>> which is used on almost all Android devices. I wanted to submit it for review
>> to see if it should go upstream.
>>
>> kernel/debug/debug_core.c | 12 ++++++++++++
>> 1 file changed, 12 insertions(+)
>>
>> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
>> index 1adf62b..af06122 100644
>> --- a/kernel/debug/debug_core.c
>> +++ b/kernel/debug/debug_core.c
>> @@ -87,6 +87,10 @@ static int kgdb_use_con;
>> bool dbg_is_early = true;
>> /* Next cpu to become the master debug core */
>> int dbg_switch_cpu;
>> +/* Flag for entering kdb when a panic occurs */
>> +static bool break_on_panic = true;
>> +/* Flag for entering kdb when an exception occurs */
>> +static bool break_on_exception = true;
>>
>> /* Use kdb or gdbserver mode */
>> int dbg_kdb_mode = 1;
>> @@ -101,6 +105,8 @@ early_param("kgdbcon", opt_kgdb_con);
>>
>> module_param(kgdb_use_con, int, 0644);
>> module_param(kgdbreboot, int, 0644);
>> +module_param(break_on_panic, bool, 0644);
>> +module_param(break_on_exception, bool, 0644);
>
> kgdbreboot, which controls whether or not to trap into kgdb during
> reboot, has a similar purpose to these new parameters. Perhaps any new
> symbols should follow a similar naming scheme.
>
>> /*
>> * Holds information about breakpoints in a kernel. These breakpoints are
>> @@ -690,6 +696,9 @@ kgdb_handle_exception(int evector, int signo, int ecode, struct pt_regs *regs)
>> if (arch_kgdb_ops.enable_nmi)
>> arch_kgdb_ops.enable_nmi(0);
>>
>> + if (unlikely(signo != SIGTRAP && !break_on_exception))
>
> This is nit picking but...
>
> There should be no need to predict branch results here. It's not
> performance critical and anyway the "fast" path implied by the
> unlikely() results in all CPUs screaming to a halt and burning up
> millions of cycles waiting for user input.
Thanks for your time and review comments.
Regards,
Kiran
>
>> + return 1;
>> +
>> memset(ks, 0, sizeof(struct kgdb_state));
>> ks->cpu = raw_smp_processor_id();
>> ks->ex_vector = evector;
>> @@ -821,6 +830,9 @@ static int kgdb_panic_event(struct notifier_block *self,
>> unsigned long val,
>> void *data)
>> {
>> + if (!break_on_panic)
>> + return NOTIFY_DONE;
>> +
>> if (dbg_kdb_mode)
>> kdb_printf("PANIC: %s\n", (char *)data);
>> kgdb_breakpoint();
>>
>

2014-11-20 09:34:50

by Daniel Thompson

Subject: Re: [RFC] debug: add parameters to prevent entering debug mode on errors

On 20/11/14 08:18, Kiran Raparthy wrote:
> Hi Daniel,
>
> On 18 November 2014 22:43, Daniel Thompson <[email protected]> wrote:
>> On 18/11/14 12:08, Kiran Kumar Raparthy wrote:
>>> From: Colin Cross <[email protected]>
>>>
>>> debug: add parameters to prevent entering debug mode on errors
>>>
>>> On non-developer devices kgdb prevents CONFIG_PANIC_TIMEOUT from rebooting the
>>> device after a panic. Add module parameters debug_core.break_on_exception and
>>> debug_core.break_on_panic to allow skipping debug on exceptions and panics
>>> respectively. Both default to true to preserve existing behavior.
>>
>> I am a little unsure about break_on_panic.
>>
>> It ought to be possible for kgdb/kdb to honour CONFIG_PANIC_TIMEOUT by
>> tracking how long it takes for the user to attach a debugger (or to run
>> the first kdb command after the panic). As it happens the timeout value
>> is already an exported kernel symbol so all the info is there for us to
>> use...
>>
>> Doing so would save us imposing further configuration burden on the user
>> (although it would be a good deal more code).
>>
>> Note that I can't think of an automatic way to handle break_on_exception
>> so I'm less worried about that one.
> Alright, so is it okay if we limit this mechanism to "skip debug
> on exceptions"?
> Please let me know if I have misunderstood your point.

Splitting it up would certainly stop a review comment from needlessly
interfering with good stuff being delivered. That's always a good thing.

To be clear though, providing the user a way to prevent kgdb from
preventing the machine from rebooting after panic seems to me to be a
useful feature. It is simply that I think the existing panic_timeout
value could be used to realize it.

>>> + return 1;
>>> +
>>> memset(ks, 0, sizeof(struct kgdb_state));
>>> ks->cpu = raw_smp_processor_id();
>>> ks->ex_vector = evector;
>>> @@ -821,6 +830,9 @@ static int kgdb_panic_event(struct notifier_block *self,
>>> unsigned long val,
>>> void *data)
>>> {
>>> + if (!break_on_panic)
>>> + return NOTIFY_DONE;

How about simply:

if (panic_timeout)
return NOTIFY_DONE;

(plus a nice comment explaining why)

This doesn't implement a timeout and so does not prevent a physically
present user from exploiting kgdb. Nevertheless it's an accurate
interpretation of what the user told us to do and leaves the door open
to adding a timeout in the future.

Actually it might be a good idea to use panic_timeout to control
trap-on-oops as well! If the user wants the machine to reboot itself on
panic they certainly don't want it to hang during an oops.

if (panic_timeout)
return NOTIFY_DONE;
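
To spell out what that check implies for each timeout value, here is a small userspace model in C. The function name enter_debugger_on_panic() is hypothetical and the timeout is passed in as an argument; in the kernel the check would read panic_timeout directly.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the suggested gate.
 * Returns true when the panic notifier would still trap into kgdb/kdb.
 */
static bool enter_debugger_on_panic(int panic_timeout)
{
	/*
	 * Any non-zero timeout (positive delay or negative "reboot now")
	 * means the user asked for an automatic reboot, so stay out of
	 * the way and let the panic path proceed.
	 */
	if (panic_timeout)
		return false;

	/* Zero means "wait forever", so entering the debugger is fine. */
	return true;
}
```

Note that a negative value also skips the debugger under this rule, since it is non-zero.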

>>> +
>>> if (dbg_kdb_mode)
>>> kdb_printf("PANIC: %s\n", (char *)data);
>>> kgdb_breakpoint();
>>>
>>

2014-11-20 10:24:14

by Kiran Kumar Raparthy

Subject: Re: [RFC] debug: add parameters to prevent entering debug mode on errors

Hi,

On 20 November 2014 15:04, Daniel Thompson <[email protected]> wrote:
> On 20/11/14 08:18, Kiran Raparthy wrote:
>> Hi Daniel,
>>
>> On 18 November 2014 22:43, Daniel Thompson <[email protected]> wrote:
>>> On 18/11/14 12:08, Kiran Kumar Raparthy wrote:
>>>> From: Colin Cross <[email protected]>
>>>>
>>>> debug: add parameters to prevent entering debug mode on errors
>>>>
>>>> On non-developer devices kgdb prevents CONFIG_PANIC_TIMEOUT from rebooting the
>>>> device after a panic. Add module parameters debug_core.break_on_exception and
>>>> debug_core.break_on_panic to allow skipping debug on exceptions and panics
>>>> respectively. Both default to true to preserve existing behavior.
>>>
>>> I am a little unsure about break_on_panic.
>>>
>>> It ought to be possible for kgdb/kdb to honour CONFIG_PANIC_TIMEOUT by
>>> tracking how long it takes for the user to attach a debugger (or to run
>>> the first kdb command after the panic). As it happens the timeout value
>>> is already an exported kernel symbol so all the info is there for us to
>>> use...
>>>
>>> Doing so would save us imposing further configuration burden on the user
>>> (although it would be a good deal more code).
>>>
>>> Note that I can't think of an automatic way to handle break_on_exception
>>> so I'm less worried about that one.
>> Alright, so is it okay if we limit this mechanism to "skip debug
>> on exceptions"?
>> Please let me know if I have misunderstood your point.
>
> Splitting it up would certainly stop a review comment from needlessly
> interfering with good stuff being delivered. That's always a good thing.
>
> To be clear though, providing the user a way to prevent kgdb from
> preventing the machine from rebooting after panic seems to me to be a
> useful feature. It is simply that I think the existing panic_timeout
> value could be used to realize it.
Yeah, got it now.
>
>>>> + return 1;
>>>> +
>>>> memset(ks, 0, sizeof(struct kgdb_state));
>>>> ks->cpu = raw_smp_processor_id();
>>>> ks->ex_vector = evector;
>>>> @@ -821,6 +830,9 @@ static int kgdb_panic_event(struct notifier_block *self,
>>>> unsigned long val,
>>>> void *data)
>>>> {
>>>> + if (!break_on_panic)
>>>> + return NOTIFY_DONE;
>
> How about simply:
>
> if (panic_timeout)
> return NOTIFY_DONE;
>
> (plus a nice comment explaining why)
>
> This doesn't implement a timeout and so does not prevent a physically
> present user from exploiting kgdb. Nevertheless it's an accurate
> interpretation of what the user told us to do and leaves the door open
> to adding a timeout in the future.
>
> Actually it might be a good idea to use panic_timeout to control
> trap-on-oops as well! If the user wants the machine to reboot itself on
> panic they certainly don't want it to hang during an oops.
Okay, I'll resend the patch with the suggested modifications.
Thanks for the input.
Regards,
Kiran
>
> if (panic_timeout)
> return NOTIFY_DONE;
>
>>>> +
>>>> if (dbg_kdb_mode)
>>>> kdb_printf("PANIC: %s\n", (char *)data);
>>>> kgdb_breakpoint();
>>>>
>>>
>