This series has a few kdb fixes for back tracing on CPUs. The
previous version[1] had only one patch, but while making v3 I found a
few cleanups that made sense to break into other pieces.
As with all things kdb / kgdb, this patch set tries to inch us towards
a better state of the world but doesn't attempt to solve all known
problems.
Please enjoy.
[1] https://lore.kernel.org/r/[email protected]
Changes in v3:
- Patch ("Remove unused DCPU_SSTEP definition") new for v3.
- Patch ("kdb: Remove unused "argcount" param from...") new for v3.
- Patch ("kdb: Fix "btc <cpu>" crash if the CPU...") new for v3.
- Use exception state instead of new dbg_slave_dumpstack_cpu var.
- Move horror to debug core, cleaning up control flow.
- Avoid need for timeout by only waiting for CPUs marked as slaves.
Changes in v2:
- Totally new approach; now arch agnostic.
Douglas Anderson (4):
kgdb: Remove unused DCPU_SSTEP definition
kdb: Remove unused "argcount" param from kdb_bt1(); make btaprompt
bool
kdb: Fix "btc <cpu>" crash if the CPU didn't round up
kdb: Fix stack crawling on 'running' CPUs that aren't the master
kernel/debug/debug_core.c | 34 ++++++++++++++
kernel/debug/debug_core.h | 3 +-
kernel/debug/kdb/kdb_bt.c | 94 +++++++++++++++++++--------------------
3 files changed, 83 insertions(+), 48 deletions(-)
--
2.23.0.351.gc4317032e6-goog
From doing a 'git log --patch kernel/debug', it looks as if DCPU_SSTEP
has never been used. Presumably it used to be used back when kgdb was
out of tree and nobody thought to delete the definition when the usage
went away. Delete.
Signed-off-by: Douglas Anderson <[email protected]>
---
Changes in v3:
- Patch ("Remove unused DCPU_SSTEP definition") new for v3.
Changes in v2: None
kernel/debug/debug_core.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/kernel/debug/debug_core.h b/kernel/debug/debug_core.h
index b4a7c326d546..804b0fe5a0ba 100644
--- a/kernel/debug/debug_core.h
+++ b/kernel/debug/debug_core.h
@@ -33,7 +33,6 @@ struct kgdb_state {
#define DCPU_WANT_MASTER 0x1 /* Waiting to become a master kgdb cpu */
#define DCPU_NEXT_MASTER 0x2 /* Transition from one master cpu to another */
#define DCPU_IS_SLAVE 0x4 /* Slave cpu enter exception */
-#define DCPU_SSTEP 0x8 /* CPU is single stepping */
struct debuggerinfo_struct {
void *debuggerinfo;
--
2.23.0.351.gc4317032e6-goog
kdb_bt1() had a mysterious "argcount" parameter that was passed in
(always the number 5, by the way) but never used. Presumably this is
just old cruft. Remove it. While at it, upgrade the btaprompt
parameter to a full-fledged bool instead of an int.
Signed-off-by: Douglas Anderson <[email protected]>
---
Changes in v3:
- Patch ("kdb: Remove unused "argcount" param from...") new for v3.
Changes in v2: None
kernel/debug/kdb/kdb_bt.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/kernel/debug/kdb/kdb_bt.c b/kernel/debug/kdb/kdb_bt.c
index 7e2379aa0a1e..120fc686c919 100644
--- a/kernel/debug/kdb/kdb_bt.c
+++ b/kernel/debug/kdb/kdb_bt.c
@@ -78,8 +78,7 @@ static void kdb_show_stack(struct task_struct *p, void *addr)
*/
static int
-kdb_bt1(struct task_struct *p, unsigned long mask,
- int argcount, int btaprompt)
+kdb_bt1(struct task_struct *p, unsigned long mask, bool btaprompt)
{
char buffer[2];
if (kdb_getarea(buffer[0], (unsigned long)p) ||
@@ -106,7 +105,6 @@ int
kdb_bt(int argc, const char **argv)
{
int diag;
- int argcount = 5;
int btaprompt = 1;
int nextarg;
unsigned long addr;
@@ -125,7 +123,7 @@ kdb_bt(int argc, const char **argv)
/* Run the active tasks first */
for_each_online_cpu(cpu) {
p = kdb_curr_task(cpu);
- if (kdb_bt1(p, mask, argcount, btaprompt))
+ if (kdb_bt1(p, mask, btaprompt))
return 0;
}
/* Now the inactive tasks */
@@ -134,7 +132,7 @@ kdb_bt(int argc, const char **argv)
return 0;
if (task_curr(p))
continue;
- if (kdb_bt1(p, mask, argcount, btaprompt))
+ if (kdb_bt1(p, mask, btaprompt))
return 0;
} kdb_while_each_thread(g, p);
} else if (strcmp(argv[0], "btp") == 0) {
@@ -148,7 +146,7 @@ kdb_bt(int argc, const char **argv)
p = find_task_by_pid_ns(pid, &init_pid_ns);
if (p) {
kdb_set_current_task(p);
- return kdb_bt1(p, ~0UL, argcount, 0);
+ return kdb_bt1(p, ~0UL, false);
}
kdb_printf("No process with pid == %ld found\n", pid);
return 0;
@@ -159,7 +157,7 @@ kdb_bt(int argc, const char **argv)
if (diag)
return diag;
kdb_set_current_task((struct task_struct *)addr);
- return kdb_bt1((struct task_struct *)addr, ~0UL, argcount, 0);
+ return kdb_bt1((struct task_struct *)addr, ~0UL, false);
} else if (strcmp(argv[0], "btc") == 0) {
unsigned long cpu = ~0;
struct task_struct *save_current_task = kdb_current_task;
@@ -211,7 +209,7 @@ kdb_bt(int argc, const char **argv)
kdb_show_stack(kdb_current_task, (void *)addr);
return 0;
} else {
- return kdb_bt1(kdb_current_task, ~0UL, argcount, 0);
+ return kdb_bt1(kdb_current_task, ~0UL, false);
}
}
--
2.23.0.351.gc4317032e6-goog
In kdb, when you do 'btc' (back trace on CPU) it doesn't necessarily
give you the right info. Specifically, on many architectures
(including arm64, where I tested) you can't dump the stack of a
"running" process that isn't the process running on the current CPU.
This can be reproduced with:
echo SOFTLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
# wait 2 seconds
<sysrq>g
Here's what I see now on rk3399-gru-kevin. I see the stack crawl for
the CPU that handled the sysrq but everything else just shows me stuck
in __switch_to() which is bogus:
======
[0]kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0, 1-3(I), 4, 5(I)
Stack traceback for pid 0
0xffffff801101a9c0 0 0 1 0 R 0xffffff801101b3b0 *swapper/0
Call trace:
dump_backtrace+0x0/0x138
...
kgdb_compiled_brk_fn+0x34/0x44
...
sysrq_handle_dbg+0x34/0x5c
Stack traceback for pid 0
0xffffffc0f175a040 0 0 1 1 I 0xffffffc0f175aa30 swapper/1
Call trace:
__switch_to+0x1e4/0x240
0xffffffc0f65616c0
Stack traceback for pid 0
0xffffffc0f175d040 0 0 1 2 I 0xffffffc0f175da30 swapper/2
Call trace:
__switch_to+0x1e4/0x240
0xffffffc0f65806c0
Stack traceback for pid 0
0xffffffc0f175b040 0 0 1 3 I 0xffffffc0f175ba30 swapper/3
Call trace:
__switch_to+0x1e4/0x240
0xffffffc0f659f6c0
Stack traceback for pid 1474
0xffffffc0dde8b040 1474 727 1 4 R 0xffffffc0dde8ba30 bash
Call trace:
__switch_to+0x1e4/0x240
__schedule+0x464/0x618
0xffffffc0dde8b040
Stack traceback for pid 0
0xffffffc0f17b0040 0 0 1 5 I 0xffffffc0f17b0a30 swapper/5
Call trace:
__switch_to+0x1e4/0x240
0xffffffc0f65dd6c0
===
The problem is that 'btc' eventually boils down to
show_stack(task_struct, NULL);
...and show_stack() doesn't work for "running" CPUs because their
registers haven't been stashed.
On x86 things might work better (I haven't tested) because kdb has a
special case for x86 in kdb_show_stack() where it passes the stack
pointer to show_stack(). This wouldn't work on arm64 where the stack
crawling function seems to need the "fp" and "pc", not the "sp", which is
presumably why arm64's show_stack() function totally ignores the "sp"
parameter.
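To see why an sp-only interface is a poor fit for a frame-pointer
unwinder, here is a minimal userspace model. It is an illustration, not
arm64's actual unwinder. It assumes a simplified AArch64-style frame
record where fp points at a {previous fp, saved lr} pair and a NULL fp
ends the chain; struct frame_record and unwind() are hypothetical
names. The walk is seeded from a pc and an fp, and in this model
nothing can be seeded from sp alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified (hypothetical) model of an AArch64 frame record: fp points
 * at a pair holding the caller's fp and the saved return address (lr).
 */
struct frame_record {
	const struct frame_record *fp; /* caller's record; NULL ends the chain here */
	uintptr_t lr;                  /* return address into the caller */
};

/*
 * Walk a frame-record chain into out[], returning the number of entries.
 * The inputs are a starting pc and fp; a bare sp is not enough to know
 * where the chain begins in this model.
 */
static size_t unwind(uintptr_t pc, const struct frame_record *fp,
		     uintptr_t *out, size_t max)
{
	size_t n = 0;

	if (n < max)
		out[n++] = pc;          /* the function we stopped in */
	while (fp && n < max) {
		out[n++] = fp->lr;      /* each record yields one caller */
		fp = fp->fp;
	}
	return n;
}
```

For a two-record chain, unwind() yields the pc followed by one return
address per record, and it stops early if the output buffer is full.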
NOTE: we _can_ get a good stack dump for all the cpus if we manually
switch each one to the kdb master and do a back trace. AKA:
cpu 4
bt
...will give the expected trace. That's because arm64's
dump_backtrace() will now see that "tsk == current" and go through a
different path.
In this patch I fix the problems by catching a request to stack crawl
a task that's running on a CPU and then I ask that CPU to do the stack
crawl.
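The request/acknowledge handshake described above can be modelled in
userspace C. This is a hypothetical sketch, not kernel code: a C11
atomic_uint stands in for kgdb_info[cpu].exception_state, a pthread
stands in for a slave CPU spinning in the debugger, and IS_SLAVE /
WANT_BT mirror DCPU_IS_SLAVE / DCPU_WANT_BT. The names slave_loop(),
request_bt() and run_demo() are invented for illustration. The sketch
uses atomics so it is well-defined under the C memory model; that is a
choice of the model, not a claim about how the patch accesses the
flags.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define IS_SLAVE 0x4u /* models DCPU_IS_SLAVE */
#define WANT_BT  0x8u /* models DCPU_WANT_BT */

static atomic_uint exception_state; /* one slave "CPU" */
static atomic_bool resume;          /* master releases the slave */
static atomic_int bt_count;         /* backtraces produced by the slave */

/* Slave side: spin until released, servicing backtrace requests. */
static void *slave_loop(void *arg)
{
	(void)arg;
	atomic_fetch_or(&exception_state, IS_SLAVE);
	while (!atomic_load(&resume)) {
		if (atomic_load(&exception_state) & WANT_BT) {
			/* The kernel calls dump_stack() here; this model just counts. */
			atomic_fetch_add(&bt_count, 1);
			atomic_fetch_and(&exception_state, ~WANT_BT);
		}
		sched_yield(); /* stands in for cpu_relax() */
	}
	atomic_fetch_and(&exception_state, ~IS_SLAVE);
	return NULL;
}

/* Master side: refuse if the target isn't a slave, else ask and wait. */
static bool request_bt(void)
{
	if (!(atomic_load(&exception_state) & IS_SLAVE))
		return false; /* "didn't stop in the debugger" */
	atomic_fetch_or(&exception_state, WANT_BT);
	while (atomic_load(&exception_state) & WANT_BT)
		sched_yield();
	return true;
}

/* Start a slave, wait for it to check in, request n traces, release it. */
static int run_demo(int n)
{
	pthread_t t;

	atomic_store(&exception_state, 0);
	atomic_store(&resume, false);
	atomic_store(&bt_count, 0);
	if (pthread_create(&t, NULL, slave_loop, NULL) != 0)
		return -1;
	while (!(atomic_load(&exception_state) & IS_SLAVE))
		sched_yield(); /* wait for the slave to "round up" */
	for (int i = 0; i < n; i++)
		request_bt();
	atomic_store(&resume, true);
	pthread_join(t, NULL);
	return atomic_load(&bt_count);
}
```

run_demo(3) returns 3 because each request_bt() blocks until the slave
has serviced and cleared its WANT_BT bit. Once the slave has left and
dropped IS_SLAVE, request_bt() returns false, which mirrors the "didn't
stop in the debugger" error path rather than spinning forever.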
NOTE: this will (presumably) change what stack crawls are printed for
x86 machines. Now kdb functions will show up in the stack crawl.
Presumably this is OK but if it's not we can go back and add a special
case for x86 again.
Signed-off-by: Douglas Anderson <[email protected]>
---
Changes in v3:
- Use exception state instead of new dbg_slave_dumpstack_cpu var.
- Move horror to debug core, cleaning up control flow.
- Avoid need for timeout by only waiting for CPUs marked as slaves.
Changes in v2:
- Totally new approach; now arch agnostic.
kernel/debug/debug_core.c | 34 ++++++++++++++++++++++++++++++++++
kernel/debug/debug_core.h | 2 ++
kernel/debug/kdb/kdb_bt.c | 19 +++++++------------
3 files changed, 43 insertions(+), 12 deletions(-)
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 10f1187b3907..5456e09d9354 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -441,6 +441,37 @@ int dbg_remove_all_break(void)
return 0;
}
+#ifdef CONFIG_KGDB_KDB
+void kdb_dump_stack_on_cpu(int cpu)
+{
+ if (cpu == raw_smp_processor_id()) {
+ dump_stack();
+ return;
+ }
+
+ if (!(kgdb_info[cpu].exception_state & DCPU_IS_SLAVE)) {
+ kdb_printf("ERROR: Task on cpu %d didn't stop in the debugger\n",
+ cpu);
+ return;
+ }
+
+ /*
+ * In general, architectures don't support dumping the stack of a
+ * "running" process that's not the current one. From the point of
+ * view of the Linux kernel, processes that are looping in the kgdb
+ * slave loop are still "running". There's also no API (that actually
+ * works across all architectures) that can do a stack crawl based
+ * on registers passed as a parameter.
+ *
+ * Solve this conundrum by asking slave CPUs to do the backtrace
+ * themselves.
+ */
+ kgdb_info[cpu].exception_state |= DCPU_WANT_BT;
+ while (kgdb_info[cpu].exception_state & DCPU_WANT_BT)
+ cpu_relax();
+}
+#endif
+
/*
* Return true if there is a valid kgdb I/O module. Also if no
* debugger is attached a message can be printed to the console about
@@ -580,6 +611,9 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
atomic_xchg(&kgdb_active, cpu);
break;
}
+ } else if (kgdb_info[cpu].exception_state & DCPU_WANT_BT) {
+ dump_stack();
+ kgdb_info[cpu].exception_state &= ~DCPU_WANT_BT;
} else if (kgdb_info[cpu].exception_state & DCPU_IS_SLAVE) {
if (!raw_spin_is_locked(&dbg_slave_lock))
goto return_normal;
diff --git a/kernel/debug/debug_core.h b/kernel/debug/debug_core.h
index 804b0fe5a0ba..cd22b5f68831 100644
--- a/kernel/debug/debug_core.h
+++ b/kernel/debug/debug_core.h
@@ -33,6 +33,7 @@ struct kgdb_state {
#define DCPU_WANT_MASTER 0x1 /* Waiting to become a master kgdb cpu */
#define DCPU_NEXT_MASTER 0x2 /* Transition from one master cpu to another */
#define DCPU_IS_SLAVE 0x4 /* Slave cpu enter exception */
+#define DCPU_WANT_BT 0x8 /* Slave cpu should backtrace then clear flag */
struct debuggerinfo_struct {
void *debuggerinfo;
@@ -75,6 +76,7 @@ extern int kdb_stub(struct kgdb_state *ks);
extern int kdb_parse(const char *cmdstr);
extern int kdb_common_init_state(struct kgdb_state *ks);
extern int kdb_common_deinit_state(void);
+extern void kdb_dump_stack_on_cpu(int cpu);
#else /* ! CONFIG_KGDB_KDB */
static inline int kdb_stub(struct kgdb_state *ks)
{
diff --git a/kernel/debug/kdb/kdb_bt.c b/kernel/debug/kdb/kdb_bt.c
index d9af139f9a31..0e94efe07b72 100644
--- a/kernel/debug/kdb/kdb_bt.c
+++ b/kernel/debug/kdb/kdb_bt.c
@@ -22,20 +22,15 @@
static void kdb_show_stack(struct task_struct *p, void *addr)
{
int old_lvl = console_loglevel;
+
console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH;
kdb_trap_printk++;
- kdb_set_current_task(p);
- if (addr) {
- show_stack((struct task_struct *)p, addr);
- } else if (kdb_current_regs) {
-#ifdef CONFIG_X86
- show_stack(p, &kdb_current_regs->sp);
-#else
- show_stack(p, NULL);
-#endif
- } else {
- show_stack(p, NULL);
- }
+
+ if (!addr && kdb_task_has_cpu(p))
+ kdb_dump_stack_on_cpu(kdb_process_cpu(p));
+ else
+ show_stack(p, addr);
+
console_loglevel = old_lvl;
kdb_trap_printk--;
}
--
2.23.0.351.gc4317032e6-goog
I noticed that if I ran "btc <cpu>" and the CPU I passed in hadn't
rounded up, I'd crash. I was going to copy the same fix from
commit 162bc7f5afd7 ("kdb: Don't back trace on a cpu that didn't round
up") into the "not all the CPUs" case, but decided it'd be better to
clean things up a little bit.
This consolidates the two code paths. It is _slightly_ wasteful in
that the checks for "cpu" being too large or being offline aren't
really needed when we're iterating over all online CPUs, but that
really shouldn't hurt. Better to have the same code path.
While at it, eliminate at least one slightly ugly (and totally
needless) recursive use of kdb_parse().
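The consolidation can be modelled in userspace C as a sketch, under
stated assumptions: CPUs are indices into two hypothetical arrays
(cpu_online_map, cpu_task), and bt_cpu() returns a status code where
kdb_bt_cpu() in the patch prints a warning. Both the "btc <cpu>" path
and the all-CPUs loop go through the same helper, so a NULL task (a CPU
that didn't round up) is caught on either path. The online check inside
bt_cpu() is redundant for the loop, which is the "slightly wasteful"
part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4 /* hypothetical CPU count for the model */

static bool cpu_online_map[MODEL_NR_CPUS];
static const char *cpu_task[MODEL_NR_CPUS]; /* NULL: CPU didn't round up */

enum bt_result { BT_OK, BT_NO_PROCESS, BT_NO_TASK };

/* One helper with every check, shared by "btc <cpu>" and plain "btc". */
static enum bt_result bt_cpu(unsigned long cpu)
{
	if (cpu >= MODEL_NR_CPUS || !cpu_online_map[cpu])
		return BT_NO_PROCESS; /* "WARNING: no process for cpu" */
	if (!cpu_task[cpu])
		return BT_NO_TASK;    /* "WARNING: no task for cpu" */
	/* The patch sets the current task and calls kdb_bt1() here. */
	return BT_OK;
}

/* Plain "btc": walk every online CPU through the same helper. */
static int bt_all_cpus(void)
{
	int ok = 0;

	for (unsigned long cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!cpu_online_map[cpu])
			continue; /* models for_each_online_cpu() */
		if (bt_cpu(cpu) == BT_OK)
			ok++;
	}
	return ok;
}
```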
Signed-off-by: Douglas Anderson <[email protected]>
---
Changes in v3:
- Patch ("kdb: Fix "btc <cpu>" crash if the CPU...") new for v3.
Changes in v2: None
kernel/debug/kdb/kdb_bt.c | 61 ++++++++++++++++++++++-----------------
1 file changed, 34 insertions(+), 27 deletions(-)
diff --git a/kernel/debug/kdb/kdb_bt.c b/kernel/debug/kdb/kdb_bt.c
index 120fc686c919..d9af139f9a31 100644
--- a/kernel/debug/kdb/kdb_bt.c
+++ b/kernel/debug/kdb/kdb_bt.c
@@ -101,6 +101,27 @@ kdb_bt1(struct task_struct *p, unsigned long mask, bool btaprompt)
return 0;
}
+static void
+kdb_bt_cpu(unsigned long cpu)
+{
+ struct task_struct *kdb_tsk;
+
+ if (cpu >= num_possible_cpus() || !cpu_online(cpu)) {
+ kdb_printf("WARNING: no process for cpu %ld\n", cpu);
+ return;
+ }
+
+ /* If a CPU failed to round up we could be here */
+ kdb_tsk = KDB_TSK(cpu);
+ if (!kdb_tsk) {
+ kdb_printf("WARNING: no task for cpu %ld\n", cpu);
+ return;
+ }
+
+ kdb_set_current_task(kdb_tsk);
+ kdb_bt1(kdb_tsk, ~0UL, false);
+}
+
int
kdb_bt(int argc, const char **argv)
{
@@ -161,7 +182,6 @@ kdb_bt(int argc, const char **argv)
} else if (strcmp(argv[0], "btc") == 0) {
unsigned long cpu = ~0;
struct task_struct *save_current_task = kdb_current_task;
- char buf[80];
if (argc > 1)
return KDB_ARGCOUNT;
if (argc == 1) {
@@ -169,35 +189,22 @@ kdb_bt(int argc, const char **argv)
if (diag)
return diag;
}
- /* Recursive use of kdb_parse, do not use argv after
- * this point */
- argv = NULL;
if (cpu != ~0) {
- if (cpu >= num_possible_cpus() || !cpu_online(cpu)) {
- kdb_printf("no process for cpu %ld\n", cpu);
- return 0;
- }
- sprintf(buf, "btt 0x%px\n", KDB_TSK(cpu));
- kdb_parse(buf);
- return 0;
- }
- kdb_printf("btc: cpu status: ");
- kdb_parse("cpu\n");
- for_each_online_cpu(cpu) {
- void *kdb_tsk = KDB_TSK(cpu);
-
- /* If a CPU failed to round up we could be here */
- if (!kdb_tsk) {
- kdb_printf("WARNING: no task for cpu %ld\n",
- cpu);
- continue;
+ kdb_bt_cpu(cpu);
+ } else {
+ /*
+ * Recursive use of kdb_parse, do not use argv after
+ * this point.
+ */
+ argv = NULL;
+ kdb_printf("btc: cpu status: ");
+ kdb_parse("cpu\n");
+ for_each_online_cpu(cpu) {
+ kdb_bt_cpu(cpu);
+ touch_nmi_watchdog();
}
-
- sprintf(buf, "btt 0x%px\n", kdb_tsk);
- kdb_parse(buf);
- touch_nmi_watchdog();
+ kdb_set_current_task(save_current_task);
}
- kdb_set_current_task(save_current_task);
return 0;
} else {
if (argc) {
--
2.23.0.351.gc4317032e6-goog
On 9/25/19 3:02 PM, Douglas Anderson wrote:
> From doing a 'git log --patch kernel/debug', it looks as if DCPU_SSTEP
> has never been used. Presumably it used to be used back when kgdb was
> out of tree and nobody thought to delete the definition when the usage
> went away. Delete.
>
> Signed-off-by: Douglas Anderson <[email protected]>
The history on this one is that it was part of the logic for the soft stepping on ARM v5 cores. The code for doing this was never merged into mainline, so the .h definition can certainly go.
Acked-by: Jason Wessel <[email protected]>
On Wed, Sep 25, 2019 at 01:02:19PM -0700, Douglas Anderson wrote:
> I noticed that when I did "btc <cpu>" and the CPU I passed in hadn't
> rounded up that I'd crash. I was going to copy the same fix from
> commit 162bc7f5afd7 ("kdb: Don't back trace on a cpu that didn't round
> up") into the "not all the CPUs" case, but decided it'd be better to
> clean things up a little bit.
>
> This consolidates the two code paths. It is _slightly_ wasteful in in
nit: in in
Will
On Wed, Sep 25, 2019 at 01:02:16PM -0700, Douglas Anderson wrote:
> Please enjoy.
This comment made me smile and then I ended up reading all the patches,
so FWIW:
Acked-by: Will Deacon <[email protected]>
Will
On Wed, Sep 25, 2019 at 01:02:19PM -0700, Douglas Anderson wrote:
> [...]
> if (cpu != ~0) {
> - if (cpu >= num_possible_cpus() || !cpu_online(cpu)) {
> - kdb_printf("no process for cpu %ld\n", cpu);
> - return 0;
> - }
> - sprintf(buf, "btt 0x%px\n", KDB_TSK(cpu));
> - kdb_parse(buf);
> - return 0;
> - }
> - kdb_printf("btc: cpu status: ");
> - kdb_parse("cpu\n");
> - for_each_online_cpu(cpu) {
> - void *kdb_tsk = KDB_TSK(cpu);
> -
> - /* If a CPU failed to round up we could be here */
> - if (!kdb_tsk) {
> - kdb_printf("WARNING: no task for cpu %ld\n",
> - cpu);
> - continue;
> + kdb_bt_cpu(cpu);
> + } else {
> + /*
> + * Recursive use of kdb_parse, do not use argv after
> + * this point.
> + */
> + argv = NULL;
> + kdb_printf("btc: cpu status: ");
> + kdb_parse("cpu\n");
> + for_each_online_cpu(cpu) {
> + kdb_bt_cpu(cpu);
> + touch_nmi_watchdog();
> }
> -
> - sprintf(buf, "btt 0x%px\n", kdb_tsk);
> - kdb_parse(buf);
> - touch_nmi_watchdog();
> + kdb_set_current_task(save_current_task);
> }
> - kdb_set_current_task(save_current_task);
Why does this move out into only one of the conditional branches?
Don't both of the above paths modify the current task?
Daniel.
Hi,
On Mon, Oct 7, 2019 at 6:55 AM Daniel Thompson
<[email protected]> wrote:
>
> On Wed, Sep 25, 2019 at 01:02:19PM -0700, Douglas Anderson wrote:
> > [...]
> > if (cpu != ~0) {
> > - if (cpu >= num_possible_cpus() || !cpu_online(cpu)) {
> > - kdb_printf("no process for cpu %ld\n", cpu);
> > - return 0;
> > - }
> > - sprintf(buf, "btt 0x%px\n", KDB_TSK(cpu));
> > - kdb_parse(buf);
> > - return 0;
> > - }
> > - kdb_printf("btc: cpu status: ");
> > - kdb_parse("cpu\n");
> > - for_each_online_cpu(cpu) {
> > - void *kdb_tsk = KDB_TSK(cpu);
> > -
> > - /* If a CPU failed to round up we could be here */
> > - if (!kdb_tsk) {
> > - kdb_printf("WARNING: no task for cpu %ld\n",
> > - cpu);
> > - continue;
> > + kdb_bt_cpu(cpu);
> > + } else {
> > + /*
> > + * Recursive use of kdb_parse, do not use argv after
> > + * this point.
> > + */
> > + argv = NULL;
> > + kdb_printf("btc: cpu status: ");
> > + kdb_parse("cpu\n");
> > + for_each_online_cpu(cpu) {
> > + kdb_bt_cpu(cpu);
> > + touch_nmi_watchdog();
> > }
> > -
> > - sprintf(buf, "btt 0x%px\n", kdb_tsk);
> > - kdb_parse(buf);
> > - touch_nmi_watchdog();
> > + kdb_set_current_task(save_current_task);
> > }
> > - kdb_set_current_task(save_current_task);
>
> Why does this move out into only one of the conditional branches?
> Don't both of the above paths modify the current task?
The old code has a "return 0" in the case that "cpu != ~0", so this
basically matches the prior behavior of restoring the current task for
a "btc" but leaving the current task changed in the case of "btc
<cpu>". Thus my patch doesn't actually change the existing behavior,
but I guess that it does make the control flow simpler so it's easier
to understand what the behavior is. ;-)
Reading through other control flows of the various backtrace commands,
it looks like it is intentional to leave the current task changed when
you explicitly do an action on that task (or a CPU).
Actually, though, it wasn't clear to me that it ever made sense for
any of these commands to implicitly leave the current task changed.
If you agree, I can send a follow-up patch to change this behavior.
-Doug
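A hypothetical sketch of the alternative raised above, where no
backtrace command implicitly leaves the current task changed. It is not
taken from the follow-up series referenced later in this thread;
current_task, cpu_tasks, bt_one() and btc() are invented stand-ins for
kdb_current_task, KDB_TSK(), kdb_bt_cpu() and the "btc" branch. The
save happens once and the restore sits after both branches, so "btc"
and "btc <cpu>" behave the same.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *current_task = "master"; /* models kdb_current_task */
static const char *cpu_tasks[] = { "swapper/0", "bash", "swapper/2" };
#define N_TASKS (sizeof(cpu_tasks) / sizeof(cpu_tasks[0]))

/* Backtracing a CPU switches to its task, as kdb_set_current_task() does. */
static void bt_one(size_t cpu)
{
	current_task = cpu_tasks[cpu];
	/* ...the backtrace would be printed here... */
}

/* "btc"-style command; cpu < 0 means all CPUs. Restores on every path. */
static void btc(long cpu)
{
	const char *saved = current_task;

	if (cpu >= 0) {
		if ((size_t)cpu < N_TASKS)
			bt_one((size_t)cpu);
	} else {
		for (size_t i = 0; i < N_TASKS; i++)
			bt_one(i);
	}
	current_task = saved; /* single restore point: no implicit state change */
}
```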
On Mon, Oct 07, 2019 at 04:34:55PM -0700, Doug Anderson wrote:
> Hi,
>
> On Mon, Oct 7, 2019 at 6:55 AM Daniel Thompson
> <[email protected]> wrote:
> >
> > Why does this move out into only one of the conditional branches?
> > Don't both of the above paths modify the current task?
>
> The old code has a "return 0 in the case that "cpu != ~0", so this
> basically matches the prior behavior in restoring the current task for
> a "btc" but not leaving the current task changed in the case of "btc
> <cpu>". Thus my patch doesn't actually change the existing behavior,
> but I guess that it does make the control flow simpler so it's easier
> to understand what the behavior is. ;-)
Point taken. Horrific though it may be ;-)
> Reading through other control flows of the various backtrace commands,
> it looks like it is intentional to leave the current task changed when
> you explicitly do an action on that task (or a CPU).
>
> Actually, though, it wasn't clear to me that it ever made sense for
> any of these commands to implicitly leave the current task changed.
> If you agree, I can send a follow-up patch to change this behavior.
Personally I don't like implicit changes of state but I might need a bit
more thinking to agree (or disagree ;-) ).
Daniel.
Hi,
On Thu, Oct 10, 2019 at 8:07 AM Daniel Thompson
<[email protected]> wrote:
> > Reading through other control flows of the various backtrace commands,
> > it looks like it is intentional to leave the current task changed when
> > you explicitly do an action on that task (or a CPU).
> >
> > Actually, though, it wasn't clear to me that it ever made sense for
> > any of these commands to implicitly leave the current task changed.
> > If you agree, I can send a follow-up patch to change this behavior.
>
> Personally I don't like implicit changes of state but I might need a bit
> more thinking to agree (or disagree ;-) ).
I can post a follow-up after this series lands to change it. I
have a feeling nobody is relying on the old behavior and thus nobody
will notice, but it would be nice to get this cleaner.
BTW: if you want me to spin to fix the commit message typo that Will
found or add his Ack to the series, let me know. Otherwise I'll
assume that the typo can be fixed and Acks added when the patch is
applied.
-Doug
On Wed, Sep 25, 2019 at 01:02:16PM -0700, Douglas Anderson wrote:
>
> This series has a few kdb fixes for back tracing on CPUs. The
> previous version[1] had only one patch, but while making v3 I found a
> few cleanups that made sense to break into other pieces.
>
> As with all things kdb / kgdb, this patch set tries to inch us towards
> a better state of the world but doesn't attempt to solve all known
> problems.
>
> Please enjoy.
Applied.
Note: Given this series alters a very long-standing behaviour I've queued
it for v5.5 rather than add it to a fixes branch. It should land
in linux-next shortly.
Daniel.
Hi,
On Thu, Oct 10, 2019 at 9:38 AM Doug Anderson <[email protected]> wrote:
>
> Hi,
>
> On Thu, Oct 10, 2019 at 8:07 AM Daniel Thompson
> <[email protected]> wrote:
> > > Reading through other control flows of the various backtrace commands,
> > > it looks like it is intentional to leave the current task changed when
> > > you explicitly do an action on that task (or a CPU).
> > >
> > > Actually, though, it wasn't clear to me that it ever made sense for
> > > any of these commands to implicitly leave the current task changed.
> > > If you agree, I can send a follow-up patch to change this behavior.
> >
> > Personally I don't like implicit changes of state but I might need a bit
> > more thinking to agree (or disagree ;-) ).
>
> I can post up a followup after this series lands and change it. I
> have a feeling nobody is relying on the old behavior and thus nobody
> will notice but it would be nice to get this cleaner.
Sorry it took so long, but the follow-up series can be found at:
https://lore.kernel.org/r/[email protected]
-Doug