2024-02-02 13:29:38

by Feng Tang

Subject: [PATCH] panic: add option to dump blocked tasks in panic_print

For debugging kernel panics and other bugs, panic_print already has an
option to dump the call stacks of all tasks. On today's large servers
running many containers, there can be thousands of tasks or more, so
this prints a huge number of call stacks and takes a long time,
especially on a serial console, which is the main target use case of
panic_print.

In many cases, only the few blocked tasks are key to the panic, so add
an option to dump only the blocked tasks' call stacks.

Signed-off-by: Feng Tang <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 1 +
Documentation/admin-guide/sysctl/kernel.rst | 1 +
kernel/panic.c | 4 ++++
3 files changed, 6 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 31b3a25680d0..0f2369e87175 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4182,6 +4182,7 @@
bit 4: print ftrace buffer
bit 5: print all printk messages in buffer
bit 6: print all CPUs backtrace (if available in the arch)
+ bit 7: print tasks in uninterruptible (blocked) state
*Be aware* that this option may print a _lot_ of lines,
so there are risks of losing older messages in the log.
Use this option carefully, maybe worth to setup a
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 6584a1f9bfe3..e066a16b35d5 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -850,6 +850,7 @@ bit 3 print locks info if ``CONFIG_LOCKDEP`` is on
bit 4 print ftrace buffer
bit 5 print all printk messages in buffer
bit 6 print all CPUs backtrace (if available in the arch)
+bit 7 print tasks in uninterruptible (blocked) state
===== ============================================

So for example to print tasks and memory info on panic, user can::
diff --git a/kernel/panic.c b/kernel/panic.c
index 2807639aab51..aa17ae0897c0 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -73,6 +73,7 @@ EXPORT_SYMBOL_GPL(panic_timeout);
#define PANIC_PRINT_FTRACE_INFO 0x00000010
#define PANIC_PRINT_ALL_PRINTK_MSG 0x00000020
#define PANIC_PRINT_ALL_CPU_BT 0x00000040
+#define PANIC_PRINT_BLOCKED_TASKS 0x00000080
unsigned long panic_print;

ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
@@ -227,6 +228,9 @@ static void panic_print_sys_info(bool console_flush)

 	if (panic_print & PANIC_PRINT_FTRACE_INFO)
 		ftrace_dump(DUMP_ALL);
+
+	if (panic_print & PANIC_PRINT_BLOCKED_TASKS)
+		show_state_filter(TASK_UNINTERRUPTIBLE);
 }

void check_panic_on_warn(const char *origin)
--
2.34.1



2024-02-03 12:12:18

by Guilherme G. Piccoli

Subject: Re: [PATCH] panic: add option to dump blocked tasks in panic_print

On 02/02/2024 10:20, Feng Tang wrote:
> For debugging kernel panics and other bugs, panic_print already has an
> option to dump the call stacks of all tasks. On today's large servers
> running many containers, there can be thousands of tasks or more, so
> this prints a huge number of call stacks and takes a long time,
> especially on a serial console, which is the main target use case of
> panic_print.
>
> In many cases, only the few blocked tasks are key to the panic, so add
> an option to dump only the blocked tasks' call stacks.
>
> Signed-off-by: Feng Tang <[email protected]>
> [...]

Thank you Feng Tang, this is an interesting and useful idea!
I've just tested the patch and it works fine; I also see no code issues
on my side. So, feel free to add:

Tested-by: Guilherme G. Piccoli <[email protected]>

Cheers!
