2020-03-27 22:40:25

by Guilherme Piccoli

[permalink] [raw]
Subject: [PATCH V3] kernel/hung_task.c: Introduce sysctl to print all traces when a hung task is detected

Commit 401c636a0eeb ("kernel/hung_task.c: show all hung tasks before panic")
introduced a change in that we started to show all CPUs backtraces when a
hung task is detected _and_ the sysctl/kernel parameter "hung_task_panic"
is set. The idea is good, because usually when observing deadlocks (that
may lead to hung tasks), the culprit is another task holding a lock and
not necessarily the task detected as hung.

The problem with this approach is that dumping backtraces is a slightly
expensive task, specially printing that on console (and specially in many
CPU machines, as servers commonly found nowadays). So, users that plan to
collect a kdump to investigate the hung tasks and narrow down the deadlock
definitely don't need the CPUs backtrace on dmesg/console, which will delay
the panic and pollute the log (crash tool would easily grab all CPUs traces
with 'bt -a' command).
Also, there's the reciprocal scenario: some users may be interested in
seeing the CPUs backtraces but not have the system panic when a hung task
is detected. The current approach hence is almost as embedding a policy in
the kernel, by forcing the CPUs backtraces' dump (only) on hung_task_panic.

This patch decouples the panic event on hung task from the CPUs
backtraces dump, by creating (and documenting) a new sysctl called
"hung_task_all_cpu_backtrace", analog to the approach taken on soft/hard
lockups, that have both a panic and an "all_cpu_backtrace" sysctl to allow
individual control. The new mechanism for dumping the CPUs backtraces on
hung task detection respects "hung_task_warnings" by not dumping the
traces in case there's no warnings left.

Cc: Tetsuo Handa <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---

V3: Following the suggestion from Vlastimil, removed the kernel parameter
since soon we (hopefully) will have a generic approach to set sysctls via
kernel parameters[0] - thanks Vlastimil, great idea!

Thanks,

Guilherme

[0] lore.kernel.org/lkml/[email protected]/T


Documentation/admin-guide/sysctl/kernel.rst | 15 +++++++++++++++
include/linux/sched/sysctl.h | 7 +++++++
kernel/hung_task.c | 20 ++++++++++++++++++--
kernel/sysctl.c | 11 +++++++++++
4 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index def074807cee..8b4ff69d2348 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -40,6 +40,7 @@ show up in /proc/sys/kernel:
- hotplug
- hardlockup_all_cpu_backtrace
- hardlockup_panic
+- hung_task_all_cpu_backtrace
- hung_task_panic
- hung_task_check_count
- hung_task_timeout_secs
@@ -338,6 +339,20 @@ Path for the hotplug policy agent.
Default value is "/sbin/hotplug".


+hung_task_all_cpu_backtrace:
+================
+
+If this option is set, the kernel will send an NMI to all CPUs to dump
+their backtraces when a hung task is detected. This file shows up if
+CONFIG_DETECT_HUNG_TASK and CONFIG_SMP are enabled.
+
+0: Won't show all CPUs backtraces when a hung task is detected.
+This is the default behavior.
+
+1: Will non-maskably interrupt all CPUs and dump their backtraces when
+a hung task is detected.
+
+
hung_task_panic:
================

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index d4f6215ee03f..8cd29440ec8a 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -7,6 +7,13 @@
struct ctl_table;

#ifdef CONFIG_DETECT_HUNG_TASK
+
+#ifdef CONFIG_SMP
+extern unsigned int sysctl_hung_task_all_cpu_backtrace;
+#else
+#define sysctl_hung_task_all_cpu_backtrace 0
+#endif /* CONFIG_SMP */
+
extern int sysctl_hung_task_check_count;
extern unsigned int sysctl_hung_task_panic;
extern unsigned long sysctl_hung_task_timeout_secs;
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 14a625c16cb3..9a774aee1a44 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -53,9 +53,18 @@ int __read_mostly sysctl_hung_task_warnings = 10;
static int __read_mostly did_panic;
static bool hung_task_show_lock;
static bool hung_task_call_panic;
+static bool hung_task_show_all_bt;

static struct task_struct *watchdog_task;

+#ifdef CONFIG_SMP
+/*
+ * Should we dump all CPUs backtraces in a hung task event?
+ * Defaults to 0, can be changed via sysctl.
+ */
+unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace;
+#endif /* CONFIG_SMP */
+
/*
* Should we panic (and reboot, if panic_timeout= is set) when a
* hung task is detected:
@@ -137,6 +146,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
" disables this message.\n");
sched_show_task(t);
hung_task_show_lock = true;
+
+ if (sysctl_hung_task_all_cpu_backtrace)
+ hung_task_show_all_bt = true;
}

touch_nmi_watchdog();
@@ -201,10 +213,14 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
rcu_read_unlock();
if (hung_task_show_lock)
debug_show_all_locks();
- if (hung_task_call_panic) {
+
+ if (hung_task_show_all_bt) {
+ hung_task_show_all_bt = false;
trigger_all_cpu_backtrace();
+ }
+
+ if (hung_task_call_panic)
panic("hung_task: blocked tasks");
- }
}

static long hung_timeout_jiffies(unsigned long last_checked,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index ad5b88a53c5a..238f268de486 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1098,6 +1098,17 @@ static struct ctl_table kern_table[] = {
},
#endif
#ifdef CONFIG_DETECT_HUNG_TASK
+#ifdef CONFIG_SMP
+ {
+ .procname = "hung_task_all_cpu_backtrace",
+ .data = &sysctl_hung_task_all_cpu_backtrace,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+#endif /* CONFIG_SMP */
{
.procname = "hung_task_panic",
.data = &sysctl_hung_task_panic,
--
2.25.1


2020-03-30 08:50:05

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH V3] kernel/hung_task.c: Introduce sysctl to print all traces when a hung task is detected

On Mon, Mar 30, 2020 at 2:43 AM Guilherme Piccoli
<[email protected]> wrote:
>
> Hi Tetsuo and Dmitry, thanks for noticing this Tetsuo. And sorry for
> not looping you in the patch Dmitry, I wasn't aware that you were
> working with testing. By the way, I suggest people interested in linux
> testing to create a ML; I'd be glad to have looped such list, but I
> couldn't find information about a group dealing with testing.
>
> So Tetsuo, you got it right: just change it to
> "sysctl.kernel.hung_task_all_cpu_backtrace=1" and that should work
> fine, once Vlastimil's patch gets merged (and I hope it happens soon).
> Cheers,
>
>
> Guilherme

+LKML, workflows, syzkaller, kernelci, cki, kbuild

Tetsuo, thanks for notifying again.

Yes, kernel devs breaking all testing happens from time to time and
currently there is no good way to address this.
Other things I remember is the introduction of CONFIG_DEBUG_MEMORY,
which defaults to =n and disables KASAN, which in turn produced an
explosion of assorted crashes caused by memory corruptions; also
periodic changes in kernel crash messages which I assume all testing
systems parse and need to understand.

Is there already a mailing list for this? Or should we create one?
I.e. announce and changes that may need actions from all testing
systems.
Another thing that may benefit from announcements is addition of new
useful debugging configs. Currently they are introduced silently and
don't reach the target audience.

2020-03-30 09:03:20

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH V3] kernel/hung_task.c: Introduce sysctl to print all traces when a hung task is detected

On Mon, Mar 30, 2020 at 10:49 AM Dmitry Vyukov <[email protected]> wrote:
>
> On Mon, Mar 30, 2020 at 2:43 AM Guilherme Piccoli
> <[email protected]> wrote:
> >
> > Hi Tetsuo and Dmitry, thanks for noticing this Tetsuo. And sorry for
> > not looping you in the patch Dmitry, I wasn't aware that you were
> > working with testing. By the way, I suggest people interested in linux
> > testing to create a ML; I'd be glad to have looped such list, but I
> > couldn't find information about a group dealing with testing.
> >
> > So Tetsuo, you got it right: just change it to
> > "sysctl.kernel.hung_task_all_cpu_backtrace=1" and that should work
> > fine, once Vlastimil's patch gets merged (and I hope it happens soon).
> > Cheers,
> >
> >
> > Guilherme
>
> +LKML, workflows, syzkaller, kernelci, cki, kbuild
>
> Tetsuo, thanks for notifying again.
>
> Yes, kernel devs breaking all testing happens from time to time and
> currently there is no good way to address this.
> Other things I remember is the introduction of CONFIG_DEBUG_MEMORY,
> which defaults to =n and disables KASAN, which in turn produced an
> explosion of assorted crashes caused by memory corruptions; also
> periodic changes in kernel crash messages which I assume all testing
> systems parse and need to understand.
>
> Is there already a mailing list for this? Or should we create one?
> I.e. announce and changes that may need actions from all testing
> systems.
> Another thing that may benefit from announcements is addition of new
> useful debugging configs. Currently they are introduced silently and
> don't reach the target audience.

I've fixed this up:
https://github.com/google/syzkaller/commit/c8d1cc20df5ca5d9ea437054720fa3cfdfa1f578

But what would be even better is some kind of canned configs/settings
for testing systems so that I enable it once and then such changes
magically auto-happen for me.
Imposing work on N testing systems maintainers is not good.
And there really is no good point in the current kernel dev process
for this. Announcing unmerged changes is too early (as this patch
showed). And once it's in linux-next it's already too late..
And I don't want to be inventing a new unique kernel configuration for
testing. I don't think it's the right way to approach this. Whatever
is "the testing configuration", whatever kernel developers want to see
in task hang reports, I just want the system to provide that.

2020-03-30 12:18:49

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH V3] kernel/hung_task.c: Introduce sysctl to print all traces when a hung task is detected

On 2020/03/30 18:01, Dmitry Vyukov wrote:
> On Mon, Mar 30, 2020 at 10:49 AM Dmitry Vyukov <[email protected]> wrote:
>>
>> On Mon, Mar 30, 2020 at 2:43 AM Guilherme Piccoli
>> <[email protected]> wrote:
>>>
>>> Hi Tetsuo and Dmitry, thanks for noticing this Tetsuo. And sorry for
>>> not looping you in the patch Dmitry, I wasn't aware that you were
>>> working with testing. By the way, I suggest people interested in linux
>>> testing to create a ML; I'd be glad to have looped such list, but I
>>> couldn't find information about a group dealing with testing.
>>>
>>> So Tetsuo, you got it right: just change it to
>>> "sysctl.kernel.hung_task_all_cpu_backtrace=1" and that should work
>>> fine, once Vlastimil's patch gets merged (and I hope it happens soon).
>>> Cheers,
>>>
>>>
>>> Guilherme
>>
>> +LKML, workflows, syzkaller, kernelci, cki, kbuild
>>
>> Tetsuo, thanks for notifying again.
>>
>> Yes, kernel devs breaking all testing happens from time to time and
>> currently there is no good way to address this.
>> Other things I remember is the introduction of CONFIG_DEBUG_MEMORY,
>> which defaults to =n and disables KASAN, which in turn produced an
>> explosion of assorted crashes caused by memory corruptions; also
>> periodic changes in kernel crash messages which I assume all testing
>> systems parse and need to understand.
>>
>> Is there already a mailing list for this? Or should we create one?
>> I.e. announce and changes that may need actions from all testing
>> systems.

I think we can create one.

>> Another thing that may benefit from announcements is addition of new
>> useful debugging configs. Currently they are introduced silently and
>> don't reach the target audience.
>
> I've fixed this up:
> https://github.com/google/syzkaller/commit/c8d1cc20df5ca5d9ea437054720fa3cfdfa1f578

Excuse me, but that commit is incorrect.
There will be no /proc/sys/sysctl/kernel/hung_task_all_cpu_backtrace file.
What Vlastimil's patch will allow is to set sysctl values via kernel command
line option using "sysctl." as a prefix. That is,

-hung_task_all_cpu_backtrace=1
+sysctl.kernel.hung_task_all_cpu_backtrace=1

on dashboard/config/*.cmdline files is what you need.

>
> But what would be even better is some kind of canned configs/settings
> for testing systems so that I enable it once and then such changes
> magically auto-happen for me.

Yes, updating kernel config is a burden for kernel testing projects.
I'm waiting for comments from more kernel testing projects on
https://lkml.kernel.org/r/[email protected] .

> Imposing work on N testing systems maintainers is not good.
> And there really is no good point in the current kernel dev process
> for this. Announcing unmerged changes is too early (as this patch
> showed). And once it's in linux-next it's already too late..
> And I don't want to be inventing a new unique kernel configuration for
> testing. I don't think it's the right way to approach this. Whatever
> is "the testing configuration", whatever kernel developers want to see
> in task hang reports, I just want the system to provide that.
>

2020-04-06 12:29:30

by Guilherme Piccoli

[permalink] [raw]
Subject: Re: [PATCH V3] kernel/hung_task.c: Introduce sysctl to print all traces when a hung task is detected

Hi Andrew / Kees, sorry for the ping.
Is there anything else missing in this patch? What are the necessary
steps to get it merged?

Thanks in advance,


Guilherme

2020-04-06 15:56:28

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH V3] kernel/hung_task.c: Introduce sysctl to print all traces when a hung task is detected

On Mon, Apr 06, 2020 at 09:28:41AM -0300, Guilherme G. Piccoli wrote:
> Hi Andrew / Kees, sorry for the ping.
> Is there anything else missing in this patch? What are the necessary
> steps to get it merged?

I'm expecting Andrew to pick it up, but it's the middle of the merge
window right now, so there is likely to be a couple weeks delay before
this will appear in -mm.

--
Kees Cook

2020-04-06 18:34:09

by Guilherme Piccoli

[permalink] [raw]
Subject: Re: [PATCH V3] kernel/hung_task.c: Introduce sysctl to print all traces when a hung task is detected

Thanks Kees! I was expecting this could be merged in the current
window, but there's really no problem in waiting for the next!

Cheers,


Guilherme

2020-04-06 18:38:05

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH V3] kernel/hung_task.c: Introduce sysctl to print all traces when a hung task is detected

On Mon, Apr 06, 2020 at 03:32:42PM -0300, Guilherme Piccoli wrote:
> Thanks Kees! I was expecting this could be merged in the current
> window, but there's really no problem in waiting for the next!

That's not usually how these things work, unfortunately. Stuff going
into the merge window are those things that have been living in
linux-next for a while (usually since before the -rc5). Once Andrew is
done with the merge window for the -mm tree, he'll start scanning for
new things to pull in. I expect this to be one of them. :)

--
Kees Cook

2020-04-06 18:47:12

by Guilherme Piccoli

[permalink] [raw]
Subject: Re: [PATCH V3] kernel/hung_task.c: Introduce sysctl to print all traces when a hung task is detected

Awesome, no problem at all. And thanks for all the information and review =)