2023-02-02 14:00:04

by Tetsuo Handa

Subject: [PATCH v3] locking/lockdep: add debug_show_all_lock_holders()

Currently, check_hung_uninterruptible_tasks() reports details of the locks
held in the system, but not the backtraces of the threads holding them.
Also, lockdep_print_held_locks() does not report details of locks held by
a thread if that thread is in TASK_RUNNING state. Several years of
experience debugging without a vmcore tell me that these limitations have
been a barrier to understanding what went wrong in syzbot's "INFO: task
hung in" reports.

I initially thought that "INFO: task hung in" reports were caused by
over-stressing, but I have come to understand that over-stressing is
unlikely. I now consider that there is likely a deadlock/livelock bug
which lockdep cannot report as a deadlock by the time "INFO: task hung in"
is reported.

A typical case is that thread-1 is waiting for something to happen (e.g.
in wait_event_*()) with a lock held. When thread-2 tries to take that lock
using e.g. mutex_lock(), check_hung_uninterruptible_tasks() reports that
thread-2 is hung and that thread-1 is holding the lock which thread-2 is
trying to take. But currently check_hung_uninterruptible_tasks() cannot
report the exact location of thread-1, which would give us an important
hint for understanding why thread-1 has been holding that lock for such a
long period.

When check_hung_uninterruptible_tasks() reports a thread waiting for a
lock, it is important to report the backtraces of the threads which
already hold that lock. Therefore, allow check_hung_uninterruptible_tasks()
to report the exact location of every thread which is holding a lock.

debug_show_all_lock_holders() skips the current thread if it is holding
only one lock, for that lock is the RCU lock taken inside this function
and reporting it is generally useless.

Signed-off-by: Tetsuo Handa <[email protected]>
---
Changes in v3:
Unshare debug_show_all_lock_holders() and debug_show_all_locks(),
suggested by Ingo Molnar <[email protected]>.

Changes in v2:
Share debug_show_all_lock_holders() and debug_show_all_locks(),
suggested by Waiman Long <[email protected]>.

 include/linux/debug_locks.h |  5 +++++
 kernel/hung_task.c          |  2 +-
 kernel/locking/lockdep.c    | 28 ++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h
index dbb409d77d4f..0567d5ce5b4a 100644
--- a/include/linux/debug_locks.h
+++ b/include/linux/debug_locks.h
@@ -50,6 +50,7 @@ extern int debug_locks_off(void);
 #ifdef CONFIG_LOCKDEP
 extern void debug_show_all_locks(void);
 extern void debug_show_held_locks(struct task_struct *task);
+extern void debug_show_all_lock_holders(void);
 extern void debug_check_no_locks_freed(const void *from, unsigned long len);
 extern void debug_check_no_locks_held(void);
 #else
@@ -61,6 +62,10 @@ static inline void debug_show_held_locks(struct task_struct *task)
 {
 }
 
+static inline void debug_show_all_lock_holders(void)
+{
+}
+
 static inline void
 debug_check_no_locks_freed(const void *from, unsigned long len)
 {
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index c71889f3f3fc..5fba784258b7 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -213,7 +213,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 unlock:
 	rcu_read_unlock();
 	if (hung_task_show_lock)
-		debug_show_all_locks();
+		debug_show_all_lock_holders();
 
 	if (hung_task_show_all_bt) {
 		hung_task_show_all_bt = false;
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e3375bc40dad..d9394de09b79 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -32,6 +32,7 @@
 #include <linux/sched/clock.h>
 #include <linux/sched/task.h>
 #include <linux/sched/mm.h>
+#include <linux/sched/debug.h>
 #include <linux/delay.h>
 #include <linux/module.h>
 #include <linux/proc_fs.h>
@@ -6511,6 +6512,33 @@ void debug_show_all_locks(void)
 	pr_warn("=============================================\n\n");
 }
 EXPORT_SYMBOL_GPL(debug_show_all_locks);
+
+void debug_show_all_lock_holders(void)
+{
+	struct task_struct *g, *p;
+
+	if (unlikely(!debug_locks)) {
+		pr_warn("INFO: lockdep is turned off.\n");
+		return;
+	}
+	pr_warn("\nShowing all threads with locks held in the system:\n");
+
+	rcu_read_lock();
+	for_each_process_thread(g, p) {
+		if (!p->lockdep_depth)
+			continue;
+		if (p == current && p->lockdep_depth == 1)
+			continue;
+		sched_show_task(p);
+		lockdep_print_held_locks(p);
+		touch_nmi_watchdog();
+		touch_all_softlockup_watchdogs();
+	}
+	rcu_read_unlock();
+
+	pr_warn("\n");
+	pr_warn("=============================================\n\n");
+}
 #endif
 
 /*
--
2.18.4


2023-02-13 09:44:08

by Tetsuo Handa

Subject: Re: [PATCH v3] locking/lockdep: add debug_show_all_lock_holders()

Ingo or Peter, are you there?

Linus is expecting that this patch (if acceptable) is sent from the locking people.

https://lkml.kernel.org/r/CAHk-=wjbu9USn=hVWQ9v9t1H+8R6qXj8REkm36==w10zM0cM6g@mail.gmail.com

On 2023/02/02 22:59, Tetsuo Handa wrote:
> Currently, check_hung_uninterruptible_tasks() reports details of the locks
> held in the system, but not the backtraces of the threads holding them.
> Also, lockdep_print_held_locks() does not report details of locks held by
> a thread if that thread is in TASK_RUNNING state. Several years of
> experience debugging without a vmcore tell me that these limitations have
> been a barrier to understanding what went wrong in syzbot's "INFO: task
> hung in" reports.
>
> I initially thought that "INFO: task hung in" reports were caused by
> over-stressing, but I have come to understand that over-stressing is
> unlikely. I now consider that there is likely a deadlock/livelock bug
> which lockdep cannot report as a deadlock by the time "INFO: task hung in"
> is reported.
>
> A typical case is that thread-1 is waiting for something to happen (e.g.
> in wait_event_*()) with a lock held. When thread-2 tries to take that lock
> using e.g. mutex_lock(), check_hung_uninterruptible_tasks() reports that
> thread-2 is hung and that thread-1 is holding the lock which thread-2 is
> trying to take. But currently check_hung_uninterruptible_tasks() cannot
> report the exact location of thread-1, which would give us an important
> hint for understanding why thread-1 has been holding that lock for such a
> long period.
>
> When check_hung_uninterruptible_tasks() reports a thread waiting for a
> lock, it is important to report the backtraces of the threads which
> already hold that lock. Therefore, allow check_hung_uninterruptible_tasks()
> to report the exact location of every thread which is holding a lock.
>
> debug_show_all_lock_holders() skips the current thread if it is holding
> only one lock, for that lock is the RCU lock taken inside this function
> and reporting it is generally useless.
>
> Signed-off-by: Tetsuo Handa <[email protected]>
> ---
> Changes in v3:
> Unshare debug_show_all_lock_holders() and debug_show_all_locks(),
> suggested by Ingo Molnar <[email protected]>.
>
> Changes in v2:
> Share debug_show_all_lock_holders() and debug_show_all_locks(),
> suggested by Waiman Long <[email protected]>.
>
> include/linux/debug_locks.h | 5 +++++
> kernel/hung_task.c | 2 +-
> kernel/locking/lockdep.c | 28 ++++++++++++++++++++++++++++
> 3 files changed, 34 insertions(+), 1 deletion(-)
>


2023-02-13 11:03:16

by Peter Zijlstra

Subject: Re: [PATCH v3] locking/lockdep: add debug_show_all_lock_holders()

On Thu, Feb 02, 2023 at 10:59:50PM +0900, Tetsuo Handa wrote:
> Currently, check_hung_uninterruptible_tasks() reports details of the locks
> held in the system, but not the backtraces of the threads holding them.
> Also, lockdep_print_held_locks() does not report details of locks held by
> a thread if that thread is in TASK_RUNNING state. Several years of
> experience debugging without a vmcore tell me that these limitations have
> been a barrier to understanding what went wrong in syzbot's "INFO: task
> hung in" reports.
>
> I initially thought that "INFO: task hung in" reports were caused by
> over-stressing, but I have come to understand that over-stressing is
> unlikely. I now consider that there is likely a deadlock/livelock bug
> which lockdep cannot report as a deadlock by the time "INFO: task hung in"
> is reported.
>
> A typical case is that thread-1 is waiting for something to happen (e.g.
> in wait_event_*()) with a lock held. When thread-2 tries to take that lock
> using e.g. mutex_lock(), check_hung_uninterruptible_tasks() reports that
> thread-2 is hung and that thread-1 is holding the lock which thread-2 is
> trying to take. But currently check_hung_uninterruptible_tasks() cannot
> report the exact location of thread-1, which would give us an important
> hint for understanding why thread-1 has been holding that lock for such a
> long period.
>
> When check_hung_uninterruptible_tasks() reports a thread waiting for a
> lock, it is important to report the backtraces of the threads which
> already hold that lock. Therefore, allow check_hung_uninterruptible_tasks()
> to report the exact location of every thread which is holding a lock.
>
> debug_show_all_lock_holders() skips the current thread if it is holding
> only one lock, for that lock is the RCU lock taken inside this function
> and reporting it is generally useless.

> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index c71889f3f3fc..5fba784258b7 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -213,7 +213,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> unlock:
> rcu_read_unlock();
> if (hung_task_show_lock)
> - debug_show_all_locks();
> + debug_show_all_lock_holders();
>
> if (hung_task_show_all_bt) {
> hung_task_show_all_bt = false;

This being the hung-task detector, which is mostly about sleeping locks.

> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index e3375bc40dad..d9394de09b79 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -32,6 +32,7 @@
> #include <linux/sched/clock.h>
> #include <linux/sched/task.h>
> #include <linux/sched/mm.h>
> +#include <linux/sched/debug.h>
> #include <linux/delay.h>
> #include <linux/module.h>
> #include <linux/proc_fs.h>
> @@ -6511,6 +6512,33 @@ void debug_show_all_locks(void)
> pr_warn("=============================================\n\n");
> }
> EXPORT_SYMBOL_GPL(debug_show_all_locks);
> +
> +void debug_show_all_lock_holders(void)
> +{
> + struct task_struct *g, *p;
> +
> + if (unlikely(!debug_locks)) {
> + pr_warn("INFO: lockdep is turned off.\n");
> + return;
> + }
> + pr_warn("\nShowing all threads with locks held in the system:\n");
> +
> + rcu_read_lock();
> + for_each_process_thread(g, p) {
> + if (!p->lockdep_depth)
> + continue;
> + if (p == current && p->lockdep_depth == 1)
> + continue;
> + sched_show_task(p);

And sched_show_task() being an utter piece of crap that will basically
print garbage for anything that's running (it doesn't have much
options).

Should we try and do better? dump_cpu_task() prefers
trigger_single_cpu_backtrace(), which sends an interrupt in order to get
active registers for the CPU.


2023-02-13 11:35:22

by Tetsuo Handa

Subject: Re: [PATCH v3] locking/lockdep: add debug_show_all_lock_holders()

On 2023/02/13 20:02, Peter Zijlstra wrote:
>> @@ -213,7 +213,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>> unlock:
>> rcu_read_unlock();
>> if (hung_task_show_lock)
>> - debug_show_all_locks();
>> + debug_show_all_lock_holders();
>>
>> if (hung_task_show_all_bt) {
>> hung_task_show_all_bt = false;
>
> This being the hung-task detector, which is mostly about sleeping locks.

Yes, the intent of this patch is to report tasks sleeping with locks held,
since the cause of a hung task is sometimes a deadlock.

>> + rcu_read_lock();
>> + for_each_process_thread(g, p) {
>> + if (!p->lockdep_depth)
>> + continue;
>> + if (p == current && p->lockdep_depth == 1)
>> + continue;
>> + sched_show_task(p);
>
> And sched_show_task() being an utter piece of crap that will basically
> print garbage for anything that's running (it doesn't have much
> options).
>
> Should we try and do better? dump_cpu_task() prefers
> trigger_single_cpu_backtrace(), which sends an interrupt in order to get
> active registers for the CPU.

What is the intent of using trigger_single_cpu_backtrace() here?
check_hung_uninterruptible_tasks() already calls trigger_all_cpu_backtrace()
if sysctl_hung_task_all_cpu_backtrace is set.

Held locks and kernel backtraces are helpful for describing a deadlock
situation, but register values are not. What is important is that tasks
which are not on CPUs are reported, because when a task is reported as
hung, that task must be sleeping. Therefore, I think sched_show_task()
is fine.


2023-02-13 12:49:54

by Peter Zijlstra

Subject: Re: [PATCH v3] locking/lockdep: add debug_show_all_lock_holders()

On Mon, Feb 13, 2023 at 08:34:55PM +0900, Tetsuo Handa wrote:
> On 2023/02/13 20:02, Peter Zijlstra wrote:
> >> @@ -213,7 +213,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
> >> unlock:
> >> rcu_read_unlock();
> >> if (hung_task_show_lock)
> >> - debug_show_all_locks();
> >> + debug_show_all_lock_holders();
> >>
> >> if (hung_task_show_all_bt) {
> >> hung_task_show_all_bt = false;
> >
> > This being the hung-task detector, which is mostly about sleeping locks.
>
> Yes, the intent of this patch is to report tasks sleeping with locks held,
> for the cause of hung task is sometimes a deadlock.
>
> >> + rcu_read_lock();
> >> + for_each_process_thread(g, p) {
> >> + if (!p->lockdep_depth)
> >> + continue;
> >> + if (p == current && p->lockdep_depth == 1)
> >> + continue;
> >> + sched_show_task(p);
> >
> > And sched_show_task() being an utter piece of crap that will basically
> > print garbage for anything that's running (it doesn't have much
> > options).
> >
> > Should we try and do better? dump_cpu_task() prefers
> > trigger_single_cpu_backtrace(), which sends an interrupt in order to get
> > active registers for the CPU.
>
> What is the intent of using trigger_single_cpu_backtrace() here?
> check_hung_uninterruptible_tasks() is calling trigger_all_cpu_backtrace()
> if sysctl_hung_task_all_cpu_backtrace is set.

Then have that also print the held locks for those tasks. And skip over
them again later.

> Locks held and kernel backtrace are helpful for describing deadlock
> situation, but registers values are not.

Register state is required to start the unwind. You can't unwind a
running task out of thin-air.

> What is important is that tasks which are not on CPUs are reported,
> for when a task is reported as hung, that task must be sleeping.
> Therefore, I think sched_show_task() is fine.

The backtraces generated by sched_show_task() for a running task are
absolutely worthless, might as well not print them.

And if I read your Changelog right, you explicitly wanted useful
backtraces for the running tasks -- such that you could see what they
were doing while holding the lock the other tasks were blocked on.

The only way to do that is to send an interrupt, the interrupt will have
the register state for the interrupted task -- including the stack
pointer. By virtue of running the interrupt handler we know the stack
won't shrink, so we can then safely traverse the stack starting from the
given stack pointer.
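This idea has a rough userspace analogue: interrupt the running thread with
a signal so the handler executes on that thread's own stack with its
register state, then unwind from there. A hedged sketch using glibc's
backtrace() (dump_running_thread() is an invented name, and backtrace() is
not strictly async-signal-safe, which this sketch glosses over):

```c
#include <execinfo.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t dumped;

/*
 * Runs on the interrupted thread's own stack with its register state,
 * so backtrace() can walk that thread's frames -- the userspace analogue
 * of trigger_single_cpu_backtrace() sending an IPI.
 */
static void dump_handler(int sig)
{
	void *frames[32];
	int n;

	(void)sig;
	n = backtrace(frames, 32);
	backtrace_symbols_fd(frames, n, STDOUT_FILENO);
	dumped = 1;
}

/* A "running task": spins until its backtrace has been dumped. */
static void *busy(void *arg)
{
	(void)arg;
	while (!dumped)
		;
	return NULL;
}

/* Interrupts the busy thread and returns 1 once its backtrace was printed. */
int dump_running_thread(void)
{
	pthread_t t;
	void *warm[1];

	backtrace(warm, 1);	/* warm up: first call may lazily load libgcc */
	signal(SIGUSR1, dump_handler);
	pthread_create(&t, NULL, busy, NULL);
	usleep(100 * 1000);	/* let the thread start "running" */
	pthread_kill(t, SIGUSR1);
	pthread_join(t, NULL);
	return dumped;
}
```

The key property mirrors the point above: the handler inherits the
interrupted thread's stack pointer, so the unwind does not have to start
out of thin air.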



2023-02-13 13:49:38

by Tetsuo Handa

Subject: Re: [PATCH v3] locking/lockdep: add debug_show_all_lock_holders()

On 2023/02/13 21:49, Peter Zijlstra wrote:
>>> And sched_show_task() being an utter piece of crap that will basically
>>> print garbage for anything that's running (it doesn't have much
>>> options).
>>>
>>> Should we try and do better? dump_cpu_task() prefers
>>> trigger_single_cpu_backtrace(), which sends an interrupt in order to get
>>> active registers for the CPU.
>>
>> What is the intent of using trigger_single_cpu_backtrace() here?
>> check_hung_uninterruptible_tasks() is calling trigger_all_cpu_backtrace()
>> if sysctl_hung_task_all_cpu_backtrace is set.
>
> Then have that also print the held locks for those tasks. And skip over
> them again later.
>
>> Locks held and kernel backtrace are helpful for describing deadlock
>> situation, but registers values are not.
>
> Register state is required to start the unwind. You can't unwind a
> running task out of thin-air.

Excuse me. There are two types of TASK_RUNNING tasks: one is a thread
that is actually running on some CPU, and the other is a thread that is
waiting for a CPU to become available, aren't there?

lockdep_print_held_locks() does not show the locks held by a thread even
if that thread is merely waiting for a CPU to become available, does it?

But sched_show_task() can show a backtrace even for a thread that is
waiting for a CPU to become available, can't it?

Therefore, calling sched_show_task() helps in understanding what such a
thread is doing when lockdep_print_held_locks() did not show its held
locks.

>
>> What is important is that tasks which are not on CPUs are reported,
>> for when a task is reported as hung, that task must be sleeping.
>> Therefore, I think sched_show_task() is fine.
>
> The backtraces generated by sched_show_task() for a running task are
> absolutely worthless, might as well not print them.

Which does this "running task" mean: a thread actually running on some
CPU, or a thread waiting for a CPU to become available?

>
> And if I read your Changelog right, you explicitly wanted useful
> backtraces for the running tasks -- such that you could see what they
> were doing while holding the lock the other tasks were blocked on.

Yes, we can get useful backtraces for threads that are waiting for a CPU
to become available. That's why sched_show_task() was chosen.

>
> The only way to do that is to send an interrupt, the interrupt will have
> the register state for the interrupted task -- including the stack
> pointer. By virtue of running the interrupt handler we know the stack
> won't shrink, so we can then safely traverse the stack starting from the
> given stack pointer.

But trigger_single_cpu_backtrace() is for a thread actually running on some
CPU, isn't it? While it would be helpful for getting the backtrace of a
thread that is actually running on some CPU, it would not help with getting
the backtrace of a thread that is waiting for a CPU to become available.

We can later get the backtraces of threads actually running on some CPU
using trigger_all_cpu_backtrace() via the sysctl_hung_task_all_cpu_backtrace
setting, though I seldom find useful backtraces via
trigger_all_cpu_backtrace(); it is likely that the khungtaskd thread and
some random workqueue threads (which are irrelevant to the hung task) are
what get reported via trigger_all_cpu_backtrace()...


2023-03-18 11:15:36

by Tetsuo Handa

Subject: Re: [PATCH v3] locking/lockdep: add debug_show_all_lock_holders()

Peter?

On 2023/02/13 22:49, Tetsuo Handa wrote:
> On 2023/02/13 21:49, Peter Zijlstra wrote:
>>>> And sched_show_task() being an utter piece of crap that will basically
>>>> print garbage for anything that's running (it doesn't have much
>>>> options).
>>>>
>>>> Should we try and do better? dump_cpu_task() prefers
>>>> trigger_single_cpu_backtrace(), which sends an interrupt in order to get
>>>> active registers for the CPU.
>>>
>>> What is the intent of using trigger_single_cpu_backtrace() here?
>>> check_hung_uninterruptible_tasks() is calling trigger_all_cpu_backtrace()
>>> if sysctl_hung_task_all_cpu_backtrace is set.
>>
>> Then have that also print the held locks for those tasks. And skip over
>> them again later.
>>
>>> Locks held and kernel backtrace are helpful for describing deadlock
>>> situation, but registers values are not.
>>
>> Register state is required to start the unwind. You can't unwind a
>> running task out of thin-air.
>
> Excuse me. There are two types of TASK_RUNNING tasks: one is a thread
> that is actually running on some CPU, and the other is a thread that is
> waiting for a CPU to become available, aren't there?
>
> lockdep_print_held_locks() does not show the locks held by a thread even
> if that thread is merely waiting for a CPU to become available, does it?
>
> But sched_show_task() can show a backtrace even for a thread that is
> waiting for a CPU to become available, can't it?
>
> Therefore, calling sched_show_task() helps in understanding what such a
> thread is doing when lockdep_print_held_locks() did not show its held
> locks.
>
>>
>>> What is important is that tasks which are not on CPUs are reported,
>>> for when a task is reported as hung, that task must be sleeping.
>>> Therefore, I think sched_show_task() is fine.
>>
>> The backtraces generated by sched_show_task() for a running task are
>> absolutely worthless, might as well not print them.
>
> Which does this "running task" mean: a thread actually running on some
> CPU, or a thread waiting for a CPU to become available?
>
>>
>> And if I read your Changelog right, you explicitly wanted useful
>> backtraces for the running tasks -- such that you could see what they
>> were doing while holding the lock the other tasks were blocked on.
>
> Yes, we can get useful backtraces for threads that are waiting for a CPU
> to become available. That's why sched_show_task() was chosen.
>
>>
>> The only way to do that is to send an interrupt, the interrupt will have
>> the register state for the interrupted task -- including the stack
>> pointer. By virtue of running the interrupt handler we know the stack
>> won't shrink, so we can then safely traverse the stack starting from the
>> given stack pointer.
>
> But trigger_single_cpu_backtrace() is for a thread actually running on some
> CPU, isn't it? While it would be helpful for getting the backtrace of a
> thread that is actually running on some CPU, it would not help with getting
> the backtrace of a thread that is waiting for a CPU to become available.
>
> We can later get the backtraces of threads actually running on some CPU
> using trigger_all_cpu_backtrace() via the sysctl_hung_task_all_cpu_backtrace
> setting, though I seldom find useful backtraces via
> trigger_all_cpu_backtrace(); it is likely that the khungtaskd thread and
> some random workqueue threads (which are irrelevant to the hung task) are
> what get reported via trigger_all_cpu_backtrace()...
>