2022-02-28 11:26:52

by Byungchul Park

Subject: [PATCH v3 00/21] DEPT(Dependency Tracker)

I didn't want to bother you, so I was planning to send the next spin
after making more progress. However, PATCH v2 reported too many false
positives because Dept tracked bit_wait_table[] the wrong way - I
apologize for that. So I decided to send PATCH v3 first, before going
further, for those who want to run Dept now.

There might still be some false positives, but they should not be
overwhelming.

---

Hi Linus and folks,

I've been developing a tool for detecting deadlock possibilities by
tracking wait/event rather than lock(?) acquisition order, to try to
cover all synchronization mechanisms. It's based on the v5.17-rc1 tag.

https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1

Benefits:

0. Works with all lock primitives.
1. Works with wait_for_completion()/complete().
2. Works with 'wait' on PG_locked.
3. Works with 'wait' on PG_writeback.
4. Works with swait/wakeup.
5. Works with waitqueue.
6. Multiple reports are allowed.
7. Deduplication control on multiple reports.
8. Withstands false positives thanks to 6.
9. Easy to tag any wait/event (see the sketch below).
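
For instance, tagging a custom wait/event by hand looks roughly like
the sketch below. This is only an illustration based on the low-level
calls used elsewhere in this series - my_obj, my_key and the
simplified arguments are made up, and the dept_sdt.h helpers added in
this series presumably wrap this for the common cases:

#include <linux/dept.h>

static struct dept_key my_key;

struct my_obj {
        struct dept_map dmap;
        /* ... */
};

void my_obj_init(struct my_obj *o)
{
        dept_map_init(&o->dmap, &my_key, 0, "my_obj");
}

void my_waiter(struct my_obj *o)
{
        /* tell Dept this context is about to wait on o */
        dept_wait(&o->dmap, 1UL, _RET_IP_, __func__, 0);
        /* ... actually sleep until my_event() fires ... */
}

void my_event(struct my_obj *o)
{
        /* open the event context and declare the event others wait for */
        dept_ecxt_enter(&o->dmap, 1UL, _RET_IP_, __func__, "my_event", 0);
        dept_ask_event(&o->dmap);
        /* ... do the work and wake up the waiter ... */
        dept_event(&o->dmap, 1UL, _RET_IP_, __func__);
        dept_ecxt_exit(&o->dmap, _RET_IP_);
}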

Future work:

0. To make it more stable.
1. To separate Dept from Lockdep.
2. To improve performance in terms of time and space.
3. To use Dept as a dependency engine for Lockdep.
4. To add any missing tags of wait/event in the kernel.
5. To deduplicate stack traces.

How to interpret reports:

1. E(event) in each context cannot be triggered because of the
   W(wait) that cannot be woken.
2. The stack trace that helps find the problematic code is located
   in each context's detail.
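
For example, the summary part of a report looks roughly like this
(schematic only, simplified from a real report):

    context A
        [S] lock(&obj->lock)
        [W] lock(&obj->lock)
        [E] unlock(&obj->lock)

Here E (releasing the lock) can never be triggered because W (the
second acquisition of the same lock) can never be woken - an AA
deadlock.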

Thanks,
Byungchul

---

Changes from v2:

1. Disable Dept on bit_wait_table[] in sched/wait_bit.c, which was
   reporting a lot of false positives - my fault. Wait/event for
   bit_wait_table[] should have been tagged at a higher layer,
   which is future work. (feedback from Jan Kara)
2. Disable Dept on crypto_larval's completion to prevent a false
positive.

Changes from v1:

1. Fix coding style and typo. (feedback from Steven)
2. Distinguish each work context from another in workqueue.
3. Skip checking lock acquisition with nest_lock, which is about
correct lock usage that should be checked by Lockdep.

Changes from RFC:

1. Add the wait tag at __schedule() rather than at prepare_to_wait().
(feedback from Linus and Matthew)
2. Use try version at lockdep_acquire_cpus_lock() annotation.
3. Distinguish each syscall context from another.

Byungchul Park (21):
llist: Move llist_{head,node} definition to types.h
dept: Implement Dept(Dependency Tracker)
dept: Embed Dept data in Lockdep
dept: Add an API for skipping dependency check temporarily
dept: Apply Dept to spinlock
dept: Apply Dept to mutex families
dept: Apply Dept to rwlock
dept: Apply Dept to wait_for_completion()/complete()
dept: Apply Dept to seqlock
dept: Apply Dept to rwsem
dept: Add proc knobs to show stats and dependency graph
dept: Introduce split map concept and new APIs for them
dept: Apply Dept to wait/event of PG_{locked,writeback}
dept: Apply SDT to swait
dept: Apply SDT to wait(waitqueue)
locking/lockdep, cpu/hotplug: Use a weaker annotation in AP thread
dept: Distinguish each syscall context from another
dept: Distinguish each work from another
dept: Disable Dept within the wait_bit layer by default
dept: Add nocheck version of init_completion()
dept: Disable Dept on struct crypto_larval's completion for now

crypto/api.c | 7 +-
include/linux/completion.h | 50 +-
include/linux/dept.h | 535 +++++++
include/linux/dept_page.h | 78 ++
include/linux/dept_sdt.h | 62 +
include/linux/hardirq.h | 3 +
include/linux/irqflags.h | 33 +-
include/linux/llist.h | 8 -
include/linux/lockdep.h | 158 ++-
include/linux/lockdep_types.h | 3 +
include/linux/mutex.h | 33 +
include/linux/page-flags.h | 45 +-
include/linux/pagemap.h | 7 +-
include/linux/percpu-rwsem.h | 10 +-
include/linux/rtmutex.h | 7 +
include/linux/rwlock.h | 52 +
include/linux/rwlock_api_smp.h | 8 +-
include/linux/rwlock_types.h | 7 +
include/linux/rwsem.h | 33 +
include/linux/sched.h | 7 +
include/linux/seqlock.h | 59 +-
include/linux/spinlock.h | 26 +
include/linux/spinlock_types_raw.h | 13 +
include/linux/swait.h | 4 +
include/linux/types.h | 8 +
include/linux/wait.h | 6 +-
init/init_task.c | 2 +
init/main.c | 4 +
kernel/Makefile | 1 +
kernel/cpu.c | 2 +-
kernel/dependency/Makefile | 4 +
kernel/dependency/dept.c | 2712 ++++++++++++++++++++++++++++++++++++
kernel/dependency/dept_hash.h | 10 +
kernel/dependency/dept_internal.h | 26 +
kernel/dependency/dept_object.h | 13 +
kernel/dependency/dept_proc.c | 92 ++
kernel/entry/common.c | 3 +
kernel/exit.c | 1 +
kernel/fork.c | 2 +
kernel/locking/lockdep.c | 12 +-
kernel/module.c | 2 +
kernel/sched/completion.c | 12 +-
kernel/sched/core.c | 3 +
kernel/sched/swait.c | 10 +
kernel/sched/wait.c | 16 +
kernel/sched/wait_bit.c | 5 +-
kernel/softirq.c | 6 +-
kernel/trace/trace_preemptirq.c | 19 +-
kernel/workqueue.c | 3 +
lib/Kconfig.debug | 21 +
mm/filemap.c | 68 +
mm/page_ext.c | 5 +
52 files changed, 4257 insertions(+), 59 deletions(-)
create mode 100644 include/linux/dept.h
create mode 100644 include/linux/dept_page.h
create mode 100644 include/linux/dept_sdt.h
create mode 100644 kernel/dependency/Makefile
create mode 100644 kernel/dependency/dept.c
create mode 100644 kernel/dependency/dept_hash.h
create mode 100644 kernel/dependency/dept_internal.h
create mode 100644 kernel/dependency/dept_object.h
create mode 100644 kernel/dependency/dept_proc.c

--
1.9.1


2022-02-28 12:07:55

by Byungchul Park

Subject: [PATCH v3 10/21] dept: Apply Dept to rwsem

Make Dept able to track dependencies created by rwsem.
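
For example (illustration only, not part of the change itself), a
plain write-lock section such as:

        down_write(&sem);
        /* critical section */
        up_write(&sem);

is modeled by Dept as a wait on the rwsem plus an "up_write" event
context opened at down_write() time, with the event itself notified
at up_write() time. Trylocks skip the wait since they cannot block,
and nested acquisitions are skipped entirely via dept_skip().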

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/lockdep.h | 24 ++++++++++++++++++++----
include/linux/percpu-rwsem.h | 10 +++++++++-
include/linux/rwsem.h | 33 +++++++++++++++++++++++++++++++++
3 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index b93a707..37af50c 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -646,10 +646,26 @@ static inline void print_irqtrace_events(struct task_struct *curr)
dept_mutex_unlock(&(l)->dmap, i); \
} while (0)

-#define rwsem_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
-#define rwsem_acquire_nest(l, s, t, n, i) lock_acquire_exclusive(l, s, t, n, i)
-#define rwsem_acquire_read(l, s, t, i) lock_acquire_shared(l, s, t, NULL, i)
-#define rwsem_release(l, i) lock_release(l, i)
+#define rwsem_acquire(l, s, t, i) \
+do { \
+ lock_acquire_exclusive(l, s, t, NULL, i); \
+ dept_rwsem_lock(&(l)->dmap, s, t, NULL, "up_write", i); \
+} while (0)
+#define rwsem_acquire_nest(l, s, t, n, i) \
+do { \
+ lock_acquire_exclusive(l, s, t, n, i); \
+ dept_rwsem_lock(&(l)->dmap, s, t, (n) ? &(n)->dmap : NULL, "up_write", i);\
+} while (0)
+#define rwsem_acquire_read(l, s, t, i) \
+do { \
+ lock_acquire_shared(l, s, t, NULL, i); \
+ dept_rwsem_lock(&(l)->dmap, s, t, NULL, "up_read", i); \
+} while (0)
+#define rwsem_release(l, i) \
+do { \
+ lock_release(l, i); \
+ dept_rwsem_unlock(&(l)->dmap, i); \
+} while (0)

#define lock_map_acquire(l) lock_acquire_exclusive(l, 0, 0, NULL, _THIS_IP_)
#define lock_map_acquire_read(l) lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_)
diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index 5fda40f..ac2b1a5 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -20,8 +20,16 @@ struct percpu_rw_semaphore {
#endif
};

+#ifdef CONFIG_DEPT
+#define __PERCPU_RWSEM_DMAP_INIT(lockname) .dmap = { .name = #lockname, .skip_cnt = ATOMIC_INIT(0) }
+#else
+#define __PERCPU_RWSEM_DMAP_INIT(lockname)
+#endif
+
#ifdef CONFIG_DEBUG_LOCK_ALLOC
-#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname },
+#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname) .dep_map = { \
+ .name = #lockname, \
+ __PERCPU_RWSEM_DMAP_INIT(lockname) },
#else
#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname)
#endif
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index f934876..dc7977a 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -16,11 +16,18 @@
#include <linux/atomic.h>
#include <linux/err.h>

+#ifdef CONFIG_DEPT
+# define RWSEM_DMAP_INIT(lockname) .dmap = { .name = #lockname, .skip_cnt = ATOMIC_INIT(0) },
+#else
+# define RWSEM_DMAP_INIT(lockname)
+#endif
+
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define __RWSEM_DEP_MAP_INIT(lockname) \
.dep_map = { \
.name = #lockname, \
.wait_type_inner = LD_WAIT_SLEEP, \
+ RWSEM_DMAP_INIT(lockname) \
},
#else
# define __RWSEM_DEP_MAP_INIT(lockname)
@@ -32,6 +39,32 @@
#include <linux/osq_lock.h>
#endif

+#ifdef CONFIG_DEPT
+#define dept_rwsem_lock(m, ne, t, n, e_fn, ip) \
+do { \
+ if (t) { \
+ dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne); \
+ dept_ask_event(m); \
+ } else if (n) { \
+ dept_skip(m); \
+ } else { \
+ dept_wait(m, 1UL, ip, __func__, ne); \
+ dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne); \
+ dept_ask_event(m); \
+ } \
+} while (0)
+#define dept_rwsem_unlock(m, ip) \
+do { \
+ if (!dept_unskip_if_skipped(m)) { \
+ dept_event(m, 1UL, ip, __func__); \
+ dept_ecxt_exit(m, ip); \
+ } \
+} while (0)
+#else
+#define dept_rwsem_lock(m, ne, t, n, e_fn, ip) do { } while (0)
+#define dept_rwsem_unlock(m, ip) do { } while (0)
+#endif
+
/*
* For an uncontended rwsem, count and owner are the only fields a task
* needs to touch when acquiring the rwsem. So they are put next to each
--
1.9.1

2022-02-28 13:00:37

by Byungchul Park

Subject: [PATCH v3 19/21] dept: Disable Dept within the wait_bit layer by default

The struct wait_queue_head array bit_wait_table[] in sched/wait_bit.c
is shared by all its users, which unfortunately vary in terms of
class. So each user should have been assigned its own class to avoid
false positives.

It would be better to let Dept work at a higher layer than wait_bit,
so disable Dept within the wait_bit layer by default.

It's worth noting that Dept still works with the other struct
wait_queue_head instances, which are mostly well classified.

Signed-off-by: Byungchul Park <[email protected]>
---
kernel/sched/wait_bit.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/wait_bit.c b/kernel/sched/wait_bit.c
index 02ce292..3e5a3eb 100644
--- a/kernel/sched/wait_bit.c
+++ b/kernel/sched/wait_bit.c
@@ -3,6 +3,7 @@
* The implementation of the wait_bit*() and related waiting APIs:
*/
#include "sched.h"
+#include <linux/dept.h>

#define WAIT_TABLE_BITS 8
#define WAIT_TABLE_SIZE (1 << WAIT_TABLE_BITS)
@@ -246,6 +247,8 @@ void __init wait_bit_init(void)
{
int i;

- for (i = 0; i < WAIT_TABLE_SIZE; i++)
+ for (i = 0; i < WAIT_TABLE_SIZE; i++) {
init_waitqueue_head(bit_wait_table + i);
+ dept_map_nocheck(&(bit_wait_table + i)->dmap);
+ }
}
--
1.9.1

2022-02-28 14:00:30

by Byungchul Park

Subject: [PATCH v3 03/21] dept: Embed Dept data in Lockdep

Dept should work independently of Lockdep. However, there's no choice
but to rely on Lockdep code and its instances for now.
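
Concretely (a trimmed sketch of the resulting layout, showing only
the fields this patch adds), Dept's per-instance map and per-class
key are embedded in the corresponding Lockdep structures:

        struct lockdep_map {
                /* ... existing Lockdep fields ... */
                struct dept_map dmap;   /* Dept's per-instance map */
        };

        struct lock_class_key {
                /* ... existing Lockdep fields ... */
                struct dept_key dkey;   /* Dept's per-class key */
        };

and lockdep_init_map_type(), lockdep_register_key() and
lockdep_unregister_key() now initialize/destroy the Dept side as
well.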

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/lockdep.h | 71 ++++++++++++++++++++++++++++++++++++++++---
include/linux/lockdep_types.h | 3 ++
kernel/locking/lockdep.c | 12 ++++----
3 files changed, 76 insertions(+), 10 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 467b942..c56f6b6 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -20,6 +20,33 @@
extern int prove_locking;
extern int lock_stat;

+#ifdef CONFIG_DEPT
+static inline void dept_after_copy_map(struct dept_map *to,
+ struct dept_map *from)
+{
+ int i;
+
+ if (from->keys == &from->keys_local)
+ to->keys = &to->keys_local;
+
+ if (!to->keys)
+ return;
+
+ /*
+ * Since the class cache can be modified concurrently we could observe
+ * half pointers (64bit arch using 32bit copy insns). Therefore clear
+ * the caches and take the performance hit.
+ *
+ * XXX it doesn't work well with lockdep_set_class_and_subclass(), since
+ * that relies on cache abuse.
+ */
+ for (i = 0; i < DEPT_MAX_SUBCLASSES_CACHE; i++)
+ to->keys->classes[i] = NULL;
+}
+#else
+#define dept_after_copy_map(t, f) do { } while (0)
+#endif
+
#ifdef CONFIG_LOCKDEP

#include <linux/linkage.h>
@@ -43,6 +70,8 @@ static inline void lockdep_copy_map(struct lockdep_map *to,
*/
for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
to->class_cache[i] = NULL;
+
+ dept_after_copy_map(&to->dmap, &from->dmap);
}

/*
@@ -176,8 +205,19 @@ struct held_lock {
current->lockdep_recursion -= LOCKDEP_OFF; \
} while (0)

-extern void lockdep_register_key(struct lock_class_key *key);
-extern void lockdep_unregister_key(struct lock_class_key *key);
+extern void __lockdep_register_key(struct lock_class_key *key);
+extern void __lockdep_unregister_key(struct lock_class_key *key);
+
+#define lockdep_register_key(k) \
+do { \
+ __lockdep_register_key(k); \
+ dept_key_init(&(k)->dkey); \
+} while (0)
+#define lockdep_unregister_key(k) \
+do { \
+ __lockdep_unregister_key(k); \
+ dept_key_destroy(&(k)->dkey); \
+} while (0)

/*
* These methods are used by specific locking variants (spinlocks,
@@ -185,9 +225,18 @@ struct held_lock {
* to lockdep:
*/

-extern void lockdep_init_map_type(struct lockdep_map *lock, const char *name,
+extern void __lockdep_init_map_type(struct lockdep_map *lock, const char *name,
struct lock_class_key *key, int subclass, u8 inner, u8 outer, u8 lock_type);

+#define lockdep_init_map_type(l, n, k, s, i, o, t) \
+do { \
+ __lockdep_init_map_type(l, n, k, s, i, o, t); \
+ if ((k) == &__lockdep_no_validate__) \
+ dept_map_nocheck(&(l)->dmap); \
+ else \
+ dept_map_init(&(l)->dmap, &(k)->dkey, s, n); \
+} while (0)
+
static inline void
lockdep_init_map_waits(struct lockdep_map *lock, const char *name,
struct lock_class_key *key, int subclass, u8 inner, u8 outer)
@@ -431,13 +480,27 @@ enum xhlock_context_t {
XHLOCK_CTX_NR,
};

+#ifdef CONFIG_DEPT
+/*
+ * TODO: I found the case to use an address of other than a real key as
+ * _key, for instance, in workqueue. So for now, we cannot use the
+ * assignment like '.dmap.keys = &(_key)->dkey' unless it's fixed.
+ */
+#define STATIC_DEPT_MAP_INIT(_name, _key) .dmap = { \
+ .name = (_name), \
+ .keys = NULL },
+#else
+#define STATIC_DEPT_MAP_INIT(_name, _key)
+#endif
+
#define lockdep_init_map_crosslock(m, n, k, s) do {} while (0)
/*
* To initialize a lockdep_map statically use this macro.
* Note that _name must not be NULL.
*/
#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
- { .name = (_name), .key = (void *)(_key), }
+ { .name = (_name), .key = (void *)(_key), \
+ STATIC_DEPT_MAP_INIT(_name, _key) }

static inline void lockdep_invariant_state(bool force) {}
static inline void lockdep_free_task(struct task_struct *task) {}
diff --git a/include/linux/lockdep_types.h b/include/linux/lockdep_types.h
index d224308..50c8879 100644
--- a/include/linux/lockdep_types.h
+++ b/include/linux/lockdep_types.h
@@ -11,6 +11,7 @@
#define __LINUX_LOCKDEP_TYPES_H

#include <linux/types.h>
+#include <linux/dept.h>

#define MAX_LOCKDEP_SUBCLASSES 8UL

@@ -76,6 +77,7 @@ struct lock_class_key {
struct hlist_node hash_entry;
struct lockdep_subclass_key subkeys[MAX_LOCKDEP_SUBCLASSES];
};
+ struct dept_key dkey;
};

extern struct lock_class_key __lockdep_no_validate__;
@@ -185,6 +187,7 @@ struct lockdep_map {
int cpu;
unsigned long ip;
#endif
+ struct dept_map dmap;
};

struct pin_cookie { unsigned int val; };
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 4a882f8..a85468d 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1184,7 +1184,7 @@ static inline struct hlist_head *keyhashentry(const struct lock_class_key *key)
}

/* Register a dynamically allocated key. */
-void lockdep_register_key(struct lock_class_key *key)
+void __lockdep_register_key(struct lock_class_key *key)
{
struct hlist_head *hash_head;
struct lock_class_key *k;
@@ -1207,7 +1207,7 @@ void lockdep_register_key(struct lock_class_key *key)
restore_irqs:
raw_local_irq_restore(flags);
}
-EXPORT_SYMBOL_GPL(lockdep_register_key);
+EXPORT_SYMBOL_GPL(__lockdep_register_key);

/* Check whether a key has been registered as a dynamic key. */
static bool is_dynamic_key(const struct lock_class_key *key)
@@ -4771,7 +4771,7 @@ static inline int check_wait_context(struct task_struct *curr,
/*
* Initialize a lock instance's lock-class mapping info:
*/
-void lockdep_init_map_type(struct lockdep_map *lock, const char *name,
+void __lockdep_init_map_type(struct lockdep_map *lock, const char *name,
struct lock_class_key *key, int subclass,
u8 inner, u8 outer, u8 lock_type)
{
@@ -4831,7 +4831,7 @@ void lockdep_init_map_type(struct lockdep_map *lock, const char *name,
raw_local_irq_restore(flags);
}
}
-EXPORT_SYMBOL_GPL(lockdep_init_map_type);
+EXPORT_SYMBOL_GPL(__lockdep_init_map_type);

struct lock_class_key __lockdep_no_validate__;
EXPORT_SYMBOL_GPL(__lockdep_no_validate__);
@@ -6291,7 +6291,7 @@ void lockdep_reset_lock(struct lockdep_map *lock)
}

/* Unregister a dynamically allocated key. */
-void lockdep_unregister_key(struct lock_class_key *key)
+void __lockdep_unregister_key(struct lock_class_key *key)
{
struct hlist_head *hash_head = keyhashentry(key);
struct lock_class_key *k;
@@ -6326,7 +6326,7 @@ void lockdep_unregister_key(struct lock_class_key *key)
/* Wait until is_dynamic_key() has finished accessing k->hash_entry. */
synchronize_rcu();
}
-EXPORT_SYMBOL_GPL(lockdep_unregister_key);
+EXPORT_SYMBOL_GPL(__lockdep_unregister_key);

void __init lockdep_init(void)
{
--
1.9.1

2022-03-03 02:28:01

by Byungchul Park

Subject: Re: [PATCH v3 00/21] DEPT(Dependency Tracker)

On Wed, Mar 02, 2022 at 04:36:38AM +0000, Hyeonggon Yoo wrote:
> On Mon, Feb 28, 2022 at 06:56:39PM +0900, Byungchul Park wrote:
> > I didn't want to bother you so I was planning to send the next spin
> > after making more progress. However, PATCH v2 reports too many false
> > positives because Dept tracked the bit_wait_table[] wrong way - I
> > apologize for that. So I decided to send PATCH v3 first before going
> > further for those who want to run Dept for now.
> >
> > There might still be some false positives but not overwhelming.
> >
>
> Hello Byungchul, I'm running DEPT v3 on my system
> and I see report below.
>
> Looking at the kmemleak code and comment, I think
> kmemleak tried to avoid lockdep recursive warning
> but detected by DEPT?
>
> ===================================================
> DEPT: Circular dependency has been detected.
> 5.17.0-rc1+ #1 Tainted: G W
> ---------------------------------------------------
> summary
> ---------------------------------------------------
> *** AA DEADLOCK ***
>
> context A
> [S] __raw_spin_lock_irqsave(&object->lock:0)
> [W] _raw_spin_lock_nested(&object->lock:0)
> [E] spin_unlock(&object->lock:0)
>
> [S]: start of the event context
> [W]: the wait blocked
> [E]: the event not reachable
> ---------------------------------------------------
> context A's detail
> ---------------------------------------------------
> context A
> [S] __raw_spin_lock_irqsave(&object->lock:0)
> [W] _raw_spin_lock_nested(&object->lock:0)
> [E] spin_unlock(&object->lock:0)
>
> [S] __raw_spin_lock_irqsave(&object->lock:0):
> [<ffffffc00810302c>] scan_gray_list+0x84/0x13c
> stacktrace:
> dept_ecxt_enter+0x88/0xf4
> _raw_spin_lock_irqsave+0xf0/0x1c4
> scan_gray_list+0x84/0x13c
> kmemleak_scan+0x2d8/0x54c
> kmemleak_scan_thread+0xac/0xd4
> kthread+0xd4/0xe4
> ret_from_fork+0x10/0x20

[W]'s stack trace is missing. But I guess this issue is the same as
the one you reported following this one. We can discuss it in the
other report's thread.

Thanks,
Byungchul

> [E] spin_unlock(&object->lock:0):
> [<ffffffc008102ee0>] scan_block+0x60/0x128
> ---------------------------------------------------
> information that might be helpful
> ---------------------------------------------------
> CPU: 1 PID: 38 Comm: kmemleak Tainted: G W 5.17.0-rc1+ #1
> Hardware name: linux,dummy-virt (DT)
> Call trace:
> dump_backtrace.part.0+0x9c/0xc4
> show_stack+0x14/0x28
> dump_stack_lvl+0x9c/0xcc
> dump_stack+0x14/0x2c
> print_circle+0x2d4/0x438
> cb_check_dl+0x44/0x70
> bfs+0x60/0x168
> add_dep+0x88/0x11c
> add_wait+0x2d0/0x2dc
> __dept_wait+0x8c/0xa4
> dept_wait+0x6c/0x88
> _raw_spin_lock_nested+0xa8/0x1b0
> scan_block+0xb4/0x128
> scan_gray_list+0xc4/0x13c
> kmemleak_scan+0x2d8/0x54c
> kmemleak_scan_thread+0xac/0xd4
> kthread+0xd4/0xe4
> ret_from_fork+0x10/0x20
>
> > ---
> >
> > Hi Linus and folks,
> >
> > I've been developing a tool for detecting deadlock possibilities by
> > tracking wait/event rather than lock(?) acquisition order to try to
> > cover all synchonization machanisms. It's done on v5.17-rc1 tag.
> >
> > https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1
> >
> [...]
> > Benifit:
> >
> > 0. Works with all lock primitives.
> > 1. Works with wait_for_completion()/complete().
> > 2. Works with 'wait' on PG_locked.
> > 3. Works with 'wait' on PG_writeback.
> > 4. Works with swait/wakeup.
> > 5. Works with waitqueue.
> > 6. Multiple reports are allowed.
> > 7. Deduplication control on multiple reports.
> > 8. Withstand false positives thanks to 6.
> > 9. Easy to tag any wait/event.
> >
> > Future work:
> >
> > 0. To make it more stable.
> > 1. To separates Dept from Lockdep.
> > 2. To improves performance in terms of time and space.
> > 3. To use Dept as a dependency engine for Lockdep.
> > 4. To add any missing tags of wait/event in the kernel.
> > 5. To deduplicate stack trace.
> >
> > How to interpret reports:
> >
> > 1. E(event) in each context cannot be triggered because of the
> > W(wait) that cannot be woken.
> > 2. The stack trace helping find the problematic code is located
> > in each conext's detail.
> >
> > Thanks,
> > Byungchul
> >
> > ---
> >
> > Changes from v2:
> >
> > 1. Disable Dept on bit_wait_table[] in sched/wait_bit.c
> > reporting a lot of false positives, which is my fault.
> > Wait/event for bit_wait_table[] should've been tagged in a
> > higher layer for better work, which is a future work.
> > (feedback from Jan Kara)
> > 2. Disable Dept on crypto_larval's completion to prevent a false
> > positive.
> >
> > Changes from v1:
> >
> > 1. Fix coding style and typo. (feedback from Steven)
> > 2. Distinguish each work context from another in workqueue.
> > 3. Skip checking lock acquisition with nest_lock, which is about
> > correct lock usage that should be checked by Lockdep.
> >
> > Changes from RFC:
> >
> > 1. Prevent adding a wait tag at prepare_to_wait() but __schedule().
> > (feedback from Linus and Matthew)
> > 2. Use try version at lockdep_acquire_cpus_lock() annotation.
> > 3. Distinguish each syscall context from another.
> [ ... ]
>
> --
> Thank you, You are awesome!
> Hyeonggon :-)

2022-03-03 11:07:20

by Byungchul Park

Subject: Re: [PATCH v3 00/21] DEPT(Dependency Tracker)

On Thu, Mar 03, 2022 at 08:03:21AM +0000, Hyeonggon Yoo wrote:
> On Thu, Mar 03, 2022 at 09:18:13AM +0900, Byungchul Park wrote:
> > Hi Hyeonggon,
> >
> > Dept also allows the following scenario when an user guarantees that
> > each lock instance is different from another at a different depth:
> >
> > lock A0 with depth
> > lock A1 with depth + 1
> > lock A2 with depth + 2
> > lock A3 with depth + 3
> > (and so on)
> > ..
> > unlock A3
> > unlock A2
> > unlock A1
> > unlock A0

Look at this. Dept allows object->lock -> other_object->lock (with a
different depth using *_lock_nested()), so it won't report it.
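
In code, the allowed pattern is roughly (hypothetical objects of the
same class taken at different depths):

        spin_lock(&object->lock);
        spin_lock_nested(&other_object->lock, SINGLE_DEPTH_NESTING);
        ...
        spin_unlock(&other_object->lock);
        spin_unlock(&object->lock);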

> > However, Dept does not allow the following scenario where another lock
> > class cuts in the dependency chain:
> >
> > lock A0 with depth
> > lock B
> > lock A1 with depth + 1
> > lock A2 with depth + 2
> > lock A3 with depth + 3
> > (and so on)
> > ..
> > unlock A3
> > unlock A2
> > unlock A1
> > unlock B
> > unlock A0
> >
> > This scenario is clearly problematic. What do you think is going to
> > happen with another context running the following?
> >
>
> First of all, I want to say I'm not expert at locking primitives.
> I may be wrong.

It's okay. Thanks anyway for your feedback.

> > > 45 * scan_mutex [-> object->lock] -> kmemleak_lock -> other_object->lock (SINGLE_DEPTH_NESTING)
> > > 46 *
> > > 47 * No kmemleak_lock and object->lock nesting is allowed outside scan_mutex
> > > 48 * regions.
>
> lock order in kmemleak is described above.
>
> and DEPT detects two cases as deadlock:
>
> 1) object->lock -> other_object->lock

It's not a deadlock *IF* the two have different depths using
*_lock_nested(). Dept also allows this case, so Dept wouldn't report
it.

> 2) object->lock -> kmemleak_lock, kmemleak_lock -> other_object->lock

But this usage is risky. I already explained it in the mail you
replied to; I've copied it below.

context A
> > lock A0 with depth
> > lock B
> > lock A1 with depth + 1
> > lock A2 with depth + 2
> > lock A3 with depth + 3
> > (and so on)
> > ..
> > unlock A3
> > unlock A2
> > unlock A1
> > unlock B
> > unlock A0

...

context B
> > lock A1 with depth
> > lock B
> > lock A2 with depth + 1
> > lock A3 with depth + 2
> > (and so on)
> > ..
> > unlock A3
> > unlock A2
> > unlock B
> > unlock A1

where Ax : object->lock, B : kmemleak_lock.

A deadlock might occur if the two contexts run at the same time.
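
In code, the two contexts would roughly look like this (a sketch with
hypothetical names, a0/a1/a2 being object->lock instances and b_lock
being kmemleak_lock):

        /* context A */
        spin_lock(&a0->lock);
        spin_lock(&b_lock);
        spin_lock_nested(&a1->lock, SINGLE_DEPTH_NESTING);      /* waits */

        /* context B */
        spin_lock(&a1->lock);
        spin_lock(&b_lock);                                     /* waits */
        spin_lock_nested(&a2->lock, SINGLE_DEPTH_NESTING);

Context A ends up holding b_lock while waiting for a1->lock, and
context B holding a1->lock while waiting for b_lock - a classic ABBA
deadlock between b_lock and the nested object lock.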

> And in kmemleak case, 1) and 2) is not possible because it must hold
> scan_mutex first.

This is another issue. Let's focus on whether the order is okay for now.

> I think the author of kmemleak intended lockdep to treat object->lock
> and other_object->lock as different class, using raw_spin_lock_nested().

Yes. The author meant to assign a different class according to the
depth using a Lockdep API. Strictly speaking, those are the same
class anyway, but we assign a different class to each depth to avoid
Lockdep splats *IF* the user guarantees the nesting lock usage is
safe, IOW, guarantees each lock instance at a different depth is a
different instance.

I was fundamentally asking you... so... is the nesting lock usage safe
for real? I hope you distinguish between the safe case and the risky
case when *_lock_nested() is involved. Thoughts?

Thanks,
Byungchul

> Am I missing something?
>
> Thanks.
>
> > lock A1 with depth
> > lock B
> > lock A2 with depth + 1
> > lock A3 with depth + 2
> > (and so on)
> > ..
> > unlock A3
> > unlock A2
> > unlock B
> > unlock A1
> >
> > It's a deadlock. That's why Dept reports this case as a problem. Or am I
> > missing something?
> >
> > Thanks,
> > Byungchul
> >
> > > ---------------------------------------------------
> > > context A's detail
> > > ---------------------------------------------------
> > > context A
> > > [S] __raw_spin_lock_irqsave(&object->lock:0)
> > > [W] __raw_spin_lock_irqsave(kmemleak_lock:0)
> > > [E] spin_unlock(&object->lock:0)
> > >
> > > [S] __raw_spin_lock_irqsave(&object->lock:0):
> > > [<ffffffc00810302c>] scan_gray_list+0x84/0x13c
> > > stacktrace:
> > > dept_ecxt_enter+0x88/0xf4
> > > _raw_spin_lock_irqsave+0xf0/0x1c4
> > > scan_gray_list+0x84/0x13c
> > > kmemleak_scan+0x2d8/0x54c
> > > kmemleak_scan_thread+0xac/0xd4
> > > kthread+0xd4/0xe4
> > > ret_from_fork+0x10/0x20
> > >
> > > [W] __raw_spin_lock_irqsave(kmemleak_lock:0):
> > > [<ffffffc008102ebc>] scan_block+0x3c/0x128
> > > stacktrace:
> > > __dept_wait+0x8c/0xa4
> > > dept_wait+0x6c/0x88
> > > _raw_spin_lock_irqsave+0xb8/0x1c4
> > > scan_block+0x3c/0x128
> > > scan_gray_list+0xc4/0x13c
> > > kmemleak_scan+0x2d8/0x54c
> > > kmemleak_scan_thread+0xac/0xd4
> > > kthread+0xd4/0xe4
> > > ret_from_fork+0x10/0x20
> > >
> > > [E] spin_unlock(&object->lock:0):
> > > [<ffffffc008102ee0>] scan_block+0x60/0x128
> > >
> > > ---------------------------------------------------
> > > context B's detail
> > > ---------------------------------------------------
> > > context B
> > > [S] __raw_spin_lock_irqsave(kmemleak_lock:0)
> > > [W] _raw_spin_lock_nested(&object->lock:0)
> > > [E] spin_unlock(kmemleak_lock:0)
> > >
> > > [S] __raw_spin_lock_irqsave(kmemleak_lock:0):
> > > [<ffffffc008102ebc>] scan_block+0x3c/0x128
> > > stacktrace:
> > > dept_ecxt_enter+0x88/0xf4
> > > _raw_spin_lock_irqsave+0xf0/0x1c4
> > > scan_block+0x3c/0x128
> > > kmemleak_scan+0x19c/0x54c
> > > kmemleak_scan_thread+0xac/0xd4
> > > kthread+0xd4/0xe4
> > > ret_from_fork+0x10/0x20
> > >
> > > [W] _raw_spin_lock_nested(&object->lock:0):
> > > [<ffffffc008102f34>] scan_block+0xb4/0x128
> > > stacktrace:
> > > dept_wait+0x74/0x88
> > > _raw_spin_lock_nested+0xa8/0x1b0
> > > scan_block+0xb4/0x128
> > > kmemleak_scan+0x19c/0x54c
> > > kmemleak_scan_thread+0xac/0xd4
> > > kthread+0xd4/0xe4
> > > ret_from_fork+0x10/0x20
> > > [E] spin_unlock(kmemleak_lock:0):
> > > [<ffffffc008102ee0>] scan_block+0x60/0x128
> > > stacktrace:
> > > dept_event+0x7c/0xfc
> > > _raw_spin_unlock_irqrestore+0x8c/0x120
> > > scan_block+0x60/0x128
> > > kmemleak_scan+0x19c/0x54c
> > > kmemleak_scan_thread+0xac/0xd4
> > > kthread+0xd4/0xe4
> > > ret_from_fork+0x10/0x20
> > > ---------------------------------------------------
> > > information that might be helpful
> > > ---------------------------------------------------
> > > CPU: 1 PID: 38 Comm: kmemleak Tainted: G W 5.17.0-rc1+ #1
> > > Hardware name: linux,dummy-virt (DT)
> > > Call trace:
> > > dump_backtrace.part.0+0x9c/0xc4
> > > show_stack+0x14/0x28
> > > dump_stack_lvl+0x9c/0xcc
> > > dump_stack+0x14/0x2c
> > > print_circle+0x2d4/0x438
> > > cb_check_dl+0x6c/0x70
> > > bfs+0xc0/0x168
> > > add_dep+0x88/0x11c
> > > add_wait+0x2d0/0x2dc
> > > __dept_wait+0x8c/0xa4
> > > dept_wait+0x6c/0x88
> > > _raw_spin_lock_irqsave+0xb8/0x1c4
> > > scan_block+0x3c/0x128
> > > scan_gray_list+0xc4/0x13c
> > > kmemleak_scan+0x2d8/0x54c
> > > kmemleak_scan_thread+0xac/0xd4
> > > kthread+0xd4/0xe4
> > > ret_from_fork+0x10/0x20
> > >
> > > > ===================================================
> > > > DEPT: Circular dependency has been detected.
> > > > 5.17.0-rc1+ #1 Tainted: G W
> > > > ---------------------------------------------------
> > > > summary
> > > > ---------------------------------------------------
> > > > *** AA DEADLOCK ***
> > > >
> > > > context A
> > > > [S] __raw_spin_lock_irqsave(&object->lock:0)
> > > > [W] _raw_spin_lock_nested(&object->lock:0)
> > > > [E] spin_unlock(&object->lock:0)
> > > >
> > > > [S]: start of the event context
> > > > [W]: the wait blocked
> > > > [E]: the event not reachable
> > > > ---------------------------------------------------
> > > > context A's detail
> > > > ---------------------------------------------------
> > > > context A
> > > > [S] __raw_spin_lock_irqsave(&object->lock:0)
> > > > [W] _raw_spin_lock_nested(&object->lock:0)
> > > > [E] spin_unlock(&object->lock:0)
> > > >
> > > > [S] __raw_spin_lock_irqsave(&object->lock:0):
> > > > [<ffffffc00810302c>] scan_gray_list+0x84/0x13c
> > > > stacktrace:
> > > > dept_ecxt_enter+0x88/0xf4
> > > > _raw_spin_lock_irqsave+0xf0/0x1c4
> > > > scan_gray_list+0x84/0x13c
> > > > kmemleak_scan+0x2d8/0x54c
> > > > kmemleak_scan_thread+0xac/0xd4
> > > > kthread+0xd4/0xe4
> > > > ret_from_fork+0x10/0x20
> > > >
> > > > [E] spin_unlock(&object->lock:0):
> > > > [<ffffffc008102ee0>] scan_block+0x60/0x128
> > > > ---------------------------------------------------
> > > > information that might be helpful
> > > > ---------------------------------------------------
> > > > CPU: 1 PID: 38 Comm: kmemleak Tainted: G W 5.17.0-rc1+ #1
> > > > Hardware name: linux,dummy-virt (DT)
> > > > Call trace:
> > > > dump_backtrace.part.0+0x9c/0xc4
> > > > show_stack+0x14/0x28
> > > > dump_stack_lvl+0x9c/0xcc
> > > > dump_stack+0x14/0x2c
> > > > print_circle+0x2d4/0x438
> > > > cb_check_dl+0x44/0x70
> > > > bfs+0x60/0x168
> > > > add_dep+0x88/0x11c
> > > > add_wait+0x2d0/0x2dc
> > > > __dept_wait+0x8c/0xa4
> > > > dept_wait+0x6c/0x88
> > > > _raw_spin_lock_nested+0xa8/0x1b0
> > > > scan_block+0xb4/0x128
> > > > scan_gray_list+0xc4/0x13c
> > > > kmemleak_scan+0x2d8/0x54c
> > > > kmemleak_scan_thread+0xac/0xd4
> > > > kthread+0xd4/0xe4
> > > > ret_from_fork+0x10/0x20
> > > >
> > > [...]
> > >
> > > --
> > > Thank you, You are awesome!
> > > Hyeonggon :-)
>
> --
> Thank you, You are awesome!
> Hyeonggon :-)