2023-01-09 04:06:22

by Byungchul Park

Subject: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

Just for those who want to try the latest version of DEPT.

---

Hi Linus and folks,

I've been developing a tool for detecting deadlock possibilities by
tracking wait/event rather than lock(?) acquisition order to try to
cover all synchronization mechanisms. It's based on v6.2-rc2.

https://github.com/lgebyungchulpark/linux-dept/commits/dept2.0_on_v6.2-rc2

Benefits:

0. Works with all lock primitives.
1. Works with wait_for_completion()/complete().
2. Works with 'wait' on PG_locked.
3. Works with 'wait' on PG_writeback.
4. Works with swait/wakeup.
5. Works with waitqueue.
6. Multiple reports are allowed.
7. Deduplication control on multiple reports.
8. Withstands false positives thanks to 6.
9. Easy to tag any wait/event.

Future work:

0. To make it more stable.
1. To separate Dept from Lockdep.
2. To improve performance in terms of time and space.
3. To use Dept as a dependency engine for Lockdep.
4. To add any missing tags of wait/event in the kernel.
5. To deduplicate stack trace.

How to interpret reports:

1. E(event) in each context cannot be triggered because of the
W(wait) that cannot be woken.
2. The stack trace that helps find the problematic code is located
in each context's detail.

Thanks,
Byungchul

---

Changes from v6:

1. Tie to task scheduler code to track sleep and try_to_wake_up(),
assuming that sleeps cause waits and that try_to_wake_up()s are
the events those waits are waiting for, given proper DEPT
annotations such as sdt_might_sleep_weak() and
sdt_might_sleep_strong(). For these cases, the class is
determined at sleep entrance rather than in the synchronization
initialization code, which should greatly reduce false alarms.
2. Remove the DEPT associated instance in each page struct for
tracking dependencies by PG_locked and PG_writeback, thanks to
item 1 above.
3. Introduce CONFIG_DEPT_AGGRESIVE_TIMEOUT_WAIT to suppress
reports in which waits with a timeout are involved, for those
who don't like verbose reporting.
4. Add a mechanism to refill the internal memory pools on
running out so that DEPT could keep working as long as free
memory is available in the system.
5. Re-enable tracking hashed-waitqueue wait. It should no longer
generate false positives because the class is determined at
sleep entrance rather than at waitqueue initialization.
6. Refactor to make it easier to port onto each new version of
the kernel.
7. Apply DEPT to dma fence.
8. Do trivial optimizations.

Changes from v5:

1. Use just pr_warn_once() rather than WARN_ONCE() on the lack
of internal resources because a WARN_*() stack trace is
too much for reporting the shortage. (feedback from Ted, Hyeonggon)
2. Fix trivial bugs like failing to initialize a struct before
using it.
3. Assign a different class per task when handling onstack
variables for waitqueue or the like. Which makes Dept
distinguish between onstack variables of different tasks so
as to prevent false positives. (reported by Hyeonggon)
4. Make Dept aware of even raw_local_irq_*() to prevent false
positives. (reported by Hyeonggon)
5. Don't treat dependencies between the events that might be
triggered within __schedule() and the waits that require
__schedule() as real ones. (reported by Hyeonggon)
6. Unstage the staged wait that has prepare_to_wait_event()'ed
*and* yet to get to __schedule(), if we encounter __schedule()
in-between for another sleep, which is possible if e.g. a
mutex_lock() exists in 'condition' of ___wait_event().
7. Turn on CONFIG_PROVE_LOCKING when CONFIG_DEPT is on, to rely
on the hardirq and softirq entrance tracing to make Dept more
portable for now.

Changes from v4:

1. Fix some bugs that produce false alarms.
2. Distinguish each syscall context from another *for arm64*.
3. Just print a message rather than warn when the Dept ring
buffer gets exhausted. (feedback from Hyeonggon)
4. Explicitly describe "EXPERIMENTAL" and "Dept might produce
false positive reports" in Kconfig. (feedback from Ted)

Changes from v3:

1. Dept shouldn't create dependencies between different depths
of a class that were indicated by *_lock_nested(). Dept
normally doesn't but it does once another lock class comes
in. So fixed it. (feedback from Hyeonggon)
2. Dept considered a wait as a real wait once getting to
__schedule() even if it has been set to TASK_RUNNING by wake
up sources in advance. Fixed it so that Dept doesn't consider
the case as a real wait. (feedback from Jan Kara)
3. Stop tracking dependencies with a map once the event
associated with the map has been handled. Dept will start to
work with the map again, on the next sleep.

Changes from v2:

1. Disable Dept on bit_wait_table[] in sched/wait_bit.c, which
was reporting a lot of false positives; that is my fault.
Wait/event for bit_wait_table[] should've been tagged in a
higher layer to work better, which is left as future work.
(feedback from Jan Kara)
2. Disable Dept on crypto_larval's completion to prevent a false
positive.

Changes from v1:

1. Fix coding style and typo. (feedback from Steven)
2. Distinguish each work context from another in workqueue.
3. Skip checking lock acquisition with nest_lock, which is about
correct lock usage that should be checked by Lockdep.

Changes from RFC(v0):

1. Add the wait tag at __schedule() rather than at prepare_to_wait().
(feedback from Linus and Matthew)
2. Use try version at lockdep_acquire_cpus_lock() annotation.
3. Distinguish each syscall context from another.

Byungchul Park (23):
llist: Move llist_{head,node} definition to types.h
dept: Implement Dept(Dependency Tracker)
dept: Add single event dependency tracker APIs
dept: Add lock dependency tracker APIs
dept: Tie to Lockdep and IRQ tracing
dept: Add proc knobs to show stats and dependency graph
dept: Apply sdt_might_sleep_strong() to
wait_for_completion()/complete()
dept: Apply sdt_might_sleep_strong() to PG_{locked,writeback} wait
dept: Apply sdt_might_sleep_weak() to swait
dept: Apply sdt_might_sleep_weak() to waitqueue wait
dept: Apply sdt_might_sleep_weak() to hashed-waitqueue wait
dept: Distinguish each syscall context from another
dept: Distinguish each work from another
dept: Add a mechanism to refill the internal memory pools on running
out
locking/lockdep, cpu/hotplus: Use a weaker annotation in AP thread
dept: Apply sdt_might_sleep_strong() to dma fence wait
dept: Track timeout waits separately with a new Kconfig
dept: Apply timeout consideration to wait_for_completion()/complete()
dept: Apply timeout consideration to swait
dept: Apply timeout consideration to waitqueue wait
dept: Apply timeout consideration to hashed-waitqueue wait
dept: Apply timeout consideration to dma fence wait
dept: Record the latest one out of consecutive waits of the same class

arch/arm64/kernel/syscall.c | 2 +
arch/x86/entry/common.c | 4 +
drivers/dma-buf/dma-fence.c | 11 +
include/linux/completion.h | 102 +-
include/linux/dept.h | 600 +++++++
include/linux/dept_ldt.h | 77 +
include/linux/dept_sdt.h | 101 ++
include/linux/hardirq.h | 3 +
include/linux/irqflags.h | 22 +-
include/linux/llist.h | 8 -
include/linux/local_lock_internal.h | 1 +
include/linux/lockdep.h | 108 +-
include/linux/lockdep_types.h | 3 +
include/linux/mutex.h | 1 +
include/linux/percpu-rwsem.h | 2 +-
include/linux/rtmutex.h | 1 +
include/linux/rwlock_types.h | 1 +
include/linux/rwsem.h | 1 +
include/linux/sched.h | 3 +
include/linux/seqlock.h | 2 +-
include/linux/spinlock_types_raw.h | 3 +
include/linux/srcu.h | 2 +-
include/linux/swait.h | 6 +
include/linux/types.h | 8 +
include/linux/wait.h | 6 +
include/linux/wait_bit.h | 6 +
init/init_task.c | 2 +
init/main.c | 2 +
kernel/Makefile | 1 +
kernel/cpu.c | 2 +-
kernel/dependency/Makefile | 4 +
kernel/dependency/dept.c | 3120 +++++++++++++++++++++++++++++++++++
kernel/dependency/dept_hash.h | 10 +
kernel/dependency/dept_internal.h | 26 +
kernel/dependency/dept_object.h | 13 +
kernel/dependency/dept_proc.c | 93 ++
kernel/exit.c | 1 +
kernel/fork.c | 2 +
kernel/locking/lockdep.c | 23 +
kernel/module/main.c | 2 +
kernel/sched/completion.c | 60 +-
kernel/sched/core.c | 9 +
kernel/workqueue.c | 3 +
lib/Kconfig.debug | 37 +
lib/locking-selftest.c | 2 +
mm/filemap.c | 11 +
46 files changed, 4432 insertions(+), 75 deletions(-)
create mode 100644 include/linux/dept.h
create mode 100644 include/linux/dept_ldt.h
create mode 100644 include/linux/dept_sdt.h
create mode 100644 kernel/dependency/Makefile
create mode 100644 kernel/dependency/dept.c
create mode 100644 kernel/dependency/dept_hash.h
create mode 100644 kernel/dependency/dept_internal.h
create mode 100644 kernel/dependency/dept_object.h
create mode 100644 kernel/dependency/dept_proc.c

--
1.9.1


2023-01-09 04:07:02

by Byungchul Park

Subject: [PATCH RFC v7 10/23] dept: Apply sdt_might_sleep_weak() to waitqueue wait

Makes Dept able to track dependencies by waitqueue waits, but weakly.

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/wait.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index a0307b5..ede466c 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -7,6 +7,7 @@
#include <linux/list.h>
#include <linux/stddef.h>
#include <linux/spinlock.h>
+#include <linux/dept_sdt.h>

#include <asm/current.h>
#include <uapi/linux/wait.h>
@@ -303,6 +304,7 @@ static inline void wake_up_pollfree(struct wait_queue_head *wq_head)
struct wait_queue_entry __wq_entry; \
long __ret = ret; /* explicit shadow */ \
\
+ sdt_might_sleep_weak(NULL); \
init_wait_entry(&__wq_entry, exclusive ? WQ_FLAG_EXCLUSIVE : 0); \
for (;;) { \
long __int = prepare_to_wait_event(&wq_head, &__wq_entry, state);\
@@ -318,6 +320,7 @@ static inline void wake_up_pollfree(struct wait_queue_head *wq_head)
cmd; \
} \
finish_wait(&wq_head, &__wq_entry); \
+ sdt_might_sleep_finish(); \
__out: __ret; \
})

--
1.9.1

2023-01-09 04:07:02

by Byungchul Park

Subject: [PATCH RFC v7 01/23] llist: Move llist_{head,node} definition to types.h

llist_head and llist_node can be used by very primitive code. For
example, Dept uses them in its header for tracking dependencies. To
avoid a header dependency, move those definitions to types.h.
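
For readers unfamiliar with these two types, here is a minimal user-space
sketch of what they are for, assuming C11 atomics in place of the kernel's
cmpxchg()/xchg(). The `_Atomic` qualifier and the helpers
sketch_llist_add() and sketch_llist_del_all() are made up for this
illustration and are not the kernel's llist API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Same shape as the definitions being moved; _Atomic is a user-space stand-in. */
struct llist_node {
	struct llist_node *next;
};

struct llist_head {
	_Atomic(struct llist_node *) first;
};

/* Push @n at the front of @h without taking a lock. */
static void sketch_llist_add(struct llist_node *n, struct llist_head *h)
{
	struct llist_node *first;

	assert(n != NULL && h != NULL);
	first = atomic_load(&h->first);
	do {
		/* On CAS failure, @first is refreshed with the current head. */
		n->next = first;
	} while (!atomic_compare_exchange_weak(&h->first, &first, n));
}

/* Detach the whole list at once and return its former first node. */
static struct llist_node *sketch_llist_del_all(struct llist_head *h)
{
	assert(h != NULL);
	return atomic_exchange(&h->first, (struct llist_node *)NULL);
}
```

Because both structs hold only a pointer, placing them in types.h lets a
header embed them without including llist.h and everything it includes.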

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/llist.h | 8 --------
include/linux/types.h | 8 ++++++++
2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/llist.h b/include/linux/llist.h
index 85bda2d..99cc3c3 100644
--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -53,14 +53,6 @@
#include <linux/stddef.h>
#include <linux/types.h>

-struct llist_head {
- struct llist_node *first;
-};
-
-struct llist_node {
- struct llist_node *next;
-};
-
#define LLIST_HEAD_INIT(name) { NULL }
#define LLIST_HEAD(name) struct llist_head name = LLIST_HEAD_INIT(name)

diff --git a/include/linux/types.h b/include/linux/types.h
index ea8cf60a..b12a444 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -187,6 +187,14 @@ struct hlist_node {
struct hlist_node *next, **pprev;
};

+struct llist_head {
+ struct llist_node *first;
+};
+
+struct llist_node {
+ struct llist_node *next;
+};
+
struct ustat {
__kernel_daddr_t f_tfree;
#ifdef CONFIG_ARCH_32BIT_USTAT_F_TINODE
--
1.9.1

2023-01-09 04:07:22

by Byungchul Park

Subject: [PATCH RFC v7 06/23] dept: Add proc knobs to show stats and dependency graph

It'd be useful to show Dept's internal stats and dependency graph at
runtime via proc. Introduce the knobs for that.
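
As a hedged illustration of consuming the stats knob from user space, the
sketch below parses one line in the "%s\t%d/%d(%d%%)" layout that
dept_stats_show() prints. struct pool_stat and parse_stats_line() are
hypothetical names, and the sample values in the usage note are invented.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed line of the stats output (hypothetical helper type). */
struct pool_stat {
	char name[32];	/* object id, e.g. "dep" */
	int avail;	/* objects still available in the static pool */
	int total;	/* capacity of the static pool */
	int percent;	/* avail * 100 / total, as printed */
};

/*
 * Parse "name<TAB>avail/total(percent%)".
 * Returns 0 on success, -1 if the line does not match, which is also
 * the case for the "Availability in the static pools:" header line.
 */
static int parse_stats_line(const char *line, struct pool_stat *st)
{
	assert(line != NULL && st != NULL);

	if (sscanf(line, "%31s %d/%d(%d%%)",
		   st->name, &st->avail, &st->total, &st->percent) != 4)
		return -1;
	return 0;
}
```

Feeding it a made-up line such as "dep\t512/1024(50%)" fills in the name
"dep" with 512 of 1024 objects available. Note that the proc files are
created with S_IRUSR, so only the owner (normally root) can read them.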

Signed-off-by: Byungchul Park <[email protected]>
---
kernel/dependency/Makefile | 1 +
kernel/dependency/dept.c | 24 ++++------
kernel/dependency/dept_internal.h | 26 +++++++++++
kernel/dependency/dept_proc.c | 95 +++++++++++++++++++++++++++++++++++++++
4 files changed, 131 insertions(+), 15 deletions(-)
create mode 100644 kernel/dependency/dept_internal.h
create mode 100644 kernel/dependency/dept_proc.c

diff --git a/kernel/dependency/Makefile b/kernel/dependency/Makefile
index b5cfb8a..92f1654 100644
--- a/kernel/dependency/Makefile
+++ b/kernel/dependency/Makefile
@@ -1,3 +1,4 @@
# SPDX-License-Identifier: GPL-2.0

obj-$(CONFIG_DEPT) += dept.o
+obj-$(CONFIG_DEPT) += dept_proc.o
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index e950954..d164482 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -73,6 +73,7 @@
#include <linux/hash.h>
#include <linux/dept.h>
#include <linux/utsname.h>
+#include "dept_internal.h"

static int dept_stop;
static int dept_per_cpu_ready;
@@ -260,20 +261,13 @@ static inline bool valid_key(struct dept_key *k)
* have been freed will be placed.
*/

-enum object_t {
-#define OBJECT(id, nr) OBJECT_##id,
- #include "dept_object.h"
-#undef OBJECT
- OBJECT_NR,
-};
-
#define OBJECT(id, nr) \
static struct dept_##id spool_##id[nr]; \
static DEFINE_PER_CPU(struct llist_head, lpool_##id);
#include "dept_object.h"
#undef OBJECT

-static struct dept_pool pool[OBJECT_NR] = {
+struct dept_pool dept_pool[OBJECT_NR] = {
#define OBJECT(id, nr) { \
.name = #id, \
.obj_sz = sizeof(struct dept_##id), \
@@ -303,7 +297,7 @@ static void *from_pool(enum object_t t)
if (DEPT_WARN_ON(!irqs_disabled()))
return NULL;

- p = &pool[t];
+ p = &dept_pool[t];

/*
* Try local pool first.
@@ -338,7 +332,7 @@ static void *from_pool(enum object_t t)

static void to_pool(void *o, enum object_t t)
{
- struct dept_pool *p = &pool[t];
+ struct dept_pool *p = &dept_pool[t];
struct llist_head *h;

preempt_disable();
@@ -2113,7 +2107,7 @@ void dept_map_copy(struct dept_map *to, struct dept_map *from)
clean_classes_cache(&to->map_key);
}

-static LIST_HEAD(classes);
+LIST_HEAD(dept_classes);

static inline bool within(const void *addr, void *start, unsigned long size)
{
@@ -2145,7 +2139,7 @@ void dept_free_range(void *start, unsigned int sz)
while (unlikely(!dept_lock()))
cpu_relax();

- list_for_each_entry_safe(c, n, &classes, all_node) {
+ list_for_each_entry_safe(c, n, &dept_classes, all_node) {
if (!within((void *)c->key, start, sz) &&
!within(c->name, start, sz))
continue;
@@ -2221,7 +2215,7 @@ static struct dept_class *check_new_class(struct dept_key *local,
c->sub_id = sub_id;
c->key = (unsigned long)(k->base + sub_id);
hash_add_class(c);
- list_add(&c->all_node, &classes);
+ list_add(&c->all_node, &dept_classes);
unlock:
dept_unlock();
caching:
@@ -2932,8 +2926,8 @@ static void migrate_per_cpu_pool(void)
struct llist_head *from;
struct llist_head *to;

- from = &pool[i].boot_pool;
- to = per_cpu_ptr(pool[i].lpool, boot_cpu);
+ from = &dept_pool[i].boot_pool;
+ to = per_cpu_ptr(dept_pool[i].lpool, boot_cpu);
move_llist(to, from);
}
}
diff --git a/kernel/dependency/dept_internal.h b/kernel/dependency/dept_internal.h
new file mode 100644
index 00000000..007c1ee
--- /dev/null
+++ b/kernel/dependency/dept_internal.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Dept(DEPendency Tracker) - runtime dependency tracker internal header
+ *
+ * Started by Byungchul Park <[email protected]>:
+ *
+ * Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ */
+
+#ifndef __DEPT_INTERNAL_H
+#define __DEPT_INTERNAL_H
+
+#ifdef CONFIG_DEPT
+
+enum object_t {
+#define OBJECT(id, nr) OBJECT_##id,
+ #include "dept_object.h"
+#undef OBJECT
+ OBJECT_NR,
+};
+
+extern struct list_head dept_classes;
+extern struct dept_pool dept_pool[];
+
+#endif
+#endif /* __DEPT_INTERNAL_H */
diff --git a/kernel/dependency/dept_proc.c b/kernel/dependency/dept_proc.c
new file mode 100644
index 00000000..7d61dfb
--- /dev/null
+++ b/kernel/dependency/dept_proc.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Procfs knobs for Dept(DEPendency Tracker)
+ *
+ * Started by Byungchul Park <[email protected]>:
+ *
+ * Copyright (C) 2021 LG Electronics, Inc. , Byungchul Park
+ */
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/dept.h>
+#include "dept_internal.h"
+
+static void *l_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ /*
+ * XXX: Serialize list traversal if needed. The following might
+ * give a wrong information on contention.
+ */
+ return seq_list_next(v, &dept_classes, pos);
+}
+
+static void *l_start(struct seq_file *m, loff_t *pos)
+{
+ /*
+ * XXX: Serialize list traversal if needed. The following might
+ * give a wrong information on contention.
+ */
+ return seq_list_start_head(&dept_classes, *pos);
+}
+
+static void l_stop(struct seq_file *m, void *v)
+{
+}
+
+static int l_show(struct seq_file *m, void *v)
+{
+ struct dept_class *fc = list_entry(v, struct dept_class, all_node);
+ struct dept_dep *d;
+ const char *prefix;
+
+ if (v == &dept_classes) {
+ seq_puts(m, "All classes:\n\n");
+ return 0;
+ }
+
+ prefix = fc->sched_map ? "<sched> " : "";
+ seq_printf(m, "[%p] %s%s\n", (void *)fc->key, prefix, fc->name);
+
+ /*
+ * XXX: Serialize list traversal if needed. The following might
+ * give a wrong information on contention.
+ */
+ list_for_each_entry(d, &fc->dep_head, dep_node) {
+ struct dept_class *tc = d->wait->class;
+
+ prefix = tc->sched_map ? "<sched> " : "";
+ seq_printf(m, " -> [%p] %s%s\n", (void *)tc->key, prefix, tc->name);
+ }
+ seq_puts(m, "\n");
+
+ return 0;
+}
+
+static const struct seq_operations dept_deps_ops = {
+ .start = l_start,
+ .next = l_next,
+ .stop = l_stop,
+ .show = l_show,
+};
+
+static int dept_stats_show(struct seq_file *m, void *v)
+{
+ int r;
+
+ seq_puts(m, "Availability in the static pools:\n\n");
+#define OBJECT(id, nr) \
+ r = atomic_read(&dept_pool[OBJECT_##id].obj_nr); \
+ if (r < 0) \
+ r = 0; \
+ seq_printf(m, "%s\t%d/%d(%d%%)\n", #id, r, nr, (r * 100) / (nr));
+ #include "dept_object.h"
+#undef OBJECT
+
+ return 0;
+}
+
+static int __init dept_proc_init(void)
+{
+ proc_create_seq("dept_deps", S_IRUSR, NULL, &dept_deps_ops);
+ proc_create_single("dept_stats", S_IRUSR, NULL, dept_stats_show);
+ return 0;
+}
+
+__initcall(dept_proc_init);
--
1.9.1

2023-01-09 04:07:23

by Byungchul Park

Subject: [PATCH RFC v7 07/23] dept: Apply sdt_might_sleep_strong() to wait_for_completion()/complete()

Makes Dept able to track dependencies by
wait_for_completion()/complete().

In order to obtain meaningful caller points, replace all the
wait_for_completion*() declarations with macros in the header.
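
A small user-space sketch of why a macro helps here: a macro body is
expanded at every call site, so anything declared static inside it is
unique to that site, whereas an out-of-line function would see a single
location. This assumes the tracker identifies a wait by such a per-site
key; site_key, note_wait_site(), raw_wait_for_thing() and
wait_for_thing() are hypothetical names, and the GNU statement expression
mirrors the style used in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-call-site identity; only its address matters. */
struct site_key {
	char dummy;
};

/* Hypothetical stand-in for the tracker hook: remembers the last key. */
static const struct site_key *last_site;

static void note_wait_site(const struct site_key *key)
{
	assert(key != NULL);
	last_site = key;
}

/* Hypothetical stand-in for an out-of-line raw_*() wait function. */
static int raw_wait_for_thing(int token)
{
	return token;
}

/* Each expansion gets its own static key, hence its own identity. */
#define wait_for_thing(token)					\
({								\
	static struct site_key site_key_;			\
	int ret_;						\
								\
	note_wait_site(&site_key_);				\
	ret_ = raw_wait_for_thing(token);			\
	ret_;							\
})
```

Two textual uses of wait_for_thing() report two different keys, while the
same use executed repeatedly in a loop reports the same key each time.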

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/completion.h | 89 +++++++++++++++++++++++++++++++++++++++++-----
kernel/sched/completion.c | 60 +++++++++++++++----------------
2 files changed, 110 insertions(+), 39 deletions(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 62b32b1..0408f6d 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -10,6 +10,7 @@
*/

#include <linux/swait.h>
+#include <linux/dept_sdt.h>

/*
* struct completion - structure used to maintain state for a "completion"
@@ -99,19 +100,89 @@ static inline void reinit_completion(struct completion *x)
x->done = 0;
}

-extern void wait_for_completion(struct completion *);
-extern void wait_for_completion_io(struct completion *);
-extern int wait_for_completion_interruptible(struct completion *x);
-extern int wait_for_completion_killable(struct completion *x);
-extern int wait_for_completion_state(struct completion *x, unsigned int state);
-extern unsigned long wait_for_completion_timeout(struct completion *x,
+extern void raw_wait_for_completion(struct completion *);
+extern void raw_wait_for_completion_io(struct completion *);
+extern int raw_wait_for_completion_interruptible(struct completion *x);
+extern int raw_wait_for_completion_killable(struct completion *x);
+extern int raw_wait_for_completion_state(struct completion *x, unsigned int state);
+extern unsigned long raw_wait_for_completion_timeout(struct completion *x,
unsigned long timeout);
-extern unsigned long wait_for_completion_io_timeout(struct completion *x,
+extern unsigned long raw_wait_for_completion_io_timeout(struct completion *x,
unsigned long timeout);
-extern long wait_for_completion_interruptible_timeout(
+extern long raw_wait_for_completion_interruptible_timeout(
struct completion *x, unsigned long timeout);
-extern long wait_for_completion_killable_timeout(
+extern long raw_wait_for_completion_killable_timeout(
struct completion *x, unsigned long timeout);
+
+#define wait_for_completion(x) \
+({ \
+ sdt_might_sleep_strong(NULL); \
+ raw_wait_for_completion(x); \
+ sdt_might_sleep_finish(); \
+})
+#define wait_for_completion_io(x) \
+({ \
+ sdt_might_sleep_strong(NULL); \
+ raw_wait_for_completion_io(x); \
+ sdt_might_sleep_finish(); \
+})
+#define wait_for_completion_interruptible(x) \
+({ \
+ int __ret; \
+ sdt_might_sleep_strong(NULL); \
+ __ret = raw_wait_for_completion_interruptible(x); \
+ sdt_might_sleep_finish(); \
+ __ret; \
+})
+#define wait_for_completion_killable(x) \
+({ \
+ int __ret; \
+ sdt_might_sleep_strong(NULL); \
+ __ret = raw_wait_for_completion_killable(x); \
+ sdt_might_sleep_finish(); \
+ __ret; \
+})
+#define wait_for_completion_state(x, s) \
+({ \
+ int __ret; \
+ sdt_might_sleep_strong(NULL); \
+ __ret = raw_wait_for_completion_state(x, s); \
+ sdt_might_sleep_finish(); \
+ __ret; \
+})
+#define wait_for_completion_timeout(x, t) \
+({ \
+ unsigned long __ret; \
+ sdt_might_sleep_strong(NULL); \
+ __ret = raw_wait_for_completion_timeout(x, t); \
+ sdt_might_sleep_finish(); \
+ __ret; \
+})
+#define wait_for_completion_io_timeout(x, t) \
+({ \
+ unsigned long __ret; \
+ sdt_might_sleep_strong(NULL); \
+ __ret = raw_wait_for_completion_io_timeout(x, t); \
+ sdt_might_sleep_finish(); \
+ __ret; \
+})
+#define wait_for_completion_interruptible_timeout(x, t) \
+({ \
+ long __ret; \
+ sdt_might_sleep_strong(NULL); \
+ __ret = raw_wait_for_completion_interruptible_timeout(x, t);\
+ sdt_might_sleep_finish(); \
+ __ret; \
+})
+#define wait_for_completion_killable_timeout(x, t) \
+({ \
+ long __ret; \
+ sdt_might_sleep_strong(NULL); \
+ __ret = raw_wait_for_completion_killable_timeout(x, t); \
+ sdt_might_sleep_finish(); \
+ __ret; \
+})
+
extern bool try_wait_for_completion(struct completion *x);
extern bool completion_done(struct completion *x);

diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index d57a5c1..8fcf5ee 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -4,7 +4,7 @@
* Generic wait-for-completion handler;
*
* It differs from semaphores in that their default case is the opposite,
- * wait_for_completion default blocks whereas semaphore default non-block. The
+ * raw_wait_for_completion default blocks whereas semaphore default non-block. The
* interface also makes it easy to 'complete' multiple waiting threads,
* something which isn't entirely natural for semaphores.
*
@@ -20,7 +20,7 @@
* This will wake up a single thread waiting on this completion. Threads will be
* awakened in the same order in which they were queued.
*
- * See also complete_all(), wait_for_completion() and related routines.
+ * See also complete_all(), raw_wait_for_completion() and related routines.
*
* If this function wakes up a task, it executes a full memory barrier before
* accessing the task state.
@@ -124,23 +124,23 @@ void complete_all(struct completion *x)
}

/**
- * wait_for_completion: - waits for completion of a task
+ * raw_wait_for_completion: - waits for completion of a task
* @x: holds the state of this particular completion
*
* This waits to be signaled for completion of a specific task. It is NOT
* interruptible and there is no timeout.
*
- * See also similar routines (i.e. wait_for_completion_timeout()) with timeout
+ * See also similar routines (i.e. raw_wait_for_completion_timeout()) with timeout
* and interrupt capability. Also see complete().
*/
-void __sched wait_for_completion(struct completion *x)
+void __sched raw_wait_for_completion(struct completion *x)
{
wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
}
-EXPORT_SYMBOL(wait_for_completion);
+EXPORT_SYMBOL(raw_wait_for_completion);

/**
- * wait_for_completion_timeout: - waits for completion of a task (w/timeout)
+ * raw_wait_for_completion_timeout: - waits for completion of a task (w/timeout)
* @x: holds the state of this particular completion
* @timeout: timeout value in jiffies
*
@@ -152,28 +152,28 @@ void __sched wait_for_completion(struct completion *x)
* till timeout) if completed.
*/
unsigned long __sched
-wait_for_completion_timeout(struct completion *x, unsigned long timeout)
+raw_wait_for_completion_timeout(struct completion *x, unsigned long timeout)
{
return wait_for_common(x, timeout, TASK_UNINTERRUPTIBLE);
}
-EXPORT_SYMBOL(wait_for_completion_timeout);
+EXPORT_SYMBOL(raw_wait_for_completion_timeout);

/**
- * wait_for_completion_io: - waits for completion of a task
+ * raw_wait_for_completion_io: - waits for completion of a task
* @x: holds the state of this particular completion
*
* This waits to be signaled for completion of a specific task. It is NOT
* interruptible and there is no timeout. The caller is accounted as waiting
* for IO (which traditionally means blkio only).
*/
-void __sched wait_for_completion_io(struct completion *x)
+void __sched raw_wait_for_completion_io(struct completion *x)
{
wait_for_common_io(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
}
-EXPORT_SYMBOL(wait_for_completion_io);
+EXPORT_SYMBOL(raw_wait_for_completion_io);

/**
- * wait_for_completion_io_timeout: - waits for completion of a task (w/timeout)
+ * raw_wait_for_completion_io_timeout: - waits for completion of a task (w/timeout)
* @x: holds the state of this particular completion
* @timeout: timeout value in jiffies
*
@@ -186,14 +186,14 @@ void __sched wait_for_completion_io(struct completion *x)
* till timeout) if completed.
*/
unsigned long __sched
-wait_for_completion_io_timeout(struct completion *x, unsigned long timeout)
+raw_wait_for_completion_io_timeout(struct completion *x, unsigned long timeout)
{
return wait_for_common_io(x, timeout, TASK_UNINTERRUPTIBLE);
}
-EXPORT_SYMBOL(wait_for_completion_io_timeout);
+EXPORT_SYMBOL(raw_wait_for_completion_io_timeout);

/**
- * wait_for_completion_interruptible: - waits for completion of a task (w/intr)
+ * raw_wait_for_completion_interruptible: - waits for completion of a task (w/intr)
* @x: holds the state of this particular completion
*
* This waits for completion of a specific task to be signaled. It is
@@ -201,7 +201,7 @@ void __sched wait_for_completion_io(struct completion *x)
*
* Return: -ERESTARTSYS if interrupted, 0 if completed.
*/
-int __sched wait_for_completion_interruptible(struct completion *x)
+int __sched raw_wait_for_completion_interruptible(struct completion *x)
{
long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE);

@@ -209,10 +209,10 @@ int __sched wait_for_completion_interruptible(struct completion *x)
return t;
return 0;
}
-EXPORT_SYMBOL(wait_for_completion_interruptible);
+EXPORT_SYMBOL(raw_wait_for_completion_interruptible);

/**
- * wait_for_completion_interruptible_timeout: - waits for completion (w/(to,intr))
+ * raw_wait_for_completion_interruptible_timeout: - waits for completion (w/(to,intr))
* @x: holds the state of this particular completion
* @timeout: timeout value in jiffies
*
@@ -223,15 +223,15 @@ int __sched wait_for_completion_interruptible(struct completion *x)
* or number of jiffies left till timeout) if completed.
*/
long __sched
-wait_for_completion_interruptible_timeout(struct completion *x,
+raw_wait_for_completion_interruptible_timeout(struct completion *x,
unsigned long timeout)
{
return wait_for_common(x, timeout, TASK_INTERRUPTIBLE);
}
-EXPORT_SYMBOL(wait_for_completion_interruptible_timeout);
+EXPORT_SYMBOL(raw_wait_for_completion_interruptible_timeout);

/**
- * wait_for_completion_killable: - waits for completion of a task (killable)
+ * raw_wait_for_completion_killable: - waits for completion of a task (killable)
* @x: holds the state of this particular completion
*
* This waits to be signaled for completion of a specific task. It can be
@@ -239,7 +239,7 @@ int __sched wait_for_completion_interruptible(struct completion *x)
*
* Return: -ERESTARTSYS if interrupted, 0 if completed.
*/
-int __sched wait_for_completion_killable(struct completion *x)
+int __sched raw_wait_for_completion_killable(struct completion *x)
{
long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_KILLABLE);

@@ -247,9 +247,9 @@ int __sched wait_for_completion_killable(struct completion *x)
return t;
return 0;
}
-EXPORT_SYMBOL(wait_for_completion_killable);
+EXPORT_SYMBOL(raw_wait_for_completion_killable);

-int __sched wait_for_completion_state(struct completion *x, unsigned int state)
+int __sched raw_wait_for_completion_state(struct completion *x, unsigned int state)
{
long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, state);

@@ -257,10 +257,10 @@ int __sched wait_for_completion_state(struct completion *x, unsigned int state)
return t;
return 0;
}
-EXPORT_SYMBOL(wait_for_completion_state);
+EXPORT_SYMBOL(raw_wait_for_completion_state);

/**
- * wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
+ * raw_wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
* @x: holds the state of this particular completion
* @timeout: timeout value in jiffies
*
@@ -272,12 +272,12 @@ int __sched wait_for_completion_state(struct completion *x, unsigned int state)
* or number of jiffies left till timeout) if completed.
*/
long __sched
-wait_for_completion_killable_timeout(struct completion *x,
+raw_wait_for_completion_killable_timeout(struct completion *x,
unsigned long timeout)
{
return wait_for_common(x, timeout, TASK_KILLABLE);
}
-EXPORT_SYMBOL(wait_for_completion_killable_timeout);
+EXPORT_SYMBOL(raw_wait_for_completion_killable_timeout);

/**
* try_wait_for_completion - try to decrement a completion without blocking
@@ -319,7 +319,7 @@ bool try_wait_for_completion(struct completion *x)
* completion_done - Test to see if a completion has any waiters
* @x: completion structure
*
- * Return: 0 if there are waiters (wait_for_completion() in progress)
+ * Return: 0 if there are waiters (raw_wait_for_completion() in progress)
* 1 if there are no waiters.
*
* Note, this will always return true if complete_all() was called on @X.
--
1.9.1

2023-01-09 04:07:23

by Byungchul Park

Subject: [PATCH RFC v7 02/23] dept: Implement Dept(Dependency Tracker)

CURRENT STATUS
--------------
Lockdep tracks the acquisition order of locks in order to detect
deadlock, and tracks IRQ and IRQ enable/disable state as well to take
accidental acquisitions into account.

Lockdep should be turned off once it detects and reports a deadlock
since its data structures and algorithm are not reusable after
detection because of the complex design.

PROBLEM
-------
*Waits* whose *events* can never be reached eventually cause deadlock.
However, Lockdep is only interested in lock acquisition order, forcing
us to emulate lock acquisition even for plain waits and events that
have nothing to do with real locks.

Even worse, no one likes Lockdep's false positive detection because it
prevents further detections that might be more valuable. That's why
kernel developers are sensitive to Lockdep's false positives.

Besides those, by tracking acquisition order, it cannot correctly deal
with read locks and cross-events, e.g. wait_for_completion()/complete(),
for deadlock detection. Lockdep is no longer a good tool for that purpose.

SOLUTION
--------
Again, *waits* whose *events* can never be reached eventually cause
deadlock. The new solution, Dept(DEPendency Tracker), focuses on waits
and events themselves. Dept tracks waits and events and reports it if
any event would never be reachable.

Dept:
. Works with read lock in the right way.
. Works with any wait and event, i.e. cross-event.
. Continues to work even after reporting multiple times.
. Provides simple and intuitive APIs.
. Does exactly what a dependency checker should do.
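
As a rough illustration of the wait/event model, here is a minimal
user-space sketch, not Dept's actual implementation. It assumes every
dependency can be reduced to "the event of class E can only fire after a
wait of class W is woken" and that a cycle among such dependencies is a
deadlock possibility. dep_graph, add_dep() and MAX_CLASS are names made
up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASS 8	/* arbitrary small bound for the sketch */

/* dep[e][w]: the event of class e fires only after a wait of class w wakes. */
struct dep_graph {
	bool dep[MAX_CLASS][MAX_CLASS];
};

static void graph_init(struct dep_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: can class @to be reached from class @from? */
static bool reachable(const struct dep_graph *g, int from, int to, bool *seen)
{
	int next;

	if (from == to)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;

	for (next = 0; next < MAX_CLASS; next++)
		if (g->dep[from][next] && reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Record "event -> wait". Returns true if the new edge closes a cycle,
 * i.e. no event on the cycle can ever be triggered. The edge is kept
 * either way so that later checks keep working after a report.
 */
static bool add_dep(struct dep_graph *g, int event, int wait)
{
	bool seen[MAX_CLASS] = { false };
	bool cycle;

	assert(event >= 0 && event < MAX_CLASS);
	assert(wait >= 0 && wait < MAX_CLASS);

	cycle = reachable(g, wait, event, seen);
	g->dep[event][wait] = true;
	return cycle;
}
```

For instance, let class 0 stand for a mutex and class 1 for a completion.
A context that holds the mutex across wait_for_completion() yields
add_dep(&g, 0, 1), and a context that takes the same mutex before calling
complete() yields add_dep(&g, 1, 0), which returns true. Lock acquisition
order alone would not reveal this, since only one lock is involved.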

Q & A
-----
Q. Is this the first try ever to address the problem?
A. No. The cross-release feature (b09be676e0ff2 locking/lockdep: Implement
the 'crossrelease' feature) addressed it 2 years ago. It was a Lockdep
extension that was merged but reverted shortly after because:

Cross-release started to report valuable hidden problems but started
to give false positive reports as well. For sure, no one likes
Lockdep's false positive reports since they make Lockdep stop,
preventing it from reporting further real problems.

Q. Why was Dept not developed as an extension of Lockdep?
A. Lockdep definitely includes all the efforts great developers have
made for a long time, so it is quite stable. But I had to design and
implement Dept from scratch because of the following:

1) Lockdep was designed to track lock acquisition order. The APIs and
implementation do not fit the wait-event model.
2) Lockdep is turned off on detection, including on a false positive,
which is terrible and prevents developing any extension for stronger
detection.

Q. Do you intend to totally replace Lockdep?
A. No. Lockdep also checks if lock usage is correct. Of course, the
dependency check routine should be replaced but the other functions
should still be there.

Q. Do you mean the dependency check routine should be replaced right
away?
A. No. I admit Lockdep is stable enough thanks to the great efforts
kernel developers have made. Both Lockdep and Dept should be in the
kernel until Dept is considered stable.

Q. Stronger detection capability would give more false positive reports,
which was a big problem when cross-release was introduced. Is it ok
with Dept?
A. It's ok. Dept allows multiple reporting thanks to its simple and quite
generalized design. Of course, false positive reports should be fixed
anyway but they are no longer as critical a problem as they were.

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/dept.h | 573 ++++++++
include/linux/hardirq.h | 3 +
include/linux/sched.h | 3 +
init/init_task.c | 2 +
init/main.c | 2 +
kernel/Makefile | 1 +
kernel/dependency/Makefile | 3 +
kernel/dependency/dept.c | 2980 +++++++++++++++++++++++++++++++++++++++
kernel/dependency/dept_hash.h | 10 +
kernel/dependency/dept_object.h | 13 +
kernel/exit.c | 1 +
kernel/fork.c | 2 +
kernel/module/main.c | 2 +
kernel/sched/core.c | 9 +
lib/Kconfig.debug | 27 +
lib/locking-selftest.c | 2 +
16 files changed, 3633 insertions(+)
create mode 100644 include/linux/dept.h
create mode 100644 kernel/dependency/Makefile
create mode 100644 kernel/dependency/dept.c
create mode 100644 kernel/dependency/dept_hash.h
create mode 100644 kernel/dependency/dept_object.h

diff --git a/include/linux/dept.h b/include/linux/dept.h
new file mode 100644
index 00000000..f2a3057
--- /dev/null
+++ b/include/linux/dept.h
@@ -0,0 +1,573 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * DEPT(DEPendency Tracker) - runtime dependency tracker
+ *
+ * Started by Byungchul Park <[email protected]>:
+ *
+ * Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ */
+
+#ifndef __LINUX_DEPT_H
+#define __LINUX_DEPT_H
+
+#ifdef CONFIG_DEPT
+
+#include <linux/types.h>
+
+struct task_struct;
+
+#define DEPT_MAX_STACK_ENTRY 16
+#define DEPT_MAX_WAIT_HIST 64
+#define DEPT_MAX_ECXT_HELD 48
+
+#define DEPT_MAX_SUBCLASSES 16
+#define DEPT_MAX_SUBCLASSES_EVT 2
+#define DEPT_MAX_SUBCLASSES_USR (DEPT_MAX_SUBCLASSES / DEPT_MAX_SUBCLASSES_EVT)
+#define DEPT_MAX_SUBCLASSES_CACHE 2
+
+#define DEPT_SIRQ 0
+#define DEPT_HIRQ 1
+#define DEPT_IRQS_NR 2
+#define DEPT_SIRQF (1UL << DEPT_SIRQ)
+#define DEPT_HIRQF (1UL << DEPT_HIRQ)
+
+struct dept_ecxt;
+struct dept_iecxt {
+ struct dept_ecxt *ecxt;
+ int enirq;
+ /*
+ * for preventing to add a new ecxt
+ */
+ bool staled;
+};
+
+struct dept_wait;
+struct dept_iwait {
+ struct dept_wait *wait;
+ int irq;
+ /*
+ * for preventing to add a new wait
+ */
+ bool staled;
+ bool touched;
+};
+
+struct dept_class {
+ union {
+ struct llist_node pool_node;
+ struct {
+ /*
+ * reference counter for object management
+ */
+ atomic_t ref;
+
+ /*
+ * unique information about the class
+ */
+ const char *name;
+ unsigned long key;
+ int sub_id;
+
+ /*
+ * for BFS
+ */
+ unsigned int bfs_gen;
+ int bfs_dist;
+ struct dept_class *bfs_parent;
+
+ /*
+ * for hashing this object
+ */
+ struct hlist_node hash_node;
+
+ /*
+ * for linking all classes
+ */
+ struct list_head all_node;
+
+ /*
+ * for associating its dependencies
+ */
+ struct list_head dep_head;
+ struct list_head dep_rev_head;
+
+ /*
+ * for tracking IRQ dependencies
+ */
+ struct dept_iecxt iecxt[DEPT_IRQS_NR];
+ struct dept_iwait iwait[DEPT_IRQS_NR];
+
+ /*
+ * classified by a map embedded in task_struct,
+ * not an explicit map
+ */
+ bool sched_map;
+ };
+ };
+};
+
+struct dept_key {
+ union {
+ /*
+ * Each byte-wise address will be used as its key.
+ */
+ char base[DEPT_MAX_SUBCLASSES];
+
+ /*
+ * for caching the main class pointer
+ */
+ struct dept_class *classes[DEPT_MAX_SUBCLASSES_CACHE];
+ };
+};
+
+struct dept_map {
+ const char *name;
+ struct dept_key *keys;
+
+ /*
+ * subclass that can be set from user
+ */
+ int sub_u;
+
+ /*
+ * It's local copy for fast access to the associated classes.
+ * Also used for dept_key for static maps.
+ */
+ struct dept_key map_key;
+
+ /*
+ * wait timestamp associated to this map
+ */
+ unsigned int wgen;
+
+ /*
+ * whether this map should be going to be checked or not
+ */
+ bool nocheck;
+};
+
+#define DEPT_MAP_INITIALIZER(n, k) \
+{ \
+ .name = #n, \
+ .keys = (struct dept_key *)(k), \
+ .sub_u = 0, \
+ .map_key = { .classes = { NULL, } }, \
+ .wgen = 0U, \
+ .nocheck = false, \
+}
+
+struct dept_stack {
+ union {
+ struct llist_node pool_node;
+ struct {
+ /*
+ * reference counter for object management
+ */
+ atomic_t ref;
+
+ /*
+ * backtrace entries
+ */
+ unsigned long raw[DEPT_MAX_STACK_ENTRY];
+ int nr;
+ };
+ };
+};
+
+struct dept_ecxt {
+ union {
+ struct llist_node pool_node;
+ struct {
+ /*
+ * reference counter for object management
+ */
+ atomic_t ref;
+
+ /*
+ * function that entered to this ecxt
+ */
+ const char *ecxt_fn;
+
+ /*
+ * event function
+ */
+ const char *event_fn;
+
+ /*
+ * associated class
+ */
+ struct dept_class *class;
+
+ /*
+ * flag indicating which IRQ has been
+ * enabled within the event context
+ */
+ unsigned long enirqf;
+
+ /*
+ * where the IRQ-enabled happened
+ */
+ unsigned long enirq_ip[DEPT_IRQS_NR];
+ struct dept_stack *enirq_stack[DEPT_IRQS_NR];
+
+ /*
+ * where the event context started
+ */
+ unsigned long ecxt_ip;
+ struct dept_stack *ecxt_stack;
+
+ /*
+ * where the event triggered
+ */
+ unsigned long event_ip;
+ struct dept_stack *event_stack;
+ };
+ };
+};
+
+struct dept_wait {
+ union {
+ struct llist_node pool_node;
+ struct {
+ /*
+ * reference counter for object management
+ */
+ atomic_t ref;
+
+ /*
+ * function causing this wait
+ */
+ const char *wait_fn;
+
+ /*
+ * the associated class
+ */
+ struct dept_class *class;
+
+ /*
+ * which IRQ the wait was placed in
+ */
+ unsigned long irqf;
+
+ /*
+ * where the IRQ wait happened
+ */
+ unsigned long irq_ip[DEPT_IRQS_NR];
+ struct dept_stack *irq_stack[DEPT_IRQS_NR];
+
+ /*
+ * where the wait happened
+ */
+ unsigned long wait_ip;
+ struct dept_stack *wait_stack;
+
+ /*
+ * whether this wait is for commit in scheduler
+ */
+ bool sched_sleep;
+ };
+ };
+};
+
+struct dept_dep {
+ union {
+ struct llist_node pool_node;
+ struct {
+ /*
+ * reference counter for object management
+ */
+ atomic_t ref;
+
+ /*
+ * key data of dependency
+ */
+ struct dept_ecxt *ecxt;
+ struct dept_wait *wait;
+
+ /*
+ * This object can be referred without dept_lock
+ * held but with IRQ disabled, e.g. for hash
+ * lookup. So deferred deletion is needed.
+ */
+ struct rcu_head rh;
+
+ /*
+ * for BFS
+ */
+ struct list_head bfs_node;
+
+ /*
+ * for hashing this object
+ */
+ struct hlist_node hash_node;
+
+ /*
+ * for linking to a class object
+ */
+ struct list_head dep_node;
+ struct list_head dep_rev_node;
+ };
+ };
+};
+
+struct dept_hash {
+ /*
+ * hash table
+ */
+ struct hlist_head *table;
+
+ /*
+ * size of the table i.e. 2^bits
+ */
+ int bits;
+};
+
+struct dept_pool {
+ const char *name;
+
+ /*
+ * object size
+ */
+ size_t obj_sz;
+
+ /*
+ * the number of the static array
+ */
+ atomic_t obj_nr;
+
+ /*
+ * offset of ->pool_node
+ */
+ size_t node_off;
+
+ /*
+ * pointer to the pool
+ */
+ void *spool;
+ struct llist_head boot_pool;
+ struct llist_head __percpu *lpool;
+};
+
+struct dept_ecxt_held {
+ /*
+ * associated event context
+ */
+ struct dept_ecxt *ecxt;
+
+ /*
+ * unique key for this dept_ecxt_held
+ */
+ struct dept_map *map;
+
+ /*
+ * class of the ecxt of this dept_ecxt_held
+ */
+ struct dept_class *class;
+
+ /*
+ * the wgen when the event context started
+ */
+ unsigned int wgen;
+
+ /*
+ * subclass that only works in the local context
+ */
+ int sub_l;
+};
+
+struct dept_wait_hist {
+ /*
+ * associated wait
+ */
+ struct dept_wait *wait;
+
+ /*
+ * unique id of all waits system-wise until wrapped
+ */
+ unsigned int wgen;
+
+ /*
+ * local context id to identify IRQ context
+ */
+ unsigned int ctxt_id;
+};
+
+struct dept_task {
+ /*
+ * all event contexts that have entered and before exiting
+ */
+ struct dept_ecxt_held ecxt_held[DEPT_MAX_ECXT_HELD];
+ int ecxt_held_pos;
+
+ /*
+ * ring buffer holding all waits that have happened
+ */
+ struct dept_wait_hist wait_hist[DEPT_MAX_WAIT_HIST];
+ int wait_hist_pos;
+
+ /*
+ * sequential id to identify each IRQ context
+ */
+ unsigned int irq_id[DEPT_IRQS_NR];
+
+ /*
+ * for tracking IRQ-enabled points with cross-event
+ */
+ unsigned int wgen_enirq[DEPT_IRQS_NR];
+
+ /*
+ * for keeping up-to-date IRQ-enabled points
+ */
+ unsigned long enirq_ip[DEPT_IRQS_NR];
+
+ /*
+ * current effective IRQ-enabled flag
+ */
+ unsigned long eff_enirqf;
+
+ /*
+ * for reserving a current stack instance at each operation
+ */
+ struct dept_stack *stack;
+
+ /*
+ * for preventing recursive call into DEPT engine
+ */
+ int recursive;
+
+ /*
+ * for staging data to commit a wait
+ */
+ struct dept_map stage_m;
+ bool stage_sched_map;
+ const char *stage_w_fn;
+ unsigned long stage_ip;
+
+ /*
+ * the number of missing ecxts
+ */
+ int missing_ecxt;
+
+ /*
+ * for tracking IRQ-enable state
+ */
+ bool hardirqs_enabled;
+ bool softirqs_enabled;
+
+ /*
+ * whether the current is on do_exit()
+ */
+ bool task_exit;
+
+ /*
+ * whether the current is running __schedule()
+ */
+ bool in_sched;
+};
+
+#define DEPT_TASK_INITIALIZER(t) \
+{ \
+ .wait_hist = { { .wait = NULL, } }, \
+ .ecxt_held_pos = 0, \
+ .wait_hist_pos = 0, \
+ .irq_id = { 0U }, \
+ .wgen_enirq = { 0U }, \
+ .enirq_ip = { 0UL }, \
+ .eff_enirqf = 0UL, \
+ .stack = NULL, \
+ .recursive = 0, \
+ .stage_m = DEPT_MAP_INITIALIZER((t)->stage_m, NULL), \
+ .stage_sched_map = false, \
+ .stage_w_fn = NULL, \
+ .stage_ip = 0UL, \
+ .missing_ecxt = 0, \
+ .hardirqs_enabled = false, \
+ .softirqs_enabled = false, \
+ .task_exit = false, \
+ .in_sched = false, \
+}
+
+extern void dept_on(void);
+extern void dept_off(void);
+extern void dept_init(void);
+extern void dept_task_init(struct task_struct *t);
+extern void dept_task_exit(struct task_struct *t);
+extern void dept_free_range(void *start, unsigned int sz);
+extern void dept_map_init(struct dept_map *m, struct dept_key *k, int sub_u, const char *n);
+extern void dept_map_reinit(struct dept_map *m, struct dept_key *k, int sub_u, const char *n);
+extern void dept_map_copy(struct dept_map *to, struct dept_map *from);
+
+extern void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip, const char *w_fn, int sub_l);
+extern void dept_stage_wait(struct dept_map *m, struct dept_key *k, unsigned long ip, const char *w_fn, bool strong);
+extern void dept_request_event_wait_commit(void);
+extern void dept_clean_stage(void);
+extern void dept_stage_event(struct task_struct *t, unsigned long ip);
+extern void dept_ecxt_enter(struct dept_map *m, unsigned long e_f, unsigned long ip, const char *c_fn, const char *e_fn, int sub_l);
+extern bool dept_ecxt_holding(struct dept_map *m, unsigned long e_f);
+extern void dept_request_event(struct dept_map *m);
+extern void dept_event(struct dept_map *m, unsigned long e_f, unsigned long ip, const char *e_fn);
+extern void dept_ecxt_exit(struct dept_map *m, unsigned long e_f, unsigned long ip);
+extern void dept_sched_enter(void);
+extern void dept_sched_exit(void);
+
+static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
+{
+ dept_ecxt_enter(m, 0UL, 0UL, NULL, NULL, 0);
+}
+
+/*
+ * for users who want to manage external keys
+ */
+extern void dept_key_init(struct dept_key *k);
+extern void dept_key_destroy(struct dept_key *k);
+extern void dept_map_ecxt_modify(struct dept_map *m, unsigned long e_f, struct dept_key *new_k, unsigned long new_e_f, unsigned long new_ip, const char *new_c_fn, const char *new_e_fn, int new_sub_l);
+
+extern void dept_softirq_enter(void);
+extern void dept_hardirq_enter(void);
+extern void dept_softirqs_on(unsigned long ip);
+extern void dept_hardirqs_on(unsigned long ip);
+extern void dept_softirqs_off(unsigned long ip);
+extern void dept_hardirqs_off(unsigned long ip);
+#else /* !CONFIG_DEPT */
+struct dept_key { };
+struct dept_map { };
+struct dept_task { };
+
+#define DEPT_MAP_INITIALIZER(n, k) { }
+#define DEPT_TASK_INITIALIZER(t) { }
+
+#define dept_on() do { } while (0)
+#define dept_off() do { } while (0)
+#define dept_init() do { } while (0)
+#define dept_task_init(t) do { } while (0)
+#define dept_task_exit(t) do { } while (0)
+#define dept_free_range(s, sz) do { } while (0)
+#define dept_map_init(m, k, su, n) do { (void)(n); (void)(k); } while (0)
+#define dept_map_reinit(m, k, su, n) do { (void)(n); (void)(k); } while (0)
+#define dept_map_copy(t, f) do { } while (0)
+
+#define dept_wait(m, w_f, ip, w_fn, sl) do { (void)(w_fn); } while (0)
+#define dept_stage_wait(m, k, ip, w_fn, s) do { (void)(k); (void)(w_fn); } while (0)
+#define dept_request_event_wait_commit() do { } while (0)
+#define dept_clean_stage() do { } while (0)
+#define dept_stage_event(t, ip) do { } while (0)
+#define dept_ecxt_enter(m, e_f, ip, c_fn, e_fn, sl) do { (void)(c_fn); (void)(e_fn); } while (0)
+#define dept_ecxt_holding(m, e_f) false
+#define dept_request_event(m) do { } while (0)
+#define dept_event(m, e_f, ip, e_fn) do { (void)(e_fn); } while (0)
+#define dept_ecxt_exit(m, e_f, ip) do { } while (0)
+#define dept_sched_enter() do { } while (0)
+#define dept_sched_exit() do { } while (0)
+#define dept_ecxt_enter_nokeep(m) do { } while (0)
+#define dept_key_init(k) do { (void)(k); } while (0)
+#define dept_key_destroy(k) do { (void)(k); } while (0)
+#define dept_map_ecxt_modify(m, e_f, n_k, n_e_f, n_ip, n_c_fn, n_e_fn, n_sl) do { (void)(n_k); (void)(n_c_fn); (void)(n_e_fn); } while (0)
+
+#define dept_softirq_enter() do { } while (0)
+#define dept_hardirq_enter() do { } while (0)
+#define dept_softirqs_on(ip) do { } while (0)
+#define dept_hardirqs_on(ip) do { } while (0)
+#define dept_softirqs_off(ip) do { } while (0)
+#define dept_hardirqs_off(ip) do { } while (0)
+#endif
+#endif /* __LINUX_DEPT_H */
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index d57cab4..bb279db 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -5,6 +5,7 @@
#include <linux/context_tracking_state.h>
#include <linux/preempt.h>
#include <linux/lockdep.h>
+#include <linux/dept.h>
#include <linux/ftrace_irq.h>
#include <linux/sched.h>
#include <linux/vtime.h>
@@ -106,6 +107,7 @@ static __always_inline void rcu_irq_enter_check_tick(void)
*/
#define __nmi_enter() \
do { \
+ dept_off(); \
lockdep_off(); \
arch_nmi_enter(); \
BUG_ON(in_nmi() == NMI_MASK); \
@@ -128,6 +130,7 @@ static __always_inline void rcu_irq_enter_check_tick(void)
__preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
arch_nmi_exit(); \
lockdep_on(); \
+ dept_on(); \
} while (0)

#define nmi_exit() \
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 853d08f..fcb0099 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -37,6 +37,7 @@
#include <linux/kcsan.h>
#include <linux/rv.h>
#include <asm/kmap_size.h>
+#include <linux/dept.h>

/* task_struct member predeclarations (sorted alphabetically): */
struct audit_context;
@@ -1168,6 +1169,8 @@ struct task_struct {
struct held_lock held_locks[MAX_LOCK_DEPTH];
#endif

+ struct dept_task dept_task;
+
#if defined(CONFIG_UBSAN) && !defined(CONFIG_UBSAN_TRAP)
unsigned int in_ubsan;
#endif
diff --git a/init/init_task.c b/init/init_task.c
index ff6c4b9..eb36ad6 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -12,6 +12,7 @@
#include <linux/audit.h>
#include <linux/numa.h>
#include <linux/scs.h>
+#include <linux/dept.h>

#include <linux/uaccess.h>

@@ -194,6 +195,7 @@ struct task_struct init_task
.curr_chain_key = INITIAL_CHAIN_KEY,
.lockdep_recursion = 0,
#endif
+ .dept_task = DEPT_TASK_INITIALIZER(init_task),
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
.ret_stack = NULL,
.tracing_graph_pause = ATOMIC_INIT(0),
diff --git a/init/main.c b/init/main.c
index e1c3911..6e5b492 100644
--- a/init/main.c
+++ b/init/main.c
@@ -66,6 +66,7 @@
#include <linux/debug_locks.h>
#include <linux/debugobjects.h>
#include <linux/lockdep.h>
+#include <linux/dept.h>
#include <linux/kmemleak.h>
#include <linux/padata.h>
#include <linux/pid_namespace.h>
@@ -1080,6 +1081,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
panic_param);

lockdep_init();
+ dept_init();

/*
* Need to run this when irqs are enabled, because it wants
diff --git a/kernel/Makefile b/kernel/Makefile
index 10ef068..d1eb49e 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -51,6 +51,7 @@ obj-y += livepatch/
obj-y += dma/
obj-y += entry/
obj-$(CONFIG_MODULES) += module/
+obj-y += dependency/

obj-$(CONFIG_KCMP) += kcmp.o
obj-$(CONFIG_FREEZER) += freezer.o
diff --git a/kernel/dependency/Makefile b/kernel/dependency/Makefile
new file mode 100644
index 00000000..b5cfb8a
--- /dev/null
+++ b/kernel/dependency/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_DEPT) += dept.o
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
new file mode 100644
index 00000000..a54a770
--- /dev/null
+++ b/kernel/dependency/dept.c
@@ -0,0 +1,2980 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * DEPT(DEPendency Tracker) - Runtime dependency tracker
+ *
+ * Started by Byungchul Park <[email protected]>:
+ *
+ * Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ *
+ * DEPT provides a general way to detect deadlock possibility in runtime
+ * and the interest is not limited to typical lock but to every
+ * synchronization primitive.
+ *
+ * The following ideas were borrowed from LOCKDEP:
+ *
+ * 1) Use a graph to track relationship between classes.
+ * 2) Prevent performance regression using hash.
+ *
+ * The following items were enhanced from LOCKDEP:
+ *
+ * 1) Cover more deadlock cases.
+ * 2) Allow multiple reports.
+ *
+ * TODO: Both LOCKDEP and DEPT should co-exist until DEPT is considered
+ * stable. Then the dependency check routine should be replaced with
+ * DEPT after. It should finally look like:
+ *
+ *
+ *
+ * As is:
+ *
+ * LOCKDEP
+ * +-----------------------------------------+
+ * | Lock usage correctness check | <-> locks
+ * | |
+ * | |
+ * | +-------------------------------------+ |
+ * | | Dependency check | |
+ * | | (by tracking lock acquisition order)| |
+ * | +-------------------------------------+ |
+ * | |
+ * +-----------------------------------------+
+ *
+ * DEPT
+ * +-----------------------------------------+
+ * | Dependency check | <-> waits/events
+ * | (by tracking wait and event context) |
+ * +-----------------------------------------+
+ *
+ *
+ *
+ * To be:
+ *
+ * LOCKDEP
+ * +-----------------------------------------+
+ * | Lock usage correctness check | <-> locks
+ * | |
+ * | |
+ * | (Request dependency check) |
+ * | T |
+ * +--------------------|--------------------+
+ * |
+ * DEPT V
+ * +-----------------------------------------+
+ * | Dependency check | <-> waits/events
+ * | (by tracking wait and event context) |
+ * +-----------------------------------------+
+ */
+
+#include <linux/sched.h>
+#include <linux/stacktrace.h>
+#include <linux/spinlock.h>
+#include <linux/kallsyms.h>
+#include <linux/hash.h>
+#include <linux/dept.h>
+#include <linux/utsname.h>
+
+static int dept_stop;
+static int dept_per_cpu_ready;
+
+#define DEPT_READY_WARN (!oops_in_progress)
+
+/*
+ * Make all operations using DEPT_WARN_ON() fail on oops_in_progress and
+ * prevent warning message.
+ */
+#define DEPT_WARN_ON_ONCE(c) \
+ ({ \
+ int __ret = 0; \
+ \
+ if (likely(DEPT_READY_WARN)) \
+ __ret = WARN_ONCE(c, "DEPT_WARN_ON_ONCE: " #c); \
+ __ret; \
+ })
+
+#define DEPT_WARN_ONCE(s...) \
+ ({ \
+ if (likely(DEPT_READY_WARN)) \
+ WARN_ONCE(1, "DEPT_WARN_ONCE: " s); \
+ })
+
+#define DEPT_WARN_ON(c) \
+ ({ \
+ int __ret = 0; \
+ \
+ if (likely(DEPT_READY_WARN)) \
+ __ret = WARN(c, "DEPT_WARN_ON: " #c); \
+ __ret; \
+ })
+
+#define DEPT_WARN(s...) \
+ ({ \
+ if (likely(DEPT_READY_WARN)) \
+ WARN(1, "DEPT_WARN: " s); \
+ })
+
+#define DEPT_STOP(s...) \
+ ({ \
+ WRITE_ONCE(dept_stop, 1); \
+ if (likely(DEPT_READY_WARN)) \
+ WARN(1, "DEPT_STOP: " s); \
+ })
+
+#define DEPT_INFO_ONCE(s...) pr_warn_once("DEPT_INFO_ONCE: " s)
+
+static arch_spinlock_t dept_spin = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+static arch_spinlock_t stage_spin = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+
+/*
+ * DEPT internal engine should be careful in using outside functions
+ * e.g. printk at reporting since that kind of usage might cause
+ * untrackable deadlock.
+ */
+static atomic_t dept_outworld = ATOMIC_INIT(0);
+
+static inline void dept_outworld_enter(void)
+{
+ atomic_inc(&dept_outworld);
+}
+
+static inline void dept_outworld_exit(void)
+{
+ atomic_dec(&dept_outworld);
+}
+
+static inline bool dept_outworld_entered(void)
+{
+ return atomic_read(&dept_outworld);
+}
+
+static inline bool dept_lock(void)
+{
+ while (!arch_spin_trylock(&dept_spin))
+ if (unlikely(dept_outworld_entered()))
+ return false;
+ return true;
+}
+
+static inline void dept_unlock(void)
+{
+ arch_spin_unlock(&dept_spin);
+}
+
+/*
+ * whether to stack-trace on every wait or every ecxt
+ */
+static bool rich_stack = true;
+
+enum bfs_ret {
+ BFS_CONTINUE,
+ BFS_CONTINUE_REV,
+ BFS_DONE,
+ BFS_SKIP,
+};
+
+static inline bool after(unsigned int a, unsigned int b)
+{
+ return (int)(b - a) < 0;
+}
+
+static inline bool before(unsigned int a, unsigned int b)
+{
+ return (int)(a - b) < 0;
+}
+
+static inline bool valid_stack(struct dept_stack *s)
+{
+ return s && s->nr > 0;
+}
+
+static inline bool valid_class(struct dept_class *c)
+{
+ return c->key;
+}
+
+static inline void invalidate_class(struct dept_class *c)
+{
+ c->key = 0UL;
+}
+
+static inline struct dept_ecxt *dep_e(struct dept_dep *d)
+{
+ return d->ecxt;
+}
+
+static inline struct dept_wait *dep_w(struct dept_dep *d)
+{
+ return d->wait;
+}
+
+static inline struct dept_class *dep_fc(struct dept_dep *d)
+{
+ return dep_e(d)->class;
+}
+
+static inline struct dept_class *dep_tc(struct dept_dep *d)
+{
+ return dep_w(d)->class;
+}
+
+static inline const char *irq_str(int irq)
+{
+ if (irq == DEPT_SIRQ)
+ return "softirq";
+ if (irq == DEPT_HIRQ)
+ return "hardirq";
+ return "(unknown)";
+}
+
+static inline struct dept_task *dept_task(void)
+{
+ return &current->dept_task;
+}
+
+/*
+ * Dept doesn't work either when it's stopped by DEPT_STOP() or in a nmi
+ * context.
+ */
+static inline bool dept_working(void)
+{
+ return !READ_ONCE(dept_stop) && !in_nmi();
+}
+
+/*
+ * Even k == NULL is considered as a valid key because it would use
+ * &->map_key as the key in that case.
+ */
+struct dept_key __dept_no_validate__;
+static inline bool valid_key(struct dept_key *k)
+{
+ return &__dept_no_validate__ != k;
+}
+
+/*
+ * Pool
+ * =====================================================================
+ * DEPT maintains pools to provide objects in a safe way.
+ *
+ * 1) Static pool is used at the beginning of booting time.
+ * 2) Local pool is tried first before the static pool. Objects that
+ * have been freed will be placed.
+ */
+
+enum object_t {
+#define OBJECT(id, nr) OBJECT_##id,
+ #include "dept_object.h"
+#undef OBJECT
+ OBJECT_NR,
+};
+
+#define OBJECT(id, nr) \
+static struct dept_##id spool_##id[nr]; \
+static DEFINE_PER_CPU(struct llist_head, lpool_##id);
+ #include "dept_object.h"
+#undef OBJECT
+
+static struct dept_pool pool[OBJECT_NR] = {
+#define OBJECT(id, nr) { \
+ .name = #id, \
+ .obj_sz = sizeof(struct dept_##id), \
+ .obj_nr = ATOMIC_INIT(nr), \
+ .node_off = offsetof(struct dept_##id, pool_node), \
+ .spool = spool_##id, \
+ .lpool = &lpool_##id, },
+ #include "dept_object.h"
+#undef OBJECT
+};
+
+/*
+ * Can use llist no matter whether CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG is
+ * enabled or not because NMI and other contexts in the same CPU never
+ * run inside of DEPT concurrently by preventing reentrance.
+ */
+static void *from_pool(enum object_t t)
+{
+ struct dept_pool *p;
+ struct llist_head *h;
+ struct llist_node *n;
+
+ /*
+ * llist_del_first() doesn't allow concurrent access e.g.
+ * between process and IRQ context.
+ */
+ if (DEPT_WARN_ON(!irqs_disabled()))
+ return NULL;
+
+ p = &pool[t];
+
+ /*
+ * Try local pool first.
+ */
+ if (likely(dept_per_cpu_ready))
+ h = this_cpu_ptr(p->lpool);
+ else
+ h = &p->boot_pool;
+
+ n = llist_del_first(h);
+ if (n)
+ return (void *)n - p->node_off;
+
+ /*
+ * Try static pool.
+ */
+ if (atomic_read(&p->obj_nr) > 0) {
+ int idx = atomic_dec_return(&p->obj_nr);
+
+ if (idx >= 0)
+ return p->spool + (idx * p->obj_sz);
+ }
+
+ DEPT_INFO_ONCE("---------------------------------------------\n"
+ " Some of Dept internal resources are run out.\n"
+ " Dept might still work if the resources get freed.\n"
+ " However, the chances are Dept will suffer from\n"
+ " the lack from now. Needs to extend the internal\n"
+ " resource pools. Ask [email protected]\n");
+ return NULL;
+}
+
+static void to_pool(void *o, enum object_t t)
+{
+ struct dept_pool *p = &pool[t];
+ struct llist_head *h;
+
+ preempt_disable();
+ if (likely(dept_per_cpu_ready))
+ h = this_cpu_ptr(p->lpool);
+ else
+ h = &p->boot_pool;
+
+ llist_add(o + p->node_off, h);
+ preempt_enable();
+}
+
+#define OBJECT(id, nr) \
+static void (*ctor_##id)(struct dept_##id *a); \
+static void (*dtor_##id)(struct dept_##id *a); \
+static inline struct dept_##id *new_##id(void) \
+{ \
+ struct dept_##id *a; \
+ \
+ a = (struct dept_##id *)from_pool(OBJECT_##id); \
+ if (unlikely(!a)) \
+ return NULL; \
+ \
+ atomic_set(&a->ref, 1); \
+ \
+ if (ctor_##id) \
+ ctor_##id(a); \
+ \
+ return a; \
+} \
+ \
+static inline struct dept_##id *get_##id(struct dept_##id *a) \
+{ \
+ atomic_inc(&a->ref); \
+ return a; \
+} \
+ \
+static inline void put_##id(struct dept_##id *a) \
+{ \
+ if (!atomic_dec_return(&a->ref)) { \
+ if (dtor_##id) \
+ dtor_##id(a); \
+ to_pool(a, OBJECT_##id); \
+ } \
+} \
+ \
+static inline void del_##id(struct dept_##id *a) \
+{ \
+ put_##id(a); \
+} \
+ \
+static inline bool id##_consumed(struct dept_##id *a) \
+{ \
+ return a && atomic_read(&a->ref) > 1; \
+}
+#include "dept_object.h"
+#undef OBJECT
+
+#define SET_CONSTRUCTOR(id, f) \
+static void (*ctor_##id)(struct dept_##id *a) = f
+
+static void initialize_dep(struct dept_dep *d)
+{
+ INIT_LIST_HEAD(&d->bfs_node);
+ INIT_LIST_HEAD(&d->dep_node);
+ INIT_LIST_HEAD(&d->dep_rev_node);
+}
+SET_CONSTRUCTOR(dep, initialize_dep);
+
+static void initialize_class(struct dept_class *c)
+{
+ int i;
+
+ for (i = 0; i < DEPT_IRQS_NR; i++) {
+ struct dept_iecxt *ie = &c->iecxt[i];
+ struct dept_iwait *iw = &c->iwait[i];
+
+ ie->ecxt = NULL;
+ ie->enirq = i;
+ ie->staled = false;
+
+ iw->wait = NULL;
+ iw->irq = i;
+ iw->staled = false;
+ iw->touched = false;
+ }
+ c->bfs_gen = 0U;
+
+ INIT_LIST_HEAD(&c->all_node);
+ INIT_LIST_HEAD(&c->dep_head);
+ INIT_LIST_HEAD(&c->dep_rev_head);
+}
+SET_CONSTRUCTOR(class, initialize_class);
+
+static void initialize_ecxt(struct dept_ecxt *e)
+{
+ int i;
+
+ for (i = 0; i < DEPT_IRQS_NR; i++) {
+ e->enirq_stack[i] = NULL;
+ e->enirq_ip[i] = 0UL;
+ }
+ e->ecxt_ip = 0UL;
+ e->ecxt_stack = NULL;
+ e->enirqf = 0UL;
+ e->event_ip = 0UL;
+ e->event_stack = NULL;
+}
+SET_CONSTRUCTOR(ecxt, initialize_ecxt);
+
+static void initialize_wait(struct dept_wait *w)
+{
+ int i;
+
+ for (i = 0; i < DEPT_IRQS_NR; i++) {
+ w->irq_stack[i] = NULL;
+ w->irq_ip[i] = 0UL;
+ }
+ w->wait_ip = 0UL;
+ w->wait_stack = NULL;
+ w->irqf = 0UL;
+}
+SET_CONSTRUCTOR(wait, initialize_wait);
+
+static void initialize_stack(struct dept_stack *s)
+{
+ s->nr = 0;
+}
+SET_CONSTRUCTOR(stack, initialize_stack);
+
+#define OBJECT(id, nr) \
+static void (*ctor_##id)(struct dept_##id *a);
+ #include "dept_object.h"
+#undef OBJECT
+
+#undef SET_CONSTRUCTOR
+
+#define SET_DESTRUCTOR(id, f) \
+static void (*dtor_##id)(struct dept_##id *a) = f
+
+static void destroy_dep(struct dept_dep *d)
+{
+ if (dep_e(d))
+ put_ecxt(dep_e(d));
+ if (dep_w(d))
+ put_wait(dep_w(d));
+}
+SET_DESTRUCTOR(dep, destroy_dep);
+
+static void destroy_ecxt(struct dept_ecxt *e)
+{
+ int i;
+
+ for (i = 0; i < DEPT_IRQS_NR; i++)
+ if (e->enirq_stack[i])
+ put_stack(e->enirq_stack[i]);
+ if (e->class)
+ put_class(e->class);
+ if (e->ecxt_stack)
+ put_stack(e->ecxt_stack);
+ if (e->event_stack)
+ put_stack(e->event_stack);
+}
+SET_DESTRUCTOR(ecxt, destroy_ecxt);
+
+static void destroy_wait(struct dept_wait *w)
+{
+ int i;
+
+ for (i = 0; i < DEPT_IRQS_NR; i++)
+ if (w->irq_stack[i])
+ put_stack(w->irq_stack[i]);
+ if (w->class)
+ put_class(w->class);
+ if (w->wait_stack)
+ put_stack(w->wait_stack);
+}
+SET_DESTRUCTOR(wait, destroy_wait);
+
+#define OBJECT(id, nr) \
+static void (*dtor_##id)(struct dept_##id *a);
+ #include "dept_object.h"
+#undef OBJECT
+
+#undef SET_DESTRUCTOR
+
+/*
+ * Caching and hashing
+ * =====================================================================
+ * DEPT makes use of caching and hashing to improve performance. Each
+ * object can be obtained in O(1) with its key.
+ *
+ * NOTE: Currently we assume all the objects in the hashes will never be
+ * removed. Implement it when needed.
+ */
+
+/*
+ * Some information might be lost but it's only for hashing key.
+ */
+static inline unsigned long mix(unsigned long a, unsigned long b)
+{
+ int halfbits = sizeof(unsigned long) * 8 / 2;
+ unsigned long halfmask = (1UL << halfbits) - 1UL;
+
+ return (a << halfbits) | (b & halfmask);
+}
+
+static bool cmp_dep(struct dept_dep *d1, struct dept_dep *d2)
+{
+ return dep_fc(d1)->key == dep_fc(d2)->key &&
+ dep_tc(d1)->key == dep_tc(d2)->key;
+}
+
+static unsigned long key_dep(struct dept_dep *d)
+{
+ return mix(dep_fc(d)->key, dep_tc(d)->key);
+}
+
+static bool cmp_class(struct dept_class *c1, struct dept_class *c2)
+{
+ return c1->key == c2->key;
+}
+
+static unsigned long key_class(struct dept_class *c)
+{
+ return c->key;
+}
+
+#define HASH(id, bits) \
+static struct hlist_head table_##id[1UL << bits]; \
+ \
+static inline struct hlist_head *head_##id(struct dept_##id *a) \
+{ \
+ return table_##id + hash_long(key_##id(a), bits); \
+} \
+ \
+static inline struct dept_##id *hash_lookup_##id(struct dept_##id *a) \
+{ \
+ struct dept_##id *b; \
+ \
+ hlist_for_each_entry_rcu(b, head_##id(a), hash_node) \
+ if (cmp_##id(a, b)) \
+ return b; \
+ return NULL; \
+} \
+ \
+static inline void hash_add_##id(struct dept_##id *a) \
+{ \
+ get_##id(a); \
+ hlist_add_head_rcu(&a->hash_node, head_##id(a)); \
+} \
+ \
+static inline void hash_del_##id(struct dept_##id *a) \
+{ \
+ hlist_del_rcu(&a->hash_node); \
+ put_##id(a); \
+}
+#include "dept_hash.h"
+#undef HASH
+
+static inline struct dept_dep *lookup_dep(struct dept_class *fc,
+ struct dept_class *tc)
+{
+ struct dept_ecxt onetime_e = { .class = fc };
+ struct dept_wait onetime_w = { .class = tc };
+ struct dept_dep onetime_d = { .ecxt = &onetime_e,
+ .wait = &onetime_w };
+ return hash_lookup_dep(&onetime_d);
+}
+
+static inline struct dept_class *lookup_class(unsigned long key)
+{
+ struct dept_class onetime_c = { .key = key };
+
+ return hash_lookup_class(&onetime_c);
+}
+
+/*
+ * Report
+ * =====================================================================
+ * DEPT prints useful information to help debugging when a problematic
+ * dependency is detected.
+ */
+
+static inline void print_ip_stack(unsigned long ip, struct dept_stack *s)
+{
+ if (ip)
+ print_ip_sym(KERN_WARNING, ip);
+
+ if (valid_stack(s)) {
+ pr_warn("stacktrace:\n");
+ stack_trace_print(s->raw, s->nr, 5);
+ }
+
+ if (!ip && !valid_stack(s))
+ pr_warn("(N/A)\n");
+}
+
+#define print_spc(spc, fmt, ...) \
+ pr_warn("%*c" fmt, (spc) * 4, ' ', ##__VA_ARGS__)
+
+static void print_diagram(struct dept_dep *d)
+{
+ struct dept_ecxt *e = dep_e(d);
+ struct dept_wait *w = dep_w(d);
+ struct dept_class *fc = dep_fc(d);
+ struct dept_class *tc = dep_tc(d);
+ unsigned long irqf;
+ int irq;
+ bool firstline = true;
+ int spc = 1;
+ const char *w_fn = w->wait_fn ?: "(unknown)";
+ const char *e_fn = e->event_fn ?: "(unknown)";
+ const char *c_fn = e->ecxt_fn ?: "(unknown)";
+ const char *fc_n = fc->sched_map ? "<sched>" : (fc->name ?: "(unknown)");
+ const char *tc_n = tc->sched_map ? "<sched>" : (tc->name ?: "(unknown)");
+
+ irqf = e->enirqf & w->irqf;
+ for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+ if (!firstline)
+ pr_warn("\nor\n\n");
+ firstline = false;
+
+ print_spc(spc, "[S] %s(%s:%d)\n", c_fn, fc_n, fc->sub_id);
+ print_spc(spc, " <%s interrupt>\n", irq_str(irq));
+ print_spc(spc + 1, "[W] %s(%s:%d)\n", w_fn, tc_n, tc->sub_id);
+ print_spc(spc, "[E] %s(%s:%d)\n", e_fn, fc_n, fc->sub_id);
+ }
+
+ if (!irqf) {
+ print_spc(spc, "[S] %s(%s:%d)\n", c_fn, fc_n, fc->sub_id);
+ print_spc(spc, "[W] %s(%s:%d)\n", w_fn, tc_n, tc->sub_id);
+ print_spc(spc, "[E] %s(%s:%d)\n", e_fn, fc_n, fc->sub_id);
+ }
+}
+
+static void print_dep(struct dept_dep *d)
+{
+ struct dept_ecxt *e = dep_e(d);
+ struct dept_wait *w = dep_w(d);
+ struct dept_class *fc = dep_fc(d);
+ struct dept_class *tc = dep_tc(d);
+ unsigned long irqf;
+ int irq;
+ const char *w_fn = w->wait_fn ?: "(unknown)";
+ const char *e_fn = e->event_fn ?: "(unknown)";
+ const char *c_fn = e->ecxt_fn ?: "(unknown)";
+ const char *fc_n = fc->sched_map ? "<sched>" : (fc->name ?: "(unknown)");
+ const char *tc_n = tc->sched_map ? "<sched>" : (tc->name ?: "(unknown)");
+
+ irqf = e->enirqf & w->irqf;
+ for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+ pr_warn("%s has been enabled:\n", irq_str(irq));
+ print_ip_stack(e->enirq_ip[irq], e->enirq_stack[irq]);
+ pr_warn("\n");
+
+ pr_warn("[S] %s(%s:%d):\n", c_fn, fc_n, fc->sub_id);
+ print_ip_stack(e->ecxt_ip, e->ecxt_stack);
+ pr_warn("\n");
+
+ pr_warn("[W] %s(%s:%d) in %s context:\n",
+ w_fn, tc_n, tc->sub_id, irq_str(irq));
+ print_ip_stack(w->irq_ip[irq], w->irq_stack[irq]);
+ pr_warn("\n");
+
+ pr_warn("[E] %s(%s:%d):\n", e_fn, fc_n, fc->sub_id);
+ print_ip_stack(e->event_ip, e->event_stack);
+ }
+
+ if (!irqf) {
+ pr_warn("[S] %s(%s:%d):\n", c_fn, fc_n, fc->sub_id);
+ print_ip_stack(e->ecxt_ip, e->ecxt_stack);
+ pr_warn("\n");
+
+ pr_warn("[W] %s(%s:%d):\n", w_fn, tc_n, tc->sub_id);
+ print_ip_stack(w->wait_ip, w->wait_stack);
+ pr_warn("\n");
+
+ pr_warn("[E] %s(%s:%d):\n", e_fn, fc_n, fc->sub_id);
+ print_ip_stack(e->event_ip, e->event_stack);
+ }
+}
+
+static void save_current_stack(int skip);
+
+/*
+ * Print all classes in a circle.
+ */
+static void print_circle(struct dept_class *c)
+{
+ struct dept_class *fc = c->bfs_parent;
+ struct dept_class *tc = c;
+ int i;
+
+ dept_outworld_enter();
+ save_current_stack(6);
+
+ pr_warn("===================================================\n");
+ pr_warn("DEPT: Circular dependency has been detected.\n");
+ pr_warn("%s %.*s %s\n", init_utsname()->release,
+ (int)strcspn(init_utsname()->version, " "),
+ init_utsname()->version,
+ print_tainted());
+ pr_warn("---------------------------------------------------\n");
+ pr_warn("summary\n");
+ pr_warn("---------------------------------------------------\n");
+
+ if (fc == tc)
+ pr_warn("*** AA DEADLOCK ***\n\n");
+ else
+ pr_warn("*** DEADLOCK ***\n\n");
+
+ i = 0;
+ do {
+ struct dept_dep *d = lookup_dep(fc, tc);
+
+ pr_warn("context %c\n", 'A' + (i++));
+ print_diagram(d);
+ if (fc != c)
+ pr_warn("\n");
+
+ tc = fc;
+ fc = fc->bfs_parent;
+ } while (tc != c);
+
+ pr_warn("\n");
+ pr_warn("[S]: start of the event context\n");
+ pr_warn("[W]: the wait blocked\n");
+ pr_warn("[E]: the event not reachable\n");
+
+ i = 0;
+ do {
+ struct dept_dep *d = lookup_dep(fc, tc);
+
+ pr_warn("---------------------------------------------------\n");
+ pr_warn("context %c's detail\n", 'A' + i);
+ pr_warn("---------------------------------------------------\n");
+ pr_warn("context %c\n", 'A' + (i++));
+ print_diagram(d);
+ pr_warn("\n");
+ print_dep(d);
+
+ tc = fc;
+ fc = fc->bfs_parent;
+ } while (tc != c);
+
+ pr_warn("---------------------------------------------------\n");
+ pr_warn("information that might be helpful\n");
+ pr_warn("---------------------------------------------------\n");
+ dump_stack();
+
+ dept_outworld_exit();
+}
+
+/*
+ * BFS(Breadth First Search)
+ * =====================================================================
+ * Whenever a new dependency is added into the graph, search the graph
+ * for a new circular dependency.
+ */
+
+static inline void enqueue(struct list_head *h, struct dept_dep *d)
+{
+ list_add_tail(&d->bfs_node, h);
+}
+
+static inline struct dept_dep *dequeue(struct list_head *h)
+{
+ struct dept_dep *d;
+
+ d = list_first_entry(h, struct dept_dep, bfs_node);
+ list_del(&d->bfs_node);
+ return d;
+}
+
+static inline bool empty(struct list_head *h)
+{
+ return list_empty(h);
+}
+
+static void extend_queue(struct list_head *h, struct dept_class *cur)
+{
+ struct dept_dep *d;
+
+ list_for_each_entry(d, &cur->dep_head, dep_node) {
+ struct dept_class *next = dep_tc(d);
+
+ if (cur->bfs_gen == next->bfs_gen)
+ continue;
+ next->bfs_gen = cur->bfs_gen;
+ next->bfs_dist = cur->bfs_dist + 1;
+ next->bfs_parent = cur;
+ enqueue(h, d);
+ }
+}
+
+static void extend_queue_rev(struct list_head *h, struct dept_class *cur)
+{
+ struct dept_dep *d;
+
+ list_for_each_entry(d, &cur->dep_rev_head, dep_rev_node) {
+ struct dept_class *next = dep_fc(d);
+
+ if (cur->bfs_gen == next->bfs_gen)
+ continue;
+ next->bfs_gen = cur->bfs_gen;
+ next->bfs_dist = cur->bfs_dist + 1;
+ next->bfs_parent = cur;
+ enqueue(h, d);
+ }
+}
+
+typedef enum bfs_ret bfs_f(struct dept_dep *d, void *in, void **out);
+static unsigned int bfs_gen;
+
+/*
+ * NOTE: Must be called with dept_lock held.
+ */
+static void bfs(struct dept_class *c, bfs_f *cb, void *in, void **out)
+{
+ LIST_HEAD(q);
+ enum bfs_ret ret;
+
+ if (DEPT_WARN_ON(!cb))
+ return;
+
+ /*
+ * Avoid zero bfs_gen.
+ */
+ bfs_gen = bfs_gen + 1 ?: 1;
+
+ c->bfs_gen = bfs_gen;
+ c->bfs_dist = 0;
+ c->bfs_parent = c;
+
+ ret = cb(NULL, in, out);
+ if (ret == BFS_DONE)
+ return;
+ if (ret == BFS_SKIP)
+ return;
+ if (ret == BFS_CONTINUE)
+ extend_queue(&q, c);
+ if (ret == BFS_CONTINUE_REV)
+ extend_queue_rev(&q, c);
+
+ while (!empty(&q)) {
+ struct dept_dep *d = dequeue(&q);
+
+ ret = cb(d, in, out);
+ if (ret == BFS_DONE)
+ break;
+ if (ret == BFS_SKIP)
+ continue;
+ if (ret == BFS_CONTINUE)
+ extend_queue(&q, dep_tc(d));
+ if (ret == BFS_CONTINUE_REV)
+ extend_queue_rev(&q, dep_fc(d));
+ }
+
+ while (!empty(&q))
+ dequeue(&q);
+}
+
+/*
+ * Main operations
+ * =====================================================================
+ * Add dependencies - Each new dependency is added into the graph and
+ * checked if it forms a circular dependency.
+ *
+ * Track waits - Waits are queued into the ring buffer for later use to
+ * generate appropriate dependencies with cross-event.
+ *
+ * Track event contexts(ecxt) - Event contexts are pushed into local
+ * stack for later use to generate appropriate dependencies with waits.
+ */
+
+static inline unsigned long cur_enirqf(void);
+static inline int cur_irq(void);
+static inline unsigned int cur_ctxt_id(void);
+
+static inline struct dept_iecxt *iecxt(struct dept_class *c, int irq)
+{
+ return &c->iecxt[irq];
+}
+
+static inline struct dept_iwait *iwait(struct dept_class *c, int irq)
+{
+ return &c->iwait[irq];
+}
+
+static inline void stale_iecxt(struct dept_iecxt *ie)
+{
+ if (ie->ecxt)
+ put_ecxt(ie->ecxt);
+
+ WRITE_ONCE(ie->ecxt, NULL);
+ WRITE_ONCE(ie->staled, true);
+}
+
+static inline void set_iecxt(struct dept_iecxt *ie, struct dept_ecxt *e)
+{
+ /*
+ * Once set, ->ecxt will never be updated until the class gets
+ * removed.
+ */
+ if (ie->ecxt)
+ DEPT_WARN_ON(1);
+ else
+ WRITE_ONCE(ie->ecxt, get_ecxt(e));
+}
+
+static inline void stale_iwait(struct dept_iwait *iw)
+{
+ if (iw->wait)
+ put_wait(iw->wait);
+
+ WRITE_ONCE(iw->wait, NULL);
+ WRITE_ONCE(iw->staled, true);
+}
+
+static inline void set_iwait(struct dept_iwait *iw, struct dept_wait *w)
+{
+ /*
+ * Once set, ->wait will never be updated until the class gets
+ * removed.
+ */
+ if (iw->wait)
+ DEPT_WARN_ON(1);
+ else
+ WRITE_ONCE(iw->wait, get_wait(w));
+
+ iw->touched = true;
+}
+
+static inline void touch_iwait(struct dept_iwait *iw)
+{
+ iw->touched = true;
+}
+
+static inline void untouch_iwait(struct dept_iwait *iw)
+{
+ iw->touched = false;
+}
+
+static inline struct dept_stack *get_current_stack(void)
+{
+ struct dept_stack *s = dept_task()->stack;
+
+ return s ? get_stack(s) : NULL;
+}
+
+static inline void prepare_current_stack(void)
+{
+ struct dept_stack *s = dept_task()->stack;
+
+ /*
+ * The dept_stack is already ready.
+ */
+ if (s && !stack_consumed(s)) {
+ s->nr = 0;
+ return;
+ }
+
+ if (s)
+ put_stack(s);
+
+ s = dept_task()->stack = new_stack();
+ if (!s)
+ return;
+
+ get_stack(s);
+ del_stack(s);
+}
+
+static void save_current_stack(int skip)
+{
+ struct dept_stack *s = dept_task()->stack;
+
+ if (!s)
+ return;
+ if (valid_stack(s))
+ return;
+
+ s->nr = stack_trace_save(s->raw, DEPT_MAX_STACK_ENTRY, skip);
+}
+
+static void finish_current_stack(void)
+{
+ struct dept_stack *s = dept_task()->stack;
+
+ if (stack_consumed(s))
+ save_current_stack(2);
+}
+
+/*
+ * FIXME: For now, disable LOCKDEP while DEPT is working.
+ *
+ * Both LOCKDEP and DEPT report a detected deadlock using printk,
+ * taking the risk of another deadlock that might be caused by the
+ * locks of console or printk, between inside and outside of them.
+ *
+ * For DEPT, it's no problem since multiple reports are allowed. But it
+ * would be a bad idea for LOCKDEP since it will stop even on a single
+ * report. So we need to prevent LOCKDEP from reporting the risk that
+ * DEPT would take when reporting something.
+ */
+#include <linux/lockdep.h>
+
+void dept_off(void)
+{
+ dept_task()->recursive++;
+ lockdep_off();
+}
+
+void dept_on(void)
+{
+ dept_task()->recursive--;
+ lockdep_on();
+}
+
+static inline unsigned long dept_enter(void)
+{
+ unsigned long flags;
+
+ flags = arch_local_irq_save();
+ dept_off();
+ prepare_current_stack();
+ return flags;
+}
+
+static inline void dept_exit(unsigned long flags)
+{
+ finish_current_stack();
+ dept_on();
+ arch_local_irq_restore(flags);
+}
+
+static inline unsigned long dept_enter_recursive(void)
+{
+ unsigned long flags;
+
+ flags = arch_local_irq_save();
+ return flags;
+}
+
+static inline void dept_exit_recursive(unsigned long flags)
+{
+ arch_local_irq_restore(flags);
+}
+
+/*
+ * NOTE: Must be called with dept_lock held.
+ */
+static struct dept_dep *__add_dep(struct dept_ecxt *e,
+ struct dept_wait *w)
+{
+ struct dept_dep *d;
+
+ if (DEPT_WARN_ON(!valid_class(e->class)))
+ return NULL;
+
+ if (DEPT_WARN_ON(!valid_class(w->class)))
+ return NULL;
+
+ if (lookup_dep(e->class, w->class))
+ return NULL;
+
+ d = new_dep();
+ if (unlikely(!d))
+ return NULL;
+
+ d->ecxt = get_ecxt(e);
+ d->wait = get_wait(w);
+
+ /*
+ * Add the dependency into hash and graph.
+ */
+ hash_add_dep(d);
+ list_add(&d->dep_node, &dep_fc(d)->dep_head);
+ list_add(&d->dep_rev_node, &dep_tc(d)->dep_rev_head);
+ return d;
+}
+
+static enum bfs_ret cb_check_dl(struct dept_dep *d,
+ void *in, void **out)
+{
+ struct dept_dep *new = (struct dept_dep *)in;
+
+ /*
+ * initial condition for this BFS search
+ */
+ if (!d) {
+ dep_tc(new)->bfs_parent = dep_fc(new);
+
+ if (dep_tc(new) != dep_fc(new))
+ return BFS_CONTINUE;
+
+ /*
+ * An AA circle does not make any additional deadlock. We
+ * don't have to continue this BFS search.
+ */
+ print_circle(dep_tc(new));
+ return BFS_DONE;
+ }
+
+ /*
+ * Allow multiple reports.
+ */
+ if (dep_tc(d) == dep_fc(new))
+ print_circle(dep_tc(new));
+
+ return BFS_CONTINUE;
+}
+
+/*
+ * This function is actually in charge of reporting.
+ */
+static inline void check_dl_bfs(struct dept_dep *d)
+{
+ bfs(dep_tc(d), cb_check_dl, (void *)d, NULL);
+}
+
+static enum bfs_ret cb_find_iw(struct dept_dep *d, void *in, void **out)
+{
+ int irq = *(int *)in;
+ struct dept_class *fc;
+ struct dept_iwait *iw;
+
+ if (DEPT_WARN_ON(!out))
+ return BFS_DONE;
+
+ /*
+ * initial condition for this BFS search
+ */
+ if (!d)
+ return BFS_CONTINUE_REV;
+
+ fc = dep_fc(d);
+ iw = iwait(fc, irq);
+
+ /*
+ * If any parent's ->wait was set, then the children would've
+ * been touched.
+ */
+ if (!iw->touched)
+ return BFS_SKIP;
+
+ if (!iw->wait)
+ return BFS_CONTINUE_REV;
+
+ *out = iw;
+ return BFS_DONE;
+}
+
+static struct dept_iwait *find_iw_bfs(struct dept_class *c, int irq)
+{
+ struct dept_iwait *iw = iwait(c, irq);
+ struct dept_iwait *found = NULL;
+
+ if (iw->wait)
+ return iw;
+
+ /*
+ * '->touched == false' guarantees there's no parent whose
+ * ->wait has been set.
+ */
+ if (!iw->touched)
+ return NULL;
+
+ bfs(c, cb_find_iw, (void *)&irq, (void **)&found);
+
+ if (found)
+ return found;
+
+ untouch_iwait(iw);
+ return NULL;
+}
+
+static enum bfs_ret cb_touch_iw_find_ie(struct dept_dep *d, void *in,
+ void **out)
+{
+ int irq = *(int *)in;
+ struct dept_class *tc;
+ struct dept_iecxt *ie;
+ struct dept_iwait *iw;
+
+ if (DEPT_WARN_ON(!out))
+ return BFS_DONE;
+
+ /*
+ * initial condition for this BFS search
+ */
+ if (!d)
+ return BFS_CONTINUE;
+
+ tc = dep_tc(d);
+ ie = iecxt(tc, irq);
+ iw = iwait(tc, irq);
+
+ touch_iwait(iw);
+
+ if (!ie->ecxt)
+ return BFS_CONTINUE;
+
+ if (!*out)
+ *out = ie;
+
+ return BFS_CONTINUE;
+}
+
+static struct dept_iecxt *touch_iw_find_ie_bfs(struct dept_class *c,
+ int irq)
+{
+ struct dept_iecxt *ie = iecxt(c, irq);
+ struct dept_iwait *iw = iwait(c, irq);
+ struct dept_iecxt *found = ie->ecxt ? ie : NULL;
+
+ touch_iwait(iw);
+ bfs(c, cb_touch_iw_find_ie, (void *)&irq, (void **)&found);
+ return found;
+}
+
+/*
+ * Should be called with dept_lock held.
+ */
+static void __add_idep(struct dept_iecxt *ie, struct dept_iwait *iw)
+{
+ struct dept_dep *new;
+
+ /*
+ * There's nothing to do.
+ */
+ if (!ie || !iw || !ie->ecxt || !iw->wait)
+ return;
+
+ new = __add_dep(ie->ecxt, iw->wait);
+
+ /*
+ * New dependency added. Let check_dl_bfs() report any deadlock.
+ */
+ if (new) {
+ check_dl_bfs(new);
+ stale_iecxt(ie);
+ stale_iwait(iw);
+ }
+
+ /*
+ * If !new, the dependency already exists or an object resource
+ * is lacking. Just let it go and get checked by other chances.
+ * Retrying is meaningless in that case.
+ */
+}
+
+static void set_check_iecxt(struct dept_class *c, int irq,
+ struct dept_ecxt *e)
+{
+ struct dept_iecxt *ie = iecxt(c, irq);
+
+ set_iecxt(ie, e);
+ __add_idep(ie, find_iw_bfs(c, irq));
+}
+
+static void set_check_iwait(struct dept_class *c, int irq,
+ struct dept_wait *w)
+{
+ struct dept_iwait *iw = iwait(c, irq);
+
+ set_iwait(iw, w);
+ __add_idep(touch_iw_find_ie_bfs(c, irq), iw);
+}
+
+static void add_iecxt(struct dept_class *c, int irq, struct dept_ecxt *e,
+ bool stack)
+{
+ /*
+ * This access is safe since we ensure e->class has been set locally.
+ */
+ struct dept_task *dt = dept_task();
+ struct dept_iecxt *ie = iecxt(c, irq);
+
+ if (DEPT_WARN_ON(!valid_class(c)))
+ return;
+
+ if (unlikely(READ_ONCE(ie->staled)))
+ return;
+
+ /*
+ * Skip add_iecxt() if ie->ecxt has been set at least once, which
+ * means it either has a valid ->ecxt or has been staled.
+ */
+ if (READ_ONCE(ie->ecxt))
+ return;
+
+ if (unlikely(!dept_lock()))
+ return;
+
+ if (unlikely(ie->staled))
+ goto unlock;
+ if (ie->ecxt)
+ goto unlock;
+
+ e->enirqf |= (1UL << irq);
+
+ /*
+ * Should be NULL since it's the first time that these
+ * enirq_{ip,stack}[irq] have ever been set.
+ */
+ DEPT_WARN_ON(e->enirq_ip[irq]);
+ DEPT_WARN_ON(e->enirq_stack[irq]);
+
+ e->enirq_ip[irq] = dt->enirq_ip[irq];
+ e->enirq_stack[irq] = stack ? get_current_stack() : NULL;
+
+ set_check_iecxt(c, irq, e);
+unlock:
+ dept_unlock();
+}
+
+static void add_iwait(struct dept_class *c, int irq, struct dept_wait *w)
+{
+ struct dept_iwait *iw = iwait(c, irq);
+
+ if (DEPT_WARN_ON(!valid_class(c)))
+ return;
+
+ if (unlikely(READ_ONCE(iw->staled)))
+ return;
+
+ /*
+ * Skip add_iwait() if iw->wait has been set at least once, which
+ * means it either has a valid ->wait or has been staled.
+ */
+ if (READ_ONCE(iw->wait))
+ return;
+
+ if (unlikely(!dept_lock()))
+ return;
+
+ if (unlikely(iw->staled))
+ goto unlock;
+ if (iw->wait)
+ goto unlock;
+
+ w->irqf |= (1UL << irq);
+
+ /*
+ * Should be NULL since it's the first time that these
+ * irq_{ip,stack}[irq] have ever been set.
+ */
+ DEPT_WARN_ON(w->irq_ip[irq]);
+ DEPT_WARN_ON(w->irq_stack[irq]);
+
+ w->irq_ip[irq] = w->wait_ip;
+ w->irq_stack[irq] = get_current_stack();
+
+ set_check_iwait(c, irq, w);
+unlock:
+ dept_unlock();
+}
+
+static inline struct dept_wait_hist *hist(int pos)
+{
+ struct dept_task *dt = dept_task();
+
+ return dt->wait_hist + (pos % DEPT_MAX_WAIT_HIST);
+}
+
+static inline int hist_pos_next(void)
+{
+ struct dept_task *dt = dept_task();
+
+ return dt->wait_hist_pos % DEPT_MAX_WAIT_HIST;
+}
+
+static inline void hist_advance(void)
+{
+ struct dept_task *dt = dept_task();
+
+ dt->wait_hist_pos++;
+ dt->wait_hist_pos %= DEPT_MAX_WAIT_HIST;
+}
+
+static inline struct dept_wait_hist *new_hist(void)
+{
+ struct dept_wait_hist *wh = hist(hist_pos_next());
+
+ hist_advance();
+ return wh;
+}
+
+static void add_hist(struct dept_wait *w, unsigned int wg, unsigned int ctxt_id)
+{
+ struct dept_wait_hist *wh = new_hist();
+
+ if (likely(wh->wait))
+ put_wait(wh->wait);
+
+ wh->wait = get_wait(w);
+ wh->wgen = wg;
+ wh->ctxt_id = ctxt_id;
+}
+
+/*
+ * Should be called after setting up e's iecxt and w's iwait.
+ */
+static void add_dep(struct dept_ecxt *e, struct dept_wait *w)
+{
+ struct dept_class *fc = e->class;
+ struct dept_class *tc = w->class;
+ struct dept_dep *d;
+ int i;
+
+ if (lookup_dep(fc, tc))
+ return;
+
+ if (unlikely(!dept_lock()))
+ return;
+
+ /*
+ * __add_dep() will lookup_dep() again with lock held.
+ */
+ d = __add_dep(e, w);
+ if (d) {
+ check_dl_bfs(d);
+
+ for (i = 0; i < DEPT_IRQS_NR; i++) {
+ struct dept_iwait *fiw = iwait(fc, i);
+ struct dept_iecxt *found_ie;
+ struct dept_iwait *found_iw;
+
+ /*
+ * '->touched == false' guarantees there's no
+ * parent whose ->wait has been set.
+ */
+ if (!fiw->touched)
+ continue;
+
+ /*
+ * find_iw_bfs() will untouch the iwait if
+ * not found.
+ */
+ found_iw = find_iw_bfs(fc, i);
+
+ if (!found_iw)
+ continue;
+
+ found_ie = touch_iw_find_ie_bfs(tc, i);
+ __add_idep(found_ie, found_iw);
+ }
+ }
+ dept_unlock();
+}
+
+static atomic_t wgen = ATOMIC_INIT(1);
+
+static void add_wait(struct dept_class *c, unsigned long ip,
+ const char *w_fn, int sub_l, bool sched_sleep)
+{
+ struct dept_task *dt = dept_task();
+ struct dept_wait *w;
+ unsigned int wg = 0U;
+ int irq;
+ int i;
+
+ if (DEPT_WARN_ON(!valid_class(c)))
+ return;
+
+ w = new_wait();
+ if (unlikely(!w))
+ return;
+
+ WRITE_ONCE(w->class, get_class(c));
+ w->wait_ip = ip;
+ w->wait_fn = w_fn;
+ w->wait_stack = get_current_stack();
+ w->sched_sleep = sched_sleep;
+
+ irq = cur_irq();
+ if (irq < DEPT_IRQS_NR)
+ add_iwait(c, irq, w);
+
+ /*
+ * Avoid adding a dependency between a user-aware nested ecxt
+ * and the wait.
+ */
+ for (i = dt->ecxt_held_pos - 1; i >= 0; i--) {
+ struct dept_ecxt_held *eh;
+
+ eh = dt->ecxt_held + i;
+
+ /*
+ * The case of an invalid-keyed one.
+ */
+ if (!eh->ecxt)
+ continue;
+
+ if (eh->ecxt->class != c || eh->sub_l == sub_l)
+ add_dep(eh->ecxt, w);
+ }
+
+ if (!wait_consumed(w) && !rich_stack) {
+ if (w->wait_stack)
+ put_stack(w->wait_stack);
+ w->wait_stack = NULL;
+ }
+
+ /*
+ * Avoid zero wgen.
+ */
+ wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
+ add_hist(w, wg, cur_ctxt_id());
+
+ del_wait(w);
+}
+
+static bool add_ecxt(struct dept_map *m, struct dept_class *c,
+ unsigned long ip, const char *c_fn,
+ const char *e_fn, int sub_l)
+{
+ struct dept_task *dt = dept_task();
+ struct dept_ecxt_held *eh;
+ struct dept_ecxt *e;
+ unsigned long irqf;
+ int irq;
+
+ if (DEPT_WARN_ON(!valid_class(c)))
+ return false;
+
+ if (DEPT_WARN_ON_ONCE(dt->ecxt_held_pos >= DEPT_MAX_ECXT_HELD))
+ return false;
+
+ if (m->nocheck) {
+ eh = dt->ecxt_held + (dt->ecxt_held_pos++);
+ eh->ecxt = NULL;
+ eh->map = m;
+ eh->class = get_class(c);
+ eh->wgen = atomic_read(&wgen);
+ eh->sub_l = sub_l;
+
+ return true;
+ }
+
+ e = new_ecxt();
+ if (unlikely(!e))
+ return false;
+
+ e->class = get_class(c);
+ e->ecxt_ip = ip;
+ e->ecxt_stack = ip && rich_stack ? get_current_stack() : NULL;
+ e->event_fn = e_fn;
+ e->ecxt_fn = c_fn;
+
+ eh = dt->ecxt_held + (dt->ecxt_held_pos++);
+ eh->ecxt = get_ecxt(e);
+ eh->map = m;
+ eh->class = get_class(c);
+ eh->wgen = atomic_read(&wgen);
+ eh->sub_l = sub_l;
+
+ irqf = cur_enirqf();
+ for_each_set_bit(irq, &irqf, DEPT_IRQS_NR)
+ add_iecxt(c, irq, e, false);
+
+ del_ecxt(e);
+ return true;
+}
+
+static int find_ecxt_pos(struct dept_map *m, struct dept_class *c,
+ bool newfirst)
+{
+ struct dept_task *dt = dept_task();
+ int i;
+
+ if (newfirst) {
+ for (i = dt->ecxt_held_pos - 1; i >= 0; i--) {
+ struct dept_ecxt_held *eh;
+
+ eh = dt->ecxt_held + i;
+ if (eh->map == m && eh->class == c)
+ return i;
+ }
+ } else {
+ for (i = 0; i < dt->ecxt_held_pos; i++) {
+ struct dept_ecxt_held *eh;
+
+ eh = dt->ecxt_held + i;
+ if (eh->map == m && eh->class == c)
+ return i;
+ }
+ }
+ return -1;
+}
+
+static bool pop_ecxt(struct dept_map *m, struct dept_class *c)
+{
+ struct dept_task *dt = dept_task();
+ int pos;
+ int i;
+
+ pos = find_ecxt_pos(m, c, true);
+ if (pos == -1)
+ return false;
+
+ if (dt->ecxt_held[pos].class)
+ put_class(dt->ecxt_held[pos].class);
+
+ if (dt->ecxt_held[pos].ecxt)
+ put_ecxt(dt->ecxt_held[pos].ecxt);
+
+ dt->ecxt_held_pos--;
+
+ for (i = pos; i < dt->ecxt_held_pos; i++)
+ dt->ecxt_held[i] = dt->ecxt_held[i + 1];
+ return true;
+}
+
+static inline bool good_hist(struct dept_wait_hist *wh, unsigned int wg)
+{
+ return wh->wait != NULL && before(wg, wh->wgen);
+}
+
+/*
+ * Binary-search the ring buffer for the earliest valid wait.
+ */
+static int find_hist_pos(unsigned int wg)
+{
+ int oldest;
+ int l;
+ int r;
+ int pos;
+
+ oldest = hist_pos_next();
+ if (unlikely(good_hist(hist(oldest), wg))) {
+ DEPT_INFO_ONCE("Need to expand the ring buffer.\n");
+ return oldest;
+ }
+
+ l = oldest + 1;
+ r = oldest + DEPT_MAX_WAIT_HIST - 1;
+ for (pos = (l + r) / 2; l <= r; pos = (l + r) / 2) {
+ struct dept_wait_hist *p = hist(pos - 1);
+ struct dept_wait_hist *wh = hist(pos);
+
+ if (!good_hist(p, wg) && good_hist(wh, wg))
+ return pos % DEPT_MAX_WAIT_HIST;
+ if (good_hist(wh, wg))
+ r = pos - 1;
+ else
+ l = pos + 1;
+ }
+ return -1;
+}
+
+static void do_event(struct dept_map *m, struct dept_class *c,
+ unsigned int wg, unsigned long ip)
+{
+ struct dept_task *dt = dept_task();
+ struct dept_wait_hist *wh;
+ struct dept_ecxt_held *eh;
+ unsigned int ctxt_id;
+ int end;
+ int pos;
+ int i;
+
+ if (DEPT_WARN_ON(!valid_class(c)))
+ return;
+
+ if (m->nocheck)
+ return;
+
+ /*
+ * The event was triggered before wait.
+ */
+ if (!wg)
+ return;
+
+ pos = find_ecxt_pos(m, c, false);
+ if (pos == -1)
+ return;
+
+ eh = dt->ecxt_held + pos;
+
+ if (DEPT_WARN_ON(!eh->ecxt))
+ return;
+
+ eh->ecxt->event_ip = ip;
+ eh->ecxt->event_stack = get_current_stack();
+
+ /*
+ * The ecxt already has done what it needs.
+ */
+ if (!before(wg, eh->wgen))
+ return;
+
+ pos = find_hist_pos(wg);
+ if (pos == -1)
+ return;
+
+ ctxt_id = cur_ctxt_id();
+ end = hist_pos_next();
+ end = end > pos ? end : end + DEPT_MAX_WAIT_HIST;
+ for (wh = hist(pos); pos < end; wh = hist(++pos)) {
+ if (after(wh->wgen, eh->wgen))
+ break;
+
+ if (dt->in_sched && wh->wait->sched_sleep)
+ continue;
+
+ if (wh->ctxt_id == ctxt_id)
+ add_dep(eh->ecxt, wh->wait);
+ }
+
+ for (i = 0; i < DEPT_IRQS_NR; i++) {
+ struct dept_ecxt *e;
+
+ if (before(dt->wgen_enirq[i], wg))
+ continue;
+
+ e = eh->ecxt;
+ add_iecxt(e->class, i, e, false);
+ }
+}
+
+static void del_dep_rcu(struct rcu_head *rh)
+{
+ struct dept_dep *d = container_of(rh, struct dept_dep, rh);
+
+ preempt_disable();
+ del_dep(d);
+ preempt_enable();
+}
+
+/*
+ * NOTE: Must be called with dept_lock held.
+ */
+static void disconnect_class(struct dept_class *c)
+{
+ struct dept_dep *d, *n;
+ int i;
+
+ list_for_each_entry_safe(d, n, &c->dep_head, dep_node) {
+ list_del_rcu(&d->dep_node);
+ list_del_rcu(&d->dep_rev_node);
+ hash_del_dep(d);
+ call_rcu(&d->rh, del_dep_rcu);
+ }
+
+ list_for_each_entry_safe(d, n, &c->dep_rev_head, dep_rev_node) {
+ list_del_rcu(&d->dep_node);
+ list_del_rcu(&d->dep_rev_node);
+ hash_del_dep(d);
+ call_rcu(&d->rh, del_dep_rcu);
+ }
+
+ for (i = 0; i < DEPT_IRQS_NR; i++) {
+ stale_iecxt(iecxt(c, i));
+ stale_iwait(iwait(c, i));
+ }
+}
+
+/*
+ * Context control
+ * =====================================================================
+ * Whether a wait is in {hard,soft}-IRQ context or whether
+ * {hard,soft}-IRQ has been enabled on the way to an event is very
+ * important to check dependency. All those things should be tracked.
+ */
+
+static inline unsigned long cur_enirqf(void)
+{
+ struct dept_task *dt = dept_task();
+ int he = dt->hardirqs_enabled;
+ int se = dt->softirqs_enabled;
+
+ if (he)
+ return DEPT_HIRQF | (se ? DEPT_SIRQF : 0UL);
+ return 0UL;
+}
+
+static inline int cur_irq(void)
+{
+ if (lockdep_softirq_context(current))
+ return DEPT_SIRQ;
+ if (lockdep_hardirq_context())
+ return DEPT_HIRQ;
+ return DEPT_IRQS_NR;
+}
+
+static inline unsigned int cur_ctxt_id(void)
+{
+ struct dept_task *dt = dept_task();
+ int irq = cur_irq();
+
+ /*
+ * Normal process context
+ */
+ if (irq == DEPT_IRQS_NR)
+ return 0U;
+
+ return dt->irq_id[irq] | (1UL << irq);
+}
+
+static void enirq_transition(int irq)
+{
+ struct dept_task *dt = dept_task();
+ int i;
+
+ /*
+ * If IRQ enabled has been observed on the way to an event and the
+ * wgen READ at that time >= the wgen of the event, the IRQ can cut
+ * in within the ecxt. Used for cross-event detection.
+ *
+ * wait context event context(ecxt)
+ * ------------ -------------------
+ * wait event
+ * WRITE wgen
+ * observe IRQ enabled
+ * READ wgen
+ * keep the wgen locally
+ *
+ * on the event
+ * check the local wgen
+ */
+ dt->wgen_enirq[irq] = atomic_read(&wgen);
+
+ for (i = dt->ecxt_held_pos - 1; i >= 0; i--) {
+ struct dept_ecxt_held *eh;
+ struct dept_ecxt *e;
+
+ eh = dt->ecxt_held + i;
+ e = eh->ecxt;
+ if (e)
+ add_iecxt(e->class, irq, e, true);
+ }
+}
+
+static void enirq_update(unsigned long ip)
+{
+ struct dept_task *dt = dept_task();
+ unsigned long irqf;
+ unsigned long prev;
+ int irq;
+
+ prev = dt->eff_enirqf;
+ irqf = cur_enirqf();
+ dt->eff_enirqf = irqf;
+
+ /*
+ * Do enirq_transition() only on an OFF -> ON transition.
+ */
+ for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+ if (prev & (1UL << irq))
+ continue;
+
+ dt->enirq_ip[irq] = ip;
+ enirq_transition(irq);
+ }
+}
+
+/*
+ * Ensure it has been called on ON/OFF transition.
+ */
+static void dept_enirq_transition(unsigned long ip)
+{
+ struct dept_task *dt = dept_task();
+ unsigned long flags;
+
+ if (unlikely(!dept_working()))
+ return;
+
+ /*
+ * An IRQ ON/OFF transition might happen while Dept is working.
+ * We cannot handle recursive entrance. Just ignore it.
+ * Only transitions outside of Dept will be considered.
+ */
+ if (dt->recursive)
+ return;
+
+ flags = dept_enter();
+
+ enirq_update(ip);
+
+ dept_exit(flags);
+}
+
+void dept_softirqs_on(unsigned long ip)
+{
+ /*
+ * Assumes that it's called with IRQ disabled so that accessing
+ * current's fields is not racy.
+ */
+ dept_task()->softirqs_enabled = true;
+ dept_enirq_transition(ip);
+}
+
+void dept_hardirqs_on(unsigned long ip)
+{
+ /*
+ * Assumes that it's called with IRQ disabled so that accessing
+ * current's fields is not racy.
+ */
+ dept_task()->hardirqs_enabled = true;
+ dept_enirq_transition(ip);
+}
+EXPORT_SYMBOL_GPL(dept_hardirqs_on);
+
+void dept_softirqs_off(unsigned long ip)
+{
+ /*
+ * Assumes that it's called with IRQ disabled so that accessing
+ * current's fields is not racy.
+ */
+ dept_task()->softirqs_enabled = false;
+ dept_enirq_transition(ip);
+}
+
+void dept_hardirqs_off(unsigned long ip)
+{
+ /*
+ * Assumes that it's called with IRQ disabled so that accessing
+ * current's fields is not racy.
+ */
+ dept_task()->hardirqs_enabled = false;
+ dept_enirq_transition(ip);
+}
+EXPORT_SYMBOL_GPL(dept_hardirqs_off);
+
+/*
+ * Ensure it's the outermost softirq context.
+ */
+void dept_softirq_enter(void)
+{
+ struct dept_task *dt = dept_task();
+
+ dt->irq_id[DEPT_SIRQ] += 1UL << DEPT_IRQS_NR;
+}
+
+/*
+ * Ensure it's the outermost hardirq context.
+ */
+void dept_hardirq_enter(void)
+{
+ struct dept_task *dt = dept_task();
+
+ dt->irq_id[DEPT_HIRQ] += 1UL << DEPT_IRQS_NR;
+}
+
+void dept_sched_enter(void)
+{
+ dept_task()->in_sched = true;
+}
+
+void dept_sched_exit(void)
+{
+ dept_task()->in_sched = false;
+}
+
+/*
+ * Exposed APIs
+ * =====================================================================
+ */
+
+static inline void clean_classes_cache(struct dept_key *k)
+{
+ int i;
+
+ for (i = 0; i < DEPT_MAX_SUBCLASSES_CACHE; i++) {
+ if (!READ_ONCE(k->classes[i]))
+ continue;
+
+ WRITE_ONCE(k->classes[i], NULL);
+ }
+}
+
+void dept_map_init(struct dept_map *m, struct dept_key *k, int sub_u,
+ const char *n)
+{
+ unsigned long flags;
+
+ if (unlikely(!dept_working())) {
+ m->nocheck = true;
+ return;
+ }
+
+ if (DEPT_WARN_ON(sub_u < 0)) {
+ m->nocheck = true;
+ return;
+ }
+
+ if (DEPT_WARN_ON(sub_u >= DEPT_MAX_SUBCLASSES_USR)) {
+ m->nocheck = true;
+ return;
+ }
+
+ /*
+ * Allow recursive entrance.
+ */
+ flags = dept_enter_recursive();
+
+ clean_classes_cache(&m->map_key);
+
+ m->keys = k;
+ m->sub_u = sub_u;
+ m->name = n;
+ m->wgen = 0U;
+ m->nocheck = !valid_key(k);
+
+ dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_map_init);
+
+void dept_map_reinit(struct dept_map *m, struct dept_key *k, int sub_u,
+ const char *n)
+{
+ unsigned long flags;
+
+ if (unlikely(!dept_working())) {
+ m->nocheck = true;
+ return;
+ }
+
+ /*
+ * Allow recursive entrance.
+ */
+ flags = dept_enter_recursive();
+
+ if (k) {
+ clean_classes_cache(&m->map_key);
+ m->keys = k;
+ m->nocheck = !valid_key(k);
+ }
+
+ if (sub_u >= 0 && sub_u < DEPT_MAX_SUBCLASSES_USR)
+ m->sub_u = sub_u;
+
+ if (n)
+ m->name = n;
+
+ m->wgen = 0U;
+
+ dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_map_reinit);
+
+void dept_map_copy(struct dept_map *to, struct dept_map *from)
+{
+ if (unlikely(!dept_working())) {
+ to->nocheck = true;
+ return;
+ }
+
+ *to = *from;
+
+ /*
+ * XXX: 'to' might be in a stack or something. Using the address
+ * in a stack segment as a key is meaningless. Just ignore the
+ * case for now.
+ */
+ if (!to->keys) {
+ to->nocheck = true;
+ return;
+ }
+
+ /*
+ * Since the class cache can be modified concurrently we could
+ * observe half pointers (64bit arch using 32bit copy insns).
+ * Therefore clear the caches and take the performance hit.
+ *
+ * XXX: Doesn't work well with lockdep_set_class_and_subclass()
+ * since that relies on cache abuse.
+ */
+ clean_classes_cache(&to->map_key);
+}
+
+static LIST_HEAD(classes);
+
+static inline bool within(const void *addr, void *start, unsigned long size)
+{
+ return addr >= start && addr < start + size;
+}
+
+void dept_free_range(void *start, unsigned int sz)
+{
+ struct dept_task *dt = dept_task();
+ struct dept_class *c, *n;
+ unsigned long flags;
+
+ if (unlikely(!dept_working()))
+ return;
+
+ if (dt->recursive) {
+ DEPT_STOP("Failed to successfully free Dept objects.\n");
+ return;
+ }
+
+ flags = dept_enter();
+
+ /*
+ * dept_free_range() should not fail.
+ *
+ * FIXME: Should be fixed if dept_free_range() causes deadlock
+ * with dept_lock().
+ */
+ while (unlikely(!dept_lock()))
+ cpu_relax();
+
+ list_for_each_entry_safe(c, n, &classes, all_node) {
+ if (!within((void *)c->key, start, sz) &&
+ !within(c->name, start, sz))
+ continue;
+
+ hash_del_class(c);
+ disconnect_class(c);
+ list_del(&c->all_node);
+ invalidate_class(c);
+
+ /*
+ * Actual deletion will happen on the rcu callback
+ * that has been added in disconnect_class().
+ */
+ del_class(c);
+ }
+ dept_unlock();
+ dept_exit(flags);
+
+ /*
+ * Wait until even lockless hash_lookup_class() for the class
+ * returns NULL.
+ */
+ might_sleep();
+ synchronize_rcu();
+}
+
+static inline int sub_id(struct dept_map *m, int e)
+{
+ return (m ? m->sub_u : 0) + e * DEPT_MAX_SUBCLASSES_USR;
+}
+
+static struct dept_class *check_new_class(struct dept_key *local,
+ struct dept_key *k, int sub_id,
+ const char *n, bool sched_map)
+{
+ struct dept_class *c = NULL;
+
+ if (DEPT_WARN_ON(sub_id >= DEPT_MAX_SUBCLASSES))
+ return NULL;
+
+ if (DEPT_WARN_ON(!k))
+ return NULL;
+
+ /*
+ * XXX: Assume that users stop using the map if any of the
+ * cached keys has been invalidated. Otherwise the cache,
+ * local->classes, should not be used because it would be racy
+ * with class deletion.
+ */
+ if (local && sub_id < DEPT_MAX_SUBCLASSES_CACHE)
+ c = READ_ONCE(local->classes[sub_id]);
+
+ if (c)
+ return c;
+
+ c = lookup_class((unsigned long)k->base + sub_id);
+ if (c)
+ goto caching;
+
+ if (unlikely(!dept_lock()))
+ return NULL;
+
+ c = lookup_class((unsigned long)k->base + sub_id);
+ if (unlikely(c))
+ goto unlock;
+
+ c = new_class();
+ if (unlikely(!c))
+ goto unlock;
+
+ c->name = n;
+ c->sched_map = sched_map;
+ c->sub_id = sub_id;
+ c->key = (unsigned long)(k->base + sub_id);
+ hash_add_class(c);
+ list_add(&c->all_node, &classes);
+unlock:
+ dept_unlock();
+caching:
+ if (local && sub_id < DEPT_MAX_SUBCLASSES_CACHE)
+ WRITE_ONCE(local->classes[sub_id], c);
+
+ return c;
+}
+
+/*
+ * Called between dept_enter() and dept_exit().
+ */
+static void __dept_wait(struct dept_map *m, unsigned long w_f,
+ unsigned long ip, const char *w_fn, int sub_l,
+ bool sched_sleep, bool sched_map)
+{
+ int e;
+
+ /*
+ * Be as conservative as possible. In case of multiple waits for
+ * a single dept_map, we are going to keep only the last wait's
+ * wgen for simplicity - keeping all wgens seems overengineering.
+ *
+ * Of course, it might miss some dependencies that would
+ * rarely, probably never, happen but it helps avoid false
+ * positive reports.
+ */
+ for_each_set_bit(e, &w_f, DEPT_MAX_SUBCLASSES_EVT) {
+ struct dept_class *c;
+ struct dept_key *k;
+
+ k = m->keys ?: &m->map_key;
+ c = check_new_class(&m->map_key, k,
+ sub_id(m, e), m->name, sched_map);
+ if (!c)
+ continue;
+
+ add_wait(c, ip, w_fn, sub_l, sched_sleep);
+ }
+}
+
+/*
+ * Called between dept_enter() and dept_exit().
+ */
+static void __dept_event(struct dept_map *m, unsigned long e_f,
+ unsigned long ip, const char *e_fn,
+ bool sched_map)
+{
+ struct dept_class *c;
+ struct dept_key *k;
+ int e;
+
+ e = find_first_bit(&e_f, DEPT_MAX_SUBCLASSES_EVT);
+
+ if (DEPT_WARN_ON(e >= DEPT_MAX_SUBCLASSES_EVT))
+ goto exit;
+
+ /*
+ * An event is an event. If the caller passed more than a single
+ * event, warn about it and handle the event corresponding to
+ * the first bit anyway.
+ */
+ DEPT_WARN_ON(1UL << e != e_f);
+
+ k = m->keys ?: &m->map_key;
+ c = check_new_class(&m->map_key, k, sub_id(m, e), m->name, sched_map);
+
+ if (c && add_ecxt(m, c, 0UL, NULL, e_fn, 0)) {
+ do_event(m, c, READ_ONCE(m->wgen), ip);
+ pop_ecxt(m, c);
+ }
+exit:
+ /*
+ * Keep the map disabled until the next sleep.
+ */
+ WRITE_ONCE(m->wgen, 0U);
+}
+
+void dept_wait(struct dept_map *m, unsigned long w_f,
+ unsigned long ip, const char *w_fn, int sub_l)
+{
+ struct dept_task *dt = dept_task();
+ unsigned long flags;
+
+ if (unlikely(!dept_working()))
+ return;
+
+ if (dt->recursive)
+ return;
+
+ if (m->nocheck)
+ return;
+
+ flags = dept_enter();
+
+ __dept_wait(m, w_f, ip, w_fn, sub_l, false, false);
+
+ dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_wait);
+
+void dept_stage_wait(struct dept_map *m, struct dept_key *k,
+ unsigned long ip, const char *w_fn, bool strong)
+{
+ struct dept_task *dt = dept_task();
+ unsigned long flags;
+
+ if (unlikely(!dept_working()))
+ return;
+
+ if (m && m->nocheck)
+ return;
+
+ /*
+ * Either m or k should be passed, which means Dept relies on
+ * either its own map or the caller's position in the code when
+ * determining its class.
+ */
+ if (DEPT_WARN_ON(!m && !k))
+ return;
+
+ /*
+ * Allow recursive entrance.
+ */
+ flags = dept_enter_recursive();
+
+ arch_spin_lock(&stage_spin);
+
+ if (!strong && dt->stage_m.keys)
+ goto unlock;
+
+ if (m) {
+ dt->stage_m = *m;
+
+ /*
+ * Ensure dt->stage_m.keys != NULL and it works with the
+ * map's map_key, not stage_m's one when ->keys == NULL.
+ */
+ if (!m->keys)
+ dt->stage_m.keys = &m->map_key;
+ } else {
+ dt->stage_m.name = w_fn;
+ dt->stage_sched_map = true;
+ }
+
+ /*
+ * dept_map_reinit() includes WRITE_ONCE(->wgen, 0U) that
+ * effectively disables the map just in case real sleep won't
+ * happen. dept_request_event_wait_commit() will enable it.
+ */
+ dept_map_reinit(&dt->stage_m, k, -1, NULL);
+
+ dt->stage_w_fn = w_fn;
+ dt->stage_ip = ip;
+unlock:
+ arch_spin_unlock(&stage_spin);
+
+ dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_stage_wait);
+
+void dept_clean_stage(void)
+{
+ struct dept_task *dt = dept_task();
+ unsigned long flags;
+
+ if (unlikely(!dept_working()))
+ return;
+
+ /*
+ * Allow recursive entrance.
+ */
+ flags = dept_enter_recursive();
+
+ arch_spin_lock(&stage_spin);
+ memset(&dt->stage_m, 0x0, sizeof(struct dept_map));
+ dt->stage_sched_map = false;
+ dt->stage_w_fn = NULL;
+ dt->stage_ip = 0UL;
+ arch_spin_unlock(&stage_spin);
+
+ dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_clean_stage);
+
+/*
+ * Always called from __schedule().
+ */
+void dept_request_event_wait_commit(void)
+{
+ struct dept_task *dt = dept_task();
+ unsigned long flags;
+ unsigned int wg;
+ unsigned long ip;
+ const char *w_fn;
+ bool sched_map;
+
+ if (unlikely(!dept_working()))
+ return;
+
+ /*
+ * __schedule() cannot be called while Dept is working because
+ * Dept disables IRQs at its entrance.
+ */
+ if (DEPT_WARN_ON(dt->recursive))
+ return;
+
+ flags = dept_enter();
+
+ /*
+ * Checks if current has staged a wait.
+ */
+ if (!dt->stage_m.keys)
+ goto exit;
+
+ w_fn = dt->stage_w_fn;
+ ip = dt->stage_ip;
+ sched_map = dt->stage_sched_map;
+
+ /*
+ * Avoid zero wgen.
+ */
+ wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
+ WRITE_ONCE(dt->stage_m.wgen, wg);
+
+ __dept_wait(&dt->stage_m, 1UL, ip, w_fn, 0, true, sched_map);
+exit:
+ dept_exit(flags);
+}
+
+/*
+ * Always called from try_to_wake_up().
+ */
+void dept_stage_event(struct task_struct *t, unsigned long ip)
+{
+ struct dept_task *dt = dept_task();
+ unsigned long flags;
+ struct dept_map m;
+ bool sched_map;
+
+ if (unlikely(!dept_working()))
+ return;
+
+ if (dt->recursive)
+ return;
+
+ flags = dept_enter();
+
+ arch_spin_lock(&stage_spin);
+ m = dt->stage_m;
+ sched_map = dt->stage_sched_map;
+ arch_spin_unlock(&stage_spin);
+
+ /*
+ * ->stage_m.keys should not be NULL if it's in use. Should
+ * make sure that it's not NULL when staging a valid map.
+ */
+ if (!m.keys)
+ goto exit;
+
+ __dept_event(&m, 1UL, ip, "try_to_wake_up", sched_map);
+exit:
+ dept_exit(flags);
+}
+
+/*
+ * Modifies the latest ecxt corresponding to m and e_f.
+ */
+void dept_map_ecxt_modify(struct dept_map *m, unsigned long e_f,
+ struct dept_key *new_k, unsigned long new_e_f,
+ unsigned long new_ip, const char *new_c_fn,
+ const char *new_e_fn, int new_sub_l)
+{
+ struct dept_task *dt = dept_task();
+ struct dept_ecxt_held *eh;
+ struct dept_class *c;
+ struct dept_key *k;
+ unsigned long flags;
+ int pos = -1;
+ int new_e;
+ int e;
+
+ if (unlikely(!dept_working()))
+ return;
+
+ /*
+ * XXX: Couldn't handle re-entrance cases. Ignore it for now.
+ */
+ if (dt->recursive)
+ return;
+
+ /*
+ * Should go ahead no matter whether ->nocheck == true or not
+ * because ->nocheck value can be changed within the ecxt area
+ * delimited by dept_ecxt_enter() and dept_ecxt_exit().
+ */
+
+ flags = dept_enter();
+
+ for_each_set_bit(e, &e_f, DEPT_MAX_SUBCLASSES_EVT) {
+ k = m->keys ?: &m->map_key;
+ c = check_new_class(&m->map_key, k,
+ sub_id(m, e), m->name, false);
+ if (!c)
+ continue;
+
+ /*
+ * Done as soon as an ecxt is found for any event in e_f.
+ */
+ pos = find_ecxt_pos(m, c, true);
+ if (pos != -1)
+ break;
+ }
+
+ if (unlikely(pos == -1))
+ goto exit;
+
+ eh = dt->ecxt_held + pos;
+ new_sub_l = new_sub_l >= 0 ? new_sub_l : eh->sub_l;
+
+ new_e = find_first_bit(&new_e_f, DEPT_MAX_SUBCLASSES_EVT);
+
+ if (new_e < DEPT_MAX_SUBCLASSES_EVT)
+ /*
+ * Let it work with the first bit anyway.
+ */
+ DEPT_WARN_ON(1UL << new_e != new_e_f);
+ else
+ new_e = e;
+
+ pop_ecxt(m, c);
+
+ /*
+ * Apply the key to the map.
+ */
+ if (new_k)
+ dept_map_reinit(m, new_k, -1, NULL);
+
+ k = m->keys ?: &m->map_key;
+ c = check_new_class(&m->map_key, k, sub_id(m, new_e), m->name, false);
+
+ if (c && add_ecxt(m, c, new_ip, new_c_fn, new_e_fn, new_sub_l))
+ goto exit;
+
+ /*
+ * Successfully pop_ecxt()ed but failed to add_ecxt().
+ */
+ dt->missing_ecxt++;
+exit:
+ dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_map_ecxt_modify);
+
+void dept_ecxt_enter(struct dept_map *m, unsigned long e_f, unsigned long ip,
+ const char *c_fn, const char *e_fn, int sub_l)
+{
+ struct dept_task *dt = dept_task();
+ unsigned long flags;
+ struct dept_class *c;
+ struct dept_key *k;
+ int e;
+
+ if (unlikely(!dept_working()))
+ return;
+
+ if (dt->recursive) {
+ dt->missing_ecxt++;
+ return;
+ }
+
+ /*
+ * Should go ahead no matter whether ->nocheck == true or not
+ * because ->nocheck value can be changed within the ecxt area
+ * delimited by dept_ecxt_enter() and dept_ecxt_exit().
+ */
+
+ flags = dept_enter();
+
+ e = find_first_bit(&e_f, DEPT_MAX_SUBCLASSES_EVT);
+
+ if (e >= DEPT_MAX_SUBCLASSES_EVT)
+ goto missing_ecxt;
+
+ /*
+ * An event is an event. If the caller passed more than a single
+ * event, warn about it and handle the event corresponding to
+ * the first bit anyway.
+ */
+ DEPT_WARN_ON(1UL << e != e_f);
+
+ k = m->keys ?: &m->map_key;
+ c = check_new_class(&m->map_key, k, sub_id(m, e), m->name, false);
+
+ if (c && add_ecxt(m, c, ip, c_fn, e_fn, sub_l))
+ goto exit;
+missing_ecxt:
+ dt->missing_ecxt++;
+exit:
+ dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_ecxt_enter);
+
+bool dept_ecxt_holding(struct dept_map *m, unsigned long e_f)
+{
+ struct dept_task *dt = dept_task();
+ unsigned long flags;
+ bool ret = false;
+ int e;
+
+ if (unlikely(!dept_working()))
+ return false;
+
+ if (dt->recursive)
+ return false;
+
+ flags = dept_enter();
+
+ for_each_set_bit(e, &e_f, DEPT_MAX_SUBCLASSES_EVT) {
+ struct dept_class *c;
+ struct dept_key *k;
+
+ k = m->keys ?: &m->map_key;
+ c = check_new_class(&m->map_key, k,
+ sub_id(m, e), m->name, false);
+ if (!c)
+ continue;
+
+ if (find_ecxt_pos(m, c, true) != -1) {
+ ret = true;
+ break;
+ }
+ }
+
+ dept_exit(flags);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(dept_ecxt_holding);
+
+void dept_request_event(struct dept_map *m)
+{
+ unsigned long flags;
+ unsigned int wg;
+
+ if (unlikely(!dept_working()))
+ return;
+
+ if (m->nocheck)
+ return;
+
+ /*
+ * Allow recursive entrance.
+ */
+ flags = dept_enter_recursive();
+
+ /*
+ * Avoid zero wgen.
+ */
+ wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
+ WRITE_ONCE(m->wgen, wg);
+
+ dept_exit_recursive(flags);
+}
+EXPORT_SYMBOL_GPL(dept_request_event);
+
+void dept_event(struct dept_map *m, unsigned long e_f,
+ unsigned long ip, const char *e_fn)
+{
+ struct dept_task *dt = dept_task();
+ unsigned long flags;
+
+ if (unlikely(!dept_working()))
+ return;
+
+ if (dt->recursive) {
+ /*
+ * Dept cannot handle this event even though an event
+ * context has been requested. To avoid confusing it when
+ * handling the event, disable the map until the next request.
+ */
+ WRITE_ONCE(m->wgen, 0U);
+ return;
+ }
+
+ if (m->nocheck)
+ return;
+
+ flags = dept_enter();
+
+ __dept_event(m, e_f, ip, e_fn, false);
+
+ dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_event);
+
+void dept_ecxt_exit(struct dept_map *m, unsigned long e_f,
+ unsigned long ip)
+{
+ struct dept_task *dt = dept_task();
+ unsigned long flags;
+ int e;
+
+ if (unlikely(!dept_working()))
+ return;
+
+ if (dt->recursive) {
+ dt->missing_ecxt--;
+ return;
+ }
+
+ /*
+ * Should go ahead no matter whether ->nocheck == true or not
+ * because ->nocheck value can be changed within the ecxt area
+ * delimited by dept_ecxt_enter() and dept_ecxt_exit().
+ */
+
+ flags = dept_enter();
+
+ for_each_set_bit(e, &e_f, DEPT_MAX_SUBCLASSES_EVT) {
+ struct dept_class *c;
+ struct dept_key *k;
+
+ k = m->keys ?: &m->map_key;
+ c = check_new_class(&m->map_key, k,
+ sub_id(m, e), m->name, false);
+ if (!c)
+ continue;
+
+ /*
+ * Done as soon as an ecxt is found for any event in e_f.
+ */
+ if (pop_ecxt(m, c))
+ goto exit;
+ }
+
+ dt->missing_ecxt--;
+exit:
+ dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_ecxt_exit);
+
+void dept_task_exit(struct task_struct *t)
+{
+ struct dept_task *dt = &t->dept_task;
+ int i;
+
+ if (unlikely(!dept_working()))
+ return;
+
+ raw_local_irq_disable();
+
+ if (dt->stack)
+ put_stack(dt->stack);
+
+ for (i = 0; i < dt->ecxt_held_pos; i++) {
+ if (dt->ecxt_held[i].class)
+ put_class(dt->ecxt_held[i].class);
+ if (dt->ecxt_held[i].ecxt)
+ put_ecxt(dt->ecxt_held[i].ecxt);
+ }
+
+ for (i = 0; i < DEPT_MAX_WAIT_HIST; i++)
+ if (dt->wait_hist[i].wait)
+ put_wait(dt->wait_hist[i].wait);
+
+ dt->task_exit = true;
+ dept_off();
+
+ raw_local_irq_enable();
+}
+
+void dept_task_init(struct task_struct *t)
+{
+ memset(&t->dept_task, 0x0, sizeof(struct dept_task));
+}
+
+void dept_key_init(struct dept_key *k)
+{
+ struct dept_task *dt = dept_task();
+ unsigned long flags;
+ int sub_id;
+
+ if (unlikely(!dept_working()))
+ return;
+
+ if (dt->recursive) {
+ DEPT_STOP("Key initialization fails.\n");
+ return;
+ }
+
+ flags = dept_enter();
+
+ clean_classes_cache(k);
+
+ /*
+ * dept_key_init() should not fail.
+ *
+ * FIXME: Should be fixed if dept_key_init() causes deadlock
+ * with dept_lock().
+ */
+ while (unlikely(!dept_lock()))
+ cpu_relax();
+
+ for (sub_id = 0; sub_id < DEPT_MAX_SUBCLASSES; sub_id++) {
+ struct dept_class *c;
+
+ c = lookup_class((unsigned long)k->base + sub_id);
+ if (!c)
+ continue;
+
+ DEPT_STOP("The class(%s/%d) has not been removed.\n",
+ c->name, sub_id);
+ break;
+ }
+
+ dept_unlock();
+ dept_exit(flags);
+}
+EXPORT_SYMBOL_GPL(dept_key_init);
+
+void dept_key_destroy(struct dept_key *k)
+{
+ struct dept_task *dt = dept_task();
+ unsigned long flags;
+ int sub_id;
+
+ if (unlikely(!dept_working()))
+ return;
+
+ if (dt->recursive == 1 && dt->task_exit) {
+ /*
+ * Need to allow to go ahead in this case where
+ * ->recursive has been set to 1 by dept_off() in
+ * dept_task_exit() and ->task_exit has been set to
+ * true in dept_task_exit().
+ */
+ } else if (dt->recursive) {
+ DEPT_STOP("Key destroying fails.\n");
+ return;
+ }
+
+ flags = dept_enter();
+
+ /*
+ * dept_key_destroy() should not fail.
+ *
+ * FIXME: Should be fixed if dept_key_destroy() causes deadlock
+ * with dept_lock().
+ */
+ while (unlikely(!dept_lock()))
+ cpu_relax();
+
+ for (sub_id = 0; sub_id < DEPT_MAX_SUBCLASSES; sub_id++) {
+ struct dept_class *c;
+
+ c = lookup_class((unsigned long)k->base + sub_id);
+ if (!c)
+ continue;
+
+ hash_del_class(c);
+ disconnect_class(c);
+ list_del(&c->all_node);
+ invalidate_class(c);
+
+ /*
+ * Actual deletion will happen on the rcu callback
+ * that has been added in disconnect_class().
+ */
+ del_class(c);
+ }
+
+ dept_unlock();
+ dept_exit(flags);
+
+ /*
+ * Wait until even lockless hash_lookup_class() for the class
+ * returns NULL.
+ */
+ might_sleep();
+ synchronize_rcu();
+}
+EXPORT_SYMBOL_GPL(dept_key_destroy);
+
+static void move_llist(struct llist_head *to, struct llist_head *from)
+{
+ struct llist_node *first = llist_del_all(from);
+ struct llist_node *last;
+
+ if (!first)
+ return;
+
+ for (last = first; last->next; last = last->next);
+ llist_add_batch(first, last, to);
+}
+
+static void migrate_per_cpu_pool(void)
+{
+ const int boot_cpu = 0;
+ int i;
+
+ /*
+ * The boot CPU has been using the temporary local pool so far.
+ * Now that the per_cpu areas are ready, use the per_cpu local
+ * pool instead.
+ */
+ DEPT_WARN_ON(smp_processor_id() != boot_cpu);
+ for (i = 0; i < OBJECT_NR; i++) {
+ struct llist_head *from;
+ struct llist_head *to;
+
+ from = &pool[i].boot_pool;
+ to = per_cpu_ptr(pool[i].lpool, boot_cpu);
+ move_llist(to, from);
+ }
+}
+
+#define B2KB(B) ((B) / 1024)
+
+/*
+ * Should be called after setup_per_cpu_areas() and before any
+ * non-boot CPUs have been brought up.
+ */
+void __init dept_init(void)
+{
+ size_t mem_total = 0;
+
+ local_irq_disable();
+ dept_per_cpu_ready = 1;
+ migrate_per_cpu_pool();
+ local_irq_enable();
+
+#define OBJECT(id, nr) mem_total += sizeof(struct dept_##id) * nr;
+ #include "dept_object.h"
+#undef OBJECT
+#define HASH(id, bits) mem_total += sizeof(struct hlist_head) * (1UL << bits);
+ #include "dept_hash.h"
+#undef HASH
+
+ pr_info("DEPendency Tracker: Copyright (c) 2020 LG Electronics, Inc., Byungchul Park\n");
+ pr_info("... DEPT_MAX_STACK_ENTRY: %d\n", DEPT_MAX_STACK_ENTRY);
+ pr_info("... DEPT_MAX_WAIT_HIST : %d\n", DEPT_MAX_WAIT_HIST);
+ pr_info("... DEPT_MAX_ECXT_HELD : %d\n", DEPT_MAX_ECXT_HELD);
+ pr_info("... DEPT_MAX_SUBCLASSES : %d\n", DEPT_MAX_SUBCLASSES);
+#define OBJECT(id, nr) \
+ pr_info("... memory used by %s: %zu KB\n", \
+ #id, B2KB(sizeof(struct dept_##id) * nr));
+ #include "dept_object.h"
+#undef OBJECT
+#define HASH(id, bits) \
+ pr_info("... hash list head used by %s: %zu KB\n", \
+ #id, B2KB(sizeof(struct hlist_head) * (1UL << bits)));
+ #include "dept_hash.h"
+#undef HASH
+ pr_info("... total memory used by objects and hashs: %zu KB\n", B2KB(mem_total));
+ pr_info("... per task memory footprint: %zu bytes\n", sizeof(struct dept_task));
+}
diff --git a/kernel/dependency/dept_hash.h b/kernel/dependency/dept_hash.h
new file mode 100644
index 00000000..fd85aab
--- /dev/null
+++ b/kernel/dependency/dept_hash.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * HASH(id, bits)
+ *
+ * id : Id for the object of struct dept_##id.
+ * bits: 1UL << bits is the hash table size.
+ */
+
+HASH(dep, 12)
+HASH(class, 12)
diff --git a/kernel/dependency/dept_object.h b/kernel/dependency/dept_object.h
new file mode 100644
index 00000000..0b7eb16
--- /dev/null
+++ b/kernel/dependency/dept_object.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * OBJECT(id, nr)
+ *
+ * id: Id for the object of struct dept_##id.
+ * nr: # of the object that should be kept in the pool.
+ */
+
+OBJECT(dep, 1024 * 8)
+OBJECT(class, 1024 * 8)
+OBJECT(stack, 1024 * 32)
+OBJECT(ecxt, 1024 * 16)
+OBJECT(wait, 1024 * 32)
diff --git a/kernel/exit.c b/kernel/exit.c
index 15dc2ec..0f48752 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -916,6 +916,7 @@ void __noreturn do_exit(long code)
exit_tasks_rcu_finish();

lockdep_free_task(tsk);
+ dept_task_exit(tsk);
do_task_dead();
}

diff --git a/kernel/fork.c b/kernel/fork.c
index 9f7fe35..1d33fc3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -97,6 +97,7 @@
#include <linux/io_uring.h>
#include <linux/bpf.h>
#include <linux/stackprotector.h>
+#include <linux/dept.h>

#include <asm/pgalloc.h>
#include <linux/uaccess.h>
@@ -2219,6 +2220,7 @@ static __latent_entropy struct task_struct *copy_process(
#ifdef CONFIG_LOCKDEP
lockdep_init_task(p);
#endif
+ dept_task_init(p);

#ifdef CONFIG_DEBUG_MUTEXES
p->blocked_on = NULL; /* not blocked yet */
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 48568a0..2882ea2 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -1194,6 +1194,7 @@ static void free_module(struct module *mod)

/* Free lock-classes; relies on the preceding sync_rcu(). */
lockdep_free_key_range(mod->data_layout.base, mod->data_layout.size);
+ dept_free_range(mod->data_layout.base, mod->data_layout.size);

/* Finally, free the core (containing the module structure) */
module_memfree(mod->core_layout.base);
@@ -2893,6 +2894,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
free_module:
/* Free lock-classes; relies on the preceding sync_rcu() */
lockdep_free_key_range(mod->data_layout.base, mod->data_layout.size);
+ dept_free_range(mod->data_layout.base, mod->data_layout.size);

module_deallocate(mod, info);
free_copy:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 25b582b..0dc066c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -64,6 +64,7 @@
#include <linux/vtime.h>
#include <linux/wait_api.h>
#include <linux/workqueue_api.h>
+#include <linux/dept.h>

#ifdef CONFIG_PREEMPT_DYNAMIC
# ifdef CONFIG_GENERIC_ENTRY
@@ -4070,6 +4071,7 @@ bool ttwu_state_match(struct task_struct *p, unsigned int state, int *success)
int cpu, success = 0;

preempt_disable();
+ dept_stage_event(p, _RET_IP_);
if (p == current) {
/*
* We're waking current, this means 'p->on_rq' and 'task_cpu(p)
@@ -6446,6 +6448,12 @@ static void __sched notrace __schedule(unsigned int sched_mode)
rq = cpu_rq(cpu);
prev = rq->curr;

+ prev_state = READ_ONCE(prev->__state);
+ if (sched_mode != SM_PREEMPT && prev_state & TASK_NORMAL)
+ dept_request_event_wait_commit();
+
+ dept_sched_enter();
+
schedule_debug(prev, !!sched_mode);

if (sched_feat(HRTICK) || sched_feat(HRTICK_DL))
@@ -6560,6 +6568,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
__balance_callbacks(rq);
raw_spin_rq_unlock_irq(rq);
}
+ dept_sched_exit();
}

void __noreturn do_task_dead(void)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 881c3f8..611fd01 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1255,6 +1255,33 @@ config DEBUG_PREEMPT

menu "Lock Debugging (spinlocks, mutexes, etc...)"

+config DEPT
+ bool "Dependency tracking (EXPERIMENTAL)"
+ depends on DEBUG_KERNEL && LOCK_DEBUGGING_SUPPORT
+ select DEBUG_SPINLOCK
+ select DEBUG_MUTEXES
+ select DEBUG_RT_MUTEXES if RT_MUTEXES
+ select DEBUG_RWSEMS
+ select DEBUG_WW_MUTEX_SLOWPATH
+ select DEBUG_LOCK_ALLOC
+ select TRACE_IRQFLAGS
+ select STACKTRACE
+ select FRAME_POINTER if !MIPS && !PPC && !ARM && !S390 && !MICROBLAZE && !ARC && !X86
+ select KALLSYMS
+ select KALLSYMS_ALL
+ select PROVE_LOCKING
+ default n
+ help
+ Check dependencies between waits and events and report a
+ deadlock possibility when one is detected. Multiple reports
+ are allowed if there is more than a single problem.
+
+ This feature is considered EXPERIMENTAL and might produce
+ false positive reports because it starts tracking
+ dependencies that have never been tracked before. To
+ mitigate the impact of false positives, multiple reporting
+ is supported.
+
config LOCK_DEBUGGING_SUPPORT
bool
depends on TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 8d24279..cd89138 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -1398,6 +1398,8 @@ static void reset_locks(void)
local_irq_disable();
lockdep_free_key_range(&ww_lockdep.acquire_key, 1);
lockdep_free_key_range(&ww_lockdep.mutex_key, 1);
+ dept_free_range(&ww_lockdep.acquire_key, 1);
+ dept_free_range(&ww_lockdep.mutex_key, 1);

I1(A); I1(B); I1(C); I1(D);
I1(X1); I1(X2); I1(Y1); I1(Y2); I1(Z1); I1(Z2);
--
1.9.1
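
For readers following the "Avoid zero wgen" step used by dept_request_event() and dept_request_event_wait_commit() in the patch above, here is a minimal user-space sketch of the same idea. It is not the kernel code: it uses C11 atomics instead of the kernel's atomic_t helpers, and the names wgen_sketch and next_wgen_sketch() are made up for illustration. The only point is that 0 is reserved to mean "map disabled", so a counter that wraps to 0 is bumped once more.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for Dept's global wait generation counter. */
static atomic_uint wgen_sketch;

/*
 * Return the next generation id, never 0. Zero is reserved to mean
 * "map disabled", so if the counter wraps to 0, take one more step.
 */
static unsigned int next_wgen_sketch(void)
{
	/* atomic_fetch_add() returns the old value; add 1 for the new one. */
	unsigned int wg = atomic_fetch_add(&wgen_sketch, 1U) + 1U;

	if (wg == 0U)
		wg = atomic_fetch_add(&wgen_sketch, 1U) + 1U;

	/* Holds as long as fewer than 2^32 callers race between the steps. */
	assert(wg != 0U);
	return wg;
}
```

A caller would store the returned value into the map's wgen field, as the patch does with WRITE_ONCE(m->wgen, wg), and write 0 back once the event has been handled.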

2023-01-09 04:07:30

by Byungchul Park

Subject: [PATCH RFC v7 13/23] dept: Distinguish each work from another

Workqueue already provides concurrency control. With that control
enabled, a wait in one work does not prevent events in other works.
Thus, each work is better considered a different context.

So let Dept assign a different context id to each work.

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/dept.h | 2 ++
kernel/dependency/dept.c | 10 ++++++++++
kernel/workqueue.c | 3 +++
3 files changed, 15 insertions(+)

diff --git a/include/linux/dept.h b/include/linux/dept.h
index 777c837..625c645 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -515,6 +515,7 @@ struct dept_task {
extern void dept_sched_enter(void);
extern void dept_sched_exit(void);
extern void dept_kernel_enter(void);
+extern void dept_work_enter(void);

static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
{
@@ -565,6 +566,7 @@ static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
#define dept_sched_enter() do { } while (0)
#define dept_sched_exit() do { } while (0)
#define dept_kernel_enter() do { } while (0)
+#define dept_work_enter() do { } while (0)
#define dept_ecxt_enter_nokeep(m) do { } while (0)
#define dept_key_init(k) do { (void)(k); } while (0)
#define dept_key_destroy(k) do { (void)(k); } while (0)
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index e98617b..2f215c2 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -1954,6 +1954,16 @@ void dept_hardirqs_off(unsigned long ip)
}
EXPORT_SYMBOL_GPL(dept_hardirqs_off);

+/*
+ * Assign a different context id to each work.
+ */
+void dept_work_enter(void)
+{
+ struct dept_task *dt = dept_task();
+
+ dt->cxt_id[DEPT_CXT_PROCESS] += 1UL << DEPT_CXTS_NR;
+}
+
void dept_kernel_enter(void)
{
struct dept_task *dt = dept_task();
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 07895de..69c4f46 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -51,6 +51,7 @@
#include <linux/sched/isolation.h>
#include <linux/nmi.h>
#include <linux/kvm_para.h>
+#include <linux/dept.h>

#include "workqueue_internal.h"

@@ -2199,6 +2200,8 @@ static void process_one_work(struct worker *worker, struct work_struct *work)

lockdep_copy_map(&lockdep_map, &work->lockdep_map);
#endif
+ dept_work_enter();
+
/* ensure we're on the correct CPU */
WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
raw_smp_processor_id() != pool->cpu);
--
1.9.1
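
To illustrate what dept_work_enter() in the patch above does to the context id, here is a small stand-alone sketch. The constant values and the struct are assumptions made up for the example (the real definitions live in include/linux/dept.h and are not shown in this mail); only the arithmetic, adding 1UL << DEPT_CXTS_NR per work, is taken from the patch.

```c
/* Hypothetical values; the real ones are defined in include/linux/dept.h. */
#define DEPT_CXT_PROCESS_SKETCH	2
#define DEPT_CXTS_NR_SKETCH	3

/* Hypothetical, cut-down stand-in for struct dept_task. */
struct dept_task_sketch {
	unsigned long cxt_id[DEPT_CXTS_NR_SKETCH];
};

/*
 * Give the next work item a fresh process context id. The step is
 * 1UL << DEPT_CXTS_NR_SKETCH, so the low DEPT_CXTS_NR_SKETCH bits of
 * the id are left untouched by the increment.
 */
static void work_enter_sketch(struct dept_task_sketch *dt)
{
	dt->cxt_id[DEPT_CXT_PROCESS_SKETCH] += 1UL << DEPT_CXTS_NR_SKETCH;
}
```

With these assumed values, two consecutive works run by the same worker see ids that differ by 8, so a wait recorded in the first work and an event in the second would not be compared as if they belonged to one context.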

2023-01-09 04:07:40

by Byungchul Park

Subject: [PATCH RFC v7 08/23] dept: Apply sdt_might_sleep_strong() to PG_{locked,writeback} wait

Make Dept able to track dependencies created by PG_{locked,writeback} waits.

Signed-off-by: Byungchul Park <[email protected]>
---
mm/filemap.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index c4d4ace..b013a5b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -42,6 +42,7 @@
#include <linux/ramfs.h>
#include <linux/page_idle.h>
#include <linux/migrate.h>
+#include <linux/dept_sdt.h>
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include "internal.h"
@@ -1215,6 +1216,9 @@ static inline bool folio_trylock_flag(struct folio *folio, int bit_nr,
/* How many times do we accept lock stealing from under a waiter? */
int sysctl_page_lock_unfairness = 5;

+static struct dept_map __maybe_unused PG_locked_map = DEPT_MAP_INITIALIZER(PG_locked_map, NULL);
+static struct dept_map __maybe_unused PG_writeback_map = DEPT_MAP_INITIALIZER(PG_writeback_map, NULL);
+
static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
int state, enum behavior behavior)
{
@@ -1226,6 +1230,11 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
unsigned long pflags;
bool in_thrashing;

+ if (bit_nr == PG_locked)
+ sdt_might_sleep_strong(&PG_locked_map);
+ else if (bit_nr == PG_writeback)
+ sdt_might_sleep_strong(&PG_writeback_map);
+
if (bit_nr == PG_locked &&
!folio_test_uptodate(folio) && folio_test_workingset(folio)) {
delayacct_thrashing_start(&in_thrashing);
@@ -1327,6 +1336,8 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
*/
finish_wait(q, wait);

+ sdt_might_sleep_finish();
+
if (thrashing) {
delayacct_thrashing_end(&in_thrashing);
psi_memstall_leave(&pflags);
--
1.9.1

2023-01-09 04:07:48

by Byungchul Park

Subject: [PATCH RFC v7 19/23] dept: Apply timeout consideration to swait

Now that CONFIG_DEPT_AGGRESSIVE_TIMEOUT_WAIT has been introduced, apply
the consideration to swait, assuming the input 'ret' of the
___swait_event() macro is used as a timeout value.

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/swait.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/swait.h b/include/linux/swait.h
index 1304209..339e5f2 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -162,7 +162,10 @@ static inline bool swq_has_sleeper(struct swait_queue_head *wq)
struct swait_queue __wait; \
long __ret = ret; \
\
- sdt_might_sleep_weak(NULL); \
+ if (!__ret || __ret == MAX_SCHEDULE_TIMEOUT) \
+ sdt_might_sleep_weak(NULL); \
+ else \
+ sdt_might_sleep_weak_timeout(NULL); \
INIT_LIST_HEAD(&__wait.task_list); \
for (;;) { \
long __int = prepare_to_swait_event(&wq, &__wait, state);\
--
1.9.1

2023-01-09 04:07:52

by Byungchul Park

Subject: [PATCH RFC v7 20/23] dept: Apply timeout consideration to waitqueue wait

Now that CONFIG_DEPT_AGGRESSIVE_TIMEOUT_WAIT has been introduced, apply
the consideration to waitqueue wait, assuming the input 'ret' of the
___wait_event() macro is used as a timeout value.

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/wait.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index ede466c..87888ee 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -304,7 +304,10 @@ static inline void wake_up_pollfree(struct wait_queue_head *wq_head)
struct wait_queue_entry __wq_entry; \
long __ret = ret; /* explicit shadow */ \
\
- sdt_might_sleep_weak(NULL); \
+ if (!__ret || __ret == MAX_SCHEDULE_TIMEOUT) \
+ sdt_might_sleep_weak(NULL); \
+ else \
+ sdt_might_sleep_weak_timeout(NULL); \
init_wait_entry(&__wq_entry, exclusive ? WQ_FLAG_EXCLUSIVE : 0); \
for (;;) { \
long __int = prepare_to_wait_event(&wq_head, &__wq_entry, state);\
--
1.9.1
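
The test added to ___swait_event() and ___wait_event() in patches 19 and 20 above can be read as a small predicate on 'ret'. The sketch below restates it in user space. The helper name wait_is_timed_sketch() is made up for illustration; MAX_SCHEDULE_TIMEOUT is defined as LONG_MAX here, which matches the kernel's definition.

```c
#include <limits.h>
#include <stdbool.h>

#define MAX_SCHEDULE_TIMEOUT	LONG_MAX

/*
 * Mirror of the condition in the patches: 'ret' of 0 or
 * MAX_SCHEDULE_TIMEOUT selects sdt_might_sleep_weak(), anything else
 * selects sdt_might_sleep_weak_timeout(). Return true for the latter.
 */
static bool wait_is_timed_sketch(long ret)
{
	return ret != 0 && ret != MAX_SCHEDULE_TIMEOUT;
}
```

So a plain wait_event(), which passes 0, and a wait given MAX_SCHEDULE_TIMEOUT are both treated as untimed, while a wait given, say, 100 jiffies goes through the timeout variant.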

2023-01-09 04:07:52

by Byungchul Park

Subject: [PATCH RFC v7 16/23] dept: Apply sdt_might_sleep_strong() to dma fence wait

Make Dept able to track dma fence waits.

Signed-off-by: Byungchul Park <[email protected]>
---
drivers/dma-buf/dma-fence.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 406b4e2..dd190cf 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -16,6 +16,7 @@
#include <linux/dma-fence.h>
#include <linux/sched/signal.h>
#include <linux/seq_file.h>
+#include <linux/dept_sdt.h>

#define CREATE_TRACE_POINTS
#include <trace/events/dma_fence.h>
@@ -782,6 +783,7 @@ struct default_wait_cb {
cb.task = current;
list_add(&cb.base.node, &fence->cb_list);

+ sdt_might_sleep_strong(NULL);
while (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) && ret > 0) {
if (intr)
__set_current_state(TASK_INTERRUPTIBLE);
@@ -795,6 +797,7 @@ struct default_wait_cb {
if (ret > 0 && intr && signal_pending(current))
ret = -ERESTARTSYS;
}
+ sdt_might_sleep_finish();

if (!list_empty(&cb.base.node))
list_del(&cb.base.node);
@@ -884,6 +887,7 @@ struct default_wait_cb {
}
}

+ sdt_might_sleep_strong(NULL);
while (ret > 0) {
if (intr)
set_current_state(TASK_INTERRUPTIBLE);
@@ -898,6 +902,7 @@ struct default_wait_cb {
if (ret > 0 && intr && signal_pending(current))
ret = -ERESTARTSYS;
}
+ sdt_might_sleep_finish();

__set_current_state(TASK_RUNNING);

--
1.9.1

2023-01-09 04:07:52

by Byungchul Park

Subject: [PATCH RFC v7 14/23] dept: Add a mechanism to refill the internal memory pools on running out

The Dept engine works in a constrained environment. For example, Dept
cannot make use of dynamic allocation, e.g. kmalloc(). So Dept has been
using static pools to keep the memory chunks it uses.

However, Dept can barely work once any of the pools runs out. So
implement a mechanism that refills a pool when it runs out, using irq
work and a workqueue, which fit the constrained environment.
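
The refill flow can be sketched in userspace roughly as below. This is
only a model under assumptions: struct pool_model, pool_get() and
pool_refill() are hypothetical names, malloc() stands in for vmalloc(),
a plain flag stands in for queueing the irq work, and the locking done
with dept_pool_spin is left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model of one pool; not the kernel's struct dept_pool. */
struct pool_model {
	size_t obj_sz;        /* size of one object in bytes */
	int tot_nr;           /* number of objects per chunk */
	int obj_nr;           /* objects left in the active chunk */
	char *spool;          /* active chunk that objects are handed out from */
	char *rpool;          /* reserved chunk, NULL once it has been consumed */
	int refill_requested; /* stands in for irq_work_queue() in this model */
};

/* Hand out one object; switch to the reserved chunk when the active one is empty. */
void *pool_get(struct pool_model *p)
{
	if (!p->obj_nr) {
		p->spool = p->rpool;
		p->obj_nr = p->rpool ? p->tot_nr : 0;
		p->rpool = NULL;
		p->refill_requested = 1;
	}

	if (!p->obj_nr)
		return NULL;

	p->obj_nr--;
	return p->spool + (size_t)p->obj_nr * p->obj_sz;
}

/* Deferred part: runs later, in a context where allocation is allowed. */
int pool_refill(struct pool_model *p)
{
	p->refill_requested = 0;

	if (p->rpool)
		return 0;

	p->rpool = malloc((size_t)p->tot_nr * p->obj_sz);
	return p->rpool ? 0 : -1;
}
```

The point of the split is that pool_get() never allocates. It only
swaps pointers and records that a refill is wanted, so it stays usable
from the contexts where Dept itself runs.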

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/dept.h | 19 ++++++--
kernel/dependency/dept.c | 104 ++++++++++++++++++++++++++++++++++------
kernel/dependency/dept_object.h | 10 ++--
kernel/dependency/dept_proc.c | 8 ++--
4 files changed, 112 insertions(+), 29 deletions(-)

diff --git a/include/linux/dept.h b/include/linux/dept.h
index 625c645..21ecefc 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -336,9 +336,19 @@ struct dept_pool {
size_t obj_sz;

/*
- * the number of the static array
+ * the remaining number of the object in spool
*/
- atomic_t obj_nr;
+ int obj_nr;
+
+ /*
+ * the number of the object in spool
+ */
+ int tot_nr;
+
+ /*
+ * accumulated amount of memory used by the object in byte
+ */
+ atomic_t acc_sz;

/*
* offset of ->pool_node
@@ -348,9 +358,10 @@ struct dept_pool {
/*
* pointer to the pool
*/
- void *spool;
+ void *spool; /* static pool */
+ void *rpool; /* reserved pool */
struct llist_head boot_pool;
- struct llist_head __percpu *lpool;
+ struct llist_head __percpu *lpool; /* local pool */
};

struct dept_ecxt_held {
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 2f215c2..11d4f75 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -73,6 +73,9 @@
#include <linux/hash.h>
#include <linux/dept.h>
#include <linux/utsname.h>
+#include <linux/workqueue.h>
+#include <linux/irq_work.h>
+#include <linux/vmalloc.h>
#include "dept_internal.h"

static int dept_stop;
@@ -121,10 +124,12 @@
WARN(1, "DEPT_STOP: " s); \
})

-#define DEPT_INFO_ONCE(s...) pr_warn_once("DEPT_INFO_ONCE: " s)
+#define DEPT_INFO_ONCE(s...) pr_warn_once("DEPT_INFO_ONCE: " s)
+#define DEPT_INFO(s...) pr_warn("DEPT_INFO: " s)

static arch_spinlock_t dept_spin = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
static arch_spinlock_t stage_spin = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+static arch_spinlock_t dept_pool_spin = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;

/*
* DEPT internal engine should be careful in using outside functions
@@ -263,6 +268,7 @@ static inline bool valid_key(struct dept_key *k)

#define OBJECT(id, nr) \
static struct dept_##id spool_##id[nr]; \
+static struct dept_##id rpool_##id[nr]; \
static DEFINE_PER_CPU(struct llist_head, lpool_##id);
#include "dept_object.h"
#undef OBJECT
@@ -271,14 +277,70 @@ struct dept_pool dept_pool[OBJECT_NR] = {
#define OBJECT(id, nr) { \
.name = #id, \
.obj_sz = sizeof(struct dept_##id), \
- .obj_nr = ATOMIC_INIT(nr), \
+ .obj_nr = nr, \
+ .tot_nr = nr, \
+ .acc_sz = ATOMIC_INIT(sizeof(spool_##id) + sizeof(rpool_##id)), \
.node_off = offsetof(struct dept_##id, pool_node), \
.spool = spool_##id, \
+ .rpool = rpool_##id, \
.lpool = &lpool_##id, },
#include "dept_object.h"
#undef OBJECT
};

+static void dept_wq_work_fn(struct work_struct *work)
+{
+ int i;
+
+ for (i = 0; i < OBJECT_NR; i++) {
+ struct dept_pool *p = dept_pool + i;
+ int sz = p->tot_nr * p->obj_sz;
+ void *rpool;
+ bool need;
+
+ arch_spin_lock(&dept_pool_spin);
+ need = !p->rpool;
+ arch_spin_unlock(&dept_pool_spin);
+
+ if (!need)
+ continue;
+
+ rpool = vmalloc(sz);
+
+ if (!rpool) {
+ DEPT_STOP("Failed to extend internal resources.\n");
+ break;
+ }
+
+ arch_spin_lock(&dept_pool_spin);
+ if (!p->rpool) {
+ p->rpool = rpool;
+ rpool = NULL;
+ atomic_add(sz, &p->acc_sz);
+ }
+ arch_spin_unlock(&dept_pool_spin);
+
+ if (rpool)
+ vfree(rpool);
+ else
+ DEPT_INFO("Dept object(%s) just got refilled successfully.\n", p->name);
+ }
+}
+
+static DECLARE_WORK(dept_wq_work, dept_wq_work_fn);
+
+static void dept_irq_work_fn(struct irq_work *w)
+{
+ schedule_work(&dept_wq_work);
+}
+
+static DEFINE_IRQ_WORK(dept_irq_work, dept_irq_work_fn);
+
+static void request_rpool_refill(void)
+{
+ irq_work_queue(&dept_irq_work);
+}
+
/*
* Can use llist no matter whether CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG is
* enabled or not because NMI and other contexts in the same CPU never
@@ -314,19 +376,31 @@ static void *from_pool(enum object_t t)
/*
* Try static pool.
*/
- if (atomic_read(&p->obj_nr) > 0) {
- int idx = atomic_dec_return(&p->obj_nr);
+ arch_spin_lock(&dept_pool_spin);
+
+ if (!p->obj_nr) {
+ p->spool = p->rpool;
+ p->obj_nr = p->rpool ? p->tot_nr : 0;
+ p->rpool = NULL;
+ request_rpool_refill();
+ }
+
+ if (p->obj_nr) {
+ void *ret;
+
+ p->obj_nr--;
+ ret = p->spool + (p->obj_nr * p->obj_sz);
+ arch_spin_unlock(&dept_pool_spin);

- if (idx >= 0)
- return p->spool + (idx * p->obj_sz);
+ return ret;
}
+ arch_spin_unlock(&dept_pool_spin);

- DEPT_INFO_ONCE("---------------------------------------------\n"
- " Some of Dept internal resources are run out.\n"
- " Dept might still work if the resources get freed.\n"
- " However, the chances are Dept will suffer from\n"
- " the lack from now. Needs to extend the internal\n"
- " resource pools. Ask [email protected]\n");
+ DEPT_INFO("------------------------------------------\n"
+ " Dept object(%s) is run out.\n"
+ " Dept is trying to refill the object.\n"
+ " Nevertheless, if it fails, Dept will stop.\n",
+ p->name);
return NULL;
}

@@ -2971,8 +3045,8 @@ void __init dept_init(void)
pr_info("... DEPT_MAX_ECXT_HELD : %d\n", DEPT_MAX_ECXT_HELD);
pr_info("... DEPT_MAX_SUBCLASSES : %d\n", DEPT_MAX_SUBCLASSES);
#define OBJECT(id, nr) \
- pr_info("... memory used by %s: %zu KB\n", \
- #id, B2KB(sizeof(struct dept_##id) * nr));
+ pr_info("... memory initially used by %s: %zu KB\n", \
+ #id, B2KB(sizeof(spool_##id) + sizeof(rpool_##id)));
#include "dept_object.h"
#undef OBJECT
#define HASH(id, bits) \
@@ -2980,6 +3054,6 @@ void __init dept_init(void)
#id, B2KB(sizeof(struct hlist_head) * (1UL << bits)));
#include "dept_hash.h"
#undef HASH
- pr_info("... total memory used by objects and hashs: %zu KB\n", B2KB(mem_total));
+ pr_info("... total memory initially used by objects and hashs: %zu KB\n", B2KB(mem_total));
pr_info("... per task memory footprint: %zu bytes\n", sizeof(struct dept_task));
}
diff --git a/kernel/dependency/dept_object.h b/kernel/dependency/dept_object.h
index 0b7eb16..4f936ad 100644
--- a/kernel/dependency/dept_object.h
+++ b/kernel/dependency/dept_object.h
@@ -6,8 +6,8 @@
* nr: # of the object that should be kept in the pool.
*/

-OBJECT(dep, 1024 * 8)
-OBJECT(class, 1024 * 8)
-OBJECT(stack, 1024 * 32)
-OBJECT(ecxt, 1024 * 16)
-OBJECT(wait, 1024 * 32)
+OBJECT(dep, 1024 * 4 * 2)
+OBJECT(class, 1024 * 4)
+OBJECT(stack, 1024 * 4 * 8)
+OBJECT(ecxt, 1024 * 4 * 2)
+OBJECT(wait, 1024 * 4 * 4)
diff --git a/kernel/dependency/dept_proc.c b/kernel/dependency/dept_proc.c
index 7d61dfb..f07a512 100644
--- a/kernel/dependency/dept_proc.c
+++ b/kernel/dependency/dept_proc.c
@@ -73,12 +73,10 @@ static int dept_stats_show(struct seq_file *m, void *v)
{
int r;

- seq_puts(m, "Availability in the static pools:\n\n");
+ seq_puts(m, "Accumulated amount of memory used by pools:\n\n");
#define OBJECT(id, nr) \
- r = atomic_read(&dept_pool[OBJECT_##id].obj_nr); \
- if (r < 0) \
- r = 0; \
- seq_printf(m, "%s\t%d/%d(%d%%)\n", #id, r, nr, (r * 100) / (nr));
+ r = atomic_read(&dept_pool[OBJECT_##id].acc_sz); \
+ seq_printf(m, "%s\t%d KB\n", #id, r / 1024);
#include "dept_object.h"
#undef OBJECT

--
1.9.1

2023-01-09 04:07:52

by Byungchul Park

Subject: [PATCH RFC v7 04/23] dept: Add lock dependency tracker APIs

Wrap the base APIs for easier annotation of typical locks.
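
One way to read the event flags used by ldt_lock(), ldt_rlock() and
ldt_wlock() is sketched below. The LDT_EVT_* values are the ones from
this patch; enum ldt_acq and the three helper functions are
hypothetical and exist only for this illustration. The sketch assumes
that a waiter depends on a holder when the flags it waits on intersect
the flags of the holder's event context.

```c
#define LDT_EVT_L  1UL
#define LDT_EVT_R  2UL
#define LDT_EVT_W  4UL
#define LDT_EVT_RW (LDT_EVT_R | LDT_EVT_W)

/* Hypothetical tag for the three acquisition flavours wrapped by the macros. */
enum ldt_acq { LDT_ACQ_LOCK, LDT_ACQ_READ, LDT_ACQ_WRITE };

/* Flags passed to dept_wait() by the non-try variant of each macro. */
unsigned long ldt_wait_mask(enum ldt_acq a)
{
	switch (a) {
	case LDT_ACQ_READ:
		return LDT_EVT_W;	/* a reader waits for write_unlock only */
	case LDT_ACQ_WRITE:
		return LDT_EVT_RW;	/* a writer waits for both kinds of unlock */
	default:
		return LDT_EVT_L;
	}
}

/* Flags passed to dept_ecxt_enter() by each macro. */
unsigned long ldt_event_mask(enum ldt_acq a)
{
	switch (a) {
	case LDT_ACQ_READ:
		return LDT_EVT_R;
	case LDT_ACQ_WRITE:
		return LDT_EVT_W;
	default:
		return LDT_EVT_L;
	}
}

/* Non-zero if the waiter is waiting for an event the holder will trigger. */
int ldt_depends_on(enum ldt_acq waiter, enum ldt_acq holder)
{
	return (ldt_wait_mask(waiter) & ldt_event_mask(holder)) != 0;
}
```

Under that reading, a reader does not depend on another reader, while
every other reader/writer combination does.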

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/dept_ldt.h | 77 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 77 insertions(+)
create mode 100644 include/linux/dept_ldt.h

diff --git a/include/linux/dept_ldt.h b/include/linux/dept_ldt.h
new file mode 100644
index 00000000..b8c00bc
--- /dev/null
+++ b/include/linux/dept_ldt.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Lock Dependency Tracker
+ *
+ * Started by Byungchul Park <[email protected]>:
+ *
+ * Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ */
+
+#ifndef __LINUX_DEPT_LDT_H
+#define __LINUX_DEPT_LDT_H
+
+#include <linux/dept.h>
+
+#ifdef CONFIG_DEPT
+#define LDT_EVT_L 1UL
+#define LDT_EVT_R 2UL
+#define LDT_EVT_W 4UL
+#define LDT_EVT_RW (LDT_EVT_R | LDT_EVT_W)
+#define LDT_EVT_ALL (LDT_EVT_L | LDT_EVT_RW)
+
+#define ldt_init(m, k, su, n) dept_map_init(m, k, su, n)
+#define ldt_lock(m, sl, t, n, i) \
+ do { \
+ if (n) \
+ dept_ecxt_enter_nokeep(m); \
+ else if (t) \
+ dept_ecxt_enter(m, LDT_EVT_L, i, "trylock", "unlock", sl);\
+ else { \
+ dept_wait(m, LDT_EVT_L, i, "lock", sl); \
+ dept_ecxt_enter(m, LDT_EVT_L, i, "lock", "unlock", sl);\
+ } \
+ } while (0)
+
+#define ldt_rlock(m, sl, t, n, i) \
+ do { \
+ if (n) \
+ dept_ecxt_enter_nokeep(m); \
+ else if (t) \
+ dept_ecxt_enter(m, LDT_EVT_R, i, "read_trylock", "read_unlock", sl);\
+ else { \
+ dept_wait(m, LDT_EVT_W, i, "read_lock", sl); \
+ dept_ecxt_enter(m, LDT_EVT_R, i, "read_lock", "read_unlock", sl);\
+ } \
+ } while (0)
+
+#define ldt_wlock(m, sl, t, n, i) \
+ do { \
+ if (n) \
+ dept_ecxt_enter_nokeep(m); \
+ else if (t) \
+ dept_ecxt_enter(m, LDT_EVT_W, i, "write_trylock", "write_unlock", sl);\
+ else { \
+ dept_wait(m, LDT_EVT_RW, i, "write_lock", sl); \
+ dept_ecxt_enter(m, LDT_EVT_W, i, "write_lock", "write_unlock", sl);\
+ } \
+ } while (0)
+
+#define ldt_unlock(m, i) dept_ecxt_exit(m, LDT_EVT_ALL, i)
+
+#define ldt_downgrade(m, i) \
+ do { \
+ if (dept_ecxt_holding(m, LDT_EVT_W)) \
+ dept_map_ecxt_modify(m, LDT_EVT_W, NULL, LDT_EVT_R, i, "downgrade", "read_unlock", -1);\
+ } while (0)
+
+#define ldt_set_class(m, n, k, sl, i) dept_map_ecxt_modify(m, LDT_EVT_ALL, k, 0UL, i, "lock_set_class", "(any)unlock", sl)
+#else /* !CONFIG_DEPT */
+#define ldt_init(m, k, su, n) do { (void)(k); } while (0)
+#define ldt_lock(m, sl, t, n, i) do { } while (0)
+#define ldt_rlock(m, sl, t, n, i) do { } while (0)
+#define ldt_wlock(m, sl, t, n, i) do { } while (0)
+#define ldt_unlock(m, i) do { } while (0)
+#define ldt_downgrade(m, i) do { } while (0)
+#define ldt_set_class(m, n, k, sl, i) do { } while (0)
+#endif
+#endif /* __LINUX_DEPT_LDT_H */
--
1.9.1

2023-01-09 04:07:56

by Byungchul Park

Subject: [PATCH RFC v7 11/23] dept: Apply sdt_might_sleep_weak() to hashed-waitqueue wait

Makes Dept able to track dependencies by hashed-waitqueue waits, but
weakly.

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/wait_bit.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h
index 7725b75..bad30ba 100644
--- a/include/linux/wait_bit.h
+++ b/include/linux/wait_bit.h
@@ -6,6 +6,7 @@
* Linux wait-bit related types and methods:
*/
#include <linux/wait.h>
+#include <linux/dept_sdt.h>

struct wait_bit_key {
void *flags;
@@ -246,6 +247,7 @@ struct wait_bit_queue_entry {
struct wait_bit_queue_entry __wbq_entry; \
long __ret = ret; /* explicit shadow */ \
\
+ sdt_might_sleep_weak(NULL); \
init_wait_var_entry(&__wbq_entry, var, \
exclusive ? WQ_FLAG_EXCLUSIVE : 0); \
for (;;) { \
@@ -263,6 +265,7 @@ struct wait_bit_queue_entry {
cmd; \
} \
finish_wait(__wq_head, &__wbq_entry.wq_entry); \
+ sdt_might_sleep_finish(); \
__out: __ret; \
})

--
1.9.1

2023-01-09 04:07:59

by Byungchul Park

Subject: [PATCH RFC v7 22/23] dept: Apply timeout consideration to dma fence wait

Now that CONFIG_DEPT_AGGRESSIVE_TIMEOUT_WAIT has been introduced, apply
the timeout consideration to dma fence waits.

Signed-off-by: Byungchul Park <[email protected]>
---
drivers/dma-buf/dma-fence.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index dd190cf..ee9b350 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -783,7 +783,10 @@ struct default_wait_cb {
cb.task = current;
list_add(&cb.base.node, &fence->cb_list);

- sdt_might_sleep_strong(NULL);
+ if (timeout == MAX_SCHEDULE_TIMEOUT)
+ sdt_might_sleep_strong(NULL);
+ else
+ sdt_might_sleep_strong_timeout(NULL);
while (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) && ret > 0) {
if (intr)
__set_current_state(TASK_INTERRUPTIBLE);
@@ -887,7 +890,10 @@ struct default_wait_cb {
}
}

- sdt_might_sleep_strong(NULL);
+ if (timeout == MAX_SCHEDULE_TIMEOUT)
+ sdt_might_sleep_strong(NULL);
+ else
+ sdt_might_sleep_strong_timeout(NULL);
while (ret > 0) {
if (intr)
set_current_state(TASK_INTERRUPTIBLE);
--
1.9.1

2023-01-09 04:08:02

by Byungchul Park

Subject: [PATCH RFC v7 09/23] dept: Apply sdt_might_sleep_weak() to swait

Makes Dept able to track dependencies by swaits, but weakly.

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/swait.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/include/linux/swait.h b/include/linux/swait.h
index 6a8c22b..1304209 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -6,6 +6,7 @@
#include <linux/stddef.h>
#include <linux/spinlock.h>
#include <linux/wait.h>
+#include <linux/dept_sdt.h>
#include <asm/current.h>

/*
@@ -161,6 +162,7 @@ static inline bool swq_has_sleeper(struct swait_queue_head *wq)
struct swait_queue __wait; \
long __ret = ret; \
\
+ sdt_might_sleep_weak(NULL); \
INIT_LIST_HEAD(&__wait.task_list); \
for (;;) { \
long __int = prepare_to_swait_event(&wq, &__wait, state);\
@@ -176,6 +178,7 @@ static inline bool swq_has_sleeper(struct swait_queue_head *wq)
cmd; \
} \
finish_swait(&wq, &__wait); \
+ sdt_might_sleep_finish(); \
__out: __ret; \
})

--
1.9.1

2023-01-09 04:08:18

by Byungchul Park

Subject: [PATCH RFC v7 03/23] dept: Add single event dependency tracker APIs

Wrap the base APIs for easier annotation of waits and events. Start
with supporting waiters on a single event each. More general support
for multiple events is future work, to be done when the need arises.

How to annotate (the simplest way):

1. Initialize a map for the wait of interest.

/*
* Recommended to place this along with the wait instance.
*/
struct dept_map my_wait;

/*
* Recommended to place this in the initialization code.
*/
sdt_map_init(&my_wait);

2. Place the following at the wait code.

sdt_wait(&my_wait);

3. Place the following at the event code.

sdt_event(&my_wait);

That's it!
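
Put together, the three steps might look like the sketch below. It is
built to compile outside the kernel, so struct dept_map and the sdt_*()
macros here are counting stand-ins rather than the real ones from
<linux/dept_sdt.h>, and struct my_dev with its functions is a
hypothetical driver object.

```c
/*
 * Stand-ins so the sketch builds in userspace. They only count calls;
 * the real macros come from <linux/dept_sdt.h>.
 */
struct dept_map { const char *name; int waits; int events; };
#define sdt_map_init(m) \
	do { (m)->name = #m; (m)->waits = 0; (m)->events = 0; } while (0)
#define sdt_wait(m)	do { (m)->waits++; } while (0)
#define sdt_event(m)	do { (m)->events++; } while (0)

/* Hypothetical object: the map lives next to the wait it describes. */
struct my_dev {
	int done;
	struct dept_map my_wait;
};

/* Step 1: initialize the map in the initialization code. */
void my_dev_init(struct my_dev *d)
{
	d->done = 0;
	sdt_map_init(&d->my_wait);
}

/* Step 2: annotate the wait right where the code would block. */
void my_dev_wait_done(struct my_dev *d)
{
	sdt_wait(&d->my_wait);
	/* real code would sleep here until d->done becomes non-zero */
}

/* Step 3: annotate the event right where the waiter gets released. */
void my_dev_complete(struct my_dev *d)
{
	sdt_event(&d->my_wait);
	d->done = 1;
}
```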

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/dept_sdt.h | 72 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 72 insertions(+)
create mode 100644 include/linux/dept_sdt.h

diff --git a/include/linux/dept_sdt.h b/include/linux/dept_sdt.h
new file mode 100644
index 00000000..2644e77
--- /dev/null
+++ b/include/linux/dept_sdt.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Single-event Dependency Tracker
+ *
+ * Started by Byungchul Park <[email protected]>:
+ *
+ * Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ */
+
+#ifndef __LINUX_DEPT_SDT_H
+#define __LINUX_DEPT_SDT_H
+
+#include <linux/dept.h>
+
+#ifdef CONFIG_DEPT
+#define sdt_map_init(m) \
+ do { \
+ static struct dept_key __key; \
+ dept_map_init(m, &__key, 0, #m); \
+ } while (0)
+
+#define sdt_map_init_key(m, k) dept_map_init(m, k, 0, #m)
+
+#define sdt_wait(m) \
+ do { \
+ dept_request_event(m); \
+ dept_wait(m, 1UL, _THIS_IP_, __func__, 0); \
+ } while (0)
+
+/*
+ * sdt_might_sleep() and its family will be committed in __schedule()
+ * when it actually gets to __schedule(). Both dept_request_event() and
+ * dept_wait() will be performed on the commit.
+ */
+
+/*
+ * Use the code location as the class key if an explicit map is not used.
+ */
+#define sdt_might_sleep_strong(m) \
+ do { \
+ struct dept_map *__m = m; \
+ static struct dept_key __key; \
+ dept_stage_wait(__m, __m ? NULL : &__key, _THIS_IP_, __func__, true);\
+ } while (0)
+
+/*
+ * Use the code location as the class key if an explicit map is not used.
+ */
+#define sdt_might_sleep_weak(m) \
+ do { \
+ struct dept_map *__m = m; \
+ static struct dept_key __key; \
+ dept_stage_wait(__m, __m ? NULL : &__key, _THIS_IP_, __func__, false);\
+ } while (0)
+
+#define sdt_might_sleep_finish() dept_clean_stage()
+
+#define sdt_ecxt_enter(m) dept_ecxt_enter(m, 1UL, _THIS_IP_, "start", "event", 0)
+#define sdt_event(m) dept_event(m, 1UL, _THIS_IP_, __func__)
+#define sdt_ecxt_exit(m) dept_ecxt_exit(m, 1UL, _THIS_IP_)
+#else /* !CONFIG_DEPT */
+#define sdt_map_init(m) do { } while (0)
+#define sdt_map_init_key(m, k) do { (void)(k); } while (0)
+#define sdt_wait(m) do { } while (0)
+#define sdt_might_sleep_strong(m) do { } while (0)
+#define sdt_might_sleep_weak(m) do { } while (0)
+#define sdt_might_sleep_finish() do { } while (0)
+#define sdt_ecxt_enter(m) do { } while (0)
+#define sdt_event(m) do { } while (0)
+#define sdt_ecxt_exit(m) do { } while (0)
+#endif
+#endif /* __LINUX_DEPT_SDT_H */
--
1.9.1

2023-01-09 04:08:19

by Byungchul Park

Subject: [PATCH RFC v7 17/23] dept: Track timeout waits separately with a new Kconfig

Waits with valid timeouts don't actually cause deadlocks. However, Dept
has been reporting these cases as well, because it's worth reporting the
circular dependency in some cases where, for example, a timeout is used
to avoid a deadlock but is not meant to expire.

However, there are also many, arguably more, cases where a timeout is
used for its plain purpose and is meant to expire.

Let Dept report these as information rather than shouting DEADLOCK.
Plus, introduce the CONFIG_DEPT_AGGRESSIVE_TIMEOUT_WAIT Kconfig option
to make it optional, so that reports involving waits with timeouts can
be turned on/off depending on the purpose.
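
The resulting report decision can be modelled as below. This is a
simplified sketch, not the engine code: struct dep_model, enum
circle_kind and classify_circle() are hypothetical, and the circle is
given as a flat array instead of being walked through bfs_parent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type; mirrors the three summary headers printed. */
enum circle_kind {
	CIRCLE_TIMEOUT,		/* "NOT A DEADLOCK BUT A CIRCULAR DEPENDENCY" */
	CIRCLE_AA_DEADLOCK,	/* "*** AA DEADLOCK ***" */
	CIRCLE_DEADLOCK,	/* "*** DEADLOCK ***" */
};

/* Hypothetical flat model of one dependency edge in a circle. */
struct dep_model {
	int from;	/* class id of the event context */
	int to;		/* class id of the wait */
	bool timeout;	/* whether the wait was annotated with a timeout */
};

/* A single timeout wait anywhere in the circle downgrades the report. */
enum circle_kind classify_circle(const struct dep_model *deps, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (deps[i].timeout)
			return CIRCLE_TIMEOUT;

	if (n == 1 && deps[0].from == deps[0].to)
		return CIRCLE_AA_DEADLOCK;

	return CIRCLE_DEADLOCK;
}
```

Note that this classification only matters with
CONFIG_DEPT_AGGRESSIVE_TIMEOUT_WAIT enabled. Without it, timeout waits
are dropped at dept_wait() and dept_stage_wait() and never reach the
report.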

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/dept.h | 15 ++++++++----
include/linux/dept_ldt.h | 6 ++---
include/linux/dept_sdt.h | 35 +++++++++++++++++++++++++---
kernel/dependency/dept.c | 60 ++++++++++++++++++++++++++++++++++++++++--------
lib/Kconfig.debug | 10 ++++++++
5 files changed, 107 insertions(+), 19 deletions(-)

diff --git a/include/linux/dept.h b/include/linux/dept.h
index 21ecefc..4387d5a 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -270,6 +270,11 @@ struct dept_wait {
* whether this wait is for commit in scheduler
*/
bool sched_sleep;
+
+ /*
+ * whether a timeout is set
+ */
+ bool timeout;
};
};
};
@@ -458,6 +463,7 @@ struct dept_task {
bool stage_sched_map;
const char *stage_w_fn;
unsigned long stage_ip;
+ bool stage_timeout;

/*
* the number of missing ecxts
@@ -496,6 +502,7 @@ struct dept_task {
.stage_sched_map = false, \
.stage_w_fn = NULL, \
.stage_ip = 0UL, \
+ .stage_timeout = false, \
.missing_ecxt = 0, \
.hardirqs_enabled = false, \
.softirqs_enabled = false, \
@@ -513,8 +520,8 @@ struct dept_task {
extern void dept_map_reinit(struct dept_map *m, struct dept_key *k, int sub_u, const char *n);
extern void dept_map_copy(struct dept_map *to, struct dept_map *from);

-extern void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip, const char *w_fn, int sub_l);
-extern void dept_stage_wait(struct dept_map *m, struct dept_key *k, unsigned long ip, const char *w_fn, bool strong);
+extern void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip, const char *w_fn, int sub_l, bool timeout);
+extern void dept_stage_wait(struct dept_map *m, struct dept_key *k, unsigned long ip, const char *w_fn, bool strong, bool timeout);
extern void dept_request_event_wait_commit(void);
extern void dept_clean_stage(void);
extern void dept_stage_event(struct task_struct *t, unsigned long ip);
@@ -564,8 +571,8 @@ static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
#define dept_map_reinit(m, k, su, n) do { (void)(n); (void)(k); } while (0)
#define dept_map_copy(t, f) do { } while (0)

-#define dept_wait(m, w_f, ip, w_fn, sl) do { (void)(w_fn); } while (0)
-#define dept_stage_wait(m, k, ip, w_fn, s) do { (void)(k); (void)(w_fn); } while (0)
+#define dept_wait(m, w_f, ip, w_fn, sl, t) do { (void)(w_fn); } while (0)
+#define dept_stage_wait(m, k, ip, w_fn, s, t) do { (void)(k); (void)(w_fn); } while (0)
#define dept_request_event_wait_commit() do { } while (0)
#define dept_clean_stage() do { } while (0)
#define dept_stage_event(t, ip) do { } while (0)
diff --git a/include/linux/dept_ldt.h b/include/linux/dept_ldt.h
index b8c00bc..9a88aa3 100644
--- a/include/linux/dept_ldt.h
+++ b/include/linux/dept_ldt.h
@@ -27,7 +27,7 @@
else if (t) \
dept_ecxt_enter(m, LDT_EVT_L, i, "trylock", "unlock", sl);\
else { \
- dept_wait(m, LDT_EVT_L, i, "lock", sl); \
+ dept_wait(m, LDT_EVT_L, i, "lock", sl, false); \
dept_ecxt_enter(m, LDT_EVT_L, i, "lock", "unlock", sl);\
} \
} while (0)
@@ -39,7 +39,7 @@
else if (t) \
dept_ecxt_enter(m, LDT_EVT_R, i, "read_trylock", "read_unlock", sl);\
else { \
- dept_wait(m, LDT_EVT_W, i, "read_lock", sl); \
+ dept_wait(m, LDT_EVT_W, i, "read_lock", sl, false);\
dept_ecxt_enter(m, LDT_EVT_R, i, "read_lock", "read_unlock", sl);\
} \
} while (0)
@@ -51,7 +51,7 @@
else if (t) \
dept_ecxt_enter(m, LDT_EVT_W, i, "write_trylock", "write_unlock", sl);\
else { \
- dept_wait(m, LDT_EVT_RW, i, "write_lock", sl); \
+ dept_wait(m, LDT_EVT_RW, i, "write_lock", sl, false);\
dept_ecxt_enter(m, LDT_EVT_W, i, "write_lock", "write_unlock", sl);\
} \
} while (0)
diff --git a/include/linux/dept_sdt.h b/include/linux/dept_sdt.h
index 2644e77..94bfef1 100644
--- a/include/linux/dept_sdt.h
+++ b/include/linux/dept_sdt.h
@@ -24,7 +24,13 @@
#define sdt_wait(m) \
do { \
dept_request_event(m); \
- dept_wait(m, 1UL, _THIS_IP_, __func__, 0); \
+ dept_wait(m, 1UL, _THIS_IP_, __func__, 0, false); \
+ } while (0)
+
+#define sdt_wait_timeout(m) \
+ do { \
+ dept_request_event(m); \
+ dept_wait(m, 1UL, _THIS_IP_, __func__, 0, true); \
} while (0)

/*
@@ -40,7 +46,17 @@
do { \
struct dept_map *__m = m; \
static struct dept_key __key; \
- dept_stage_wait(__m, __m ? NULL : &__key, _THIS_IP_, __func__, true);\
+ dept_stage_wait(__m, __m ? NULL : &__key, _THIS_IP_, __func__, true, false);\
+ } while (0)
+
+/*
+ * Use the code location as the class key if an explicit map is not used.
+ */
+#define sdt_might_sleep_strong_timeout(m) \
+ do { \
+ struct dept_map *__m = m; \
+ static struct dept_key __key; \
+ dept_stage_wait(__m, __m ? NULL : &__key, _THIS_IP_, __func__, true, true);\
} while (0)

/*
@@ -50,7 +66,17 @@
do { \
struct dept_map *__m = m; \
static struct dept_key __key; \
- dept_stage_wait(__m, __m ? NULL : &__key, _THIS_IP_, __func__, false);\
+ dept_stage_wait(__m, __m ? NULL : &__key, _THIS_IP_, __func__, false, false);\
+ } while (0)
+
+/*
+ * Use the code location as the class key if an explicit map is not used.
+ */
+#define sdt_might_sleep_weak_timeout(m) \
+ do { \
+ struct dept_map *__m = m; \
+ static struct dept_key __key; \
+ dept_stage_wait(__m, __m ? NULL : &__key, _THIS_IP_, __func__, false, true);\
} while (0)

#define sdt_might_sleep_finish() dept_clean_stage()
@@ -62,8 +88,11 @@
#define sdt_map_init(m) do { } while (0)
#define sdt_map_init_key(m, k) do { (void)(k); } while (0)
#define sdt_wait(m) do { } while (0)
+#define sdt_wait_timeout(m) do { } while (0)
#define sdt_might_sleep_strong(m) do { } while (0)
#define sdt_might_sleep_weak(m) do { } while (0)
+#define sdt_might_sleep_strong_timeout(m) do { } while (0)
+#define sdt_might_sleep_weak_timeout(m) do { } while (0)
#define sdt_might_sleep_finish() do { } while (0)
#define sdt_ecxt_enter(m) do { } while (0)
#define sdt_event(m) do { } while (0)
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 11d4f75..cd25995 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -739,6 +739,8 @@ static void print_diagram(struct dept_dep *d)
if (!irqf) {
print_spc(spc, "[S] %s(%s:%d)\n", c_fn, fc_n, fc->sub_id);
print_spc(spc, "[W] %s(%s:%d)\n", w_fn, tc_n, tc->sub_id);
+ if (w->timeout)
+ print_spc(spc, "--------------- >8 timeout ---------------\n");
print_spc(spc, "[E] %s(%s:%d)\n", e_fn, fc_n, fc->sub_id);
}
}
@@ -792,6 +794,24 @@ static void print_dep(struct dept_dep *d)

static void save_current_stack(int skip);

+static bool is_timeout_wait_circle(struct dept_class *c)
+{
+ struct dept_class *fc = c->bfs_parent;
+ struct dept_class *tc = c;
+
+ do {
+ struct dept_dep *d = lookup_dep(fc, tc);
+
+ if (d->wait->timeout)
+ return true;
+
+ tc = fc;
+ fc = fc->bfs_parent;
+ } while (tc != c);
+
+ return false;
+}
+
/*
* Print all classes in a circle.
*/
@@ -814,10 +834,14 @@ static void print_circle(struct dept_class *c)
pr_warn("summary\n");
pr_warn("---------------------------------------------------\n");

- if (fc == tc)
+ if (is_timeout_wait_circle(c)) {
+ pr_warn("NOT A DEADLOCK BUT A CIRCULAR DEPENDENCY\n");
+ pr_warn("CHECK IF THE TIMEOUT IS INTENDED\n\n");
+ } else if (fc == tc) {
pr_warn("*** AA DEADLOCK ***\n\n");
- else
+ } else {
pr_warn("*** DEADLOCK ***\n\n");
+ }

i = 0;
do {
@@ -1563,7 +1587,8 @@ static void add_dep(struct dept_ecxt *e, struct dept_wait *w)
static atomic_t wgen = ATOMIC_INIT(1);

static void add_wait(struct dept_class *c, unsigned long ip,
- const char *w_fn, int sub_l, bool sched_sleep)
+ const char *w_fn, int sub_l, bool sched_sleep,
+ bool timeout)
{
struct dept_task *dt = dept_task();
struct dept_wait *w;
@@ -1583,6 +1608,7 @@ static void add_wait(struct dept_class *c, unsigned long ip,
w->wait_fn = w_fn;
w->wait_stack = get_current_stack();
w->sched_sleep = sched_sleep;
+ w->timeout = timeout;

cxt = cur_cxt();
if (cxt == DEPT_CXT_HIRQ || cxt == DEPT_CXT_SIRQ)
@@ -2315,7 +2341,7 @@ static struct dept_class *check_new_class(struct dept_key *local,
*/
static void __dept_wait(struct dept_map *m, unsigned long w_f,
unsigned long ip, const char *w_fn, int sub_l,
- bool sched_sleep, bool sched_map)
+ bool sched_sleep, bool sched_map, bool timeout)
{
int e;

@@ -2338,7 +2364,7 @@ static void __dept_wait(struct dept_map *m, unsigned long w_f,
if (!c)
continue;

- add_wait(c, ip, w_fn, sub_l, sched_sleep);
+ add_wait(c, ip, w_fn, sub_l, sched_sleep, timeout);
}
}

@@ -2380,7 +2406,8 @@ static void __dept_event(struct dept_map *m, unsigned long e_f,
}

void dept_wait(struct dept_map *m, unsigned long w_f,
- unsigned long ip, const char *w_fn, int sub_l)
+ unsigned long ip, const char *w_fn, int sub_l,
+ bool timeout)
{
struct dept_task *dt = dept_task();
unsigned long flags;
@@ -2388,6 +2415,11 @@ void dept_wait(struct dept_map *m, unsigned long w_f,
if (unlikely(!dept_working()))
return;

+#if !defined(CONFIG_DEPT_AGGRESSIVE_TIMEOUT_WAIT)
+ if (timeout)
+ return;
+#endif
+
if (dt->recursive)
return;

@@ -2396,14 +2428,15 @@ void dept_wait(struct dept_map *m, unsigned long w_f,

flags = dept_enter();

- __dept_wait(m, w_f, ip, w_fn, sub_l, false, false);
+ __dept_wait(m, w_f, ip, w_fn, sub_l, false, false, timeout);

dept_exit(flags);
}
EXPORT_SYMBOL_GPL(dept_wait);

void dept_stage_wait(struct dept_map *m, struct dept_key *k,
- unsigned long ip, const char *w_fn, bool strong)
+ unsigned long ip, const char *w_fn, bool strong,
+ bool timeout)
{
struct dept_task *dt = dept_task();
unsigned long flags;
@@ -2411,6 +2444,11 @@ void dept_stage_wait(struct dept_map *m, struct dept_key *k,
if (unlikely(!dept_working()))
return;

+#if !defined(CONFIG_DEPT_AGGRESSIVE_TIMEOUT_WAIT)
+ if (timeout)
+ return;
+#endif
+
if (m && m->nocheck)
return;

@@ -2455,6 +2493,7 @@ void dept_stage_wait(struct dept_map *m, struct dept_key *k,

dt->stage_w_fn = w_fn;
dt->stage_ip = ip;
+ dt->stage_timeout = timeout;
unlock:
arch_spin_unlock(&stage_spin);

@@ -2480,6 +2519,7 @@ void dept_clean_stage(void)
dt->stage_sched_map = false;
dt->stage_w_fn = NULL;
dt->stage_ip = 0UL;
+ dt->stage_timeout = false;
arch_spin_unlock(&stage_spin);

dept_exit_recursive(flags);
@@ -2497,6 +2537,7 @@ void dept_request_event_wait_commit(void)
unsigned long ip;
const char *w_fn;
bool sched_map;
+ bool timeout;

if (unlikely(!dept_working()))
return;
@@ -2519,6 +2560,7 @@ void dept_request_event_wait_commit(void)
w_fn = dt->stage_w_fn;
ip = dt->stage_ip;
sched_map = dt->stage_sched_map;
+ timeout = dt->stage_timeout;

/*
* Avoid zero wgen.
@@ -2526,7 +2568,7 @@ void dept_request_event_wait_commit(void)
wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
WRITE_ONCE(dt->stage_m.wgen, wg);

- __dept_wait(&dt->stage_m, 1UL, ip, w_fn, 0, true, sched_map);
+ __dept_wait(&dt->stage_m, 1UL, ip, w_fn, 0, true, sched_map, timeout);
exit:
dept_exit(flags);
}
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 611fd01..912309b 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1282,6 +1282,16 @@ config DEPT
noting, to mitigate the impact by the false positives, multi
reporting has been supported.

+config DEPT_AGGRESSIVE_TIMEOUT_WAIT
+ bool "Aggressively track even timeout waits"
+ depends on DEPT
+ default n
+ help
+ Timeout wait doesn't contribute to a deadlock. However,
+ informing a circular dependency might be helpful for cases
+ that timeout is used to avoid a deadlock. Say N if you'd like
+ to avoid verbose reports.
+
config LOCK_DEBUGGING_SUPPORT
bool
depends on TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
--
1.9.1

2023-01-09 04:08:19

by Byungchul Park

Subject: [PATCH RFC v7 18/23] dept: Apply timeout consideration to wait_for_completion()/complete()

Now that CONFIG_DEPT_AGGRESSIVE_TIMEOUT_WAIT has been introduced, apply
the timeout consideration to wait_for_completion()/complete().
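
The choice made by each *_timeout() wrapper boils down to the sketch
below. MAX_SCHEDULE_TIMEOUT is defined here the way <linux/sched.h>
defines it so that the sketch builds in userspace; enum sdt_kind and
completion_annotation() are hypothetical names used only for
illustration.

```c
#include <limits.h>

/* Same value as in <linux/sched.h>; repeated here for a userspace build. */
#define MAX_SCHEDULE_TIMEOUT LONG_MAX

/* Hypothetical tag for which annotation a wrapper would pick. */
enum sdt_kind {
	SDT_STRONG,		/* sdt_might_sleep_strong(NULL) */
	SDT_STRONG_TIMEOUT,	/* sdt_might_sleep_strong_timeout(NULL) */
};

/* An "infinite" timeout is treated like a wait without any timeout. */
enum sdt_kind completion_annotation(long t)
{
	return t == MAX_SCHEDULE_TIMEOUT ? SDT_STRONG : SDT_STRONG_TIMEOUT;
}
```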

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/completion.h | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 0408f6d..57a715f 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -11,6 +11,7 @@

#include <linux/swait.h>
#include <linux/dept_sdt.h>
+#include <linux/sched.h>

/*
* struct completion - structure used to maintain state for a "completion"
@@ -153,7 +154,10 @@ extern long raw_wait_for_completion_killable_timeout(
#define wait_for_completion_timeout(x, t) \
({ \
unsigned long __ret; \
- sdt_might_sleep_strong(NULL); \
+ if ((t) == MAX_SCHEDULE_TIMEOUT) \
+ sdt_might_sleep_strong(NULL); \
+ else \
+ sdt_might_sleep_strong_timeout(NULL); \
__ret = raw_wait_for_completion_timeout(x, t); \
sdt_might_sleep_finish(); \
__ret; \
@@ -161,7 +165,10 @@ extern long raw_wait_for_completion_killable_timeout(
#define wait_for_completion_io_timeout(x, t) \
({ \
unsigned long __ret; \
- sdt_might_sleep_strong(NULL); \
+ if ((t) == MAX_SCHEDULE_TIMEOUT) \
+ sdt_might_sleep_strong(NULL); \
+ else \
+ sdt_might_sleep_strong_timeout(NULL); \
__ret = raw_wait_for_completion_io_timeout(x, t); \
sdt_might_sleep_finish(); \
__ret; \
@@ -169,7 +176,10 @@ extern long raw_wait_for_completion_killable_timeout(
#define wait_for_completion_interruptible_timeout(x, t) \
({ \
long __ret; \
- sdt_might_sleep_strong(NULL); \
+ if ((t) == MAX_SCHEDULE_TIMEOUT) \
+ sdt_might_sleep_strong(NULL); \
+ else \
+ sdt_might_sleep_strong_timeout(NULL); \
__ret = raw_wait_for_completion_interruptible_timeout(x, t);\
sdt_might_sleep_finish(); \
__ret; \
@@ -177,7 +187,10 @@ extern long raw_wait_for_completion_killable_timeout(
#define wait_for_completion_killable_timeout(x, t) \
({ \
long __ret; \
- sdt_might_sleep_strong(NULL); \
+ if ((t) == MAX_SCHEDULE_TIMEOUT) \
+ sdt_might_sleep_strong(NULL); \
+ else \
+ sdt_might_sleep_strong_timeout(NULL); \
__ret = raw_wait_for_completion_killable_timeout(x, t); \
sdt_might_sleep_finish(); \
__ret; \
--
1.9.1

2023-01-09 04:08:27

by Byungchul Park

Subject: [PATCH RFC v7 15/23] locking/lockdep, cpu/hotplus: Use a weaker annotation in AP thread

cb92173d1f0 ("locking/lockdep, cpu/hotplug: Annotate AP thread") was
introduced to make lockdep_assert_cpus_held() work in the AP thread.

However, the annotation is stronger than that purpose requires. A try
lock annotation is enough.

Furthermore, now that Dept has been introduced, the stronger annotation
causes false positive alarms. Replace it with a try lock annotation.
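
As a rough illustration of why a try lock annotation is weaker, here is
a user-space toy model, not kernel code. It assumes the usual Lockdep
behaviour that a trylock is pushed onto the held-lock stack but adds no
new dependency edges, because a trylock cannot block. All toy_* names
are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HELD 8

/* Toy model of a task's held-lock stack; not the kernel's structure. */
struct toy_task {
	int held[TOY_MAX_HELD];	/* ids of currently held locks */
	size_t nr_held;
	size_t nr_deps;		/* "held -> new" edges recorded so far */
};

/*
 * Record an acquisition. A blocking acquire may wait while every lock
 * already on the stack is held, so it adds one edge per held lock.
 * A trylock cannot block, so it adds no edges, but it is still pushed
 * onto the stack.
 */
static bool toy_acquire(struct toy_task *t, int lock_id, bool trylock)
{
	if (t->nr_held == TOY_MAX_HELD)
		return false;
	if (!trylock)
		t->nr_deps += t->nr_held;
	t->held[t->nr_held++] = lock_id;
	return true;
}

/* What an "assert held" style check needs: is the lock on the stack? */
static bool toy_is_held(const struct toy_task *t, int lock_id)
{
	size_t i;

	for (i = 0; i < t->nr_held; i++)
		if (t->held[i] == lock_id)
			return true;
	return false;
}
```

In this model a trylock still satisfies toy_is_held(), which is all an
assertion such as lockdep_assert_cpus_held() needs, while contributing
no edge that a dependency checker could report.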

Signed-off-by: Byungchul Park <[email protected]>
---
kernel/cpu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6c0a92c..6a9b9c3 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -356,7 +356,7 @@ int lockdep_is_cpus_held(void)

static void lockdep_acquire_cpus_lock(void)
{
- rwsem_acquire(&cpu_hotplug_lock.dep_map, 0, 0, _THIS_IP_);
+ rwsem_acquire(&cpu_hotplug_lock.dep_map, 0, 1, _THIS_IP_);
}

static void lockdep_release_cpus_lock(void)
--
1.9.1

2023-01-09 04:08:38

by Byungchul Park

Subject: [PATCH RFC v7 12/23] dept: Distinguish each syscall context from another

The kernel is entered on each syscall, and each syscall should be
handled independently from Dept's point of view. Otherwise, Dept may
wrongly track dependencies across different syscalls.

Such a dependency might be real when seen from user mode. However,
since Dept has only just started to work, conservatively let it not
track dependencies across different syscalls.
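
The id scheme this patch uses can be sketched in user space as follows.
This is a minimal model under the assumption that the id is built the
way cur_ctxt_id() and dept_kernel_enter() do it in the diff below: the
low DEPT_CXTS_NR bits name the context type and the upper bits hold a
per-type sequence number. The toy_* and CXT_* names are hypothetical
stand-ins.

```c
/* Mirrors the enum added to include/linux/dept.h by this patch. */
enum {
	CXT_SIRQ = 0,
	CXT_HIRQ,
	CXT_IRQS_NR,
	CXT_PROCESS = CXT_IRQS_NR,
	CXTS_NR
};

/* Toy stand-in for the cxt_id[] member of struct dept_task. */
struct toy_dept_task {
	unsigned int cxt_id[CXTS_NR];
};

/* Entering a context bumps its sequence number above the type bits. */
static void toy_cxt_enter(struct toy_dept_task *dt, int cxt)
{
	dt->cxt_id[cxt] += 1U << CXTS_NR;
}

/* The id is the sequence number OR-ed with one bit for the type. */
static unsigned int toy_ctxt_id(const struct toy_dept_task *dt, int cxt)
{
	return dt->cxt_id[cxt] | (1U << cxt);
}
```

With three context types the sequence number advances in steps of 8, so
two consecutive syscalls in the same task get different ids, and a
process-context id can never equal a softirq or hardirq id.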

Signed-off-by: Byungchul Park <[email protected]>
---
arch/arm64/kernel/syscall.c | 2 ++
arch/x86/entry/common.c | 4 +++
include/linux/dept.h | 39 +++++++++++++++-----------
kernel/dependency/dept.c | 67 +++++++++++++++++++++++----------------------
4 files changed, 63 insertions(+), 49 deletions(-)

diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index a5de47e..e26d0ca 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -7,6 +7,7 @@
#include <linux/ptrace.h>
#include <linux/randomize_kstack.h>
#include <linux/syscalls.h>
+#include <linux/dept.h>

#include <asm/daifflags.h>
#include <asm/debug-monitors.h>
@@ -105,6 +106,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
*/

local_daif_restore(DAIF_PROCCTX);
+ dept_kernel_enter();

if (flags & _TIF_MTE_ASYNC_FAULT) {
/*
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 6c28264..7cdd27a 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -19,6 +19,7 @@
#include <linux/nospec.h>
#include <linux/syscalls.h>
#include <linux/uaccess.h>
+#include <linux/dept.h>

#ifdef CONFIG_XEN_PV
#include <xen/xen-ops.h>
@@ -72,6 +73,7 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)

__visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
{
+ dept_kernel_enter();
add_random_kstack_offset();
nr = syscall_enter_from_user_mode(regs, nr);

@@ -120,6 +122,7 @@ __visible noinstr void do_int80_syscall_32(struct pt_regs *regs)
{
int nr = syscall_32_enter(regs);

+ dept_kernel_enter();
add_random_kstack_offset();
/*
* Subtlety here: if ptrace pokes something larger than 2^31-1 into
@@ -140,6 +143,7 @@ static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
int nr = syscall_32_enter(regs);
int res;

+ dept_kernel_enter();
add_random_kstack_offset();
/*
* This cannot use syscall_enter_from_user_mode() as it has to
diff --git a/include/linux/dept.h b/include/linux/dept.h
index f2a3057..777c837 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -25,11 +25,16 @@
#define DEPT_MAX_SUBCLASSES_USR (DEPT_MAX_SUBCLASSES / DEPT_MAX_SUBCLASSES_EVT)
#define DEPT_MAX_SUBCLASSES_CACHE 2

-#define DEPT_SIRQ 0
-#define DEPT_HIRQ 1
-#define DEPT_IRQS_NR 2
-#define DEPT_SIRQF (1UL << DEPT_SIRQ)
-#define DEPT_HIRQF (1UL << DEPT_HIRQ)
+enum {
+ DEPT_CXT_SIRQ = 0,
+ DEPT_CXT_HIRQ,
+ DEPT_CXT_IRQS_NR,
+ DEPT_CXT_PROCESS = DEPT_CXT_IRQS_NR,
+ DEPT_CXTS_NR
+};
+
+#define DEPT_SIRQF (1UL << DEPT_CXT_SIRQ)
+#define DEPT_HIRQF (1UL << DEPT_CXT_HIRQ)

struct dept_ecxt;
struct dept_iecxt {
@@ -94,8 +99,8 @@ struct dept_class {
/*
* for tracking IRQ dependencies
*/
- struct dept_iecxt iecxt[DEPT_IRQS_NR];
- struct dept_iwait iwait[DEPT_IRQS_NR];
+ struct dept_iecxt iecxt[DEPT_CXT_IRQS_NR];
+ struct dept_iwait iwait[DEPT_CXT_IRQS_NR];

/*
* classified by a map embedded in task_struct,
@@ -207,8 +212,8 @@ struct dept_ecxt {
/*
* where the IRQ-enabled happened
*/
- unsigned long enirq_ip[DEPT_IRQS_NR];
- struct dept_stack *enirq_stack[DEPT_IRQS_NR];
+ unsigned long enirq_ip[DEPT_CXT_IRQS_NR];
+ struct dept_stack *enirq_stack[DEPT_CXT_IRQS_NR];

/*
* where the event context started
@@ -252,8 +257,8 @@ struct dept_wait {
/*
* where the IRQ wait happened
*/
- unsigned long irq_ip[DEPT_IRQS_NR];
- struct dept_stack *irq_stack[DEPT_IRQS_NR];
+ unsigned long irq_ip[DEPT_CXT_IRQS_NR];
+ struct dept_stack *irq_stack[DEPT_CXT_IRQS_NR];

/*
* where the wait happened
@@ -406,19 +411,19 @@ struct dept_task {
int wait_hist_pos;

/*
- * sequential id to identify each IRQ context
+ * sequential id to identify each context
*/
- unsigned int irq_id[DEPT_IRQS_NR];
+ unsigned int cxt_id[DEPT_CXTS_NR];

/*
* for tracking IRQ-enabled points with cross-event
*/
- unsigned int wgen_enirq[DEPT_IRQS_NR];
+ unsigned int wgen_enirq[DEPT_CXT_IRQS_NR];

/*
* for keeping up-to-date IRQ-enabled points
*/
- unsigned long enirq_ip[DEPT_IRQS_NR];
+ unsigned long enirq_ip[DEPT_CXT_IRQS_NR];

/*
* current effective IRQ-enabled flag
@@ -470,7 +475,7 @@ struct dept_task {
.wait_hist = { { .wait = NULL, } }, \
.ecxt_held_pos = 0, \
.wait_hist_pos = 0, \
- .irq_id = { 0U }, \
+ .cxt_id = { 0U }, \
.wgen_enirq = { 0U }, \
.enirq_ip = { 0UL }, \
.eff_enirqf = 0UL, \
@@ -509,6 +514,7 @@ struct dept_task {
extern void dept_ecxt_exit(struct dept_map *m, unsigned long e_f, unsigned long ip);
extern void dept_sched_enter(void);
extern void dept_sched_exit(void);
+extern void dept_kernel_enter(void);

static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
{
@@ -558,6 +564,7 @@ static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
#define dept_ecxt_exit(m, e_f, ip) do { } while (0)
#define dept_sched_enter() do { } while (0)
#define dept_sched_exit() do { } while (0)
+#define dept_kernel_enter() do { } while (0)
#define dept_ecxt_enter_nokeep(m) do { } while (0)
#define dept_key_init(k) do { (void)(k); } while (0)
#define dept_key_destroy(k) do { (void)(k); } while (0)
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index d164482..e98617b 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -220,9 +220,9 @@ static inline struct dept_class *dep_tc(struct dept_dep *d)

static inline const char *irq_str(int irq)
{
- if (irq == DEPT_SIRQ)
+ if (irq == DEPT_CXT_SIRQ)
return "softirq";
- if (irq == DEPT_HIRQ)
+ if (irq == DEPT_CXT_HIRQ)
return "hardirq";
return "(unknown)";
}
@@ -406,7 +406,7 @@ static void initialize_class(struct dept_class *c)
{
int i;

- for (i = 0; i < DEPT_IRQS_NR; i++) {
+ for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
struct dept_iecxt *ie = &c->iecxt[i];
struct dept_iwait *iw = &c->iwait[i];

@@ -431,7 +431,7 @@ static void initialize_ecxt(struct dept_ecxt *e)
{
int i;

- for (i = 0; i < DEPT_IRQS_NR; i++) {
+ for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
e->enirq_stack[i] = NULL;
e->enirq_ip[i] = 0UL;
}
@@ -447,7 +447,7 @@ static void initialize_wait(struct dept_wait *w)
{
int i;

- for (i = 0; i < DEPT_IRQS_NR; i++) {
+ for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
w->irq_stack[i] = NULL;
w->irq_ip[i] = 0UL;
}
@@ -486,7 +486,7 @@ static void destroy_ecxt(struct dept_ecxt *e)
{
int i;

- for (i = 0; i < DEPT_IRQS_NR; i++)
+ for (i = 0; i < DEPT_CXT_IRQS_NR; i++)
if (e->enirq_stack[i])
put_stack(e->enirq_stack[i]);
if (e->class)
@@ -502,7 +502,7 @@ static void destroy_wait(struct dept_wait *w)
{
int i;

- for (i = 0; i < DEPT_IRQS_NR; i++)
+ for (i = 0; i < DEPT_CXT_IRQS_NR; i++)
if (w->irq_stack[i])
put_stack(w->irq_stack[i]);
if (w->class)
@@ -651,7 +651,7 @@ static void print_diagram(struct dept_dep *d)
const char *tc_n = tc->sched_map ? "<sched>" : (tc->name ?: "(unknown)");

irqf = e->enirqf & w->irqf;
- for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+ for_each_set_bit(irq, &irqf, DEPT_CXT_IRQS_NR) {
if (!firstline)
pr_warn("\nor\n\n");
firstline = false;
@@ -684,7 +684,7 @@ static void print_dep(struct dept_dep *d)
const char *tc_n = tc->sched_map ? "<sched>" : (tc->name ?: "(unknown)");

irqf = e->enirqf & w->irqf;
- for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+ for_each_set_bit(irq, &irqf, DEPT_CXT_IRQS_NR) {
pr_warn("%s has been enabled:\n", irq_str(irq));
print_ip_stack(e->enirq_ip[irq], e->enirq_stack[irq]);
pr_warn("\n");
@@ -910,7 +910,7 @@ static void bfs(struct dept_class *c, bfs_f *cb, void *in, void **out)
*/

static inline unsigned long cur_enirqf(void);
-static inline int cur_irq(void);
+static inline int cur_cxt(void);
static inline unsigned int cur_ctxt_id(void);

static inline struct dept_iecxt *iecxt(struct dept_class *c, int irq)
@@ -1458,7 +1458,7 @@ static void add_dep(struct dept_ecxt *e, struct dept_wait *w)
if (d) {
check_dl_bfs(d);

- for (i = 0; i < DEPT_IRQS_NR; i++) {
+ for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
struct dept_iwait *fiw = iwait(fc, i);
struct dept_iecxt *found_ie;
struct dept_iwait *found_iw;
@@ -1494,7 +1494,7 @@ static void add_wait(struct dept_class *c, unsigned long ip,
struct dept_task *dt = dept_task();
struct dept_wait *w;
unsigned int wg = 0U;
- int irq;
+ int cxt;
int i;

if (DEPT_WARN_ON(!valid_class(c)))
@@ -1510,9 +1510,9 @@ static void add_wait(struct dept_class *c, unsigned long ip,
w->wait_stack = get_current_stack();
w->sched_sleep = sched_sleep;

- irq = cur_irq();
- if (irq < DEPT_IRQS_NR)
- add_iwait(c, irq, w);
+ cxt = cur_cxt();
+ if (cxt == DEPT_CXT_HIRQ || cxt == DEPT_CXT_SIRQ)
+ add_iwait(c, cxt, w);

/*
* Avoid adding dependency between user aware nested ecxt and
@@ -1593,7 +1593,7 @@ static bool add_ecxt(struct dept_map *m, struct dept_class *c,
eh->sub_l = sub_l;

irqf = cur_enirqf();
- for_each_set_bit(irq, &irqf, DEPT_IRQS_NR)
+ for_each_set_bit(irq, &irqf, DEPT_CXT_IRQS_NR)
add_iecxt(c, irq, e, false);

del_ecxt(e);
@@ -1745,7 +1745,7 @@ static void do_event(struct dept_map *m, struct dept_class *c,
add_dep(eh->ecxt, wh->wait);
}

- for (i = 0; i < DEPT_IRQS_NR; i++) {
+ for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
struct dept_ecxt *e;

if (before(dt->wgen_enirq[i], wg))
@@ -1787,7 +1787,7 @@ static void disconnect_class(struct dept_class *c)
call_rcu(&d->rh, del_dep_rcu);
}

- for (i = 0; i < DEPT_IRQS_NR; i++) {
+ for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
stale_iecxt(iecxt(c, i));
stale_iwait(iwait(c, i));
}
@@ -1812,27 +1812,21 @@ static inline unsigned long cur_enirqf(void)
return 0UL;
}

-static inline int cur_irq(void)
+static inline int cur_cxt(void)
{
if (lockdep_softirq_context(current))
- return DEPT_SIRQ;
+ return DEPT_CXT_SIRQ;
if (lockdep_hardirq_context())
- return DEPT_HIRQ;
- return DEPT_IRQS_NR;
+ return DEPT_CXT_HIRQ;
+ return DEPT_CXT_PROCESS;
}

static inline unsigned int cur_ctxt_id(void)
{
struct dept_task *dt = dept_task();
- int irq = cur_irq();
+ int cxt = cur_cxt();

- /*
- * Normal process context
- */
- if (irq == DEPT_IRQS_NR)
- return 0U;
-
- return dt->irq_id[irq] | (1UL << irq);
+ return dt->cxt_id[cxt] | (1UL << cxt);
}

static void enirq_transition(int irq)
@@ -1883,7 +1877,7 @@ static void enirq_update(unsigned long ip)
/*
* Do enirq_transition() only on an OFF -> ON transition.
*/
- for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+ for_each_set_bit(irq, &irqf, DEPT_CXT_IRQS_NR) {
if (prev & (1UL << irq))
continue;

@@ -1960,6 +1954,13 @@ void dept_hardirqs_off(unsigned long ip)
}
EXPORT_SYMBOL_GPL(dept_hardirqs_off);

+void dept_kernel_enter(void)
+{
+ struct dept_task *dt = dept_task();
+
+ dt->cxt_id[DEPT_CXT_PROCESS] += 1UL << DEPT_CXTS_NR;
+}
+
/*
* Ensure it's the outmost softirq context.
*/
@@ -1967,7 +1968,7 @@ void dept_softirq_enter(void)
{
struct dept_task *dt = dept_task();

- dt->irq_id[DEPT_SIRQ] += 1UL << DEPT_IRQS_NR;
+ dt->cxt_id[DEPT_CXT_SIRQ] += 1UL << DEPT_CXTS_NR;
}

/*
@@ -1977,7 +1978,7 @@ void dept_hardirq_enter(void)
{
struct dept_task *dt = dept_task();

- dt->irq_id[DEPT_HIRQ] += 1UL << DEPT_IRQS_NR;
+ dt->cxt_id[DEPT_CXT_HIRQ] += 1UL << DEPT_CXTS_NR;
}

void dept_sched_enter(void)
--
1.9.1

2023-01-09 04:08:44

by Byungchul Park

Subject: [PATCH RFC v7 05/23] dept: Tie to Lockdep and IRQ tracing

Yes, the way Dept is placed here looks ugly. But it's inevitable as
long as Dept relies on Lockdep. It should be improved gradually.

1. Basically relies on Lockdep to track typical locks and IRQ state.

2. Dept fails to recognize the IRQ state, and so generates false
alarms, when raw_local_irq_*() APIs are used. So make it track
those too.

3. Lockdep doesn't track the outermost {hard,soft}irq entrances but
Dept makes use of them. So make it track those too.
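
Points 2 and 3 can be pictured with a small user-space model. It is only
a sketch of the hook placement seen in the irqflags.h hunk below: "off"
is reported on every save, "on" is reported on restore only when the
saved flags say IRQs were enabled, and the entry hook fires only for the
outermost hardirq. The toy_* names and the counters are hypothetical.

```c
#include <stdbool.h>

/* Toy per-CPU state; every name here is hypothetical. */
struct toy_cpu {
	bool irqs_enabled;		/* stand-in for the hardware IRQ flag */
	int hardirq_depth;		/* nesting depth of hardirq handlers */
	unsigned int on_events;		/* tracker was told "IRQs on" */
	unsigned int off_events;	/* tracker was told "IRQs off" */
	unsigned int entries;		/* tracker saw an outermost hardirq */
};

/* Like raw_local_irq_save(): disable first, then report "off". */
static bool toy_irq_save(struct toy_cpu *c)
{
	bool flags = c->irqs_enabled;

	c->irqs_enabled = false;
	c->off_events++;
	return flags;
}

/* Like raw_local_irq_restore(): report "on" only if really enabling. */
static void toy_irq_restore(struct toy_cpu *c, bool flags)
{
	if (flags)
		c->on_events++;
	c->irqs_enabled = flags;
}

/* Like lockdep_hardirq_enter(): hook only the outermost entrance. */
static void toy_hardirq_enter(struct toy_cpu *c)
{
	if (++c->hardirq_depth == 1)
		c->entries++;
}

static void toy_hardirq_exit(struct toy_cpu *c)
{
	c->hardirq_depth--;
}
```

A nested save/restore pair therefore produces two "off" reports but only
one "on" report, at the outer restore that actually re-enables IRQs.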

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/irqflags.h | 22 +++++++-
include/linux/local_lock_internal.h | 1 +
include/linux/lockdep.h | 108 +++++++++++++++++++++++++++++-------
include/linux/lockdep_types.h | 3 +
include/linux/mutex.h | 1 +
include/linux/percpu-rwsem.h | 2 +-
include/linux/rtmutex.h | 1 +
include/linux/rwlock_types.h | 1 +
include/linux/rwsem.h | 1 +
include/linux/seqlock.h | 2 +-
include/linux/spinlock_types_raw.h | 3 +
include/linux/srcu.h | 2 +-
kernel/dependency/dept.c | 4 +-
kernel/locking/lockdep.c | 23 ++++++++
14 files changed, 145 insertions(+), 29 deletions(-)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 5ec0fa7..3cca328 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -13,6 +13,7 @@
#define _LINUX_TRACE_IRQFLAGS_H

#include <linux/typecheck.h>
+#include <linux/dept.h>
#include <asm/irqflags.h>
#include <asm/percpu.h>

@@ -60,8 +61,10 @@ struct irqtrace_events {
# define lockdep_softirqs_enabled(p) ((p)->softirqs_enabled)
# define lockdep_hardirq_enter() \
do { \
- if (__this_cpu_inc_return(hardirq_context) == 1)\
+ if (__this_cpu_inc_return(hardirq_context) == 1) { \
current->hardirq_threaded = 0; \
+ dept_hardirq_enter(); \
+ } \
} while (0)
# define lockdep_hardirq_threaded() \
do { \
@@ -136,6 +139,8 @@ struct irqtrace_events {
# define lockdep_softirq_enter() \
do { \
current->softirq_context++; \
+ if (current->softirq_context == 1) \
+ dept_softirq_enter(); \
} while (0)
# define lockdep_softirq_exit() \
do { \
@@ -170,17 +175,28 @@ struct irqtrace_events {
/*
* Wrap the arch provided IRQ routines to provide appropriate checks.
*/
-#define raw_local_irq_disable() arch_local_irq_disable()
-#define raw_local_irq_enable() arch_local_irq_enable()
+#define raw_local_irq_disable() \
+ do { \
+ arch_local_irq_disable(); \
+ dept_hardirqs_off(_THIS_IP_); \
+ } while (0)
+#define raw_local_irq_enable() \
+ do { \
+ dept_hardirqs_on(_THIS_IP_); \
+ arch_local_irq_enable(); \
+ } while (0)
#define raw_local_irq_save(flags) \
do { \
typecheck(unsigned long, flags); \
flags = arch_local_irq_save(); \
+ dept_hardirqs_off(_THIS_IP_); \
} while (0)
#define raw_local_irq_restore(flags) \
do { \
typecheck(unsigned long, flags); \
raw_check_bogus_irq_restore(); \
+ if (!arch_irqs_disabled_flags(flags)) \
+ dept_hardirqs_on(_THIS_IP_); \
arch_local_irq_restore(flags); \
} while (0)
#define raw_local_save_flags(flags) \
diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h
index 975e33b..39f6778 100644
--- a/include/linux/local_lock_internal.h
+++ b/include/linux/local_lock_internal.h
@@ -21,6 +21,7 @@
.name = #lockname, \
.wait_type_inner = LD_WAIT_CONFIG, \
.lock_type = LD_LOCK_PERCPU, \
+ .dmap = DEPT_MAP_INITIALIZER(lockname, NULL),\
}, \
.owner = NULL,

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 1f1099d..3c0b10e 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -12,6 +12,7 @@

#include <linux/lockdep_types.h>
#include <linux/smp.h>
+#include <linux/dept_ldt.h>
#include <asm/percpu.h>

struct task_struct;
@@ -39,6 +40,8 @@ static inline void lockdep_copy_map(struct lockdep_map *to,
*/
for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
to->class_cache[i] = NULL;
+
+ dept_map_copy(&to->dmap, &from->dmap);
}

/*
@@ -441,7 +444,8 @@ enum xhlock_context_t {
* Note that _name must not be NULL.
*/
#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
- { .name = (_name), .key = (void *)(_key), }
+ { .name = (_name), .key = (void *)(_key), \
+ .dmap = DEPT_MAP_INITIALIZER(_name, _key) }

static inline void lockdep_invariant_state(bool force) {}
static inline void lockdep_free_task(struct task_struct *task) {}
@@ -523,33 +527,95 @@ static inline void print_irqtrace_events(struct task_struct *curr)
#define lock_acquire_shared(l, s, t, n, i) lock_acquire(l, s, t, 1, 1, n, i)
#define lock_acquire_shared_recursive(l, s, t, n, i) lock_acquire(l, s, t, 2, 1, n, i)

-#define spin_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
-#define spin_acquire_nest(l, s, t, n, i) lock_acquire_exclusive(l, s, t, n, i)
-#define spin_release(l, i) lock_release(l, i)
-
-#define rwlock_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
+#define spin_acquire(l, s, t, i) \
+do { \
+ ldt_lock(&(l)->dmap, s, t, NULL, i); \
+ lock_acquire_exclusive(l, s, t, NULL, i); \
+} while (0)
+#define spin_acquire_nest(l, s, t, n, i) \
+do { \
+ ldt_lock(&(l)->dmap, s, t, n, i); \
+ lock_acquire_exclusive(l, s, t, n, i); \
+} while (0)
+#define spin_release(l, i) \
+do { \
+ ldt_unlock(&(l)->dmap, i); \
+ lock_release(l, i); \
+} while (0)
+#define rwlock_acquire(l, s, t, i) \
+do { \
+ if (read_lock_is_recursive()) \
+ ldt_wlock(&(l)->dmap, s, t, NULL, i); \
+ else \
+ ldt_lock(&(l)->dmap, s, t, NULL, i); \
+ lock_acquire_exclusive(l, s, t, NULL, i); \
+} while (0)
#define rwlock_acquire_read(l, s, t, i) \
do { \
if (read_lock_is_recursive()) \
+ ldt_rlock(&(l)->dmap, s, t, NULL, i); \
+ else \
+ ldt_lock(&(l)->dmap, s, t, NULL, i); \
+ if (read_lock_is_recursive()) \
lock_acquire_shared_recursive(l, s, t, NULL, i); \
else \
lock_acquire_shared(l, s, t, NULL, i); \
} while (0)
-
-#define rwlock_release(l, i) lock_release(l, i)
-
-#define seqcount_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
-#define seqcount_acquire_read(l, s, t, i) lock_acquire_shared_recursive(l, s, t, NULL, i)
-#define seqcount_release(l, i) lock_release(l, i)
-
-#define mutex_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
-#define mutex_acquire_nest(l, s, t, n, i) lock_acquire_exclusive(l, s, t, n, i)
-#define mutex_release(l, i) lock_release(l, i)
-
-#define rwsem_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
-#define rwsem_acquire_nest(l, s, t, n, i) lock_acquire_exclusive(l, s, t, n, i)
-#define rwsem_acquire_read(l, s, t, i) lock_acquire_shared(l, s, t, NULL, i)
-#define rwsem_release(l, i) lock_release(l, i)
+#define rwlock_release(l, i) \
+do { \
+ ldt_unlock(&(l)->dmap, i); \
+ lock_release(l, i); \
+} while (0)
+#define seqcount_acquire(l, s, t, i) \
+do { \
+ ldt_wlock(&(l)->dmap, s, t, NULL, i); \
+ lock_acquire_exclusive(l, s, t, NULL, i); \
+} while (0)
+#define seqcount_acquire_read(l, s, t, i) \
+do { \
+ ldt_rlock(&(l)->dmap, s, t, NULL, i); \
+ lock_acquire_shared_recursive(l, s, t, NULL, i); \
+} while (0)
+#define seqcount_release(l, i) \
+do { \
+ ldt_unlock(&(l)->dmap, i); \
+ lock_release(l, i); \
+} while (0)
+#define mutex_acquire(l, s, t, i) \
+do { \
+ ldt_lock(&(l)->dmap, s, t, NULL, i); \
+ lock_acquire_exclusive(l, s, t, NULL, i); \
+} while (0)
+#define mutex_acquire_nest(l, s, t, n, i) \
+do { \
+ ldt_lock(&(l)->dmap, s, t, n, i); \
+ lock_acquire_exclusive(l, s, t, n, i); \
+} while (0)
+#define mutex_release(l, i) \
+do { \
+ ldt_unlock(&(l)->dmap, i); \
+ lock_release(l, i); \
+} while (0)
+#define rwsem_acquire(l, s, t, i) \
+do { \
+ ldt_lock(&(l)->dmap, s, t, NULL, i); \
+ lock_acquire_exclusive(l, s, t, NULL, i); \
+} while (0)
+#define rwsem_acquire_nest(l, s, t, n, i) \
+do { \
+ ldt_lock(&(l)->dmap, s, t, n, i); \
+ lock_acquire_exclusive(l, s, t, n, i); \
+} while (0)
+#define rwsem_acquire_read(l, s, t, i) \
+do { \
+ ldt_lock(&(l)->dmap, s, t, NULL, i); \
+ lock_acquire_shared(l, s, t, NULL, i); \
+} while (0)
+#define rwsem_release(l, i) \
+do { \
+ ldt_unlock(&(l)->dmap, i); \
+ lock_release(l, i); \
+} while (0)

#define lock_map_acquire(l) lock_acquire_exclusive(l, 0, 0, NULL, _THIS_IP_)
#define lock_map_acquire_read(l) lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_)
diff --git a/include/linux/lockdep_types.h b/include/linux/lockdep_types.h
index d224308..50c8879 100644
--- a/include/linux/lockdep_types.h
+++ b/include/linux/lockdep_types.h
@@ -11,6 +11,7 @@
#define __LINUX_LOCKDEP_TYPES_H

#include <linux/types.h>
+#include <linux/dept.h>

#define MAX_LOCKDEP_SUBCLASSES 8UL

@@ -76,6 +77,7 @@ struct lock_class_key {
struct hlist_node hash_entry;
struct lockdep_subclass_key subkeys[MAX_LOCKDEP_SUBCLASSES];
};
+ struct dept_key dkey;
};

extern struct lock_class_key __lockdep_no_validate__;
@@ -185,6 +187,7 @@ struct lockdep_map {
int cpu;
unsigned long ip;
#endif
+ struct dept_map dmap;
};

struct pin_cookie { unsigned int val; };
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 8f226d4..58bf314 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -25,6 +25,7 @@
, .dep_map = { \
.name = #lockname, \
.wait_type_inner = LD_WAIT_SLEEP, \
+ .dmap = DEPT_MAP_INITIALIZER(lockname, NULL),\
}
#else
# define __DEP_MAP_MUTEX_INITIALIZER(lockname)
diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index 36b942b..e871aca 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -21,7 +21,7 @@ struct percpu_rw_semaphore {
};

#ifdef CONFIG_DEBUG_LOCK_ALLOC
-#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname },
+#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname, .dmap = DEPT_MAP_INITIALIZER(lockname, NULL) },
#else
#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname)
#endif
diff --git a/include/linux/rtmutex.h b/include/linux/rtmutex.h
index 7d04988..35889ac 100644
--- a/include/linux/rtmutex.h
+++ b/include/linux/rtmutex.h
@@ -81,6 +81,7 @@ static inline void rt_mutex_debug_task_free(struct task_struct *tsk) { }
.dep_map = { \
.name = #mutexname, \
.wait_type_inner = LD_WAIT_SLEEP, \
+ .dmap = DEPT_MAP_INITIALIZER(mutexname, NULL),\
}
#else
#define __DEP_MAP_RT_MUTEX_INITIALIZER(mutexname)
diff --git a/include/linux/rwlock_types.h b/include/linux/rwlock_types.h
index 1948442..6e58dfc 100644
--- a/include/linux/rwlock_types.h
+++ b/include/linux/rwlock_types.h
@@ -10,6 +10,7 @@
.dep_map = { \
.name = #lockname, \
.wait_type_inner = LD_WAIT_CONFIG, \
+ .dmap = DEPT_MAP_INITIALIZER(lockname, NULL), \
}
#else
# define RW_DEP_MAP_INIT(lockname)
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index efa5c32..4f856e7 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -21,6 +21,7 @@
.dep_map = { \
.name = #lockname, \
.wait_type_inner = LD_WAIT_SLEEP, \
+ .dmap = DEPT_MAP_INITIALIZER(lockname, NULL),\
},
#else
# define __RWSEM_DEP_MAP_INIT(lockname)
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 3926e90..6ba00bc 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -81,7 +81,7 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
#ifdef CONFIG_DEBUG_LOCK_ALLOC

# define SEQCOUNT_DEP_MAP_INIT(lockname) \
- .dep_map = { .name = #lockname }
+ .dep_map = { .name = #lockname, .dmap = DEPT_MAP_INITIALIZER(lockname, NULL) }

/**
* seqcount_init() - runtime initializer for seqcount_t
diff --git a/include/linux/spinlock_types_raw.h b/include/linux/spinlock_types_raw.h
index 91cb36b..3dcc551 100644
--- a/include/linux/spinlock_types_raw.h
+++ b/include/linux/spinlock_types_raw.h
@@ -31,11 +31,13 @@
.dep_map = { \
.name = #lockname, \
.wait_type_inner = LD_WAIT_SPIN, \
+ .dmap = DEPT_MAP_INITIALIZER(lockname, NULL),\
}
# define SPIN_DEP_MAP_INIT(lockname) \
.dep_map = { \
.name = #lockname, \
.wait_type_inner = LD_WAIT_CONFIG, \
+ .dmap = DEPT_MAP_INITIALIZER(lockname, NULL),\
}

# define LOCAL_SPIN_DEP_MAP_INIT(lockname) \
@@ -43,6 +45,7 @@
.name = #lockname, \
.wait_type_inner = LD_WAIT_CONFIG, \
.lock_type = LD_LOCK_PERCPU, \
+ .dmap = DEPT_MAP_INITIALIZER(lockname, NULL),\
}
#else
# define RAW_SPIN_DEP_MAP_INIT(lockname)
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 9b9d0bb..c934158 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -35,7 +35,7 @@ int __init_srcu_struct(struct srcu_struct *ssp, const char *name,
__init_srcu_struct((ssp), #ssp, &__srcu_key); \
})

-#define __SRCU_DEP_MAP_INIT(srcu_name) .dep_map = { .name = #srcu_name },
+#define __SRCU_DEP_MAP_INIT(srcu_name) .dep_map = { .name = #srcu_name, .dmap = DEPT_MAP_INITIALIZER(srcu_name, NULL) },
#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */

int init_srcu_struct(struct srcu_struct *ssp);
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index a54a770..e950954 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -244,10 +244,10 @@ static inline bool dept_working(void)
* Even k == NULL is considered as a valid key because it would use
* &->map_key as the key in that case.
*/
-struct dept_key __dept_no_validate__;
+extern struct lock_class_key __lockdep_no_validate__;
static inline bool valid_key(struct dept_key *k)
{
- return &__dept_no_validate__ != k;
+ return &__lockdep_no_validate__.dkey != k;
}

/*
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e3375bc..abe9298 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1220,6 +1220,8 @@ void lockdep_register_key(struct lock_class_key *key)
struct lock_class_key *k;
unsigned long flags;

+ dept_key_init(&key->dkey);
+
if (WARN_ON_ONCE(static_obj(key)))
return;
hash_head = keyhashentry(key);
@@ -4327,6 +4329,8 @@ void noinstr lockdep_hardirqs_on(unsigned long ip)
{
struct irqtrace_events *trace = &current->irqtrace;

+ dept_hardirqs_on(ip);
+
if (unlikely(!debug_locks))
return;

@@ -4392,6 +4396,8 @@ void noinstr lockdep_hardirqs_on(unsigned long ip)
*/
void noinstr lockdep_hardirqs_off(unsigned long ip)
{
+ dept_hardirqs_off(ip);
+
if (unlikely(!debug_locks))
return;

@@ -4436,6 +4442,8 @@ void lockdep_softirqs_on(unsigned long ip)
{
struct irqtrace_events *trace = &current->irqtrace;

+ dept_softirqs_on(ip);
+
if (unlikely(!lockdep_enabled()))
return;

@@ -4474,6 +4482,9 @@ void lockdep_softirqs_on(unsigned long ip)
*/
void lockdep_softirqs_off(unsigned long ip)
{
+
+ dept_softirqs_off(ip);
+
if (unlikely(!lockdep_enabled()))
return;

@@ -4806,6 +4817,8 @@ void lockdep_init_map_type(struct lockdep_map *lock, const char *name,
{
int i;

+ ldt_init(&lock->dmap, &key->dkey, subclass, name);
+
for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
lock->class_cache[i] = NULL;

@@ -5544,6 +5557,12 @@ void lock_set_class(struct lockdep_map *lock, const char *name,
{
unsigned long flags;

+ /*
+ * dept_map_(re)init() might be called twice redundantly. But
+ * there's no choice as long as Dept relies on Lockdep.
+ */
+ ldt_set_class(&lock->dmap, name, &key->dkey, subclass, ip);
+
if (unlikely(!lockdep_enabled()))
return;

@@ -5561,6 +5580,8 @@ void lock_downgrade(struct lockdep_map *lock, unsigned long ip)
{
unsigned long flags;

+ ldt_downgrade(&lock->dmap, ip);
+
if (unlikely(!lockdep_enabled()))
return;

@@ -6333,6 +6354,8 @@ void lockdep_unregister_key(struct lock_class_key *key)
unsigned long flags;
bool found = false;

+ dept_key_destroy(&key->dkey);
+
might_sleep();

if (WARN_ON_ONCE(static_obj(key)))
--
1.9.1

2023-01-09 04:08:52

by Byungchul Park

Subject: [PATCH RFC v7 23/23] dept: Record the latest one out of consecutive waits of the same class

The current code records all the waits for later use, to track the
relation between waits and events in each context. However, since waits
of the same class are handled the same way, it is enough to record only
one on behalf of the others if they all have the same class.

Ideally the whole history buffer would be searched for that, but that
would cost too much. As an alternative, keep only the latest one when
waits of the same class appear consecutively.

Signed-off-by: Byungchul Park <[email protected]>
---
kernel/dependency/dept.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index cd25995..9cd37b4 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -1521,9 +1521,28 @@ static inline struct dept_wait_hist *new_hist(void)
return wh;
}

+static inline struct dept_wait_hist *last_hist(void)
+{
+ int pos_n = hist_pos_next();
+ struct dept_wait_hist *wh_n = hist(pos_n);
+
+ /*
+ * This is the first try.
+ */
+ if (!pos_n && !wh_n->wait)
+ return NULL;
+
+ return hist(pos_n + DEPT_MAX_WAIT_HIST - 1);
+}
+
static void add_hist(struct dept_wait *w, unsigned int wg, unsigned int ctxt_id)
{
- struct dept_wait_hist *wh = new_hist();
+ struct dept_wait_hist *wh;
+
+ wh = last_hist();
+
+ if (!wh || wh->wait->class != w->class || wh->ctxt_id != ctxt_id)
+ wh = new_hist();

if (likely(wh->wait))
put_wait(wh->wait);
--
1.9.1

2023-01-09 04:09:03

by Byungchul Park

Subject: [PATCH RFC v7 21/23] dept: Apply timeout consideration to hashed-waitqueue wait

Now that CONFIG_DEPT_AGGRESSIVE_TIMEOUT_WAIT has been introduced, apply
the consideration to the hashed-waitqueue wait, assuming the 'ret' input
of the ___wait_var_event() macro is used as a timeout value.
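
The condition used in the diff below can be read as a single predicate.
The following user-space sketch assumes, as the commit message does,
that 'ret' carries a timeout, and that MAX_SCHEDULE_TIMEOUT equals
LONG_MAX as it does in the kernel headers. toy_is_timed_wait() and
TOY_MAX_SCHEDULE_TIMEOUT are hypothetical names, not part of the series.

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for the kernel's MAX_SCHEDULE_TIMEOUT. */
#define TOY_MAX_SCHEDULE_TIMEOUT LONG_MAX

/*
 * A wait counts as "timed" only if a finite, non-zero timeout was
 * given. Zero (no timeout passed) and the "wait forever" value both
 * select the plain annotation.
 */
static bool toy_is_timed_wait(long timeout)
{
	return timeout != 0 && timeout != TOY_MAX_SCHEDULE_TIMEOUT;
}
```

A caller would pick the _timeout variant of the annotation when the
predicate is true and the plain variant otherwise.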

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/wait_bit.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h
index bad30ba..b504815 100644
--- a/include/linux/wait_bit.h
+++ b/include/linux/wait_bit.h
@@ -247,7 +247,10 @@ struct wait_bit_queue_entry {
struct wait_bit_queue_entry __wbq_entry; \
long __ret = ret; /* explicit shadow */ \
\
- sdt_might_sleep_weak(NULL); \
+ if (!__ret || __ret == MAX_SCHEDULE_TIMEOUT) \
+ sdt_might_sleep_weak(NULL); \
+ else \
+ sdt_might_sleep_weak_timeout(NULL); \
init_wait_var_entry(&__wbq_entry, var, \
exclusive ? WQ_FLAG_EXCLUSIVE : 0); \
for (;;) { \
--
1.9.1

2023-01-09 09:15:01

by Sergey Shtylyov

Subject: Re: [PATCH RFC v7 08/23] dept: Apply sdt_might_sleep_strong() to PG_{locked,writeback} wait

On 1/9/23 6:33 AM, Byungchul Park wrote:

> Makes Dept able to track dependencies by PG_{locked,writeback} waits.
>
> Signed-off-by: Byungchul Park <[email protected]>
> ---
> mm/filemap.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index c4d4ace..b013a5b 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
[...]
> @@ -1226,6 +1230,11 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
> unsigned long pflags;
> bool in_thrashing;
>
> + if (bit_nr == PG_locked)
> + sdt_might_sleep_strong(&PG_locked_map);
> + else if (bit_nr == PG_writeback)
> + sdt_might_sleep_strong(&PG_writeback_map);

Hm, this is asking to be a *switch* statement instead...
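
A switch-based form of the quoted hunk might look roughly like the
following user-space sketch. The toy_* names are hypothetical stubs for
PG_locked, PG_writeback, the two dept maps and
sdt_might_sleep_strong(); only the control-flow shape is the point.

```c
/* Hypothetical stand-ins for the page flag bit numbers. */
enum toy_pageflags {
	TOY_PG_locked,
	TOY_PG_writeback,
	TOY_PG_other
};

/* Hypothetical stand-in for a dept map; counts annotated waits. */
struct toy_map {
	int waits;
};

static struct toy_map toy_locked_map;
static struct toy_map toy_writeback_map;

/* Stub for sdt_might_sleep_strong(): just record the call. */
static void toy_might_sleep_strong(struct toy_map *m)
{
	m->waits++;
}

/* Same decision as the if/else-if chain, written as a switch. */
static void toy_annotate_wait(int bit_nr)
{
	switch (bit_nr) {
	case TOY_PG_locked:
		toy_might_sleep_strong(&toy_locked_map);
		break;
	case TOY_PG_writeback:
		toy_might_sleep_strong(&toy_writeback_map);
		break;
	default:
		/* Other bits are not annotated. */
		break;
	}
}
```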

[...]

MBR, Sergey

2023-01-17 19:15:13

by Boqun Feng

Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

[Cc Waiman]

On Mon, Jan 16, 2023 at 10:00:52AM -0800, Linus Torvalds wrote:
> [ Back from travel, so trying to make sense of this series.. ]
>
> On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park <[email protected]> wrote:
> >
> > I've been developing a tool for detecting deadlock possibilities by
> > tracking wait/event rather than lock(?) acquisition order to try to
> > cover all synchronization mechanisms. It's done on v6.2-rc2.
>
> Ugh. I hate how this adds random patterns like
>
> if (timeout == MAX_SCHEDULE_TIMEOUT)
> sdt_might_sleep_strong(NULL);
> else
> sdt_might_sleep_strong_timeout(NULL);
> ...
> sdt_might_sleep_finish();
>
> to various places, it seems so very odd and unmaintainable.
>
> I also recall this giving a fair amount of false positives, are they all fixed?
>

From the following part in the cover letter, I guess the answer is no?

...
6. Multiple reports are allowed.
7. Deduplication control on multiple reports.
8. Withstand false positives thanks to 6.
...

It seems to me the logic is that, since DEPT allows multiple reports,
false positives can be filtered out by users?

> Anyway, I'd really like the lockdep people to comment and be involved.

I was never Cced, so I was unaware of this for a long time...

A few comments after a quick look:

* Looks like the DEPT dependency graph doesn't handle the
fair/unfair readers as lockdep currently does, which brings up the
next question.

* Can DEPT pass all the selftests of lockdep in
lib/locking-selftest.c?

* Instead of introducing a brand new detector/dependency tracker,
could we first improve the lockdep's dependency tracker? I think
Byungchul also agrees that DEPT and lockdep should share the
same dependency tracker and the benefit of improving the
existing one is that we can always use the self test to catch
any regression. Thoughts?

Actually, the above suggestion is just to revert the revert of
cross-release without exposing any annotation, which I think is more
practical to review and test.

I'd suggest we 1) first improve the lockdep dependency tracker with
wait/event in mind, then 2) introduce wait-related annotations that
users can use, then 3) look for practical ways to resolve false
positives/multiple reports with the help of users, and, if all goes
well, 4) annotate all operations.

Thoughts?

Regards,
Boqun

> We did have a fairly recent case of "lockdep doesn't track page lock
> dependencies because it fundamentally cannot" issue, so DEPT might fix
> those kinds of missing dependency analysis. See
>
> https://lore.kernel.org/lkml/[email protected]/
>
> for some context to that one, but at teh same time I would *really*
> want the lockdep people more involved and acking this work.
>
> Maybe I missed the email where you reported on things DEPT has found
> (and on the lack of false positives)?
>
> Linus
>

2023-01-17 19:53:53

by Waiman Long

Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

On 1/17/23 13:18, Boqun Feng wrote:
> [Cc Waiman]
>
> On Mon, Jan 16, 2023 at 10:00:52AM -0800, Linus Torvalds wrote:
>> [ Back from travel, so trying to make sense of this series.. ]
>>
>> On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park <[email protected]> wrote:
>>> I've been developing a tool for detecting deadlock possibilities by
>>> tracking wait/event rather than lock(?) acquisition order to try to
>>> cover all synchonization machanisms. It's done on v6.2-rc2.
>> Ugh. I hate how this adds random patterns like
>>
>> if (timeout == MAX_SCHEDULE_TIMEOUT)
>> sdt_might_sleep_strong(NULL);
>> else
>> sdt_might_sleep_strong_timeout(NULL);
>> ...
>> sdt_might_sleep_finish();
>>
>> to various places, it seems so very odd and unmaintainable.
>>
>> I also recall this giving a fair amount of false positives, are they all fixed?
>>
> From the following part in the cover letter, I guess the answer is no?
>
> ...
> 6. Multiple reports are allowed.
> 7. Deduplication control on multiple reports.
> 8. Withstand false positives thanks to 6.
> ...
>
> seems to me that the logic is since DEPT allows multiple reports so that
> false positives are fitlerable by users?
>
>> Anyway, I'd really like the lockdep people to comment and be involved.
> I never get Cced, so I'm unware of this for a long time...
>
> A few comments after a quick look:
>
> * Looks like the DEPT dependency graph doesn't handle the
> fair/unfair readers as lockdep current does. Which bring the
> next question.
>
> * Can DEPT pass all the selftests of lockdep in
> lib/locking-selftests.c?
>
> * Instead of introducing a brand new detector/dependency tracker,
> could we first improve the lockdep's dependency tracker? I think
> Byungchul also agrees that DEPT and lockdep should share the
> same dependency tracker and the benefit of improving the
> existing one is that we can always use the self test to catch
> any regression. Thoughts?
>
> Actually the above sugguest is just to revert revert cross-release
> without exposing any annotation, which I think is more practical to
> review and test.
>
> I'd sugguest we 1) first improve the lockdep dependency tracker with
> wait/event in mind and then 2) introduce wait related annotation so that
> users can use, and then 3) look for practical ways to resolve false
> positives/multi reports with the help of users, if all goes well,
> 4) make it all operation annotated.

I agree with your suggestions. In fact, the lockdep code itself is one
of the major overheads when running a debug kernel. If we have another,
parallel dependency tracker, we may slow down a debug kernel even
more. So I would prefer improving the existing lockdep code
instead of creating a completely new one.

I do agree that the lockdep code itself is now rather complex. A
separate dependency tracker, however, may undergo a similar
transformation over time and become more and more complex due to the
need to meet different requirements and constraints.

Cheers,
Longman

2023-01-18 13:37:24

by Thomas Gleixner

Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

On Tue, Jan 17 2023 at 10:18, Boqun Feng wrote:
> On Mon, Jan 16, 2023 at 10:00:52AM -0800, Linus Torvalds wrote:
>> I also recall this giving a fair amount of false positives, are they all fixed?
>
> From the following part in the cover letter, I guess the answer is no?
> ...
> 6. Multiple reports are allowed.
> 7. Deduplication control on multiple reports.
> 8. Withstand false positives thanks to 6.
> ...
>
> seems to me that the logic is since DEPT allows multiple reports so that
> false positives are fitlerable by users?

I really do not know what's so valuable about multiple reports. They
produce a flood of information which needs to be filtered, while a
single report ensures that the first detected issue is dumped, which
increases the probability that it can be recorded and acted upon.

Filtering out false positives is just the wrong approach. Decoding
dependency issues from any tracker is complex enough given the nature of
the problem, so adding the burden of filtering out issues from a stream
of dumps is not helpful at all. It's just a marketing gag.

> * Instead of introducing a brand new detector/dependency tracker,
> could we first improve the lockdep's dependency tracker? I think
> Byungchul also agrees that DEPT and lockdep should share the
> same dependency tracker and the benefit of improving the
> existing one is that we can always use the self test to catch
> any regression. Thoughts?

Ack. If the internal implementation of lockdep has shortcomings, then we
can expand and/or replace it instead of having yet another
infrastructure which is not even remotely as mature.

Thanks,

tglx

2023-01-18 13:37:48

by Thomas Gleixner

Subject: Re: [PATCH RFC v7 07/23] dept: Apply sdt_might_sleep_strong() to wait_for_completion()/complete()

On Mon, Jan 09 2023 at 12:33, Byungchul Park wrote:
> Makes Dept able to track dependencies by
> wait_for_completion()/complete().
>
> In order to obtain the meaningful caller points, replace all the
> wait_for_completion*() declarations with macros in the header.

That's just wrong.

> -extern void wait_for_completion(struct completion *);
> +extern void raw_wait_for_completion(struct completion *);

> +#define wait_for_completion(x) \
> +({ \
> + sdt_might_sleep_strong(NULL); \
> + raw_wait_for_completion(x); \
> + sdt_might_sleep_finish(); \
> +})

The very same can be achieved with a proper annotation which does not
enforce THIS_IP but allows to use __builtin_return_address($N). That's
what everything else uses too.
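
A small userspace sketch of the idea, assuming GCC or Clang. The RET_IP macro mirrors the kernel's _RET_IP_ definition; model_stage_wait() and model_wait_for_completion() are made-up names for illustration and are not DEPT APIs. The waiting function stays a real out-of-line function and records its caller through the return address, so no macro wrapper is needed in the header.

```c
#include <assert.h>
#include <stdint.h>

/* Same construction as the kernel's _RET_IP_. */
#define RET_IP ((uintptr_t)__builtin_return_address(0))

/* Last caller address handed to the (hypothetical) annotation. */
static uintptr_t last_wait_ip;

/* Hypothetical annotation that takes an explicit instruction pointer. */
static void model_stage_wait(uintptr_t ip)
{
	last_wait_ip = ip;
}

/*
 * Remains an ordinary function. noinline keeps a call frame so that
 * the return address points into the caller.
 */
__attribute__((noinline)) void model_wait_for_completion(void)
{
	model_stage_wait(RET_IP);
	/* ... the actual wait would go here ... */
}
```

Callers keep calling the function as before; the address that ends up in last_wait_ip is the instruction following their call.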

Thanks,

tglx

2023-01-18 13:38:18

by Thomas Gleixner

Subject: Re: [PATCH RFC v7 06/23] dept: Add proc knobs to show stats and dependency graph

On Mon, Jan 09 2023 at 12:33, Byungchul Park wrote:
> It'd be useful to show Dept internal stats and dependency graph on
> runtime via proc for better information. Introduced the knobs.

proc?

That's what debugfs is for.

2023-01-18 13:40:32

by Thomas Gleixner

Subject: Re: [PATCH RFC v7 03/23] dept: Add single event dependency tracker APIs

On Mon, Jan 09 2023 at 12:33, Byungchul Park wrote:
> +/*
> + * sdt_might_sleep() and its family will be committed in __schedule()
> + * when it actually gets to __schedule(). Both dept_request_event() and
> + * dept_wait() will be performed on the commit.
> + */
> +
> +/*
> + * Use the code location as the class key if an explicit map is not used.
> + */
> +#define sdt_might_sleep_strong(m) \
> + do { \
> + struct dept_map *__m = m; \
> + static struct dept_key __key; \
> + dept_stage_wait(__m, __m ? NULL : &__key, _THIS_IP_, __func__, true);\
> + } while (0)
> +
> +/*
> + * Use the code location as the class key if an explicit map is not used.
> + */
> +#define sdt_might_sleep_weak(m) \
> + do { \
> + struct dept_map *__m = m; \
> + static struct dept_key __key; \
> + dept_stage_wait(__m, __m ? NULL : &__key, _THIS_IP_, __func__, false);\
> + } while (0)
> +
> +#define sdt_might_sleep_finish() dept_clean_stage()
> +
> +#define sdt_ecxt_enter(m) dept_ecxt_enter(m, 1UL, _THIS_IP_, "start", "event", 0)
> +#define sdt_event(m) dept_event(m, 1UL, _THIS_IP_, __func__)
> +#define sdt_ecxt_exit(m) dept_ecxt_exit(m, 1UL, _THIS_IP_)

None of the above comes with proper documentation of the various
macros/functions. How should anyone aside from you understand what this
is about and how this should be used?

Thanks,

tglx

2023-01-19 01:16:11

by Byungchul Park

Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

Torvalds wrote:
> On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park <[email protected]> wrote:
>>
>> I've been developing a tool for detecting deadlock possibilities by
>> tracking wait/event rather than lock(?) acquisition order to try to
>> cover all synchonization machanisms. It's done on v6.2-rc2.
>
> Ugh. I hate how this adds random patterns like

I understand what you mean. But all the synchronization primitives
should let DEPT know the beginning and the end of each. However, I will
remove the ugly-looking 'if' statement in the next spin, and move
the pattern to a better place if possible.
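
One possible shape for that, as a userspace sketch only: the wrapper sdt_might_sleep_strong_for() and the helper model_wait() are hypothetical names and not part of this series, and the two annotations are replaced by call-counting stubs. Keeping the wrapper a macro means it still expands at each call site.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MAX_SCHEDULE_TIMEOUT LONG_MAX	/* same value as in the kernel */

struct dept_map;	/* opaque in this sketch */

static int strong_calls, strong_timeout_calls;

/* Stubs for the two annotations in the quoted pattern; they only count. */
static void sdt_might_sleep_strong(struct dept_map *m)
{
	(void)m;
	strong_calls++;
}

static void sdt_might_sleep_strong_timeout(struct dept_map *m)
{
	(void)m;
	strong_timeout_calls++;
}

/* Hypothetical wrapper that hides the timeout check from call sites. */
#define sdt_might_sleep_strong_for(m, timeout)			\
	do {							\
		if ((timeout) == MAX_SCHEDULE_TIMEOUT)		\
			sdt_might_sleep_strong(m);		\
		else						\
			sdt_might_sleep_strong_timeout(m);	\
	} while (0)

/* A call site then needs a single annotation line before sleeping. */
static void model_wait(long timeout)
{
	sdt_might_sleep_strong_for(NULL, timeout);
	/* ... sleep, then sdt_might_sleep_finish() in the real code ... */
}
```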

> if (timeout == MAX_SCHEDULE_TIMEOUT)
> sdt_might_sleep_strong(NULL);
> else
> sdt_might_sleep_strong_timeout(NULL);
> ...
> sdt_might_sleep_finish();
>
> to various places, it seems so very odd and unmaintainable.
>
> I also recall this giving a fair amount of false positives, are they all fixed?

Yes. Of course I removed all the false positives we found.

> Anyway, I'd really like the lockdep people to comment and be involved.
> We did have a fairly recent case of "lockdep doesn't track page lock
> dependencies because it fundamentally cannot" issue, so DEPT might fix
> those kinds of missing dependency analysis. See

Sure. That's exactly what DEPT works for e.g. PG_locked.

> https://lore.kernel.org/lkml/[email protected]/

I will reproduce it and share the result.

> for some context to that one, but at teh same time I would *really*
> want the lockdep people more involved and acking this work.
>
> Maybe I missed the email where you reported on things DEPT has found
> (and on the lack of false positives)?

Maybe you didn't miss it. It's still too hard to decide between:

Aggressive detection with false alarms that need to be fixed by
manual classification as Lockdep did, focusing more on potential
possibilities.

versus

Conservative detection with few false alarms, which requires us
to test much longer to get the results we expect, focusing on what
actually happens.

>
> Linus

Byungchul

2023-01-19 06:37:36

by Byungchul Park

Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

Boqun wrote:
> On Mon, Jan 16, 2023 at 10:00:52AM -0800, Linus Torvalds wrote:
> > [ Back from travel, so trying to make sense of this series.. ]
> >
> > On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park <[email protected]> wrote:
> > >
> > > I've been developing a tool for detecting deadlock possibilities by
> > > tracking wait/event rather than lock(?) acquisition order to try to
> > > cover all synchonization machanisms. It's done on v6.2-rc2.
> >
> > Ugh. I hate how this adds random patterns like
> >
> > if (timeout == MAX_SCHEDULE_TIMEOUT)
> > sdt_might_sleep_strong(NULL);
> > else
> > sdt_might_sleep_strong_timeout(NULL);
> > ...
> > sdt_might_sleep_finish();
> >
> > to various places, it seems so very odd and unmaintainable.
> >
> > I also recall this giving a fair amount of false positives, are they all fixed?
> >
>
> From the following part in the cover letter, I guess the answer is no?

I fixed what we found anyway.

> ...
> 6. Multiple reports are allowed.
> 7. Deduplication control on multiple reports.
> 8. Withstand false positives thanks to 6.
> ...
>
> seems to me that the logic is since DEPT allows multiple reports so that
> false positives are fitlerable by users?

At least, it's needed until DEPT is considered stable, because stronger
detection inevitably has a higher chance of false alarms unless we
manually fix each one, which is the same as with Lockdep.

> > Anyway, I'd really like the lockdep people to comment and be involved.
>
> I never get Cced, so I'm unware of this for a long time...

Sorry I missed it. I will cc you from now on.

> A few comments after a quick look:
>
> * Looks like the DEPT dependency graph doesn't handle the
> fair/unfair readers as lockdep current does. Which bring the
> next question.

No. DEPT works better for unfair read. It works based on wait/event. So
read_lock() is considered a potential wait waiting on write_unlock(),
while write_lock() is considered a potential wait waiting on either
write_unlock() or read_unlock(). DEPT works perfectly for it.

For fair read (maybe you meant queued read lock), I think the case
should be handled in the same way as a normal lock. I might have gotten
it wrong. Please let me know if I missed something.

> * Can DEPT pass all the selftests of lockdep in
> lib/locking-selftests.c?
>
> * Instead of introducing a brand new detector/dependency tracker,
> could we first improve the lockdep's dependency tracker? I think

At the beginning of this work, of course I was thinking of improving
Lockdep, but I decided to implement a new tool because:

1. What we need to check for deadlock detection is no longer
lock dependency but the more fundamental dependency by wait/event.
A better design would have a separate dependency engine for
that, not within Lockdep. Remember that lock/unlock are also
wait/event after all.

2. I was thinking of reverting the revert of cross-release. But it
will start to report false alarms as Lockdep did at the
beginning, and require us to keep fixing things until we are
able to see what we are interested in, maybe forever. How
can we control that situation? I wouldn't use this extension.

3. Okay. Let's think about modifying the current Lockdep to make
it work similarly to DEPT. It'd require more effort
than developing a new simple tool from scratch with the
basic requirements.

4. A big change all at once? No way. The new tool needs to
mature, and there are those who want to make use of DEPT at
the same time. The best approach, I think, would be to go along
together for a while.

Please look not at each detail but at the big picture, the architecture.
Plus, please consider that I am introducing a tool focusing only on the
fundamental dependency itself, which Lockdep can make use of. I wish
great developers like you would join in improving the common engine
together.

> Byungchul also agrees that DEPT and lockdep should share the
> same dependency tracker and the benefit of improving the

I agree that both should share a single tracker.

> existing one is that we can always use the self test to catch
> any regression. Thoughts?

I imagine the final form would look like the following:

Lock correctness checker(LOCKDEP)
+-----------------------------------------+
| Lock usage correctness check |
| |
| |
| (Request dependency check) |
| T |
+---------------------------|-------------+
|
Dependency tracker(DEPT) V
+-----------------------------------------+
| Dependency check |
| (by tracking wait and event context) |
+-----------------------------------------+

> Actually the above sugguest is just to revert revert cross-release
> without exposing any annotation, which I think is more practical to
> review and test.

Reverting the revert of cross-release is not bad. But I'd suggest a
nicer design for the reasons I explained above.

Byungchul

> I'd sugguest we 1) first improve the lockdep dependency tracker with
> wait/event in mind and then 2) introduce wait related annotation so that
> users can use, and then 3) look for practical ways to resolve false
> positives/multi reports with the help of users, if all goes well,
> 4) make it all operation annotated.
>
> Thoughts?
>
> Regards,
> Boqun

2023-01-19 07:15:33

by Byungchul Park

Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

Byungchul wrote:
> Boqun wrote:
> > On Mon, Jan 16, 2023 at 10:00:52AM -0800, Linus Torvalds wrote:
> > > [ Back from travel, so trying to make sense of this series.. ]
> > >
> > > On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park <[email protected]> wrote:
> > > >
> > > > I've been developing a tool for detecting deadlock possibilities by
> > > > tracking wait/event rather than lock(?) acquisition order to try to
> > > > cover all synchonization machanisms. It's done on v6.2-rc2.
> > >
> > > Ugh. I hate how this adds random patterns like
> > >
> > > if (timeout == MAX_SCHEDULE_TIMEOUT)
> > > sdt_might_sleep_strong(NULL);
> > > else
> > > sdt_might_sleep_strong_timeout(NULL);
> > > ...
> > > sdt_might_sleep_finish();
> > >
> > > to various places, it seems so very odd and unmaintainable.
> > >
> > > I also recall this giving a fair amount of false positives, are they all fixed?
> > >
> >
> > From the following part in the cover letter, I guess the answer is no?
>
> I fixed what we found anyway.
>
> > ...
> > 6. Multiple reports are allowed.
> > 7. Deduplication control on multiple reports.
> > 8. Withstand false positives thanks to 6.
> > ...
> >
> > seems to me that the logic is since DEPT allows multiple reports so that
> > false positives are fitlerable by users?
>
> At lease, it's needed until DEPT is considered stable because stronger
> detection inevitably has more chance of false alarms unless we do manual
> fix on each, which is the same as Lockdep.
>
> > > Anyway, I'd really like the lockdep people to comment and be involved.
> >
> > I never get Cced, so I'm unware of this for a long time...
>
> Sorry I missed it. I will cc you from now on.
>
> > A few comments after a quick look:
> >
> > * Looks like the DEPT dependency graph doesn't handle the
> > fair/unfair readers as lockdep current does. Which bring the
> > next question.
>
> No. DEPT works better for unfair read. It works based on wait/event. So
> read_lock() is considered a potential wait waiting on write_unlock()
> while write_lock() is considered a potential wait waiting on either
> write_unlock() or read_unlock(). DEPT is working perfect for it.
>
> For fair read (maybe you meant queued read lock), I think the case
> should be handled in the same way as normal lock. I might get it wrong.
> Please let me know if I miss something.
>
> > * Can DEPT pass all the selftests of lockdep in
> > lib/locking-selftests.c?
> >
> > * Instead of introducing a brand new detector/dependency tracker,
> > could we first improve the lockdep's dependency tracker? I think
>
> At the beginning of this work, of course I was thinking to improve
> Lockdep but I decided to implement a new tool because:
>
> 1. What we need to check for deadlock detection is no longer
> lock dependency but more fundamental dependency by wait/event.
> A better design would have a separate dependency engine for
> that, not within Lockdep. Remind lock/unlock are also
> wait/event after all.
>
> 2. I was thinking to revert the revert of cross-release. But it
> will start to report false alarms as Lockdep was at the
> beginning, and require us to keep fixing things until being
> able to see what we are interested in, maybe for ever. How
> can we control that situation? I wouldn't use this extention.
>
> 3. Okay. Let's think about modifying the current Lockdep to make
> it work similar to DEPT. It'd require us to pay more effort
> than developing a new simple tool from the scratch with the
> basic requirement.
>
> 4. Big change at once right away? No way. The new tool need to
> be matured and there are ones who want to make use of DEPT at
> the same time. The best approach would be I think to go along
> together for a while.

(Apologies for this. Let me rewrite this part.)

4. A big change all at once? No way. The new feature needs
to mature, and there are those who want to use the new
feature at the same time. The best approach, I think, would be
to go along together for a while.

Thanks,
Byungchul

> Please don't look at each detail but the big picture, the architecture.
> Plus, please consider I introduce a tool only focucing on fundamental
> dependency itself that Lockdep can make use of. I wish great developers
> like you would join improve the common engine togather.
>
> > Byungchul also agrees that DEPT and lockdep should share the
> > same dependency tracker and the benefit of improving the
>
> I agree that both should share a single tracker.
>
> > existing one is that we can always use the self test to catch
> > any regression. Thoughts?
>
> I imagine the follownig look for the final form:
>
> Lock correctness checker(LOCKDEP)
> +-----------------------------------------+
> | Lock usage correctness check |
> | |
> | |
> | (Request dependency check) |
> | T |
> +---------------------------|-------------+
> |
> Dependency tracker(DEPT) V
> +-----------------------------------------+
> | Dependency check |
> | (by tracking wait and event context) |
> +-----------------------------------------+
>
> > Actually the above sugguest is just to revert revert cross-release
> > without exposing any annotation, which I think is more practical to
> > review and test.
>
> Reverting the revert of cross-release is not bad. But I'd suggest a
> nicer design for the reasons I explained above.
>
> Byungchul
>
> > I'd sugguest we 1) first improve the lockdep dependency tracker with
> > wait/event in mind and then 2) introduce wait related annotation so that
> > users can use, and then 3) look for practical ways to resolve false
> > positives/multi reports with the help of users, if all goes well,
> > 4) make it all operation annotated.
> >
> > Thoughts?
> >
> > Regards,
> > Boqun

2023-01-19 09:24:24

by Byungchul Park

Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

Thomas wrote:
> On Tue, Jan 17 2023 at 10:18, Boqun Feng wrote:
> > On Mon, Jan 16, 2023 at 10:00:52AM -0800, Linus Torvalds wrote:
> > > I also recall this giving a fair amount of false positives, are they all fixed?
> >
> > From the following part in the cover letter, I guess the answer is no?
> > ...
> > 6. Multiple reports are allowed.
> > 7. Deduplication control on multiple reports.
> > 8. Withstand false positives thanks to 6.
> > ...
> >
> > seems to me that the logic is since DEPT allows multiple reports so that
> > false positives are fitlerable by users?
>
> I really do not know what's so valuable about multiple reports. They
> produce a flood of information which needs to be filtered, while a
> single report ensures that the first detected issue is dumped, which
> increases the probability that it can be recorded and acted upon.

Given the following two assumptions, you are right.

Assumption 1. There will be too many reports with the multi-report
support, like all the combinations of dependencies between
e.g. in-irq and irq-enabled-context.

Assumption 2. The detection is mature enough that it is rarely
necessary to fix false ones in order to see a true one, so that is not
a big deal.

However, DEPT doesn't generate all the combinations of irq things as
Lockdep does, so we only see a few multi-reports even with the support,
and I admit DEPT hasn't matured enough yet because fine classification
is required anyway to suppress false alarms. That's why I introduced
multi-report support, at least for now. IMHO, it'd still be useful even
if it reports only a few true ones at once w/o false ones some day.

> Filtering out false positives is just the wrong approach. Decoding
> dependency issues from any tracker is complex enough given the nature of
> the problem, so adding the burden of filtering out issues from a stream
> of dumps is not helpful at all. It's just a marketing gag.
>
> > * Instead of introducing a brand new detector/dependency tracker,
> > could we first improve the lockdep's dependency tracker? I think
> > Byungchul also agrees that DEPT and lockdep should share the
> > same dependency tracker and the benefit of improving the
> > existing one is that we can always use the self test to catch
> > any regression. Thoughts?
>
> Ack. If the internal implementation of lockdep has shortcomings, then we
> can expand and/or replace it instead of having yet another
> infrastructure which is not even remotely as mature.

Ultimately, yes. We should expand or replace it instead of having
another one.

Byungchul
>
> Thanks,
>
> tglx

2023-01-19 13:35:46

by Matthew Wilcox

Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

On Thu, Jan 19, 2023 at 03:23:08PM +0900, Byungchul Park wrote:
> Boqun wrote:
> > * Looks like the DEPT dependency graph doesn't handle the
> > fair/unfair readers as lockdep current does. Which bring the
> > next question.
>
> No. DEPT works better for unfair read. It works based on wait/event. So
> read_lock() is considered a potential wait waiting on write_unlock()
> while write_lock() is considered a potential wait waiting on either
> write_unlock() or read_unlock(). DEPT is working perfect for it.
>
> For fair read (maybe you meant queued read lock), I think the case
> should be handled in the same way as normal lock. I might get it wrong.
> Please let me know if I miss something.

From the lockdep/DEPT point of view, the question is whether:

read_lock(A)
read_lock(A)

can deadlock if a writer comes in between the two acquisitions and
sleeps waiting on A to be released. A fair lock will block new
readers when a writer is waiting, while an unfair lock will allow
new readers even while a writer is waiting.
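
The difference can be sketched with a toy admission check. This is only a model for illustration: struct rw_model and both functions are made-up names, and the fields do not reflect the kernel's actual rwlock layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state of a read/write lock. */
struct rw_model {
	int  readers;		/* readers currently holding the lock */
	bool writer_holds;	/* a writer currently owns the lock */
	bool writer_waiting;	/* a writer is queued behind the readers */
};

/* Would a new reader be admitted right now? */
static bool reader_admitted(const struct rw_model *l, bool fair)
{
	if (l->writer_holds)
		return false;	/* readers never overtake an owning writer */
	if (fair && l->writer_waiting)
		return false;	/* fair: queue behind the waiting writer */
	return true;		/* unfair: join the existing readers */
}

/* read_lock(A); a writer arrives; read_lock(A) again in the same task. */
static bool recursive_read_deadlocks(bool fair)
{
	struct rw_model a = {
		.readers = 1,
		.writer_holds = false,
		.writer_waiting = true,
	};

	/* The task cannot drop its first hold while blocked on the second. */
	return !reader_admitted(&a, fair);
}
```

With fair set, the second acquisition is refused while the writer waits for the first hold to go away, which is the deadlock in question; with fair cleared, the second acquisition is admitted.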

2023-01-19 19:32:33

by Boqun Feng

Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

On Thu, Jan 19, 2023 at 01:33:58PM +0000, Matthew Wilcox wrote:
> On Thu, Jan 19, 2023 at 03:23:08PM +0900, Byungchul Park wrote:
> > Boqun wrote:
> > > * Looks like the DEPT dependency graph doesn't handle the
> > > fair/unfair readers as lockdep current does. Which bring the
> > > next question.
> >
> > No. DEPT works better for unfair read. It works based on wait/event. So
> > read_lock() is considered a potential wait waiting on write_unlock()
> > while write_lock() is considered a potential wait waiting on either
> > write_unlock() or read_unlock(). DEPT is working perfect for it.
> >
> > For fair read (maybe you meant queued read lock), I think the case
> > should be handled in the same way as normal lock. I might get it wrong.
> > Please let me know if I miss something.
>
> From the lockdep/DEPT point of view, the question is whether:
>
> read_lock(A)
> read_lock(A)
>
> can deadlock if a writer comes in between the two acquisitions and
> sleeps waiting on A to be released. A fair lock will block new
> readers when a writer is waiting, while an unfair lock will allow
> new readers even while a writer is waiting.
>

To be more accurate, a fair reader will wait if there is a writer
waiting for another reader (fair or not) to unlock, and an unfair reader
won't.

In the kernel there are read/write locks that can have both fair and
unfair readers (e.g. queued rwlock). Regarding deadlocks,

        T0                      T1                      T2
        --                      --                      --
        fair_read_lock(A);
                                write_lock(B);
                                                        write_lock(A);
        write_lock(B);
                                unfair_read_lock(A);

the above is not a deadlock, since T1's unfair reader can "steal" the
lock. However the following is a deadlock:

        T0                      T1                      T2
        --                      --                      --
        unfair_read_lock(A);
                                write_lock(B);
                                                        write_lock(A);
        write_lock(B);
                                fair_read_lock(A);

since T1's fair reader will wait.

FWIW, lockdep has been able to catch this (figuring out which is a
deadlock and which is not) for two years now, plus other trivial
deadlock detection for read/write locks. Needless to say, if
lib/locking-selftest.c was given a try, one could find it out on one's
own.

Regards,
Boqun

2023-01-20 01:56:43

by Byungchul Park

Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

Boqun wrote:
> On Thu, Jan 19, 2023 at 01:33:58PM +0000, Matthew Wilcox wrote:
> > On Thu, Jan 19, 2023 at 03:23:08PM +0900, Byungchul Park wrote:
> > > Boqun wrote:
> > > > *Looks like the DEPT dependency graph doesn't handle the
> > > > fair/unfair readers as lockdep current does. Which bring the
> > > > next question.
> > >
> > > No. DEPT works better for unfair read. It works based on wait/event. So
> > > read_lock() is considered a potential wait waiting on write_unlock()
> > > while write_lock() is considered a potential wait waiting on either
> > > write_unlock() or read_unlock(). DEPT is working perfect for it.
> > >
> > > For fair read (maybe you meant queued read lock), I think the case
> > > should be handled in the same way as normal lock. I might get it wrong.
> > > Please let me know if I miss something.
> >
> > From the lockdep/DEPT point of view, the question is whether:
> >
> > read_lock(A)
> > read_lock(A)
> >
> > can deadlock if a writer comes in between the two acquisitions and
> > sleeps waiting on A to be released. A fair lock will block new
> > readers when a writer is waiting, while an unfair lock will allow
> > new readers even while a writer is waiting.
> >
>
> To be more accurate, a fair reader will wait if there is a writer
> waiting for other reader (fair or not) to unlock, and an unfair reader
> won't.

How kind of you both! Thanks.

I asked in order to check whether there are other subtle things beyond
this. Fortunately, I already understand what you guys shared.

> In kernel there are read/write locks that can have both fair and unfair
> readers (e.g. queued rwlock). Regarding deadlocks,
>
>       T0                      T1                      T2
>       --                      --                      --
>       fair_read_lock(A);
>                               write_lock(B);
>                                                       write_lock(A);
>       write_lock(B);
>                               unfair_read_lock(A);

From DEPT's point of view (let me rewrite the scenario):

        T0                      T1                      T2
        --                      --                      --
        fair_read_lock(A);
                                write_lock(B);
                                                        write_lock(A);
        write_lock(B);
                                unfair_read_lock(A);
        write_unlock(B);
        read_unlock(A);
                                read_unlock(A);
                                write_unlock(B);
                                                        write_unlock(A);

T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
not doing either write_unlock(B) or read_unlock(B). In other words:

1. read_unlock(A) happening depends on write_unlock(B) happening.
2. read_unlock(A) happening depends on read_unlock(B) happening.

T1: write_unlock(B) cannot happen if unfair_read_lock(A) is stuck by an
A owner not doing write_unlock(A). In other words:

3. write_unlock(B) happening depends on write_unlock(A) happening.

1, 2 and 3 give the following dependencies:

1. read_unlock(A) -> write_unlock(B)
2. read_unlock(A) -> read_unlock(B)
3. write_unlock(B) -> write_unlock(A)

There's no circular dependency so it's safe. DEPT doesn't report this.

> the above is not a deadlock, since T1's unfair reader can "steal" the
> lock. However the following is a deadlock:
>
>       T0                      T1                      T2
>       --                      --                      --
>       unfair_read_lock(A);
>                               write_lock(B);
>                                                       write_lock(A);
>       write_lock(B);
>                               fair_read_lock(A);
>
> , since T'1 fair reader will wait.

From DEPT's point of view (let me rewrite the scenario):

        T0                      T1                      T2
        --                      --                      --
        unfair_read_lock(A);
                                write_lock(B);
                                                        write_lock(A);
        write_lock(B);
                                fair_read_lock(A);
        write_unlock(B);
        read_unlock(A);
                                read_unlock(A);
                                write_unlock(B);
                                                        write_unlock(A);

T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
not doing either write_unlock(B) or read_unlock(B). In other words:

1. read_unlock(A) happening depends on write_unlock(B) happening.
2. read_unlock(A) happening depends on read_unlock(B) happening.

T1: write_unlock(B) cannot happen if fair_read_lock(A) is stuck because
an owner of A does neither write_unlock(A) nor read_unlock(A). In other
words:

3. write_unlock(B) happening depends on write_unlock(A) happening.
4. write_unlock(B) happening depends on read_unlock(A) happening.

1, 2, 3 and 4 give the following dependencies:

1. read_unlock(A) -> write_unlock(B)
2. read_unlock(A) -> read_unlock(B)
3. write_unlock(B) -> write_unlock(A)
4. write_unlock(B) -> read_unlock(A)

With 1 and 4, there's a circular dependency, so DEPT definitely reports
this as a problem.
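
The cycle check over these dependencies can be sketched as follows. This
is a plain depth-first search written for illustration, not DEPT's own
search code, and find_cycle is a hypothetical name. The edge lists are
copied from the two numbered dependency lists in this mail.

```python
def find_cycle(edges):
    """Return one cycle as a node list (first node repeated last), or None."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    def dfs(node, path, on_path):
        for nxt in graph.get(node, []):
            if nxt in on_path:
                # Walked back onto the current path: that closes a cycle.
                return path[path.index(nxt):] + [nxt]
            found = dfs(nxt, path + [nxt], on_path | {nxt})
            if found:
                return found
        return None

    for start in graph:
        found = dfs(start, [start], {start})
        if found:
            return found
    return None


# Dependencies 1-3 (unfair reader on T1).
edges_unfair = [
    ("read_unlock(A)", "write_unlock(B)"),
    ("read_unlock(A)", "read_unlock(B)"),
    ("write_unlock(B)", "write_unlock(A)"),
]
# Dependency 4 is added when T1's reader is fair.
edges_fair = edges_unfair + [("write_unlock(B)", "read_unlock(A)")]

print(find_cycle(edges_unfair))
print(find_cycle(edges_fair))
```

The first call finds nothing, while the second returns the loop formed by
dependencies 1 and 4.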

REMINDER: DEPT focuses on waits and events.

> FWIW, lockdep has been able to catch this (figuring out which is a
> deadlock and which is not) for two years now, plus other trivial deadlock
> detection for read/write locks. Needless to say, if lib/locking-selftest.c
> was given a try, one could find it out on one's own.
>
> Regards,
> Boqun
>

2023-01-20 02:33:43

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

On Fri, Jan 20, 2023 at 10:51:45AM +0900, Byungchul Park wrote:
> Boqun wrote:
> > On Thu, Jan 19, 2023 at 01:33:58PM +0000, Matthew Wilcox wrote:
> > > On Thu, Jan 19, 2023 at 03:23:08PM +0900, Byungchul Park wrote:
> > > > Boqun wrote:
> > > > > *Looks like the DEPT dependency graph doesn't handle the
> > > > > fair/unfair readers as lockdep current does. Which bring the
> > > > > next question.
> > > >
> > > > No. DEPT works better for unfair read. It works based on wait/event. So
> > > > read_lock() is considered a potential wait waiting on write_unlock()
> > > > while write_lock() is considered a potential wait waiting on either
> > > > write_unlock() or read_unlock(). DEPT is working perfect for it.
> > > >
> > > > For fair read (maybe you meant queued read lock), I think the case
> > > > should be handled in the same way as normal lock. I might get it wrong.
> > > > Please let me know if I miss something.
> > >
> > > From the lockdep/DEPT point of view, the question is whether:
> > >
> > > read_lock(A)
> > > read_lock(A)
> > >
> > > can deadlock if a writer comes in between the two acquisitions and
> > > sleeps waiting on A to be released. A fair lock will block new
> > > readers when a writer is waiting, while an unfair lock will allow
> > > new readers even while a writer is waiting.
> > >
> >
> > To be more accurate, a fair reader will wait if there is a writer
> > waiting for other reader (fair or not) to unlock, and an unfair reader
> > won't.
>
> What a kind guys, both of you! Thanks.
>
> I asked to check if there are other subtle things than this. Fortunately,
> I already understand what you guys shared.
>
> > In kernel there are read/write locks that can have both fair and unfair
> > readers (e.g. queued rwlock). Regarding deadlocks,
> >
> > T0 T1 T2
> > -- -- --
> > fair_read_lock(A);
> > write_lock(B);
> > write_lock(A);
> > write_lock(B);
> > unfair_read_lock(A);
>
> With the DEPT's point of view (let me re-write the scenario):
>
> T0 T1 T2
> -- -- --
> fair_read_lock(A);
> write_lock(B);
> write_lock(A);
> write_lock(B);
> unfair_read_lock(A);
> write_unlock(B);
> read_unlock(A);
> read_unlock(A);
> write_unlock(B);
> write_unlock(A);
>
> T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
> not doing either write_unlock(B) or read_unlock(B). In other words:
>
> 1. read_unlock(A) happening depends on write_unlock(B) happening.
> 2. read_unlock(A) happening depends on read_unlock(B) happening.
>
> T1: write_unlock(B) cannot happen if unfair_read_lock(A) is stuck by a A
> owner not doing write_unlock(A). In other words:
>
> 3. write_unlock(B) happening depends on write_unlock(A) happening.
>
> 1, 2 and 3 give the following dependencies:
>
> 1. read_unlock(A) -> write_unlock(B)
> 2. read_unlock(A) -> read_unlock(B)
> 3. write_unlock(B) -> write_unlock(A)
>
> There's no circular dependency so it's safe. DEPT doesn't report this.
>
> > the above is not a deadlock, since T1's unfair reader can "steal" the
> > lock. However the following is a deadlock:
> >
> > T0 T1 T2
> > -- -- --
> > unfair_read_lock(A);
> > write_lock(B);
> > write_lock(A);
> > write_lock(B);
> > fair_read_lock(A);
> >
> > , since T'1 fair reader will wait.
>
> With the DEPT's point of view (let me re-write the scenario):
>
> T0 T1 T2
> -- -- --
> unfair_read_lock(A);
> write_lock(B);
> write_lock(A);
> write_lock(B);
> fair_read_lock(A);
> write_unlock(B);
> read_unlock(A);
> read_unlock(A);
> write_unlock(B);
> write_unlock(A);
>
> T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
> not doing either write_unlock(B) or read_unlock(B). In other words:
>
> 1. read_unlock(A) happening depends on write_unlock(B) happening.
> 2. read_unlock(A) happening depends on read_unlock(B) happening.
>
> T1: write_unlock(B) cannot happen if fair_read_lock(A) is stuck by a A
> owner not doing either write_unlock(A) or read_unlock(A). In other
> words:
>
> 3. write_unlock(B) happening depends on write_unlock(A) happening.
> 4. write_unlock(B) happening depends on read_unlock(A) happening.
>
> 1, 2, 3 and 4 give the following dependencies:
>
> 1. read_unlock(A) -> write_unlock(B)
> 2. read_unlock(A) -> read_unlock(B)
> 3. write_unlock(B) -> write_unlock(A)
> 4. write_unlock(B) -> read_unlock(A)
>
> With 1 and 4, there's a circular dependency so DEPT definitely report
> this as a problem.
>
> REMIND: DEPT focuses on waits and events.

Do you have the test cases showing DEPT can detect this?

Regards,
Boqun

>
> > FWIW, lockdep is able to catch this (figuring out which is deadlock and
> > which is not) since two years ago, plus other trivial deadlock detection
> > for read/write locks. Needless to say, if lib/lock-selftests.c was given
> > a try, one could find it out on one's own.
> >
> > Regards,
> > Boqun
> >

2023-01-20 03:09:30

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

On Thu, Jan 19, 2023 at 06:23:49PM -0800, Boqun Feng wrote:
> On Fri, Jan 20, 2023 at 10:51:45AM +0900, Byungchul Park wrote:
> > Boqun wrote:
> > > On Thu, Jan 19, 2023 at 01:33:58PM +0000, Matthew Wilcox wrote:
> > > > On Thu, Jan 19, 2023 at 03:23:08PM +0900, Byungchul Park wrote:
> > > > > Boqun wrote:
> > > > > > *Looks like the DEPT dependency graph doesn't handle the
> > > > > > fair/unfair readers as lockdep current does. Which bring the
> > > > > > next question.
> > > > >
> > > > > No. DEPT works better for unfair read. It works based on wait/event. So
> > > > > read_lock() is considered a potential wait waiting on write_unlock()
> > > > > while write_lock() is considered a potential wait waiting on either
> > > > > write_unlock() or read_unlock(). DEPT is working perfect for it.
> > > > >
> > > > > For fair read (maybe you meant queued read lock), I think the case
> > > > > should be handled in the same way as normal lock. I might get it wrong.
> > > > > Please let me know if I miss something.
> > > >
> > > > From the lockdep/DEPT point of view, the question is whether:
> > > >
> > > > read_lock(A)
> > > > read_lock(A)
> > > >
> > > > can deadlock if a writer comes in between the two acquisitions and
> > > > sleeps waiting on A to be released. A fair lock will block new
> > > > readers when a writer is waiting, while an unfair lock will allow
> > > > new readers even while a writer is waiting.
> > > >
> > >
> > > To be more accurate, a fair reader will wait if there is a writer
> > > waiting for other reader (fair or not) to unlock, and an unfair reader
> > > won't.
> >
> > What a kind guys, both of you! Thanks.
> >
> > I asked to check if there are other subtle things than this. Fortunately,
> > I already understand what you guys shared.
> >
> > > In kernel there are read/write locks that can have both fair and unfair
> > > readers (e.g. queued rwlock). Regarding deadlocks,
> > >
> > > T0 T1 T2
> > > -- -- --
> > > fair_read_lock(A);
> > > write_lock(B);
> > > write_lock(A);
> > > write_lock(B);
> > > unfair_read_lock(A);
> >
> > With the DEPT's point of view (let me re-write the scenario):
> >
> > T0 T1 T2
> > -- -- --
> > fair_read_lock(A);
> > write_lock(B);
> > write_lock(A);
> > write_lock(B);
> > unfair_read_lock(A);
> > write_unlock(B);
> > read_unlock(A);
> > read_unlock(A);
> > write_unlock(B);
> > write_unlock(A);
> >
> > T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
> > not doing either write_unlock(B) or read_unlock(B). In other words:
> >
> > 1. read_unlock(A) happening depends on write_unlock(B) happening.
> > 2. read_unlock(A) happening depends on read_unlock(B) happening.
> >
> > T1: write_unlock(B) cannot happen if unfair_read_lock(A) is stuck by a A
> > owner not doing write_unlock(A). In other words:
> >
> > 3. write_unlock(B) happening depends on write_unlock(A) happening.
> >
> > 1, 2 and 3 give the following dependencies:
> >
> > 1. read_unlock(A) -> write_unlock(B)
> > 2. read_unlock(A) -> read_unlock(B)
> > 3. write_unlock(B) -> write_unlock(A)
> >
> > There's no circular dependency so it's safe. DEPT doesn't report this.
> >
> > > the above is not a deadlock, since T1's unfair reader can "steal" the
> > > lock. However the following is a deadlock:
> > >
> > > T0 T1 T2
> > > -- -- --
> > > unfair_read_lock(A);
> > > write_lock(B);
> > > write_lock(A);
> > > write_lock(B);
> > > fair_read_lock(A);
> > >
> > > , since T'1 fair reader will wait.
> >
> > With the DEPT's point of view (let me re-write the scenario):
> >
> > T0 T1 T2
> > -- -- --
> > unfair_read_lock(A);
> > write_lock(B);
> > write_lock(A);
> > write_lock(B);
> > fair_read_lock(A);
> > write_unlock(B);
> > read_unlock(A);
> > read_unlock(A);
> > write_unlock(B);
> > write_unlock(A);
> >
> > T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
> > not doing either write_unlock(B) or read_unlock(B). In other words:
> >
> > 1. read_unlock(A) happening depends on write_unlock(B) happening.
> > 2. read_unlock(A) happening depends on read_unlock(B) happening.
> >
> > T1: write_unlock(B) cannot happen if fair_read_lock(A) is stuck by a A
> > owner not doing either write_unlock(A) or read_unlock(A). In other
> > words:
> >
> > 3. write_unlock(B) happening depends on write_unlock(A) happening.
> > 4. write_unlock(B) happening depends on read_unlock(A) happening.
> >
> > 1, 2, 3 and 4 give the following dependencies:
> >
> > 1. read_unlock(A) -> write_unlock(B)
> > 2. read_unlock(A) -> read_unlock(B)
> > 3. write_unlock(B) -> write_unlock(A)
> > 4. write_unlock(B) -> read_unlock(A)
> >
> > With 1 and 4, there's a circular dependency so DEPT definitely report
> > this as a problem.
> >
> > REMIND: DEPT focuses on waits and events.
>
> Do you have the test cases showing DEPT can detect this?
>

I just tried the following on your latest GitHub branch; I commented out
all but one deadlock case. Lockdep CAN detect it but DEPT CANNOT.
Feel free to double-check.

Regards,
Boqun

------------------------------------------->8
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index cd89138d62ba..f38e4109e013 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -2375,6 +2375,7 @@ static void ww_tests(void)
*/
static void queued_read_lock_hardirq_RE_Er(void)
{
+ // T0
HARDIRQ_ENTER();
read_lock(&rwlock_A);
LOCK(B);
@@ -2382,12 +2383,17 @@ static void queued_read_lock_hardirq_RE_Er(void)
read_unlock(&rwlock_A);
HARDIRQ_EXIT();

+ // T1
HARDIRQ_DISABLE();
LOCK(B);
read_lock(&rwlock_A);
read_unlock(&rwlock_A);
UNLOCK(B);
HARDIRQ_ENABLE();
+
+ // T2
+ write_lock_irq(&rwlock_A);
+ write_unlock_irq(&rwlock_A);
}

/*
@@ -2455,6 +2461,7 @@ static void queued_read_lock_tests(void)
dotest(queued_read_lock_hardirq_RE_Er, FAILURE, LOCKTYPE_RWLOCK);
pr_cont("\n");

+#if 0
print_testname("hardirq lock-read/read-lock");
dotest(queued_read_lock_hardirq_ER_rE, SUCCESS, LOCKTYPE_RWLOCK);
pr_cont("\n");
@@ -2462,6 +2469,7 @@ static void queued_read_lock_tests(void)
print_testname("hardirq inversion");
dotest(queued_read_lock_hardirq_inversion, FAILURE, LOCKTYPE_RWLOCK);
pr_cont("\n");
+#endif
}

static void fs_reclaim_correct_nesting(void)
@@ -2885,6 +2893,7 @@ void locking_selftest(void)
init_shared_classes();
lockdep_set_selftest_task(current);

+#if 0
DO_TESTCASE_6R("A-A deadlock", AA);
DO_TESTCASE_6R("A-B-B-A deadlock", ABBA);
DO_TESTCASE_6R("A-B-B-C-C-A deadlock", ABBCCA);
@@ -2967,6 +2976,7 @@ void locking_selftest(void)
DO_TESTCASE_6x2x2RW("irq read-recursion #3", irq_read_recursion3);

ww_tests();
+#endif

force_read_lock_recursive = 0;
/*
@@ -2975,6 +2985,7 @@ void locking_selftest(void)
if (IS_ENABLED(CONFIG_QUEUED_RWLOCKS))
queued_read_lock_tests();

+#if 0
fs_reclaim_tests();

/* Wait context test cases that are specific for RAW_LOCK_NESTING */
@@ -2987,6 +2998,7 @@ void locking_selftest(void)
dotest(hardirq_deadlock_softirq_not_deadlock, FAILURE, LOCKTYPE_SPECIAL);
pr_cont("\n");

+#endif
if (unexpected_testcase_failures) {
printk("-----------------------------------------------------------------\n");
debug_locks = 0;

2023-01-20 03:38:40

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

On Thu, Jan 19, 2023 at 07:07:59PM -0800, Boqun Feng wrote:
> On Thu, Jan 19, 2023 at 06:23:49PM -0800, Boqun Feng wrote:
> > On Fri, Jan 20, 2023 at 10:51:45AM +0900, Byungchul Park wrote:
> > > Boqun wrote:
> > > > On Thu, Jan 19, 2023 at 01:33:58PM +0000, Matthew Wilcox wrote:
> > > > > On Thu, Jan 19, 2023 at 03:23:08PM +0900, Byungchul Park wrote:
> > > > > > Boqun wrote:
> > > > > > > *Looks like the DEPT dependency graph doesn't handle the
> > > > > > > fair/unfair readers as lockdep current does. Which bring the
> > > > > > > next question.
> > > > > >
> > > > > > No. DEPT works better for unfair read. It works based on wait/event. So
> > > > > > read_lock() is considered a potential wait waiting on write_unlock()
> > > > > > while write_lock() is considered a potential wait waiting on either
> > > > > > write_unlock() or read_unlock(). DEPT is working perfect for it.
> > > > > >
> > > > > > For fair read (maybe you meant queued read lock), I think the case
> > > > > > should be handled in the same way as normal lock. I might get it wrong.
> > > > > > Please let me know if I miss something.
> > > > >
> > > > > From the lockdep/DEPT point of view, the question is whether:
> > > > >
> > > > > read_lock(A)
> > > > > read_lock(A)
> > > > >
> > > > > can deadlock if a writer comes in between the two acquisitions and
> > > > > sleeps waiting on A to be released. A fair lock will block new
> > > > > readers when a writer is waiting, while an unfair lock will allow
> > > > > new readers even while a writer is waiting.
> > > > >
> > > >
> > > > To be more accurate, a fair reader will wait if there is a writer
> > > > waiting for other reader (fair or not) to unlock, and an unfair reader
> > > > won't.
> > >
> > > What a kind guys, both of you! Thanks.
> > >
> > > I asked to check if there are other subtle things than this. Fortunately,
> > > I already understand what you guys shared.
> > >
> > > > In kernel there are read/write locks that can have both fair and unfair
> > > > readers (e.g. queued rwlock). Regarding deadlocks,
> > > >
> > > > T0 T1 T2
> > > > -- -- --
> > > > fair_read_lock(A);
> > > > write_lock(B);
> > > > write_lock(A);
> > > > write_lock(B);
> > > > unfair_read_lock(A);
> > >
> > > With the DEPT's point of view (let me re-write the scenario):
> > >
> > > T0 T1 T2
> > > -- -- --
> > > fair_read_lock(A);
> > > write_lock(B);
> > > write_lock(A);
> > > write_lock(B);
> > > unfair_read_lock(A);
> > > write_unlock(B);
> > > read_unlock(A);
> > > read_unlock(A);
> > > write_unlock(B);
> > > write_unlock(A);
> > >
> > > T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
> > > not doing either write_unlock(B) or read_unlock(B). In other words:
> > >
> > > 1. read_unlock(A) happening depends on write_unlock(B) happening.
> > > 2. read_unlock(A) happening depends on read_unlock(B) happening.
> > >
> > > T1: write_unlock(B) cannot happen if unfair_read_lock(A) is stuck by a A
> > > owner not doing write_unlock(A). In other words:
> > >
> > > 3. write_unlock(B) happening depends on write_unlock(A) happening.
> > >
> > > 1, 2 and 3 give the following dependencies:
> > >
> > > 1. read_unlock(A) -> write_unlock(B)
> > > 2. read_unlock(A) -> read_unlock(B)
> > > 3. write_unlock(B) -> write_unlock(A)
> > >
> > > There's no circular dependency so it's safe. DEPT doesn't report this.
> > >
> > > > the above is not a deadlock, since T1's unfair reader can "steal" the
> > > > lock. However the following is a deadlock:
> > > >
> > > > T0 T1 T2
> > > > -- -- --
> > > > unfair_read_lock(A);
> > > > write_lock(B);
> > > > write_lock(A);
> > > > write_lock(B);
> > > > fair_read_lock(A);
> > > >
> > > > , since T'1 fair reader will wait.
> > >
> > > With the DEPT's point of view (let me re-write the scenario):
> > >
> > > T0 T1 T2
> > > -- -- --
> > > unfair_read_lock(A);
> > > write_lock(B);
> > > write_lock(A);
> > > write_lock(B);
> > > fair_read_lock(A);
> > > write_unlock(B);
> > > read_unlock(A);
> > > read_unlock(A);
> > > write_unlock(B);
> > > write_unlock(A);
> > >
> > > T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
> > > not doing either write_unlock(B) or read_unlock(B). In other words:
> > >
> > > 1. read_unlock(A) happening depends on write_unlock(B) happening.
> > > 2. read_unlock(A) happening depends on read_unlock(B) happening.
> > >
> > > T1: write_unlock(B) cannot happen if fair_read_lock(A) is stuck by a A
> > > owner not doing either write_unlock(A) or read_unlock(A). In other
> > > words:
> > >
> > > 3. write_unlock(B) happening depends on write_unlock(A) happening.
> > > 4. write_unlock(B) happening depends on read_unlock(A) happening.
> > >
> > > 1, 2, 3 and 4 give the following dependencies:
> > >
> > > 1. read_unlock(A) -> write_unlock(B)
> > > 2. read_unlock(A) -> read_unlock(B)
> > > 3. write_unlock(B) -> write_unlock(A)
> > > 4. write_unlock(B) -> read_unlock(A)
> > >
> > > With 1 and 4, there's a circular dependency so DEPT definitely report
> > > this as a problem.
> > >
> > > REMIND: DEPT focuses on waits and events.
> >
> > Do you have the test cases showing DEPT can detect this?
> >
>
> Just tried the following on your latest GitHub branch, I commented all
> but one deadlock case. Lockdep CAN detect it but DEPT CANNOT detect it.
> Feel free to double check.
>

In case anyone else wants to try, let me explain briefly how to verify
the behavior of the detectors. With the change, the only test that
runs is

dotest(queued_read_lock_hardirq_RE_Er, FAILURE, LOCKTYPE_RWLOCK);

"FAILURE" indicates that the selftest expects lockdep to report a
deadlock. So with lockdep, if all goes well, you will see:

[...] hardirq read-lock/lock-read: ok |

If you want lockdep to print a full splat during the test (lockdep is
silent by default), you can add "debug_locks_verbose=2" to the kernel
command line; "2" selects the RWLOCK testsuite.
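
The check described above can be modelled roughly as follows. This is a
sketch under assumptions, not the selftest code itself: it assumes that
FAILURE and SUCCESS are encoded as 0 and 1 and compared against the
detector's final debug_locks state, and that debug_locks_verbose is
tested as a bit mask against the lock-type mask (0x2 for RWLOCK, matching
the "2" above). dotest_verdict and splat_enabled are hypothetical names.

```python
# Assumed encodings; only the value 2 for the RWLOCK testsuite comes from
# the text above.
FAILURE, SUCCESS = 0, 1
LOCKTYPE_RWLOCK = 0x2


def dotest_verdict(detector_reported, expected):
    """Model of the pass/fail decision for a single selftest case.

    A detector that reports a problem is assumed to turn debug_locks off
    (0); otherwise debug_locks stays on (1).
    """
    debug_locks = 0 if detector_reported else 1
    return "ok" if debug_locks == expected else "FAILED"


def splat_enabled(debug_locks_verbose, lockclass_mask):
    """Model of whether the full report is printed for this lock type."""
    return bool(debug_locks_verbose & lockclass_mask)


# The case expects a report (FAILURE), so a detector that reports passes.
print(dotest_verdict(True, FAILURE))
# A detector that stays quiet on the same case does not.
print(dotest_verdict(False, FAILURE))
print(splat_enabled(2, LOCKTYPE_RWLOCK))
```

Under this model, "ok" for a FAILURE case means the detector did report
the deadlock the test was built to trigger.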

Regards,
Boqun

> Regards,
> Boqun

2023-01-21 03:08:55

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

Byungchul wrote:
> Torvalds wrote:
> > On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park <[email protected]> wrote:
> > >
> > > I've been developing a tool for detecting deadlock possibilities by
> > > tracking wait/event rather than lock(?) acquisition order to try to
> > > cover all synchronization mechanisms. It's done on v6.2-rc2.
> >
> > Ugh. I hate how this adds random patterns like
>
> I understand what you mean. But all the synchronization primitives
> should let DEPT know the beginning and the end of each. However, I will
> remove the 'if' statement that looks ugly from the next spin, and move
> the pattern to a better place if possible.
>
> >         if (timeout == MAX_SCHEDULE_TIMEOUT)
> >                 sdt_might_sleep_strong(NULL);
> >         else
> >                 sdt_might_sleep_strong_timeout(NULL);
> >         ...
> >         sdt_might_sleep_finish();
> >
> > to various places, it seems so very odd and unmaintainable.
> >
> > I also recall this giving a fair amount of false positives, are they all fixed?
>
> Yes. Of course I removed all the false positives we found.
>
> > Anyway, I'd really like the lockdep people to comment and be involved.
> > We did have a fairly recent case of "lockdep doesn't track page lock
> > dependencies because it fundamentally cannot" issue, so DEPT might fix
> > those kinds of missing dependency analysis. See
>
> Sure. That's exactly what DEPT works for e.g. PG_locked.
>
> > https://lore.kernel.org/lkml/[email protected]/
>
> I will reproduce it and share the result.

Hi Torvalds and folks,

I reproduced the issue with DEPT enabled (after making DEPT work a little
more aggressively for PG_locked) and obtained a DEPT report. I hope this
is a true positive that explains the issue correctly!

Let me remind you again: "DEPT is designed exactly for this kind of
deadlock issue involving e.g. PG_locked, PG_writeback and any wait APIs".

I've attached the report and explain how to interpret it at the end.

---

[ 227.854322] ===================================================
[ 227.854880] DEPT: Circular dependency has been detected.
[ 227.855341] 6.2.0-rc1-00025-gb0c20ebf51ac-dirty #28 Not tainted
[ 227.855864] ---------------------------------------------------
[ 227.856367] summary
[ 227.856601] ---------------------------------------------------
[ 227.857107] *** DEADLOCK ***

[ 227.857551] context A
[ 227.857803] [S] lock(&ni->ni_lock:0)
[ 227.858175] [W] folio_wait_bit_common(PG_locked_map:0)
[ 227.858658] [E] unlock(&ni->ni_lock:0)

[ 227.859233] context B
[ 227.859484] [S] (unknown)(PG_locked_map:0)
[ 227.859906] [W] lock(&ni->ni_lock:0)
[ 227.860277] [E] folio_unlock(PG_locked_map:0)

[ 227.860883] [S]: start of the event context
[ 227.861263] [W]: the wait blocked
[ 227.861581] [E]: the event not reachable
[ 227.861941] ---------------------------------------------------
[ 227.862436] context A's detail
[ 227.862738] ---------------------------------------------------
[ 227.863242] context A
[ 227.863490] [S] lock(&ni->ni_lock:0)
[ 227.863865] [W] folio_wait_bit_common(PG_locked_map:0)
[ 227.864356] [E] unlock(&ni->ni_lock:0)

[ 227.864929] [S] lock(&ni->ni_lock:0):
[ 227.865279] [<ffffffff82b396fb>] ntfs3_setattr+0x54b/0xd40
[ 227.865803] stacktrace:
[ 227.866064] ntfs3_setattr+0x54b/0xd40
[ 227.866469] notify_change+0xcb3/0x1430
[ 227.866875] do_truncate+0x149/0x210
[ 227.867277] path_openat+0x21a3/0x2a90
[ 227.867692] do_filp_open+0x1ba/0x410
[ 227.868110] do_sys_openat2+0x16d/0x4e0
[ 227.868520] __x64_sys_creat+0xcd/0x120
[ 227.868925] do_syscall_64+0x41/0xc0
[ 227.869322] entry_SYSCALL_64_after_hwframe+0x63/0xcd

[ 227.870019] [W] folio_wait_bit_common(PG_locked_map:0):
[ 227.870491] [<ffffffff81b228b0>] truncate_inode_pages_range+0x9b0/0xf20
[ 227.871074] stacktrace:
[ 227.871335] folio_wait_bit_common+0x5e0/0xaf0
[ 227.871796] truncate_inode_pages_range+0x9b0/0xf20
[ 227.872287] truncate_pagecache+0x67/0x90
[ 227.872730] ntfs3_setattr+0x55a/0xd40
[ 227.873152] notify_change+0xcb3/0x1430
[ 227.873578] do_truncate+0x149/0x210
[ 227.873981] path_openat+0x21a3/0x2a90
[ 227.874395] do_filp_open+0x1ba/0x410
[ 227.874803] do_sys_openat2+0x16d/0x4e0
[ 227.875215] __x64_sys_creat+0xcd/0x120
[ 227.875623] do_syscall_64+0x41/0xc0
[ 227.876035] entry_SYSCALL_64_after_hwframe+0x63/0xcd

[ 227.876738] [E] unlock(&ni->ni_lock:0):
[ 227.877105] (N/A)
[ 227.877331] ---------------------------------------------------
[ 227.877850] context B's detail
[ 227.878169] ---------------------------------------------------
[ 227.878699] context B
[ 227.878956] [S] (unknown)(PG_locked_map:0)
[ 227.879381] [W] lock(&ni->ni_lock:0)
[ 227.879774] [E] folio_unlock(PG_locked_map:0)

[ 227.880429] [S] (unknown)(PG_locked_map:0):
[ 227.880825] (N/A)

[ 227.881249] [W] lock(&ni->ni_lock:0):
[ 227.881607] [<ffffffff82b009ec>] attr_data_get_block+0x32c/0x19f0
[ 227.882151] stacktrace:
[ 227.882421] attr_data_get_block+0x32c/0x19f0
[ 227.882877] ntfs_get_block_vbo+0x264/0x1330
[ 227.883316] __block_write_begin_int+0x3bd/0x14b0
[ 227.883809] block_write_begin+0xb9/0x4d0
[ 227.884231] ntfs_write_begin+0x27e/0x480
[ 227.884650] generic_perform_write+0x256/0x570
[ 227.885155] __generic_file_write_iter+0x2ae/0x500
[ 227.885658] ntfs_file_write_iter+0x66d/0x1d70
[ 227.886136] do_iter_readv_writev+0x20b/0x3c0
[ 227.886596] do_iter_write+0x188/0x710
[ 227.887015] vfs_iter_write+0x74/0xa0
[ 227.887425] iter_file_splice_write+0x745/0xc90
[ 227.887913] direct_splice_actor+0x114/0x180
[ 227.888364] splice_direct_to_actor+0x33b/0x8b0
[ 227.888831] do_splice_direct+0x1b7/0x280
[ 227.889256] do_sendfile+0xb49/0x1310

[ 227.889854] [E] folio_unlock(PG_locked_map:0):
[ 227.890265] [<ffffffff81f10222>] generic_write_end+0xf2/0x440
[ 227.890788] stacktrace:
[ 227.891056] generic_write_end+0xf2/0x440
[ 227.891484] ntfs_write_end+0x42e/0x980
[ 227.891920] generic_perform_write+0x316/0x570
[ 227.892393] __generic_file_write_iter+0x2ae/0x500
[ 227.892899] ntfs_file_write_iter+0x66d/0x1d70
[ 227.893378] do_iter_readv_writev+0x20b/0x3c0
[ 227.893838] do_iter_write+0x188/0x710
[ 227.894253] vfs_iter_write+0x74/0xa0
[ 227.894660] iter_file_splice_write+0x745/0xc90
[ 227.895133] direct_splice_actor+0x114/0x180
[ 227.895585] splice_direct_to_actor+0x33b/0x8b0
[ 227.896082] do_splice_direct+0x1b7/0x280
[ 227.896521] do_sendfile+0xb49/0x1310
[ 227.896926] __x64_sys_sendfile64+0x1d0/0x210
[ 227.897389] do_syscall_64+0x41/0xc0
[ 227.897804] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 227.898332] ---------------------------------------------------
[ 227.898858] information that might be helpful
[ 227.899278] ---------------------------------------------------
[ 227.899817] CPU: 1 PID: 8060 Comm: a.out Not tainted 6.2.0-rc1-00025-gb0c20ebf51ac-dirty #28
[ 227.900547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 227.901249] Call Trace:
[ 227.901527] <TASK>
[ 227.901778] dump_stack_lvl+0xf2/0x169
[ 227.902167] print_circle.cold+0xca4/0xd28
[ 227.902593] ? lookup_dep+0x240/0x240
[ 227.902989] ? extend_queue+0x223/0x300
[ 227.903392] cb_check_dl+0x1e7/0x260
[ 227.903783] bfs+0x27b/0x610
[ 227.904102] ? print_circle+0x240/0x240
[ 227.904493] ? llist_add_batch+0x180/0x180
[ 227.904901] ? extend_queue_rev+0x300/0x300
[ 227.905317] ? __add_dep+0x60f/0x810
[ 227.905689] add_dep+0x221/0x5b0
[ 227.906041] ? __add_idep+0x310/0x310
[ 227.906432] ? add_iecxt+0x1bc/0xa60
[ 227.906821] ? add_iecxt+0x1bc/0xa60
[ 227.907210] ? add_iecxt+0x1bc/0xa60
[ 227.907599] ? add_iecxt+0x1bc/0xa60
[ 227.907997] __dept_wait+0x600/0x1490
[ 227.908392] ? add_iecxt+0x1bc/0xa60
[ 227.908778] ? truncate_inode_pages_range+0x9b0/0xf20
[ 227.909274] ? check_new_class+0x790/0x790
[ 227.909700] ? dept_enirq_transition+0x519/0x9c0
[ 227.910162] dept_wait+0x159/0x3b0
[ 227.910535] ? truncate_inode_pages_range+0x9b0/0xf20
[ 227.911032] folio_wait_bit_common+0x5e0/0xaf0
[ 227.911482] ? filemap_get_folios_contig+0xa30/0xa30
[ 227.911975] ? dept_enirq_transition+0x519/0x9c0
[ 227.912440] ? lock_is_held_type+0x10e/0x160
[ 227.912868] ? lock_is_held_type+0x11e/0x160
[ 227.913300] truncate_inode_pages_range+0x9b0/0xf20
[ 227.913782] ? truncate_inode_partial_folio+0xba0/0xba0
[ 227.914304] ? setattr_prepare+0x142/0xc40
[ 227.914718] truncate_pagecache+0x67/0x90
[ 227.915135] ntfs3_setattr+0x55a/0xd40
[ 227.915535] ? ktime_get_coarse_real_ts64+0x1e5/0x2f0
[ 227.916031] ? ntfs_extend+0x5c0/0x5c0
[ 227.916431] ? mode_strip_sgid+0x210/0x210
[ 227.916861] ? ntfs_extend+0x5c0/0x5c0
[ 227.917262] notify_change+0xcb3/0x1430
[ 227.917661] ? do_truncate+0x149/0x210
[ 227.918061] do_truncate+0x149/0x210
[ 227.918449] ? file_open_root+0x430/0x430
[ 227.918871] ? process_measurement+0x18c0/0x18c0
[ 227.919337] ? ntfs_file_release+0x230/0x230
[ 227.919784] path_openat+0x21a3/0x2a90
[ 227.920185] ? path_lookupat+0x840/0x840
[ 227.920595] ? dept_enirq_transition+0x519/0x9c0
[ 227.921047] ? lock_is_held_type+0x10e/0x160
[ 227.921460] do_filp_open+0x1ba/0x410
[ 227.921839] ? may_open_dev+0xf0/0xf0
[ 227.922214] ? find_held_lock+0x2d/0x110
[ 227.922612] ? lock_release+0x43c/0x830
[ 227.922992] ? dept_ecxt_exit+0x31a/0x590
[ 227.923395] ? _raw_spin_unlock+0x3b/0x50
[ 227.923793] ? alloc_fd+0x2de/0x6e0
[ 227.924148] do_sys_openat2+0x16d/0x4e0
[ 227.924529] ? __ia32_sys_get_robust_list+0x3b0/0x3b0
[ 227.925013] ? build_open_flags+0x6f0/0x6f0
[ 227.925414] ? dept_enirq_transition+0x519/0x9c0
[ 227.925870] ? dept_enirq_transition+0x519/0x9c0
[ 227.926331] ? lock_is_held_type+0x4e/0x160
[ 227.926751] ? lock_is_held_type+0x4e/0x160
[ 227.927168] __x64_sys_creat+0xcd/0x120
[ 227.927561] ? __x64_compat_sys_openat+0x1f0/0x1f0
[ 227.928031] do_syscall_64+0x41/0xc0
[ 227.928416] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 227.928912] RIP: 0033:0x7f8b9e4e4469
[ 227.929285] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
[ 227.930793] RSP: 002b:00007f8b9eea4ef8 EFLAGS: 00000202 ORIG_RAX: 0000000000000055
[ 227.931456] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8b9e4e4469
[ 227.932062] RDX: 0000000000737562 RSI: 0000000000000000 RDI: 0000000020000000
[ 227.932661] RBP: 00007f8b9eea4f20 R08: 0000000000000000 R09: 0000000000000000
[ 227.933252] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fffa75511ee
[ 227.933845] R13: 00007fffa75511ef R14: 00007f8b9ee85000 R15: 0000000000000003
[ 227.934443] </TASK>

---

This part is the most important.

[ 227.857551] context A
[ 227.857803] [S] lock(&ni->ni_lock:0)
[ 227.858175] [W] folio_wait_bit_common(PG_locked_map:0)
[ 227.858658] [E] unlock(&ni->ni_lock:0)

[ 227.859233] context B
[ 227.859484] [S] (unknown)(PG_locked_map:0)
[ 227.859906] [W] lock(&ni->ni_lock:0)
[ 227.860277] [E] folio_unlock(PG_locked_map:0)

[ 227.860883] [S]: start of the event context
[ 227.861263] [W]: the wait blocked
[ 227.861581] [E]: the event not reachable

Dependency 1. A's unlock(&ni->ni_lock:0) cannot happen if A's
              folio_wait_bit_common(PG_locked_map:0) is stuck waiting on
              folio_unlock(PG_locked_map:0) that will wake up A.

Dependency 2. B's folio_unlock(PG_locked_map:0) cannot happen if B's
              lock(&ni->ni_lock:0) is stuck waiting on
              unlock(&ni->ni_lock:0) that will release &ni->ni_lock.

So if these two contexts run at the same time, a deadlock will happen.
DEPT reports it based on the two dependencies above. You can check the
stacktrace of each [W] and [E] in each context's detail section.
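
The reading above can be sketched as a small parser. This is only an
illustration of the interpretation, not part of DEPT: it assumes the
summary layout shown in this report, strips the dmesg timestamps, and
treats the text inside the last pair of parentheses as the class name.
parse_contexts and dependencies are hypothetical names.

```python
import re

SUMMARY = """\
[ 227.857551] context A
[ 227.857803] [S] lock(&ni->ni_lock:0)
[ 227.858175] [W] folio_wait_bit_common(PG_locked_map:0)
[ 227.858658] [E] unlock(&ni->ni_lock:0)

[ 227.859233] context B
[ 227.859484] [S] (unknown)(PG_locked_map:0)
[ 227.859906] [W] lock(&ni->ni_lock:0)
[ 227.860277] [E] folio_unlock(PG_locked_map:0)
"""

TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
ENTRY = re.compile(r"\[(S|W|E)\]\s+.*\(([^()]+)\)\s*$")


def parse_contexts(text):
    """Map each context name to its {tag: class} entries for S, W and E."""
    contexts = {}
    current = None
    for raw in text.splitlines():
        line = TIMESTAMP.sub("", raw).strip()
        if line.startswith("context "):
            current = line.split()[1]
            contexts[current] = {}
            continue
        match = ENTRY.match(line)
        if match and current is not None:
            contexts[current][match.group(1)] = match.group(2)
    return contexts


def dependencies(contexts):
    """Per context: the [E] class cannot fire while the [W] class is stuck."""
    return {name: (c["E"], c["W"]) for name, c in contexts.items()}


deps = dependencies(parse_contexts(SUMMARY))
for name, (event, wait) in deps.items():
    print(f"context {name}: {event} depends on {wait}")

# The two edges point at each other, which is the reported circle.
is_circular = deps["A"] == deps["B"][::-1]
print(is_circular)
```

Each context contributes one edge from its [E] class to its [W] class.
Here the two edges are mirror images, which is the circular dependency
in the report.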

I'd appreciate it if you shared your opinion. I will work on it and
post the next spin after I get back to work in 4 days.

Byungchul

2023-01-21 03:29:33

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

On Thu, Jan 19, 2023 at 07:07:59PM -0800, Boqun Feng wrote:
> On Thu, Jan 19, 2023 at 06:23:49PM -0800, Boqun Feng wrote:
> > On Fri, Jan 20, 2023 at 10:51:45AM +0900, Byungchul Park wrote:

[...]

> > > T0                      T1                      T2
> > > --                      --                      --
> > > unfair_read_lock(A);
> > >                         write_lock(B);
> > >                                                 write_lock(A);
> > > write_lock(B);
> > >                         fair_read_lock(A);
> > > write_unlock(B);
> > > read_unlock(A);
> > >                         read_unlock(A);
> > >                         write_unlock(B);
> > >                                                 write_unlock(A);
> > >
> > > T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
> > > not doing either write_unlock(B) or read_unlock(B). In other words:
> > >
> > > 1. read_unlock(A) happening depends on write_unlock(B) happening.
> > > 2. read_unlock(A) happening depends on read_unlock(B) happening.
> > >
> > > T1: write_unlock(B) cannot happen if fair_read_lock(A) is stuck by a A
> > > owner not doing either write_unlock(A) or read_unlock(A). In other
> > > words:
> > >
> > > 3. write_unlock(B) happening depends on write_unlock(A) happening.
> > > 4. write_unlock(B) happening depends on read_unlock(A) happening.
> > >
> > > 1, 2, 3 and 4 give the following dependencies:
> > >
> > > 1. read_unlock(A) -> write_unlock(B)
> > > 2. read_unlock(A) -> read_unlock(B)
> > > 3. write_unlock(B) -> write_unlock(A)
> > > 4. write_unlock(B) -> read_unlock(A)
> > >
> > > With 1 and 4, there's a circular dependency so DEPT definitely reports
> > > this as a problem.
> > >
> > > REMIND: DEPT focuses on waits and events.
> >
> > Do you have the test cases showing DEPT can detect this?
> >
>
> Just tried the following on your latest GitHub branch, I commented all
> but one deadlock case. Lockdep CAN detect it but DEPT CANNOT detect it.
> Feel free to double check.

I tried the 'queued read lock' test cases with DEPT on. I can see DEPT
detect and report it. But yeah.. it's too verbose now. It's because DEPT
is not aware of the test environment so it's just working hard to report
every case.

To make DEPT work with the selftest better, some work is needed. I
will work on it later, or you are welcome to work on it.

The corresponding report is the following.

---

[ 4.583997] ===================================================
[ 4.585094] DEPT: Circular dependency has been detected.
[ 4.585620] 6.0.0-00023-g331e0412f735 #2 Tainted: G W
[ 4.586347] ---------------------------------------------------
[ 4.586942] summary
[ 4.587161] ---------------------------------------------------
[ 4.587757] *** DEADLOCK ***
[ 4.587757]
[ 4.588198] context A
[ 4.588434] [S] lock(&rwlock_A:0)
[ 4.588804] [W] lock(&rwlock_B:0)
[ 4.589175] [E] unlock(&rwlock_A:0)
[ 4.589565]
[ 4.589727] context B
[ 4.589963] [S] lock(&rwlock_B:0)
[ 4.590375] [W] lock(&rwlock_A:0)
[ 4.590749] [E] unlock(&rwlock_B:0)
[ 4.591136]
[ 4.591295] [S]: start of the event context
[ 4.591716] [W]: the wait blocked
[ 4.592049] [E]: the event not reachable
[ 4.592443] ---------------------------------------------------
[ 4.593037] context A's detail
[ 4.593351] ---------------------------------------------------
[ 4.593944] context A
[ 4.594182] [S] lock(&rwlock_A:0)
[ 4.594577] [W] lock(&rwlock_B:0)
[ 4.594952] [E] unlock(&rwlock_A:0)
[ 4.595341]
[ 4.595501] [S] lock(&rwlock_A:0):
[ 4.595848] [<ffffffff814eb244>] queued_read_lock_hardirq_ER_rE+0xf4/0x170
[ 4.596547] stacktrace:
[ 4.596797] _raw_read_lock+0xcf/0x110
[ 4.597215] queued_read_lock_hardirq_ER_rE+0xf4/0x170
[ 4.597766] dotest+0x30/0x7bc
[ 4.598118] locking_selftest+0x2c6f/0x2ead
[ 4.598602] start_kernel+0x5aa/0x6d5
[ 4.599017] secondary_startup_64_no_verify+0xe0/0xeb
[ 4.599562]
[ 4.599721] [W] lock(&rwlock_B:0):
[ 4.600064] [<ffffffff814eb250>] queued_read_lock_hardirq_ER_rE+0x100/0x170
[ 4.600823] stacktrace:
[ 4.601075] dept_wait+0x12c/0x1d0
[ 4.601465] _raw_write_lock+0xa0/0xd0
[ 4.601892] queued_read_lock_hardirq_ER_rE+0x100/0x170
[ 4.602496] dotest+0x30/0x7bc
[ 4.602854] locking_selftest+0x2c6f/0x2ead
[ 4.603333] start_kernel+0x5aa/0x6d5
[ 4.603745] secondary_startup_64_no_verify+0xe0/0xeb
[ 4.604298]
[ 4.604458] [E] unlock(&rwlock_A:0):
[ 4.604820] (N/A)
[ 4.605023] ---------------------------------------------------
[ 4.605617] context B's detail
[ 4.605930] ---------------------------------------------------
[ 4.606551] context B
[ 4.606790] [S] lock(&rwlock_B:0)
[ 4.607163] [W] lock(&rwlock_A:0)
[ 4.607534] [E] unlock(&rwlock_B:0)
[ 4.607920]
[ 4.608080] [S] lock(&rwlock_B:0):
[ 4.608427] [<ffffffff814eb3b4>] queued_read_lock_hardirq_RE_Er+0xf4/0x170
[ 4.609113] stacktrace:
[ 4.609366] _raw_write_lock+0xc3/0xd0
[ 4.609788] queued_read_lock_hardirq_RE_Er+0xf4/0x170
[ 4.610371] dotest+0x30/0x7bc
[ 4.610730] locking_selftest+0x2c41/0x2ead
[ 4.611195] start_kernel+0x5aa/0x6d5
[ 4.611615] secondary_startup_64_no_verify+0xe0/0xeb
[ 4.612164]
[ 4.612325] [W] lock(&rwlock_A:0):
[ 4.612671] [<ffffffff814eb3c0>] queued_read_lock_hardirq_RE_Er+0x100/0x170
[ 4.613369] stacktrace:
[ 4.613622] _raw_read_lock+0xac/0x110
[ 4.614047] queued_read_lock_hardirq_RE_Er+0x100/0x170
[ 4.614652] dotest+0x30/0x7bc
[ 4.615007] locking_selftest+0x2c41/0x2ead
[ 4.615468] start_kernel+0x5aa/0x6d5
[ 4.615879] secondary_startup_64_no_verify+0xe0/0xeb
[ 4.616607]
[ 4.616769] [E] unlock(&rwlock_B:0):
[ 4.617132] (N/A)
[ 4.617336] ---------------------------------------------------
[ 4.617927] information that might be helpful
[ 4.618390] ---------------------------------------------------
[ 4.618981] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.0.0-00023-g331e0412f735 #2
[ 4.619886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 4.620699] Call Trace:
[ 4.620958] <TASK>
[ 4.621182] dump_stack_lvl+0x5d/0x81
[ 4.621561] print_circle.cold+0x52b/0x545
[ 4.621983] ? print_circle+0xd0/0xd0
[ 4.622385] cb_check_dl+0x58/0x60
[ 4.622737] bfs+0xba/0x170
[ 4.623029] add_dep+0x85/0x170
[ 4.623355] ? from_pool+0x4c/0x160
[ 4.623714] __dept_wait+0x1fd/0x600
[ 4.624081] ? queued_read_lock_hardirq_ER_rE+0x100/0x170
[ 4.624628] ? rcu_read_lock_held_common+0x9/0x50
[ 4.625108] ? queued_read_lock_hardirq_ER_rE+0x100/0x170
[ 4.625652] dept_wait+0x12c/0x1d0
[ 4.626000] _raw_write_lock+0xa0/0xd0
[ 4.626417] queued_read_lock_hardirq_ER_rE+0x100/0x170
[ 4.626951] dotest+0x30/0x7bc
[ 4.627270] locking_selftest+0x2c6f/0x2ead
[ 4.627702] start_kernel+0x5aa/0x6d5
[ 4.628081] secondary_startup_64_no_verify+0xe0/0xeb
[ 4.628597] </TASK>
---

The most important part is the following.

[ 4.588198] context A
[ 4.588434] [S] lock(&rwlock_A:0)
[ 4.588804] [W] lock(&rwlock_B:0)
[ 4.589175] [E] unlock(&rwlock_A:0)
[ 4.589565]
[ 4.589727] context B
[ 4.589963] [S] lock(&rwlock_B:0)
[ 4.590375] [W] lock(&rwlock_A:0)
[ 4.590749] [E] unlock(&rwlock_B:0)

As I told you, DEPT treats a queued lock as a normal type lock, no
matter whether it's a read lock. That's why it prints just
'lock(&rwlock_A:0)' instead of 'read_lock(&rwlock_A:0)'. If needed, I'm
gonna change the format.

I checked the selftest code and found that LOCK(B) is transformed like:

LOCK(B) -> WL(B) -> write_lock(&rwlock_B)

That's why '&rwlock_B' is printed instead of just 'B', JFYI.
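
The macro chain above can be observed with a small userspace sketch. It only demonstrates token pasting and expansion order. The write_lock() here is a stand-in macro that records its argument as a string so the result can be inspected; it is not the kernel's write_lock(), and where the kernel actually takes the name string from is not shown here. LOCK() and WL() are written as the chain describes, which is an assumption about the selftest macros rather than a copy of them.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for write_lock(): remembers the spelling of its argument. */
static const char *last_lock_expr;
#define write_lock(lock_expr) (last_lock_expr = #lock_expr)

/* The chain from the mail: LOCK(B) -> WL(B) -> write_lock(&rwlock_B) */
#define WL(x)   write_lock(&rwlock_##x)
#define LOCK(x) WL(x)

/* Aborts if the chain does not expand as described above. */
static void check_expansion(void)
{
	LOCK(B);
	assert(strcmp(last_lock_expr, "&rwlock_B") == 0);
}
```

After check_expansion() runs, last_lock_expr holds "&rwlock_B", so the 'B' written in the test case ends up as '&rwlock_B' by the time it reaches the lock call.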

Plus, for your information, you should turn on CONFIG_DEPT to use it.

Byungchul

2023-01-21 04:03:54

by Boqun Feng

Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

On Sat, Jan 21, 2023 at 12:28:14PM +0900, Byungchul Park wrote:
> On Thu, Jan 19, 2023 at 07:07:59PM -0800, Boqun Feng wrote:
> > On Thu, Jan 19, 2023 at 06:23:49PM -0800, Boqun Feng wrote:
> > > On Fri, Jan 20, 2023 at 10:51:45AM +0900, Byungchul Park wrote:
>
> [...]
>
> > > > T0                      T1                      T2
> > > > --                      --                      --
> > > > unfair_read_lock(A);
> > > >                         write_lock(B);
> > > >                                                 write_lock(A);
> > > > write_lock(B);
> > > >                         fair_read_lock(A);
> > > > write_unlock(B);
> > > > read_unlock(A);
> > > >                         read_unlock(A);
> > > >                         write_unlock(B);
> > > >                                                 write_unlock(A);
> > > >
> > > > T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
> > > > not doing either write_unlock(B) or read_unlock(B). In other words:
> > > >
> > > > 1. read_unlock(A) happening depends on write_unlock(B) happening.
> > > > 2. read_unlock(A) happening depends on read_unlock(B) happening.
> > > >
> > > > T1: write_unlock(B) cannot happen if fair_read_lock(A) is stuck by an A
> > > > owner not doing either write_unlock(A) or read_unlock(A). In other
> > > > words:
> > > >
> > > > 3. write_unlock(B) happening depends on write_unlock(A) happening.
> > > > 4. write_unlock(B) happening depends on read_unlock(A) happening.
> > > >
> > > > 1, 2, 3 and 4 give the following dependencies:
> > > >
> > > > 1. read_unlock(A) -> write_unlock(B)
> > > > 2. read_unlock(A) -> read_unlock(B)
> > > > 3. write_unlock(B) -> write_unlock(A)
> > > > 4. write_unlock(B) -> read_unlock(A)
> > > >
> > > > With 1 and 4, there's a circular dependency so DEPT definitely reports
> > > > this as a problem.
> > > >
> > > > REMIND: DEPT focuses on waits and events.
> > >
> > > Do you have the test cases showing DEPT can detect this?
> > >
> >
> > Just tried the following on your latest GitHub branch, I commented all
> > but one deadlock case. Lockdep CAN detect it but DEPT CANNOT detect it.
> > Feel free to double check.
>
> I tried the 'queued read lock' test cases with DEPT on. I can see DEPT
> detect and report it. But yeah.. it's too verbose now. It's because DEPT
> is not aware of the test environment so it's just working hard to report
> every case.
>
> To make DEPT work with the selftest better, some work is needed. I
> will work on it later, or you are welcome to work on it.
>
> The corresponding report is the following.
>
[...]
> [ 4.593037] context A's detail
> [ 4.593351] ---------------------------------------------------
> [ 4.593944] context A
> [ 4.594182] [S] lock(&rwlock_A:0)
> [ 4.594577] [W] lock(&rwlock_B:0)
> [ 4.594952] [E] unlock(&rwlock_A:0)
> [ 4.595341]
> [ 4.595501] [S] lock(&rwlock_A:0):
> [ 4.595848] [<ffffffff814eb244>] queued_read_lock_hardirq_ER_rE+0xf4/0x170
> [ 4.596547] stacktrace:
> [ 4.596797] _raw_read_lock+0xcf/0x110
> [ 4.597215] queued_read_lock_hardirq_ER_rE+0xf4/0x170
> [ 4.597766] dotest+0x30/0x7bc
> [ 4.598118] locking_selftest+0x2c6f/0x2ead
> [ 4.598602] start_kernel+0x5aa/0x6d5
> [ 4.599017] secondary_startup_64_no_verify+0xe0/0xeb
> [ 4.599562]
[...]
> [ 4.608427] [<ffffffff814eb3b4>] queued_read_lock_hardirq_RE_Er+0xf4/0x170
> [ 4.609113] stacktrace:
> [ 4.609366] _raw_write_lock+0xc3/0xd0
> [ 4.609788] queued_read_lock_hardirq_RE_Er+0xf4/0x170
> [ 4.610371] dotest+0x30/0x7bc
> [ 4.610730] locking_selftest+0x2c41/0x2ead
> [ 4.611195] start_kernel+0x5aa/0x6d5
> [ 4.611615] secondary_startup_64_no_verify+0xe0/0xeb
> [ 4.612164]
> [ 4.612325] [W] lock(&rwlock_A:0):
> [ 4.612671] [<ffffffff814eb3c0>] queued_read_lock_hardirq_RE_Er+0x100/0x170
> [ 4.613369] stacktrace:
> [ 4.613622] _raw_read_lock+0xac/0x110
> [ 4.614047] queued_read_lock_hardirq_RE_Er+0x100/0x170
> [ 4.614652] dotest+0x30/0x7bc
> [ 4.615007] locking_selftest+0x2c41/0x2ead
> [ 4.615468] start_kernel+0x5aa/0x6d5
> [ 4.615879] secondary_startup_64_no_verify+0xe0/0xeb
> [ 4.616607]
[...]

> As I told you, DEPT treats a queued lock as a normal type lock, no
> matter whether it's a read lock. That's why it prints just
> 'lock(&rwlock_A:0)' instead of 'read_lock(&rwlock_A:0)'. If needed, I'm
> gonna change the format.
>
> I checked the selftest code and found that LOCK(B) is transformed like:
>
> LOCK(B) -> WL(B) -> write_lock(&rwlock_B)
>
> That's why '&rwlock_B' is printed instead of just 'B', JFYI.
>

Nah, your output shows that you've run at least both functions

queued_read_lock_hardirq_RE_Er()
queued_read_lock_hardirq_ER_rE()

but if you apply my diff

https://lore.kernel.org/lkml/[email protected]/

you should only run

queued_read_lock_hardirq_RE_Er()

one test.

One of the reasons that DEPT "detects" this is that DEPT doesn't reset
between tests, so old dependencies from previous runs get carried over.
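
To illustrate that point, here is a minimal userspace sketch, not DEPT code. It assumes, as a simplification, that one test records only "A then B" and the other only "B then A". All names (record_dep(), reset_graph(), run_two_tests()) are hypothetical. Without a reset between the two tests, the second test's single dependency closes a cycle with the stale edge left by the first one.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CLASSES 4

enum { RWLOCK_A, RWLOCK_B };

/* edge[x][y] is true when "x held, then y waited on" has been recorded. */
static bool edge[NR_CLASSES][NR_CLASSES];

/* Forget every dependency recorded so far. */
static void reset_graph(void)
{
	memset(edge, 0, sizeof(edge));
}

/* Depth-first search for a path from 'from' to 'to'. */
static bool path_exists(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (edge[from][next] && !seen[next] &&
		    path_exists(next, to, seen))
			return true;
	return false;
}

/* Record from -> to. Returns true if the new edge closes a cycle. */
static bool record_dep(int from, int to)
{
	bool seen[NR_CLASSES] = { false };
	bool cycle;

	assert(from >= 0 && from < NR_CLASSES && to >= 0 && to < NR_CLASSES);
	cycle = path_exists(to, from, seen);
	edge[from][to] = true;
	return cycle;
}

/* Run "A then B" followed by "B then A"; report whether the second one
 * is flagged as circular. */
static bool run_two_tests(bool reset_between)
{
	reset_graph();
	record_dep(RWLOCK_A, RWLOCK_B);		/* first test */
	if (reset_between)
		reset_graph();
	return record_dep(RWLOCK_B, RWLOCK_A);	/* second test */
}
```

run_two_tests(false) returns true because the edge from the first test is still in the graph, while run_two_tests(true) returns false because the second test is then judged only on what it records itself.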


> Plus, for your information, you should turn on CONFIG_DEPT to use it.
>

Yes, I turned that config on.

Regards,
Boqun

> Byungchul

2023-01-21 04:10:24

by Boqun Feng

Subject: Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

On Fri, Jan 20, 2023 at 07:44:01PM -0800, Boqun Feng wrote:
> On Sat, Jan 21, 2023 at 12:28:14PM +0900, Byungchul Park wrote:
> > On Thu, Jan 19, 2023 at 07:07:59PM -0800, Boqun Feng wrote:
> > > On Thu, Jan 19, 2023 at 06:23:49PM -0800, Boqun Feng wrote:
> > > > On Fri, Jan 20, 2023 at 10:51:45AM +0900, Byungchul Park wrote:
> >
> > [...]
> >
> > > > > T0                      T1                      T2
> > > > > --                      --                      --
> > > > > unfair_read_lock(A);
> > > > >                         write_lock(B);
> > > > >                                                 write_lock(A);
> > > > > write_lock(B);
> > > > >                         fair_read_lock(A);
> > > > > write_unlock(B);
> > > > > read_unlock(A);
> > > > >                         read_unlock(A);
> > > > >                         write_unlock(B);
> > > > >                                                 write_unlock(A);
> > > > >
> > > > > T0: read_unlock(A) cannot happen if write_lock(B) is stuck by a B owner
> > > > > not doing either write_unlock(B) or read_unlock(B). In other words:
> > > > >
> > > > > 1. read_unlock(A) happening depends on write_unlock(B) happening.
> > > > > 2. read_unlock(A) happening depends on read_unlock(B) happening.
> > > > >
> > > > > T1: write_unlock(B) cannot happen if fair_read_lock(A) is stuck by an A
> > > > > owner not doing either write_unlock(A) or read_unlock(A). In other
> > > > > words:
> > > > >
> > > > > 3. write_unlock(B) happening depends on write_unlock(A) happening.
> > > > > 4. write_unlock(B) happening depends on read_unlock(A) happening.
> > > > >
> > > > > 1, 2, 3 and 4 give the following dependencies:
> > > > >
> > > > > 1. read_unlock(A) -> write_unlock(B)
> > > > > 2. read_unlock(A) -> read_unlock(B)
> > > > > 3. write_unlock(B) -> write_unlock(A)
> > > > > 4. write_unlock(B) -> read_unlock(A)
> > > > >
> > > > > With 1 and 4, there's a circular dependency so DEPT definitely reports
> > > > > this as a problem.
> > > > >
> > > > > REMIND: DEPT focuses on waits and events.
> > > >
> > > > Do you have the test cases showing DEPT can detect this?
> > > >
> > >
> > > Just tried the following on your latest GitHub branch, I commented all
> > > but one deadlock case. Lockdep CAN detect it but DEPT CANNOT detect it.
> > > Feel free to double check.
> >
> > I tried the 'queued read lock' test cases with DEPT on. I can see DEPT
> > detect and report it. But yeah.. it's too verbose now. It's because DEPT
> > is not aware of the test environment so it's just working hard to report
> > every case.
> >
> > To make DEPT work with the selftest better, some work is needed. I
> > will work on it later, or you are welcome to work on it.
> >
> > The corresponding report is the following.
> >
> [...]
> > [ 4.593037] context A's detail
> > [ 4.593351] ---------------------------------------------------
> > [ 4.593944] context A
> > [ 4.594182] [S] lock(&rwlock_A:0)
> > [ 4.594577] [W] lock(&rwlock_B:0)
> > [ 4.594952] [E] unlock(&rwlock_A:0)
> > [ 4.595341]
> > [ 4.595501] [S] lock(&rwlock_A:0):
> > [ 4.595848] [<ffffffff814eb244>] queued_read_lock_hardirq_ER_rE+0xf4/0x170
> > [ 4.596547] stacktrace:
> > [ 4.596797] _raw_read_lock+0xcf/0x110
> > [ 4.597215] queued_read_lock_hardirq_ER_rE+0xf4/0x170
> > [ 4.597766] dotest+0x30/0x7bc
> > [ 4.598118] locking_selftest+0x2c6f/0x2ead
> > [ 4.598602] start_kernel+0x5aa/0x6d5
> > [ 4.599017] secondary_startup_64_no_verify+0xe0/0xeb
> > [ 4.599562]
> [...]
> > [ 4.608427] [<ffffffff814eb3b4>] queued_read_lock_hardirq_RE_Er+0xf4/0x170
> > [ 4.609113] stacktrace:
> > [ 4.609366] _raw_write_lock+0xc3/0xd0
> > [ 4.609788] queued_read_lock_hardirq_RE_Er+0xf4/0x170
> > [ 4.610371] dotest+0x30/0x7bc
> > [ 4.610730] locking_selftest+0x2c41/0x2ead
> > [ 4.611195] start_kernel+0x5aa/0x6d5
> > [ 4.611615] secondary_startup_64_no_verify+0xe0/0xeb
> > [ 4.612164]
> > [ 4.612325] [W] lock(&rwlock_A:0):
> > [ 4.612671] [<ffffffff814eb3c0>] queued_read_lock_hardirq_RE_Er+0x100/0x170
> > [ 4.613369] stacktrace:
> > [ 4.613622] _raw_read_lock+0xac/0x110
> > [ 4.614047] queued_read_lock_hardirq_RE_Er+0x100/0x170
> > [ 4.614652] dotest+0x30/0x7bc
> > [ 4.615007] locking_selftest+0x2c41/0x2ead
> > [ 4.615468] start_kernel+0x5aa/0x6d5
> > [ 4.615879] secondary_startup_64_no_verify+0xe0/0xeb
> > [ 4.616607]
> [...]
>
> > As I told you, DEPT treats a queued lock as a normal type lock, no
> > matter whether it's a read lock. That's why it prints just
> > 'lock(&rwlock_A:0)' instead of 'read_lock(&rwlock_A:0)'. If needed, I'm
> > gonna change the format.
> >
> > I checked the selftest code and found that LOCK(B) is transformed like:
> >
> > LOCK(B) -> WL(B) -> write_lock(&rwlock_B)
> >
> > That's why '&rwlock_B' is printed instead of just 'B', JFYI.
> >
>
> Nah, your output shows that you've run at least both functions
>
> queued_read_lock_hardirq_RE_Er()
> queued_read_lock_hardirq_ER_rE()
>
> but if you apply my diff
>
> https://lore.kernel.org/lkml/[email protected]/
>
> you should only run
>
> queued_read_lock_hardirq_RE_Er()
>
> one test.
>
> One of the reasons that DEPT "detects" this is that DEPT doesn't reset
> between tests, so old dependencies from previous runs get carried over.
>
>
> > Plus, for your information, you should turn on CONFIG_DEPT to use it.
> >
>
> Yes, I turned that config on.

FWIW, the branch is at:

https://github.com/fbq/linux-rust/tree/dept-test

.config attached, and be sure to run with the kernel command line
"debug_locks_verbose=2" to see the lockdep warning:

[ 0.000000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 0.000000] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 0.000000] ... MAX_LOCK_DEPTH: 48
[ 0.000000] ... MAX_LOCKDEP_KEYS: 8192
[ 0.000000] ... CLASSHASH_SIZE: 4096
[ 0.000000] ... MAX_LOCKDEP_ENTRIES: 32768
[ 0.000000] ... MAX_LOCKDEP_CHAINS: 65536
[ 0.000000] ... CHAINHASH_SIZE: 32768
[ 0.000000] memory used by lock dependency info: 6365 kB
[ 0.000000] memory used for stack traces: 4224 kB
[ 0.000000] per task-struct memory footprint: 1920 bytes
[ 0.000000] DEPendency Tracker: Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
[ 0.000000] ... DEPT_MAX_STACK_ENTRY: 16
[ 0.000000] ... DEPT_MAX_WAIT_HIST : 64
[ 0.000000] ... DEPT_MAX_ECXT_HELD : 48
[ 0.000000] ... DEPT_MAX_SUBCLASSES : 16
[ 0.000000] ... memory initially used by dep: 1664 KB
[ 0.000000] ... memory initially used by class: 1472 KB
[ 0.000000] ... memory initially used by stack: 9216 KB
[ 0.000000] ... memory initially used by ecxt: 1664 KB
[ 0.000000] ... memory initially used by wait: 2816 KB
[ 0.000000] ... hash list head used by dep: 32 KB
[ 0.000000] ... hash list head used by class: 32 KB
[ 0.000000] ... total memory initially used by objects and hashs: 8480 KB
[ 0.000000] ... per task memory footprint: 2720 bytes
[ 0.000000] ------------------------
[ 0.000000] | Locking API testsuite:
[ 0.000000] ----------------------------------------------------------------------------
[ 0.000000] | spin |wlock |rlock |mutex | wsem | rsem |rtmutex
[ 0.000000] --------------------------------------------------------------------------
[ 0.000000] --------------------------------------------------------------------------
[ 0.000000] | queued read lock tests |
[ 0.000000] ---------------------------
[ 0.000000] hardirq read-lock/lock-read:
[ 0.000000]
[ 0.000000] ======================================================
[ 0.000000] WARNING: possible circular locking dependency detected
[ 0.000000] 6.2.0-rc2+ #15 Not tainted
[ 0.000000] ------------------------------------------------------
[ 0.000000] swapper/0/0 is trying to acquire lock:
[ 0.000000] ffffffffb6600278 (rwlock_A){.-..}-{2:2}, at: locking_selftest+0x20f/0xa64
[ 0.000000]
[ 0.000000] but task is already holding lock:
[ 0.000000] ffffffffb66001f8 (rwlock_B){-...}-{2:2}, at: locking_selftest+0x203/0xa64
[ 0.000000]
[ 0.000000] which lock already depends on the new lock.
[ 0.000000]
[ 0.000000]
[ 0.000000] the existing dependency chain (in reverse order) is:
[ 0.000000]
[ 0.000000] -> #1 (rwlock_B){-...}-{2:2}:
[ 0.000000] _raw_write_lock+0x7e/0xd0
[ 0.000000] locking_selftest+0x191/0xa64
[ 0.000000] start_kernel+0x5b0/0x6e9
[ 0.000000] secondary_startup_64_no_verify+0xe0/0xeb
[ 0.000000]
[ 0.000000] -> #0 (rwlock_A){.-..}-{2:2}:
[ 0.000000] __lock_acquire+0x149f/0x2620
[ 0.000000] lock_acquire+0xdb/0x300
[ 0.000000] _raw_read_lock+0xf9/0x110
[ 0.000000] locking_selftest+0x20f/0xa64
[ 0.000000] start_kernel+0x5b0/0x6e9
[ 0.000000] secondary_startup_64_no_verify+0xe0/0xeb
[ 0.000000]
[ 0.000000] other info that might help us debug this:
[ 0.000000]
[ 0.000000] Possible unsafe locking scenario:
[ 0.000000]
[ 0.000000] CPU0 CPU1
[ 0.000000] ---- ----
[ 0.000000] lock(rwlock_B);
[ 0.000000] lock(rwlock_A);
[ 0.000000] lock(rwlock_B);
[ 0.000000] lock(rwlock_A);
[ 0.000000]
[ 0.000000] *** DEADLOCK ***
[ 0.000000]
[ 0.000000] 1 lock held by swapper/0/0:
[ 0.000000] #0: ffffffffb66001f8 (rwlock_B){-...}-{2:2}, at: locking_selftest+0x203/0xa64
[ 0.000000]
[ 0.000000] stack backtrace:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-rc2+ #15
[ 0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.1-1-1 04/01/2014
[ 0.000000] Call Trace:
[ 0.000000] <TASK>
[ 0.000000] dump_stack_lvl+0x66/0x89
[ 0.000000] check_noncircular+0x102/0x120
[ 0.000000] __lock_acquire+0x149f/0x2620
[ 0.000000] ? dept_enter+0x6b/0xe0
[ 0.000000] lock_acquire+0xdb/0x300
[ 0.000000] ? locking_selftest+0x20f/0xa64
[ 0.000000] ? dept_ecxt_enter+0x139/0x1a0
[ 0.000000] _raw_read_lock+0xf9/0x110
[ 0.000000] ? locking_selftest+0x20f/0xa64
[ 0.000000] locking_selftest+0x20f/0xa64
[ 0.000000] start_kernel+0x5b0/0x6e9
[ 0.000000] secondary_startup_64_no_verify+0xe0/0xeb
[ 0.000000] </TASK>
[ 0.000000] ok | lockclass mask: 2, debug_locks: 0, expected: 0
[ 0.000000]
[ 0.000000] -------------------------------------------------------
[ 0.000000] Good, all 1 testcases passed! |
[ 0.000000] ---------------------------------
[ 0.000000] ACPI: Core revision 20221020
[ 0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
[ 0.001000] APIC: Switch to symmetric I/O mode setup
[ 0.002000] Switched APIC routing to physical flat.
[ 0.003000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.011000] tsc: Unable to calibrate against PIT
[ 0.012000] tsc: using HPET reference calibration
[ 0.013000] tsc: Detected 2394.216 MHz processor

Regards,
Boqun


Attachments:
(No filename) (11.20 kB)
.config (144.38 kB)