I'm happy to see that DEPT reports a real problem in practice. See:
https://lore.kernel.org/lkml/[email protected]/#t
https://lore.kernel.org/lkml/[email protected]/
I added a document describing DEPT that should help you understand what
DEPT is and how it works. You can use DEPT simply by turning on
CONFIG_DEPT and checking dmesg at runtime.
---
Hi Linus and folks,
I've been developing a tool for detecting deadlock possibilities by
tracking wait/event rather than lock acquisition order, to try to cover
all synchronization mechanisms. (A small illustrative sketch follows the
list below.)
Benefit:
0. Works with all lock primitives.
1. Works with wait_for_completion()/complete().
2. Works with PG_locked.
3. Works with swait/wakeup.
4. Works with waitqueue.
5. Works with wait_bit.
6. Multiple reports are allowed.
7. Deduplication control on multiple reports.
8. Withstands false positives thanks to 7.
9. Easy to tag any wait/event.
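To give a feel for what wait/event tracking covers that pure lock
acquisition order tracking cannot, here is a small illustrative sketch
(made-up code, not taken from a real report) of the kind of circle Dept
aims to detect, involving wait_for_completion()/complete():

  static DEFINE_MUTEX(m);
  static DECLARE_COMPLETION(c);

  /* thread A */
  mutex_lock(&m);
  wait_for_completion(&c);  /* waits for an event thread B should raise */
  mutex_unlock(&m);

  /* thread B */
  mutex_lock(&m);           /* blocked by thread A ...                  */
  complete(&c);             /* ... so the event can never trigger       */
  mutex_unlock(&m);

A completion is not a lock, so there is no lock acquisition order to
violate here, yet the wait in thread A and the event in thread B still
form a circular dependency through the mutex.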
Future work:
0. To make it more stable.
1. To separate Dept from Lockdep.
2. To improve performance in terms of time and space.
3. To use Dept as a dependency engine for Lockdep.
4. To add any missing tags of wait/event in the kernel.
5. To deduplicate stack trace.
How to interpret reports:
[S] the start of the event context, or the requestor that has asked
the event context to run
[W] the wait that blocks the event from triggering
[E] the event that cannot be reached
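Mapped onto the completion sketch above (again purely illustrative, not
actual Dept output), the markers would roughly correspond to:

  [S] thread A asking for the completion right before going to sleep
  [W] thread B stuck in mutex_lock(&m), which thread A holds
  [E] complete(&c) in thread B, which therefore can never be reached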
Thanks,
Byungchul
---
Changes from v11:
1. Add 'Dept' documentation describing the concept of Dept.
2. Rewrite the commit messages of the following commits, which use
a weaker lockdep annotation, for a better description.
fs/jbd2: Use a weaker annotation in journal handling
cpu/hotplug: Use a weaker annotation in AP thread
(feedback from Thomas Gleixner)
Changes from v10:
1. Fix noinstr warning when building kernel source.
2. Dept has been reporting some false positives due to the folio
lock's unfairness. Reflect it and make Dept work based on
dept annotations instead of just wait and wake-up primitives.
3. Remove the support for PG_writeback while working on 2. I
will add the support later if needed.
4. Dept didn't print the stacktrace for [S] if the participant of a
deadlock is not a lock mechanism but a general wait and event.
However, that made it hard to interpret the report in that case.
So add support for printing the stacktrace of the requestor who
asked the event context to run - usually a waiter of the event
does it just before going to the wait state.
5. Give up tracking raw_local_irq_{disable,enable}() since it
totally messed up Dept's irq tracking. So make it work in the
same way as Lockdep does. I will reconsider it once any false
positives caused by those are observed again.
6. Change the manual rwsem_acquire_read(->j_trans_commit_map)
annotation in fs/jbd2/transaction.c to the try version so
that it works exactly as much as it needs to.
7. Remove unnecessary 'inline' keyword in dept.c and add
'__maybe_unused' to a needed place.
Changes from v9:
1. Fix a bug. SDT tracking didn't work well because of my big
mistake: I should've used the waiter's map to identify its
class, but it had been working with the waker's one. FYI,
PG_locked and PG_writeback weren't affected. They still
worked well. (reported by YoungJun)
Changes from v8:
1. Fix a build error by adding EXPORT_SYMBOL(PG_locked_map) and
EXPORT_SYMBOL(PG_writeback_map) for kernel module builds -
apologies for that. (reported by kernel test robot)
2. Fix a build error by removing a circular dependency between the
header files "atomic.h", "kernel.h" and "irqflags.h", which I
had introduced - apologies for that. (reported by kernel test
robot)
Changes from v7:
1. Fix a bug introduced in v7 where rwlock dependencies could not
be tracked properly. (reported by Boqun and lockdep selftest)
2. Track wait/event of PG_{locked,writeback} more aggressively
assuming that when a bit of PG_{locked,writeback} is cleared
there might be waits on the bit. (reported by Linus, Hillf
and syzbot)
3. Fix and clean up bad-style code, e.g. an unnecessarily
introduced random pattern and so on. (pointed out by Linus)
4. Clean code for applying DEPT to wait_for_completion().
Changes from v6:
1. Tie into the task scheduler code to track sleeps and
try_to_wake_up(), assuming that sleeps are waits and that
try_to_wake_up()s are the events those waits are waiting for,
of course with proper DEPT annotations, sdt_might_sleep_weak(),
sdt_might_sleep_strong() and so on. For these cases, the class
is classified at the sleep entrance rather than at the
synchronization initialization code, which greatly reduces
false alarms.
2. Remove the DEPT-associated instance in each struct page used
for tracking dependencies by PG_locked and PG_writeback,
thanks to the work in 1. above.
3. Introduce CONFIG_DEPT_AGGRESSIVE_TIMEOUT_WAIT to suppress
reports involving waits with a timeout set, for those who
don't like verbose reporting.
4. Add a mechanism to refill the internal memory pools on
running out so that DEPT could keep working as long as free
memory is available in the system.
5. Re-enable tracking of hashed-waitqueue waits. That will no
longer generate false positives because the class is classified
at the sleep entrance rather than at waitqueue initialization.
6. Refactor to make it easier to port onto each new version of
the kernel.
7. Apply DEPT to dma fence.
8. Do trivial optimizations.
Changes from v5:
1. Use just pr_warn_once() rather than WARN_ONCE() on the lack
of internal resources, because WARN_*() printing a stacktrace
is too much just to report the shortage. (feedback from Ted,
Hyeonggon)
2. Fix trivial bugs like a missing initialization of a struct
before using it.
3. Assign a different class per task when handling onstack
variables for waitqueues or the like, which makes Dept
distinguish between onstack variables of different tasks so
as to prevent false positives. (reported by Hyeonggon)
4. Make Dept aware of even raw_local_irq_*() to prevent false
positives. (reported by Hyeonggon)
5. Don't consider dependencies between the events that might be
triggered within __schedule() and the waits that require
__schedule() as real ones. (reported by Hyeonggon)
6. Unstage a staged wait that has done prepare_to_wait_event()
but has yet to get to __schedule(), if we encounter
__schedule() in between for another sleep, which is possible
if e.g. a mutex_lock() exists in the 'condition' of
___wait_event().
7. Turn on CONFIG_PROVE_LOCKING when CONFIG_DEPT is on, to rely
on the hardirq and softirq entrance tracing to make Dept more
portable for now.
Changes from v4:
1. Fix some bugs that produce false alarms.
2. Distinguish each syscall context from another *for arm64*.
3. Make it not warn but just print a message in case the Dept
ring buffer gets exhausted. (feedback from Hyeonggon)
4. Explicitly describe "EXPERIMENTAL" and "Dept might produce
false positive reports" in Kconfig. (feedback from Ted)
Changes from v3:
1. Dept shouldn't create dependencies between different depths
of a class that are indicated by *_lock_nested(). Dept
normally doesn't, but it did once another lock class came
in. So fix it. (feedback from Hyeonggon)
2. Dept considered a wait a real wait once it got to
__schedule(), even if the task had already been set to
TASK_RUNNING by wake-up sources. Fix it so that Dept doesn't
consider that case a real wait. (feedback from Jan Kara)
3. Stop tracking dependencies with a map once the event
associated with the map has been handled. Dept will start to
work with the map again, on the next sleep.
Changes from v2:
1. Disable Dept on bit_wait_table[] in sched/wait_bit.c, which
was reporting a lot of false positives - my fault. Wait/event
for bit_wait_table[] should've been tagged in a higher layer
to work better, which is future work. (feedback from Jan
Kara)
2. Disable Dept on crypto_larval's completion to prevent a false
positive.
Changes from v1:
1. Fix coding style and typo. (feedback from Steven)
2. Distinguish each work context from another in workqueue.
3. Skip checking lock acquisition with nest_lock, which is about
correct lock usage that should be checked by Lockdep.
Changes from RFC(v0):
1. Add a wait tag at __schedule() rather than at prepare_to_wait().
(feedback from Linus and Matthew)
2. Use try version at lockdep_acquire_cpus_lock() annotation.
3. Distinguish each syscall context from another.
Byungchul Park (27):
llist: Move llist_{head,node} definition to types.h
dept: Implement Dept(Dependency Tracker)
dept: Add single event dependency tracker APIs
dept: Add lock dependency tracker APIs
dept: Tie to Lockdep and IRQ tracing
dept: Add proc knobs to show stats and dependency graph
dept: Apply sdt_might_sleep_{start,end}() to
wait_for_completion()/complete()
dept: Apply sdt_might_sleep_{start,end}() to swait
dept: Apply sdt_might_sleep_{start,end}() to waitqueue wait
dept: Apply sdt_might_sleep_{start,end}() to hashed-waitqueue wait
dept: Distinguish each syscall context from another
dept: Distinguish each work from another
dept: Add a mechanism to refill the internal memory pools on running
out
cpu/hotplug: Use a weaker annotation in AP thread
dept: Apply sdt_might_sleep_{start,end}() to dma fence wait
dept: Track timeout waits separately with a new Kconfig
dept: Apply timeout consideration to wait_for_completion()/complete()
dept: Apply timeout consideration to swait
dept: Apply timeout consideration to waitqueue wait
dept: Apply timeout consideration to hashed-waitqueue wait
dept: Apply timeout consideration to dma fence wait
dept: Record the latest one out of consecutive waits of the same class
dept: Make Dept able to work with an external wgen
dept: Track PG_locked with dept
dept: Print event context requestor's stacktrace on report
fs/jbd2: Use a weaker annotation in journal handling
dept: Add 'Dept' documentation
Documentation/dependency/dept.txt | 283 +++
arch/arm64/kernel/syscall.c | 3 +
arch/x86/entry/common.c | 4 +
drivers/dma-buf/dma-fence.c | 5 +
fs/jbd2/transaction.c | 2 +-
include/linux/completion.h | 30 +-
include/linux/dept.h | 617 ++++++
include/linux/dept_ldt.h | 77 +
include/linux/dept_sdt.h | 66 +
include/linux/hardirq.h | 3 +
include/linux/irqflags.h | 7 +-
include/linux/llist.h | 8 -
include/linux/local_lock_internal.h | 1 +
include/linux/lockdep.h | 102 +-
include/linux/lockdep_types.h | 3 +
include/linux/mm_types.h | 2 +
include/linux/mutex.h | 1 +
include/linux/page-flags.h | 105 +-
include/linux/pagemap.h | 7 +-
include/linux/percpu-rwsem.h | 2 +-
include/linux/rtmutex.h | 1 +
include/linux/rwlock_types.h | 1 +
include/linux/rwsem.h | 1 +
include/linux/sched.h | 3 +
include/linux/seqlock.h | 2 +-
include/linux/spinlock_types_raw.h | 3 +
include/linux/srcu.h | 2 +-
include/linux/swait.h | 3 +
include/linux/types.h | 8 +
include/linux/wait.h | 3 +
include/linux/wait_bit.h | 3 +
init/init_task.c | 2 +
init/main.c | 2 +
kernel/Makefile | 1 +
kernel/cpu.c | 2 +-
kernel/dependency/Makefile | 4 +
kernel/dependency/dept.c | 3175 +++++++++++++++++++++++++++
kernel/dependency/dept_hash.h | 10 +
kernel/dependency/dept_internal.h | 26 +
kernel/dependency/dept_object.h | 13 +
kernel/dependency/dept_proc.c | 93 +
kernel/exit.c | 1 +
kernel/fork.c | 2 +
kernel/locking/lockdep.c | 22 +
kernel/module/main.c | 4 +
kernel/sched/completion.c | 2 +-
kernel/sched/core.c | 10 +
kernel/workqueue.c | 3 +
lib/Kconfig.debug | 37 +
lib/locking-selftest.c | 2 +
mm/filemap.c | 26 +
mm/mm_init.c | 2 +
52 files changed, 4743 insertions(+), 54 deletions(-)
create mode 100644 Documentation/dependency/dept.txt
create mode 100644 include/linux/dept.h
create mode 100644 include/linux/dept_ldt.h
create mode 100644 include/linux/dept_sdt.h
create mode 100644 kernel/dependency/Makefile
create mode 100644 kernel/dependency/dept.c
create mode 100644 kernel/dependency/dept_hash.h
create mode 100644 kernel/dependency/dept_internal.h
create mode 100644 kernel/dependency/dept_object.h
create mode 100644 kernel/dependency/dept_proc.c
base-commit: 0dd3ee31125508cd67f7e7172247f05b7fd1753a
--
2.17.1
Placing Dept this way looks ugly, but it's inevitable for now. The
approach should be improved gradually.
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/irqflags.h | 7 +-
include/linux/local_lock_internal.h | 1 +
include/linux/lockdep.h | 102 ++++++++++++++++++++++------
include/linux/lockdep_types.h | 3 +
include/linux/mutex.h | 1 +
include/linux/percpu-rwsem.h | 2 +-
include/linux/rtmutex.h | 1 +
include/linux/rwlock_types.h | 1 +
include/linux/rwsem.h | 1 +
include/linux/seqlock.h | 2 +-
include/linux/spinlock_types_raw.h | 3 +
include/linux/srcu.h | 2 +-
kernel/dependency/dept.c | 8 +--
kernel/locking/lockdep.c | 22 ++++++
14 files changed, 127 insertions(+), 29 deletions(-)
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 2b665c32f5fe..672dac1c3059 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -14,6 +14,7 @@
#include <linux/typecheck.h>
#include <linux/cleanup.h>
+#include <linux/dept.h>
#include <asm/irqflags.h>
#include <asm/percpu.h>
@@ -61,8 +62,10 @@ extern void trace_hardirqs_off(void);
# define lockdep_softirqs_enabled(p) ((p)->softirqs_enabled)
# define lockdep_hardirq_enter() \
do { \
- if (__this_cpu_inc_return(hardirq_context) == 1)\
+ if (__this_cpu_inc_return(hardirq_context) == 1) { \
current->hardirq_threaded = 0; \
+ dept_hardirq_enter(); \
+ } \
} while (0)
# define lockdep_hardirq_threaded() \
do { \
@@ -137,6 +140,8 @@ do { \
# define lockdep_softirq_enter() \
do { \
current->softirq_context++; \
+ if (current->softirq_context == 1) \
+ dept_softirq_enter(); \
} while (0)
# define lockdep_softirq_exit() \
do { \
diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h
index 975e33b793a7..39f67788fd95 100644
--- a/include/linux/local_lock_internal.h
+++ b/include/linux/local_lock_internal.h
@@ -21,6 +21,7 @@ typedef struct {
.name = #lockname, \
.wait_type_inner = LD_WAIT_CONFIG, \
.lock_type = LD_LOCK_PERCPU, \
+ .dmap = DEPT_MAP_INITIALIZER(lockname, NULL),\
}, \
.owner = NULL,
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index dc2844b071c2..8825f535d36d 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -12,6 +12,7 @@
#include <linux/lockdep_types.h>
#include <linux/smp.h>
+#include <linux/dept_ldt.h>
#include <asm/percpu.h>
struct task_struct;
@@ -39,6 +40,8 @@ static inline void lockdep_copy_map(struct lockdep_map *to,
*/
for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
to->class_cache[i] = NULL;
+
+ dept_map_copy(&to->dmap, &from->dmap);
}
/*
@@ -466,7 +469,8 @@ enum xhlock_context_t {
* Note that _name must not be NULL.
*/
#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
- { .name = (_name), .key = (void *)(_key), }
+ { .name = (_name), .key = (void *)(_key), \
+ .dmap = DEPT_MAP_INITIALIZER(_name, _key) }
static inline void lockdep_invariant_state(bool force) {}
static inline void lockdep_free_task(struct task_struct *task) {}
@@ -548,33 +552,89 @@ extern bool read_lock_is_recursive(void);
#define lock_acquire_shared(l, s, t, n, i) lock_acquire(l, s, t, 1, 1, n, i)
#define lock_acquire_shared_recursive(l, s, t, n, i) lock_acquire(l, s, t, 2, 1, n, i)
-#define spin_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
-#define spin_acquire_nest(l, s, t, n, i) lock_acquire_exclusive(l, s, t, n, i)
-#define spin_release(l, i) lock_release(l, i)
-
-#define rwlock_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
+#define spin_acquire(l, s, t, i) \
+do { \
+ ldt_lock(&(l)->dmap, s, t, NULL, i); \
+ lock_acquire_exclusive(l, s, t, NULL, i); \
+} while (0)
+#define spin_acquire_nest(l, s, t, n, i) \
+do { \
+ ldt_lock(&(l)->dmap, s, t, n, i); \
+ lock_acquire_exclusive(l, s, t, n, i); \
+} while (0)
+#define spin_release(l, i) \
+do { \
+ ldt_unlock(&(l)->dmap, i); \
+ lock_release(l, i); \
+} while (0)
+#define rwlock_acquire(l, s, t, i) \
+do { \
+ ldt_wlock(&(l)->dmap, s, t, NULL, i); \
+ lock_acquire_exclusive(l, s, t, NULL, i); \
+} while (0)
#define rwlock_acquire_read(l, s, t, i) \
do { \
+ ldt_rlock(&(l)->dmap, s, t, NULL, i, !read_lock_is_recursive());\
if (read_lock_is_recursive()) \
lock_acquire_shared_recursive(l, s, t, NULL, i); \
else \
lock_acquire_shared(l, s, t, NULL, i); \
} while (0)
-
-#define rwlock_release(l, i) lock_release(l, i)
-
-#define seqcount_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
-#define seqcount_acquire_read(l, s, t, i) lock_acquire_shared_recursive(l, s, t, NULL, i)
-#define seqcount_release(l, i) lock_release(l, i)
-
-#define mutex_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
-#define mutex_acquire_nest(l, s, t, n, i) lock_acquire_exclusive(l, s, t, n, i)
-#define mutex_release(l, i) lock_release(l, i)
-
-#define rwsem_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
-#define rwsem_acquire_nest(l, s, t, n, i) lock_acquire_exclusive(l, s, t, n, i)
-#define rwsem_acquire_read(l, s, t, i) lock_acquire_shared(l, s, t, NULL, i)
-#define rwsem_release(l, i) lock_release(l, i)
+#define rwlock_release(l, i) \
+do { \
+ ldt_unlock(&(l)->dmap, i); \
+ lock_release(l, i); \
+} while (0)
+#define seqcount_acquire(l, s, t, i) \
+do { \
+ ldt_wlock(&(l)->dmap, s, t, NULL, i); \
+ lock_acquire_exclusive(l, s, t, NULL, i); \
+} while (0)
+#define seqcount_acquire_read(l, s, t, i) \
+do { \
+ ldt_rlock(&(l)->dmap, s, t, NULL, i, false); \
+ lock_acquire_shared_recursive(l, s, t, NULL, i); \
+} while (0)
+#define seqcount_release(l, i) \
+do { \
+ ldt_unlock(&(l)->dmap, i); \
+ lock_release(l, i); \
+} while (0)
+#define mutex_acquire(l, s, t, i) \
+do { \
+ ldt_lock(&(l)->dmap, s, t, NULL, i); \
+ lock_acquire_exclusive(l, s, t, NULL, i); \
+} while (0)
+#define mutex_acquire_nest(l, s, t, n, i) \
+do { \
+ ldt_lock(&(l)->dmap, s, t, n, i); \
+ lock_acquire_exclusive(l, s, t, n, i); \
+} while (0)
+#define mutex_release(l, i) \
+do { \
+ ldt_unlock(&(l)->dmap, i); \
+ lock_release(l, i); \
+} while (0)
+#define rwsem_acquire(l, s, t, i) \
+do { \
+ ldt_lock(&(l)->dmap, s, t, NULL, i); \
+ lock_acquire_exclusive(l, s, t, NULL, i); \
+} while (0)
+#define rwsem_acquire_nest(l, s, t, n, i) \
+do { \
+ ldt_lock(&(l)->dmap, s, t, n, i); \
+ lock_acquire_exclusive(l, s, t, n, i); \
+} while (0)
+#define rwsem_acquire_read(l, s, t, i) \
+do { \
+ ldt_lock(&(l)->dmap, s, t, NULL, i); \
+ lock_acquire_shared(l, s, t, NULL, i); \
+} while (0)
+#define rwsem_release(l, i) \
+do { \
+ ldt_unlock(&(l)->dmap, i); \
+ lock_release(l, i); \
+} while (0)
#define lock_map_acquire(l) lock_acquire_exclusive(l, 0, 0, NULL, _THIS_IP_)
#define lock_map_acquire_try(l) lock_acquire_exclusive(l, 0, 1, NULL, _THIS_IP_)
diff --git a/include/linux/lockdep_types.h b/include/linux/lockdep_types.h
index 2ebc323d345a..aecd65836b2c 100644
--- a/include/linux/lockdep_types.h
+++ b/include/linux/lockdep_types.h
@@ -11,6 +11,7 @@
#define __LINUX_LOCKDEP_TYPES_H
#include <linux/types.h>
+#include <linux/dept.h>
#define MAX_LOCKDEP_SUBCLASSES 8UL
@@ -77,6 +78,7 @@ struct lock_class_key {
struct hlist_node hash_entry;
struct lockdep_subclass_key subkeys[MAX_LOCKDEP_SUBCLASSES];
};
+ struct dept_key dkey;
};
extern struct lock_class_key __lockdep_no_validate__;
@@ -194,6 +196,7 @@ struct lockdep_map {
int cpu;
unsigned long ip;
#endif
+ struct dept_map dmap;
};
struct pin_cookie { unsigned int val; };
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index a33aa9eb9fc3..04c41faace85 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -26,6 +26,7 @@
, .dep_map = { \
.name = #lockname, \
.wait_type_inner = LD_WAIT_SLEEP, \
+ .dmap = DEPT_MAP_INITIALIZER(lockname, NULL),\
}
#else
# define __DEP_MAP_MUTEX_INITIALIZER(lockname)
diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index 36b942b67b7d..e871aca04645 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -21,7 +21,7 @@ struct percpu_rw_semaphore {
};
#ifdef CONFIG_DEBUG_LOCK_ALLOC
-#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname },
+#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname) .dep_map = { .name = #lockname, .dmap = DEPT_MAP_INITIALIZER(lockname, NULL) },
#else
#define __PERCPU_RWSEM_DEP_MAP_INIT(lockname)
#endif
diff --git a/include/linux/rtmutex.h b/include/linux/rtmutex.h
index 7d049883a08a..35889ac5eeae 100644
--- a/include/linux/rtmutex.h
+++ b/include/linux/rtmutex.h
@@ -81,6 +81,7 @@ do { \
.dep_map = { \
.name = #mutexname, \
.wait_type_inner = LD_WAIT_SLEEP, \
+ .dmap = DEPT_MAP_INITIALIZER(mutexname, NULL),\
}
#else
#define __DEP_MAP_RT_MUTEX_INITIALIZER(mutexname)
diff --git a/include/linux/rwlock_types.h b/include/linux/rwlock_types.h
index 1948442e7750..6e58dfc84997 100644
--- a/include/linux/rwlock_types.h
+++ b/include/linux/rwlock_types.h
@@ -10,6 +10,7 @@
.dep_map = { \
.name = #lockname, \
.wait_type_inner = LD_WAIT_CONFIG, \
+ .dmap = DEPT_MAP_INITIALIZER(lockname, NULL), \
}
#else
# define RW_DEP_MAP_INIT(lockname)
diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h
index 1dd530ce8b45..1fa391e7770a 100644
--- a/include/linux/rwsem.h
+++ b/include/linux/rwsem.h
@@ -22,6 +22,7 @@
.dep_map = { \
.name = #lockname, \
.wait_type_inner = LD_WAIT_SLEEP, \
+ .dmap = DEPT_MAP_INITIALIZER(lockname, NULL),\
},
#else
# define __RWSEM_DEP_MAP_INIT(lockname)
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index e92f9d5577ba..dee83ab183e4 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -81,7 +81,7 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define SEQCOUNT_DEP_MAP_INIT(lockname) \
- .dep_map = { .name = #lockname }
+ .dep_map = { .name = #lockname, .dmap = DEPT_MAP_INITIALIZER(lockname, NULL) }
/**
* seqcount_init() - runtime initializer for seqcount_t
diff --git a/include/linux/spinlock_types_raw.h b/include/linux/spinlock_types_raw.h
index 91cb36b65a17..3dcc551ded25 100644
--- a/include/linux/spinlock_types_raw.h
+++ b/include/linux/spinlock_types_raw.h
@@ -31,11 +31,13 @@ typedef struct raw_spinlock {
.dep_map = { \
.name = #lockname, \
.wait_type_inner = LD_WAIT_SPIN, \
+ .dmap = DEPT_MAP_INITIALIZER(lockname, NULL),\
}
# define SPIN_DEP_MAP_INIT(lockname) \
.dep_map = { \
.name = #lockname, \
.wait_type_inner = LD_WAIT_CONFIG, \
+ .dmap = DEPT_MAP_INITIALIZER(lockname, NULL),\
}
# define LOCAL_SPIN_DEP_MAP_INIT(lockname) \
@@ -43,6 +45,7 @@ typedef struct raw_spinlock {
.name = #lockname, \
.wait_type_inner = LD_WAIT_CONFIG, \
.lock_type = LD_LOCK_PERCPU, \
+ .dmap = DEPT_MAP_INITIALIZER(lockname, NULL),\
}
#else
# define RAW_SPIN_DEP_MAP_INIT(lockname)
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 127ef3b2e607..f6b8266a4bfd 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -35,7 +35,7 @@ int __init_srcu_struct(struct srcu_struct *ssp, const char *name,
__init_srcu_struct((ssp), #ssp, &__srcu_key); \
})
-#define __SRCU_DEP_MAP_INIT(srcu_name) .dep_map = { .name = #srcu_name },
+#define __SRCU_DEP_MAP_INIT(srcu_name) .dep_map = { .name = #srcu_name, .dmap = DEPT_MAP_INITIALIZER(srcu_name, NULL) },
#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
int init_srcu_struct(struct srcu_struct *ssp);
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index a3e774479f94..7e12e46dc4b7 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -244,10 +244,10 @@ static bool dept_working(void)
* Even k == NULL is considered as a valid key because it would use
* &->map_key as the key in that case.
*/
-struct dept_key __dept_no_validate__;
+extern struct lock_class_key __lockdep_no_validate__;
static bool valid_key(struct dept_key *k)
{
- return &__dept_no_validate__ != k;
+ return &__lockdep_no_validate__.dkey != k;
}
/*
@@ -1936,7 +1936,7 @@ void dept_softirqs_off(void)
dept_task()->softirqs_enabled = false;
}
-void dept_hardirqs_off(void)
+void noinstr dept_hardirqs_off(void)
{
/*
* Assumes that it's called with IRQ disabled so that accessing
@@ -1958,7 +1958,7 @@ void dept_softirq_enter(void)
/*
* Ensure it's the outmost hardirq context.
*/
-void dept_hardirq_enter(void)
+void noinstr dept_hardirq_enter(void)
{
struct dept_task *dt = dept_task();
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 151bd3de5936..e27cf9d17163 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1215,6 +1215,8 @@ void lockdep_register_key(struct lock_class_key *key)
struct lock_class_key *k;
unsigned long flags;
+ dept_key_init(&key->dkey);
+
if (WARN_ON_ONCE(static_obj(key)))
return;
hash_head = keyhashentry(key);
@@ -4310,6 +4312,8 @@ static void __trace_hardirqs_on_caller(void)
*/
void lockdep_hardirqs_on_prepare(void)
{
+ dept_hardirqs_on();
+
if (unlikely(!debug_locks))
return;
@@ -4430,6 +4434,8 @@ EXPORT_SYMBOL_GPL(lockdep_hardirqs_on);
*/
void noinstr lockdep_hardirqs_off(unsigned long ip)
{
+ dept_hardirqs_off();
+
if (unlikely(!debug_locks))
return;
@@ -4474,6 +4480,8 @@ void lockdep_softirqs_on(unsigned long ip)
{
struct irqtrace_events *trace = &current->irqtrace;
+ dept_softirqs_on_ip(ip);
+
if (unlikely(!lockdep_enabled()))
return;
@@ -4512,6 +4520,8 @@ void lockdep_softirqs_on(unsigned long ip)
*/
void lockdep_softirqs_off(unsigned long ip)
{
+ dept_softirqs_off();
+
if (unlikely(!lockdep_enabled()))
return;
@@ -4859,6 +4869,8 @@ void lockdep_init_map_type(struct lockdep_map *lock, const char *name,
{
int i;
+ ldt_init(&lock->dmap, &key->dkey, subclass, name);
+
for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
lock->class_cache[i] = NULL;
@@ -5630,6 +5642,12 @@ void lock_set_class(struct lockdep_map *lock, const char *name,
{
unsigned long flags;
+ /*
+ * dept_map_(re)init() might be called twice redundantly. But
+ * there's no choice as long as Dept relies on Lockdep.
+ */
+ ldt_set_class(&lock->dmap, name, &key->dkey, subclass, ip);
+
if (unlikely(!lockdep_enabled()))
return;
@@ -5647,6 +5665,8 @@ void lock_downgrade(struct lockdep_map *lock, unsigned long ip)
{
unsigned long flags;
+ ldt_downgrade(&lock->dmap, ip);
+
if (unlikely(!lockdep_enabled()))
return;
@@ -6447,6 +6467,8 @@ void lockdep_unregister_key(struct lock_class_key *key)
unsigned long flags;
bool found = false;
+ dept_key_destroy(&key->dkey);
+
might_sleep();
if (WARN_ON_ONCE(static_obj(key)))
--
2.17.1
Make Dept able to track dependencies by swaits.
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/swait.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/linux/swait.h b/include/linux/swait.h
index d324419482a0..277ac74f61c3 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -6,6 +6,7 @@
#include <linux/stddef.h>
#include <linux/spinlock.h>
#include <linux/wait.h>
+#include <linux/dept_sdt.h>
#include <asm/current.h>
/*
@@ -161,6 +162,7 @@ extern void finish_swait(struct swait_queue_head *q, struct swait_queue *wait);
struct swait_queue __wait; \
long __ret = ret; \
\
+ sdt_might_sleep_start(NULL); \
INIT_LIST_HEAD(&__wait.task_list); \
for (;;) { \
long __int = prepare_to_swait_event(&wq, &__wait, state);\
@@ -176,6 +178,7 @@ extern void finish_swait(struct swait_queue_head *q, struct swait_queue *wait);
cmd; \
} \
finish_swait(&wq, &__wait); \
+ sdt_might_sleep_end(); \
__out: __ret; \
})
--
2.17.1
Make Dept able to track dependencies by
wait_for_completion()/complete().
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/completion.h | 30 +++++++++++++++++++++++++-----
1 file changed, 25 insertions(+), 5 deletions(-)
diff --git a/include/linux/completion.h b/include/linux/completion.h
index fb2915676574..bd2c207481d6 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -10,6 +10,7 @@
*/
#include <linux/swait.h>
+#include <linux/dept_sdt.h>
/*
* struct completion - structure used to maintain state for a "completion"
@@ -26,14 +27,33 @@
struct completion {
unsigned int done;
struct swait_queue_head wait;
+ struct dept_map dmap;
};
+#define init_completion(x) \
+do { \
+ sdt_map_init(&(x)->dmap); \
+ __init_completion(x); \
+} while (0)
+
+/*
+ * XXX: No use cases for now. Fill the body when needed.
+ */
#define init_completion_map(x, m) init_completion(x)
-static inline void complete_acquire(struct completion *x) {}
-static inline void complete_release(struct completion *x) {}
+
+static inline void complete_acquire(struct completion *x)
+{
+ sdt_might_sleep_start(&x->dmap);
+}
+
+static inline void complete_release(struct completion *x)
+{
+ sdt_might_sleep_end();
+}
#define COMPLETION_INITIALIZER(work) \
- { 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+ { 0, __SWAIT_QUEUE_HEAD_INITIALIZER((work).wait), \
+ .dmap = DEPT_MAP_INITIALIZER(work, NULL), }
#define COMPLETION_INITIALIZER_ONSTACK_MAP(work, map) \
(*({ init_completion_map(&(work), &(map)); &(work); }))
@@ -75,13 +95,13 @@ static inline void complete_release(struct completion *x) {}
#endif
/**
- * init_completion - Initialize a dynamically allocated completion
+ * __init_completion - Initialize a dynamically allocated completion
* @x: pointer to completion structure that is to be initialized
*
* This inline function will initialize a dynamically created completion
* structure.
*/
-static inline void init_completion(struct completion *x)
+static inline void __init_completion(struct completion *x)
{
x->done = 0;
init_swait_queue_head(&x->wait);
--
2.17.1
It'd be useful to show Dept internal stats and the dependency graph at
runtime via procfs for better observability. Introduce the knobs.
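For reference, the patch below creates two root-readable proc entries
(see the proc_create_*() calls):

  /proc/dept_deps   - all classes and the dependencies recorded between them
  /proc/dept_stats  - availability in the static object pools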
Signed-off-by: Byungchul Park <[email protected]>
---
kernel/dependency/Makefile | 1 +
kernel/dependency/dept.c | 24 +++-----
kernel/dependency/dept_internal.h | 26 +++++++++
kernel/dependency/dept_proc.c | 95 +++++++++++++++++++++++++++++++
4 files changed, 131 insertions(+), 15 deletions(-)
create mode 100644 kernel/dependency/dept_internal.h
create mode 100644 kernel/dependency/dept_proc.c
diff --git a/kernel/dependency/Makefile b/kernel/dependency/Makefile
index b5cfb8a03c0c..92f165400187 100644
--- a/kernel/dependency/Makefile
+++ b/kernel/dependency/Makefile
@@ -1,3 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_DEPT) += dept.o
+obj-$(CONFIG_DEPT) += dept_proc.o
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 7e12e46dc4b7..19406093103e 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -74,6 +74,7 @@
#include <linux/dept.h>
#include <linux/utsname.h>
#include <linux/kernel.h>
+#include "dept_internal.h"
static int dept_stop;
static int dept_per_cpu_ready;
@@ -260,20 +261,13 @@ static bool valid_key(struct dept_key *k)
* have been freed will be placed.
*/
-enum object_t {
-#define OBJECT(id, nr) OBJECT_##id,
- #include "dept_object.h"
-#undef OBJECT
- OBJECT_NR,
-};
-
#define OBJECT(id, nr) \
static struct dept_##id spool_##id[nr]; \
static DEFINE_PER_CPU(struct llist_head, lpool_##id);
#include "dept_object.h"
#undef OBJECT
-static struct dept_pool pool[OBJECT_NR] = {
+struct dept_pool dept_pool[OBJECT_NR] = {
#define OBJECT(id, nr) { \
.name = #id, \
.obj_sz = sizeof(struct dept_##id), \
@@ -303,7 +297,7 @@ static void *from_pool(enum object_t t)
if (DEPT_WARN_ON(!irqs_disabled()))
return NULL;
- p = &pool[t];
+ p = &dept_pool[t];
/*
* Try local pool first.
@@ -338,7 +332,7 @@ static void *from_pool(enum object_t t)
static void to_pool(void *o, enum object_t t)
{
- struct dept_pool *p = &pool[t];
+ struct dept_pool *p = &dept_pool[t];
struct llist_head *h;
preempt_disable();
@@ -2092,7 +2086,7 @@ void dept_map_copy(struct dept_map *to, struct dept_map *from)
clean_classes_cache(&to->map_key);
}
-static LIST_HEAD(classes);
+LIST_HEAD(dept_classes);
static bool within(const void *addr, void *start, unsigned long size)
{
@@ -2124,7 +2118,7 @@ void dept_free_range(void *start, unsigned int sz)
while (unlikely(!dept_lock()))
cpu_relax();
- list_for_each_entry_safe(c, n, &classes, all_node) {
+ list_for_each_entry_safe(c, n, &dept_classes, all_node) {
if (!within((void *)c->key, start, sz) &&
!within(c->name, start, sz))
continue;
@@ -2200,7 +2194,7 @@ static struct dept_class *check_new_class(struct dept_key *local,
c->sub_id = sub_id;
c->key = (unsigned long)(k->base + sub_id);
hash_add_class(c);
- list_add(&c->all_node, &classes);
+ list_add(&c->all_node, &dept_classes);
unlock:
dept_unlock();
caching:
@@ -2915,8 +2909,8 @@ static void migrate_per_cpu_pool(void)
struct llist_head *from;
struct llist_head *to;
- from = &pool[i].boot_pool;
- to = per_cpu_ptr(pool[i].lpool, boot_cpu);
+ from = &dept_pool[i].boot_pool;
+ to = per_cpu_ptr(dept_pool[i].lpool, boot_cpu);
move_llist(to, from);
}
}
diff --git a/kernel/dependency/dept_internal.h b/kernel/dependency/dept_internal.h
new file mode 100644
index 000000000000..007c1eec6bab
--- /dev/null
+++ b/kernel/dependency/dept_internal.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Dept(DEPendency Tracker) - runtime dependency tracker internal header
+ *
+ * Started by Byungchul Park <[email protected]>:
+ *
+ * Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ */
+
+#ifndef __DEPT_INTERNAL_H
+#define __DEPT_INTERNAL_H
+
+#ifdef CONFIG_DEPT
+
+enum object_t {
+#define OBJECT(id, nr) OBJECT_##id,
+ #include "dept_object.h"
+#undef OBJECT
+ OBJECT_NR,
+};
+
+extern struct list_head dept_classes;
+extern struct dept_pool dept_pool[];
+
+#endif
+#endif /* __DEPT_INTERNAL_H */
diff --git a/kernel/dependency/dept_proc.c b/kernel/dependency/dept_proc.c
new file mode 100644
index 000000000000..7d61dfbc5865
--- /dev/null
+++ b/kernel/dependency/dept_proc.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Procfs knobs for Dept(DEPendency Tracker)
+ *
+ * Started by Byungchul Park <[email protected]>:
+ *
+ * Copyright (C) 2021 LG Electronics, Inc. , Byungchul Park
+ */
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/dept.h>
+#include "dept_internal.h"
+
+static void *l_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ /*
+ * XXX: Serialize list traversal if needed. The following might
+ * give a wrong information on contention.
+ */
+ return seq_list_next(v, &dept_classes, pos);
+}
+
+static void *l_start(struct seq_file *m, loff_t *pos)
+{
+ /*
+ * XXX: Serialize list traversal if needed. The following might
+ * give a wrong information on contention.
+ */
+ return seq_list_start_head(&dept_classes, *pos);
+}
+
+static void l_stop(struct seq_file *m, void *v)
+{
+}
+
+static int l_show(struct seq_file *m, void *v)
+{
+ struct dept_class *fc = list_entry(v, struct dept_class, all_node);
+ struct dept_dep *d;
+ const char *prefix;
+
+ if (v == &dept_classes) {
+ seq_puts(m, "All classes:\n\n");
+ return 0;
+ }
+
+ prefix = fc->sched_map ? "<sched> " : "";
+ seq_printf(m, "[%p] %s%s\n", (void *)fc->key, prefix, fc->name);
+
+ /*
+ * XXX: Serialize list traversal if needed. The following might
+ * give a wrong information on contention.
+ */
+ list_for_each_entry(d, &fc->dep_head, dep_node) {
+ struct dept_class *tc = d->wait->class;
+
+ prefix = tc->sched_map ? "<sched> " : "";
+ seq_printf(m, " -> [%p] %s%s\n", (void *)tc->key, prefix, tc->name);
+ }
+ seq_puts(m, "\n");
+
+ return 0;
+}
+
+static const struct seq_operations dept_deps_ops = {
+ .start = l_start,
+ .next = l_next,
+ .stop = l_stop,
+ .show = l_show,
+};
+
+static int dept_stats_show(struct seq_file *m, void *v)
+{
+ int r;
+
+ seq_puts(m, "Availability in the static pools:\n\n");
+#define OBJECT(id, nr) \
+ r = atomic_read(&dept_pool[OBJECT_##id].obj_nr); \
+ if (r < 0) \
+ r = 0; \
+ seq_printf(m, "%s\t%d/%d(%d%%)\n", #id, r, nr, (r * 100) / (nr));
+ #include "dept_object.h"
+#undef OBJECT
+
+ return 0;
+}
+
+static int __init dept_proc_init(void)
+{
+ proc_create_seq("dept_deps", S_IRUSR, NULL, &dept_deps_ops);
+ proc_create_single("dept_stats", S_IRUSR, NULL, dept_stats_show);
+ return 0;
+}
+
+__initcall(dept_proc_init);
--
2.17.1
Make Dept able to track dependencies by hashed-waitqueue waits.
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/wait_bit.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h
index 7725b7579b78..fe89282c3e96 100644
--- a/include/linux/wait_bit.h
+++ b/include/linux/wait_bit.h
@@ -6,6 +6,7 @@
* Linux wait-bit related types and methods:
*/
#include <linux/wait.h>
+#include <linux/dept_sdt.h>
struct wait_bit_key {
void *flags;
@@ -246,6 +247,7 @@ extern wait_queue_head_t *__var_waitqueue(void *p);
struct wait_bit_queue_entry __wbq_entry; \
long __ret = ret; /* explicit shadow */ \
\
+ sdt_might_sleep_start(NULL); \
init_wait_var_entry(&__wbq_entry, var, \
exclusive ? WQ_FLAG_EXCLUSIVE : 0); \
for (;;) { \
@@ -263,6 +265,7 @@ extern wait_queue_head_t *__var_waitqueue(void *p);
cmd; \
} \
finish_wait(__wq_head, &__wbq_entry.wq_entry); \
+ sdt_might_sleep_end(); \
__out: __ret; \
})
--
2.17.1
The Dept engine works in a constrained environment. For example, Dept
cannot make use of dynamic allocation e.g. kmalloc(). So Dept has been
using static pools to keep the memory chunks it uses.
However, Dept can barely work once any of the pools runs out. So
implement a mechanism to refill a pool when it runs out, using an
irq_work and a workqueue, which fit the constrained environment.
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/dept.h | 19 ++++--
kernel/dependency/dept.c | 104 +++++++++++++++++++++++++++-----
kernel/dependency/dept_object.h | 10 +--
kernel/dependency/dept_proc.c | 8 +--
4 files changed, 112 insertions(+), 29 deletions(-)
diff --git a/include/linux/dept.h b/include/linux/dept.h
index 319a5b43df89..ca1a34be4127 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -336,9 +336,19 @@ struct dept_pool {
size_t obj_sz;
/*
- * the number of the static array
+ * the remaining number of the object in spool
*/
- atomic_t obj_nr;
+ int obj_nr;
+
+ /*
+ * the number of the object in spool
+ */
+ int tot_nr;
+
+ /*
+ * accumulated amount of memory used by the object in byte
+ */
+ atomic_t acc_sz;
/*
* offset of ->pool_node
@@ -348,9 +358,10 @@ struct dept_pool {
/*
* pointer to the pool
*/
- void *spool;
+ void *spool; /* static pool */
+ void *rpool; /* reserved pool */
struct llist_head boot_pool;
- struct llist_head __percpu *lpool;
+ struct llist_head __percpu *lpool; /* local pool */
};
struct dept_ecxt_held {
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index a8e693fd590f..8ca46ad98e10 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -74,6 +74,9 @@
#include <linux/dept.h>
#include <linux/utsname.h>
#include <linux/kernel.h>
+#include <linux/workqueue.h>
+#include <linux/irq_work.h>
+#include <linux/vmalloc.h>
#include "dept_internal.h"
static int dept_stop;
@@ -122,9 +125,11 @@ static int dept_per_cpu_ready;
WARN(1, "DEPT_STOP: " s); \
})
-#define DEPT_INFO_ONCE(s...) pr_warn_once("DEPT_INFO_ONCE: " s)
+#define DEPT_INFO_ONCE(s...) pr_warn_once("DEPT_INFO_ONCE: " s)
+#define DEPT_INFO(s...) pr_warn("DEPT_INFO: " s)
static arch_spinlock_t dept_spin = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+static arch_spinlock_t dept_pool_spin = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
/*
* DEPT internal engine should be careful in using outside functions
@@ -263,6 +268,7 @@ static bool valid_key(struct dept_key *k)
#define OBJECT(id, nr) \
static struct dept_##id spool_##id[nr]; \
+static struct dept_##id rpool_##id[nr]; \
static DEFINE_PER_CPU(struct llist_head, lpool_##id);
#include "dept_object.h"
#undef OBJECT
@@ -271,14 +277,70 @@ struct dept_pool dept_pool[OBJECT_NR] = {
#define OBJECT(id, nr) { \
.name = #id, \
.obj_sz = sizeof(struct dept_##id), \
- .obj_nr = ATOMIC_INIT(nr), \
+ .obj_nr = nr, \
+ .tot_nr = nr, \
+ .acc_sz = ATOMIC_INIT(sizeof(spool_##id) + sizeof(rpool_##id)), \
.node_off = offsetof(struct dept_##id, pool_node), \
.spool = spool_##id, \
+ .rpool = rpool_##id, \
.lpool = &lpool_##id, },
#include "dept_object.h"
#undef OBJECT
};
+static void dept_wq_work_fn(struct work_struct *work)
+{
+ int i;
+
+ for (i = 0; i < OBJECT_NR; i++) {
+ struct dept_pool *p = dept_pool + i;
+ int sz = p->tot_nr * p->obj_sz;
+ void *rpool;
+ bool need;
+
+ arch_spin_lock(&dept_pool_spin);
+ need = !p->rpool;
+ arch_spin_unlock(&dept_pool_spin);
+
+ if (!need)
+ continue;
+
+ rpool = vmalloc(sz);
+
+ if (!rpool) {
+ DEPT_STOP("Failed to extend internal resources.\n");
+ break;
+ }
+
+ arch_spin_lock(&dept_pool_spin);
+ if (!p->rpool) {
+ p->rpool = rpool;
+ rpool = NULL;
+ atomic_add(sz, &p->acc_sz);
+ }
+ arch_spin_unlock(&dept_pool_spin);
+
+ if (rpool)
+ vfree(rpool);
+ else
+ DEPT_INFO("Dept object(%s) just got refilled successfully.\n", p->name);
+ }
+}
+
+static DECLARE_WORK(dept_wq_work, dept_wq_work_fn);
+
+static void dept_irq_work_fn(struct irq_work *w)
+{
+ schedule_work(&dept_wq_work);
+}
+
+static DEFINE_IRQ_WORK(dept_irq_work, dept_irq_work_fn);
+
+static void request_rpool_refill(void)
+{
+ irq_work_queue(&dept_irq_work);
+}
+
/*
* Can use llist no matter whether CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG is
* enabled or not because NMI and other contexts in the same CPU never
@@ -314,19 +376,31 @@ static void *from_pool(enum object_t t)
/*
* Try static pool.
*/
- if (atomic_read(&p->obj_nr) > 0) {
- int idx = atomic_dec_return(&p->obj_nr);
+ arch_spin_lock(&dept_pool_spin);
+
+ if (!p->obj_nr) {
+ p->spool = p->rpool;
+ p->obj_nr = p->rpool ? p->tot_nr : 0;
+ p->rpool = NULL;
+ request_rpool_refill();
+ }
+
+ if (p->obj_nr) {
+ void *ret;
+
+ p->obj_nr--;
+ ret = p->spool + (p->obj_nr * p->obj_sz);
+ arch_spin_unlock(&dept_pool_spin);
- if (idx >= 0)
- return p->spool + (idx * p->obj_sz);
+ return ret;
}
+ arch_spin_unlock(&dept_pool_spin);
- DEPT_INFO_ONCE("---------------------------------------------\n"
- " Some of Dept internal resources are run out.\n"
- " Dept might still work if the resources get freed.\n"
- " However, the chances are Dept will suffer from\n"
- " the lack from now. Needs to extend the internal\n"
- " resource pools. Ask [email protected]\n");
+ DEPT_INFO("------------------------------------------\n"
+ " Dept object(%s) is run out.\n"
+ " Dept is trying to refill the object.\n"
+ " Nevertheless, if it fails, Dept will stop.\n",
+ p->name);
return NULL;
}
@@ -2957,8 +3031,8 @@ void __init dept_init(void)
pr_info("... DEPT_MAX_ECXT_HELD : %d\n", DEPT_MAX_ECXT_HELD);
pr_info("... DEPT_MAX_SUBCLASSES : %d\n", DEPT_MAX_SUBCLASSES);
#define OBJECT(id, nr) \
- pr_info("... memory used by %s: %zu KB\n", \
- #id, B2KB(sizeof(struct dept_##id) * nr));
+ pr_info("... memory initially used by %s: %zu KB\n", \
+ #id, B2KB(sizeof(spool_##id) + sizeof(rpool_##id)));
#include "dept_object.h"
#undef OBJECT
#define HASH(id, bits) \
@@ -2966,6 +3040,6 @@ void __init dept_init(void)
#id, B2KB(sizeof(struct hlist_head) * (1 << (bits))));
#include "dept_hash.h"
#undef HASH
- pr_info("... total memory used by objects and hashs: %zu KB\n", B2KB(mem_total));
+ pr_info("... total memory initially used by objects and hashs: %zu KB\n", B2KB(mem_total));
pr_info("... per task memory footprint: %zu bytes\n", sizeof(struct dept_task));
}
diff --git a/kernel/dependency/dept_object.h b/kernel/dependency/dept_object.h
index 0b7eb16fe9fb..4f936adfa8ee 100644
--- a/kernel/dependency/dept_object.h
+++ b/kernel/dependency/dept_object.h
@@ -6,8 +6,8 @@
* nr: # of the object that should be kept in the pool.
*/
-OBJECT(dep, 1024 * 8)
-OBJECT(class, 1024 * 8)
-OBJECT(stack, 1024 * 32)
-OBJECT(ecxt, 1024 * 16)
-OBJECT(wait, 1024 * 32)
+OBJECT(dep, 1024 * 4 * 2)
+OBJECT(class, 1024 * 4)
+OBJECT(stack, 1024 * 4 * 8)
+OBJECT(ecxt, 1024 * 4 * 2)
+OBJECT(wait, 1024 * 4 * 4)
diff --git a/kernel/dependency/dept_proc.c b/kernel/dependency/dept_proc.c
index 7d61dfbc5865..f07a512b203f 100644
--- a/kernel/dependency/dept_proc.c
+++ b/kernel/dependency/dept_proc.c
@@ -73,12 +73,10 @@ static int dept_stats_show(struct seq_file *m, void *v)
{
int r;
- seq_puts(m, "Availability in the static pools:\n\n");
+ seq_puts(m, "Accumulated amount of memory used by pools:\n\n");
#define OBJECT(id, nr) \
- r = atomic_read(&dept_pool[OBJECT_##id].obj_nr); \
- if (r < 0) \
- r = 0; \
- seq_printf(m, "%s\t%d/%d(%d%%)\n", #id, r, nr, (r * 100) / (nr));
+ r = atomic_read(&dept_pool[OBJECT_##id].acc_sz); \
+ seq_printf(m, "%s\t%d KB\n", #id, r / 1024);
#include "dept_object.h"
#undef OBJECT
--
2.17.1
Waits with valid timeouts don't actually cause deadlocks. However, Dept
has been reporting such cases as well, because informing about the
circular dependency is worthwhile for cases where, for example, a
timeout is used to avoid a deadlock and is not meant to expire.
However, there are also many more cases where a timeout is used for its
obvious purpose and is meant to expire.
Let Dept report these as information rather than shouting DEADLOCK.
Plus, introduce a CONFIG_DEPT_AGGRESSIVE_TIMEOUT_WAIT Kconfig option to
make this optional, so that reports involving waits with timeouts can be
turned on/off depending on the purpose.
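For reference, a wait is classified as a timeout wait only when a
finite, positive timeout value is passed in, as in the check added to
dept_wait() and dept_stage_wait() below:

  timeout = timeoutval > 0 && timeoutval < MAX_SCHEDULE_TIMEOUT;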
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/dept.h | 15 ++++++---
include/linux/dept_ldt.h | 6 ++--
include/linux/dept_sdt.h | 12 +++++---
kernel/dependency/dept.c | 66 ++++++++++++++++++++++++++++++++++------
lib/Kconfig.debug | 10 ++++++
5 files changed, 89 insertions(+), 20 deletions(-)
diff --git a/include/linux/dept.h b/include/linux/dept.h
index ca1a34be4127..0280e45cc2af 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -270,6 +270,11 @@ struct dept_wait {
* whether this wait is for commit in scheduler
*/
bool sched_sleep;
+
+ /*
+ * whether a timeout is set
+ */
+ bool timeout;
};
};
};
@@ -453,6 +458,7 @@ struct dept_task {
bool stage_sched_map;
const char *stage_w_fn;
unsigned long stage_ip;
+ bool stage_timeout;
/*
* the number of missing ecxts
@@ -490,6 +496,7 @@ struct dept_task {
.stage_sched_map = false, \
.stage_w_fn = NULL, \
.stage_ip = 0UL, \
+ .stage_timeout = false, \
.missing_ecxt = 0, \
.hardirqs_enabled = false, \
.softirqs_enabled = false, \
@@ -507,8 +514,8 @@ extern void dept_map_init(struct dept_map *m, struct dept_key *k, int sub_u, con
extern void dept_map_reinit(struct dept_map *m, struct dept_key *k, int sub_u, const char *n);
extern void dept_map_copy(struct dept_map *to, struct dept_map *from);
-extern void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip, const char *w_fn, int sub_l);
-extern void dept_stage_wait(struct dept_map *m, struct dept_key *k, unsigned long ip, const char *w_fn);
+extern void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip, const char *w_fn, int sub_l, long timeout);
+extern void dept_stage_wait(struct dept_map *m, struct dept_key *k, unsigned long ip, const char *w_fn, long timeout);
extern void dept_request_event_wait_commit(void);
extern void dept_clean_stage(void);
extern void dept_stage_event(struct task_struct *t, unsigned long ip);
@@ -558,8 +565,8 @@ struct dept_task { };
#define dept_map_reinit(m, k, su, n) do { (void)(n); (void)(k); } while (0)
#define dept_map_copy(t, f) do { } while (0)
-#define dept_wait(m, w_f, ip, w_fn, sl) do { (void)(w_fn); } while (0)
-#define dept_stage_wait(m, k, ip, w_fn) do { (void)(k); (void)(w_fn); } while (0)
+#define dept_wait(m, w_f, ip, w_fn, sl, t) do { (void)(w_fn); } while (0)
+#define dept_stage_wait(m, k, ip, w_fn, t) do { (void)(k); (void)(w_fn); } while (0)
#define dept_request_event_wait_commit() do { } while (0)
#define dept_clean_stage() do { } while (0)
#define dept_stage_event(t, ip) do { } while (0)
diff --git a/include/linux/dept_ldt.h b/include/linux/dept_ldt.h
index 062613e89fc3..8adf298dfcb8 100644
--- a/include/linux/dept_ldt.h
+++ b/include/linux/dept_ldt.h
@@ -27,7 +27,7 @@
else if (t) \
dept_ecxt_enter(m, LDT_EVT_L, i, "trylock", "unlock", sl);\
else { \
- dept_wait(m, LDT_EVT_L, i, "lock", sl); \
+ dept_wait(m, LDT_EVT_L, i, "lock", sl, false); \
dept_ecxt_enter(m, LDT_EVT_L, i, "lock", "unlock", sl);\
} \
} while (0)
@@ -39,7 +39,7 @@
else if (t) \
dept_ecxt_enter(m, LDT_EVT_R, i, "read_trylock", "read_unlock", sl);\
else { \
- dept_wait(m, q ? LDT_EVT_RW : LDT_EVT_W, i, "read_lock", sl);\
+ dept_wait(m, q ? LDT_EVT_RW : LDT_EVT_W, i, "read_lock", sl, false);\
dept_ecxt_enter(m, LDT_EVT_R, i, "read_lock", "read_unlock", sl);\
} \
} while (0)
@@ -51,7 +51,7 @@
else if (t) \
dept_ecxt_enter(m, LDT_EVT_W, i, "write_trylock", "write_unlock", sl);\
else { \
- dept_wait(m, LDT_EVT_RW, i, "write_lock", sl); \
+ dept_wait(m, LDT_EVT_RW, i, "write_lock", sl, false);\
dept_ecxt_enter(m, LDT_EVT_W, i, "write_lock", "write_unlock", sl);\
} \
} while (0)
diff --git a/include/linux/dept_sdt.h b/include/linux/dept_sdt.h
index 12a793b90c7e..21fce525f031 100644
--- a/include/linux/dept_sdt.h
+++ b/include/linux/dept_sdt.h
@@ -22,11 +22,12 @@
#define sdt_map_init_key(m, k) dept_map_init(m, k, 0, #m)
-#define sdt_wait(m) \
+#define sdt_wait_timeout(m, t) \
do { \
dept_request_event(m); \
- dept_wait(m, 1UL, _THIS_IP_, __func__, 0); \
+ dept_wait(m, 1UL, _THIS_IP_, __func__, 0, t); \
} while (0)
+#define sdt_wait(m) sdt_wait_timeout(m, -1L)
/*
* sdt_might_sleep() and its family will be committed in __schedule()
@@ -37,12 +38,13 @@
/*
* Use the code location as the class key if an explicit map is not used.
*/
-#define sdt_might_sleep_start(m) \
+#define sdt_might_sleep_start_timeout(m, t) \
do { \
struct dept_map *__m = m; \
static struct dept_key __key; \
- dept_stage_wait(__m, __m ? NULL : &__key, _THIS_IP_, __func__);\
+ dept_stage_wait(__m, __m ? NULL : &__key, _THIS_IP_, __func__, t);\
} while (0)
+#define sdt_might_sleep_start(m) sdt_might_sleep_start_timeout(m, -1L)
#define sdt_might_sleep_end() dept_clean_stage()
@@ -52,7 +54,9 @@
#else /* !CONFIG_DEPT */
#define sdt_map_init(m) do { } while (0)
#define sdt_map_init_key(m, k) do { (void)(k); } while (0)
+#define sdt_wait_timeout(m, t) do { } while (0)
#define sdt_wait(m) do { } while (0)
+#define sdt_might_sleep_start_timeout(m, t) do { } while (0)
#define sdt_might_sleep_start(m) do { } while (0)
#define sdt_might_sleep_end() do { } while (0)
#define sdt_ecxt_enter(m) do { } while (0)
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 8ca46ad98e10..1b8fa9f69d73 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -739,6 +739,8 @@ static void print_diagram(struct dept_dep *d)
if (!irqf) {
print_spc(spc, "[S] %s(%s:%d)\n", c_fn, fc_n, fc->sub_id);
print_spc(spc, "[W] %s(%s:%d)\n", w_fn, tc_n, tc->sub_id);
+ if (w->timeout)
+ print_spc(spc, "--------------- >8 timeout ---------------\n");
print_spc(spc, "[E] %s(%s:%d)\n", e_fn, fc_n, fc->sub_id);
}
}
@@ -792,6 +794,24 @@ static void print_dep(struct dept_dep *d)
static void save_current_stack(int skip);
+static bool is_timeout_wait_circle(struct dept_class *c)
+{
+ struct dept_class *fc = c->bfs_parent;
+ struct dept_class *tc = c;
+
+ do {
+ struct dept_dep *d = lookup_dep(fc, tc);
+
+ if (d->wait->timeout)
+ return true;
+
+ tc = fc;
+ fc = fc->bfs_parent;
+ } while (tc != c);
+
+ return false;
+}
+
/*
* Print all classes in a circle.
*/
@@ -814,10 +834,14 @@ static void print_circle(struct dept_class *c)
pr_warn("summary\n");
pr_warn("---------------------------------------------------\n");
- if (fc == tc)
+ if (is_timeout_wait_circle(c)) {
+ pr_warn("NOT A DEADLOCK BUT A CIRCULAR DEPENDENCY\n");
+ pr_warn("CHECK IF THE TIMEOUT IS INTENDED\n\n");
+ } else if (fc == tc) {
pr_warn("*** AA DEADLOCK ***\n\n");
- else
+ } else {
pr_warn("*** DEADLOCK ***\n\n");
+ }
i = 0;
do {
@@ -1563,7 +1587,8 @@ static void add_dep(struct dept_ecxt *e, struct dept_wait *w)
static atomic_t wgen = ATOMIC_INIT(1);
static void add_wait(struct dept_class *c, unsigned long ip,
- const char *w_fn, int sub_l, bool sched_sleep)
+ const char *w_fn, int sub_l, bool sched_sleep,
+ bool timeout)
{
struct dept_task *dt = dept_task();
struct dept_wait *w;
@@ -1583,6 +1608,7 @@ static void add_wait(struct dept_class *c, unsigned long ip,
w->wait_fn = w_fn;
w->wait_stack = get_current_stack();
w->sched_sleep = sched_sleep;
+ w->timeout = timeout;
cxt = cur_cxt();
if (cxt == DEPT_CXT_HIRQ || cxt == DEPT_CXT_SIRQ)
@@ -2294,7 +2320,7 @@ static struct dept_class *check_new_class(struct dept_key *local,
*/
static void __dept_wait(struct dept_map *m, unsigned long w_f,
unsigned long ip, const char *w_fn, int sub_l,
- bool sched_sleep, bool sched_map)
+ bool sched_sleep, bool sched_map, bool timeout)
{
int e;
@@ -2317,7 +2343,7 @@ static void __dept_wait(struct dept_map *m, unsigned long w_f,
if (!c)
continue;
- add_wait(c, ip, w_fn, sub_l, sched_sleep);
+ add_wait(c, ip, w_fn, sub_l, sched_sleep, timeout);
}
}
@@ -2354,14 +2380,23 @@ static void __dept_event(struct dept_map *m, unsigned long e_f,
}
void dept_wait(struct dept_map *m, unsigned long w_f,
- unsigned long ip, const char *w_fn, int sub_l)
+ unsigned long ip, const char *w_fn, int sub_l,
+ long timeoutval)
{
struct dept_task *dt = dept_task();
unsigned long flags;
+ bool timeout;
if (unlikely(!dept_working()))
return;
+ timeout = timeoutval > 0 && timeoutval < MAX_SCHEDULE_TIMEOUT;
+
+#if !defined(CONFIG_DEPT_AGGRESSIVE_TIMEOUT_WAIT)
+ if (timeout)
+ return;
+#endif
+
if (dt->recursive)
return;
@@ -2370,21 +2405,30 @@ void dept_wait(struct dept_map *m, unsigned long w_f,
flags = dept_enter();
- __dept_wait(m, w_f, ip, w_fn, sub_l, false, false);
+ __dept_wait(m, w_f, ip, w_fn, sub_l, false, false, timeout);
dept_exit(flags);
}
EXPORT_SYMBOL_GPL(dept_wait);
void dept_stage_wait(struct dept_map *m, struct dept_key *k,
- unsigned long ip, const char *w_fn)
+ unsigned long ip, const char *w_fn,
+ long timeoutval)
{
struct dept_task *dt = dept_task();
unsigned long flags;
+ bool timeout;
if (unlikely(!dept_working()))
return;
+ timeout = timeoutval > 0 && timeoutval < MAX_SCHEDULE_TIMEOUT;
+
+#if !defined(CONFIG_DEPT_AGGRESSIVE_TIMEOUT_WAIT)
+ if (timeout)
+ return;
+#endif
+
if (m && m->nocheck)
return;
@@ -2430,6 +2474,7 @@ void dept_stage_wait(struct dept_map *m, struct dept_key *k,
dt->stage_w_fn = w_fn;
dt->stage_ip = ip;
+ dt->stage_timeout = timeout;
exit:
dept_exit_recursive(flags);
}
@@ -2441,6 +2486,7 @@ static void __dept_clean_stage(struct dept_task *dt)
dt->stage_sched_map = false;
dt->stage_w_fn = NULL;
dt->stage_ip = 0UL;
+ dt->stage_timeout = false;
}
void dept_clean_stage(void)
@@ -2471,6 +2517,7 @@ void dept_request_event_wait_commit(void)
unsigned long ip;
const char *w_fn;
bool sched_map;
+ bool timeout;
if (unlikely(!dept_working()))
return;
@@ -2493,6 +2540,7 @@ void dept_request_event_wait_commit(void)
w_fn = dt->stage_w_fn;
ip = dt->stage_ip;
sched_map = dt->stage_sched_map;
+ timeout = dt->stage_timeout;
/*
* Avoid zero wgen.
@@ -2500,7 +2548,7 @@ void dept_request_event_wait_commit(void)
wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
WRITE_ONCE(dt->stage_m.wgen, wg);
- __dept_wait(&dt->stage_m, 1UL, ip, w_fn, 0, true, sched_map);
+ __dept_wait(&dt->stage_m, 1UL, ip, w_fn, 0, true, sched_map, timeout);
exit:
dept_exit(flags);
}
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 9602f41ad8e8..0ec3addef504 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1312,6 +1312,16 @@ config DEPT
noting, to mitigate the impact by the false positives, multi
reporting has been supported.
+config DEPT_AGGRESSIVE_TIMEOUT_WAIT
+ bool "Aggressively track even timeout waits"
+ depends on DEPT
+ default n
+ help
+ Timeout wait doesn't contribute to a deadlock. However,
+ informing a circular dependency might be helpful for cases
+ that timeout is used to avoid a deadlock. Say N if you'd like
+ to avoid verbose reports.
+
config LOCK_DEBUGGING_SUPPORT
bool
depends on TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
--
2.17.1
Now that CONFIG_DEPT_AGGRESSIVE_TIMEOUT_WAIT has been introduced, apply
the timeout consideration to dma fence waits.
Signed-off-by: Byungchul Park <[email protected]>
---
drivers/dma-buf/dma-fence.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 76dba11f0dab..95121cbcc6b5 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -784,7 +784,7 @@ dma_fence_default_wait(struct dma_fence *fence, bool intr, signed long timeout)
cb.task = current;
list_add(&cb.base.node, &fence->cb_list);
- sdt_might_sleep_start(NULL);
+ sdt_might_sleep_start_timeout(NULL, timeout);
while (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags) && ret > 0) {
if (intr)
__set_current_state(TASK_INTERRUPTIBLE);
@@ -888,7 +888,7 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
}
}
- sdt_might_sleep_start(NULL);
+ sdt_might_sleep_start_timeout(NULL, timeout);
while (ret > 0) {
if (intr)
set_current_state(TASK_INTERRUPTIBLE);
--
2.17.1
Now that CONFIG_DEPT_AGGRESSIVE_TIMEOUT_WAIT has been introduced, apply
the timeout consideration to hashed-waitqueue waits, assuming the input
'ret' to the ___wait_var_event() macro is used as a timeout value.
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/wait_bit.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h
index fe89282c3e96..3ef450d9a7c5 100644
--- a/include/linux/wait_bit.h
+++ b/include/linux/wait_bit.h
@@ -247,7 +247,7 @@ extern wait_queue_head_t *__var_waitqueue(void *p);
struct wait_bit_queue_entry __wbq_entry; \
long __ret = ret; /* explicit shadow */ \
\
- sdt_might_sleep_start(NULL); \
+ sdt_might_sleep_start_timeout(NULL, __ret); \
init_wait_var_entry(&__wbq_entry, var, \
exclusive ? WQ_FLAG_EXCLUSIVE : 0); \
for (;;) { \
--
2.17.1
There are cases where the total number of maps needed for wait/event
tracking would be very large; struct page with PG_locked and
PG_writeback is one such case. The additional memory for the maps would
be 'the # of pages * sizeof(struct dept_map)' if each struct page kept
its own map all the way, which might be too big to accept.
It'd be better to keep only the minimum data in such cases, which is the
timestamp called 'wgen' that Dept makes use of. So make Dept able to
work with an external wgen when needed.
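For illustration, a subsystem that cannot afford a full dept_map per
object can embed only the external wgen and share one static map,
roughly as in the sketch below (modeled on the PG_locked usage later in
this series; struct foo and the foo_*() helpers are hypothetical):

  #include <linux/kernel.h>
  #include <linux/dept.h>

  static struct dept_map foo_map = DEPT_MAP_INITIALIZER(foo_map, NULL);

  struct foo {
	/* compact per-object state: only the wait timestamp */
	struct dept_ext_wgen dept_wgen;
  };

  static void foo_init(struct foo *f)
  {
	dept_ext_wgen_init(&f->dept_wgen);
  }

  static void foo_acquire(struct foo *f)
  {
	/* the event context for this object starts here */
	dept_request_event(&foo_map, &f->dept_wgen);
  }

  static void foo_wait(struct foo *f)
  {
	/* a wait for the event on this object, no timeout */
	dept_wait(&foo_map, 1UL, _RET_IP_, __func__, 0, -1L);
  }

  static void foo_release(struct foo *f)
  {
	/* the event the waiters above are waiting for */
	dept_event(&foo_map, 1UL, _RET_IP_, __func__, &f->dept_wgen);
  }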
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/dept.h | 18 ++++++++++++++----
include/linux/dept_sdt.h | 4 ++--
kernel/dependency/dept.c | 30 +++++++++++++++++++++---------
3 files changed, 37 insertions(+), 15 deletions(-)
diff --git a/include/linux/dept.h b/include/linux/dept.h
index 0280e45cc2af..dea53ad5b356 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -482,6 +482,13 @@ struct dept_task {
bool in_sched;
};
+/*
+ * for subsystems that require compact use of memory, e.g. struct page
+ */
+struct dept_ext_wgen{
+ unsigned int wgen;
+};
+
#define DEPT_TASK_INITIALIZER(t) \
{ \
.wait_hist = { { .wait = NULL, } }, \
@@ -512,6 +519,7 @@ extern void dept_task_exit(struct task_struct *t);
extern void dept_free_range(void *start, unsigned int sz);
extern void dept_map_init(struct dept_map *m, struct dept_key *k, int sub_u, const char *n);
extern void dept_map_reinit(struct dept_map *m, struct dept_key *k, int sub_u, const char *n);
+extern void dept_ext_wgen_init(struct dept_ext_wgen *ewg);
extern void dept_map_copy(struct dept_map *to, struct dept_map *from);
extern void dept_wait(struct dept_map *m, unsigned long w_f, unsigned long ip, const char *w_fn, int sub_l, long timeout);
@@ -521,8 +529,8 @@ extern void dept_clean_stage(void);
extern void dept_stage_event(struct task_struct *t, unsigned long ip);
extern void dept_ecxt_enter(struct dept_map *m, unsigned long e_f, unsigned long ip, const char *c_fn, const char *e_fn, int sub_l);
extern bool dept_ecxt_holding(struct dept_map *m, unsigned long e_f);
-extern void dept_request_event(struct dept_map *m);
-extern void dept_event(struct dept_map *m, unsigned long e_f, unsigned long ip, const char *e_fn);
+extern void dept_request_event(struct dept_map *m, struct dept_ext_wgen *ewg);
+extern void dept_event(struct dept_map *m, unsigned long e_f, unsigned long ip, const char *e_fn, struct dept_ext_wgen *ewg);
extern void dept_ecxt_exit(struct dept_map *m, unsigned long e_f, unsigned long ip);
extern void dept_sched_enter(void);
extern void dept_sched_exit(void);
@@ -551,6 +559,7 @@ extern void dept_hardirqs_off(void);
struct dept_key { };
struct dept_map { };
struct dept_task { };
+struct dept_ext_wgen { };
#define DEPT_MAP_INITIALIZER(n, k) { }
#define DEPT_TASK_INITIALIZER(t) { }
@@ -563,6 +572,7 @@ struct dept_task { };
#define dept_free_range(s, sz) do { } while (0)
#define dept_map_init(m, k, su, n) do { (void)(n); (void)(k); } while (0)
#define dept_map_reinit(m, k, su, n) do { (void)(n); (void)(k); } while (0)
+#define dept_ext_wgen_init(wg) do { } while (0)
#define dept_map_copy(t, f) do { } while (0)
#define dept_wait(m, w_f, ip, w_fn, sl, t) do { (void)(w_fn); } while (0)
@@ -572,8 +582,8 @@ struct dept_task { };
#define dept_stage_event(t, ip) do { } while (0)
#define dept_ecxt_enter(m, e_f, ip, c_fn, e_fn, sl) do { (void)(c_fn); (void)(e_fn); } while (0)
#define dept_ecxt_holding(m, e_f) false
-#define dept_request_event(m) do { } while (0)
-#define dept_event(m, e_f, ip, e_fn) do { (void)(e_fn); } while (0)
+#define dept_request_event(m, wg) do { } while (0)
+#define dept_event(m, e_f, ip, e_fn, wg) do { (void)(e_fn); } while (0)
#define dept_ecxt_exit(m, e_f, ip) do { } while (0)
#define dept_sched_enter() do { } while (0)
#define dept_sched_exit() do { } while (0)
diff --git a/include/linux/dept_sdt.h b/include/linux/dept_sdt.h
index 21fce525f031..8cdac7982036 100644
--- a/include/linux/dept_sdt.h
+++ b/include/linux/dept_sdt.h
@@ -24,7 +24,7 @@
#define sdt_wait_timeout(m, t) \
do { \
- dept_request_event(m); \
+ dept_request_event(m, NULL); \
dept_wait(m, 1UL, _THIS_IP_, __func__, 0, t); \
} while (0)
#define sdt_wait(m) sdt_wait_timeout(m, -1L)
@@ -49,7 +49,7 @@
#define sdt_might_sleep_end() dept_clean_stage()
#define sdt_ecxt_enter(m) dept_ecxt_enter(m, 1UL, _THIS_IP_, "start", "event", 0)
-#define sdt_event(m) dept_event(m, 1UL, _THIS_IP_, __func__)
+#define sdt_event(m) dept_event(m, 1UL, _THIS_IP_, __func__, NULL)
#define sdt_ecxt_exit(m) dept_ecxt_exit(m, 1UL, _THIS_IP_)
#else /* !CONFIG_DEPT */
#define sdt_map_init(m) do { } while (0)
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 5c996f11abd5..fb33c3758c25 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -2186,6 +2186,11 @@ void dept_map_reinit(struct dept_map *m, struct dept_key *k, int sub_u,
}
EXPORT_SYMBOL_GPL(dept_map_reinit);
+void dept_ext_wgen_init(struct dept_ext_wgen *ewg)
+{
+ ewg->wgen = 0U;
+}
+
void dept_map_copy(struct dept_map *to, struct dept_map *from)
{
if (unlikely(!dept_working())) {
@@ -2371,7 +2376,7 @@ static void __dept_wait(struct dept_map *m, unsigned long w_f,
*/
static void __dept_event(struct dept_map *m, unsigned long e_f,
unsigned long ip, const char *e_fn,
- bool sched_map)
+ bool sched_map, unsigned int wg)
{
struct dept_class *c;
struct dept_key *k;
@@ -2393,7 +2398,7 @@ static void __dept_event(struct dept_map *m, unsigned long e_f,
c = check_new_class(&m->map_key, k, sub_id(m, e), m->name, sched_map);
if (c && add_ecxt(m, c, 0UL, NULL, e_fn, 0)) {
- do_event(m, c, READ_ONCE(m->wgen), ip);
+ do_event(m, c, wg, ip);
pop_ecxt(m, c);
}
}
@@ -2606,7 +2611,7 @@ void dept_stage_event(struct task_struct *requestor, unsigned long ip)
if (!m.keys)
goto exit;
- __dept_event(&m, 1UL, ip, "try_to_wake_up", sched_map);
+ __dept_event(&m, 1UL, ip, "try_to_wake_up", sched_map, m.wgen);
exit:
dept_exit(flags);
}
@@ -2785,10 +2790,11 @@ bool dept_ecxt_holding(struct dept_map *m, unsigned long e_f)
}
EXPORT_SYMBOL_GPL(dept_ecxt_holding);
-void dept_request_event(struct dept_map *m)
+void dept_request_event(struct dept_map *m, struct dept_ext_wgen *ewg)
{
unsigned long flags;
unsigned int wg;
+ unsigned int *wg_p;
if (unlikely(!dept_working()))
return;
@@ -2801,21 +2807,25 @@ void dept_request_event(struct dept_map *m)
*/
flags = dept_enter_recursive();
+ wg_p = ewg ? &ewg->wgen : &m->wgen;
+
/*
* Avoid zero wgen.
*/
wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
- WRITE_ONCE(m->wgen, wg);
+ WRITE_ONCE(*wg_p, wg);
dept_exit_recursive(flags);
}
EXPORT_SYMBOL_GPL(dept_request_event);
void dept_event(struct dept_map *m, unsigned long e_f,
- unsigned long ip, const char *e_fn)
+ unsigned long ip, const char *e_fn,
+ struct dept_ext_wgen *ewg)
{
struct dept_task *dt = dept_task();
unsigned long flags;
+ unsigned int *wg_p;
if (unlikely(!dept_working()))
return;
@@ -2823,24 +2833,26 @@ void dept_event(struct dept_map *m, unsigned long e_f,
if (m->nocheck)
return;
+ wg_p = ewg ? &ewg->wgen : &m->wgen;
+
if (dt->recursive) {
/*
* Dept won't work with this even though an event
* context has been asked. Don't make it confused at
* handling the event. Disable it until the next.
*/
- WRITE_ONCE(m->wgen, 0U);
+ WRITE_ONCE(*wg_p, 0U);
return;
}
flags = dept_enter();
- __dept_event(m, e_f, ip, e_fn, false);
+ __dept_event(m, e_f, ip, e_fn, false, READ_ONCE(*wg_p));
/*
* Keep the map diabled until the next sleep.
*/
- WRITE_ONCE(m->wgen, 0U);
+ WRITE_ONCE(*wg_p, 0U);
dept_exit(flags);
}
--
2.17.1
Makes Dept able to track PG_locked waits and events. It's going to be
useful in practice. See the following link that shows Dept working with
PG_locked and detecting real issues:
https://lore.kernel.org/lkml/[email protected]/
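The ordering Dept expects around a lock-like page bit is wait ->
acquire (set bit) -> release (clear bit). A condensed sketch of that
sequence using the helpers added below (illustrative only; the real
call sites are in the diff):

  #include <linux/mm_types.h>
  #include <linux/page-flags.h>

  static void pg_locked_annotation_order(struct folio *folio)
  {
	/* waiter side: about to sleep until PG_locked is available */
	dept_page_wait_on_bit(&folio->page, PG_locked);

	/* ...sleep in folio_wait_bit_common() and get woken up... */

	/* the waiter now owns the bit */
	dept_page_set_bit(&folio->page, PG_locked);

	/* ...critical section under the folio lock... */

	/* owner side: folio_unlock() drops the bit */
	dept_page_clear_bit(&folio->page, PG_locked);
  }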
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/mm_types.h | 2 +
include/linux/page-flags.h | 105 ++++++++++++++++++++++++++++++++-----
include/linux/pagemap.h | 7 ++-
mm/filemap.c | 26 +++++++++
mm/mm_init.c | 2 +
5 files changed, 129 insertions(+), 13 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 957ce38768b2..5c1112bc7a46 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -19,6 +19,7 @@
#include <linux/workqueue.h>
#include <linux/seqlock.h>
#include <linux/percpu_counter.h>
+#include <linux/dept.h>
#include <asm/mmu.h>
@@ -203,6 +204,7 @@ struct page {
struct page *kmsan_shadow;
struct page *kmsan_origin;
#endif
+ struct dept_ext_wgen PG_locked_wgen;
} _struct_page_alignment;
/*
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index a88e64acebfe..0a498f2c4543 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -198,6 +198,43 @@ enum pageflags {
#ifndef __GENERATING_BOUNDS_H
+#ifdef CONFIG_DEPT
+#include <linux/kernel.h>
+#include <linux/dept.h>
+
+extern struct dept_map PG_locked_map;
+
+/*
+ * Place the following annotations at the suitable points in code:
+ *
+ * Annotate dept_page_set_bit() around the first set_bit*()
+ * Annotate dept_page_clear_bit() around clear_bit*()
+ * Annotate dept_page_wait_on_bit() around wait_on_bit*()
+ */
+
+static inline void dept_page_set_bit(struct page *p, int bit_nr)
+{
+ if (bit_nr == PG_locked)
+ dept_request_event(&PG_locked_map, &p->PG_locked_wgen);
+}
+
+static inline void dept_page_clear_bit(struct page *p, int bit_nr)
+{
+ if (bit_nr == PG_locked)
+ dept_event(&PG_locked_map, 1UL, _RET_IP_, __func__, &p->PG_locked_wgen);
+}
+
+static inline void dept_page_wait_on_bit(struct page *p, int bit_nr)
+{
+ if (bit_nr == PG_locked)
+ dept_wait(&PG_locked_map, 1UL, _RET_IP_, __func__, 0, -1L);
+}
+#else
+#define dept_page_set_bit(p, bit_nr) do { } while (0)
+#define dept_page_clear_bit(p, bit_nr) do { } while (0)
+#define dept_page_wait_on_bit(p, bit_nr) do { } while (0)
+#endif
+
#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
@@ -379,44 +416,88 @@ static __always_inline int Page##uname(struct page *page) \
#define SETPAGEFLAG(uname, lname, policy) \
static __always_inline \
void folio_set_##lname(struct folio *folio) \
-{ set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
+{ \
+ set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); \
+ dept_page_set_bit(&folio->page, PG_##lname); \
+} \
static __always_inline void SetPage##uname(struct page *page) \
-{ set_bit(PG_##lname, &policy(page, 1)->flags); }
+{ \
+ set_bit(PG_##lname, &policy(page, 1)->flags); \
+ dept_page_set_bit(page, PG_##lname); \
+}
#define CLEARPAGEFLAG(uname, lname, policy) \
static __always_inline \
void folio_clear_##lname(struct folio *folio) \
-{ clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
+{ \
+ clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); \
+ dept_page_clear_bit(&folio->page, PG_##lname); \
+} \
static __always_inline void ClearPage##uname(struct page *page) \
-{ clear_bit(PG_##lname, &policy(page, 1)->flags); }
+{ \
+ clear_bit(PG_##lname, &policy(page, 1)->flags); \
+ dept_page_clear_bit(page, PG_##lname); \
+}
#define __SETPAGEFLAG(uname, lname, policy) \
static __always_inline \
void __folio_set_##lname(struct folio *folio) \
-{ __set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
+{ \
+ __set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); \
+ dept_page_set_bit(&folio->page, PG_##lname); \
+} \
static __always_inline void __SetPage##uname(struct page *page) \
-{ __set_bit(PG_##lname, &policy(page, 1)->flags); }
+{ \
+ __set_bit(PG_##lname, &policy(page, 1)->flags); \
+ dept_page_set_bit(page, PG_##lname); \
+}
#define __CLEARPAGEFLAG(uname, lname, policy) \
static __always_inline \
void __folio_clear_##lname(struct folio *folio) \
-{ __clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
+{ \
+ __clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); \
+ dept_page_clear_bit(&folio->page, PG_##lname); \
+} \
static __always_inline void __ClearPage##uname(struct page *page) \
-{ __clear_bit(PG_##lname, &policy(page, 1)->flags); }
+{ \
+ __clear_bit(PG_##lname, &policy(page, 1)->flags); \
+ dept_page_clear_bit(page, PG_##lname); \
+}
#define TESTSETFLAG(uname, lname, policy) \
static __always_inline \
bool folio_test_set_##lname(struct folio *folio) \
-{ return test_and_set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
+{ \
+ bool ret = test_and_set_bit(PG_##lname, folio_flags(folio, FOLIO_##policy));\
+ if (!ret) \
+ dept_page_set_bit(&folio->page, PG_##lname); \
+ return ret; \
+} \
static __always_inline int TestSetPage##uname(struct page *page) \
-{ return test_and_set_bit(PG_##lname, &policy(page, 1)->flags); }
+{ \
+ bool ret = test_and_set_bit(PG_##lname, &policy(page, 1)->flags);\
+ if (!ret) \
+ dept_page_set_bit(page, PG_##lname); \
+ return ret; \
+}
#define TESTCLEARFLAG(uname, lname, policy) \
static __always_inline \
bool folio_test_clear_##lname(struct folio *folio) \
-{ return test_and_clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy)); } \
+{ \
+ bool ret = test_and_clear_bit(PG_##lname, folio_flags(folio, FOLIO_##policy));\
+ if (ret) \
+ dept_page_clear_bit(&folio->page, PG_##lname); \
+ return ret; \
+} \
static __always_inline int TestClearPage##uname(struct page *page) \
-{ return test_and_clear_bit(PG_##lname, &policy(page, 1)->flags); }
+{ \
+ bool ret = test_and_clear_bit(PG_##lname, &policy(page, 1)->flags);\
+ if (ret) \
+ dept_page_clear_bit(page, PG_##lname); \
+ return ret; \
+}
#define PAGEFLAG(uname, lname, policy) \
TESTPAGEFLAG(uname, lname, policy) \
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 06142ff7f9ce..c6683b228b20 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -991,7 +991,12 @@ void folio_unlock(struct folio *folio);
*/
static inline bool folio_trylock(struct folio *folio)
{
- return likely(!test_and_set_bit_lock(PG_locked, folio_flags(folio, 0)));
+ bool ret = !test_and_set_bit_lock(PG_locked, folio_flags(folio, 0));
+
+ if (ret)
+ dept_page_set_bit(&folio->page, PG_locked);
+
+ return likely(ret);
}
/*
diff --git a/mm/filemap.c b/mm/filemap.c
index ad5b4aa049a3..241a67a363b0 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -45,6 +45,7 @@
#include <linux/migrate.h>
#include <linux/pipe_fs_i.h>
#include <linux/splice.h>
+#include <linux/dept.h>
#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include "internal.h"
@@ -1098,6 +1099,7 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
if (flags & WQ_FLAG_CUSTOM) {
if (test_and_set_bit(key->bit_nr, &key->folio->flags))
return -1;
+ dept_page_set_bit(&key->folio->page, key->bit_nr);
flags |= WQ_FLAG_DONE;
}
}
@@ -1181,6 +1183,7 @@ static inline bool folio_trylock_flag(struct folio *folio, int bit_nr,
if (wait->flags & WQ_FLAG_EXCLUSIVE) {
if (test_and_set_bit(bit_nr, &folio->flags))
return false;
+ dept_page_set_bit(&folio->page, bit_nr);
} else if (test_bit(bit_nr, &folio->flags))
return false;
@@ -1191,6 +1194,9 @@ static inline bool folio_trylock_flag(struct folio *folio, int bit_nr,
/* How many times do we accept lock stealing from under a waiter? */
int sysctl_page_lock_unfairness = 5;
+struct dept_map __maybe_unused PG_locked_map = DEPT_MAP_INITIALIZER(PG_locked_map, NULL);
+EXPORT_SYMBOL(PG_locked_map);
+
static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
int state, enum behavior behavior)
{
@@ -1202,6 +1208,8 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
unsigned long pflags;
bool in_thrashing;
+ dept_page_wait_on_bit(&folio->page, bit_nr);
+
if (bit_nr == PG_locked &&
!folio_test_uptodate(folio) && folio_test_workingset(folio)) {
delayacct_thrashing_start(&in_thrashing);
@@ -1295,6 +1303,23 @@ static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
break;
}
+ /*
+ * dept_page_set_bit() might have been called already in
+ * folio_trylock_flag(), wake_page_function() or somewhere.
+ * However, call it again to reset the wgen of dept to ensure
+ * dept_page_wait_on_bit() is called prior to
+ * dept_page_set_bit().
+ *
+	 * Remember that dept considers all the waits between
+ * dept_page_set_bit() and dept_page_clear_bit() as potential
+ * event disturbers. Ensure the correct sequence so that dept
+ * can make correct decisions:
+ *
+ * wait -> acquire(set bit) -> release(clear bit)
+ */
+ if (wait->flags & WQ_FLAG_DONE)
+ dept_page_set_bit(&folio->page, bit_nr);
+
/*
* If a signal happened, this 'finish_wait()' may remove the last
* waiter from the wait-queues, but the folio waiters bit will remain
@@ -1471,6 +1496,7 @@ void folio_unlock(struct folio *folio)
BUILD_BUG_ON(PG_waiters != 7);
BUILD_BUG_ON(PG_locked > 7);
VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
+ dept_page_clear_bit(&folio->page, PG_locked);
if (folio_xor_flags_has_waiters(folio, 1 << PG_locked))
folio_wake_bit(folio, PG_locked);
}
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 077bfe393b5e..fc150d7a3686 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -26,6 +26,7 @@
#include <linux/pgtable.h>
#include <linux/swap.h>
#include <linux/cma.h>
+#include <linux/dept.h>
#include "internal.h"
#include "slab.h"
#include "shuffle.h"
@@ -564,6 +565,7 @@ void __meminit __init_single_page(struct page *page, unsigned long pfn,
page_mapcount_reset(page);
page_cpupid_reset_last(page);
page_kasan_tag_reset(page);
+ dept_ext_wgen_init(&page->PG_locked_wgen);
INIT_LIST_HEAD(&page->lru);
#ifdef WANT_PAGE_VIRTUAL
--
2.17.1
Currently, nothing is printed in place of [S] in the report, which is
meant to be the stacktrace of the event context's start, if the event is
not an unlock of a typical lock but a general event, because it's not
easy to specify in a general way where the event context has started
from.
Unfortunately, that makes Dept's report hard to interpret in that case.
So make it print the event requestor's stacktrace in place of [S] in the
report, instead of the event context's start.
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/dept.h | 13 +++++++
kernel/dependency/dept.c | 83 ++++++++++++++++++++++++++++++++--------
2 files changed, 80 insertions(+), 16 deletions(-)
diff --git a/include/linux/dept.h b/include/linux/dept.h
index dea53ad5b356..6db23d77905e 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -145,6 +145,11 @@ struct dept_map {
*/
unsigned int wgen;
+ /*
+ * requestor for the event context to run
+ */
+ struct dept_stack *req_stack;
+
/*
* whether this map should be going to be checked or not
*/
@@ -486,7 +491,15 @@ struct dept_task {
  * for subsystems that require compact use of memory, e.g. struct page
*/
struct dept_ext_wgen{
+ /*
+	 * wait timestamp associated with this map
+ */
unsigned int wgen;
+
+ /*
+ * requestor for the event context to run
+ */
+ struct dept_stack *req_stack;
};
#define DEPT_TASK_INITIALIZER(t) \
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index fb33c3758c25..abf1cdab0615 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -129,6 +129,7 @@ static int dept_per_cpu_ready;
#define DEPT_INFO(s...) pr_warn("DEPT_INFO: " s)
static arch_spinlock_t dept_spin = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+static arch_spinlock_t dept_req_spin = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
static arch_spinlock_t dept_pool_spin = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
/*
@@ -1669,7 +1670,8 @@ static void add_wait(struct dept_class *c, unsigned long ip,
static bool add_ecxt(struct dept_map *m, struct dept_class *c,
unsigned long ip, const char *c_fn,
- const char *e_fn, int sub_l)
+ const char *e_fn, int sub_l,
+ struct dept_stack *req_stack)
{
struct dept_task *dt = dept_task();
struct dept_ecxt_held *eh;
@@ -1700,10 +1702,16 @@ static bool add_ecxt(struct dept_map *m, struct dept_class *c,
e->class = get_class(c);
e->ecxt_ip = ip;
- e->ecxt_stack = ip && rich_stack ? get_current_stack() : NULL;
e->event_fn = e_fn;
e->ecxt_fn = c_fn;
+ if (req_stack)
+ e->ecxt_stack = get_stack(req_stack);
+ else if (ip && rich_stack)
+ e->ecxt_stack = get_current_stack();
+ else
+ e->ecxt_stack = NULL;
+
eh = dt->ecxt_held + (dt->ecxt_held_pos++);
eh->ecxt = get_ecxt(e);
eh->map = m;
@@ -2147,6 +2155,7 @@ void dept_map_init(struct dept_map *m, struct dept_key *k, int sub_u,
m->sub_u = sub_u;
m->name = n;
m->wgen = 0U;
+ m->req_stack = NULL;
m->nocheck = !valid_key(k);
dept_exit_recursive(flags);
@@ -2181,6 +2190,7 @@ void dept_map_reinit(struct dept_map *m, struct dept_key *k, int sub_u,
m->name = n;
m->wgen = 0U;
+ m->req_stack = NULL;
dept_exit_recursive(flags);
}
@@ -2189,6 +2199,7 @@ EXPORT_SYMBOL_GPL(dept_map_reinit);
void dept_ext_wgen_init(struct dept_ext_wgen *ewg)
{
ewg->wgen = 0U;
+ ewg->req_stack = NULL;
}
void dept_map_copy(struct dept_map *to, struct dept_map *from)
@@ -2376,7 +2387,8 @@ static void __dept_wait(struct dept_map *m, unsigned long w_f,
*/
static void __dept_event(struct dept_map *m, unsigned long e_f,
unsigned long ip, const char *e_fn,
- bool sched_map, unsigned int wg)
+ bool sched_map, unsigned int wg,
+ struct dept_stack *req_stack)
{
struct dept_class *c;
struct dept_key *k;
@@ -2397,7 +2409,7 @@ static void __dept_event(struct dept_map *m, unsigned long e_f,
k = m->keys ?: &m->map_key;
c = check_new_class(&m->map_key, k, sub_id(m, e), m->name, sched_map);
- if (c && add_ecxt(m, c, 0UL, NULL, e_fn, 0)) {
+ if (c && add_ecxt(m, c, 0UL, "(event requestor)", e_fn, 0, req_stack)) {
do_event(m, c, wg, ip);
pop_ecxt(m, c);
}
@@ -2506,6 +2518,8 @@ EXPORT_SYMBOL_GPL(dept_stage_wait);
static void __dept_clean_stage(struct dept_task *dt)
{
+ if (dt->stage_m.req_stack)
+ put_stack(dt->stage_m.req_stack);
memset(&dt->stage_m, 0x0, sizeof(struct dept_map));
dt->stage_sched_map = false;
dt->stage_w_fn = NULL;
@@ -2571,6 +2585,7 @@ void dept_request_event_wait_commit(void)
*/
wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
WRITE_ONCE(dt->stage_m.wgen, wg);
+ dt->stage_m.req_stack = get_current_stack();
__dept_wait(&dt->stage_m, 1UL, ip, w_fn, 0, true, sched_map, timeout);
exit:
@@ -2602,6 +2617,8 @@ void dept_stage_event(struct task_struct *requestor, unsigned long ip)
*/
m = dt_req->stage_m;
sched_map = dt_req->stage_sched_map;
+ if (m.req_stack)
+ get_stack(m.req_stack);
__dept_clean_stage(dt_req);
/*
@@ -2611,8 +2628,12 @@ void dept_stage_event(struct task_struct *requestor, unsigned long ip)
if (!m.keys)
goto exit;
- __dept_event(&m, 1UL, ip, "try_to_wake_up", sched_map, m.wgen);
+ __dept_event(&m, 1UL, ip, "try_to_wake_up", sched_map, m.wgen,
+ m.req_stack);
exit:
+ if (m.req_stack)
+ put_stack(m.req_stack);
+
dept_exit(flags);
}
@@ -2692,7 +2713,7 @@ void dept_map_ecxt_modify(struct dept_map *m, unsigned long e_f,
k = m->keys ?: &m->map_key;
c = check_new_class(&m->map_key, k, sub_id(m, new_e), m->name, false);
- if (c && add_ecxt(m, c, new_ip, new_c_fn, new_e_fn, new_sub_l))
+ if (c && add_ecxt(m, c, new_ip, new_c_fn, new_e_fn, new_sub_l, NULL))
goto exit;
/*
@@ -2744,7 +2765,7 @@ void dept_ecxt_enter(struct dept_map *m, unsigned long e_f, unsigned long ip,
k = m->keys ?: &m->map_key;
c = check_new_class(&m->map_key, k, sub_id(m, e), m->name, false);
- if (c && add_ecxt(m, c, ip, c_fn, e_fn, sub_l))
+ if (c && add_ecxt(m, c, ip, c_fn, e_fn, sub_l, NULL))
goto exit;
missing_ecxt:
dt->missing_ecxt++;
@@ -2792,9 +2813,11 @@ EXPORT_SYMBOL_GPL(dept_ecxt_holding);
void dept_request_event(struct dept_map *m, struct dept_ext_wgen *ewg)
{
+ struct dept_task *dt = dept_task();
unsigned long flags;
unsigned int wg;
unsigned int *wg_p;
+ struct dept_stack **req_stack_p;
if (unlikely(!dept_working()))
return;
@@ -2802,12 +2825,18 @@ void dept_request_event(struct dept_map *m, struct dept_ext_wgen *ewg)
if (m->nocheck)
return;
- /*
- * Allow recursive entrance.
- */
- flags = dept_enter_recursive();
+ if (dt->recursive)
+ return;
- wg_p = ewg ? &ewg->wgen : &m->wgen;
+ flags = dept_enter();
+
+ if (ewg) {
+ wg_p = &ewg->wgen;
+ req_stack_p = &ewg->req_stack;
+ } else {
+ wg_p = &m->wgen;
+ req_stack_p = &m->req_stack;
+ }
/*
* Avoid zero wgen.
@@ -2815,7 +2844,13 @@ void dept_request_event(struct dept_map *m, struct dept_ext_wgen *ewg)
wg = atomic_inc_return(&wgen) ?: atomic_inc_return(&wgen);
WRITE_ONCE(*wg_p, wg);
- dept_exit_recursive(flags);
+ arch_spin_lock(&dept_req_spin);
+ if (*req_stack_p)
+ put_stack(*req_stack_p);
+ *req_stack_p = get_current_stack();
+ arch_spin_unlock(&dept_req_spin);
+
+ dept_exit(flags);
}
EXPORT_SYMBOL_GPL(dept_request_event);
@@ -2826,6 +2861,8 @@ void dept_event(struct dept_map *m, unsigned long e_f,
struct dept_task *dt = dept_task();
unsigned long flags;
unsigned int *wg_p;
+ struct dept_stack **req_stack_p;
+ struct dept_stack *req_stack;
if (unlikely(!dept_working()))
return;
@@ -2833,7 +2870,18 @@ void dept_event(struct dept_map *m, unsigned long e_f,
if (m->nocheck)
return;
- wg_p = ewg ? &ewg->wgen : &m->wgen;
+ if (ewg) {
+ wg_p = &ewg->wgen;
+ req_stack_p = &ewg->req_stack;
+ } else {
+ wg_p = &m->wgen;
+ req_stack_p = &m->req_stack;
+ }
+
+ arch_spin_lock(&dept_req_spin);
+ req_stack = *req_stack_p;
+ *req_stack_p = NULL;
+ arch_spin_unlock(&dept_req_spin);
if (dt->recursive) {
/*
@@ -2842,17 +2890,20 @@ void dept_event(struct dept_map *m, unsigned long e_f,
* handling the event. Disable it until the next.
*/
WRITE_ONCE(*wg_p, 0U);
+ if (req_stack)
+ put_stack(req_stack);
return;
}
flags = dept_enter();
-
- __dept_event(m, e_f, ip, e_fn, false, READ_ONCE(*wg_p));
+ __dept_event(m, e_f, ip, e_fn, false, READ_ONCE(*wg_p), req_stack);
/*
* Keep the map diabled until the next sleep.
*/
WRITE_ONCE(*wg_p, 0U);
+ if (req_stack)
+ put_stack(req_stack);
dept_exit(flags);
}
--
2.17.1
jbd2 journal handling code doesn't want jbd2_might_wait_for_commit()
to be placed between start_this_handle() and stop_this_handle(). So it
marks the region with rwsem_acquire_read() and rwsem_release().
However, the annotation is stronger than needed for that purpose; a
trylock annotation is enough.
rwsem_acquire_read() implies:
1. might become a waiter on contention of the lock.
2. enters the critical section of the lock.
All we need here is 2, not 1, so the trylock version of the annotation
is sufficient for that purpose. Now that Dept partially relies on
lockdep annotations, Dept interprets rwsem_acquire_read() as a potential
wait and might report a deadlock caused by that wait. So replace it with
the trylock version of the annotation.
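For reference, the third argument of rwsem_acquire_read(map, subclass,
trylock, ip) is lockdep's trylock flag, so the change below just flips
that flag (a before/after sketch, assuming the usual lockdep macro
signature):

  /* before: blocking-read annotation, implies a potential wait on contention */
  rwsem_acquire_read(&journal->j_trans_commit_map, 0, 0, _THIS_IP_);

  /* after: trylock-read annotation, only marks entry into the critical section */
  rwsem_acquire_read(&journal->j_trans_commit_map, 0, 1, _THIS_IP_);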
Signed-off-by: Byungchul Park <[email protected]>
---
fs/jbd2/transaction.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 5f08b5fd105a..2c159a547e15 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -460,7 +460,7 @@ static int start_this_handle(journal_t *journal, handle_t *handle,
read_unlock(&journal->j_state_lock);
current->journal_info = handle;
- rwsem_acquire_read(&journal->j_trans_commit_map, 0, 0, _THIS_IP_);
+ rwsem_acquire_read(&journal->j_trans_commit_map, 0, 1, _THIS_IP_);
jbd2_journal_free_transaction(new_transaction);
/*
* Ensure that no allocations done while the transaction is open are
--
2.17.1
This document describes the concept of Dept.
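As a quick taste of how waits and events get tagged in practice, a
custom wait/event pair can be annotated with the SDT helpers from this
series roughly as follows (a sketch; struct bar and the bar_*()
functions are hypothetical):

  #include <linux/dept_sdt.h>

  struct bar {
	struct dept_map dmap;
  };

  static void bar_init(struct bar *b)
  {
	sdt_map_init(&b->dmap);
  }

  static void bar_waiter(struct bar *b)
  {
	/* record the wait and request the corresponding event */
	sdt_wait(&b->dmap);
	/* ...block here using whatever mechanism bar relies on... */
  }

  static void bar_waker(struct bar *b)
  {
	/* the event the waiter above is waiting for */
	sdt_event(&b->dmap);
  }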
Signed-off-by: Byungchul Park <[email protected]>
---
Documentation/dependency/dept.txt | 283 ++++++++++++++++++++++++++++++
1 file changed, 283 insertions(+)
create mode 100644 Documentation/dependency/dept.txt
diff --git a/Documentation/dependency/dept.txt b/Documentation/dependency/dept.txt
new file mode 100644
index 000000000000..7efe3bc59b2d
--- /dev/null
+++ b/Documentation/dependency/dept.txt
@@ -0,0 +1,283 @@
+DEPT(DEPendency Tracker)
+========================
+
+Started by Byungchul Park <[email protected]>
+
+How lockdep works
+-----------------
+
+Lockdep tries to detect a deadlock by checking lock acquisition order.
+For example, consider a graph built by lockdep like:
+
+ A -> B -
+ \
+ -> E
+ /
+ C -> D -
+
+ where 'A -> B' means that acquisition A is prior to acquisition B
+ with A still held.
+
+Lockdep keeps adding each new acquisition order into the graph at
+runtime. For example, 'E -> C' will be added when it's recognized that
+the two locks have been acquired in that order like:
+
+ A -> B -
+ \
+ -> E -
+ / \
+ -> C -> D - \
+ / /
+ \ /
+ ------------------
+
+ where 'A -> B' means that acquisition A is prior to acquisition B
+ with A still held.
+
+This graph contains a subgraph that demonstrates a loop like:
+
+ -> E -
+ / \
+ -> C -> D - \
+ / /
+ \ /
+ ------------------
+
+ where 'A -> B' means that acquisition A is prior to acquisition B
+ with A still held.
+
+Lockdep reports it as a deadlock on detection of a loop.
+
+CONCLUSION
+
+Lockdep detects a deadlock by checking if a loop has been created after
+expanding the graph.
+
+
+Limitation of lockdep
+---------------------
+
+Lockdep works on typical locks, e.g. spinlock and mutex, that are supposed
+to be released within the acquisition context. However, a deadlock caused
+by a folio lock or other synchronization mechanisms cannot be detected by
+lockdep, which basically tracks lock acquisition order.
+
+Can we detect the following deadlock?
+
+ CONTEXT X CONTEXT Y CONTEXT Z
+
+ mutex_lock A
+ folio_lock B
+ folio_lock B
+ mutex_lock A /* DEADLOCK */
+ folio_unlock B
+ folio_unlock B
+ mutex_unlock A
+ mutex_unlock A
+
+No, we can't. What about the following?
+
+ CONTEXT X CONTEXT Y
+
+ mutex_lock A
+ mutex_lock A
+ wait_for_complete B /* DEADLOCK */
+ complete B
+ mutex_unlock A
+ mutex_unlock A
+
+No, we can't.
+
+CONCLUSION
+
+Given the limitation, lockdep cannot detect a deadlock caused by a folio
+lock or other synchronization mechanisms.
+
+
+What leads to a deadlock
+------------------------
+
+A deadlock occurs when one or more contexts are waiting for events that
+will never happen. For example:
+
+ CONTEXT X CONTEXT Y CONTEXT Z
+
+ | | |
+ v | |
+ (1) wait for A v |
+ . (2) wait for C v
+ event C . (3) wait for B
+ event B .
+ event A
+
+Event C cannot be triggered because context X is stuck at (1), event B
+cannot be triggered because context Y is stuck at (2), and event A
+cannot be triggered because context Z is stuck at (3). All the contexts
+are stuck. We call the situation a *deadlock*.
+
+If an event occurrence is a prerequisite to reaching another event, we
+call it a *dependency*. In the example above:
+
+ Event A occurrence is a prerequisite to reaching event C.
+ Event C occurrence is a prerequisite to reaching event B.
+ Event B occurrence is a prerequisite to reaching event A.
+
+In terms of dependency:
+
+ Event C depends on event A.
+ Event B depends on event C.
+ Event A depends on event B.
+
+Dependencies in a graph look like:
+
+ -> C -> A -> B -
+ / \
+ \ /
+ ----------------
+
+ where 'A -> B' means that event A depends on event B.
+
+A circular dependency exists. Such a circular dependency leads to a
+deadlock since no waiter can have its desired event triggered.
+
+CONCLUSION
+
+A circular dependency leads to a deadlock.
+
+
+Introduce DEPT
+--------------
+
+DEPT(DEPendency Tracker) tracks wait and event instead of lock
+acquisition order so as to recognize the following situation:
+
+ CONTEXT X CONTEXT Y CONTEXT Z
+
+ | | |
+ v | |
+ wait for A v |
+ . wait for C v
+ event C . wait for B
+ event B .
+ event A
+
+and builds up a dependency graph at runtime, similar to lockdep. The
+graph would look like:
+
+ -> C -> A -> B -
+ / \
+ \ /
+ ----------------
+
+ where 'A -> B' means that event A depends on event B.
+
+DEPT keeps adding each new dependency into the graph at runtime. For
+example, 'B -> D' will be added when it's recognized that event D
+occurrence is a prerequisite to reaching event B, in other words, event
+B depends on event D like:
+
+ |
+ v
+ wait for D
+ .
+ event B
+
+After adding 'B -> D' dependency into the graph, the graph would look
+like:
+
+ -> D
+ /
+ -> C -> A -> B -
+ / \
+ \ /
+ ----------------
+
+ where 'A -> B' means that event A depends on event B.
+
+DEPT is going to report a deadlock on detection of a new loop.
+
+CONCLUSION
+
+DEPT works on wait and event so as to theoretically detect all the
+potential deadlocks.
+
+
+How DEPT works
+--------------
+
+Let's take a look at how DEPT works with the example that was mentioned
+in the section 'Limitation of lockdep'.
+
+ CONTEXT X CONTEXT Y CONTEXT Z
+
+ mutex_lock A
+ folio_lock B
+ folio_lock B
+ mutex_lock A /* DEADLOCK */
+ folio_unlock B
+ folio_unlock B
+ mutex_unlock A
+ mutex_unlock A
+
+Add comments to describe DEPT's view in terms of wait and event.
+
+ CONTEXT X CONTEXT Y CONTEXT Z
+
+ mutex_lock A
+ /* start to take into account event A context */
+ folio_lock B
+ /* start to take into account event B context */
+
+ folio_lock B
+ /* wait for B */
+ (1)
+ mutex_lock A /* DEADLOCK */
+ /* wait for A */
+ (2)
+
+ folio_unlock B
+ /* event B */
+ folio_unlock B
+ /* not interest until reaching (1) */
+
+ mutex_unlock A
+ /* event A */
+ mutex_unlock A
+ /* not interest until reaching (2) */
+
+Focusing on wait and event, the example can be simplified like:
+
+ CONTEXT X CONTEXT Y CONTEXT Z
+
+ | |
+ | |
+ v |
+ wait for B v
+ . wait for A
+ . .
+ . event B
+ event A
+
+Event A occurrence is a prerequisite to reaching event B, and event B
+occurrence is a prerequisite to reaching event A.
+
+In terms of dependency:
+
+ Event B depends on event A.
+ Event A depends on event B.
+
+Dependencies in the dependency graph look like:
+
+ -> A -> B -
+ / \
+ \ /
+ -----------
+
+ where 'A -> B' means that event A depends on event B.
+
+A loop has been created. So DEPT can report it as a deadlock.
+
+CONCLUSION
+
+DEPT works well with any synchronization mechanisms by focusing on wait
+and event.
--
2.17.1
Wrapped the base APIs for easier annotation of typical locks.
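For illustration, a lock primitive could hook the wrappers roughly as
below (a hypothetical integration sketch; the real per-primitive
annotations come in their own patches):

  #include <linux/kernel.h>
  #include <linux/spinlock.h>
  #include <linux/dept_ldt.h>

  struct my_lock {
	arch_spinlock_t raw;
	struct dept_map dmap;
  };

  static void my_lock_init(struct my_lock *l, struct dept_key *k)
  {
	l->raw = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
	ldt_init(&l->dmap, k, 0, "my_lock");
  }

  static void my_lock_lock(struct my_lock *l)
  {
	/* sub-lock 0, not a trylock, not a nest/nokeep acquisition */
	ldt_lock(&l->dmap, 0, 0, 0, _THIS_IP_);
	arch_spin_lock(&l->raw);
  }

  static void my_lock_unlock(struct my_lock *l)
  {
	arch_spin_unlock(&l->raw);
	ldt_unlock(&l->dmap, _THIS_IP_);
  }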
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/dept_ldt.h | 77 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 77 insertions(+)
create mode 100644 include/linux/dept_ldt.h
diff --git a/include/linux/dept_ldt.h b/include/linux/dept_ldt.h
new file mode 100644
index 000000000000..062613e89fc3
--- /dev/null
+++ b/include/linux/dept_ldt.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Lock Dependency Tracker
+ *
+ * Started by Byungchul Park <[email protected]>:
+ *
+ * Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ */
+
+#ifndef __LINUX_DEPT_LDT_H
+#define __LINUX_DEPT_LDT_H
+
+#include <linux/dept.h>
+
+#ifdef CONFIG_DEPT
+#define LDT_EVT_L 1UL
+#define LDT_EVT_R 2UL
+#define LDT_EVT_W 1UL
+#define LDT_EVT_RW (LDT_EVT_R | LDT_EVT_W)
+#define LDT_EVT_ALL (LDT_EVT_L | LDT_EVT_RW)
+
+#define ldt_init(m, k, su, n) dept_map_init(m, k, su, n)
+#define ldt_lock(m, sl, t, n, i) \
+ do { \
+ if (n) \
+ dept_ecxt_enter_nokeep(m); \
+ else if (t) \
+ dept_ecxt_enter(m, LDT_EVT_L, i, "trylock", "unlock", sl);\
+ else { \
+ dept_wait(m, LDT_EVT_L, i, "lock", sl); \
+ dept_ecxt_enter(m, LDT_EVT_L, i, "lock", "unlock", sl);\
+ } \
+ } while (0)
+
+#define ldt_rlock(m, sl, t, n, i, q) \
+ do { \
+ if (n) \
+ dept_ecxt_enter_nokeep(m); \
+ else if (t) \
+ dept_ecxt_enter(m, LDT_EVT_R, i, "read_trylock", "read_unlock", sl);\
+ else { \
+ dept_wait(m, q ? LDT_EVT_RW : LDT_EVT_W, i, "read_lock", sl);\
+ dept_ecxt_enter(m, LDT_EVT_R, i, "read_lock", "read_unlock", sl);\
+ } \
+ } while (0)
+
+#define ldt_wlock(m, sl, t, n, i) \
+ do { \
+ if (n) \
+ dept_ecxt_enter_nokeep(m); \
+ else if (t) \
+ dept_ecxt_enter(m, LDT_EVT_W, i, "write_trylock", "write_unlock", sl);\
+ else { \
+ dept_wait(m, LDT_EVT_RW, i, "write_lock", sl); \
+ dept_ecxt_enter(m, LDT_EVT_W, i, "write_lock", "write_unlock", sl);\
+ } \
+ } while (0)
+
+#define ldt_unlock(m, i) dept_ecxt_exit(m, LDT_EVT_ALL, i)
+
+#define ldt_downgrade(m, i) \
+ do { \
+ if (dept_ecxt_holding(m, LDT_EVT_W)) \
+ dept_map_ecxt_modify(m, LDT_EVT_W, NULL, LDT_EVT_R, i, "downgrade", "read_unlock", -1);\
+ } while (0)
+
+#define ldt_set_class(m, n, k, sl, i) dept_map_ecxt_modify(m, LDT_EVT_ALL, k, 0UL, i, "lock_set_class", "(any)unlock", sl)
+#else /* !CONFIG_DEPT */
+#define ldt_init(m, k, su, n) do { (void)(k); } while (0)
+#define ldt_lock(m, sl, t, n, i) do { } while (0)
+#define ldt_rlock(m, sl, t, n, i, q) do { } while (0)
+#define ldt_wlock(m, sl, t, n, i) do { } while (0)
+#define ldt_unlock(m, i) do { } while (0)
+#define ldt_downgrade(m, i) do { } while (0)
+#define ldt_set_class(m, n, k, sl, i) do { } while (0)
+#endif
+#endif /* __LINUX_DEPT_LDT_H */
--
2.17.1
Makes Dept able to track dependencies by waitqueue waits.
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/wait.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 3473b663176f..ebeb4678859f 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -7,6 +7,7 @@
#include <linux/list.h>
#include <linux/stddef.h>
#include <linux/spinlock.h>
+#include <linux/dept_sdt.h>
#include <asm/current.h>
#include <uapi/linux/wait.h>
@@ -303,6 +304,7 @@ extern void init_wait_entry(struct wait_queue_entry *wq_entry, int flags);
struct wait_queue_entry __wq_entry; \
long __ret = ret; /* explicit shadow */ \
\
+ sdt_might_sleep_start(NULL); \
init_wait_entry(&__wq_entry, exclusive ? WQ_FLAG_EXCLUSIVE : 0); \
for (;;) { \
long __int = prepare_to_wait_event(&wq_head, &__wq_entry, state);\
@@ -318,6 +320,7 @@ extern void init_wait_entry(struct wait_queue_entry *wq_entry, int flags);
cmd; \
} \
finish_wait(&wq_head, &__wq_entry); \
+ sdt_might_sleep_end(); \
__out: __ret; \
})
--
2.17.1