I'm going to re-add the RFC tag for a while at Ted's request. But harder
testing is needed to find false alarms, now that there are no false
alarms on my system. I'm going to look for other systems that might
produce false alarms. And it'd be appreciated if you share any alarms
you see on yours.
---
Hi Linus and folks,
I've been developing a tool for detecting deadlock possibilities by
tracking wait/event rather than lock acquisition order, to try to cover
all synchronization mechanisms. It's based on the v5.17-rc7 tag.
https://github.com/lgebyungchulpark/linux-dept/commits/dept1.18_on_v5.17-rc7
Benefits:
0. Works with all lock primitives.
1. Works with wait_for_completion()/complete().
2. Works with 'wait' on PG_locked.
3. Works with 'wait' on PG_writeback.
4. Works with swait/wakeup.
5. Works with waitqueue.
6. Multiple reports are allowed.
7. Deduplication control on multiple reports.
8. Withstands false positives thanks to 6.
9. Easy to tag any wait/event (see the sketch below).
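For reference, tagging a custom wait/event with the SDT API introduced
in this series would look roughly like the sketch below. The struct and
functions named my_* are hypothetical; only the sdt_*() calls come from
this series (they are used on swait and waitqueue in the patches below):

	#include <linux/dept_sdt.h>

	struct my_obj {
		struct dept_map dmap;		/* Dept map for this wait/event */
	};

	static void my_obj_init(struct my_obj *obj)
	{
		sdt_map_init(&obj->dmap);	/* tag the object once at init */
	}

	static void my_obj_wait(struct my_obj *obj)
	{
		sdt_wait_prepare(&obj->dmap);	/* about to sleep on the event */
		/* ... actually sleep here ... */
		sdt_wait_finish();		/* woken up, wait is over */
	}

	static void my_obj_wake(struct my_obj *obj)
	{
		sdt_event(&obj->dmap);		/* the event waiters sleep on */
		/* ... actual wake-up here ... */
	}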
Future work:
0. To make it more stable.
1. To separate Dept from Lockdep.
2. To improve performance in terms of time and space.
3. To use Dept as a dependency engine for Lockdep.
4. To add any missing tags of wait/event in the kernel.
5. To deduplicate stack trace.
How to interpret reports:
1. E(event) in each context cannot be triggered because of the
W(wait) that cannot be woken. In other words, each W waits for an
E that only another stuck context can trigger, so no context can
make progress.
2. The stack trace that helps find the problematic code is located
in each context's detail.
Thanks,
Byungchul
---
Changes from v4:
1. Fix some bugs that produce false alarms.
2. Distinguish each syscall context from another *for arm64*.
3. Don't warn but just print a message in case the Dept ring
buffer gets exhausted. (feedback from Hyeonggon)
4. Explicitly describe "EXPERIMENTAL" and "Dept might produce
false positive reports" in Kconfig. (feedback from Ted)
Changes from v3:
1. Dept shouldn't create dependencies between different depths
of a class that were indicated by *_lock_nested(). Dept
normally doesn't, but it did once another lock class came
in between. So fixed it. (feedback from Hyeonggon)
2. Dept considered a wait a real wait once it got to
__schedule(), even if the task had already been set to
TASK_RUNNING by a wake-up source. Fixed it so that Dept
doesn't consider that case a real wait. (feedback from Jan Kara)
3. Stop tracking dependencies with a map once the event
associated with the map has been handled. Dept will start to
work with the map again, on the next sleep.
Changes from v2:
1. Disable Dept on bit_wait_table[] in sched/wait_bit.c, which
was reporting a lot of false positives; my fault. Wait/event
for bit_wait_table[] should be tagged at a higher layer to
work better, which is future work.
(feedback from Jan Kara)
2. Disable Dept on crypto_larval's completion to prevent a false
positive.
Changes from v1:
1. Fix coding style and typo. (feedback from Steven)
2. Distinguish each work context from another in workqueue.
3. Skip checking lock acquisition with nest_lock, which is about
correct lock usage that should be checked by Lockdep.
Changes from RFC:
1. Add the wait tag at __schedule() rather than at prepare_to_wait().
(feedback from Linus and Matthew)
2. Use the trylock version of the annotation at lockdep_acquire_cpus_lock().
3. Distinguish each syscall context from another.
Byungchul Park (21):
llist: Move llist_{head,node} definition to types.h
dept: Implement Dept(Dependency Tracker)
dept: Embed Dept data in Lockdep
dept: Apply Dept to spinlock
dept: Apply Dept to mutex families
dept: Apply Dept to rwlock
dept: Apply Dept to wait_for_completion()/complete()
dept: Apply Dept to seqlock
dept: Apply Dept to rwsem
dept: Add proc knobs to show stats and dependency graph
dept: Introduce split map concept and new APIs for them
dept: Apply Dept to wait/event of PG_{locked,writeback}
dept: Apply SDT to swait
dept: Apply SDT to wait(waitqueue)
locking/lockdep, cpu/hotplug: Use a weaker annotation in AP thread
dept: Distinguish each syscall context from another
dept: Distinguish each work from another
dept: Disable Dept within the wait_bit layer by default
dept: Add nocheck version of init_completion()
dept: Disable Dept on struct crypto_larval's completion for now
dept: Don't create dependencies between different depths in any case
arch/arm64/kernel/syscall.c | 2 +
arch/x86/entry/common.c | 4 +
crypto/api.c | 7 +-
include/linux/completion.h | 50 +-
include/linux/dept.h | 544 +++++++
include/linux/dept_page.h | 78 +
include/linux/dept_sdt.h | 62 +
include/linux/hardirq.h | 3 +
include/linux/irqflags.h | 33 +-
include/linux/llist.h | 8 -
include/linux/lockdep.h | 157 ++-
include/linux/lockdep_types.h | 3 +
include/linux/mutex.h | 32 +
include/linux/page-flags.h | 45 +-
include/linux/pagemap.h | 7 +-
include/linux/percpu-rwsem.h | 10 +-
include/linux/rtmutex.h | 7 +
include/linux/rwlock.h | 50 +
include/linux/rwlock_api_smp.h | 8 +-
include/linux/rwlock_types.h | 7 +
include/linux/rwsem.h | 32 +
include/linux/sched.h | 7 +
include/linux/seqlock.h | 68 +-
include/linux/spinlock.h | 25 +
include/linux/spinlock_types_raw.h | 13 +
include/linux/swait.h | 4 +
include/linux/types.h | 8 +
include/linux/wait.h | 6 +-
init/init_task.c | 2 +
init/main.c | 4 +
kernel/Makefile | 1 +
kernel/cpu.c | 2 +-
kernel/dependency/Makefile | 4 +
kernel/dependency/dept.c | 2743 ++++++++++++++++++++++++++++++++++++
kernel/dependency/dept_hash.h | 10 +
kernel/dependency/dept_internal.h | 26 +
kernel/dependency/dept_object.h | 13 +
kernel/dependency/dept_proc.c | 92 ++
kernel/exit.c | 1 +
kernel/fork.c | 2 +
kernel/locking/lockdep.c | 12 +-
kernel/module.c | 2 +
kernel/sched/completion.c | 12 +-
kernel/sched/core.c | 8 +
kernel/sched/swait.c | 10 +
kernel/sched/wait.c | 16 +
kernel/sched/wait_bit.c | 5 +-
kernel/softirq.c | 6 +-
kernel/trace/trace_preemptirq.c | 19 +-
kernel/workqueue.c | 3 +
lib/Kconfig.debug | 27 +
mm/filemap.c | 68 +
mm/page_ext.c | 5 +
53 files changed, 4313 insertions(+), 60 deletions(-)
create mode 100644 include/linux/dept.h
create mode 100644 include/linux/dept_page.h
create mode 100644 include/linux/dept_sdt.h
create mode 100644 kernel/dependency/Makefile
create mode 100644 kernel/dependency/dept.c
create mode 100644 kernel/dependency/dept_hash.h
create mode 100644 kernel/dependency/dept_internal.h
create mode 100644 kernel/dependency/dept_object.h
create mode 100644 kernel/dependency/dept_proc.c
--
1.9.1
Workqueue already provides concurrency control. Because of that, a wait
in one work doesn't prevent events in other works when the control is
enabled. Thus, each work should be considered a different context.
So let Dept assign a different context id to each work.
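As an illustration, consider the hypothetical pair of works below (not
part of this patch): with a concurrency level above one, the wait can't
block the event, so treating both works as one context would record a
false dependency.

	static DECLARE_COMPLETION(my_done);		/* hypothetical */

	static void my_work_a(struct work_struct *w)
	{
		/* waits, but doesn't prevent my_work_b() from running */
		wait_for_completion(&my_done);
	}

	static void my_work_b(struct work_struct *w)
	{
		complete(&my_done);	/* the event my_work_a() waits for */
	}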
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/dept.h | 2 ++
kernel/dependency/dept.c | 10 ++++++++++
kernel/workqueue.c | 3 +++
3 files changed, 15 insertions(+)
diff --git a/include/linux/dept.h b/include/linux/dept.h
index 672a762..3d24a52 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -491,6 +491,7 @@ struct dept_task {
extern void dept_event_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *e_fn);
extern void dept_ask_event_split_map(struct dept_map_each *me, struct dept_map_common *mc);
extern void dept_kernel_enter(void);
+extern void dept_work_enter(void);
static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
{
@@ -535,6 +536,7 @@ static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
#define dept_event_split_map(me, mc, ip, e_fn) do { } while (0)
#define dept_ask_event_split_map(me, mc) do { } while (0)
#define dept_kernel_enter() do { } while (0)
+#define dept_work_enter() do { } while (0)
#define dept_ecxt_enter_nokeep(m) do { } while (0)
#define dept_key_init(k) do { (void)(k); } while (0)
#define dept_key_destroy(k) do { (void)(k); } while (0)
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 8d8d8eb..10801783 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -1886,6 +1886,16 @@ void dept_disable_hardirq(unsigned long ip)
dept_exit(flags);
}
+/*
+ * Assign a different context id to each work.
+ */
+void dept_work_enter(void)
+{
+ struct dept_task *dt = dept_task();
+
+ dt->cxt_id[DEPT_CXT_PROCESS] += (1UL << DEPT_CXTS_NR);
+}
+
void dept_kernel_enter(void)
{
struct dept_task *dt = dept_task();
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 33f1106..f5d762c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -51,6 +51,7 @@
#include <linux/sched/isolation.h>
#include <linux/nmi.h>
#include <linux/kvm_para.h>
+#include <linux/dept.h>
#include "workqueue_internal.h"
@@ -2217,6 +2218,8 @@ static void process_one_work(struct worker *worker, struct work_struct *work)
lockdep_copy_map(&lockdep_map, &work->lockdep_map);
#endif
+ dept_work_enter();
+
/* ensure we're on the correct CPU */
WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
raw_smp_processor_id() != pool->cpu);
--
1.9.1
Makes SDT able to track dependencies by swait.
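For illustration, a hypothetical swait user that the annotations below
cover (my_swq and my_cond are made up; the swait API is the stock one):

	static DECLARE_SWAIT_QUEUE_HEAD(my_swq);
	static bool my_cond;

	static void my_waiter(void)
	{
		/* prepare_to_swait_event() below adds the wait tag */
		swait_event_exclusive(my_swq, my_cond);
	}

	static void my_waker(void)
	{
		my_cond = true;
		swake_up_one(&my_swq);	/* swake_up_locked() below tags the event */
	}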
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/swait.h | 4 ++++
kernel/sched/swait.c | 10 ++++++++++
2 files changed, 14 insertions(+)
diff --git a/include/linux/swait.h b/include/linux/swait.h
index 6a8c22b..dbdf2ce 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -6,6 +6,7 @@
#include <linux/stddef.h>
#include <linux/spinlock.h>
#include <linux/wait.h>
+#include <linux/dept_sdt.h>
#include <asm/current.h>
/*
@@ -43,6 +44,7 @@
struct swait_queue_head {
raw_spinlock_t lock;
struct list_head task_list;
+ struct dept_map dmap;
};
struct swait_queue {
@@ -61,6 +63,7 @@ struct swait_queue {
#define __SWAIT_QUEUE_HEAD_INITIALIZER(name) { \
.lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock), \
.task_list = LIST_HEAD_INIT((name).task_list), \
+ .dmap = DEPT_SDT_MAP_INIT(name), \
}
#define DECLARE_SWAIT_QUEUE_HEAD(name) \
@@ -72,6 +75,7 @@ extern void __init_swait_queue_head(struct swait_queue_head *q, const char *name
#define init_swait_queue_head(q) \
do { \
static struct lock_class_key __key; \
+ sdt_map_init(&(q)->dmap); \
__init_swait_queue_head((q), #q, &__key); \
} while (0)
diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
index e1c655f..4ca7d6e 100644
--- a/kernel/sched/swait.c
+++ b/kernel/sched/swait.c
@@ -27,6 +27,7 @@ void swake_up_locked(struct swait_queue_head *q)
return;
curr = list_first_entry(&q->task_list, typeof(*curr), task_list);
+ sdt_event(&q->dmap);
wake_up_process(curr->task);
list_del_init(&curr->task_list);
}
@@ -69,6 +70,7 @@ void swake_up_all(struct swait_queue_head *q)
while (!list_empty(&tmp)) {
curr = list_first_entry(&tmp, typeof(*curr), task_list);
+ sdt_event(&q->dmap);
wake_up_state(curr->task, TASK_NORMAL);
list_del_init(&curr->task_list);
@@ -97,6 +99,9 @@ void prepare_to_swait_exclusive(struct swait_queue_head *q, struct swait_queue *
__prepare_to_swait(q, wait);
set_current_state(state);
raw_spin_unlock_irqrestore(&q->lock, flags);
+
+ if (state & TASK_NORMAL)
+ sdt_wait_prepare(&q->dmap);
}
EXPORT_SYMBOL(prepare_to_swait_exclusive);
@@ -119,12 +124,16 @@ long prepare_to_swait_event(struct swait_queue_head *q, struct swait_queue *wait
}
raw_spin_unlock_irqrestore(&q->lock, flags);
+ if (!ret && state & TASK_NORMAL)
+ sdt_wait_prepare(&q->dmap);
+
return ret;
}
EXPORT_SYMBOL(prepare_to_swait_event);
void __finish_swait(struct swait_queue_head *q, struct swait_queue *wait)
{
+ sdt_wait_finish();
__set_current_state(TASK_RUNNING);
if (!list_empty(&wait->task_list))
list_del_init(&wait->task_list);
@@ -134,6 +143,7 @@ void finish_swait(struct swait_queue_head *q, struct swait_queue *wait)
{
unsigned long flags;
+ sdt_wait_finish();
__set_current_state(TASK_RUNNING);
if (!list_empty_careful(&wait->task_list)) {
--
1.9.1
Dept already prevents creating dependencies between different depths of
the class indicated by *_lock_nested() when the lock acquisitions happen
consecutively. For example:
lock A0 with depth
lock_nested A1 with depth + 1
...
unlock A1
unlock A0
Dept does not create an A0 -> A1 dependency in this case. However, once
another class cuts in, the code becomes problematic. When Dept tries to
create real dependencies, it creates not only the real ones but also
wrong ones between different depths of the class. For example:
lock A0 with depth
lock B
lock_nested A1 with depth + 1
...
unlock A1
unlock B
unlock A0
Even in this case, Dept should not create an A0 -> A1 dependency, but
it does. Let Dept not create wrong dependencies between different
depths of the class in any case.
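Concretely, the problematic pattern corresponds to something like this
hypothetical sequence (parent/other/child are made up; spin_lock_nested()
and SINGLE_DEPTH_NESTING are the stock kernel API):

	spin_lock(&parent->lock);		/* A0, depth 0     */
	spin_lock(&other->lock);		/* class B cuts in */
	spin_lock_nested(&child->lock, SINGLE_DEPTH_NESTING);	/* A1, depth 1 */
	/* ... */
	spin_unlock(&child->lock);
	spin_unlock(&other->lock);
	spin_unlock(&parent->lock);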
Reported-by: [email protected]
Signed-off-by: Byungchul Park <[email protected]>
---
kernel/dependency/dept.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 10801783..a2088685 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -1458,14 +1458,7 @@ static void add_wait(struct dept_class *c, unsigned long ip,
eh = dt->ecxt_held + i;
if (eh->ecxt->class != c || eh->nest == ne)
- break;
- }
-
- for (; i >= 0; i--) {
- struct dept_ecxt_held *eh;
-
- eh = dt->ecxt_held + i;
- add_dep(eh->ecxt, w);
+ add_dep(eh->ecxt, w);
}
if (!wait_consumed(w) && !rich_stack) {
--
1.9.1
Makes SDT able to track dependencies by wait(waitqueue).
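For illustration, a hypothetical waitqueue user that the annotations
below cover (my_wq and my_cond are made up; the waitqueue API is the
stock one):

	static DECLARE_WAIT_QUEUE_HEAD(my_wq);
	static bool my_cond;

	static void my_waiter(void)
	{
		/* prepare_to_wait_event() below adds the wait tag */
		wait_event(my_wq, my_cond);
	}

	static void my_waker(void)
	{
		my_cond = true;
		wake_up(&my_wq);	/* __wake_up_common() below tags the event */
	}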
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/wait.h | 6 +++++-
kernel/sched/wait.c | 16 ++++++++++++++++
2 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/include/linux/wait.h b/include/linux/wait.h
index 851e07d..2133998 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -7,6 +7,7 @@
#include <linux/list.h>
#include <linux/stddef.h>
#include <linux/spinlock.h>
+#include <linux/dept_sdt.h>
#include <asm/current.h>
#include <uapi/linux/wait.h>
@@ -37,6 +38,7 @@ struct wait_queue_entry {
struct wait_queue_head {
spinlock_t lock;
struct list_head head;
+ struct dept_map dmap;
};
typedef struct wait_queue_head wait_queue_head_t;
@@ -56,7 +58,8 @@ struct wait_queue_head {
#define __WAIT_QUEUE_HEAD_INITIALIZER(name) { \
.lock = __SPIN_LOCK_UNLOCKED(name.lock), \
- .head = LIST_HEAD_INIT(name.head) }
+ .head = LIST_HEAD_INIT(name.head), \
+ .dmap = DEPT_SDT_MAP_INIT(name) }
#define DECLARE_WAIT_QUEUE_HEAD(name) \
struct wait_queue_head name = __WAIT_QUEUE_HEAD_INITIALIZER(name)
@@ -67,6 +70,7 @@ struct wait_queue_head {
do { \
static struct lock_class_key __key; \
\
+ sdt_map_init(&(wq_head)->dmap); \
__init_waitqueue_head((wq_head), #wq_head, &__key); \
} while (0)
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index eca3810..fc5a16a 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -105,6 +105,7 @@ static int __wake_up_common(struct wait_queue_head *wq_head, unsigned int mode,
if (flags & WQ_FLAG_BOOKMARK)
continue;
+ sdt_event(&wq_head->dmap);
ret = curr->func(curr, mode, wake_flags, key);
if (ret < 0)
break;
@@ -268,6 +269,9 @@ void __wake_up_pollfree(struct wait_queue_head *wq_head)
__add_wait_queue(wq_head, wq_entry);
set_current_state(state);
spin_unlock_irqrestore(&wq_head->lock, flags);
+
+ if (state & TASK_NORMAL)
+ sdt_wait_prepare(&wq_head->dmap);
}
EXPORT_SYMBOL(prepare_to_wait);
@@ -286,6 +290,10 @@ void __wake_up_pollfree(struct wait_queue_head *wq_head)
}
set_current_state(state);
spin_unlock_irqrestore(&wq_head->lock, flags);
+
+ if (state & TASK_NORMAL)
+ sdt_wait_prepare(&wq_head->dmap);
+
return was_empty;
}
EXPORT_SYMBOL(prepare_to_wait_exclusive);
@@ -331,6 +339,9 @@ long prepare_to_wait_event(struct wait_queue_head *wq_head, struct wait_queue_en
}
spin_unlock_irqrestore(&wq_head->lock, flags);
+ if (!ret && state & TASK_NORMAL)
+ sdt_wait_prepare(&wq_head->dmap);
+
return ret;
}
EXPORT_SYMBOL(prepare_to_wait_event);
@@ -352,7 +363,9 @@ int do_wait_intr(wait_queue_head_t *wq, wait_queue_entry_t *wait)
return -ERESTARTSYS;
spin_unlock(&wq->lock);
+ sdt_wait_prepare(&wq->dmap);
schedule();
+ sdt_wait_finish();
spin_lock(&wq->lock);
return 0;
@@ -369,7 +382,9 @@ int do_wait_intr_irq(wait_queue_head_t *wq, wait_queue_entry_t *wait)
return -ERESTARTSYS;
spin_unlock_irq(&wq->lock);
+ sdt_wait_prepare(&wq->dmap);
schedule();
+ sdt_wait_finish();
spin_lock_irq(&wq->lock);
return 0;
@@ -389,6 +404,7 @@ void finish_wait(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_en
{
unsigned long flags;
+ sdt_wait_finish();
__set_current_state(TASK_RUNNING);
/*
* We can check for list emptiness outside the lock
--
1.9.1
Makes Dept able to track dependencies by seqlock, by adding a wait
annotation on the read side of the seqcount.
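For illustration, with the hunks below a hypothetical reader/writer
pair is tracked as follows: the read-side retry section counts as a
wait, and write_seqcount_end() is the event it waits for (my_seq,
my_lock and the shared variables are made up):

	static DEFINE_SPINLOCK(my_lock);
	static seqcount_t my_seq = SEQCNT_ZERO(my_seq);
	static int shared_a, shared_b;

	static int my_reader(void)
	{
		unsigned int seq;
		int sum;

		do {
			seq = read_seqcount_begin(&my_seq);	/* wait annotation */
			sum = shared_a + shared_b;
		} while (read_seqcount_retry(&my_seq, seq));

		return sum;
	}

	static void my_writer(void)
	{
		spin_lock(&my_lock);
		write_seqcount_begin(&my_seq);	/* event context begins */
		shared_a++;
		shared_b++;
		write_seqcount_end(&my_seq);	/* the event readers wait for */
		spin_unlock(&my_lock);
	}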
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/seqlock.h | 68 +++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 66 insertions(+), 2 deletions(-)
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 37ded6b..585f45c 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -23,6 +23,31 @@
#include <asm/processor.h>
+#ifdef CONFIG_DEPT
+#define DEPT_EVT_ALL ((1UL << DEPT_MAX_SUBCLASSES_EVT) - 1)
+#define dept_seq_wait(m, ip) dept_wait(m, DEPT_EVT_ALL, ip, __func__, 0)
+#define dept_seq_writebegin(m, ip) \
+do { \
+ dept_ecxt_enter(m, 1UL, ip, __func__, "write_seqcount_end", 0);\
+ dept_ask_event(m); \
+} while (0)
+#define dept_seq_writeend(m, ip) \
+do { \
+ dept_event(m, 1UL, ip, __func__); \
+ dept_ecxt_exit(m, 1UL, ip); \
+} while (0)
+#else
+#define dept_seq_wait(m, ip) do { } while (0)
+#define dept_seq_writebegin(m, ip) do { } while (0)
+#define dept_seq_writeend(m, ip) do { } while (0)
+#endif
+
+#ifdef CONFIG_DEPT
+#define SEQ_DMAP_INIT(lockname) .dmap = { .name = #lockname }
+#else
+#define SEQ_DMAP_INIT(lockname)
+#endif
+
/*
* The seqlock seqcount_t interface does not prescribe a precise sequence of
* read begin/retry/end. For readers, typically there is a call to
@@ -82,7 +107,8 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define SEQCOUNT_DEP_MAP_INIT(lockname) \
- .dep_map = { .name = #lockname }
+ .dep_map = { .name = #lockname, \
+ SEQ_DMAP_INIT(lockname) }
/**
* seqcount_init() - runtime initializer for seqcount_t
@@ -148,7 +174,7 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
* This lock-unlock technique must be implemented for all of PREEMPT_RT
* sleeping locks. See Documentation/locking/locktypes.rst
*/
-#if defined(CONFIG_LOCKDEP) || defined(CONFIG_PREEMPT_RT)
+#if defined(CONFIG_LOCKDEP) || defined(CONFIG_DEPT) || defined(CONFIG_PREEMPT_RT)
#define __SEQ_LOCK(expr) expr
#else
#define __SEQ_LOCK(expr)
@@ -203,6 +229,22 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
__SEQ_LOCK(locktype *lock); \
} seqcount_##lockname##_t; \
\
+static __always_inline void \
+__seqprop_##lockname##_wait(const seqcount_##lockname##_t *s) \
+{ \
+ __SEQ_LOCK(dept_seq_wait(&(lockmember)->dep_map.dmap, _RET_IP_));\
+} \
+ \
+static __always_inline void \
+__seqprop_##lockname##_writebegin(const seqcount_##lockname##_t *s) \
+{ \
+} \
+ \
+static __always_inline void \
+__seqprop_##lockname##_writeend(const seqcount_##lockname##_t *s) \
+{ \
+} \
+ \
static __always_inline seqcount_t * \
__seqprop_##lockname##_ptr(seqcount_##lockname##_t *s) \
{ \
@@ -271,6 +313,21 @@ static inline void __seqprop_assert(const seqcount_t *s)
lockdep_assert_preemption_disabled();
}
+static inline void __seqprop_wait(seqcount_t *s)
+{
+ dept_seq_wait(&s->dep_map.dmap, _RET_IP_);
+}
+
+static inline void __seqprop_writebegin(seqcount_t *s)
+{
+ dept_seq_writebegin(&s->dep_map.dmap, _RET_IP_);
+}
+
+static inline void __seqprop_writeend(seqcount_t *s)
+{
+ dept_seq_writeend(&s->dep_map.dmap, _RET_IP_);
+}
+
#define __SEQ_RT IS_ENABLED(CONFIG_PREEMPT_RT)
SEQCOUNT_LOCKNAME(raw_spinlock, raw_spinlock_t, false, s->lock, raw_spin, raw_spin_lock(s->lock))
@@ -311,6 +368,9 @@ static inline void __seqprop_assert(const seqcount_t *s)
#define seqprop_sequence(s) __seqprop(s, sequence)
#define seqprop_preemptible(s) __seqprop(s, preemptible)
#define seqprop_assert(s) __seqprop(s, assert)
+#define seqprop_dept_wait(s) __seqprop(s, wait)
+#define seqprop_dept_writebegin(s) __seqprop(s, writebegin)
+#define seqprop_dept_writeend(s) __seqprop(s, writeend)
/**
* __read_seqcount_begin() - begin a seqcount_t read section w/o barrier
@@ -360,6 +420,7 @@ static inline void __seqprop_assert(const seqcount_t *s)
#define read_seqcount_begin(s) \
({ \
seqcount_lockdep_reader_access(seqprop_ptr(s)); \
+ seqprop_dept_wait(s); \
raw_read_seqcount_begin(s); \
})
@@ -512,6 +573,7 @@ static inline void do_raw_write_seqcount_end(seqcount_t *s)
preempt_disable(); \
\
do_write_seqcount_begin_nested(seqprop_ptr(s), subclass); \
+ seqprop_dept_writebegin(s); \
} while (0)
static inline void do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
@@ -538,6 +600,7 @@ static inline void do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
preempt_disable(); \
\
do_write_seqcount_begin(seqprop_ptr(s)); \
+ seqprop_dept_writebegin(s); \
} while (0)
static inline void do_write_seqcount_begin(seqcount_t *s)
@@ -554,6 +617,7 @@ static inline void do_write_seqcount_begin(seqcount_t *s)
*/
#define write_seqcount_end(s) \
do { \
+ seqprop_dept_writeend(s); \
do_write_seqcount_end(seqprop_ptr(s)); \
\
if (seqprop_preemptible(s)) \
--
1.9.1
cb92173d1f0 ("locking/lockdep, cpu/hotplug: Annotate AP thread") was
introduced to make lockdep_assert_cpus_held() work in the AP thread.
However, the annotation is too strong for that purpose; a trylock
annotation is sufficient.
Furthermore, now that Dept has been introduced, false positive alarms
were reported because of the strong annotation. Replace it with a
trylock annotation.
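For reference, the third argument of rwsem_acquire() is the trylock
flag; the unchanged lockdep.h definition (also quoted in another patch
of this series) reads:

	#define rwsem_acquire(l, s, t, i)	lock_acquire_exclusive(l, s, t, NULL, i)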
Signed-off-by: Byungchul Park <[email protected]>
---
kernel/cpu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 407a256..1f92a42 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -355,7 +355,7 @@ int lockdep_is_cpus_held(void)
static void lockdep_acquire_cpus_lock(void)
{
- rwsem_acquire(&cpu_hotplug_lock.dep_map, 0, 0, _THIS_IP_);
+ rwsem_acquire(&cpu_hotplug_lock.dep_map, 0, 1, _THIS_IP_);
}
static void lockdep_release_cpus_lock(void)
--
1.9.1
It'd be useful to show Dept's internal stats and dependency graph at
runtime via procfs, for better observability. Introduce the knobs:
/proc/dept_stats and /proc/dept_deps.
Signed-off-by: Byungchul Park <[email protected]>
---
kernel/dependency/Makefile | 1 +
kernel/dependency/dept.c | 24 ++++------
kernel/dependency/dept_internal.h | 26 +++++++++++
kernel/dependency/dept_proc.c | 92 +++++++++++++++++++++++++++++++++++++++
4 files changed, 128 insertions(+), 15 deletions(-)
create mode 100644 kernel/dependency/dept_internal.h
create mode 100644 kernel/dependency/dept_proc.c
diff --git a/kernel/dependency/Makefile b/kernel/dependency/Makefile
index b5cfb8a..92f1654 100644
--- a/kernel/dependency/Makefile
+++ b/kernel/dependency/Makefile
@@ -1,3 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_DEPT) += dept.o
+obj-$(CONFIG_DEPT) += dept_proc.o
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index cfd8735..b810939 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -73,6 +73,7 @@
#include <linux/hash.h>
#include <linux/dept.h>
#include <linux/utsname.h>
+#include "dept_internal.h"
static int dept_stop;
static int dept_per_cpu_ready;
@@ -233,20 +234,13 @@ static inline struct dept_task *dept_task(void)
* have been freed will be placed.
*/
-enum object_t {
-#define OBJECT(id, nr) OBJECT_##id,
- #include "dept_object.h"
-#undef OBJECT
- OBJECT_NR,
-};
-
#define OBJECT(id, nr) \
static struct dept_##id spool_##id[nr]; \
static DEFINE_PER_CPU(struct llist_head, lpool_##id);
#include "dept_object.h"
#undef OBJECT
-static struct dept_pool pool[OBJECT_NR] = {
+struct dept_pool dept_pool[OBJECT_NR] = {
#define OBJECT(id, nr) { \
.name = #id, \
.obj_sz = sizeof(struct dept_##id), \
@@ -276,7 +270,7 @@ static void *from_pool(enum object_t t)
if (DEPT_WARN_ON(!irqs_disabled()))
return NULL;
- p = &pool[t];
+ p = &dept_pool[t];
/*
* Try local pool first.
@@ -306,7 +300,7 @@ static void *from_pool(enum object_t t)
static void to_pool(void *o, enum object_t t)
{
- struct dept_pool *p = &pool[t];
+ struct dept_pool *p = &dept_pool[t];
struct llist_head *h;
preempt_disable();
@@ -1997,7 +1991,7 @@ void dept_map_nocheck(struct dept_map *m)
}
EXPORT_SYMBOL_GPL(dept_map_nocheck);
-static LIST_HEAD(classes);
+LIST_HEAD(dept_classes);
static inline bool within(const void *addr, void *start, unsigned long size)
{
@@ -2024,7 +2018,7 @@ void dept_free_range(void *start, unsigned int sz)
while (unlikely(!dept_lock()))
cpu_relax();
- list_for_each_entry_safe(c, n, &classes, all_node) {
+ list_for_each_entry_safe(c, n, &dept_classes, all_node) {
if (!within((void *)c->key, start, sz) &&
!within(c->name, start, sz))
continue;
@@ -2093,7 +2087,7 @@ static struct dept_class *check_new_class(struct dept_key *local,
c->sub = sub;
c->key = (unsigned long)(k->subkeys + sub);
hash_add_class(c);
- list_add(&c->all_node, &classes);
+ list_add(&c->all_node, &dept_classes);
unlock:
dept_unlock();
caching:
@@ -2570,8 +2564,8 @@ static void migrate_per_cpu_pool(void)
struct llist_head *from;
struct llist_head *to;
- from = &pool[i].boot_pool;
- to = per_cpu_ptr(pool[i].lpool, boot_cpu);
+ from = &dept_pool[i].boot_pool;
+ to = per_cpu_ptr(dept_pool[i].lpool, boot_cpu);
move_llist(to, from);
}
}
diff --git a/kernel/dependency/dept_internal.h b/kernel/dependency/dept_internal.h
new file mode 100644
index 0000000..007c1ee
--- /dev/null
+++ b/kernel/dependency/dept_internal.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Dept(DEPendency Tracker) - runtime dependency tracker internal header
+ *
+ * Started by Byungchul Park <[email protected]>:
+ *
+ * Copyright (c) 2020 LG Electronics, Inc., Byungchul Park
+ */
+
+#ifndef __DEPT_INTERNAL_H
+#define __DEPT_INTERNAL_H
+
+#ifdef CONFIG_DEPT
+
+enum object_t {
+#define OBJECT(id, nr) OBJECT_##id,
+ #include "dept_object.h"
+#undef OBJECT
+ OBJECT_NR,
+};
+
+extern struct list_head dept_classes;
+extern struct dept_pool dept_pool[];
+
+#endif
+#endif /* __DEPT_INTERNAL_H */
diff --git a/kernel/dependency/dept_proc.c b/kernel/dependency/dept_proc.c
new file mode 100644
index 0000000..c069354
--- /dev/null
+++ b/kernel/dependency/dept_proc.c
@@ -0,0 +1,92 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Procfs knobs for Dept(DEPendency Tracker)
+ *
+ * Started by Byungchul Park <[email protected]>:
+ *
+ * Copyright (C) 2021 LG Electronics, Inc., Byungchul Park
+ */
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/dept.h>
+#include "dept_internal.h"
+
+static void *l_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ /*
+ * XXX: Serialize list traversal if needed. The following might
+ * give wrong information under contention.
+ */
+ return seq_list_next(v, &dept_classes, pos);
+}
+
+static void *l_start(struct seq_file *m, loff_t *pos)
+{
+ /*
+ * XXX: Serialize list traversal if needed. The following might
+ * give wrong information under contention.
+ */
+ return seq_list_start_head(&dept_classes, *pos);
+}
+
+static void l_stop(struct seq_file *m, void *v)
+{
+}
+
+static int l_show(struct seq_file *m, void *v)
+{
+ struct dept_class *fc = list_entry(v, struct dept_class, all_node);
+ struct dept_dep *d;
+
+ if (v == &dept_classes) {
+ seq_puts(m, "All classes:\n\n");
+ return 0;
+ }
+
+ seq_printf(m, "[%p] %s\n", (void *)fc->key, fc->name);
+
+ /*
+ * XXX: Serialize list traversal if needed. The following might
+ * give wrong information under contention.
+ */
+ list_for_each_entry(d, &fc->dep_head, dep_node) {
+ struct dept_class *tc = d->wait->class;
+
+ seq_printf(m, " -> [%p] %s\n", (void *)tc->key, tc->name);
+ }
+ seq_puts(m, "\n");
+
+ return 0;
+}
+
+static const struct seq_operations dept_deps_ops = {
+ .start = l_start,
+ .next = l_next,
+ .stop = l_stop,
+ .show = l_show,
+};
+
+static int dept_stats_show(struct seq_file *m, void *v)
+{
+ int r;
+
+ seq_puts(m, "Availability in the static pools:\n\n");
+#define OBJECT(id, nr) \
+ r = atomic_read(&dept_pool[OBJECT_##id].obj_nr); \
+ if (r < 0) \
+ r = 0; \
+ seq_printf(m, "%s\t%d/%d(%d%%)\n", #id, r, nr, (r * 100) / (nr));
+ #include "dept_object.h"
+#undef OBJECT
+
+ return 0;
+}
+
+static int __init dept_proc_init(void)
+{
+ proc_create_seq("dept_deps", S_IRUSR, NULL, &dept_deps_ops);
+ proc_create_single("dept_stats", S_IRUSR, NULL, dept_stats_show);
+ return 0;
+}
+
+__initcall(dept_proc_init);
--
1.9.1
The kernel is entered on each syscall, and each syscall handling should
be considered an independent context from Dept's point of view.
Otherwise, Dept may wrongly track dependencies across different
syscalls.
Of course, there might be real dependencies spanning syscalls via user
mode. However, since Dept has just started to work, conservatively make
Dept not track dependencies across different syscalls.
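For reference, a sketch (not a literal excerpt) of how the hunks below
keep per-syscall context ids unique: the low DEPT_CXTS_NR bits of an id
encode the context type, and the bits above them hold a per-type
sequence counter that dept_kernel_enter() advances on syscall entry.

	static unsigned int example_syscall_ctxt_id(struct dept_task *dt)
	{
		/* on syscall entry: advance the process-context counter */
		dt->cxt_id[DEPT_CXT_PROCESS] += (1UL << DEPT_CXTS_NR);

		/* the id used for this syscall's process context */
		return dt->cxt_id[DEPT_CXT_PROCESS] | (1UL << DEPT_CXT_PROCESS);
	}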
Signed-off-by: Byungchul Park <[email protected]>
---
arch/arm64/kernel/syscall.c | 2 ++
arch/x86/entry/common.c | 4 +++
include/linux/dept.h | 39 +++++++++++++++-----------
kernel/dependency/dept.c | 67 +++++++++++++++++++++++----------------------
4 files changed, 63 insertions(+), 49 deletions(-)
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index c938603..1f219f0 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -7,6 +7,7 @@
#include <linux/ptrace.h>
#include <linux/randomize_kstack.h>
#include <linux/syscalls.h>
+#include <linux/dept.h>
#include <asm/daifflags.h>
#include <asm/debug-monitors.h>
@@ -105,6 +106,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
*/
local_daif_restore(DAIF_PROCCTX);
+ dept_kernel_enter();
if (flags & _TIF_MTE_ASYNC_FAULT) {
/*
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 6c28264..7cdd27a 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -19,6 +19,7 @@
#include <linux/nospec.h>
#include <linux/syscalls.h>
#include <linux/uaccess.h>
+#include <linux/dept.h>
#ifdef CONFIG_XEN_PV
#include <xen/xen-ops.h>
@@ -72,6 +73,7 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
__visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
{
+ dept_kernel_enter();
add_random_kstack_offset();
nr = syscall_enter_from_user_mode(regs, nr);
@@ -120,6 +122,7 @@ __visible noinstr void do_int80_syscall_32(struct pt_regs *regs)
{
int nr = syscall_32_enter(regs);
+ dept_kernel_enter();
add_random_kstack_offset();
/*
* Subtlety here: if ptrace pokes something larger than 2^31-1 into
@@ -140,6 +143,7 @@ static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
int nr = syscall_32_enter(regs);
int res;
+ dept_kernel_enter();
add_random_kstack_offset();
/*
* This cannot use syscall_enter_from_user_mode() as it has to
diff --git a/include/linux/dept.h b/include/linux/dept.h
index 2ff2549..672a762 100644
--- a/include/linux/dept.h
+++ b/include/linux/dept.h
@@ -25,11 +25,16 @@
#define DEPT_MAX_SUBCLASSES_USR (DEPT_MAX_SUBCLASSES / DEPT_MAX_SUBCLASSES_EVT)
#define DEPT_MAX_SUBCLASSES_CACHE 2
-#define DEPT_SIRQ 0
-#define DEPT_HIRQ 1
-#define DEPT_IRQS_NR 2
-#define DEPT_SIRQF (1UL << DEPT_SIRQ)
-#define DEPT_HIRQF (1UL << DEPT_HIRQ)
+enum {
+ DEPT_CXT_SIRQ = 0,
+ DEPT_CXT_HIRQ,
+ DEPT_CXT_IRQS_NR,
+ DEPT_CXT_PROCESS = DEPT_CXT_IRQS_NR,
+ DEPT_CXTS_NR
+};
+
+#define DEPT_SIRQF (1UL << DEPT_CXT_SIRQ)
+#define DEPT_HIRQF (1UL << DEPT_CXT_HIRQ)
struct dept_ecxt;
struct dept_iecxt {
@@ -95,8 +100,8 @@ struct dept_class {
/*
* for tracking IRQ dependencies
*/
- struct dept_iecxt iecxt[DEPT_IRQS_NR];
- struct dept_iwait iwait[DEPT_IRQS_NR];
+ struct dept_iecxt iecxt[DEPT_CXT_IRQS_NR];
+ struct dept_iwait iwait[DEPT_CXT_IRQS_NR];
};
struct dept_stack {
@@ -150,8 +155,8 @@ struct dept_ecxt {
/*
* where the IRQ-enabled happened
*/
- unsigned long enirq_ip[DEPT_IRQS_NR];
- struct dept_stack *enirq_stack[DEPT_IRQS_NR];
+ unsigned long enirq_ip[DEPT_CXT_IRQS_NR];
+ struct dept_stack *enirq_stack[DEPT_CXT_IRQS_NR];
/*
* where the event context started
@@ -194,8 +199,8 @@ struct dept_wait {
/*
* where the IRQ wait happened
*/
- unsigned long irq_ip[DEPT_IRQS_NR];
- struct dept_stack *irq_stack[DEPT_IRQS_NR];
+ unsigned long irq_ip[DEPT_CXT_IRQS_NR];
+ struct dept_stack *irq_stack[DEPT_CXT_IRQS_NR];
/*
* where the wait happened
@@ -395,19 +400,19 @@ struct dept_task {
int wait_hist_pos;
/*
- * sequential id to identify each IRQ context
+ * sequential id to identify each context
*/
- unsigned int irq_id[DEPT_IRQS_NR];
+ unsigned int cxt_id[DEPT_CXTS_NR];
/*
* for tracking IRQ-enabled points with cross-event
*/
- unsigned int wgen_enirq[DEPT_IRQS_NR];
+ unsigned int wgen_enirq[DEPT_CXT_IRQS_NR];
/*
* for keeping up-to-date IRQ-enabled points
*/
- unsigned long enirq_ip[DEPT_IRQS_NR];
+ unsigned long enirq_ip[DEPT_CXT_IRQS_NR];
/*
* current effective IRQ-enabled flag
@@ -448,7 +453,7 @@ struct dept_task {
.dept_task.wait_hist = { { .wait = NULL, } }, \
.dept_task.ecxt_held_pos = 0, \
.dept_task.wait_hist_pos = 0, \
- .dept_task.irq_id = { 0U }, \
+ .dept_task.cxt_id = { 0U }, \
.dept_task.wgen_enirq = { 0U }, \
.dept_task.enirq_ip = { 0UL }, \
.dept_task.eff_enirqf = 0UL, \
@@ -485,6 +490,7 @@ struct dept_task {
extern void dept_wait_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *w_fn, int ne);
extern void dept_event_split_map(struct dept_map_each *me, struct dept_map_common *mc, unsigned long ip, const char *e_fn);
extern void dept_ask_event_split_map(struct dept_map_each *me, struct dept_map_common *mc);
+extern void dept_kernel_enter(void);
static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
{
@@ -528,6 +534,7 @@ static inline void dept_ecxt_enter_nokeep(struct dept_map *m)
#define dept_wait_split_map(me, mc, ip, w_fn, ne) do { } while (0)
#define dept_event_split_map(me, mc, ip, e_fn) do { } while (0)
#define dept_ask_event_split_map(me, mc) do { } while (0)
+#define dept_kernel_enter() do { } while (0)
#define dept_ecxt_enter_nokeep(m) do { } while (0)
#define dept_key_init(k) do { (void)(k); } while (0)
#define dept_key_destroy(k) do { (void)(k); } while (0)
diff --git a/kernel/dependency/dept.c b/kernel/dependency/dept.c
index 1ef72b8..8d8d8eb 100644
--- a/kernel/dependency/dept.c
+++ b/kernel/dependency/dept.c
@@ -212,9 +212,9 @@ static inline struct dept_class *dep_tc(struct dept_dep *d)
static inline const char *irq_str(int irq)
{
- if (irq == DEPT_SIRQ)
+ if (irq == DEPT_CXT_SIRQ)
return "softirq";
- if (irq == DEPT_HIRQ)
+ if (irq == DEPT_CXT_HIRQ)
return "hardirq";
return "(unknown)";
}
@@ -374,7 +374,7 @@ static void initialize_class(struct dept_class *c)
{
int i;
- for (i = 0; i < DEPT_IRQS_NR; i++) {
+ for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
struct dept_iecxt *ie = &c->iecxt[i];
struct dept_iwait *iw = &c->iwait[i];
@@ -399,7 +399,7 @@ static void initialize_ecxt(struct dept_ecxt *e)
{
int i;
- for (i = 0; i < DEPT_IRQS_NR; i++) {
+ for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
e->enirq_stack[i] = NULL;
e->enirq_ip[i] = 0UL;
}
@@ -414,7 +414,7 @@ static void initialize_wait(struct dept_wait *w)
{
int i;
- for (i = 0; i < DEPT_IRQS_NR; i++) {
+ for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
w->irq_stack[i] = NULL;
w->irq_ip[i] = 0UL;
}
@@ -453,7 +453,7 @@ static void destroy_ecxt(struct dept_ecxt *e)
{
int i;
- for (i = 0; i < DEPT_IRQS_NR; i++)
+ for (i = 0; i < DEPT_CXT_IRQS_NR; i++)
if (e->enirq_stack[i])
put_stack(e->enirq_stack[i]);
if (e->class)
@@ -469,7 +469,7 @@ static void destroy_wait(struct dept_wait *w)
{
int i;
- for (i = 0; i < DEPT_IRQS_NR; i++)
+ for (i = 0; i < DEPT_CXT_IRQS_NR; i++)
if (w->irq_stack[i])
put_stack(w->irq_stack[i]);
if (w->class)
@@ -614,7 +614,7 @@ static void print_diagram(struct dept_dep *d)
const char *c_fn = e->ecxt_fn ?: "(unknown)";
irqf = e->enirqf & w->irqf;
- for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+ for_each_set_bit(irq, &irqf, DEPT_CXT_IRQS_NR) {
if (!firstline)
pr_warn("\nor\n\n");
firstline = false;
@@ -645,7 +645,7 @@ static void print_dep(struct dept_dep *d)
const char *c_fn = e->ecxt_fn ?: "(unknown)";
irqf = e->enirqf & w->irqf;
- for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+ for_each_set_bit(irq, &irqf, DEPT_CXT_IRQS_NR) {
pr_warn("%s has been enabled:\n", irq_str(irq));
print_ip_stack(e->enirq_ip[irq], e->enirq_stack[irq]);
pr_warn("\n");
@@ -871,7 +871,7 @@ static void bfs(struct dept_class *c, bfs_f *cb, void *in, void **out)
*/
static inline unsigned long cur_enirqf(void);
-static inline int cur_irq(void);
+static inline int cur_cxt(void);
static inline unsigned int cur_ctxt_id(void);
static inline struct dept_iecxt *iecxt(struct dept_class *c, int irq)
@@ -1397,7 +1397,7 @@ static void add_dep(struct dept_ecxt *e, struct dept_wait *w)
if (d) {
check_dl_bfs(d);
- for (i = 0; i < DEPT_IRQS_NR; i++) {
+ for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
struct dept_iwait *fiw = iwait(fc, i);
struct dept_iecxt *found_ie;
struct dept_iwait *found_iw;
@@ -1433,7 +1433,7 @@ static void add_wait(struct dept_class *c, unsigned long ip,
struct dept_task *dt = dept_task();
struct dept_wait *w;
unsigned int wg = 0U;
- int irq;
+ int cxt;
int i;
w = new_wait();
@@ -1445,9 +1445,9 @@ static void add_wait(struct dept_class *c, unsigned long ip,
w->wait_fn = w_fn;
w->wait_stack = get_current_stack();
- irq = cur_irq();
- if (irq < DEPT_IRQS_NR)
- add_iwait(c, irq, w);
+ cxt = cur_cxt();
+ if (cxt == DEPT_CXT_HIRQ || cxt == DEPT_CXT_SIRQ)
+ add_iwait(c, cxt, w);
/*
* Avoid adding dependency between user aware nested ecxt and
@@ -1512,7 +1512,7 @@ static bool add_ecxt(void *obj, struct dept_class *c, unsigned long ip,
eh->nest = ne;
irqf = cur_enirqf();
- for_each_set_bit(irq, &irqf, DEPT_IRQS_NR)
+ for_each_set_bit(irq, &irqf, DEPT_CXT_IRQS_NR)
add_iecxt(c, irq, e, false);
del_ecxt(e);
@@ -1652,7 +1652,7 @@ static void do_event(void *obj, struct dept_class *c, unsigned int wg,
break;
}
- for (i = 0; i < DEPT_IRQS_NR; i++) {
+ for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
struct dept_ecxt *e;
if (before(dt->wgen_enirq[i], wg))
@@ -1694,7 +1694,7 @@ static void disconnect_class(struct dept_class *c)
call_rcu(&d->rh, del_dep_rcu);
}
- for (i = 0; i < DEPT_IRQS_NR; i++) {
+ for (i = 0; i < DEPT_CXT_IRQS_NR; i++) {
stale_iecxt(iecxt(c, i));
stale_iwait(iwait(c, i));
}
@@ -1719,27 +1719,21 @@ static inline unsigned long cur_enirqf(void)
return 0UL;
}
-static inline int cur_irq(void)
+static inline int cur_cxt(void)
{
if (lockdep_softirq_context(current))
- return DEPT_SIRQ;
+ return DEPT_CXT_SIRQ;
if (lockdep_hardirq_context())
- return DEPT_HIRQ;
- return DEPT_IRQS_NR;
+ return DEPT_CXT_HIRQ;
+ return DEPT_CXT_PROCESS;
}
static inline unsigned int cur_ctxt_id(void)
{
struct dept_task *dt = dept_task();
- int irq = cur_irq();
+ int cxt = cur_cxt();
- /*
- * Normal process context
- */
- if (irq == DEPT_IRQS_NR)
- return 0U;
-
- return dt->irq_id[irq] | (1UL << irq);
+ return dt->cxt_id[cxt] | (1UL << cxt);
}
static void enirq_transition(int irq)
@@ -1789,7 +1783,7 @@ static void enirq_update(unsigned long ip)
/*
* Do enirq_transition() only on an OFF -> ON transition.
*/
- for_each_set_bit(irq, &irqf, DEPT_IRQS_NR) {
+ for_each_set_bit(irq, &irqf, DEPT_CXT_IRQS_NR) {
if (prev & (1UL << irq))
continue;
@@ -1892,6 +1886,13 @@ void dept_disable_hardirq(unsigned long ip)
dept_exit(flags);
}
+void dept_kernel_enter(void)
+{
+ struct dept_task *dt = dept_task();
+
+ dt->cxt_id[DEPT_CXT_PROCESS] += (1UL << DEPT_CXTS_NR);
+}
+
/*
 * Ensure it's the outermost softirq context.
*/
@@ -1899,7 +1900,7 @@ void dept_softirq_enter(void)
{
struct dept_task *dt = dept_task();
- dt->irq_id[DEPT_SIRQ] += (1UL << DEPT_IRQS_NR);
+ dt->cxt_id[DEPT_CXT_SIRQ] += (1UL << DEPT_CXTS_NR);
}
/*
@@ -1909,7 +1910,7 @@ void dept_hardirq_enter(void)
{
struct dept_task *dt = dept_task();
- dt->irq_id[DEPT_HIRQ] += (1UL << DEPT_IRQS_NR);
+ dt->cxt_id[DEPT_CXT_HIRQ] += (1UL << DEPT_CXTS_NR);
}
/*
--
1.9.1
Makes Dept able to track dependencies by mutex families.
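For illustration, a hypothetical plain mutex user covered by the hunks
below; the annotation rides on the existing lockdep macros, so callers
need no changes:

	static DEFINE_MUTEX(my_mutex);

	static void my_func(void)
	{
		mutex_lock(&my_mutex);	/* Dept: wait + event context (mutex_acquire) */
		/* critical section */
		mutex_unlock(&my_mutex);	/* Dept: event (mutex_release) */
	}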
Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/lockdep.h | 18 +++++++++++++++---
include/linux/mutex.h | 32 ++++++++++++++++++++++++++++++++
include/linux/rtmutex.h | 7 +++++++
3 files changed, 54 insertions(+), 3 deletions(-)
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 01e7427..42237fc 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -614,9 +614,21 @@ static inline void print_irqtrace_events(struct task_struct *curr)
#define seqcount_acquire_read(l, s, t, i) lock_acquire_shared_recursive(l, s, t, NULL, i)
#define seqcount_release(l, i) lock_release(l, i)
-#define mutex_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
-#define mutex_acquire_nest(l, s, t, n, i) lock_acquire_exclusive(l, s, t, n, i)
-#define mutex_release(l, i) lock_release(l, i)
+#define mutex_acquire(l, s, t, i) \
+do { \
+ lock_acquire_exclusive(l, s, t, NULL, i); \
+ dept_mutex_lock(&(l)->dmap, s, t, NULL, "mutex_unlock", i); \
+} while (0)
+#define mutex_acquire_nest(l, s, t, n, i) \
+do { \
+ lock_acquire_exclusive(l, s, t, n, i); \
+ dept_mutex_lock(&(l)->dmap, s, t, (n) ? &(n)->dmap : NULL, "mutex_unlock", i);\
+} while (0)
+#define mutex_release(l, i) \
+do { \
+ lock_release(l, i); \
+ dept_mutex_unlock(&(l)->dmap, i); \
+} while (0)
#define rwsem_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
#define rwsem_acquire_nest(l, s, t, n, i) lock_acquire_exclusive(l, s, t, n, i)
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 8f226d4..c321911 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -20,11 +20,18 @@
#include <linux/osq_lock.h>
#include <linux/debug_locks.h>
+#ifdef CONFIG_DEPT
+# define DMAP_MUTEX_INIT(lockname) .dmap = { .name = #lockname },
+#else
+# define DMAP_MUTEX_INIT(lockname)
+#endif
+
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define __DEP_MAP_MUTEX_INITIALIZER(lockname) \
, .dep_map = { \
.name = #lockname, \
.wait_type_inner = LD_WAIT_SLEEP, \
+ DMAP_MUTEX_INIT(lockname) \
}
#else
# define __DEP_MAP_MUTEX_INITIALIZER(lockname)
@@ -75,6 +82,31 @@ struct mutex {
#endif
};
+#ifdef CONFIG_DEPT
+#define dept_mutex_lock(m, ne, t, n, e_fn, ip) \
+do { \
+ if (t) { \
+ dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne); \
+ dept_ask_event(m); \
+ } else if (n) { \
+ dept_ecxt_enter_nokeep(m); \
+ dept_ask_event(m); \
+ } else { \
+ dept_wait(m, 1UL, ip, __func__, ne); \
+ dept_ecxt_enter(m, 1UL, ip, __func__, e_fn, ne); \
+ dept_ask_event(m); \
+ } \
+} while (0)
+#define dept_mutex_unlock(m, ip) \
+do { \
+ dept_event(m, 1UL, ip, __func__); \
+ dept_ecxt_exit(m, 1UL, ip); \
+} while (0)
+#else
+#define dept_mutex_lock(m, ne, t, n, e_fn, ip) do { } while (0)
+#define dept_mutex_unlock(m, ip) do { } while (0)
+#endif
+
#ifdef CONFIG_DEBUG_MUTEXES
#define __DEBUG_MUTEX_INITIALIZER(lockname) \
diff --git a/include/linux/rtmutex.h b/include/linux/rtmutex.h
index 7d04988..60cebb0 100644
--- a/include/linux/rtmutex.h
+++ b/include/linux/rtmutex.h
@@ -76,11 +76,18 @@ static inline void rt_mutex_debug_task_free(struct task_struct *tsk) { }
__rt_mutex_init(mutex, __func__, &__key); \
} while (0)
+#ifdef CONFIG_DEPT
+#define DMAP_RT_MUTEX_INIT(mutexname) .dmap = { .name = #mutexname },
+#else
+#define DMAP_RT_MUTEX_INIT(mutexname)
+#endif
+
#ifdef CONFIG_DEBUG_LOCK_ALLOC
#define __DEP_MAP_RT_MUTEX_INITIALIZER(mutexname) \
.dep_map = { \
.name = #mutexname, \
.wait_type_inner = LD_WAIT_SLEEP, \
+ DMAP_RT_MUTEX_INIT(mutexname) \
}
#else
#define __DEP_MAP_RT_MUTEX_INITIALIZER(mutexname)
--
1.9.1
On Wed, Mar 16, 2022 at 11:39:19PM -0400, Theodore Ts'o wrote:
> On Wed, Mar 16, 2022 at 11:26:12AM +0900, Byungchul Park wrote:
> > I'm going to re-add the RFC tag for a while at Ted's request. But
> > harder testing is needed to find false alarms, now that there are no
> > false alarms on my system. I'm going to look for other systems that
> > might produce false alarms. And it'd be appreciated if you share any
> > alarms you see on yours.
>
> Is dept1.18_on_v5.17-rc7 roughly equivalent to the v5 version sent to
Yes.
> the list? The commit date is March 16th, so I assume it was. I tried
> merging it with the ext4 dev branch, and tried enabling CONFIG_DEPT
> and running xfstests. The result was nearly every test failing,
> because of a DEPT warning.
>
> I assume that this is due to some misconfiguration of DEPT on my part?
I guess it was because of the commit b1fca27d384e8 ("kernel debug:
support resetting WARN*_ONCE"). Your script seems to reset WARN*_ONCE
repeatedly.
But, yeah. It's *too much* for Dept to warn on the lack of pools. I
will switch it to just pr_warn_once().
Plus, I will implement new functionality to expand the pools on demand,
to prevent facing the situation in the first place.
> And I'm curious why DEPT_WARN_ONCE is apparently getting hit many,
> many times?
>
> [ 760.990409] DEPT_WARN_ONCE: Pool(ecxt) is empty.
> [ 770.319656] DEPT_WARN_ONCE: Pool(ecxt) is empty.
> [ 772.460360] DEPT_WARN_ONCE: Pool(ecxt) is empty.
> [ 784.039676] DEPT_WARN_ONCE: Pool(ecxt) is empty.
>
> (and this goes on over and over...)
>
> Here's the full output of the DEPT warning from trying to run
> generic/001. There is a similar warning for generic/002, generic/003,
> etc., for a total of 468 failures out of 495 tests run.
Sorry for the noise. I will prevent this as described above.
> [ 760.945068] run fstests generic/001 at 2022-03-16 08:16:53
> [ 760.985440] ------------[ cut here ]------------
> [ 760.990409] DEPT_WARN_ONCE: Pool(ecxt) is empty.
> [ 760.995166] WARNING: CPU: 1 PID: 73369 at kernel/dependency/dept.c:297 from_pool+0xc2/0x110
> [ 761.003915] CPU: 1 PID: 73369 Comm: bash Tainted: G W 5.17.0-rc7-xfstests-00649-g5456f2312272 #520
> [ 761.014389] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> [ 761.024363] RIP: 0010:from_pool+0xc2/0x110
> [ 761.028598] Code: 3d 32 62 96 01 00 75 c2 48 6b db 38 48 c7 c7 00 94 f1 ad 48 89 04 24 c6 05 1a 62 96 01 01 48 8b b3 20 9a 2f ae e8 2f dd bf 00 <0f> 0b 48 8b 04 24 eb 98 48 63 c2 48 0f af 86 28 9a 2f ae 48 03 86
> [ 761.048189] RSP: 0018:ffffa7ce4425fd48 EFLAGS: 00010086
> [ 761.053617] RAX: 0000000000000000 RBX: 00000000000000a8 RCX: 0000000000000000
> [ 761.060965] RDX: 0000000000000001 RSI: ffffffffadfb95e0 RDI: 00000000ffffffff
> [ 761.068322] RBP: 00000000001dc598 R08: 0000000000000000 R09: ffffa7ce4425fb90
> [ 761.075789] R10: fffffffffffe0aa0 R11: fffffffffffe0ae8 R12: ffff9768e07f0600
> [ 761.083063] R13: 0000000000000000 R14: 0000000000000246 R15: 0000000000000000
> [ 761.090312] FS: 00007fd4ecc4c740(0000) GS:ffff976999400000(0000) knlGS:0000000000000000
> [ 761.098623] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 761.104580] CR2: 0000563c61657eb0 CR3: 00000001328fa001 CR4: 00000000003706e0
> [ 761.111921] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 761.119171] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 761.126617] Call Trace:
> [ 761.129175] <TASK>
> [ 761.131385] add_ecxt+0x54/0x1c0
> [ 761.134736] ? simple_attr_write+0x87/0x100
> [ 761.139063] dept_event+0xaa/0x1d0
> [ 761.142687] ? simple_attr_write+0x87/0x100
> [ 761.147089] __mutex_unlock_slowpath+0x60/0x2d0
> [ 761.151866] simple_attr_write+0x87/0x100
> [ 761.155997] debugfs_attr_write+0x40/0x60
> [ 761.160124] vfs_write+0xec/0x390
> [ 761.163557] ksys_write+0x68/0xe0
> [ 761.167004] do_syscall_64+0x43/0x90
> [ 761.170782] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 761.176204] RIP: 0033:0x7fd4ecd3df33
> [ 761.180010] Code: 8b 15 61 ef 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18
> [ 761.199551] RSP: 002b:00007ffe772d4808 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 761.207240] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4ecd3df33
> [ 761.214583] RDX: 0000000000000002 RSI: 0000563c61657eb0 RDI: 0000000000000001
> [ 761.221835] RBP: 0000563c61657eb0 R08: 000000000000000a R09: 0000000000000001
> [ 761.229537] R10: 0000563c61902240 R11: 0000000000000246 R12: 0000000000000002
> [ 761.237239] R13: 00007fd4ece0e6a0 R14: 0000000000000002 R15: 00007fd4ece0e8a0
> [ 761.245283] </TASK>
> [ 761.247586] ---[ end trace 0000000000000000 ]---
> [ 761.761829] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota mode: none.
> [ 769.903489] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota mode: none.
>
> Let me know what I should do in order to fix this DEPT_WARN_ONCE?
I will let you know once all the work is done.
Thank you very much for all your feedback,
Byungchul
On Fri, Mar 18, 2022 at 04:49:45PM +0900, Byungchul Park wrote:
> [...]
>
> Plus, I will implement new functionality to expand the pools on demand,
> to prevent facing the situation in the first place.
>
> [...]
>
> I will let you know once all the work is done.
I have yet to decide on the design for expanding the pools on demand. I
should be careful with it because Dept works at a very low layer. I will
get it done later.
However, I have temporarily sized up the pools for heavily loaded
systems. Besides that, all the work has been done. I've just updated the
same branch:
https://github.com/lgebyungchulpark/linux-dept/commits/dept1.18_on_v5.17-rc7
This is just for your information.
Thanks,
Byungchul