2020-08-07 07:43:51

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 00/19] lockdep: Support deadlock detection for recursive read locks

Hi Peter and Waiman,

As promised, this is the updated version of my previous lockdep patchset
for recursive read lock support. It's based on v5.8. Previous versions
can be found at:

V1: https://marc.info/?l=linux-kernel&m=150393341825453
V2: https://marc.info/?l=linux-kernel&m=150468649417950
V3: https://marc.info/?l=linux-kernel&m=150637795424969
V4: https://marc.info/?l=linux-kernel&m=151550860121565
V5: https://marc.info/?l=linux-kernel&m=151928315529363
V6: https://lore.kernel.org/lkml/[email protected]/

Changes since last version:

* I changed the detection algorithm from the one I presented at
the 2018 Plumbers conference [1]; the explanation of the new
detection method is in patch #2.

* Adjusted the irq safe->unsafe detection changes from Frederic
Weisbecker.

* Added more tests.


As Peter pointed out:

https://marc.info/?l=linux-kernel&m=150349072023540

Lockdep currently has only limited support for recursive read locks; the
following deadlock case could not be detected:

	TASK 1:			TASK 2:

	read_lock(A);
				lock(B);
	lock(B);
				write_lock(A);

I got some inspiration from Gautham R Shenoy:

https://lwn.net/Articles/332801/

, and came up with this series.

The basic idea is:

* Add recursive read locks into the graph

* Classify dependencies into -(SR)->, -(ER)->, -(SN)-> and
-(EN)->, where R stands for recursive read locks, N stands for
other locks (i.e. non-recursive read locks and write locks), S
stands for shared locks (read locks, recursive or not), and E
stands for exclusive locks (i.e. write locks).

* Define strong dependency paths as dependency paths that don't
have two adjacent dependencies of the form -(*R)-> and -(S*)->.

* Extend __bfs() to traverse only strong dependency paths.

* If __bfs() finds a strong dependency circle, then a deadlock is
reported (a sketch of such a case is shown below).
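
As a concrete illustration of the idea (a hedged sketch, not part of the
series; the lock names and thread functions are made up), the following
two contexts record A -(SN)-> B and B -(EN)-> A. The resulting circle
A -(SN)-> B -(EN)-> A contains no -(*R)-> followed by -(S*)->, so it is
strong and __bfs() reports it as a deadlock:

	#include <linux/spinlock.h>

	static DEFINE_RWLOCK(A);
	static DEFINE_SPINLOCK(B);

	static void task_1(void)
	{
		read_lock(&A);		/* shared acquisition of A */
		spin_lock(&B);		/* records A -(SN)-> B */
		spin_unlock(&B);
		read_unlock(&A);
	}

	static void task_2(void)
	{
		spin_lock(&B);		/* exclusive acquisition of B */
		write_lock(&A);		/* records B -(EN)-> A */
		write_unlock(&A);
		spin_unlock(&B);
	}

This is the same pattern as the read_lock(A)/lock(B) case above: task_2
waits for task_1 to release its reader on A, while task_1 waits for
task_2 to release B.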

The whole series consists of 19 patches:

1. Add documentation for recursive read lock deadlock detection
reasoning

2. Annotate read_lock() correctly (with queued_read_lock()
semantics into consideration)

3. Do a clean up on the return value of __bfs() and its friends.

4. Make __bfs() able to visit every dependency until a match is
found. The old version of __bfs() could only visit each lock
class once, and this is insufficient if we are going to add
recursive read locks into the dependency graph.

5. Reduce the size of lock_list::distance.

6-7 Extend __bfs() to be able to traverse the strong dependency
paths after recursive read locks are added into the graph.

8. Make __bfs(.match) return bool.

9-11 Adjust check_redundant(), check_noncircular() and
check_irq_usage() to take recursive read locks into consideration.

12. Finally add recursive read locks into the dependency graph.

13-14 Adjust the chain key generation for lock chains to take recursive
read locks into consideration, and provide a test case.

15-16 Add more test cases.

17. Revert commit d82fed752942 ("locking/lockdep/selftests: Fix
mixed read-write ABBA tests").

18-19 Add more test cases (including tests that are specific to
queued_read_lock()).

This series passes all the lockdep selftest cases (including the ones it
introduces).

Tests and comments are welcome!

Regards,
Boqun


2020-08-07 07:44:10

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 01/19] locking: More accurate annotations for read_lock()

On the archs using QUEUED_RWLOCKS, read_lock() is not always a recursive
read lock: it is only recursive if in_interrupt() is true. So change the
annotation accordingly to catch more deadlocks.

Note that we used to treat read_lock() as a pure recursive read lock in
lib/locking-selftest.c, and this is useful, especially for the lockdep
development selftests, so we keep that behavior switchable via a variable
that forces the recursive annotation for read_lock().
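
As an illustration of a case the stricter annotation can catch (a hedged
sketch with made-up names, not part of this patch): on QUEUED_RWLOCKS a
read_lock() in process context is not recursive, so the nested reader
below can be blocked by the concurrently queued writer:

	#include <linux/spinlock.h>

	static DEFINE_RWLOCK(X);

	static void task_a(void)	/* process context */
	{
		read_lock(&X);
		/* task_b()'s write_lock(X) gets queued here */
		read_lock(&X);		/* queued behind the writer: deadlock */
		read_unlock(&X);
		read_unlock(&X);
	}

	static void task_b(void)
	{
		write_lock(&X);		/* waits for task_a's first reader */
		write_unlock(&X);
	}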

Signed-off-by: Boqun Feng <[email protected]>
---
include/linux/lockdep.h | 23 ++++++++++++++++++++++-
kernel/locking/lockdep.c | 14 ++++++++++++++
lib/locking-selftest.c | 11 +++++++++++
3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 8fce5c98a4b0..6b7cb390f19f 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -640,6 +640,20 @@ static inline void print_irqtrace_events(struct task_struct *curr)
}
#endif

+/* Variable used to make lockdep treat read_lock() as recursive in selftests */
+#ifdef CONFIG_DEBUG_LOCKING_API_SELFTESTS
+extern unsigned int force_read_lock_recursive;
+#else /* CONFIG_DEBUG_LOCKING_API_SELFTESTS */
+#define force_read_lock_recursive 0
+#endif /* CONFIG_DEBUG_LOCKING_API_SELFTESTS */
+
+#ifdef CONFIG_LOCKDEP
+extern bool read_lock_is_recursive(void);
+#else /* CONFIG_LOCKDEP */
+/* If !LOCKDEP, the value is meaningless */
+#define read_lock_is_recursive() 0
+#endif
+
/*
* For trivial one-depth nesting of a lock-class, the following
* global define can be used. (Subsystems with multiple levels
@@ -661,7 +675,14 @@ static inline void print_irqtrace_events(struct task_struct *curr)
#define spin_release(l, i) lock_release(l, i)

#define rwlock_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
-#define rwlock_acquire_read(l, s, t, i) lock_acquire_shared_recursive(l, s, t, NULL, i)
+#define rwlock_acquire_read(l, s, t, i) \
+do { \
+ if (read_lock_is_recursive()) \
+ lock_acquire_shared_recursive(l, s, t, NULL, i); \
+ else \
+ lock_acquire_shared(l, s, t, NULL, i); \
+} while (0)
+
#define rwlock_release(l, i) lock_release(l, i)

#define seqcount_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 29a8de4c50b9..fbcbb6350ce7 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4921,6 +4921,20 @@ static bool lockdep_nmi(void)
return true;
}

+/*
+ * read_lock() is recursive if:
+ * 1. We force lockdep to think this way in selftests, or
+ * 2. The implementation is not a queued read/write lock, or
+ * 3. The locker is in an in_interrupt() context.
+ */
+bool read_lock_is_recursive(void)
+{
+ return force_read_lock_recursive ||
+ !IS_ENABLED(CONFIG_QUEUED_RWLOCKS) ||
+ in_interrupt();
+}
+EXPORT_SYMBOL_GPL(read_lock_is_recursive);
+
/*
* We are not always called with irqs disabled - do that here,
* and also avoid lockdep recursion:
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 14f44f59e733..caadc4dd3368 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -28,6 +28,7 @@
* Change this to 1 if you want to see the failure printouts:
*/
static unsigned int debug_locks_verbose;
+unsigned int force_read_lock_recursive;

static DEFINE_WD_CLASS(ww_lockdep);

@@ -1978,6 +1979,11 @@ void locking_selftest(void)
return;
}

+ /*
+ * Treat read_lock() as recursive read locks for testing purposes
+ */
+ force_read_lock_recursive = 1;
+
/*
* Run the testsuite:
*/
@@ -2073,6 +2079,11 @@ void locking_selftest(void)

ww_tests();

+ force_read_lock_recursive = 0;
+ /*
+ * queued_read_lock() specific test cases can be put here
+ */
+
if (unexpected_testcase_failures) {
printk("-----------------------------------------------------------------\n");
debug_locks = 0;
--
2.28.0

2020-08-07 07:44:20

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 02/19] lockdep/Documentation: Recursive read lock detection reasoning

This patch adds the documentation for the reasoning behind deadlock
detection with recursive read locks involved. The following sections are
added:

* Explain what a recursive read lock is, and what deadlock cases
recursive read locks could introduce.

* Introduce the notation for different types of dependencies, and
the definition of strong paths.

* Prove that a closed strong path is both sufficient and necessary
for deadlock detection with recursive read locks involved. The
proof also explains why we call such a path "strong". (A short
illustration of a non-deadlock circle follows below.)
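
As an illustration of why the "strong" qualifier is needed (a hedged
sketch, not part of this patch; lock and function names are made up),
here is a dependency circle that is not strong and not a deadlock,
assuming read_lock() acts as a recursive reader here:

	#include <linux/spinlock.h>

	static DEFINE_RWLOCK(A);
	static DEFINE_RWLOCK(B);

	static void ctx_1(void)
	{
		write_lock(&A);
		read_lock(&B);		/* records A -(ER)-> B */
		read_unlock(&B);
		write_unlock(&A);
	}

	static void ctx_2(void)
	{
		read_lock(&B);
		write_lock(&A);		/* records B -(SN)-> A */
		write_unlock(&A);
		read_unlock(&B);
	}

The circle A -(ER)-> B -(SN)-> A has an -(*R)-> dependency immediately
followed by an -(S*)-> one, so it is not strong: nobody holds B
exclusively, ctx_1's recursive reader on B cannot be blocked, and
whoever gets A first simply runs to completion.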

Signed-off-by: Boqun Feng <[email protected]>
---
Documentation/locking/lockdep-design.rst | 258 +++++++++++++++++++++++
1 file changed, 258 insertions(+)

diff --git a/Documentation/locking/lockdep-design.rst b/Documentation/locking/lockdep-design.rst
index 23fcbc4d3fc0..cec03bd1294a 100644
--- a/Documentation/locking/lockdep-design.rst
+++ b/Documentation/locking/lockdep-design.rst
@@ -392,3 +392,261 @@ Run the command and save the output, then compare against the output from
a later run of this command to identify the leakers. This same output
can also help you find situations where runtime lock initialization has
been omitted.
+
+Recursive read locks:
+---------------------
+The rest of this document tries to prove that a certain type of cycle is
+equivalent to deadlock possibility.
+
+There are three types of lockers: writers (i.e. exclusive lockers, like
+spin_lock() or write_lock()), non-recursive readers (i.e. shared lockers, like
+down_read()) and recursive readers (recursive shared lockers, like rcu_read_lock()).
+We use the following notation for these lockers in the rest of this document:
+
+	W or E: stands for writers (exclusive lockers).
+	r:      stands for non-recursive readers.
+	R:      stands for recursive readers.
+	S:      stands for all readers (non-recursive + recursive), as both are shared lockers.
+	N:      stands for writers and non-recursive readers, as both are not recursive.
+
+Obviously, N is "r or W" and S is "r or R".
+
+Recursive readers, as their name indicates, are lockers that are allowed to be
+acquired even inside the critical section of another reader of the same lock
+instance, in other words, they allow nested read-side critical sections of one lock instance.
+
+Non-recursive readers, on the other hand, will cause a self deadlock if they are
+acquired inside the critical section of another reader of the same lock instance.
+
+The difference arises because recursive readers get blocked only by a current
+write lock *holder*, while non-recursive readers could also get blocked by a
+write lock *waiter*. Consider the following example:
+
+	TASK A:			TASK B:
+
+	read_lock(X);
+				write_lock(X);
+	read_lock_2(X);
+
+Task A gets the reader (recursive or non-recursive) on X via read_lock() first.
+When task B then tries to acquire the writer on X, it will block and become a
+waiter for the writer on X. Now if read_lock_2() is a recursive reader, task A
+will make progress, because writer waiters don't block recursive readers, and
+there is no deadlock. However, if read_lock_2() is a non-recursive reader, it
+will get blocked by writer waiter B, causing a self deadlock.
+
+Block conditions on readers/writers of the same lock instance:
+--------------------------------------------------------------
+There are simply four block conditions:
+
+1. Writers block other writers.
+2. Readers block writers.
+3. Writers block both recursive readers and non-recursive readers.
+4. Readers (recursive or not) don't block other recursive readers, but
+   may block non-recursive readers (because of potentially co-existing
+   writer waiters).
+
+Block condition matrix, Y means the row blocks the column, and N means otherwise.
+
+	    | E | r | R |
+	+---+---+---+---+
+	  E | Y | Y | Y |
+	+---+---+---+---+
+	  r | Y | Y | N |
+	+---+---+---+---+
+	  R | Y | Y | N |
+
+	 (E: writers, r: non-recursive readers, R: recursive readers)
+
+
+A recursive read lock, as the name suggests, can be acquired recursively. Unlike
+non-recursive read locks, recursive read locks only get blocked by current write
+lock *holders*, not by write lock *waiters*, for example:
+
+	TASK A:			TASK B:
+
+	read_lock(X);
+
+				write_lock(X);
+
+	read_lock(X);
+
+is not a deadlock for recursive read locks: while task B is waiting for the
+lock X, the second read_lock() doesn't need to wait because it's a recursive
+read lock. However, if the read_lock() is a non-recursive read lock, then the
+above case is a deadlock, because even though the write_lock() in TASK B cannot
+get the lock, it can still block the second read_lock() in TASK A.
+
+Note that a lock can be a write lock (exclusive lock), a non-recursive read
+lock (non-recursive shared lock) or a recursive read lock (recursive shared
+lock), depending on the lock operations used to acquire it (more specifically,
+the value of the 'read' parameter for lock_acquire()). In other words, a single
+lock instance has three types of acquisition depending on the acquisition
+functions: exclusive, non-recursive read, and recursive read.
+
+To be concise, we call write locks and non-recursive read locks
+"non-recursive" locks, and recursive read locks "recursive" locks.
+
+Recursive locks don't block each other, while non-recursive locks do (this is
+even true for two non-recursive read locks). A non-recursive lock can block the
+corresponding recursive lock, and vice versa.
+
+A deadlock case with recursive locks involved is as follows:
+
+	TASK A:			TASK B:
+
+	read_lock(X);
+				read_lock(Y);
+	write_lock(Y);
+				write_lock(X);
+
+Task A is waiting for task B to read_unlock() Y and task B is waiting for task
+A to read_unlock() X.
+
+Dependency types and strong dependency paths:
+---------------------------------------------
+Lock dependencies record the orders of the acquisitions of a pair of locks, and
+because there are 3 types of lockers, there are, in theory, 9 types of lock
+dependencies, but we can show that 4 types of lock dependencies are enough for
+deadlock detection.
+
+For each lock dependency:
+
+ L1 -> L2
+
+, which means lockdep has seen L1 held before L2 in the same context at runtime.
+In deadlock detection, we care whether we could get blocked on L2 with L1 held,
+IOW, whether there is a locker L3 such that L1 blocks L3 and L2 gets blocked by L3. So
+we only care about 1) what L1 blocks and 2) what blocks L2. As a result, we can combine
+recursive readers and non-recursive readers for L1 (as they block the same types) and
+we can combine writers and non-recursive readers for L2 (as they get blocked by the
+same types).
+
+With the above combination for simplification, there are 4 types of dependency edges
+in the lockdep graph:
+
+1) -(ER)->: exclusive writer to recursive reader dependency, "X -(ER)-> Y" means
+ X -> Y and X is a writer and Y is a recursive reader.
+
+2) -(EN)->: exclusive writer to non-recursive locker dependency, "X -(EN)-> Y" means
+ X -> Y and X is a writer and Y is either a writer or non-recursive reader.
+
+3) -(SR)->: shared reader to recursive reader dependency, "X -(SR)-> Y" means
+ X -> Y and X is a reader (recursive or not) and Y is a recursive reader.
+
+4) -(SN)->: shared reader to non-recursive locker dependency, "X -(SN)-> Y" means
+ X -> Y and X is a reader (recursive or not) and Y is either a writer or
+ non-recursive reader.
+
+Note that given two locks, they may have multiple dependencies between them, for example:
+
+ TASK A:
+
+ read_lock(X);
+ write_lock(Y);
+ ...
+
+ TASK B:
+
+ write_lock(X);
+ write_lock(Y);
+
+, we have both X -(SN)-> Y and X -(EN)-> Y in the dependency graph.
+
+We use -(xN)-> to represent edges that are either -(EN)-> or -(SN)->; similarly
+for -(Ex)->, -(xR)-> and -(Sx)->.
+
+A "path" is a series of conjunct dependency edges in the graph. And we define a
+"strong" path, which indicates the strong dependency throughout each dependency
+in the path, as the path that doesn't have two conjunct edges (dependencies) as
+-(xR)-> and -(Sx)->. In other words, a "strong" path is a path from a lock
+walking to another through the lock dependencies, and if X -> Y -> Z is in the
+path (where X, Y, Z are locks), and the walk from X to Y is through a -(SR)-> or
+-(ER)-> dependency, the walk from Y to Z must not be through a -(SN)-> or
+-(SR)-> dependency.
+
+We will see why the path is called "strong" in next section.
+
+Recursive Read Deadlock Detection:
+----------------------------------
+
+We now prove two things:
+
+Lemma 1:
+
+If there is a closed strong path (i.e. a strong circle), then there is a
+combination of locking sequences that causes deadlock. I.e. a strong circle is
+sufficient for deadlock detection.
+
+Lemma 2:
+
+If there is no closed strong path (i.e. strong circle), then there is no
+combination of locking sequences that could cause deadlock. I.e. strong
+circles are necessary for deadlock detection.
+
+With these two Lemmas, we can easily say a closed strong path is both sufficient
+and necessary for deadlocks, therefore a closed strong path is equivalent to
+deadlock possibility. Since a closed strong path stands for a dependency chain
+that could cause deadlocks, we call it "strong", considering there are
+dependency circles that won't cause deadlocks.
+
+Proof for sufficiency (Lemma 1):
+
+Let's say we have a strong circle:
+
+ L1 -> L2 ... -> Ln -> L1
+
+, which means we have dependencies:
+
+ L1 -> L2
+ L2 -> L3
+ ...
+ Ln-1 -> Ln
+ Ln -> L1
+
+We now can construct a combination of locking sequences that cause deadlock:
+
+Firstly let's make one CPU/task get the L1 in L1 -> L2, and then another get
+the L2 in L2 -> L3, and so on. After this, all of the Lx in Lx -> Lx+1 are
+held by different CPU/tasks.
+
+Then, because we have L1 -> L2, the holder of L1 is going to acquire the L2 in
+L1 -> L2. However, L2 is already held by another CPU/task, and L1 -> L2 and
+L2 -> L3 are not an -(xR)-> and -(Sx)-> pair (the definition of strong), which
+means either the L2 in L1 -> L2 is a non-recursive locker (blocked by anyone)
+or the L2 in L2 -> L3 is a writer (blocking everyone). Therefore the holder of
+L1 cannot get L2 and has to wait for L2's holder to release it.
+
+Moreover, we can draw a similar conclusion for L2's holder: it has to wait for
+L3's holder to release, and so on. We can now prove that Lx's holder has to
+wait for Lx+1's holder to release, and note that Ln+1 is L1, so we have a
+circular waiting scenario where nobody can make progress, therefore a deadlock.
+
+Proof for necessity (Lemma 2):
+
+Lemma 2 is equivalent to: If there is a deadlock scenario, then there must be a
+strong circle in the dependency graph.
+
+According to Wikipedia[1], if there is a deadlock, then there must be a circular
+waiting scenario, which means there are N CPU/tasks, where CPU/task P1 is waiting
+for a lock held by P2, P2 is waiting for a lock held by P3, ..., and Pn is waiting
+for a lock held by P1. Let's call the lock that Px is waiting for Lx; since P1
+is waiting for L1 and holding Ln, we will have Ln -> L1 in the dependency graph.
+Similarly, we have L1 -> L2, L2 -> L3, ..., Ln-1 -> Ln in the dependency graph,
+which means we have a circle:
+
+ Ln -> L1 -> L2 -> ... -> Ln
+
+, and now let's prove the circle is strong:
+
+For a lock Lx, Px contributes the dependency Lx-1 -> Lx and Px+1 contributes
+the dependency Lx -> Lx+1. Since Px is waiting for Px+1 to release Lx, it is
+impossible that the Lx on Px+1 is a reader while the Lx on Px is a recursive
+reader, because readers (recursive or not) don't block recursive readers.
+Therefore Lx-1 -> Lx and Lx -> Lx+1 cannot be an -(xR)-> -(Sx)-> pair, and
+this is true for any lock in the circle, therefore the circle is strong.
+
+References:
+-----------
+[1]: https://en.wikipedia.org/wiki/Deadlock
+[2]: Shibu, K. (2009). Intro To Embedded Systems (1st ed.). Tata McGraw-Hill
--
2.28.0

2020-08-07 07:44:51

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 03/19] lockdep: Demagic the return value of BFS

__bfs() could return four magic numbers:

1: search succeeds, but none match.
0: search succeeds, and one match is found.
-1: search fails because the cq is full.
-2: search fails because an invalid node is found.

This patch cleans things up by using an enum type for the return value
of __bfs() and its friends. This improves the readability of the code
and could further help if we want to extend the BFS.
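
For example, the check in check_prev_add() (see the hunk below) goes
from a magic-number test to a symbolic one:

	/* Before: callers need to know the magic numbers */
	ret = check_noncircular(next, prev, trace);
	if (unlikely(ret <= 0))		/* -2/-1 are errors, 0 is a match */
		return 0;

	/* After: the same check spelled with the enum */
	ret = check_noncircular(next, prev, trace);
	if (unlikely(bfs_error(ret) || ret == BFS_RMATCH))
		return 0;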

Signed-off-by: Boqun Feng <[email protected]>
---
kernel/locking/lockdep.c | 155 ++++++++++++++++++++++-----------------
1 file changed, 89 insertions(+), 66 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index fbcbb6350ce7..8fba156db5ba 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1471,28 +1471,58 @@ static inline struct list_head *get_dep_list(struct lock_list *lock, int offset)

return lock_class + offset;
}
+/*
+ * Return values of a bfs search:
+ *
+ * BFS_E* indicates an error
+ * BFS_R* indicates a result (match or not)
+ *
+ * BFS_EINVALIDNODE: Found an invalid node in the graph.
+ *
+ * BFS_EQUEUEFULL: The queue is full while doing the bfs.
+ *
+ * BFS_RMATCH: Found the matched node in the graph, and put that node into
+ * *@target_entry.
+ *
+ * BFS_RNOMATCH: Haven't found the matched node and keep *@target_entry
+ * _unchanged_.
+ */
+enum bfs_result {
+ BFS_EINVALIDNODE = -2,
+ BFS_EQUEUEFULL = -1,
+ BFS_RMATCH = 0,
+ BFS_RNOMATCH = 1,
+};
+
+/*
+ * bfs_result < 0 means error
+ */
+static inline bool bfs_error(enum bfs_result res)
+{
+ return res < 0;
+}

/*
* Forward- or backward-dependency search, used for both circular dependency
* checking and hardirq-unsafe/softirq-unsafe checking.
*/
-static int __bfs(struct lock_list *source_entry,
- void *data,
- int (*match)(struct lock_list *entry, void *data),
- struct lock_list **target_entry,
- int offset)
+static enum bfs_result __bfs(struct lock_list *source_entry,
+ void *data,
+ int (*match)(struct lock_list *entry, void *data),
+ struct lock_list **target_entry,
+ int offset)
{
struct lock_list *entry;
struct lock_list *lock;
struct list_head *head;
struct circular_queue *cq = &lock_cq;
- int ret = 1;
+ enum bfs_result ret = BFS_RNOMATCH;

lockdep_assert_locked();

if (match(source_entry, data)) {
*target_entry = source_entry;
- ret = 0;
+ ret = BFS_RMATCH;
goto exit;
}

@@ -1506,7 +1536,7 @@ static int __bfs(struct lock_list *source_entry,
while ((lock = __cq_dequeue(cq))) {

if (!lock->class) {
- ret = -2;
+ ret = BFS_EINVALIDNODE;
goto exit;
}

@@ -1518,12 +1548,12 @@ static int __bfs(struct lock_list *source_entry,
mark_lock_accessed(entry, lock);
if (match(entry, data)) {
*target_entry = entry;
- ret = 0;
+ ret = BFS_RMATCH;
goto exit;
}

if (__cq_enqueue(cq, entry)) {
- ret = -1;
+ ret = BFS_EQUEUEFULL;
goto exit;
}
cq_depth = __cq_get_elem_count(cq);
@@ -1536,20 +1566,22 @@ static int __bfs(struct lock_list *source_entry,
return ret;
}

-static inline int __bfs_forwards(struct lock_list *src_entry,
- void *data,
- int (*match)(struct lock_list *entry, void *data),
- struct lock_list **target_entry)
+static inline enum bfs_result
+__bfs_forwards(struct lock_list *src_entry,
+ void *data,
+ int (*match)(struct lock_list *entry, void *data),
+ struct lock_list **target_entry)
{
return __bfs(src_entry, data, match, target_entry,
offsetof(struct lock_class, locks_after));

}

-static inline int __bfs_backwards(struct lock_list *src_entry,
- void *data,
- int (*match)(struct lock_list *entry, void *data),
- struct lock_list **target_entry)
+static inline enum bfs_result
+__bfs_backwards(struct lock_list *src_entry,
+ void *data,
+ int (*match)(struct lock_list *entry, void *data),
+ struct lock_list **target_entry)
{
return __bfs(src_entry, data, match, target_entry,
offsetof(struct lock_class, locks_before));
@@ -1775,18 +1807,18 @@ unsigned long lockdep_count_backward_deps(struct lock_class *class)

/*
* Check that the dependency graph starting at <src> can lead to
- * <target> or not. Print an error and return 0 if it does.
+ * <target> or not.
*/
-static noinline int
+static noinline enum bfs_result
check_path(struct lock_class *target, struct lock_list *src_entry,
struct lock_list **target_entry)
{
- int ret;
+ enum bfs_result ret;

ret = __bfs_forwards(src_entry, (void *)target, class_equal,
target_entry);

- if (unlikely(ret < 0))
+ if (unlikely(bfs_error(ret)))
print_bfs_bug(ret);

return ret;
@@ -1797,13 +1829,13 @@ check_path(struct lock_class *target, struct lock_list *src_entry,
* lead to <target>. If it can, there is a circle when adding
* <target> -> <src> dependency.
*
- * Print an error and return 0 if it does.
+ * Print an error and return BFS_RMATCH if it does.
*/
-static noinline int
+static noinline enum bfs_result
check_noncircular(struct held_lock *src, struct held_lock *target,
struct lock_trace **const trace)
{
- int ret;
+ enum bfs_result ret;
struct lock_list *uninitialized_var(target_entry);
struct lock_list src_entry = {
.class = hlock_class(src),
@@ -1814,7 +1846,7 @@ check_noncircular(struct held_lock *src, struct held_lock *target,

ret = check_path(hlock_class(target), &src_entry, &target_entry);

- if (unlikely(!ret)) {
+ if (unlikely(ret == BFS_RMATCH)) {
if (!*trace) {
/*
* If save_trace fails here, the printing might
@@ -1836,12 +1868,13 @@ check_noncircular(struct held_lock *src, struct held_lock *target,
* <target> or not. If it can, <src> -> <target> dependency is already
* in the graph.
*
- * Print an error and return 2 if it does or 1 if it does not.
+ * Return BFS_RMATCH if it does, or BFS_RNOMATCH if it does not, return BFS_E* if
+ * any error appears in the bfs search.
*/
-static noinline int
+static noinline enum bfs_result
check_redundant(struct held_lock *src, struct held_lock *target)
{
- int ret;
+ enum bfs_result ret;
struct lock_list *uninitialized_var(target_entry);
struct lock_list src_entry = {
.class = hlock_class(src),
@@ -1852,11 +1885,8 @@ check_redundant(struct held_lock *src, struct held_lock *target)

ret = check_path(hlock_class(target), &src_entry, &target_entry);

- if (!ret) {
+ if (ret == BFS_RMATCH)
debug_atomic_inc(nr_redundant);
- ret = 2;
- } else if (ret < 0)
- ret = 0;

return ret;
}
@@ -1886,17 +1916,14 @@ static inline int usage_match(struct lock_list *entry, void *mask)
* Find a node in the forwards-direction dependency sub-graph starting
* at @root->class that matches @bit.
*
- * Return 0 if such a node exists in the subgraph, and put that node
+ * Return BFS_RMATCH if such a node exists in the subgraph, and put that node
* into *@target_entry.
- *
- * Return 1 otherwise and keep *@target_entry unchanged.
- * Return <0 on error.
*/
-static int
+static enum bfs_result
find_usage_forwards(struct lock_list *root, unsigned long usage_mask,
struct lock_list **target_entry)
{
- int result;
+ enum bfs_result result;

debug_atomic_inc(nr_find_usage_forwards_checks);

@@ -1908,18 +1935,12 @@ find_usage_forwards(struct lock_list *root, unsigned long usage_mask,
/*
* Find a node in the backwards-direction dependency sub-graph starting
* at @root->class that matches @bit.
- *
- * Return 0 if such a node exists in the subgraph, and put that node
- * into *@target_entry.
- *
- * Return 1 otherwise and keep *@target_entry unchanged.
- * Return <0 on error.
*/
-static int
+static enum bfs_result
find_usage_backwards(struct lock_list *root, unsigned long usage_mask,
struct lock_list **target_entry)
{
- int result;
+ enum bfs_result result;

debug_atomic_inc(nr_find_usage_backwards_checks);

@@ -2247,7 +2268,7 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
struct lock_list *uninitialized_var(target_entry1);
struct lock_list *uninitialized_var(target_entry);
struct lock_list this, that;
- int ret;
+ enum bfs_result ret;

/*
* Step 1: gather all hard/soft IRQs usages backward in an
@@ -2257,7 +2278,7 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
this.class = hlock_class(prev);

ret = __bfs_backwards(&this, &usage_mask, usage_accumulate, NULL);
- if (ret < 0) {
+ if (bfs_error(ret)) {
print_bfs_bug(ret);
return 0;
}
@@ -2276,12 +2297,12 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
that.class = hlock_class(next);

ret = find_usage_forwards(&that, forward_mask, &target_entry1);
- if (ret < 0) {
+ if (bfs_error(ret)) {
print_bfs_bug(ret);
return 0;
}
- if (ret == 1)
- return ret;
+ if (ret == BFS_RNOMATCH)
+ return 1;

/*
* Step 3: we found a bad match! Now retrieve a lock from the backward
@@ -2291,11 +2312,11 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
backward_mask = original_mask(target_entry1->class->usage_mask);

ret = find_usage_backwards(&this, backward_mask, &target_entry);
- if (ret < 0) {
+ if (bfs_error(ret)) {
print_bfs_bug(ret);
return 0;
}
- if (DEBUG_LOCKS_WARN_ON(ret == 1))
+ if (DEBUG_LOCKS_WARN_ON(ret == BFS_RNOMATCH))
return 1;

/*
@@ -2463,7 +2484,7 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
struct lock_trace **const trace)
{
struct lock_list *entry;
- int ret;
+ enum bfs_result ret;

if (!hlock_class(prev)->key || !hlock_class(next)->key) {
/*
@@ -2494,7 +2515,7 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
* in the graph whose neighbours are to be checked.
*/
ret = check_noncircular(next, prev, trace);
- if (unlikely(ret <= 0))
+ if (unlikely(bfs_error(ret) || ret == BFS_RMATCH))
return 0;

if (!check_irq_usage(curr, prev, next))
@@ -2531,8 +2552,10 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
* Is the <prev> -> <next> link redundant?
*/
ret = check_redundant(prev, next);
- if (ret != 1)
- return ret;
+ if (bfs_error(ret))
+ return 0;
+ else if (ret == BFS_RMATCH)
+ return 2;
#endif

if (!*trace) {
@@ -3436,19 +3459,19 @@ static int
check_usage_forwards(struct task_struct *curr, struct held_lock *this,
enum lock_usage_bit bit, const char *irqclass)
{
- int ret;
+ enum bfs_result ret;
struct lock_list root;
struct lock_list *uninitialized_var(target_entry);

root.parent = NULL;
root.class = hlock_class(this);
ret = find_usage_forwards(&root, lock_flag(bit), &target_entry);
- if (ret < 0) {
+ if (bfs_error(ret)) {
print_bfs_bug(ret);
return 0;
}
- if (ret == 1)
- return ret;
+ if (ret == BFS_RNOMATCH)
+ return 1;

print_irq_inversion_bug(curr, &root, target_entry,
this, 1, irqclass);
@@ -3463,19 +3486,19 @@ static int
check_usage_backwards(struct task_struct *curr, struct held_lock *this,
enum lock_usage_bit bit, const char *irqclass)
{
- int ret;
+ enum bfs_result ret;
struct lock_list root;
struct lock_list *uninitialized_var(target_entry);

root.parent = NULL;
root.class = hlock_class(this);
ret = find_usage_backwards(&root, lock_flag(bit), &target_entry);
- if (ret < 0) {
+ if (bfs_error(ret)) {
print_bfs_bug(ret);
return 0;
}
- if (ret == 1)
- return ret;
+ if (ret == BFS_RNOMATCH)
+ return 1;

print_irq_inversion_bug(curr, &root, target_entry,
this, 0, irqclass);
--
2.28.0

2020-08-07 07:45:15

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 04/19] lockdep: Make __bfs() visit every dependency until a match

Currently, __bfs() will do a breadth-first search in the dependency
graph and visit each lock class in the graph exactly once, so for
example, in the following graph:

	A ---------> B
	|            ^
	|            |
	+----------> C

a __bfs() call starting at A will visit B through dependency A -> B and
visit C through dependency A -> C, and that's it; IOW, __bfs() will not
visit dependency C -> B.

This is OK for now, as we only have strong dependencies in the
dependency graph, so whenever there is a traverse path from A to B in
__bfs(), it means A has strong dependencies to B (IOW, B depends on A
strongly). So no need to visit all dependencies in the graph.

However, as we are going to add recursive-read locks into the dependency
graph, not all the paths indicate strong dependencies: in the same
example above, dependency A -> B may be a weak dependency and the
traverse A -> C -> B may be a strong dependency path. With the old
way of __bfs() (i.e. visiting every lock class exactly once), we would
miss the strong dependency path, which would result in failing to find
a deadlock. To cure this for the future, we need to find a way for
__bfs() to visit each dependency, rather than each class, exactly once
in the search until we find a match.

The solution is simple:

We used to mark lock_class::lockdep_dependency_gen_id to indicate that a
class has been visited in __bfs(); now we change the semantics a little
bit: we now mark lock_class::lockdep_dependency_gen_id to indicate that _all
the dependencies_ in its locks_{after,before} have been visited in the
__bfs() (note we only take one direction in a __bfs() search). In this
way, every dependency is guaranteed to be visited until we find a match.

Note: the checks in mark_lock_accessed() and lock_accessed() are
removed, because after this modification, we may call these two
functions on @source_entry of __bfs(), which may not be the entry in
"list_entries"

Signed-off-by: Boqun Feng <[email protected]>
---
kernel/locking/lockdep.c | 61 +++++++++++++++++++++++-----------------
1 file changed, 35 insertions(+), 26 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 8fba156db5ba..2d9798b71f74 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1421,23 +1421,19 @@ static inline unsigned int __cq_get_elem_count(struct circular_queue *cq)
return (cq->rear - cq->front) & CQ_MASK;
}

-static inline void mark_lock_accessed(struct lock_list *lock,
- struct lock_list *parent)
+static inline void mark_lock_accessed(struct lock_list *lock)
{
- unsigned long nr;
+ lock->class->dep_gen_id = lockdep_dependency_gen_id;
+}

- nr = lock - list_entries;
- WARN_ON(nr >= ARRAY_SIZE(list_entries)); /* Out-of-bounds, input fail */
+static inline void visit_lock_entry(struct lock_list *lock,
+ struct lock_list *parent)
+{
lock->parent = parent;
- lock->class->dep_gen_id = lockdep_dependency_gen_id;
}

static inline unsigned long lock_accessed(struct lock_list *lock)
{
- unsigned long nr;
-
- nr = lock - list_entries;
- WARN_ON(nr >= ARRAY_SIZE(list_entries)); /* Out-of-bounds, input fail */
return lock->class->dep_gen_id == lockdep_dependency_gen_id;
}

@@ -1540,26 +1536,39 @@ static enum bfs_result __bfs(struct lock_list *source_entry,
goto exit;
}

+ /*
+ * If we have visited all the dependencies from this @lock to
+ * others (iow, if we have visited all lock_list entries in
+ * @lock->class->locks_{after,before}) we skip, otherwise go
+ * and visit all the dependencies in the list and mark this
+ * list accessed.
+ */
+ if (lock_accessed(lock))
+ continue;
+ else
+ mark_lock_accessed(lock);
+
head = get_dep_list(lock, offset);

+ DEBUG_LOCKS_WARN_ON(!irqs_disabled());
+
list_for_each_entry_rcu(entry, head, entry) {
- if (!lock_accessed(entry)) {
- unsigned int cq_depth;
- mark_lock_accessed(entry, lock);
- if (match(entry, data)) {
- *target_entry = entry;
- ret = BFS_RMATCH;
- goto exit;
- }
-
- if (__cq_enqueue(cq, entry)) {
- ret = BFS_EQUEUEFULL;
- goto exit;
- }
- cq_depth = __cq_get_elem_count(cq);
- if (max_bfs_queue_depth < cq_depth)
- max_bfs_queue_depth = cq_depth;
+ unsigned int cq_depth;
+
+ visit_lock_entry(entry, lock);
+ if (match(entry, data)) {
+ *target_entry = entry;
+ ret = BFS_RMATCH;
+ goto exit;
+ }
+
+ if (__cq_enqueue(cq, entry)) {
+ ret = BFS_EQUEUEFULL;
+ goto exit;
}
+ cq_depth = __cq_get_elem_count(cq);
+ if (max_bfs_queue_depth < cq_depth)
+ max_bfs_queue_depth = cq_depth;
}
}
exit:
--
2.28.0

2020-08-07 07:45:43

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 06/19] lockdep: Introduce lock_list::dep

To add recursive read locks into the dependency graph, we need to store
the types of dependencies for the BFS later. There are four types of
dependencies:

* Exclusive -> Non-recursive dependencies: EN
e.g. write_lock(prev) held and try to acquire write_lock(next)
or non-recursive read_lock(next), which can be represented as
"prev -(EN)-> next"

* Shared -> Non-recursive dependencies: SN
e.g. read_lock(prev) held and try to acquire write_lock(next) or
non-recursive read_lock(next), which can be represented as
"prev -(SN)-> next"

* Exclusive -> Recursive dependencies: ER
e.g. write_lock(prev) held and try to acquire recursive
read_lock(next), which can be represented as "prev -(ER)-> next"

* Shared -> Recursive dependencies: SR
e.g. read_lock(prev) held and try to acquire recursive
read_lock(next), which can be represented as "prev -(SR)-> next"

So we use one bit per dependency type (4 bits in total) in lock_list::dep
to record which types are present. Helper functions and macros are also
introduced to convert a pair of locks into a lock_list::dep bit and to
maintain the addition of different types of dependencies (a small worked
example of the bit calculation follows below).
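
A small userspace sketch of the mapping (not kernel code; it just mirrors
__calc_dep_bit() from the hunk below, with ->read == 0 for writers, 1 for
non-recursive readers and 2 for recursive readers):

	#include <stdio.h>

	static unsigned int calc_dep_bit(int prev_read, int next_read)
	{
		return (prev_read == 0) + ((next_read != 2) << 1);
	}

	int main(void)
	{
		printf("%u\n", calc_dep_bit(2, 2)); /* 0: DEP_SR_BIT, reader -> recursive reader */
		printf("%u\n", calc_dep_bit(0, 2)); /* 1: DEP_ER_BIT, writer -> recursive reader */
		printf("%u\n", calc_dep_bit(1, 0)); /* 2: DEP_SN_BIT, reader -> writer */
		printf("%u\n", calc_dep_bit(0, 1)); /* 3: DEP_EN_BIT, writer -> non-recursive reader */
		return 0;
	}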

Signed-off-by: Boqun Feng <[email protected]>
---
include/linux/lockdep.h | 2 +
kernel/locking/lockdep.c | 92 ++++++++++++++++++++++++++++++++++++++--
2 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index b85973515f84..6ca0315d92c4 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -213,6 +213,8 @@ struct lock_list {
struct lock_class *links_to;
const struct lock_trace *trace;
u16 distance;
+ /* bitmap of different dependencies from head to this */
+ u8 dep;

/*
* The parent field is used to implement breadth-first search, and the
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 699e9039a9b3..edf0cc261e8e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1320,7 +1320,7 @@ static struct lock_list *alloc_list_entry(void)
*/
static int add_lock_to_list(struct lock_class *this,
struct lock_class *links_to, struct list_head *head,
- unsigned long ip, u16 distance,
+ unsigned long ip, u16 distance, u8 dep,
const struct lock_trace *trace)
{
struct lock_list *entry;
@@ -1334,6 +1334,7 @@ static int add_lock_to_list(struct lock_class *this,

entry->class = this;
entry->links_to = links_to;
+ entry->dep = dep;
entry->distance = distance;
entry->trace = trace;
/*
@@ -1498,6 +1499,57 @@ static inline bool bfs_error(enum bfs_result res)
return res < 0;
}

+/*
+ * DEP_*_BIT in lock_list::dep
+ *
+ * For dependency @prev -> @next:
+ *
+ * SR: @prev is shared reader (->read != 0) and @next is recursive reader
+ * (->read == 2)
+ * ER: @prev is exclusive locker (->read == 0) and @next is recursive reader
+ * SN: @prev is shared reader and @next is non-recursive locker (->read != 2)
+ * EN: @prev is exclusive locker and @next is non-recursive locker
+ *
+ * Note that we define the value of DEP_*_BITs so that:
+ * bit0 is prev->read == 0
+ * bit1 is next->read != 2
+ */
+#define DEP_SR_BIT (0 + (0 << 1)) /* 0 */
+#define DEP_ER_BIT (1 + (0 << 1)) /* 1 */
+#define DEP_SN_BIT (0 + (1 << 1)) /* 2 */
+#define DEP_EN_BIT (1 + (1 << 1)) /* 3 */
+
+#define DEP_SR_MASK (1U << (DEP_SR_BIT))
+#define DEP_ER_MASK (1U << (DEP_ER_BIT))
+#define DEP_SN_MASK (1U << (DEP_SN_BIT))
+#define DEP_EN_MASK (1U << (DEP_EN_BIT))
+
+static inline unsigned int
+__calc_dep_bit(struct held_lock *prev, struct held_lock *next)
+{
+ return (prev->read == 0) + ((next->read != 2) << 1);
+}
+
+static inline u8 calc_dep(struct held_lock *prev, struct held_lock *next)
+{
+ return 1U << __calc_dep_bit(prev, next);
+}
+
+/*
+ * calculate the dep_bit for backwards edges. We care about whether @prev is
+ * shared and whether @next is recursive.
+ */
+static inline unsigned int
+__calc_dep_bitb(struct held_lock *prev, struct held_lock *next)
+{
+ return (next->read != 2) + ((prev->read == 0) << 1);
+}
+
+static inline u8 calc_depb(struct held_lock *prev, struct held_lock *next)
+{
+ return 1U << __calc_dep_bitb(prev, next);
+}
+
/*
* Forward- or backward-dependency search, used for both circular dependency
* checking and hardirq-unsafe/softirq-unsafe checking.
@@ -2552,7 +2604,35 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
if (entry->class == hlock_class(next)) {
if (distance == 1)
entry->distance = 1;
- return 1;
+ entry->dep |= calc_dep(prev, next);
+
+ /*
+ * Also, update the reverse dependency in @next's
+ * ->locks_before list.
+ *
+ * Here we reuse @entry as the cursor, which is fine
+ * because we won't go to the next iteration of the
+ * outer loop:
+ *
+ * For normal cases, we return in the inner loop.
+ *
+ * If we fail to return, we have inconsistency, i.e.
+ * <prev>::locks_after contains <next> while
+ * <next>::locks_before doesn't contain <prev>. In
+ * that case, we return after the inner loop and indicate
+ * something is wrong.
+ */
+ list_for_each_entry(entry, &hlock_class(next)->locks_before, entry) {
+ if (entry->class == hlock_class(prev)) {
+ if (distance == 1)
+ entry->distance = 1;
+ entry->dep |= calc_depb(prev, next);
+ return 1;
+ }
+ }
+
+ /* <prev> is not found in <next>::locks_before */
+ return 0;
}
}

@@ -2579,14 +2659,18 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
*/
ret = add_lock_to_list(hlock_class(next), hlock_class(prev),
&hlock_class(prev)->locks_after,
- next->acquire_ip, distance, *trace);
+ next->acquire_ip, distance,
+ calc_dep(prev, next),
+ *trace);

if (!ret)
return 0;

ret = add_lock_to_list(hlock_class(prev), hlock_class(next),
&hlock_class(next)->locks_before,
- next->acquire_ip, distance, *trace);
+ next->acquire_ip, distance,
+ calc_depb(prev, next),
+ *trace);
if (!ret)
return 0;

--
2.28.0

2020-08-07 07:45:48

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 07/19] lockdep: Extend __bfs() to work with multiple types of dependencies

Now we have four types of dependencies in the dependency graph, and not
all the paths carry real dependencies (the dependencies that may cause
a deadlock), for example:

Given lock A and B, if we have:

	CPU1			CPU2
	=============		==============
	write_lock(A);		read_lock(B);
	read_lock(B);		write_lock(A);

(assuming read_lock(B) is a recursive reader)

then we have dependencies A -(ER)-> B, and B -(SN)-> A, and a
dependency path A -(ER)-> B -(SN)-> A.

In lockdep w/o recursive locks, a dependency path from A to A
means a deadlock. However, the above case is obviously not a
deadlock, because no one holds B exclusively, therefore no one
waits for the other to release B, so whoever gets A first on CPU1
or CPU2 will run without blocking.

As a result, dependency path A -(ER)-> B -(SN)-> A is not a
real/strong dependency that could cause a deadlock.

From the observation above, we know that for a dependency path to be
real/strong, no two adjacent dependencies can be -(*R)-> and -(S*)->.

Now our mission is to make __bfs() traverse only the strong dependency
paths, which is simple: we record whether we only have -(*R)-> for the
previous lock_list of the path in lock_list::only_xr, and when we pick a
dependency in the traverse, we 1) filter out -(S*)-> dependency if the
previous lock_list only has -(*R)-> dependency (i.e. ->only_xr is true)
and 2) set the next lock_list::only_xr to true if we only have -(*R)->
left after we filter out dependencies based on 1), otherwise, set it to
false.

With this extension for __bfs(), we now need to initialize the root of
__bfs() properly (with a correct ->only_xr). To do so, we introduce some
helper functions, which also clean up the __bfs() root initialization
code a little bit.
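
A self-contained userspace sketch of the filtering (not kernel code),
applied to the A -(ER)-> B -(SN)-> A example above; the DEP_* values
match the ones introduced in the previous patch:

	#include <stdbool.h>
	#include <stdio.h>

	#define DEP_SR_MASK (1U << 0)
	#define DEP_ER_MASK (1U << 1)
	#define DEP_SN_MASK (1U << 2)
	#define DEP_EN_MASK (1U << 3)

	int main(void)
	{
		unsigned int dep_a_b = DEP_ER_MASK;	/* A -(ER)-> B */
		unsigned int dep_b_a = DEP_SN_MASK;	/* B -(SN)-> A */
		bool only_xr = false;			/* the root A has ->only_xr == 0 */

		/* Take A -> B: nothing is filtered, but only -(*R)-> is left */
		only_xr = !(dep_a_b & (DEP_SN_MASK | DEP_EN_MASK));	/* true */

		/* Take B -> A: previous step was -(*R)-> only, mask out -(S*)-> */
		if (only_xr)
			dep_b_a &= ~(DEP_SR_MASK | DEP_SN_MASK);

		/* Prints 0: nothing left, B -> A is skipped, no circle is reported */
		printf("%#x\n", dep_b_a);
		return 0;
	}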

Signed-off-by: Boqun Feng <[email protected]>
---
include/linux/lockdep.h | 2 +
kernel/locking/lockdep.c | 113 ++++++++++++++++++++++++++++++++-------
2 files changed, 96 insertions(+), 19 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6ca0315d92c4..0b26d5d26411 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -215,6 +215,8 @@ struct lock_list {
u16 distance;
/* bitmap of different dependencies from head to this */
u8 dep;
+ /* used by BFS to record whether "prev -> this" only has -(*R)-> */
+ u8 only_xr;

/*
* The parent field is used to implement breadth-first search, and the
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index edf0cc261e8e..bb8b7e42c154 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1551,8 +1551,72 @@ static inline u8 calc_depb(struct held_lock *prev, struct held_lock *next)
}

/*
- * Forward- or backward-dependency search, used for both circular dependency
- * checking and hardirq-unsafe/softirq-unsafe checking.
+ * Initialize a lock_list entry @lock belonging to @class as the root for a BFS
+ * search.
+ */
+static inline void __bfs_init_root(struct lock_list *lock,
+ struct lock_class *class)
+{
+ lock->class = class;
+ lock->parent = NULL;
+ lock->only_xr = 0;
+}
+
+/*
+ * Initialize a lock_list entry @lock based on a lock acquisition @hlock as the
+ * root for a BFS search.
+ *
+ * ->only_xr of the initial lock node is set to @hlock->read == 2, to make sure
+ * that <prev> -> @hlock and @hlock -> <whatever __bfs() found> is not -(*R)->
+ * and -(S*)->.
+ */
+static inline void bfs_init_root(struct lock_list *lock,
+ struct held_lock *hlock)
+{
+ __bfs_init_root(lock, hlock_class(hlock));
+ lock->only_xr = (hlock->read == 2);
+}
+
+/*
+ * Similar to bfs_init_root() but initialize the root for backwards BFS.
+ *
+ * ->only_xr of the initial lock node is set to @hlock->read != 0, to make sure
+ * that <next> -> @hlock and @hlock -> <whatever backwards BFS found> is not
+ * -(*S)-> and -(R*)-> (reverse order of -(*R)-> and -(S*)->).
+ */
+static inline void bfs_init_rootb(struct lock_list *lock,
+ struct held_lock *hlock)
+{
+ __bfs_init_root(lock, hlock_class(hlock));
+ lock->only_xr = (hlock->read != 0);
+}
+
+/*
+ * Breadth-First Search to find a strong path in the dependency graph.
+ *
+ * @source_entry: the source of the path we are searching for.
+ * @data: data used for the second parameter of @match function
+ * @match: match function for the search
+ * @target_entry: pointer to the target of a matched path
+ * @offset: the offset to struct lock_class to determine whether it is
+ * locks_after or locks_before
+ *
+ * We may have multiple edges (considering different kinds of dependencies,
+ * e.g. ER and SN) between two nodes in the dependency graph. But
+ * only the strong dependency path in the graph is relevant to deadlocks. A
+ * strong dependency path is a dependency path that doesn't have two adjacent
+ * dependencies as -(*R)-> -(S*)->, please see:
+ *
+ * Documentation/locking/lockdep-design.rst
+ *
+ * for more explanation of the definition of strong dependency paths
+ *
+ * In __bfs(), we only traverse in the strong dependency path:
+ *
+ * In lock_list::only_xr, we record whether the previous dependency only
+ * has -(*R)-> in the search, and if it does (prev only has -(*R)->), we
+ * filter out any -(S*)-> in the current dependency and after that, the
+ * ->only_xr is set according to whether we only have -(*R)-> left.
*/
static enum bfs_result __bfs(struct lock_list *source_entry,
void *data,
@@ -1582,6 +1646,7 @@ static enum bfs_result __bfs(struct lock_list *source_entry,
__cq_enqueue(cq, source_entry);

while ((lock = __cq_dequeue(cq))) {
+ bool prev_only_xr;

if (!lock->class) {
ret = BFS_EINVALIDNODE;
@@ -1602,10 +1667,26 @@ static enum bfs_result __bfs(struct lock_list *source_entry,

head = get_dep_list(lock, offset);

- DEBUG_LOCKS_WARN_ON(!irqs_disabled());
+ prev_only_xr = lock->only_xr;

list_for_each_entry_rcu(entry, head, entry) {
unsigned int cq_depth;
+ u8 dep = entry->dep;
+
+ /*
+ * Mask out all -(S*)-> if we only have *R in previous
+ * step, because -(*R)-> -(S*)-> don't make up a strong
+ * dependency.
+ */
+ if (prev_only_xr)
+ dep &= ~(DEP_SR_MASK | DEP_SN_MASK);
+
+ /* If nothing left, we skip */
+ if (!dep)
+ continue;
+
+ /* If there are only -(*R)-> left, set that for the next step */
+ entry->only_xr = !(dep & (DEP_SN_MASK | DEP_EN_MASK));

visit_lock_entry(entry, lock);
if (match(entry, data)) {
@@ -1827,8 +1908,7 @@ unsigned long lockdep_count_forward_deps(struct lock_class *class)
unsigned long ret, flags;
struct lock_list this;

- this.parent = NULL;
- this.class = class;
+ __bfs_init_root(&this, class);

raw_local_irq_save(flags);
lockdep_lock();
@@ -1854,8 +1934,7 @@ unsigned long lockdep_count_backward_deps(struct lock_class *class)
unsigned long ret, flags;
struct lock_list this;

- this.parent = NULL;
- this.class = class;
+ __bfs_init_root(&this, class);

raw_local_irq_save(flags);
lockdep_lock();
@@ -1898,10 +1977,9 @@ check_noncircular(struct held_lock *src, struct held_lock *target,
{
enum bfs_result ret;
struct lock_list *uninitialized_var(target_entry);
- struct lock_list src_entry = {
- .class = hlock_class(src),
- .parent = NULL,
- };
+ struct lock_list src_entry;
+
+ bfs_init_root(&src_entry, src);

debug_atomic_inc(nr_cyclic_checks);

@@ -1937,10 +2015,9 @@ check_redundant(struct held_lock *src, struct held_lock *target)
{
enum bfs_result ret;
struct lock_list *uninitialized_var(target_entry);
- struct lock_list src_entry = {
- .class = hlock_class(src),
- .parent = NULL,
- };
+ struct lock_list src_entry;
+
+ bfs_init_root(&src_entry, src);

debug_atomic_inc(nr_redundant_checks);

@@ -3556,8 +3633,7 @@ check_usage_forwards(struct task_struct *curr, struct held_lock *this,
struct lock_list root;
struct lock_list *uninitialized_var(target_entry);

- root.parent = NULL;
- root.class = hlock_class(this);
+ bfs_init_root(&root, this);
ret = find_usage_forwards(&root, lock_flag(bit), &target_entry);
if (bfs_error(ret)) {
print_bfs_bug(ret);
@@ -3583,8 +3659,7 @@ check_usage_backwards(struct task_struct *curr, struct held_lock *this,
struct lock_list root;
struct lock_list *uninitialized_var(target_entry);

- root.parent = NULL;
- root.class = hlock_class(this);
+ bfs_init_rootb(&root, this);
ret = find_usage_backwards(&root, lock_flag(bit), &target_entry);
if (bfs_error(ret)) {
print_bfs_bug(ret);
--
2.28.0

2020-08-07 07:46:02

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 09/19] lockdep: Support deadlock detection for recursive read locks in check_noncircular()

Currently, lockdep only has limited support for deadlock detection on
recursive read locks.

This patch adds deadlock detection for recursive read locks. The
basic idea is:

We are about to add dependency B -> A into the dependency graph, and we
use check_noncircular() to find whether we have a strong dependency path
A -> .. -> B so that we have a strong dependency circle (a closed strong
dependency path):

A -> .. -> B -> A

, which doesn't have two adjacent dependencies -(*R)-> L -(S*)-> around any lock L in it.

Since A -> .. -> B is already a strong dependency path, if either
B -> A is -(E*)-> or A -> .. -> B is -(*N)->, the circle A -> .. -> B ->
A is strong, otherwise it is not. So we introduce a new match function
hlock_conflict() to replace class_equal() for the deadlock check in
check_noncircular().
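
A minimal userspace sketch of that decision (not kernel code; "read" and
"only_xr" mirror held_lock::read and lock_list::only_xr):

	#include <stdbool.h>
	#include <stdio.h>

	/* The class of B already matched; decide whether the circle is strong */
	static bool conflict(int b_to_a_read, bool a_to_b_only_xr)
	{
		return b_to_a_read == 0 ||	/* B -> A is -(E*)-> */
		       !a_to_b_only_xr;		/* A -> .. -> B is -(*N)-> */
	}

	int main(void)
	{
		/* B -> A is a recursive read and the path ends in -(*R)->: no deadlock */
		printf("%d\n", conflict(2, true));	/* 0 */
		/* B -> A is a write: the circle is strong, report the deadlock */
		printf("%d\n", conflict(0, true));	/* 1 */
		return 0;
	}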

Signed-off-by: Boqun Feng <[email protected]>
---
kernel/locking/lockdep.c | 43 ++++++++++++++++++++++++++++++++--------
1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 62f7f88e3673..e5b2c1cf4286 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1838,10 +1838,37 @@ static inline bool class_equal(struct lock_list *entry, void *data)
return entry->class == data;
}

+/*
+ * We are about to add B -> A into the dependency graph, and in __bfs() a
+ * strong dependency path A -> .. -> B is found: hlock_class equals
+ * entry->class.
+ *
+ * We will have a deadlock case (conflict) if A -> .. -> B -> A is a strong
+ * dependency cycle, that means:
+ *
+ * Either
+ *
+ * a) B -> A is -(E*)->
+ *
+ * or
+ *
+ * b) A -> .. -> B is -(*N)-> (i.e. A -> .. -(*N)-> B)
+ *
+ * as then we don't have -(*R)-> -(S*)-> in the cycle.
+ */
+static inline bool hlock_conflict(struct lock_list *entry, void *data)
+{
+ struct held_lock *hlock = (struct held_lock *)data;
+
+ return hlock_class(hlock) == entry->class && /* Found A -> .. -> B */
+ (hlock->read == 0 || /* B -> A is -(E*)-> */
+ !entry->only_xr); /* A -> .. -> B is -(*N)-> */
+}
+
static noinline void print_circular_bug(struct lock_list *this,
- struct lock_list *target,
- struct held_lock *check_src,
- struct held_lock *check_tgt)
+ struct lock_list *target,
+ struct held_lock *check_src,
+ struct held_lock *check_tgt)
{
struct task_struct *curr = current;
struct lock_list *parent;
@@ -1950,13 +1977,13 @@ unsigned long lockdep_count_backward_deps(struct lock_class *class)
* <target> or not.
*/
static noinline enum bfs_result
-check_path(struct lock_class *target, struct lock_list *src_entry,
+check_path(struct held_lock *target, struct lock_list *src_entry,
+ bool (*match)(struct lock_list *entry, void *data),
struct lock_list **target_entry)
{
enum bfs_result ret;

- ret = __bfs_forwards(src_entry, (void *)target, class_equal,
- target_entry);
+ ret = __bfs_forwards(src_entry, target, match, target_entry);

if (unlikely(bfs_error(ret)))
print_bfs_bug(ret);
@@ -1983,7 +2010,7 @@ check_noncircular(struct held_lock *src, struct held_lock *target,

debug_atomic_inc(nr_cyclic_checks);

- ret = check_path(hlock_class(target), &src_entry, &target_entry);
+ ret = check_path(target, &src_entry, hlock_conflict, &target_entry);

if (unlikely(ret == BFS_RMATCH)) {
if (!*trace) {
@@ -2021,7 +2048,7 @@ check_redundant(struct held_lock *src, struct held_lock *target)

debug_atomic_inc(nr_redundant_checks);

- ret = check_path(hlock_class(target), &src_entry, &target_entry);
+ ret = check_path(target, &src_entry, class_equal, &target_entry);

if (ret == BFS_RMATCH)
debug_atomic_inc(nr_redundant);
--
2.28.0

2020-08-07 07:46:19

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 11/19] lockdep: Fix recursive read lock related safe->unsafe detection

Currently, in safe->unsafe detection, lockdep misses the fact that a
LOCK_ENABLED_IRQ_*_READ usage and a LOCK_USED_IN_IRQ_*_READ usage may
cause deadlock too, for example:

	P1				P2
	<irq disabled>
	write_lock(l1);			<irq enabled>
					read_lock(l2);
	write_lock(l2);
					<in irq>
					read_lock(l1);

Actually, all of the following cases may cause deadlocks:

LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*
LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*
LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*_READ
LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*_READ

To fix this, we need to 1) change the calculation of exclusive_mask() so
that READ bits are not dropped and 2) always call usage() in
mark_lock_irq() to check usage deadlocks, even when the new usage of the
lock is READ.

Besides, adjust usage_match() and usage_accumulate() to the recursive
read lock changes.
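
A hedged sketch of the scenario above (illustration only, with made-up
lock and function names; the hardirq handler interrupts p2() while it
holds l2):

	#include <linux/spinlock.h>
	#include <linux/interrupt.h>

	static DEFINE_RWLOCK(l1);
	static DEFINE_RWLOCK(l2);

	static void p1(void)
	{
		local_irq_disable();
		write_lock(&l1);
		write_lock(&l2);	/* waits for p2()'s reader on l2 */
		write_unlock(&l2);
		write_unlock(&l1);
		local_irq_enable();
	}

	static void p2(void)		/* runs with irqs enabled */
	{
		read_lock(&l2);		/* l2 gets ENABLED_IRQ_*_READ usage */
		/* ... interrupted here ... */
		read_unlock(&l2);
	}

	static irqreturn_t p2_irq_handler(int irq, void *data)
	{
		read_lock(&l1);		/* l1 gets USED_IN_IRQ_*_READ usage,
					 * waits for p1()'s write_lock(l1) */
		read_unlock(&l1);
		return IRQ_HANDLED;
	}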

Signed-off-by: Boqun Feng <[email protected]>
---
kernel/locking/lockdep.c | 183 +++++++++++++++++++++++++++++----------
1 file changed, 138 insertions(+), 45 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 85a4d3539faa..040509667798 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2100,22 +2100,72 @@ check_redundant(struct held_lock *src, struct held_lock *target)

#ifdef CONFIG_TRACE_IRQFLAGS

+/*
+ * Forwards and backwards subgraph searching, for the purposes of
+ * proving that two subgraphs can be connected by a new dependency
+ * without creating any illegal irq-safe -> irq-unsafe lock dependency.
+ *
+ * An irq safe->unsafe deadlock happens with the following conditions:
+ *
+ * 1) We have a strong dependency path A -> ... -> B
+ *
+ * 2) and we have ENABLED_IRQ usage of B and USED_IN_IRQ usage of A, therefore
+ * irq can create a new dependency B -> A (consider the case that a holder
+ * of B gets interrupted by an irq whose handler will try to acquire A).
+ *
+ * 3) the dependency circle A -> ... -> B -> A we get from 1) and 2) is a
+ * strong circle:
+ *
+ * For the usage bits of B:
+ * a) if A -> B is -(*N)->, then B -> A could be any type, so any
+ * ENABLED_IRQ usage suffices.
+ * b) if A -> B is -(*R)->, then B -> A must be -(E*)->, so only
+ * ENABLED_IRQ_*_READ usage suffices.
+ *
+ * For the usage bits of A:
+ * c) if A -> B is -(E*)->, then B -> A could be any type, so any
+ * USED_IN_IRQ usage suffices.
+ * d) if A -> B is -(S*)->, then B -> A must be -(*N)->, so only
+ * USED_IN_IRQ_*_READ usage suffices.
+ */
+
+/*
+ * There is a strong dependency path in the dependency graph: A -> B, and now
+ * we need to decide which usage bit of A should be accumulated to detect
+ * safe->unsafe bugs.
+ *
+ * Note that usage_accumulate() is used in backwards search, so ->only_xr
+ * stands for whether A -> B only has -(S*)-> (in this case ->only_xr is true).
+ *
+ * As above, if only_xr is false, which means A -> B has -(E*)-> dependency
+ * path, any usage of A should be considered. Otherwise, we should only
+ * consider _READ usage.
+ */
static inline bool usage_accumulate(struct lock_list *entry, void *mask)
{
- *(unsigned long *)mask |= entry->class->usage_mask;
+ if (!entry->only_xr)
+ *(unsigned long *)mask |= entry->class->usage_mask;
+ else /* Mask out _READ usage bits */
+ *(unsigned long *)mask |= (entry->class->usage_mask & LOCKF_IRQ);

return false;
}

/*
- * Forwards and backwards subgraph searching, for the purposes of
- * proving that two subgraphs can be connected by a new dependency
- * without creating any illegal irq-safe -> irq-unsafe lock dependency.
+ * There is a strong dependency path in the dependency graph: A -> B, and now
+ * we need to decide which usage bit of B conflicts with the usage bits of A,
+ * i.e. which usage bit of B may introduce safe->unsafe deadlocks.
+ *
+ * As above, if only_xr is false, which means A -> B has -(*N)-> dependency
+ * path, any usage of B should be considered. Otherwise, we should only
+ * consider _READ usage.
*/
-
static inline bool usage_match(struct lock_list *entry, void *mask)
{
- return !!(entry->class->usage_mask & *(unsigned long *)mask);
+ if (!entry->only_xr)
+ return !!(entry->class->usage_mask & *(unsigned long *)mask);
+ else /* Mask out _READ usage bits */
+ return !!((entry->class->usage_mask & LOCKF_IRQ) & *(unsigned long *)mask);
}

/*
@@ -2406,17 +2456,39 @@ static unsigned long invert_dir_mask(unsigned long mask)
}

/*
- * As above, we clear bitnr0 (LOCK_*_READ off) with bitmask ops. First, for all
- * bits with bitnr0 set (LOCK_*_READ), add those with bitnr0 cleared (LOCK_*).
- * And then mask out all bitnr0.
+ * Note that a LOCK_ENABLED_IRQ_*_READ usage and a LOCK_USED_IN_IRQ_*_READ
+ * usage may cause deadlock too, for example:
+ *
+ * P1                          P2
+ * <irq disabled>
+ * write_lock(l1);             <irq enabled>
+ *                             read_lock(l2);
+ * write_lock(l2);
+ *                             <in irq>
+ *                             read_lock(l1);
+ *
+ * , in the above case, l1 will be marked as LOCK_USED_IN_IRQ_HARDIRQ_READ and
+ * l2 will be marked as LOCK_ENABLED_IRQ_HARDIRQ_READ, and this is a possible
+ * deadlock.
+ *
+ * In fact, all of the following cases may cause deadlocks:
+ *
+ * LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*
+ * LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*
+ * LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*_READ
+ * LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*_READ
+ *
+ * As a result, to calculate the "exclusive mask", first we invert the
+ * direction (USED_IN/ENABLED) of the original mask, and 1) for all bits with
+ * bitnr0 set (LOCK_*_READ), add those with bitnr0 cleared (LOCK_*). 2) for all
+ * bits with bitnr0 cleared (LOCK_*), add those with bitnr0 set (LOCK_*_READ).
*/
static unsigned long exclusive_mask(unsigned long mask)
{
unsigned long excl = invert_dir_mask(mask);

- /* Strip read */
excl |= (excl & LOCKF_IRQ_READ) >> LOCK_USAGE_READ_MASK;
- excl &= ~LOCKF_IRQ_READ;
+ excl |= (excl & LOCKF_IRQ) << LOCK_USAGE_READ_MASK;

return excl;
}
@@ -2433,6 +2505,7 @@ static unsigned long original_mask(unsigned long mask)
unsigned long excl = invert_dir_mask(mask);

/* Include read in existing usages */
+ excl |= (excl & LOCKF_IRQ_READ) >> LOCK_USAGE_READ_MASK;
excl |= (excl & LOCKF_IRQ) << LOCK_USAGE_READ_MASK;

return excl;
@@ -2447,14 +2520,24 @@ static int find_exclusive_match(unsigned long mask,
enum lock_usage_bit *bitp,
enum lock_usage_bit *excl_bitp)
{
- int bit, excl;
+ int bit, excl, excl_read;

for_each_set_bit(bit, &mask, LOCK_USED) {
+ /*
+ * exclusive_bit() strips the read bit, however,
+ * LOCK_ENABLED_IRQ_*_READ may cause deadlocks too, so we need
+ * to search excl | LOCK_USAGE_READ_MASK as well.
+ */
excl = exclusive_bit(bit);
+ excl_read = excl | LOCK_USAGE_READ_MASK;
if (excl_mask & lock_flag(excl)) {
*bitp = bit;
*excl_bitp = excl;
return 0;
+ } else if (excl_mask & lock_flag(excl_read)) {
+ *bitp = bit;
+ *excl_bitp = excl_read;
+ return 0;
}
}
return -1;
@@ -2480,8 +2563,7 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
* Step 1: gather all hard/soft IRQs usages backward in an
* accumulated usage mask.
*/
- this.parent = NULL;
- this.class = hlock_class(prev);
+ bfs_init_rootb(&this, prev);

ret = __bfs_backwards(&this, &usage_mask, usage_accumulate, NULL);
if (bfs_error(ret)) {
@@ -2499,8 +2581,7 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
*/
forward_mask = exclusive_mask(usage_mask);

- that.parent = NULL;
- that.class = hlock_class(next);
+ bfs_init_root(&that, next);

ret = find_usage_forwards(&that, forward_mask, &target_entry1);
if (bfs_error(ret)) {
@@ -3695,14 +3776,16 @@ print_irq_inversion_bug(struct task_struct *curr,
*/
static int
check_usage_forwards(struct task_struct *curr, struct held_lock *this,
- enum lock_usage_bit bit, const char *irqclass)
+ enum lock_usage_bit bit)
{
enum bfs_result ret;
struct lock_list root;
struct lock_list *uninitialized_var(target_entry);
+ enum lock_usage_bit read_bit = bit + LOCK_USAGE_READ_MASK;
+ unsigned usage_mask = lock_flag(bit) | lock_flag(read_bit);

bfs_init_root(&root, this);
- ret = find_usage_forwards(&root, lock_flag(bit), &target_entry);
+ ret = find_usage_forwards(&root, usage_mask, &target_entry);
if (bfs_error(ret)) {
print_bfs_bug(ret);
return 0;
@@ -3710,8 +3793,13 @@ check_usage_forwards(struct task_struct *curr, struct held_lock *this,
if (ret == BFS_RNOMATCH)
return 1;

- print_irq_inversion_bug(curr, &root, target_entry,
- this, 1, irqclass);
+ /* Check whether write or read usage is the match */
+ if (target_entry->class->usage_mask & lock_flag(bit))
+ print_irq_inversion_bug(curr, &root, target_entry,
+ this, 1, state_name(bit));
+ else
+ print_irq_inversion_bug(curr, &root, target_entry,
+ this, 1, state_name(read_bit));
return 0;
}

@@ -3721,14 +3809,16 @@ check_usage_forwards(struct task_struct *curr, struct held_lock *this,
*/
static int
check_usage_backwards(struct task_struct *curr, struct held_lock *this,
- enum lock_usage_bit bit, const char *irqclass)
+ enum lock_usage_bit bit)
{
enum bfs_result ret;
struct lock_list root;
struct lock_list *uninitialized_var(target_entry);
+ enum lock_usage_bit read_bit = bit + LOCK_USAGE_READ_MASK;
+ unsigned usage_mask = lock_flag(bit) | lock_flag(read_bit);

bfs_init_rootb(&root, this);
- ret = find_usage_backwards(&root, lock_flag(bit), &target_entry);
+ ret = find_usage_backwards(&root, usage_mask, &target_entry);
if (bfs_error(ret)) {
print_bfs_bug(ret);
return 0;
@@ -3736,8 +3826,14 @@ check_usage_backwards(struct task_struct *curr, struct held_lock *this,
if (ret == BFS_RNOMATCH)
return 1;

- print_irq_inversion_bug(curr, &root, target_entry,
- this, 0, irqclass);
+ /* Check whether write or read usage is the match */
+ if (target_entry->class->usage_mask & lock_flag(bit))
+ print_irq_inversion_bug(curr, &root, target_entry,
+ this, 0, state_name(bit));
+ else
+ print_irq_inversion_bug(curr, &root, target_entry,
+ this, 0, state_name(read_bit));
+
return 0;
}

@@ -3800,16 +3896,6 @@ mark_lock_irq(struct task_struct *curr, struct held_lock *this,
int read = new_bit & LOCK_USAGE_READ_MASK;
int dir = new_bit & LOCK_USAGE_DIR_MASK;

- /*
- * mark USED_IN has to look forwards -- to ensure no dependency
- * has ENABLED state, which would allow recursion deadlocks.
- *
- * mark ENABLED has to look backwards -- to ensure no dependee
- * has USED_IN state, which, again, would allow recursion deadlocks.
- */
- check_usage_f usage = dir ?
- check_usage_backwards : check_usage_forwards;
-
/*
* Validate that this particular lock does not have conflicting
* usage states.
@@ -3818,23 +3904,30 @@ mark_lock_irq(struct task_struct *curr, struct held_lock *this,
return 0;

/*
- * Validate that the lock dependencies don't have conflicting usage
- * states.
+ * Check for read in write conflicts
*/
- if ((!read || STRICT_READ_CHECKS) &&
- !usage(curr, this, excl_bit, state_name(new_bit & ~LOCK_USAGE_READ_MASK)))
+ if (!read && !valid_state(curr, this, new_bit,
+ excl_bit + LOCK_USAGE_READ_MASK))
return 0;

+
/*
- * Check for read in write conflicts
+ * Validate that the lock dependencies don't have conflicting usage
+ * states.
*/
- if (!read) {
- if (!valid_state(curr, this, new_bit, excl_bit + LOCK_USAGE_READ_MASK))
+ if (dir) {
+ /*
+ * mark ENABLED has to look backwards -- to ensure no dependee
+ * has USED_IN state, which, again, would allow recursion deadlocks.
+ */
+ if (!check_usage_backwards(curr, this, excl_bit))
return 0;
-
- if (STRICT_READ_CHECKS &&
- !usage(curr, this, excl_bit + LOCK_USAGE_READ_MASK,
- state_name(new_bit + LOCK_USAGE_READ_MASK)))
+ } else {
+ /*
+ * mark USED_IN has to look forwards -- to ensure no dependency
+ * has ENABLED state, which would allow recursion deadlocks.
+ */
+ if (!check_usage_forwards(curr, this, excl_bit))
return 0;
}

--
2.28.0

2020-08-07 07:47:02

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 14/19] lockdep: Take read/write status in consideration when generate chainkey

Currently, the chainkey of a lock chain is a hash sum of the class_idx
of all the held locks; the read/write status is not taken into
consideration while generating the chainkey. This could result in a
problem if we have:

P1()
{
        read_lock(B);
        lock(A);
}

P2()
{
        lock(A);
        read_lock(B);
}

P3()
{
        lock(A);
        write_lock(B);
}

, and P1(), P2(), P3() run one by one. When running P2(), lockdep detects
that the lock chain A -> B is not a deadlock, so it is added to the chain
cache; then when running P3(), even though it is a deadlock, we could
miss it because of the chain cache hit. This can be confirmed by the
self testcase "chain cached mixed R-L/L-W ".

To resolve this, we use the concept of "hlock_id" to generate the
chainkey: an hlock_id is a tuple (hlock->class_idx, hlock->read), which
fits in a u16 type. With this, the chainkeys are different if the lock
sequences have the same locks but different read/write status.

Besides, since we use "hlock_id" to generate chainkeys, the chain_hlocks
array now stores "hlock_id"s rather than lock_class indexes.
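
As a rough standalone illustration of the effect (KEY_BITS mirrors
MAX_LOCKDEP_KEYS_BITS, but mix64() below is a made-up stand-in for the
kernel's iterate_chain_key() and INITIAL_CHAIN_KEY):

#include <stdint.h>
#include <stdio.h>

#define KEY_BITS 13

static uint16_t sketch_hlock_id(uint16_t class_idx, uint16_t read)
{
        /* read is 0 (write), 1 (non-recursive read) or 2 (recursive read) */
        return class_idx | (read << KEY_BITS);
}

static uint64_t mix64(uint64_t chain_key, uint16_t id)
{
        return chain_key * 0x100000001b3ULL ^ id;       /* placeholder mix */
}

int main(void)
{
        /* lock sequence A(class 1) -> B(class 2), B as write vs. recursive read */
        uint64_t k_write = mix64(mix64(0, sketch_hlock_id(1, 0)), sketch_hlock_id(2, 0));
        uint64_t k_rread = mix64(mix64(0, sketch_hlock_id(1, 0)), sketch_hlock_id(2, 2));

        /* different read/write status now yields different chain keys */
        printf("%llx != %llx\n", (unsigned long long)k_write,
                                 (unsigned long long)k_rread);
        return 0;
}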

Signed-off-by: Boqun Feng <[email protected]>
---
kernel/locking/lockdep.c | 53 ++++++++++++++++++++++++++--------------
1 file changed, 35 insertions(+), 18 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 867199c4b85d..f332d1b9d87b 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -371,6 +371,21 @@ static struct hlist_head classhash_table[CLASSHASH_SIZE];

static struct hlist_head chainhash_table[CHAINHASH_SIZE];

+/*
+ * the id of held_lock
+ */
+static inline u16 hlock_id(struct held_lock *hlock)
+{
+ BUILD_BUG_ON(MAX_LOCKDEP_KEYS_BITS + 2 > 16);
+
+ return (hlock->class_idx | (hlock->read << MAX_LOCKDEP_KEYS_BITS));
+}
+
+static inline unsigned int chain_hlock_class_idx(u16 hlock_id)
+{
+ return hlock_id & MAX_LOCKDEP_KEYS;
+}
+
/*
* The hash key of the lock dependency chains is a hash itself too:
* it's a hash of all locks taken up to that lock, including that lock.
@@ -3202,7 +3217,10 @@ static inline void free_chain_hlocks(int base, int size)

struct lock_class *lock_chain_get_class(struct lock_chain *chain, int i)
{
- return lock_classes + chain_hlocks[chain->base + i];
+ u16 chain_hlock = chain_hlocks[chain->base + i];
+ unsigned int class_idx = chain_hlock_class_idx(chain_hlock);
+
+ return lock_classes + class_idx - 1;
}

/*
@@ -3228,12 +3246,12 @@ static inline int get_first_held_lock(struct task_struct *curr,
/*
* Returns the next chain_key iteration
*/
-static u64 print_chain_key_iteration(int class_idx, u64 chain_key)
+static u64 print_chain_key_iteration(u16 hlock_id, u64 chain_key)
{
- u64 new_chain_key = iterate_chain_key(chain_key, class_idx);
+ u64 new_chain_key = iterate_chain_key(chain_key, hlock_id);

- printk(" class_idx:%d -> chain_key:%016Lx",
- class_idx,
+ printk(" hlock_id:%d -> chain_key:%016Lx",
+ (unsigned int)hlock_id,
(unsigned long long)new_chain_key);
return new_chain_key;
}
@@ -3250,12 +3268,12 @@ print_chain_keys_held_locks(struct task_struct *curr, struct held_lock *hlock_ne
hlock_next->irq_context);
for (; i < depth; i++) {
hlock = curr->held_locks + i;
- chain_key = print_chain_key_iteration(hlock->class_idx, chain_key);
+ chain_key = print_chain_key_iteration(hlock_id(hlock), chain_key);

print_lock(hlock);
}

- print_chain_key_iteration(hlock_next->class_idx, chain_key);
+ print_chain_key_iteration(hlock_id(hlock_next), chain_key);
print_lock(hlock_next);
}

@@ -3263,14 +3281,14 @@ static void print_chain_keys_chain(struct lock_chain *chain)
{
int i;
u64 chain_key = INITIAL_CHAIN_KEY;
- int class_id;
+ u16 hlock_id;

printk("depth: %u\n", chain->depth);
for (i = 0; i < chain->depth; i++) {
- class_id = chain_hlocks[chain->base + i];
- chain_key = print_chain_key_iteration(class_id, chain_key);
+ hlock_id = chain_hlocks[chain->base + i];
+ chain_key = print_chain_key_iteration(hlock_id, chain_key);

- print_lock_name(lock_classes + class_id);
+ print_lock_name(lock_classes + chain_hlock_class_idx(hlock_id) - 1);
printk("\n");
}
}
@@ -3319,7 +3337,7 @@ static int check_no_collision(struct task_struct *curr,
}

for (j = 0; j < chain->depth - 1; j++, i++) {
- id = curr->held_locks[i].class_idx;
+ id = hlock_id(&curr->held_locks[i]);

if (DEBUG_LOCKS_WARN_ON(chain_hlocks[chain->base + j] != id)) {
print_collision(curr, hlock, chain);
@@ -3368,7 +3386,6 @@ static inline int add_chain_cache(struct task_struct *curr,
struct held_lock *hlock,
u64 chain_key)
{
- struct lock_class *class = hlock_class(hlock);
struct hlist_head *hash_head = chainhashentry(chain_key);
struct lock_chain *chain;
int i, j;
@@ -3411,11 +3428,11 @@ static inline int add_chain_cache(struct task_struct *curr,

chain->base = j;
for (j = 0; j < chain->depth - 1; j++, i++) {
- int lock_id = curr->held_locks[i].class_idx;
+ int lock_id = hlock_id(curr->held_locks + i);

chain_hlocks[chain->base + j] = lock_id;
}
- chain_hlocks[chain->base + j] = class - lock_classes;
+ chain_hlocks[chain->base + j] = hlock_id(hlock);
hlist_add_head_rcu(&chain->entry, hash_head);
debug_atomic_inc(chain_lookup_misses);
inc_chains(chain->irq_context);
@@ -3602,7 +3619,7 @@ static void check_chain_key(struct task_struct *curr)
if (prev_hlock && (prev_hlock->irq_context !=
hlock->irq_context))
chain_key = INITIAL_CHAIN_KEY;
- chain_key = iterate_chain_key(chain_key, hlock->class_idx);
+ chain_key = iterate_chain_key(chain_key, hlock_id(hlock));
prev_hlock = hlock;
}
if (chain_key != curr->curr_chain_key) {
@@ -4702,7 +4719,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
chain_key = INITIAL_CHAIN_KEY;
chain_head = 1;
}
- chain_key = iterate_chain_key(chain_key, class_idx);
+ chain_key = iterate_chain_key(chain_key, hlock_id(hlock));

if (nest_lock && !__lock_is_held(nest_lock, -1)) {
print_lock_nested_lock_not_held(curr, hlock, ip);
@@ -5597,7 +5614,7 @@ static void remove_class_from_lock_chain(struct pending_free *pf,
int i;

for (i = chain->base; i < chain->base + chain->depth; i++) {
- if (chain_hlocks[i] != class - lock_classes)
+ if (chain_hlock_class_idx(chain_hlocks[i]) != class - lock_classes)
continue;
/*
* Each lock class occurs at most once in a lock chain so once
--
2.28.0

2020-08-07 07:47:30

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 05/19] lockdep: Reduce the size of lock_list::distance

lock_list::distance is never greater than MAX_LOCK_DEPTH (which is 48
right now), so a u16 will fit. This patch reduces the size of
lock_list::distance to save space, so that we can introduce other fields
to help detect recursive read lock deadlocks without increasing the size
of the lock_list structure.
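
If we wanted to make that assumption explicit in the code, a compile-time
check along these lines could be added (hypothetical, not part of this
patch):

	/* lock_list::distance must fit the largest possible depth */
	BUILD_BUG_ON(MAX_LOCK_DEPTH > U16_MAX);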

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
---
include/linux/lockdep.h | 2 +-
kernel/locking/lockdep.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6b7cb390f19f..b85973515f84 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -212,7 +212,7 @@ struct lock_list {
struct lock_class *class;
struct lock_class *links_to;
const struct lock_trace *trace;
- int distance;
+ u16 distance;

/*
* The parent field is used to implement breadth-first search, and the
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 2d9798b71f74..699e9039a9b3 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1320,7 +1320,7 @@ static struct lock_list *alloc_list_entry(void)
*/
static int add_lock_to_list(struct lock_class *this,
struct lock_class *links_to, struct list_head *head,
- unsigned long ip, int distance,
+ unsigned long ip, u16 distance,
const struct lock_trace *trace)
{
struct lock_list *entry;
@@ -2489,7 +2489,7 @@ check_deadlock(struct task_struct *curr, struct held_lock *next)
*/
static int
check_prev_add(struct task_struct *curr, struct held_lock *prev,
- struct held_lock *next, int distance,
+ struct held_lock *next, u16 distance,
struct lock_trace **const trace)
{
struct lock_list *entry;
@@ -2622,7 +2622,7 @@ check_prevs_add(struct task_struct *curr, struct held_lock *next)
goto out_bug;

for (;;) {
- int distance = curr->lockdep_depth - depth + 1;
+ u16 distance = curr->lockdep_depth - depth + 1;
hlock = curr->held_locks + depth - 1;

/*
--
2.28.0

2020-08-07 07:47:40

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 15/19] lockdep/selftest: Unleash irq_read_recursion2 and add more

Now that we can handle recursive read related irq inversion deadlocks
correctly, uncomment the irq_read_recursion2 testcases and add more
testcases.

Signed-off-by: Boqun Feng <[email protected]>
---
lib/locking-selftest.c | 59 +++++++++++++++++++++++++++++++++---------
1 file changed, 47 insertions(+), 12 deletions(-)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 002d1ec09852..f65a658cc9e3 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -1053,20 +1053,28 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_inversion_soft_wlock)
#define E3() \
\
IRQ_ENTER(); \
- RL(A); \
+ LOCK(A); \
L(B); \
U(B); \
- RU(A); \
+ UNLOCK(A); \
IRQ_EXIT();

/*
- * Generate 12 testcases:
+ * Generate 24 testcases:
*/
#include "locking-selftest-hardirq.h"
-GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_hard)
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_hard_rlock)
+
+#include "locking-selftest-wlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_hard_wlock)

#include "locking-selftest-softirq.h"
-GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft)
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft_rlock)
+
+#include "locking-selftest-wlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft_wlock)

#undef E1
#undef E2
@@ -1080,8 +1088,8 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft)
\
IRQ_DISABLE(); \
L(B); \
- WL(A); \
- WU(A); \
+ LOCK(A); \
+ UNLOCK(A); \
U(B); \
IRQ_ENABLE();

@@ -1098,13 +1106,21 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft)
IRQ_EXIT();

/*
- * Generate 12 testcases:
+ * Generate 24 testcases:
*/
#include "locking-selftest-hardirq.h"
-// GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_hard)
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_hard_rlock)
+
+#include "locking-selftest-wlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_hard_wlock)

#include "locking-selftest-softirq.h"
-// GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_soft)
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_soft_rlock)
+
+#include "locking-selftest-wlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_soft_wlock)

#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define I_SPINLOCK(x) lockdep_reset_lock(&lock_##x.dep_map)
@@ -1257,6 +1273,25 @@ static inline void print_testname(const char *testname)
dotest(name##_rlock_##nr, SUCCESS, LOCKTYPE_RWLOCK); \
pr_cont("\n");

+#define DO_TESTCASE_2RW(desc, name, nr) \
+ print_testname(desc"/"#nr); \
+ pr_cont(" |"); \
+ dotest(name##_wlock_##nr, FAILURE, LOCKTYPE_RWLOCK); \
+ dotest(name##_rlock_##nr, SUCCESS, LOCKTYPE_RWLOCK); \
+ pr_cont("\n");
+
+#define DO_TESTCASE_2x2RW(desc, name, nr) \
+ DO_TESTCASE_2RW("hard-"desc, name##_hard, nr) \
+ DO_TESTCASE_2RW("soft-"desc, name##_soft, nr) \
+
+#define DO_TESTCASE_6x2x2RW(desc, name) \
+ DO_TESTCASE_2x2RW(desc, name, 123); \
+ DO_TESTCASE_2x2RW(desc, name, 132); \
+ DO_TESTCASE_2x2RW(desc, name, 213); \
+ DO_TESTCASE_2x2RW(desc, name, 231); \
+ DO_TESTCASE_2x2RW(desc, name, 312); \
+ DO_TESTCASE_2x2RW(desc, name, 321);
+
#define DO_TESTCASE_6(desc, name) \
print_testname(desc); \
dotest(name##_spin, FAILURE, LOCKTYPE_SPIN); \
@@ -2121,8 +2156,8 @@ void locking_selftest(void)
DO_TESTCASE_6x6("safe-A + unsafe-B #2", irqsafe4);
DO_TESTCASE_6x6RW("irq lock-inversion", irq_inversion);

- DO_TESTCASE_6x2("irq read-recursion", irq_read_recursion);
-// DO_TESTCASE_6x2B("irq read-recursion #2", irq_read_recursion2);
+ DO_TESTCASE_6x2x2RW("irq read-recursion", irq_read_recursion);
+ DO_TESTCASE_6x2x2RW("irq read-recursion #2", irq_read_recursion2);

ww_tests();

--
2.28.0

2020-08-07 07:47:51

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 10/19] lockdep: Adjust check_redundant() for recursive read change

check_redundant() will report redundancy if it finds a path that could
replace the about-to-add dependency in the BFS search. With the recursive
read lock changes, we certainly need to change the match function for
check_redundant(), because the path needs to match not only the lock
class but also the dependency kinds. For example, if the about-to-add
dependency @prev -> @next is A -(SN)-> B, and we find a path A -(S*)->
.. -(*R)-> B in the dependency graph with __bfs() (for simplicity, we can
also say we find an -(SR)-> path from A to B), we cannot replace the
dependency with that path in the BFS search, because the -(SN)->
dependency can make a strong path with a following -(S*)-> dependency,
however an -(SR)-> path cannot.

Further, we can replace an -(SN)-> dependency with a -(EN)-> path, that
means if we find a path which is stronger than or equal to the
about-to-add dependency, we can report the redundancy. By "stronger", it
means both the start and the end of the path are not weaker than the
start and the end of the dependency (E is "stronger" than S and N is
"stronger" than R), so that we can replace the dependency with that
path.

To make sure we find a path whose start point is not weaker than the
about-to-add dependency, we use a trick: the ->only_xr of the root
(start point) of __bfs() is initialized as @prev->read == 0, therefore if
@prev is E, __bfs() will pick only -(E*)-> for the first dependency,
otherwise, __bfs() can pick -(E*)-> or -(S*)-> for the first dependency.

To make sure we find a path whose end point is not weaker than the
about-to-add dependency, we replace the match function for __bfs() in
check_redundant(): we check for the case that either @next is R
(anything is not weaker than it) or the end point of the path is N
(which is not weaker than anything).
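
Put differently, whether a found path may replace the about-to-add
dependency boils down to two independent end-point checks. The snippet
below is a standalone illustration only (plain C booleans standing in for
the S/E and R/N annotations, not the kernel's data structures):

#include <stdbool.h>

/* start side: an -(E*)-> dependency can only be replaced by an -(E*)-> path */
static bool start_strong_enough(bool dep_is_S, bool path_is_S)
{
        return dep_is_S || !path_is_S;
}

/* end side: a -(*R)-> dependency is replaceable by anything,
 * and a -(*N)-> path can replace anything */
static bool end_strong_enough(bool dep_is_R, bool path_is_R)
{
        return dep_is_R || !path_is_R;
}

/* A -> .. -> B may replace A -> B only if both ends hold */
static bool path_replaces_dep(bool dep_S, bool dep_R, bool path_S, bool path_R)
{
        return start_strong_enough(dep_S, path_S) &&
               end_strong_enough(dep_R, path_R);
}

The start-side check is what the ->only_xr initialization described above
implements, and the end-side check is what the new match function
implements.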

Signed-off-by: Boqun Feng <[email protected]>
---
kernel/locking/lockdep.c | 47 +++++++++++++++++++++++++++++++++++++---
1 file changed, 44 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e5b2c1cf4286..85a4d3539faa 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1833,9 +1833,39 @@ print_circular_bug_header(struct lock_list *entry, unsigned int depth,
print_circular_bug_entry(entry, depth);
}

-static inline bool class_equal(struct lock_list *entry, void *data)
+/*
+ * We are about to add A -> B into the dependency graph, and in __bfs() a
+ * strong dependency path A -> .. -> B is found: hlock_class equals
+ * entry->class.
+ *
+ * If A -> .. -> B can replace A -> B in any __bfs() search (means the former
+ * is _stronger_ than or equal to the latter), we consider A -> B as redundant.
+ * For example if A -> .. -> B is -(EN)-> (i.e. A -(E*)-> .. -(*N)-> B), and A
+ * -> B is -(ER)-> or -(EN)->, then we don't need to add A -> B into the
+ * dependency graph, as any strong path ..-> A -> B ->.. we can get with
+ * having dependency A -> B, we could already get an equivalent path ..-> A ->
+ * .. -> B -> .. with A -> .. -> B. Therefore A -> B is redundant.
+ *
+ * We need to make sure both the start and the end of A -> .. -> B are not
+ * weaker than A -> B. For the start part, please see the comment in
+ * check_redundant(). For the end part, we need:
+ *
+ * Either
+ *
+ * a) A -> B is -(*R)-> (everything is not weaker than that)
+ *
+ * or
+ *
+ * b) A -> .. -> B is -(*N)-> (nothing is stronger than this)
+ *
+ */
+static inline bool hlock_equal(struct lock_list *entry, void *data)
{
- return entry->class == data;
+ struct held_lock *hlock = (struct held_lock *)data;
+
+ return hlock_class(hlock) == entry->class && /* Found A -> .. -> B */
+ (hlock->read == 2 || /* A -> B is -(*R)-> */
+ !entry->only_xr); /* A -> .. -> B is -(*N)-> */
}

/*
@@ -2045,10 +2075,21 @@ check_redundant(struct held_lock *src, struct held_lock *target)
struct lock_list src_entry;

bfs_init_root(&src_entry, src);
+ /*
+ * Special setup for check_redundant().
+ *
+ * To report redundant, we need to find a strong dependency path that
+ * is equal to or stronger than <src> -> <target>. So if <src> is E,
+ * we need to let __bfs() only search for a path starting at a -(E*)->,
+ * we achieve this by setting the initial node's ->only_xr to true in
+ * that case. And if <src> is S, we set initial ->only_xr to false
+ * because both -(S*)-> (equal) and -(E*)-> (stronger) are redundant.
+ */
+ src_entry.only_xr = src->read == 0;

debug_atomic_inc(nr_redundant_checks);

- ret = check_path(target, &src_entry, class_equal, &target_entry);
+ ret = check_path(target, &src_entry, hlock_equal, &target_entry);

if (ret == BFS_RMATCH)
debug_atomic_inc(nr_redundant);
--
2.28.0

2020-08-07 07:47:53

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 16/19] lockdep/selftest: Add more recursive read related test cases

Add those four test cases:

1. X --(ER)--> Y --(ER)--> Z --(ER)--> X is deadlock.

2. X --(EN)--> Y --(SR)--> Z --(ER)--> X is deadlock.

3. X --(EN)--> Y --(SR)--> Z --(SN)--> X is not deadlock.

4. X --(ER)--> Y --(SR)--> Z --(EN)--> X is not deadlock.

These selftest cases are valuable for developing support for
recursive-read-related deadlock detection.

Signed-off-by: Boqun Feng <[email protected]>
---
lib/locking-selftest.c | 161 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 161 insertions(+)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index f65a658cc9e3..76c314ab4f03 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -1034,6 +1034,133 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_inversion_soft_wlock)
#undef E2
#undef E3

+/*
+ * write-read / write-read / write-read deadlock even if read is recursive
+ */
+
+#define E1() \
+ \
+ WL(X1); \
+ RL(Y1); \
+ RU(Y1); \
+ WU(X1);
+
+#define E2() \
+ \
+ WL(Y1); \
+ RL(Z1); \
+ RU(Z1); \
+ WU(Y1);
+
+#define E3() \
+ \
+ WL(Z1); \
+ RL(X1); \
+ RU(X1); \
+ WU(Z1);
+
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(W1R2_W2R3_W3R1)
+
+#undef E1
+#undef E2
+#undef E3
+
+/*
+ * write-write / read-read / write-read deadlock even if read is recursive
+ */
+
+#define E1() \
+ \
+ WL(X1); \
+ WL(Y1); \
+ WU(Y1); \
+ WU(X1);
+
+#define E2() \
+ \
+ RL(Y1); \
+ RL(Z1); \
+ RU(Z1); \
+ RU(Y1);
+
+#define E3() \
+ \
+ WL(Z1); \
+ RL(X1); \
+ RU(X1); \
+ WU(Z1);
+
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(W1W2_R2R3_W3R1)
+
+#undef E1
+#undef E2
+#undef E3
+
+/*
+ * write-write / read-read / read-write is not deadlock when read is recursive
+ */
+
+#define E1() \
+ \
+ WL(X1); \
+ WL(Y1); \
+ WU(Y1); \
+ WU(X1);
+
+#define E2() \
+ \
+ RL(Y1); \
+ RL(Z1); \
+ RU(Z1); \
+ RU(Y1);
+
+#define E3() \
+ \
+ RL(Z1); \
+ WL(X1); \
+ WU(X1); \
+ RU(Z1);
+
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(W1R2_R2R3_W3W1)
+
+#undef E1
+#undef E2
+#undef E3
+
+/*
+ * write-read / read-read / write-write is not deadlock when read is recursive
+ */
+
+#define E1() \
+ \
+ WL(X1); \
+ RL(Y1); \
+ RU(Y1); \
+ WU(X1);
+
+#define E2() \
+ \
+ RL(Y1); \
+ RL(Z1); \
+ RU(Z1); \
+ RU(Y1);
+
+#define E3() \
+ \
+ WL(Z1); \
+ WL(X1); \
+ WU(X1); \
+ WU(Z1);
+
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(W1W2_R2R3_R3W1)
+
+#undef E1
+#undef E2
+#undef E3
/*
* read-lock / write-lock recursion that is actually safe.
*/
@@ -1259,6 +1386,19 @@ static inline void print_testname(const char *testname)
dotest(name##_##nr, FAILURE, LOCKTYPE_RWLOCK); \
pr_cont("\n");

+#define DO_TESTCASE_1RR(desc, name, nr) \
+ print_testname(desc"/"#nr); \
+ pr_cont(" |"); \
+ dotest(name##_##nr, SUCCESS, LOCKTYPE_RWLOCK); \
+ pr_cont("\n");
+
+#define DO_TESTCASE_1RRB(desc, name, nr) \
+ print_testname(desc"/"#nr); \
+ pr_cont(" |"); \
+ dotest(name##_##nr, FAILURE, LOCKTYPE_RWLOCK); \
+ pr_cont("\n");
+
+
#define DO_TESTCASE_3(desc, name, nr) \
print_testname(desc"/"#nr); \
dotest(name##_spin_##nr, FAILURE, LOCKTYPE_SPIN); \
@@ -1368,6 +1508,22 @@ static inline void print_testname(const char *testname)
DO_TESTCASE_2IB(desc, name, 312); \
DO_TESTCASE_2IB(desc, name, 321);

+#define DO_TESTCASE_6x1RR(desc, name) \
+ DO_TESTCASE_1RR(desc, name, 123); \
+ DO_TESTCASE_1RR(desc, name, 132); \
+ DO_TESTCASE_1RR(desc, name, 213); \
+ DO_TESTCASE_1RR(desc, name, 231); \
+ DO_TESTCASE_1RR(desc, name, 312); \
+ DO_TESTCASE_1RR(desc, name, 321);
+
+#define DO_TESTCASE_6x1RRB(desc, name) \
+ DO_TESTCASE_1RRB(desc, name, 123); \
+ DO_TESTCASE_1RRB(desc, name, 132); \
+ DO_TESTCASE_1RRB(desc, name, 213); \
+ DO_TESTCASE_1RRB(desc, name, 231); \
+ DO_TESTCASE_1RRB(desc, name, 312); \
+ DO_TESTCASE_1RRB(desc, name, 321);
+
#define DO_TESTCASE_6x6(desc, name) \
DO_TESTCASE_6I(desc, name, 123); \
DO_TESTCASE_6I(desc, name, 132); \
@@ -2144,6 +2300,11 @@ void locking_selftest(void)
pr_cont(" |");
dotest(rlock_chaincache_ABBA1, FAILURE, LOCKTYPE_RWLOCK);

+ DO_TESTCASE_6x1RRB("rlock W1R2/W2R3/W3R1", W1R2_W2R3_W3R1);
+ DO_TESTCASE_6x1RRB("rlock W1W2/R2R3/W3R1", W1W2_R2R3_W3R1);
+ DO_TESTCASE_6x1RR("rlock W1W2/R2R3/R3W1", W1W2_R2R3_R3W1);
+ DO_TESTCASE_6x1RR("rlock W1R2/R2R3/W3W1", W1R2_R2R3_W3W1);
+
printk(" --------------------------------------------------------------------------\n");

/*
--
2.28.0

2020-08-07 07:47:56

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 08/19] lockdep: Make __bfs(.match) return bool

The "match" parameter of __bfs() is used for checking whether we hit a
match in the search, therefore it should return a boolean value rather
than an integer for better readability.

This patch then changes the return type of the function parameter and the
match functions to bool.

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
---
kernel/locking/lockdep.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index bb8b7e42c154..62f7f88e3673 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1620,7 +1620,7 @@ static inline void bfs_init_rootb(struct lock_list *lock,
*/
static enum bfs_result __bfs(struct lock_list *source_entry,
void *data,
- int (*match)(struct lock_list *entry, void *data),
+ bool (*match)(struct lock_list *entry, void *data),
struct lock_list **target_entry,
int offset)
{
@@ -1711,7 +1711,7 @@ static enum bfs_result __bfs(struct lock_list *source_entry,
static inline enum bfs_result
__bfs_forwards(struct lock_list *src_entry,
void *data,
- int (*match)(struct lock_list *entry, void *data),
+ bool (*match)(struct lock_list *entry, void *data),
struct lock_list **target_entry)
{
return __bfs(src_entry, data, match, target_entry,
@@ -1722,7 +1722,7 @@ __bfs_forwards(struct lock_list *src_entry,
static inline enum bfs_result
__bfs_backwards(struct lock_list *src_entry,
void *data,
- int (*match)(struct lock_list *entry, void *data),
+ bool (*match)(struct lock_list *entry, void *data),
struct lock_list **target_entry)
{
return __bfs(src_entry, data, match, target_entry,
@@ -1833,7 +1833,7 @@ print_circular_bug_header(struct lock_list *entry, unsigned int depth,
print_circular_bug_entry(entry, depth);
}

-static inline int class_equal(struct lock_list *entry, void *data)
+static inline bool class_equal(struct lock_list *entry, void *data)
{
return entry->class == data;
}
@@ -1888,10 +1888,10 @@ static noinline void print_bfs_bug(int ret)
WARN(1, "lockdep bfs error:%d\n", ret);
}

-static int noop_count(struct lock_list *entry, void *data)
+static bool noop_count(struct lock_list *entry, void *data)
{
(*(unsigned long *)data)++;
- return 0;
+ return false;
}

static unsigned long __lockdep_count_forward_deps(struct lock_list *this)
@@ -2032,11 +2032,11 @@ check_redundant(struct held_lock *src, struct held_lock *target)

#ifdef CONFIG_TRACE_IRQFLAGS

-static inline int usage_accumulate(struct lock_list *entry, void *mask)
+static inline bool usage_accumulate(struct lock_list *entry, void *mask)
{
*(unsigned long *)mask |= entry->class->usage_mask;

- return 0;
+ return false;
}

/*
@@ -2045,9 +2045,9 @@ static inline int usage_accumulate(struct lock_list *entry, void *mask)
* without creating any illegal irq-safe -> irq-unsafe lock dependency.
*/

-static inline int usage_match(struct lock_list *entry, void *mask)
+static inline bool usage_match(struct lock_list *entry, void *mask)
{
- return entry->class->usage_mask & *(unsigned long *)mask;
+ return !!(entry->class->usage_mask & *(unsigned long *)mask);
}

/*
--
2.28.0

2020-08-07 07:48:06

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 12/19] lockdep: Add recursive read locks into dependency graph

Since we have all the fundamental mechanisms to handle recursive read
locks, we can now add them into the dependency graph.

Signed-off-by: Boqun Feng <[email protected]>
---
kernel/locking/lockdep.c | 19 ++-----------------
1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 040509667798..867199c4b85d 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2808,16 +2808,6 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
if (!check_irq_usage(curr, prev, next))
return 0;

- /*
- * For recursive read-locks we do all the dependency checks,
- * but we dont store read-triggered dependencies (only
- * write-triggered dependencies). This ensures that only the
- * write-side dependencies matter, and that if for example a
- * write-lock never takes any other locks, then the reads are
- * equivalent to a NOP.
- */
- if (next->read == 2 || prev->read == 2)
- return 1;
/*
* Is the <prev> -> <next> dependency already present?
*
@@ -2935,13 +2925,8 @@ check_prevs_add(struct task_struct *curr, struct held_lock *next)
u16 distance = curr->lockdep_depth - depth + 1;
hlock = curr->held_locks + depth - 1;

- /*
- * Only non-recursive-read entries get new dependencies
- * added:
- */
- if (hlock->read != 2 && hlock->check) {
- int ret = check_prev_add(curr, hlock, next, distance,
- &trace);
+ if (hlock->check) {
+ int ret = check_prev_add(curr, hlock, next, distance, &trace);
if (!ret)
return 0;

--
2.28.0

2020-08-07 07:48:14

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 17/19] Revert "locking/lockdep/selftests: Fix mixed read-write ABBA tests"

This reverts commit d82fed75294229abc9d757f08a4817febae6c4f4.

Since we can now handle mixed read-write deadlock detection well, the
deadlocks in the self tests are detected as expected, so there is no
need for this work-around.

Signed-off-by: Boqun Feng <[email protected]>
---
lib/locking-selftest.c | 8 --------
1 file changed, 8 deletions(-)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 76c314ab4f03..4264cf4b60bb 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -2273,14 +2273,6 @@ void locking_selftest(void)
print_testname("mixed read-lock/lock-write ABBA");
pr_cont(" |");
dotest(rlock_ABBA1, FAILURE, LOCKTYPE_RWLOCK);
-#ifdef CONFIG_PROVE_LOCKING
- /*
- * Lockdep does indeed fail here, but there's nothing we can do about
- * that now. Don't kill lockdep for it.
- */
- unexpected_testcase_failures--;
-#endif
-
pr_cont(" |");
dotest(rwsem_ABBA1, FAILURE, LOCKTYPE_RWSEM);

--
2.28.0

2020-08-07 07:48:28

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 13/19] lockdep/selftest: Add a R-L/L-W test case specific to chain cache behavior

Since our chain cache doesn't differentiate read/write locks, even though
we can detect a read-lock/lock-write deadlock in check_noncircular(), we
can still be fooled if a read-lock/lock-read case (which is not a
deadlock) comes first.

So introduce this test case to specifically exercise the chain cache
behavior when detecting recursive read lock related deadlocks.

Signed-off-by: Boqun Feng <[email protected]>
---
lib/locking-selftest.c | 47 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index caadc4dd3368..002d1ec09852 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -396,6 +396,49 @@ static void rwsem_ABBA1(void)
MU(Y1); // should fail
}

+/*
+ * read_lock(A)
+ * spin_lock(B)
+ *              spin_lock(B)
+ *              write_lock(A)
+ *
+ * This test case is aimed at poking whether the chain cache prevents us from
+ * detecting a read-lock/lock-write deadlock: if the chain cache doesn't
+ * differentiate read/write locks, the following case may happen
+ *
+ * { read_lock(A)->lock(B) dependency exists }
+ *
+ * P0:
+ * lock(B);
+ * read_lock(A);
+ *
+ * { Not a deadlock, B -> A is added in the chain cache }
+ *
+ * P1:
+ * lock(B);
+ * write_lock(A);
+ *
+ * { B->A found in chain cache, not reported as a deadlock }
+ *
+ */
+static void rlock_chaincache_ABBA1(void)
+{
+ RL(X1);
+ L(Y1);
+ U(Y1);
+ RU(X1);
+
+ L(Y1);
+ RL(X1);
+ RU(X1);
+ U(Y1);
+
+ L(Y1);
+ WL(X1);
+ WU(X1);
+ U(Y1); // should fail
+}
+
/*
* read_lock(A)
* spin_lock(B)
@@ -2062,6 +2105,10 @@ void locking_selftest(void)
pr_cont(" |");
dotest(rwsem_ABBA3, FAILURE, LOCKTYPE_RWSEM);

+ print_testname("chain cached mixed R-L/L-W ABBA");
+ pr_cont(" |");
+ dotest(rlock_chaincache_ABBA1, FAILURE, LOCKTYPE_RWLOCK);
+
printk(" --------------------------------------------------------------------------\n");

/*
--
2.28.0

2020-08-07 07:48:44

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 19/19] lockdep/selftest: Introduce recursion3

Add a test case which shows that USED_IN_*_READ and ENABLED_*_READ usage
can cause deadlocks too.

Signed-off-by: Boqun Feng <[email protected]>
---
lib/locking-selftest.c | 55 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 17f8f6f37165..a899b3f0e2e5 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -1249,6 +1249,60 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_soft_rlock)
#include "locking-selftest-wlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_soft_wlock)

+#undef E1
+#undef E2
+#undef E3
+/*
+ * read-lock / write-lock recursion that is unsafe.
+ *
+ * A is an ENABLED_*_READ lock
+ * B is a USED_IN_*_READ lock
+ *
+ * read_lock(A);
+ * write_lock(B);
+ * <interrupt>
+ * read_lock(B);
+ * write_lock(A); // if this one is read_lock(), no deadlock
+ */
+
+#define E1() \
+ \
+ IRQ_DISABLE(); \
+ WL(B); \
+ LOCK(A); \
+ UNLOCK(A); \
+ WU(B); \
+ IRQ_ENABLE();
+
+#define E2() \
+ \
+ RL(A); \
+ RU(A); \
+
+#define E3() \
+ \
+ IRQ_ENTER(); \
+ RL(B); \
+ RU(B); \
+ IRQ_EXIT();
+
+/*
+ * Generate 24 testcases:
+ */
+#include "locking-selftest-hardirq.h"
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion3_hard_rlock)
+
+#include "locking-selftest-wlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion3_hard_wlock)
+
+#include "locking-selftest-softirq.h"
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion3_soft_rlock)
+
+#include "locking-selftest-wlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion3_soft_wlock)
+
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define I_SPINLOCK(x) lockdep_reset_lock(&lock_##x.dep_map)
# define I_RWLOCK(x) lockdep_reset_lock(&rwlock_##x.dep_map)
@@ -2413,6 +2467,7 @@ void locking_selftest(void)

DO_TESTCASE_6x2x2RW("irq read-recursion", irq_read_recursion);
DO_TESTCASE_6x2x2RW("irq read-recursion #2", irq_read_recursion2);
+ DO_TESTCASE_6x2x2RW("irq read-recursion #3", irq_read_recursion3);

ww_tests();

--
2.28.0

2020-08-07 07:50:20

by Boqun Feng

[permalink] [raw]
Subject: [RFC v7 18/19] locking/selftest: Add test cases for queued_read_lock()

Add two self test cases for the following case:

P0:                     P1:                     P2:

                        <in irq handler>
spin_lock_irq(&slock)   read_lock(&rwlock)
                                                write_lock_irq(&rwlock)
read_lock(&rwlock)      spin_lock(&slock)

, which is a deadlock, as the read_lock() on P0 cannot get the lock
because of the fairness.

P0:                     P1:                     P2:

<in irq handler>
spin_lock(&slock)       read_lock(&rwlock)
                                                write_lock(&rwlock)
read_lock(&rwlock)      spin_lock_irq(&slock)

, which is not a deadlock, as the read_lock() on P0 can get the lock
because it can use the unfair fastpath.
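
The difference between the two cases comes from the unfair in-interrupt
reader path of queued rwlocks. A toy model of just that property
(illustration only; the type, fields and logic below are made up and are
not the real qrwlock implementation):

#include <stdbool.h>
#include <stdio.h>

struct toy_rwlock {
        int  readers;
        bool writer_held;       /* a writer currently holds the lock */
        bool writer_queued;     /* a writer is waiting for the lock  */
};

/* Readers in interrupt context ignore *queued* writers (unfair), but
 * everyone has to wait for a writer that already holds the lock. */
static bool toy_read_trylock(struct toy_rwlock *l, bool in_irq)
{
        if (l->writer_held)
                return false;
        if (l->writer_queued && !in_irq)
                return false;           /* fair path: queue behind the writer */
        l->readers++;
        return true;
}

int main(void)
{
        struct toy_rwlock l = { .readers = 1, .writer_queued = true };

        printf("task reader:   %d\n", toy_read_trylock(&l, false)); /* 0: blocked  */
        printf("in-irq reader: %d\n", toy_read_trylock(&l, true));  /* 1: proceeds */
        return 0;
}

In the first case P0's read_lock() runs in task context with irqs
disabled, so it takes the fair path and waits behind P2's queued writer;
in the second case P0's read_lock() runs from the irq handler, so the
queued writer cannot block it.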

Signed-off-by: Boqun Feng <[email protected]>
---
lib/locking-selftest.c | 104 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 104 insertions(+)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 4264cf4b60bb..17f8f6f37165 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -2201,6 +2201,108 @@ static void ww_tests(void)
pr_cont("\n");
}

+
+/*
+ * <in hardirq handler>
+ * read_lock(&A);
+ * <hardirq disable>
+ * spin_lock(&B);
+ * spin_lock(&B);
+ * read_lock(&A);
+ *
+ * is a deadlock.
+ */
+static void queued_read_lock_hardirq_RE_Er(void)
+{
+ HARDIRQ_ENTER();
+ read_lock(&rwlock_A);
+ LOCK(B);
+ UNLOCK(B);
+ read_unlock(&rwlock_A);
+ HARDIRQ_EXIT();
+
+ HARDIRQ_DISABLE();
+ LOCK(B);
+ read_lock(&rwlock_A);
+ read_unlock(&rwlock_A);
+ UNLOCK(B);
+ HARDIRQ_ENABLE();
+}
+
+/*
+ * <in hardirq handler>
+ * spin_lock(&B);
+ * <hardirq disable>
+ * read_lock(&A);
+ * read_lock(&A);
+ * spin_lock(&B);
+ *
+ * is not a deadlock.
+ */
+static void queued_read_lock_hardirq_ER_rE(void)
+{
+ HARDIRQ_ENTER();
+ LOCK(B);
+ read_lock(&rwlock_A);
+ read_unlock(&rwlock_A);
+ UNLOCK(B);
+ HARDIRQ_EXIT();
+
+ HARDIRQ_DISABLE();
+ read_lock(&rwlock_A);
+ LOCK(B);
+ UNLOCK(B);
+ read_unlock(&rwlock_A);
+ HARDIRQ_ENABLE();
+}
+
+/*
+ * <hardirq disable>
+ * spin_lock(&B);
+ * read_lock(&A);
+ * <in hardirq handler>
+ * spin_lock(&B);
+ * read_lock(&A);
+ *
+ * is a deadlock. Because the two read_lock()s are both non-recursive readers.
+ */
+static void queued_read_lock_hardirq_inversion(void)
+{
+
+ HARDIRQ_ENTER();
+ LOCK(B);
+ UNLOCK(B);
+ HARDIRQ_EXIT();
+
+ HARDIRQ_DISABLE();
+ LOCK(B);
+ read_lock(&rwlock_A);
+ read_unlock(&rwlock_A);
+ UNLOCK(B);
+ HARDIRQ_ENABLE();
+
+ read_lock(&rwlock_A);
+ read_unlock(&rwlock_A);
+}
+
+static void queued_read_lock_tests(void)
+{
+ printk(" --------------------------------------------------------------------------\n");
+ printk(" | queued read lock tests |\n");
+ printk(" ---------------------------\n");
+ print_testname("hardirq read-lock/lock-read");
+ dotest(queued_read_lock_hardirq_RE_Er, FAILURE, LOCKTYPE_RWLOCK);
+ pr_cont("\n");
+
+ print_testname("hardirq lock-read/read-lock");
+ dotest(queued_read_lock_hardirq_ER_rE, SUCCESS, LOCKTYPE_RWLOCK);
+ pr_cont("\n");
+
+ print_testname("hardirq inversion");
+ dotest(queued_read_lock_hardirq_inversion, FAILURE, LOCKTYPE_RWLOCK);
+ pr_cont("\n");
+}
+
void locking_selftest(void)
{
/*
@@ -2318,6 +2420,8 @@ void locking_selftest(void)
/*
* queued_read_lock() specific test cases can be put here
*/
+ if (IS_ENABLED(CONFIG_QUEUED_RWLOCKS))
+ queued_read_lock_tests();

if (unexpected_testcase_failures) {
printk("-----------------------------------------------------------------\n");
--
2.28.0

2020-08-21 17:43:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC v7 14/19] lockdep: Take read/write status in consideration when generate chainkey


So far so good, excellent work.

On Fri, Aug 07, 2020 at 03:42:33PM +0800, Boqun Feng wrote:
> @@ -371,6 +371,21 @@ static struct hlist_head classhash_table[CLASSHASH_SIZE];
>
> static struct hlist_head chainhash_table[CHAINHASH_SIZE];
>
> +/*
> + * the id of held_lock
> + */
> +static inline u16 hlock_id(struct held_lock *hlock)
> +{
> + BUILD_BUG_ON(MAX_LOCKDEP_KEYS_BITS + 2 > 16);
> +
> + return (hlock->class_idx | (hlock->read << MAX_LOCKDEP_KEYS_BITS));
> +}
> +
> +static inline unsigned int chain_hlock_class_idx(u16 hlock_id)
> +{
> + return hlock_id & MAX_LOCKDEP_KEYS;

But did that want to be:

return hlock_id & (MAX_LOCKDEP_KEYS-1);

?

> +}

2020-08-21 19:59:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC v7 00/19] lockdep: Support deadlock detection for recursive read locks

On Fri, Aug 07, 2020 at 03:42:19PM +0800, Boqun Feng wrote:
> Hi Peter and Waiman,
>
> As promised, this is the updated version of my previous lockdep patchset
> for recursive read lock support. It's based on v5.8. Previous versions
> can be found at:

OK, this all looks really nice.

I've stuck it in my locking/core branch for testing, I've had to fix a
few minor rejects (my bad for being too slow), made a few minor edits and
fixed that one masking thing.

It seems to boot with the selftests all green, haven't done much else
with it yet, we'll see.

git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git locking/core

Thanks!

2020-08-22 02:54:01

by Boqun Feng

[permalink] [raw]
Subject: Re: [RFC v7 14/19] lockdep: Take read/write status in consideration when generate chainkey

On Fri, Aug 21, 2020 at 07:41:32PM +0200, Peter Zijlstra wrote:
>
> So far so good, excellent work.
>
> On Fri, Aug 07, 2020 at 03:42:33PM +0800, Boqun Feng wrote:
> > @@ -371,6 +371,21 @@ static struct hlist_head classhash_table[CLASSHASH_SIZE];
> >
> > static struct hlist_head chainhash_table[CHAINHASH_SIZE];
> >
> > +/*
> > + * the id of held_lock
> > + */
> > +static inline u16 hlock_id(struct held_lock *hlock)
> > +{
> > + BUILD_BUG_ON(MAX_LOCKDEP_KEYS_BITS + 2 > 16);
> > +
> > + return (hlock->class_idx | (hlock->read << MAX_LOCKDEP_KEYS_BITS));
> > +}
> > +
> > +static inline unsigned int chain_hlock_class_idx(u16 hlock_id)
> > +{
> > + return hlock_id & MAX_LOCKDEP_KEYS;
>
> But did that want to be:
>
> return hlock_id & (MAX_LOCKDEP_KEYS-1);
>

Right, clearly I missed the fact that we changed the definition of
MAX_LOCKDEP_KEYS at commit 01bb6f0af992 ("locking/lockdep: Change the
range of class_idx in held_lock struct").
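
To spell out the effect with the new definition (standalone illustration,
using MAX_LOCKDEP_KEYS_BITS == 13 as in the current tree):

#include <stdint.h>
#include <stdio.h>

#define MAX_LOCKDEP_KEYS_BITS   13
#define MAX_LOCKDEP_KEYS        (1UL << MAX_LOCKDEP_KEYS_BITS)

int main(void)
{
        uint16_t id = 5 | (1 << MAX_LOCKDEP_KEYS_BITS); /* class_idx 5, read 1 */

        printf("%lx\n", id & MAX_LOCKDEP_KEYS);         /* 2000: class index lost */
        printf("%lx\n", id & (MAX_LOCKDEP_KEYS - 1));   /* 5: intended class index */
        return 0;
}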

Thanks for catching this!

Regards,
Boqun

> ?
>
> > +}

2020-08-23 01:24:38

by Boqun Feng

[permalink] [raw]
Subject: Re: [RFC v7 00/19] lockdep: Support deadlock detection for recursive read locks

On Fri, Aug 21, 2020 at 09:56:41PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 07, 2020 at 03:42:19PM +0800, Boqun Feng wrote:
> > Hi Peter and Waiman,
> >
> > As promised, this is the updated version of my previous lockdep patchset
> > for recursive read lock support. It's based on v5.8. Previous versions
> > can be found at:
>
> OK, this all looks really nice.
>
> I've stuck it in my locking/core branch for testing, I've had to fix a
> few minor rejects (my bad for being too slow), made a few minor edits and
> fixed that one masking thing.
>

Thanks!

Regards,
Boqun

> It seems to boot with the selftests all green, haven't done much else
> with it yet, we'll see.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git locking/core
>
> Thanks!

Subject: [tip: locking/core] Revert "locking/lockdep/selftests: Fix mixed read-write ABBA tests"

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 108dc42ed3507fe06214d51ab15fca7771df8bbd
Gitweb: https://git.kernel.org/tip/108dc42ed3507fe06214d51ab15fca7771df8bbd
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:36 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:07 +02:00

Revert "locking/lockdep/selftests: Fix mixed read-write ABBA tests"

This reverts commit d82fed75294229abc9d757f08a4817febae6c4f4.

Since we can now handle mixed read-write deadlock detection well, the
deadlocks in the self tests are detected as expected, so there is no
need for this work-around.

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
lib/locking-selftest.c | 8 --------
1 file changed, 8 deletions(-)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 76c314a..4264cf4 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -2273,14 +2273,6 @@ void locking_selftest(void)
print_testname("mixed read-lock/lock-write ABBA");
pr_cont(" |");
dotest(rlock_ABBA1, FAILURE, LOCKTYPE_RWLOCK);
-#ifdef CONFIG_PROVE_LOCKING
- /*
- * Lockdep does indeed fail here, but there's nothing we can do about
- * that now. Don't kill lockdep for it.
- */
- unexpected_testcase_failures--;
-#endif
-
pr_cont(" |");
dotest(rwsem_ABBA1, FAILURE, LOCKTYPE_RWSEM);

Subject: [tip: locking/core] lockdep: Make __bfs(.match) return bool

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 61775ed243433ff0556c4f76905929fe01e92922
Gitweb: https://git.kernel.org/tip/61775ed243433ff0556c4f76905929fe01e92922
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:27 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:05 +02:00

lockdep: Make __bfs(.match) return bool

The "match" parameter of __bfs() is used for checking whether we hit a
match in the search, therefore it should return a boolean value rather
than an integer for better readability.

This patch then changes the return type of the function parameter and the
match functions to bool.

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/locking/lockdep.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 5abc227..78cd74d 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1620,7 +1620,7 @@ static inline void bfs_init_rootb(struct lock_list *lock,
*/
static enum bfs_result __bfs(struct lock_list *source_entry,
void *data,
- int (*match)(struct lock_list *entry, void *data),
+ bool (*match)(struct lock_list *entry, void *data),
struct lock_list **target_entry,
int offset)
{
@@ -1711,7 +1711,7 @@ exit:
static inline enum bfs_result
__bfs_forwards(struct lock_list *src_entry,
void *data,
- int (*match)(struct lock_list *entry, void *data),
+ bool (*match)(struct lock_list *entry, void *data),
struct lock_list **target_entry)
{
return __bfs(src_entry, data, match, target_entry,
@@ -1722,7 +1722,7 @@ __bfs_forwards(struct lock_list *src_entry,
static inline enum bfs_result
__bfs_backwards(struct lock_list *src_entry,
void *data,
- int (*match)(struct lock_list *entry, void *data),
+ bool (*match)(struct lock_list *entry, void *data),
struct lock_list **target_entry)
{
return __bfs(src_entry, data, match, target_entry,
@@ -1833,7 +1833,7 @@ print_circular_bug_header(struct lock_list *entry, unsigned int depth,
print_circular_bug_entry(entry, depth);
}

-static inline int class_equal(struct lock_list *entry, void *data)
+static inline bool class_equal(struct lock_list *entry, void *data)
{
return entry->class == data;
}
@@ -1888,10 +1888,10 @@ static noinline void print_bfs_bug(int ret)
WARN(1, "lockdep bfs error:%d\n", ret);
}

-static int noop_count(struct lock_list *entry, void *data)
+static bool noop_count(struct lock_list *entry, void *data)
{
(*(unsigned long *)data)++;
- return 0;
+ return false;
}

static unsigned long __lockdep_count_forward_deps(struct lock_list *this)
@@ -2032,11 +2032,11 @@ check_redundant(struct held_lock *src, struct held_lock *target)

#ifdef CONFIG_TRACE_IRQFLAGS

-static inline int usage_accumulate(struct lock_list *entry, void *mask)
+static inline bool usage_accumulate(struct lock_list *entry, void *mask)
{
*(unsigned long *)mask |= entry->class->usage_mask;

- return 0;
+ return false;
}

/*
@@ -2045,9 +2045,9 @@ static inline int usage_accumulate(struct lock_list *entry, void *mask)
* without creating any illegal irq-safe -> irq-unsafe lock dependency.
*/

-static inline int usage_match(struct lock_list *entry, void *mask)
+static inline bool usage_match(struct lock_list *entry, void *mask)
{
- return entry->class->usage_mask & *(unsigned long *)mask;
+ return !!(entry->class->usage_mask & *(unsigned long *)mask);
}

/*

Subject: [tip: locking/core] lockdep: Adjust check_redundant() for recursive read change

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 68e305678583f13a67e2ce22088c2520bd4f97b4
Gitweb: https://git.kernel.org/tip/68e305678583f13a67e2ce22088c2520bd4f97b4
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:29 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:05 +02:00

lockdep: Adjust check_redundant() for recursive read change

check_redundant() will report redundancy if it finds a path that could
replace the about-to-add dependency in the BFS search. With the recursive
read lock changes, we certainly need to change the match function for
check_redundant(), because the path needs to match not only the lock
class but also the dependency kinds. For example, if the about-to-add
dependency @prev -> @next is A -(SN)-> B, and we find a path A -(S*)->
.. -(*R)-> B in the dependency graph with __bfs() (for simplicity, we can
also say we find an -(SR)-> path from A to B), we cannot replace the
dependency with that path in the BFS search, because the -(SN)->
dependency can make a strong path with a following -(S*)-> dependency,
however an -(SR)-> path cannot.

Further, we can replace an -(SN)-> dependency with a -(EN)-> path, that
means if we find a path which is stronger than or equal to the
about-to-add dependency, we can report the redundancy. By "stronger", it
means both the start and the end of the path are not weaker than the
start and the end of the dependency (E is "stronger" than S and N is
"stronger" than R), so that we can replace the dependency with that
path.

To make sure we find a path whose start point is not weaker than the
about-to-add dependency, we use a trick: the ->only_xr of the root
(start point) of __bfs() is initialized as @prev->read == 0, therefore if
@prev is E, __bfs() will pick only -(E*)-> for the first dependency,
otherwise, __bfs() can pick -(E*)-> or -(S*)-> for the first dependency.

To make sure we find a path whose end point is not weaker than the
about-to-add dependency, we replace the match function for __bfs() in
check_redundant(): we check for the case that either @next is R
(anything is not weaker than it) or the end point of the path is N
(which is not weaker than anything).

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/locking/lockdep.c | 47 ++++++++++++++++++++++++++++++++++++---
1 file changed, 44 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 9160f1d..42e2f1f 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1833,9 +1833,39 @@ print_circular_bug_header(struct lock_list *entry, unsigned int depth,
print_circular_bug_entry(entry, depth);
}

-static inline bool class_equal(struct lock_list *entry, void *data)
+/*
+ * We are about to add A -> B into the dependency graph, and in __bfs() a
+ * strong dependency path A -> .. -> B is found: hlock_class equals
+ * entry->class.
+ *
+ * If A -> .. -> B can replace A -> B in any __bfs() search (means the former
+ * is _stronger_ than or equal to the latter), we consider A -> B as redundant.
+ * For example if A -> .. -> B is -(EN)-> (i.e. A -(E*)-> .. -(*N)-> B), and A
+ * -> B is -(ER)-> or -(EN)->, then we don't need to add A -> B into the
+ * dependency graph, as any strong path ..-> A -> B ->.. we can get with
+ * having dependency A -> B, we could already get an equivalent path ..-> A ->
+ * .. -> B -> .. with A -> .. -> B. Therefore A -> B is redundant.
+ *
+ * We need to make sure both the start and the end of A -> .. -> B are not
+ * weaker than A -> B. For the start part, please see the comment in
+ * check_redundant(). For the end part, we need:
+ *
+ * Either
+ *
+ * a) A -> B is -(*R)-> (everything is not weaker than that)
+ *
+ * or
+ *
+ * b) A -> .. -> B is -(*N)-> (nothing is stronger than this)
+ *
+ */
+static inline bool hlock_equal(struct lock_list *entry, void *data)
{
- return entry->class == data;
+ struct held_lock *hlock = (struct held_lock *)data;
+
+ return hlock_class(hlock) == entry->class && /* Found A -> .. -> B */
+ (hlock->read == 2 || /* A -> B is -(*R)-> */
+ !entry->only_xr); /* A -> .. -> B is -(*N)-> */
}

/*
@@ -2045,10 +2075,21 @@ check_redundant(struct held_lock *src, struct held_lock *target)
struct lock_list src_entry;

bfs_init_root(&src_entry, src);
+ /*
+ * Special setup for check_redundant().
+ *
+ * To report redundant, we need to find a strong dependency path that
+ * is equal to or stronger than <src> -> <target>. So if <src> is E,
+ * we need to let __bfs() only search for a path starting at a -(E*)->,
+ * we achieve this by setting the initial node's ->only_xr to true in
+ * that case. And if <src> is S, we set initial ->only_xr to false
+ * because both -(S*)-> (equal) and -(E*)-> (stronger) are redundant.
+ */
+ src_entry.only_xr = src->read == 0;

debug_atomic_inc(nr_redundant_checks);

- ret = check_path(target, &src_entry, class_equal, &target_entry);
+ ret = check_path(target, &src_entry, hlock_equal, &target_entry);

if (ret == BFS_RMATCH)
debug_atomic_inc(nr_redundant);

Subject: [tip: locking/core] lockdep/selftest: Add more recursive read related test cases

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 8ef7ca75120a39167def40f41daefee013c4b5af
Gitweb: https://git.kernel.org/tip/8ef7ca75120a39167def40f41daefee013c4b5af
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:35 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:07 +02:00

lockdep/selftest: Add more recursive read related test cases

Add those four test cases:

1. X --(ER)--> Y --(ER)--> Z --(ER)--> X is deadlock.

2. X --(EN)--> Y --(SR)--> Z --(ER)--> X is deadlock.

3. X --(EN)--> Y --(SR)--> Z --(SN)--> X is not deadlock.

4. X --(ER)--> Y --(SR)--> Z --(EN)--> X is not deadlock.

Those selftest cases are valuable for the development of support for
recursive-read-related deadlock detection.

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
lib/locking-selftest.c | 161 ++++++++++++++++++++++++++++++++++++++++-
1 file changed, 161 insertions(+)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index f65a658..76c314a 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -1035,6 +1035,133 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_inversion_soft_wlock)
#undef E3

/*
+ * write-read / write-read / write-read deadlock even if read is recursive
+ */
+
+#define E1() \
+ \
+ WL(X1); \
+ RL(Y1); \
+ RU(Y1); \
+ WU(X1);
+
+#define E2() \
+ \
+ WL(Y1); \
+ RL(Z1); \
+ RU(Z1); \
+ WU(Y1);
+
+#define E3() \
+ \
+ WL(Z1); \
+ RL(X1); \
+ RU(X1); \
+ WU(Z1);
+
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(W1R2_W2R3_W3R1)
+
+#undef E1
+#undef E2
+#undef E3
+
+/*
+ * write-write / read-read / write-read deadlock even if read is recursive
+ */
+
+#define E1() \
+ \
+ WL(X1); \
+ WL(Y1); \
+ WU(Y1); \
+ WU(X1);
+
+#define E2() \
+ \
+ RL(Y1); \
+ RL(Z1); \
+ RU(Z1); \
+ RU(Y1);
+
+#define E3() \
+ \
+ WL(Z1); \
+ RL(X1); \
+ RU(X1); \
+ WU(Z1);
+
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(W1W2_R2R3_W3R1)
+
+#undef E1
+#undef E2
+#undef E3
+
+/*
+ * write-write / read-read / read-write is not deadlock when read is recursive
+ */
+
+#define E1() \
+ \
+ WL(X1); \
+ WL(Y1); \
+ WU(Y1); \
+ WU(X1);
+
+#define E2() \
+ \
+ RL(Y1); \
+ RL(Z1); \
+ RU(Z1); \
+ RU(Y1);
+
+#define E3() \
+ \
+ RL(Z1); \
+ WL(X1); \
+ WU(X1); \
+ RU(Z1);
+
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(W1R2_R2R3_W3W1)
+
+#undef E1
+#undef E2
+#undef E3
+
+/*
+ * write-read / read-read / write-write is not deadlock when read is recursive
+ */
+
+#define E1() \
+ \
+ WL(X1); \
+ RL(Y1); \
+ RU(Y1); \
+ WU(X1);
+
+#define E2() \
+ \
+ RL(Y1); \
+ RL(Z1); \
+ RU(Z1); \
+ RU(Y1);
+
+#define E3() \
+ \
+ WL(Z1); \
+ WL(X1); \
+ WU(X1); \
+ WU(Z1);
+
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(W1W2_R2R3_R3W1)
+
+#undef E1
+#undef E2
+#undef E3
+/*
* read-lock / write-lock recursion that is actually safe.
*/

@@ -1259,6 +1386,19 @@ static inline void print_testname(const char *testname)
dotest(name##_##nr, FAILURE, LOCKTYPE_RWLOCK); \
pr_cont("\n");

+#define DO_TESTCASE_1RR(desc, name, nr) \
+ print_testname(desc"/"#nr); \
+ pr_cont(" |"); \
+ dotest(name##_##nr, SUCCESS, LOCKTYPE_RWLOCK); \
+ pr_cont("\n");
+
+#define DO_TESTCASE_1RRB(desc, name, nr) \
+ print_testname(desc"/"#nr); \
+ pr_cont(" |"); \
+ dotest(name##_##nr, FAILURE, LOCKTYPE_RWLOCK); \
+ pr_cont("\n");
+
+
#define DO_TESTCASE_3(desc, name, nr) \
print_testname(desc"/"#nr); \
dotest(name##_spin_##nr, FAILURE, LOCKTYPE_SPIN); \
@@ -1368,6 +1508,22 @@ static inline void print_testname(const char *testname)
DO_TESTCASE_2IB(desc, name, 312); \
DO_TESTCASE_2IB(desc, name, 321);

+#define DO_TESTCASE_6x1RR(desc, name) \
+ DO_TESTCASE_1RR(desc, name, 123); \
+ DO_TESTCASE_1RR(desc, name, 132); \
+ DO_TESTCASE_1RR(desc, name, 213); \
+ DO_TESTCASE_1RR(desc, name, 231); \
+ DO_TESTCASE_1RR(desc, name, 312); \
+ DO_TESTCASE_1RR(desc, name, 321);
+
+#define DO_TESTCASE_6x1RRB(desc, name) \
+ DO_TESTCASE_1RRB(desc, name, 123); \
+ DO_TESTCASE_1RRB(desc, name, 132); \
+ DO_TESTCASE_1RRB(desc, name, 213); \
+ DO_TESTCASE_1RRB(desc, name, 231); \
+ DO_TESTCASE_1RRB(desc, name, 312); \
+ DO_TESTCASE_1RRB(desc, name, 321);
+
#define DO_TESTCASE_6x6(desc, name) \
DO_TESTCASE_6I(desc, name, 123); \
DO_TESTCASE_6I(desc, name, 132); \
@@ -2144,6 +2300,11 @@ void locking_selftest(void)
pr_cont(" |");
dotest(rlock_chaincache_ABBA1, FAILURE, LOCKTYPE_RWLOCK);

+ DO_TESTCASE_6x1RRB("rlock W1R2/W2R3/W3R1", W1R2_W2R3_W3R1);
+ DO_TESTCASE_6x1RRB("rlock W1W2/R2R3/W3R1", W1W2_R2R3_W3R1);
+ DO_TESTCASE_6x1RR("rlock W1W2/R2R3/R3W1", W1W2_R2R3_R3W1);
+ DO_TESTCASE_6x1RR("rlock W1R2/R2R3/W3W1", W1R2_R2R3_W3W1);
+
printk(" --------------------------------------------------------------------------\n");

/*

Subject: [tip: locking/core] locking/selftest: Add test cases for queued_read_lock()

The following commit has been merged into the locking/core branch of tip:

Commit-ID: ad56450db86413ff911eb527b5a49e04a4345e61
Gitweb: https://git.kernel.org/tip/ad56450db86413ff911eb527b5a49e04a4345e61
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:37 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:07 +02:00

locking/selftest: Add test cases for queued_read_lock()

Add two self test cases for the following case:

P0: P1: P2:

<in irq handler>
spin_lock_irq(&slock) read_lock(&rwlock)
write_lock_irq(&rwlock)
read_lock(&rwlock) spin_lock(&slock)

, which is a deadlock, as the read_lock() on P0 cannot get the lock
because of the fairness of queued rwlocks.

P0: P1: P2:

<in irq handler>
spin_lock(&slock) read_lock(&rwlock)
write_lock(&rwlock)
read_lock(&rwlock) spin_lock_irq(&slock)

, which is not a deadlock, as the read_lock() on P0 can get the lock
because it could use the unfair fastpath.

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
lib/locking-selftest.c | 104 ++++++++++++++++++++++++++++++++++++++++-
1 file changed, 104 insertions(+)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 4264cf4..17f8f6f 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -2201,6 +2201,108 @@ static void ww_tests(void)
pr_cont("\n");
}

+
+/*
+ * <in hardirq handler>
+ * read_lock(&A);
+ * <hardirq disable>
+ * spin_lock(&B);
+ * spin_lock(&B);
+ * read_lock(&A);
+ *
+ * is a deadlock.
+ */
+static void queued_read_lock_hardirq_RE_Er(void)
+{
+ HARDIRQ_ENTER();
+ read_lock(&rwlock_A);
+ LOCK(B);
+ UNLOCK(B);
+ read_unlock(&rwlock_A);
+ HARDIRQ_EXIT();
+
+ HARDIRQ_DISABLE();
+ LOCK(B);
+ read_lock(&rwlock_A);
+ read_unlock(&rwlock_A);
+ UNLOCK(B);
+ HARDIRQ_ENABLE();
+}
+
+/*
+ * <in hardirq handler>
+ * spin_lock(&B);
+ * <hardirq disable>
+ * read_lock(&A);
+ * read_lock(&A);
+ * spin_lock(&B);
+ *
+ * is not a deadlock.
+ */
+static void queued_read_lock_hardirq_ER_rE(void)
+{
+ HARDIRQ_ENTER();
+ LOCK(B);
+ read_lock(&rwlock_A);
+ read_unlock(&rwlock_A);
+ UNLOCK(B);
+ HARDIRQ_EXIT();
+
+ HARDIRQ_DISABLE();
+ read_lock(&rwlock_A);
+ LOCK(B);
+ UNLOCK(B);
+ read_unlock(&rwlock_A);
+ HARDIRQ_ENABLE();
+}
+
+/*
+ * <hardirq disable>
+ * spin_lock(&B);
+ * read_lock(&A);
+ * <in hardirq handler>
+ * spin_lock(&B);
+ * read_lock(&A);
+ *
+ * is a deadlock, because the two read_lock()s are both non-recursive readers.
+ */
+static void queued_read_lock_hardirq_inversion(void)
+{
+
+ HARDIRQ_ENTER();
+ LOCK(B);
+ UNLOCK(B);
+ HARDIRQ_EXIT();
+
+ HARDIRQ_DISABLE();
+ LOCK(B);
+ read_lock(&rwlock_A);
+ read_unlock(&rwlock_A);
+ UNLOCK(B);
+ HARDIRQ_ENABLE();
+
+ read_lock(&rwlock_A);
+ read_unlock(&rwlock_A);
+}
+
+static void queued_read_lock_tests(void)
+{
+ printk(" --------------------------------------------------------------------------\n");
+ printk(" | queued read lock tests |\n");
+ printk(" ---------------------------\n");
+ print_testname("hardirq read-lock/lock-read");
+ dotest(queued_read_lock_hardirq_RE_Er, FAILURE, LOCKTYPE_RWLOCK);
+ pr_cont("\n");
+
+ print_testname("hardirq lock-read/read-lock");
+ dotest(queued_read_lock_hardirq_ER_rE, SUCCESS, LOCKTYPE_RWLOCK);
+ pr_cont("\n");
+
+ print_testname("hardirq inversion");
+ dotest(queued_read_lock_hardirq_inversion, FAILURE, LOCKTYPE_RWLOCK);
+ pr_cont("\n");
+}
+
void locking_selftest(void)
{
/*
@@ -2318,6 +2420,8 @@ void locking_selftest(void)
/*
* queued_read_lock() specific test cases can be put here
*/
+ if (IS_ENABLED(CONFIG_QUEUED_RWLOCKS))
+ queued_read_lock_tests();

if (unexpected_testcase_failures) {
printk("-----------------------------------------------------------------\n");

Subject: [tip: locking/core] lockdep: Demagic the return value of BFS

The following commit has been merged into the locking/core branch of tip:

Commit-ID: b11be024de164213f6338973d76ab9ab139120cd
Gitweb: https://git.kernel.org/tip/b11be024de164213f6338973d76ab9ab139120cd
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:22 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:03 +02:00

lockdep: Demagic the return value of BFS

__bfs() could return four magic numbers:

 1: search succeeds, but none match.
 0: search succeeds, and one match is found.
-1: search fails because the cq is full.
-2: search fails because an invalid node is found.

This patch cleans things up by using an enum type for the return value
of __bfs() and its friends. This improves the readability of the code,
and could further help if we want to extend the BFS.

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/locking/lockdep.c | 155 +++++++++++++++++++++-----------------
1 file changed, 89 insertions(+), 66 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 77cd9e6..462c68c 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1471,28 +1471,58 @@ static inline struct list_head *get_dep_list(struct lock_list *lock, int offset)

return lock_class + offset;
}
+/*
+ * Return values of a bfs search:
+ *
+ * BFS_E* indicates an error
+ * BFS_R* indicates a result (match or not)
+ *
+ * BFS_EINVALIDNODE: Find an invalid node in the graph.
+ *
+ * BFS_EQUEUEFULL: The queue is full while doing the bfs.
+ *
+ * BFS_RMATCH: Find the matched node in the graph, and put that node into
+ * *@target_entry.
+ *
+ * BFS_RNOMATCH: Haven't found the matched node and keep *@target_entry
+ * _unchanged_.
+ */
+enum bfs_result {
+ BFS_EINVALIDNODE = -2,
+ BFS_EQUEUEFULL = -1,
+ BFS_RMATCH = 0,
+ BFS_RNOMATCH = 1,
+};
+
+/*
+ * bfs_result < 0 means error
+ */
+static inline bool bfs_error(enum bfs_result res)
+{
+ return res < 0;
+}

/*
* Forward- or backward-dependency search, used for both circular dependency
* checking and hardirq-unsafe/softirq-unsafe checking.
*/
-static int __bfs(struct lock_list *source_entry,
- void *data,
- int (*match)(struct lock_list *entry, void *data),
- struct lock_list **target_entry,
- int offset)
+static enum bfs_result __bfs(struct lock_list *source_entry,
+ void *data,
+ int (*match)(struct lock_list *entry, void *data),
+ struct lock_list **target_entry,
+ int offset)
{
struct lock_list *entry;
struct lock_list *lock;
struct list_head *head;
struct circular_queue *cq = &lock_cq;
- int ret = 1;
+ enum bfs_result ret = BFS_RNOMATCH;

lockdep_assert_locked();

if (match(source_entry, data)) {
*target_entry = source_entry;
- ret = 0;
+ ret = BFS_RMATCH;
goto exit;
}

@@ -1506,7 +1536,7 @@ static int __bfs(struct lock_list *source_entry,
while ((lock = __cq_dequeue(cq))) {

if (!lock->class) {
- ret = -2;
+ ret = BFS_EINVALIDNODE;
goto exit;
}

@@ -1518,12 +1548,12 @@ static int __bfs(struct lock_list *source_entry,
mark_lock_accessed(entry, lock);
if (match(entry, data)) {
*target_entry = entry;
- ret = 0;
+ ret = BFS_RMATCH;
goto exit;
}

if (__cq_enqueue(cq, entry)) {
- ret = -1;
+ ret = BFS_EQUEUEFULL;
goto exit;
}
cq_depth = __cq_get_elem_count(cq);
@@ -1536,20 +1566,22 @@ exit:
return ret;
}

-static inline int __bfs_forwards(struct lock_list *src_entry,
- void *data,
- int (*match)(struct lock_list *entry, void *data),
- struct lock_list **target_entry)
+static inline enum bfs_result
+__bfs_forwards(struct lock_list *src_entry,
+ void *data,
+ int (*match)(struct lock_list *entry, void *data),
+ struct lock_list **target_entry)
{
return __bfs(src_entry, data, match, target_entry,
offsetof(struct lock_class, locks_after));

}

-static inline int __bfs_backwards(struct lock_list *src_entry,
- void *data,
- int (*match)(struct lock_list *entry, void *data),
- struct lock_list **target_entry)
+static inline enum bfs_result
+__bfs_backwards(struct lock_list *src_entry,
+ void *data,
+ int (*match)(struct lock_list *entry, void *data),
+ struct lock_list **target_entry)
{
return __bfs(src_entry, data, match, target_entry,
offsetof(struct lock_class, locks_before));
@@ -1775,18 +1807,18 @@ unsigned long lockdep_count_backward_deps(struct lock_class *class)

/*
* Check that the dependency graph starting at <src> can lead to
- * <target> or not. Print an error and return 0 if it does.
+ * <target> or not.
*/
-static noinline int
+static noinline enum bfs_result
check_path(struct lock_class *target, struct lock_list *src_entry,
struct lock_list **target_entry)
{
- int ret;
+ enum bfs_result ret;

ret = __bfs_forwards(src_entry, (void *)target, class_equal,
target_entry);

- if (unlikely(ret < 0))
+ if (unlikely(bfs_error(ret)))
print_bfs_bug(ret);

return ret;
@@ -1797,13 +1829,13 @@ check_path(struct lock_class *target, struct lock_list *src_entry,
* lead to <target>. If it can, there is a circle when adding
* <target> -> <src> dependency.
*
- * Print an error and return 0 if it does.
+ * Print an error and return BFS_RMATCH if it does.
*/
-static noinline int
+static noinline enum bfs_result
check_noncircular(struct held_lock *src, struct held_lock *target,
struct lock_trace **const trace)
{
- int ret;
+ enum bfs_result ret;
struct lock_list *target_entry;
struct lock_list src_entry = {
.class = hlock_class(src),
@@ -1814,7 +1846,7 @@ check_noncircular(struct held_lock *src, struct held_lock *target,

ret = check_path(hlock_class(target), &src_entry, &target_entry);

- if (unlikely(!ret)) {
+ if (unlikely(ret == BFS_RMATCH)) {
if (!*trace) {
/*
* If save_trace fails here, the printing might
@@ -1836,12 +1868,13 @@ check_noncircular(struct held_lock *src, struct held_lock *target,
* <target> or not. If it can, <src> -> <target> dependency is already
* in the graph.
*
- * Print an error and return 2 if it does or 1 if it does not.
+ * Return BFS_RMATCH if it does, or BFS_RNOMATCH if it does not, return BFS_E* if
+ * any error appears in the bfs search.
*/
-static noinline int
+static noinline enum bfs_result
check_redundant(struct held_lock *src, struct held_lock *target)
{
- int ret;
+ enum bfs_result ret;
struct lock_list *target_entry;
struct lock_list src_entry = {
.class = hlock_class(src),
@@ -1852,11 +1885,8 @@ check_redundant(struct held_lock *src, struct held_lock *target)

ret = check_path(hlock_class(target), &src_entry, &target_entry);

- if (!ret) {
+ if (ret == BFS_RMATCH)
debug_atomic_inc(nr_redundant);
- ret = 2;
- } else if (ret < 0)
- ret = 0;

return ret;
}
@@ -1886,17 +1916,14 @@ static inline int usage_match(struct lock_list *entry, void *mask)
* Find a node in the forwards-direction dependency sub-graph starting
* at @root->class that matches @bit.
*
- * Return 0 if such a node exists in the subgraph, and put that node
+ * Return BFS_RMATCH if such a node exists in the subgraph, and put that node
* into *@target_entry.
- *
- * Return 1 otherwise and keep *@target_entry unchanged.
- * Return <0 on error.
*/
-static int
+static enum bfs_result
find_usage_forwards(struct lock_list *root, unsigned long usage_mask,
struct lock_list **target_entry)
{
- int result;
+ enum bfs_result result;

debug_atomic_inc(nr_find_usage_forwards_checks);

@@ -1908,18 +1935,12 @@ find_usage_forwards(struct lock_list *root, unsigned long usage_mask,
/*
* Find a node in the backwards-direction dependency sub-graph starting
* at @root->class that matches @bit.
- *
- * Return 0 if such a node exists in the subgraph, and put that node
- * into *@target_entry.
- *
- * Return 1 otherwise and keep *@target_entry unchanged.
- * Return <0 on error.
*/
-static int
+static enum bfs_result
find_usage_backwards(struct lock_list *root, unsigned long usage_mask,
struct lock_list **target_entry)
{
- int result;
+ enum bfs_result result;

debug_atomic_inc(nr_find_usage_backwards_checks);

@@ -2247,7 +2268,7 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
struct lock_list *target_entry1;
struct lock_list *target_entry;
struct lock_list this, that;
- int ret;
+ enum bfs_result ret;

/*
* Step 1: gather all hard/soft IRQs usages backward in an
@@ -2257,7 +2278,7 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
this.class = hlock_class(prev);

ret = __bfs_backwards(&this, &usage_mask, usage_accumulate, NULL);
- if (ret < 0) {
+ if (bfs_error(ret)) {
print_bfs_bug(ret);
return 0;
}
@@ -2276,12 +2297,12 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
that.class = hlock_class(next);

ret = find_usage_forwards(&that, forward_mask, &target_entry1);
- if (ret < 0) {
+ if (bfs_error(ret)) {
print_bfs_bug(ret);
return 0;
}
- if (ret == 1)
- return ret;
+ if (ret == BFS_RNOMATCH)
+ return 1;

/*
* Step 3: we found a bad match! Now retrieve a lock from the backward
@@ -2291,11 +2312,11 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
backward_mask = original_mask(target_entry1->class->usage_mask);

ret = find_usage_backwards(&this, backward_mask, &target_entry);
- if (ret < 0) {
+ if (bfs_error(ret)) {
print_bfs_bug(ret);
return 0;
}
- if (DEBUG_LOCKS_WARN_ON(ret == 1))
+ if (DEBUG_LOCKS_WARN_ON(ret == BFS_RNOMATCH))
return 1;

/*
@@ -2463,7 +2484,7 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
struct lock_trace **const trace)
{
struct lock_list *entry;
- int ret;
+ enum bfs_result ret;

if (!hlock_class(prev)->key || !hlock_class(next)->key) {
/*
@@ -2494,7 +2515,7 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
* in the graph whose neighbours are to be checked.
*/
ret = check_noncircular(next, prev, trace);
- if (unlikely(ret <= 0))
+ if (unlikely(bfs_error(ret) || ret == BFS_RMATCH))
return 0;

if (!check_irq_usage(curr, prev, next))
@@ -2531,8 +2552,10 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
* Is the <prev> -> <next> link redundant?
*/
ret = check_redundant(prev, next);
- if (ret != 1)
- return ret;
+ if (bfs_error(ret))
+ return 0;
+ else if (ret == BFS_RMATCH)
+ return 2;
#endif

if (!*trace) {
@@ -3436,19 +3459,19 @@ static int
check_usage_forwards(struct task_struct *curr, struct held_lock *this,
enum lock_usage_bit bit, const char *irqclass)
{
- int ret;
+ enum bfs_result ret;
struct lock_list root;
struct lock_list *target_entry;

root.parent = NULL;
root.class = hlock_class(this);
ret = find_usage_forwards(&root, lock_flag(bit), &target_entry);
- if (ret < 0) {
+ if (bfs_error(ret)) {
print_bfs_bug(ret);
return 0;
}
- if (ret == 1)
- return ret;
+ if (ret == BFS_RNOMATCH)
+ return 1;

print_irq_inversion_bug(curr, &root, target_entry,
this, 1, irqclass);
@@ -3463,19 +3486,19 @@ static int
check_usage_backwards(struct task_struct *curr, struct held_lock *this,
enum lock_usage_bit bit, const char *irqclass)
{
- int ret;
+ enum bfs_result ret;
struct lock_list root;
struct lock_list *target_entry;

root.parent = NULL;
root.class = hlock_class(this);
ret = find_usage_backwards(&root, lock_flag(bit), &target_entry);
- if (ret < 0) {
+ if (bfs_error(ret)) {
print_bfs_bug(ret);
return 0;
}
- if (ret == 1)
- return ret;
+ if (ret == BFS_RNOMATCH)
+ return 1;

print_irq_inversion_bug(curr, &root, target_entry,
this, 0, irqclass);

Subject: [tip: locking/core] lockdep: Introduce lock_list::dep

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 3454a36d6a39186de508dd43df590a6363364176
Gitweb: https://git.kernel.org/tip/3454a36d6a39186de508dd43df590a6363364176
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:25 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:04 +02:00

lockdep: Introduce lock_list::dep

To add recursive read locks into the dependency graph, we need to store
the types of dependencies for the BFS later. There are four types of
dependencies:

* Exclusive -> Non-recursive dependencies: EN
e.g. write_lock(prev) held and try to acquire write_lock(next)
or non-recursive read_lock(next), which can be represented as
"prev -(EN)-> next"

* Shared -> Non-recursive dependencies: SN
e.g. read_lock(prev) held and try to acquire write_lock(next) or
non-recursive read_lock(next), which can be represented as
"prev -(SN)-> next"

* Exclusive -> Recursive dependencies: ER
e.g. write_lock(prev) held and try to acquire recursive
read_lock(next), which can be represented as "prev -(ER)-> next"

* Shared -> Recursive dependencies: SR
e.g. read_lock(prev) held and try to acquire recursive
read_lock(next), which can be represented as "prev -(SR)-> next"

So we use 4 bits for the presence of each type in lock_list::dep. Helper
functions and macros are also introduced to convert a pair of locks into
lock_list::dep bit and maintain the addition of different types of
dependencies.
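
As an aside, the bit encoding can be illustrated with a small standalone
userspace sketch (not the kernel code below); prev_read/next_read model
held_lock::read (0: write, 1: non-recursive read, 2: recursive read):

    #include <stdio.h>

    /* bit0: prev is a writer; bit1: next is not a recursive reader */
    static unsigned int calc_dep_bit(int prev_read, int next_read)
    {
        return (prev_read == 0) + ((next_read != 2) << 1);
    }

    int main(void)
    {
        printf("SR bit = %u\n", calc_dep_bit(1, 2));    /* 0 */
        printf("ER bit = %u\n", calc_dep_bit(0, 2));    /* 1 */
        printf("SN bit = %u\n", calc_dep_bit(1, 1));    /* 2 */
        printf("EN bit = %u\n", calc_dep_bit(0, 0));    /* 3 */
        return 0;
    }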

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
include/linux/lockdep.h | 2 +-
kernel/locking/lockdep.c | 92 +++++++++++++++++++++++++++++++++++++--
2 files changed, 90 insertions(+), 4 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 2275010..35c8bb0 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -55,6 +55,8 @@ struct lock_list {
struct lock_class *links_to;
const struct lock_trace *trace;
u16 distance;
+ /* bitmap of different dependencies from head to this */
+ u8 dep;

/*
* The parent field is used to implement breadth-first search, and the
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 668a983..16ad1b7 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1320,7 +1320,7 @@ static struct lock_list *alloc_list_entry(void)
*/
static int add_lock_to_list(struct lock_class *this,
struct lock_class *links_to, struct list_head *head,
- unsigned long ip, u16 distance,
+ unsigned long ip, u16 distance, u8 dep,
const struct lock_trace *trace)
{
struct lock_list *entry;
@@ -1334,6 +1334,7 @@ static int add_lock_to_list(struct lock_class *this,

entry->class = this;
entry->links_to = links_to;
+ entry->dep = dep;
entry->distance = distance;
entry->trace = trace;
/*
@@ -1499,6 +1500,57 @@ static inline bool bfs_error(enum bfs_result res)
}

/*
+ * DEP_*_BIT in lock_list::dep
+ *
+ * For dependency @prev -> @next:
+ *
+ * SR: @prev is shared reader (->read != 0) and @next is recursive reader
+ * (->read == 2)
+ * ER: @prev is exclusive locker (->read == 0) and @next is recursive reader
+ * SN: @prev is shared reader and @next is non-recursive locker (->read != 2)
+ * EN: @prev is exclusive locker and @next is non-recursive locker
+ *
+ * Note that we define the value of DEP_*_BITs so that:
+ * bit0 is prev->read == 0
+ * bit1 is next->read != 2
+ */
+#define DEP_SR_BIT (0 + (0 << 1)) /* 0 */
+#define DEP_ER_BIT (1 + (0 << 1)) /* 1 */
+#define DEP_SN_BIT (0 + (1 << 1)) /* 2 */
+#define DEP_EN_BIT (1 + (1 << 1)) /* 3 */
+
+#define DEP_SR_MASK (1U << (DEP_SR_BIT))
+#define DEP_ER_MASK (1U << (DEP_ER_BIT))
+#define DEP_SN_MASK (1U << (DEP_SN_BIT))
+#define DEP_EN_MASK (1U << (DEP_EN_BIT))
+
+static inline unsigned int
+__calc_dep_bit(struct held_lock *prev, struct held_lock *next)
+{
+ return (prev->read == 0) + ((next->read != 2) << 1);
+}
+
+static inline u8 calc_dep(struct held_lock *prev, struct held_lock *next)
+{
+ return 1U << __calc_dep_bit(prev, next);
+}
+
+/*
+ * calculate the dep_bit for backwards edges. We care about whether @prev is
+ * shared and whether @next is recursive.
+ */
+static inline unsigned int
+__calc_dep_bitb(struct held_lock *prev, struct held_lock *next)
+{
+ return (next->read != 2) + ((prev->read == 0) << 1);
+}
+
+static inline u8 calc_depb(struct held_lock *prev, struct held_lock *next)
+{
+ return 1U << __calc_dep_bitb(prev, next);
+}
+
+/*
* Forward- or backward-dependency search, used for both circular dependency
* checking and hardirq-unsafe/softirq-unsafe checking.
*/
@@ -2552,7 +2604,35 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
if (entry->class == hlock_class(next)) {
if (distance == 1)
entry->distance = 1;
- return 1;
+ entry->dep |= calc_dep(prev, next);
+
+ /*
+ * Also, update the reverse dependency in @next's
+ * ->locks_before list.
+ *
+ * Here we reuse @entry as the cursor, which is fine
+ * because we won't go to the next iteration of the
+ * outer loop:
+ *
+ * For normal cases, we return in the inner loop.
+ *
+ * If we fail to return, we have inconsistency, i.e.
+ * <prev>::locks_after contains <next> while
+ * <next>::locks_before doesn't contain <prev>. In
+ * that case, we return after the inner and indicate
+ * something is wrong.
+ */
+ list_for_each_entry(entry, &hlock_class(next)->locks_before, entry) {
+ if (entry->class == hlock_class(prev)) {
+ if (distance == 1)
+ entry->distance = 1;
+ entry->dep |= calc_depb(prev, next);
+ return 1;
+ }
+ }
+
+ /* <prev> is not found in <next>::locks_before */
+ return 0;
}
}

@@ -2579,14 +2659,18 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
*/
ret = add_lock_to_list(hlock_class(next), hlock_class(prev),
&hlock_class(prev)->locks_after,
- next->acquire_ip, distance, *trace);
+ next->acquire_ip, distance,
+ calc_dep(prev, next),
+ *trace);

if (!ret)
return 0;

ret = add_lock_to_list(hlock_class(prev), hlock_class(next),
&hlock_class(next)->locks_before,
- next->acquire_ip, distance, *trace);
+ next->acquire_ip, distance,
+ calc_depb(prev, next),
+ *trace);
if (!ret)
return 0;

Subject: [tip: locking/core] lockdep: Reduce the size of lock_list::distance

The following commit has been merged into the locking/core branch of tip:

Commit-ID: bd76eca10de2eb9998d5125b08e8997cbf5508d5
Gitweb: https://git.kernel.org/tip/bd76eca10de2eb9998d5125b08e8997cbf5508d5
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:24 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:04 +02:00

lockdep: Reduce the size of lock_list::distance

lock_list::distance is always not greater than MAX_LOCK_DEPTH (which
is 48 right now), so a u16 will fit. This patch reduces the size of
lock_list::distance to save space, so that we can introduce other fields
to help detect recursive read lock deadlocks without increasing the size
of lock_list structure.

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
include/linux/lockdep.h | 2 +-
kernel/locking/lockdep.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 7cae5ea..2275010 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -54,7 +54,7 @@ struct lock_list {
struct lock_class *class;
struct lock_class *links_to;
const struct lock_trace *trace;
- int distance;
+ u16 distance;

/*
* The parent field is used to implement breadth-first search, and the
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 150686a..668a983 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1320,7 +1320,7 @@ static struct lock_list *alloc_list_entry(void)
*/
static int add_lock_to_list(struct lock_class *this,
struct lock_class *links_to, struct list_head *head,
- unsigned long ip, int distance,
+ unsigned long ip, u16 distance,
const struct lock_trace *trace)
{
struct lock_list *entry;
@@ -2489,7 +2489,7 @@ check_deadlock(struct task_struct *curr, struct held_lock *next)
*/
static int
check_prev_add(struct task_struct *curr, struct held_lock *prev,
- struct held_lock *next, int distance,
+ struct held_lock *next, u16 distance,
struct lock_trace **const trace)
{
struct lock_list *entry;
@@ -2622,7 +2622,7 @@ check_prevs_add(struct task_struct *curr, struct held_lock *next)
goto out_bug;

for (;;) {
- int distance = curr->lockdep_depth - depth + 1;
+ u16 distance = curr->lockdep_depth - depth + 1;
hlock = curr->held_locks + depth - 1;

/*

Subject: [tip: locking/core] lockdep: Fix recursive read lock related safe->unsafe detection

The following commit has been merged into the locking/core branch of tip:

Commit-ID: f08e3888574d490b31481eef6d84c61bedba7a47
Gitweb: https://git.kernel.org/tip/f08e3888574d490b31481eef6d84c61bedba7a47
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:30 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:05 +02:00

lockdep: Fix recursive read lock related safe->unsafe detection

Currently, in safe->unsafe detection, lockdep misses the fact that a
LOCK_ENABLED_IRQ_*_READ usage and a LOCK_USED_IN_IRQ_*_READ usage may
cause deadlock too, for example:

	P1                          P2
	<irq disabled>
	write_lock(l1);             <irq enabled>
	                            read_lock(l2);
	write_lock(l2);
	                            <in irq>
	                            read_lock(l1);

Actually, all of the following cases may cause deadlocks:

LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*
LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*
LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*_READ
LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*_READ

To fix this, we need to 1) change the calculation of exclusive_mask() so
that READ bits are not dropped and 2) always call usage() in
mark_lock_irq() to check usage deadlocks, even when the new usage of the
lock is READ.

Besides, adjust usage_match() and usage_accumulate() to the recursive read
lock changes.
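
For illustration only, the intended meaning of the four cases above can be
sketched with an assumed, simplified usage-bit layout (this is not the kernel's
LOCKF_* encoding); exclusive_mask_toy() simply says that any USED_IN usage
conflicts with any ENABLED usage of the other lock, read or not:

    #include <stdio.h>

    #define USED_IN       (1U << 0)   /* models LOCK_USED_IN_IRQ_*      */
    #define USED_IN_READ  (1U << 1)   /* models LOCK_USED_IN_IRQ_*_READ */
    #define ENABLED       (1U << 2)   /* models LOCK_ENABLED_IRQ_*      */
    #define ENABLED_READ  (1U << 3)   /* models LOCK_ENABLED_IRQ_*_READ */

    /* Usages of the other end that would make the dependency a deadlock */
    static unsigned int exclusive_mask_toy(unsigned int mask)
    {
        unsigned int excl = 0;

        if (mask & (USED_IN | USED_IN_READ))
            excl |= ENABLED | ENABLED_READ;
        if (mask & (ENABLED | ENABLED_READ))
            excl |= USED_IN | USED_IN_READ;
        return excl;
    }

    int main(void)
    {
        /* l1 is USED_IN_*_READ: both ENABLED usages of l2 conflict */
        printf("%#x\n", exclusive_mask_toy(USED_IN_READ));  /* prints 0xc */
        return 0;
    }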

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/locking/lockdep.c | 188 ++++++++++++++++++++++++++++----------
1 file changed, 141 insertions(+), 47 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 42e2f1f..6644974 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2100,22 +2100,72 @@ check_redundant(struct held_lock *src, struct held_lock *target)

#ifdef CONFIG_TRACE_IRQFLAGS

+/*
+ * Forwards and backwards subgraph searching, for the purposes of
+ * proving that two subgraphs can be connected by a new dependency
+ * without creating any illegal irq-safe -> irq-unsafe lock dependency.
+ *
+ * A irq safe->unsafe deadlock happens with the following conditions:
+ *
+ * 1) We have a strong dependency path A -> ... -> B
+ *
+ * 2) and we have ENABLED_IRQ usage of B and USED_IN_IRQ usage of A, therefore
+ * irq can create a new dependency B -> A (consider the case that a holder
+ * of B gets interrupted by an irq whose handler will try to acquire A).
+ *
+ * 3) the dependency circle A -> ... -> B -> A we get from 1) and 2) is a
+ * strong circle:
+ *
+ * For the usage bits of B:
+ * a) if A -> B is -(*N)->, then B -> A could be any type, so any
+ * ENABLED_IRQ usage suffices.
+ * b) if A -> B is -(*R)->, then B -> A must be -(E*)->, so only
+ * ENABLED_IRQ_*_READ usage suffices.
+ *
+ * For the usage bits of A:
+ * c) if A -> B is -(E*)->, then B -> A could be any type, so any
+ * USED_IN_IRQ usage suffices.
+ * d) if A -> B is -(S*)->, then B -> A must be -(*N)->, so only
+ * USED_IN_IRQ_*_READ usage suffices.
+ */
+
+/*
+ * There is a strong dependency path in the dependency graph: A -> B, and now
+ * we need to decide which usage bit of A should be accumulated to detect
+ * safe->unsafe bugs.
+ *
+ * Note that usage_accumulate() is used in backwards search, so ->only_xr
+ * stands for whether A -> B only has -(S*)-> (in this case ->only_xr is true).
+ *
+ * As above, if only_xr is false, which means A -> B has -(E*)-> dependency
+ * path, any usage of A should be considered. Otherwise, we should only
+ * consider _READ usage.
+ */
static inline bool usage_accumulate(struct lock_list *entry, void *mask)
{
- *(unsigned long *)mask |= entry->class->usage_mask;
+ if (!entry->only_xr)
+ *(unsigned long *)mask |= entry->class->usage_mask;
+ else /* Mask out _READ usage bits */
+ *(unsigned long *)mask |= (entry->class->usage_mask & LOCKF_IRQ);

return false;
}

/*
- * Forwards and backwards subgraph searching, for the purposes of
- * proving that two subgraphs can be connected by a new dependency
- * without creating any illegal irq-safe -> irq-unsafe lock dependency.
+ * There is a strong dependency path in the dependency graph: A -> B, and now
+ * we need to decide which usage bit of B conflicts with the usage bits of A,
+ * i.e. which usage bit of B may introduce safe->unsafe deadlocks.
+ *
+ * As above, if only_xr is false, which means A -> B has -(*N)-> dependency
+ * path, any usage of B should be considered. Otherwise, we should only
+ * consider _READ usage.
*/
-
static inline bool usage_match(struct lock_list *entry, void *mask)
{
- return !!(entry->class->usage_mask & *(unsigned long *)mask);
+ if (!entry->only_xr)
+ return !!(entry->class->usage_mask & *(unsigned long *)mask);
+ else /* Mask out _READ usage bits */
+ return !!((entry->class->usage_mask & LOCKF_IRQ) & *(unsigned long *)mask);
}

/*
@@ -2406,17 +2456,39 @@ static unsigned long invert_dir_mask(unsigned long mask)
}

/*
- * As above, we clear bitnr0 (LOCK_*_READ off) with bitmask ops. First, for all
- * bits with bitnr0 set (LOCK_*_READ), add those with bitnr0 cleared (LOCK_*).
- * And then mask out all bitnr0.
+ * Note that a LOCK_ENABLED_IRQ_*_READ usage and a LOCK_USED_IN_IRQ_*_READ
+ * usage may cause deadlock too, for example:
+ *
+ *	P1                          P2
+ *	<irq disabled>
+ *	write_lock(l1);             <irq enabled>
+ *	                            read_lock(l2);
+ *	write_lock(l2);
+ *	                            <in irq>
+ *	                            read_lock(l1);
+ *
+ * , in the above case, l1 will be marked as LOCK_USED_IN_HARDIRQ_READ and l2
+ * will be marked as LOCK_ENABLED_HARDIRQ_READ, and this is a possible
+ * deadlock.
+ *
+ * In fact, all of the following cases may cause deadlocks:
+ *
+ * LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*
+ * LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*
+ * LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*_READ
+ * LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*_READ
+ *
+ * As a result, to calculate the "exclusive mask", first we invert the
+ * direction (USED_IN/ENABLED) of the original mask, and 1) for all bits with
+ * bitnr0 set (LOCK_*_READ), add those with bitnr0 cleared (LOCK_*). 2) for all
+ * bits with bitnr0 cleared (LOCK_*), add those with bitnr0 set (LOCK_*_READ).
*/
static unsigned long exclusive_mask(unsigned long mask)
{
unsigned long excl = invert_dir_mask(mask);

- /* Strip read */
excl |= (excl & LOCKF_IRQ_READ) >> LOCK_USAGE_READ_MASK;
- excl &= ~LOCKF_IRQ_READ;
+ excl |= (excl & LOCKF_IRQ) << LOCK_USAGE_READ_MASK;

return excl;
}
@@ -2433,6 +2505,7 @@ static unsigned long original_mask(unsigned long mask)
unsigned long excl = invert_dir_mask(mask);

/* Include read in existing usages */
+ excl |= (excl & LOCKF_IRQ_READ) >> LOCK_USAGE_READ_MASK;
excl |= (excl & LOCKF_IRQ) << LOCK_USAGE_READ_MASK;

return excl;
@@ -2447,14 +2520,24 @@ static int find_exclusive_match(unsigned long mask,
enum lock_usage_bit *bitp,
enum lock_usage_bit *excl_bitp)
{
- int bit, excl;
+ int bit, excl, excl_read;

for_each_set_bit(bit, &mask, LOCK_USED) {
+ /*
+ * exclusive_bit() strips the read bit, however,
+ * LOCK_ENABLED_IRQ_*_READ may cause deadlocks too, so we need
+ * to search excl | LOCK_USAGE_READ_MASK as well.
+ */
excl = exclusive_bit(bit);
+ excl_read = excl | LOCK_USAGE_READ_MASK;
if (excl_mask & lock_flag(excl)) {
*bitp = bit;
*excl_bitp = excl;
return 0;
+ } else if (excl_mask & lock_flag(excl_read)) {
+ *bitp = bit;
+ *excl_bitp = excl_read;
+ return 0;
}
}
return -1;
@@ -2480,8 +2563,7 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
* Step 1: gather all hard/soft IRQs usages backward in an
* accumulated usage mask.
*/
- this.parent = NULL;
- this.class = hlock_class(prev);
+ bfs_init_rootb(&this, prev);

ret = __bfs_backwards(&this, &usage_mask, usage_accumulate, NULL);
if (bfs_error(ret)) {
@@ -2499,8 +2581,7 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
*/
forward_mask = exclusive_mask(usage_mask);

- that.parent = NULL;
- that.class = hlock_class(next);
+ bfs_init_root(&that, next);

ret = find_usage_forwards(&that, forward_mask, &target_entry1);
if (bfs_error(ret)) {
@@ -3695,14 +3776,16 @@ print_irq_inversion_bug(struct task_struct *curr,
*/
static int
check_usage_forwards(struct task_struct *curr, struct held_lock *this,
- enum lock_usage_bit bit, const char *irqclass)
+ enum lock_usage_bit bit)
{
enum bfs_result ret;
struct lock_list root;
struct lock_list *target_entry;
+ enum lock_usage_bit read_bit = bit + LOCK_USAGE_READ_MASK;
+ unsigned usage_mask = lock_flag(bit) | lock_flag(read_bit);

bfs_init_root(&root, this);
- ret = find_usage_forwards(&root, lock_flag(bit), &target_entry);
+ ret = find_usage_forwards(&root, usage_mask, &target_entry);
if (bfs_error(ret)) {
print_bfs_bug(ret);
return 0;
@@ -3710,8 +3793,15 @@ check_usage_forwards(struct task_struct *curr, struct held_lock *this,
if (ret == BFS_RNOMATCH)
return 1;

- print_irq_inversion_bug(curr, &root, target_entry,
- this, 1, irqclass);
+ /* Check whether write or read usage is the match */
+ if (target_entry->class->usage_mask & lock_flag(bit)) {
+ print_irq_inversion_bug(curr, &root, target_entry,
+ this, 1, state_name(bit));
+ } else {
+ print_irq_inversion_bug(curr, &root, target_entry,
+ this, 1, state_name(read_bit));
+ }
+
return 0;
}

@@ -3721,14 +3811,16 @@ check_usage_forwards(struct task_struct *curr, struct held_lock *this,
*/
static int
check_usage_backwards(struct task_struct *curr, struct held_lock *this,
- enum lock_usage_bit bit, const char *irqclass)
+ enum lock_usage_bit bit)
{
enum bfs_result ret;
struct lock_list root;
struct lock_list *target_entry;
+ enum lock_usage_bit read_bit = bit + LOCK_USAGE_READ_MASK;
+ unsigned usage_mask = lock_flag(bit) | lock_flag(read_bit);

bfs_init_rootb(&root, this);
- ret = find_usage_backwards(&root, lock_flag(bit), &target_entry);
+ ret = find_usage_backwards(&root, usage_mask, &target_entry);
if (bfs_error(ret)) {
print_bfs_bug(ret);
return 0;
@@ -3736,8 +3828,15 @@ check_usage_backwards(struct task_struct *curr, struct held_lock *this,
if (ret == BFS_RNOMATCH)
return 1;

- print_irq_inversion_bug(curr, &root, target_entry,
- this, 0, irqclass);
+ /* Check whether write or read usage is the match */
+ if (target_entry->class->usage_mask & lock_flag(bit)) {
+ print_irq_inversion_bug(curr, &root, target_entry,
+ this, 0, state_name(bit));
+ } else {
+ print_irq_inversion_bug(curr, &root, target_entry,
+ this, 0, state_name(read_bit));
+ }
+
return 0;
}

@@ -3776,8 +3875,6 @@ static int SOFTIRQ_verbose(struct lock_class *class)
return 0;
}

-#define STRICT_READ_CHECKS 1
-
static int (*state_verbose_f[])(struct lock_class *class) = {
#define LOCKDEP_STATE(__STATE) \
__STATE##_verbose,
@@ -3803,16 +3900,6 @@ mark_lock_irq(struct task_struct *curr, struct held_lock *this,
int dir = new_bit & LOCK_USAGE_DIR_MASK;

/*
- * mark USED_IN has to look forwards -- to ensure no dependency
- * has ENABLED state, which would allow recursion deadlocks.
- *
- * mark ENABLED has to look backwards -- to ensure no dependee
- * has USED_IN state, which, again, would allow recursion deadlocks.
- */
- check_usage_f usage = dir ?
- check_usage_backwards : check_usage_forwards;
-
- /*
* Validate that this particular lock does not have conflicting
* usage states.
*/
@@ -3820,23 +3907,30 @@ mark_lock_irq(struct task_struct *curr, struct held_lock *this,
return 0;

/*
- * Validate that the lock dependencies don't have conflicting usage
- * states.
+ * Check for read in write conflicts
*/
- if ((!read || STRICT_READ_CHECKS) &&
- !usage(curr, this, excl_bit, state_name(new_bit & ~LOCK_USAGE_READ_MASK)))
+ if (!read && !valid_state(curr, this, new_bit,
+ excl_bit + LOCK_USAGE_READ_MASK))
return 0;

+
/*
- * Check for read in write conflicts
+ * Validate that the lock dependencies don't have conflicting usage
+ * states.
*/
- if (!read) {
- if (!valid_state(curr, this, new_bit, excl_bit + LOCK_USAGE_READ_MASK))
+ if (dir) {
+ /*
+ * mark ENABLED has to look backwards -- to ensure no dependee
+ * has USED_IN state, which, again, would allow recursion deadlocks.
+ */
+ if (!check_usage_backwards(curr, this, excl_bit))
return 0;
-
- if (STRICT_READ_CHECKS &&
- !usage(curr, this, excl_bit + LOCK_USAGE_READ_MASK,
- state_name(new_bit + LOCK_USAGE_READ_MASK)))
+ } else {
+ /*
+ * mark USED_IN has to look forwards -- to ensure no dependency
+ * has ENABLED state, which would allow recursion deadlocks.
+ */
+ if (!check_usage_forwards(curr, this, excl_bit))
return 0;
}

Subject: [tip: locking/core] locking: More accurate annotations for read_lock()

The following commit has been merged into the locking/core branch of tip:

Commit-ID: e918188611f073063415f40fae568fa4d86d9044
Gitweb: https://git.kernel.org/tip/e918188611f073063415f40fae568fa4d86d9044
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:20 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:02 +02:00

locking: More accurate annotations for read_lock()

On the archs using QUEUED_RWLOCKS, read_lock() is not always a recursive
read lock, actually it's only recursive if in_interrupt() is true. So
change the annotation accordingly to catch more deadlocks.

Note we used to treat read_lock() as a pure recursive read lock in
lib/locking-selftest.c, and this is useful, especially for the lockdep
development selftests, so we keep this behaviour via a variable to force
switching the lock annotation for read_lock().

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
include/linux/lockdep.h | 23 ++++++++++++++++++++++-
kernel/locking/lockdep.c | 14 ++++++++++++++
lib/locking-selftest.c | 11 +++++++++++
3 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6a584b3..7cae5ea 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -469,6 +469,20 @@ static inline void print_irqtrace_events(struct task_struct *curr)
}
#endif

+/* Variable used to make lockdep treat read_lock() as recursive in selftests */
+#ifdef CONFIG_DEBUG_LOCKING_API_SELFTESTS
+extern unsigned int force_read_lock_recursive;
+#else /* CONFIG_DEBUG_LOCKING_API_SELFTESTS */
+#define force_read_lock_recursive 0
+#endif /* CONFIG_DEBUG_LOCKING_API_SELFTESTS */
+
+#ifdef CONFIG_LOCKDEP
+extern bool read_lock_is_recursive(void);
+#else /* CONFIG_LOCKDEP */
+/* If !LOCKDEP, the value is meaningless */
+#define read_lock_is_recursive() 0
+#endif
+
/*
* For trivial one-depth nesting of a lock-class, the following
* global define can be used. (Subsystems with multiple levels
@@ -490,7 +504,14 @@ static inline void print_irqtrace_events(struct task_struct *curr)
#define spin_release(l, i) lock_release(l, i)

#define rwlock_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
-#define rwlock_acquire_read(l, s, t, i) lock_acquire_shared_recursive(l, s, t, NULL, i)
+#define rwlock_acquire_read(l, s, t, i) \
+do { \
+ if (read_lock_is_recursive()) \
+ lock_acquire_shared_recursive(l, s, t, NULL, i); \
+ else \
+ lock_acquire_shared(l, s, t, NULL, i); \
+} while (0)
+
#define rwlock_release(l, i) lock_release(l, i)

#define seqcount_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 54b74fa..77cd9e6 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4968,6 +4968,20 @@ static bool lockdep_nmi(void)
}

/*
+ * read_lock() is recursive if:
+ * 1. We force lockdep to think this way in selftests or
+ * 2. The implementation is not queued read/write lock or
+ * 3. The locker is at an in_interrupt() context.
+ */
+bool read_lock_is_recursive(void)
+{
+ return force_read_lock_recursive ||
+ !IS_ENABLED(CONFIG_QUEUED_RWLOCKS) ||
+ in_interrupt();
+}
+EXPORT_SYMBOL_GPL(read_lock_is_recursive);
+
+/*
* We are not always called with irqs disabled - do that here,
* and also avoid lockdep recursion:
*/
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 14f44f5..caadc4d 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -28,6 +28,7 @@
* Change this to 1 if you want to see the failure printouts:
*/
static unsigned int debug_locks_verbose;
+unsigned int force_read_lock_recursive;

static DEFINE_WD_CLASS(ww_lockdep);

@@ -1979,6 +1980,11 @@ void locking_selftest(void)
}

/*
+ * treats read_lock() as recursive read locks for testing purpose
+ */
+ force_read_lock_recursive = 1;
+
+ /*
* Run the testsuite:
*/
printk("------------------------\n");
@@ -2073,6 +2079,11 @@ void locking_selftest(void)

ww_tests();

+ force_read_lock_recursive = 0;
+ /*
+ * queued_read_lock() specific test cases can be put here
+ */
+
if (unexpected_testcase_failures) {
printk("-----------------------------------------------------------------\n");
debug_locks = 0;

Subject: [tip: locking/core] lockdep/Documention: Recursive read lock detection reasoning

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 224ec489d3cdb0af6794e257eeee39d98dc9c5b2
Gitweb: https://git.kernel.org/tip/224ec489d3cdb0af6794e257eeee39d98dc9c5b2
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:21 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:03 +02:00

lockdep/Documention: Recursive read lock detection reasoning

This patch adds the documentation for the reasoning behind deadlock
detection related to recursive read locks. The following sections are
added:

* Explain what a recursive read lock is, and what deadlock cases
it could introduce.

* Introduce the notations for different types of dependencies, and
the definition of strong paths.

* Prove that a closed strong path is both sufficient and necessary
for deadlock detection with recursive read locks involved. The
proof also explains why we call the path "strong".

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
Documentation/locking/lockdep-design.rst | 258 ++++++++++++++++++++++-
1 file changed, 258 insertions(+)

diff --git a/Documentation/locking/lockdep-design.rst b/Documentation/locking/lockdep-design.rst
index 23fcbc4..cec03bd 100644
--- a/Documentation/locking/lockdep-design.rst
+++ b/Documentation/locking/lockdep-design.rst
@@ -392,3 +392,261 @@ Run the command and save the output, then compare against the output from
a later run of this command to identify the leakers. This same output
can also help you find situations where runtime lock initialization has
been omitted.
+
+Recursive read locks:
+---------------------
+The rest of this document tries to prove that a certain type of cycle is equivalent
+to deadlock possibility.
+
+There are three types of lockers: writers (i.e. exclusive lockers, like
+spin_lock() or write_lock()), non-recursive readers (i.e. shared lockers, like
+down_read()) and recursive readers (recursive shared lockers, like rcu_read_lock()).
+And we use the following notations of those lockers in the rest of the document:
+
+ W or E: stands for writers (exclusive lockers).
+ r: stands for non-recursive readers.
+ R: stands for recursive readers.
+ S: stands for all readers (non-recursive + recursive), as both are shared lockers.
+ N: stands for writers and non-recursive readers, as both are not recursive.
+
+Obviously, N is "r or W" and S is "r or R".
+
+Recursive readers, as their name indicates, are the lockers allowed to acquire
+even inside the critical section of another reader of the same lock instance,
+in other words, allowing nested read-side critical sections of one lock instance.
+
+While non-recursive readers will cause a self deadlock if trying to acquire inside
+the critical section of another reader of the same lock instance.
+
+The difference between recursive readers and non-recursive readers is:
+recursive readers get blocked only by a write lock *holder*, while non-recursive
+readers could get blocked by a write lock *waiter*. Consider the following example:
+
+ TASK A: TASK B:
+
+ read_lock(X);
+ write_lock(X);
+ read_lock_2(X);
+
+Task A gets the reader (no matter whether recursive or non-recursive) on X via
+read_lock() first. And when task B tries to acquire the writer on X, it will block
+and become a waiter for the writer on X. Now if read_lock_2() is a recursive reader,
+task A will make progress, because writer waiters don't block recursive readers,
+and there is no deadlock. However, if read_lock_2() is a non-recursive reader,
+it will get blocked by writer waiter B, and cause a self deadlock.
+
+Block conditions on readers/writers of the same lock instance:
+--------------------------------------------------------------
+There are simply four block conditions:
+
+1. Writers block other writers.
+2. Readers block writers.
+3. Writers block both recursive readers and non-recursive readers.
+4. And readers (recursive or not) don't block other recursive readers but
+ may block non-recursive readers (because of the potential co-existing
+ writer waiters)
+
+Block condition matrix, Y means the row blocks the column, and N means otherwise.
+
+ | E | r | R |
+ +---+---+---+---+
+ E | Y | Y | Y |
+ +---+---+---+---+
+ r | Y | Y | N |
+ +---+---+---+---+
+ R | Y | Y | N |
+
+ (E: writers, r: non-recursive readers, R: recursive readers)
+
+
+Recursive read locks, as the name suggests, are read locks that may be
+acquired recursively. Unlike non-recursive read locks, recursive read locks
+only get blocked by current write lock *holders*, not by write lock
+*waiters*, for example:
+
+ TASK A: TASK B:
+
+ read_lock(X);
+
+ write_lock(X);
+
+ read_lock(X);
+
+is not a deadlock for recursive read locks, as while the task B is waiting for
+the lock X, the second read_lock() doesn't need to wait because it's a recursive
+read lock. However if the read_lock() is a non-recursive read lock, then the above
+case is a deadlock, because even though the write_lock() in TASK B cannot get the
+lock, it can still block the second read_lock() in TASK A.
+
+Note that a lock can be a write lock (exclusive lock), a non-recursive read
+lock (non-recursive shared lock) or a recursive read lock (recursive shared
+lock), depending on the lock operations used to acquire it (more specifically,
+the value of the 'read' parameter for lock_acquire()). In other words, a single
+lock instance has three types of acquisition depending on the acquisition
+functions: exclusive, non-recursive read, and recursive read.
+
+To be concise, we call write locks and non-recursive read locks
+"non-recursive" locks, and recursive read locks "recursive" locks.
+
+Recursive locks don't block each other, while non-recursive locks do (this is
+even true for two non-recursive read locks). A non-recursive lock can block the
+corresponding recursive lock, and vice versa.
+
+A deadlock case with recursive locks involved is as follows:
+
+ TASK A: TASK B:
+
+ read_lock(X);
+ read_lock(Y);
+ write_lock(Y);
+ write_lock(X);
+
+Task A is waiting for task B to read_unlock() Y and task B is waiting for task
+A to read_unlock() X.
+
+Dependency types and strong dependency paths:
+---------------------------------------------
+Lock dependencies record the orders of the acquisitions of a pair of locks, and
+because there are 3 types of lockers, there are, in theory, 9 types of lock
+dependencies, but we can show that 4 types of lock dependencies are enough for
+deadlock detection.
+
+For each lock dependency:
+
+ L1 -> L2
+
+, which means lockdep has seen L1 held before L2 held in the same context at runtime.
+And in deadlock detection, we care whether we could get blocked on L2 with L1 held,
+IOW, whether there is a locker L3 such that L1 blocks L3 and L2 gets blocked by L3. So
+we only care about 1) what L1 blocks and 2) what blocks L2. As a result, we can combine
+recursive readers and non-recursive readers for L1 (as they block the same types) and
+we can combine writers and non-recursive readers for L2 (as they get blocked by the
+same types).
+
+With the above combination for simplification, there are 4 types of dependency edges
+in the lockdep graph:
+
+1) -(ER)->: exclusive writer to recursive reader dependency, "X -(ER)-> Y" means
+ X -> Y and X is a writer and Y is a recursive reader.
+
+2) -(EN)->: exclusive writer to non-recursive locker dependency, "X -(EN)-> Y" means
+ X -> Y and X is a writer and Y is either a writer or non-recursive reader.
+
+3) -(SR)->: shared reader to recursive reader dependency, "X -(SR)-> Y" means
+ X -> Y and X is a reader (recursive or not) and Y is a recursive reader.
+
+4) -(SN)->: shared reader to non-recursive locker dependency, "X -(SN)-> Y" means
+ X -> Y and X is a reader (recursive or not) and Y is either a writer or
+ non-recursive reader.
+
+Note that given two locks, they may have multiple dependencies between them, for example:
+
+ TASK A:
+
+ read_lock(X);
+ write_lock(Y);
+ ...
+
+ TASK B:
+
+ write_lock(X);
+ write_lock(Y);
+
+, we have both X -(SN)-> Y and X -(EN)-> Y in the dependency graph.
+
+We use -(xN)-> to represent edges that are either -(EN)-> or -(SN)->, and
+similarly for -(Ex)->, -(xR)-> and -(Sx)->.
+
+A "path" is a series of conjunct dependency edges in the graph. And we define a
+"strong" path, which indicates the strong dependency throughout each dependency
+in the path, as the path that doesn't have two conjunct edges (dependencies) as
+-(xR)-> and -(Sx)->. In other words, a "strong" path is a path from a lock
+walking to another through the lock dependencies, and if X -> Y -> Z is in the
+path (where X, Y, Z are locks), and the walk from X to Y is through a -(SR)-> or
+-(ER)-> dependency, the walk from Y to Z must not be through a -(SN)-> or
+-(SR)-> dependency.
+
+We will see why the path is called "strong" in the next section.
+
+Recursive Read Deadlock Detection:
+----------------------------------
+
+We now prove two things:
+
+Lemma 1:
+
+If there is a closed strong path (i.e. a strong circle), then there is a
+combination of locking sequences that causes deadlock. I.e. a strong circle is
+sufficient for deadlock detection.
+
+Lemma 2:
+
+If there is no closed strong path (i.e. strong circle), then there is no
+combination of locking sequences that could cause deadlock. I.e. strong
+circles are necessary for deadlock detection.
+
+With these two Lemmas, we can easily say a closed strong path is both sufficient
+and necessary for deadlocks, therefore a closed strong path is equivalent to
+deadlock possibility. As a closed strong path stands for a dependency chain that
+could cause deadlocks, we call it "strong", considering there are dependency
+circles that won't cause deadlocks.
+
+Proof for sufficiency (Lemma 1):
+
+Let's say we have a strong circle:
+
+ L1 -> L2 ... -> Ln -> L1
+
+, which means we have dependencies:
+
+ L1 -> L2
+ L2 -> L3
+ ...
+ Ln-1 -> Ln
+ Ln -> L1
+
+We now can construct a combination of locking sequences that cause deadlock:
+
+Firstly let's make one CPU/task get the L1 in L1 -> L2, and then another get
+the L2 in L2 -> L3, and so on. After this, all of the Lx in Lx -> Lx+1 are
+held by different CPU/tasks.
+
+And then because we have L1 -> L2, the holder of L1 is going to acquire L2
+in L1 -> L2. However, L2 is already held by another CPU/task, and L1 ->
+L2 and L2 -> L3 are not a -(xR)-> and -(Sx)-> pair (the definition of strong),
+which means either the L2 in L1 -> L2 is a non-recursive locker (blocked by
+anyone) or the L2 in L2 -> L3 is a writer (blocking anyone). Therefore the
+holder of L1 cannot get L2, and has to wait for L2's holder to release it.
+
+Moreover, we can have a similar conclusion for L2's holder: it has to wait L3's
+holder to release, and so on. We now can prove that Lx's holder has to wait for
+Lx+1's holder to release, and note that Ln+1 is L1, so we have a circular
+waiting scenario and nobody can get progress, therefore a deadlock.
+
+Proof for necessity (Lemma 2):
+
+Lemma 2 is equivalent to: If there is a deadlock scenario, then there must be a
+strong circle in the dependency graph.
+
+According to Wikipedia[1], if there is a deadlock, then there must be a circular
+waiting scenario, which means there are n CPU/tasks, where CPU/task P1 is waiting
+for a lock held by P2, P2 is waiting for a lock held by P3, ..., and Pn is waiting
+for a lock held by P1. Let's name the lock that Px is waiting for as Lx. Since P1
+is waiting for L1 while holding Ln, we have Ln -> L1 in the dependency graph.
+Similarly, we have L1 -> L2, L2 -> L3, ..., Ln-1 -> Ln in the dependency graph,
+which means we have a circle:
+
+ Ln -> L1 -> L2 -> ... -> Ln
+
+, and now let's prove the circle is strong:
+
+For a lock Lx, Px contributes the dependency Lx-1 -> Lx and Px+1 contributes
+the dependency Lx -> Lx+1. Since Px is waiting for Px+1 to release Lx, it's
+impossible that the Lx on Px+1 is a reader while the Lx on Px is a recursive
+reader, because readers (no matter recursive or not) don't block recursive
+readers. Therefore Lx-1 -> Lx and Lx -> Lx+1 cannot be a -(xR)-> -(Sx)-> pair,
+and this is true for any lock in the circle, therefore the circle is strong.
+
+References:
+-----------
+[1]: https://en.wikipedia.org/wiki/Deadlock
+[2]: Shibu, K. (2009). Intro To Embedded Systems (1st ed.). Tata McGraw-Hill
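
A minimal userspace sketch of the "strong path" rule above (not part of the
patch; the DEP_* encoding below is purely illustrative, not the kernel's
internal representation). It models the four dependency types as bits and
checks whether a concrete path, one dependency per step, is strong, i.e.
contains no adjacent -(*R)-> -(S*)-> pair:

#include <stdbool.h>
#include <stdio.h>

enum dep { DEP_SR = 1, DEP_ER = 2, DEP_SN = 4, DEP_EN = 8 };

#define DEP_xR (DEP_SR | DEP_ER)	/* dependencies ending in a recursive reader */
#define DEP_Sx (DEP_SR | DEP_SN)	/* dependencies starting from a shared reader */

/* A path is strong iff no -(*R)-> step is immediately followed by a -(S*)-> step. */
static bool path_is_strong(const enum dep *path, int len)
{
	int i;

	for (i = 0; i + 1 < len; i++) {
		if ((path[i] & DEP_xR) && (path[i + 1] & DEP_Sx))
			return false;
	}
	return true;
}

int main(void)
{
	enum dep weak[]   = { DEP_ER, DEP_SN };	/* A -(ER)-> B -(SN)-> A: not strong */
	enum dep strong[] = { DEP_EN, DEP_SN };	/* A -(EN)-> B -(SN)-> A: strong */

	printf("ER,SN strong? %d\n", path_is_strong(weak, 2));		/* prints 0 */
	printf("EN,SN strong? %d\n", path_is_strong(strong, 2));	/* prints 1 */
	return 0;
}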

Subject: [tip: locking/core] lockdep/selftest: Unleash irq_read_recursion2 and add more

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 31e0d747708272356bee9b6a1b90c1e6525b0f6d
Gitweb: https://git.kernel.org/tip/31e0d747708272356bee9b6a1b90c1e6525b0f6d
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:34 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:06 +02:00

lockdep/selftest: Unleash irq_read_recursion2 and add more

Now that we can handle recursive-read related irq inversion deadlocks
correctly, uncomment irq_read_recursion2 and add more testcases.

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
lib/locking-selftest.c | 59 ++++++++++++++++++++++++++++++++---------
1 file changed, 47 insertions(+), 12 deletions(-)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 002d1ec..f65a658 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -1053,20 +1053,28 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_inversion_soft_wlock)
#define E3() \
\
IRQ_ENTER(); \
- RL(A); \
+ LOCK(A); \
L(B); \
U(B); \
- RU(A); \
+ UNLOCK(A); \
IRQ_EXIT();

/*
- * Generate 12 testcases:
+ * Generate 24 testcases:
*/
#include "locking-selftest-hardirq.h"
-GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_hard)
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_hard_rlock)
+
+#include "locking-selftest-wlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_hard_wlock)

#include "locking-selftest-softirq.h"
-GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft)
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft_rlock)
+
+#include "locking-selftest-wlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft_wlock)

#undef E1
#undef E2
@@ -1080,8 +1088,8 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft)
\
IRQ_DISABLE(); \
L(B); \
- WL(A); \
- WU(A); \
+ LOCK(A); \
+ UNLOCK(A); \
U(B); \
IRQ_ENABLE();

@@ -1098,13 +1106,21 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft)
IRQ_EXIT();

/*
- * Generate 12 testcases:
+ * Generate 24 testcases:
*/
#include "locking-selftest-hardirq.h"
-// GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_hard)
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_hard_rlock)
+
+#include "locking-selftest-wlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_hard_wlock)

#include "locking-selftest-softirq.h"
-// GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_soft)
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_soft_rlock)
+
+#include "locking-selftest-wlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_soft_wlock)

#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define I_SPINLOCK(x) lockdep_reset_lock(&lock_##x.dep_map)
@@ -1257,6 +1273,25 @@ static inline void print_testname(const char *testname)
dotest(name##_rlock_##nr, SUCCESS, LOCKTYPE_RWLOCK); \
pr_cont("\n");

+#define DO_TESTCASE_2RW(desc, name, nr) \
+ print_testname(desc"/"#nr); \
+ pr_cont(" |"); \
+ dotest(name##_wlock_##nr, FAILURE, LOCKTYPE_RWLOCK); \
+ dotest(name##_rlock_##nr, SUCCESS, LOCKTYPE_RWLOCK); \
+ pr_cont("\n");
+
+#define DO_TESTCASE_2x2RW(desc, name, nr) \
+ DO_TESTCASE_2RW("hard-"desc, name##_hard, nr) \
+ DO_TESTCASE_2RW("soft-"desc, name##_soft, nr) \
+
+#define DO_TESTCASE_6x2x2RW(desc, name) \
+ DO_TESTCASE_2x2RW(desc, name, 123); \
+ DO_TESTCASE_2x2RW(desc, name, 132); \
+ DO_TESTCASE_2x2RW(desc, name, 213); \
+ DO_TESTCASE_2x2RW(desc, name, 231); \
+ DO_TESTCASE_2x2RW(desc, name, 312); \
+ DO_TESTCASE_2x2RW(desc, name, 321);
+
#define DO_TESTCASE_6(desc, name) \
print_testname(desc); \
dotest(name##_spin, FAILURE, LOCKTYPE_SPIN); \
@@ -2121,8 +2156,8 @@ void locking_selftest(void)
DO_TESTCASE_6x6("safe-A + unsafe-B #2", irqsafe4);
DO_TESTCASE_6x6RW("irq lock-inversion", irq_inversion);

- DO_TESTCASE_6x2("irq read-recursion", irq_read_recursion);
-// DO_TESTCASE_6x2B("irq read-recursion #2", irq_read_recursion2);
+ DO_TESTCASE_6x2x2RW("irq read-recursion", irq_read_recursion);
+ DO_TESTCASE_6x2x2RW("irq read-recursion #2", irq_read_recursion2);

ww_tests();

Subject: [tip: locking/core] lockdep/selftest: Add a R-L/L-W test case specific to chain cache behavior

The following commit has been merged into the locking/core branch of tip:

Commit-ID: d4f200e579e96051f1f081f795820787826eb234
Gitweb: https://git.kernel.org/tip/d4f200e579e96051f1f081f795820787826eb234
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:32 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:06 +02:00

lockdep/selftest: Add a R-L/L-W test case specific to chain cache behavior

Since our chain cache doesn't distinguish read/write locks, even though we can
detect a read-lock/lock-write deadlock in check_noncircular(), we can still be
fooled if a read-lock/lock-read case (which is not a deadlock) comes first.

So introduce this test case to specifically exercise the chain cache behavior
when detecting recursive read lock related deadlocks.

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
lib/locking-selftest.c | 47 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 47 insertions(+)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index caadc4d..002d1ec 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -400,6 +400,49 @@ static void rwsem_ABBA1(void)
* read_lock(A)
* spin_lock(B)
* spin_lock(B)
+ * write_lock(A)
+ *
+ * This test case is aimed at poking whether the chain cache prevents us from
+ * detecting a read-lock/lock-write deadlock: if the chain cache doesn't
+ * distinguish read/write locks, the following case may happen
+ *
+ * { read_lock(A)->lock(B) dependency exists }
+ *
+ * P0:
+ * lock(B);
+ * read_lock(A);
+ *
+ * { Not a deadlock, B -> A is added in the chain cache }
+ *
+ * P1:
+ * lock(B);
+ * write_lock(A);
+ *
+ * { B->A found in chain cache, not reported as a deadlock }
+ *
+ */
+static void rlock_chaincache_ABBA1(void)
+{
+ RL(X1);
+ L(Y1);
+ U(Y1);
+ RU(X1);
+
+ L(Y1);
+ RL(X1);
+ RU(X1);
+ U(Y1);
+
+ L(Y1);
+ WL(X1);
+ WU(X1);
+ U(Y1); // should fail
+}
+
+/*
+ * read_lock(A)
+ * spin_lock(B)
+ * spin_lock(B)
* read_lock(A)
*/
static void rlock_ABBA2(void)
@@ -2062,6 +2105,10 @@ void locking_selftest(void)
pr_cont(" |");
dotest(rwsem_ABBA3, FAILURE, LOCKTYPE_RWSEM);

+ print_testname("chain cached mixed R-L/L-W ABBA");
+ pr_cont(" |");
+ dotest(rlock_chaincache_ABBA1, FAILURE, LOCKTYPE_RWLOCK);
+
printk(" --------------------------------------------------------------------------\n");

/*

Subject: [tip: locking/core] lockdep: Add recursive read locks into dependency graph

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 621c9dac0eea7607cb9a57cc9ba47fbcd4e644c9
Gitweb: https://git.kernel.org/tip/621c9dac0eea7607cb9a57cc9ba47fbcd4e644c9
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:31 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:06 +02:00

lockdep: Add recursive read locks into dependency graph

Since we have all the fundamentals to handle recursive read locks, we can now
add them into the dependency graph.

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/locking/lockdep.c | 19 ++-----------------
1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 6644974..b87766e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2809,16 +2809,6 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
return 0;

/*
- * For recursive read-locks we do all the dependency checks,
- * but we dont store read-triggered dependencies (only
- * write-triggered dependencies). This ensures that only the
- * write-side dependencies matter, and that if for example a
- * write-lock never takes any other locks, then the reads are
- * equivalent to a NOP.
- */
- if (next->read == 2 || prev->read == 2)
- return 1;
- /*
* Is the <prev> -> <next> dependency already present?
*
* (this may occur even though this is a new chain: consider
@@ -2935,13 +2925,8 @@ check_prevs_add(struct task_struct *curr, struct held_lock *next)
u16 distance = curr->lockdep_depth - depth + 1;
hlock = curr->held_locks + depth - 1;

- /*
- * Only non-recursive-read entries get new dependencies
- * added:
- */
- if (hlock->read != 2 && hlock->check) {
- int ret = check_prev_add(curr, hlock, next, distance,
- &trace);
+ if (hlock->check) {
+ int ret = check_prev_add(curr, hlock, next, distance, &trace);
if (!ret)
return 0;

Subject: [tip: locking/core] lockdep/selftest: Introduce recursion3

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 96a16f45aed89cf024606a11679b0609d09552c7
Gitweb: https://git.kernel.org/tip/96a16f45aed89cf024606a11679b0609d09552c7
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:38 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:08 +02:00

lockdep/selftest: Introduce recursion3

Add a test case that shows USED_IN_*_READ and ENABLE_*_READ usages can cause
deadlocks too.

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
lib/locking-selftest.c | 55 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 55 insertions(+)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 17f8f6f..a899b3f 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -1249,6 +1249,60 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_soft_rlock)
#include "locking-selftest-wlock.h"
GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion2_soft_wlock)

+#undef E1
+#undef E2
+#undef E3
+/*
+ * read-lock / write-lock recursion that is unsafe.
+ *
+ * A is a ENABLED_*_READ lock
+ * B is a USED_IN_*_READ lock
+ *
+ * read_lock(A);
+ * write_lock(B);
+ * <interrupt>
+ * read_lock(B);
+ * write_lock(A); // if this one is read_lock(), no deadlock
+ */
+
+#define E1() \
+ \
+ IRQ_DISABLE(); \
+ WL(B); \
+ LOCK(A); \
+ UNLOCK(A); \
+ WU(B); \
+ IRQ_ENABLE();
+
+#define E2() \
+ \
+ RL(A); \
+ RU(A); \
+
+#define E3() \
+ \
+ IRQ_ENTER(); \
+ RL(B); \
+ RU(B); \
+ IRQ_EXIT();
+
+/*
+ * Generate 24 testcases:
+ */
+#include "locking-selftest-hardirq.h"
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion3_hard_rlock)
+
+#include "locking-selftest-wlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion3_hard_wlock)
+
+#include "locking-selftest-softirq.h"
+#include "locking-selftest-rlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion3_soft_rlock)
+
+#include "locking-selftest-wlock.h"
+GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion3_soft_wlock)
+
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define I_SPINLOCK(x) lockdep_reset_lock(&lock_##x.dep_map)
# define I_RWLOCK(x) lockdep_reset_lock(&rwlock_##x.dep_map)
@@ -2413,6 +2467,7 @@ void locking_selftest(void)

DO_TESTCASE_6x2x2RW("irq read-recursion", irq_read_recursion);
DO_TESTCASE_6x2x2RW("irq read-recursion #2", irq_read_recursion2);
+ DO_TESTCASE_6x2x2RW("irq read-recursion #3", irq_read_recursion3);

ww_tests();

Subject: [tip: locking/core] lockdep: Extend __bfs() to work with multiple types of dependencies

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 6971c0f345620aae5e6172207a57b7524603a34e
Gitweb: https://git.kernel.org/tip/6971c0f345620aae5e6172207a57b7524603a34e
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:26 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:04 +02:00

lockdep: Extend __bfs() to work with multiple types of dependencies

Now we have four types of dependencies in the dependency graph, and not
all the paths carry real dependencies (the dependencies that may cause
a deadlock), for example:

Given lock A and B, if we have:

CPU1 CPU2
============= ==============
write_lock(A); read_lock(B);
read_lock(B); write_lock(A);

(assuming read_lock(B) is a recursive reader)

then we have dependencies A -(ER)-> B, and B -(SN)-> A, and a
dependency path A -(ER)-> B -(SN)-> A.

In lockdep w/o recursive locks, a dependency path from A to A
means a deadlock. However, the above case is obviously not a
deadlock, because no one holds B exclusively, therefore no one
waits for the other to release B, so whichever of CPU1 and CPU2
gets A first will run without blocking.

As a result, dependency path A -(ER)-> B -(SN)-> A is not a
real/strong dependency that could cause a deadlock.

From the observation above, we know that for a dependency path to be
real/strong, no two adjacent dependencies can be -(*R)-> -(S*)->.

Now our mission is to make __bfs() traverse only the strong dependency
paths, which is simple: we record whether we only have -(*R)-> for the
previous lock_list of the path in lock_list::only_xr, and when we pick a
dependency during the traversal, we 1) filter out -(S*)-> dependencies if
the previous lock_list only has -(*R)-> dependencies (i.e. ->only_xr is
true) and 2) set the next lock_list::only_xr to true if we only have
-(*R)-> left after filtering out dependencies based on 1), otherwise set
it to false.

With this extension to __bfs(), we now need to initialize the root of
__bfs() properly (with a correct ->only_xr). To do so, we introduce some
helper functions, which also clean up the __bfs() root initialization
code a little bit.

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
include/linux/lockdep.h | 2 +-
kernel/locking/lockdep.c | 113 +++++++++++++++++++++++++++++++-------
2 files changed, 96 insertions(+), 19 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 35c8bb0..57d642d 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -57,6 +57,8 @@ struct lock_list {
u16 distance;
/* bitmap of different dependencies from head to this */
u8 dep;
+ /* used by BFS to record whether "prev -> this" only has -(*R)-> */
+ u8 only_xr;

/*
* The parent field is used to implement breadth-first search, and the
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 16ad1b7..5abc227 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1551,8 +1551,72 @@ static inline u8 calc_depb(struct held_lock *prev, struct held_lock *next)
}

/*
- * Forward- or backward-dependency search, used for both circular dependency
- * checking and hardirq-unsafe/softirq-unsafe checking.
+ * Initialize a lock_list entry @lock belonging to @class as the root for a BFS
+ * search.
+ */
+static inline void __bfs_init_root(struct lock_list *lock,
+ struct lock_class *class)
+{
+ lock->class = class;
+ lock->parent = NULL;
+ lock->only_xr = 0;
+}
+
+/*
+ * Initialize a lock_list entry @lock based on a lock acquisition @hlock as the
+ * root for a BFS search.
+ *
+ * ->only_xr of the initial lock node is set to @hlock->read == 2, to make sure
+ * that <prev> -> @hlock and @hlock -> <whatever __bfs() found> is not -(*R)->
+ * and -(S*)->.
+ */
+static inline void bfs_init_root(struct lock_list *lock,
+ struct held_lock *hlock)
+{
+ __bfs_init_root(lock, hlock_class(hlock));
+ lock->only_xr = (hlock->read == 2);
+}
+
+/*
+ * Similar to bfs_init_root() but initialize the root for backwards BFS.
+ *
+ * ->only_xr of the initial lock node is set to @hlock->read != 0, to make sure
+ * that <next> -> @hlock and @hlock -> <whatever backwards BFS found> is not
+ * -(*S)-> and -(R*)-> (reverse order of -(*R)-> and -(S*)->).
+ */
+static inline void bfs_init_rootb(struct lock_list *lock,
+ struct held_lock *hlock)
+{
+ __bfs_init_root(lock, hlock_class(hlock));
+ lock->only_xr = (hlock->read != 0);
+}
+
+/*
+ * Breadth-First Search to find a strong path in the dependency graph.
+ *
+ * @source_entry: the source of the path we are searching for.
+ * @data: data used for the second parameter of @match function
+ * @match: match function for the search
+ * @target_entry: pointer to the target of a matched path
+ * @offset: the offset to struct lock_class to determine whether it is
+ * locks_after or locks_before
+ *
+ * We may have multiple edges (considering different kinds of dependencies,
+ * e.g. ER and SN) between two nodes in the dependency graph. But
+ * only the strong dependency path in the graph is relevant to deadlocks. A
+ * strong dependency path is a dependency path that doesn't have two adjacent
+ * dependencies as -(*R)-> -(S*)->, please see:
+ *
+ * Documentation/locking/lockdep-design.rst
+ *
+ * for more explanation of the definition of strong dependency paths
+ *
+ * In __bfs(), we only traverse in the strong dependency path:
+ *
+ * In lock_list::only_xr, we record whether the previous dependency only
+ * has -(*R)-> in the search, and if it does (prev only has -(*R)->), we
+ * filter out any -(S*)-> in the current dependency and after that, the
+ * ->only_xr is set according to whether we only have -(*R)-> left.
*/
static enum bfs_result __bfs(struct lock_list *source_entry,
void *data,
@@ -1582,6 +1646,7 @@ static enum bfs_result __bfs(struct lock_list *source_entry,
__cq_enqueue(cq, source_entry);

while ((lock = __cq_dequeue(cq))) {
+ bool prev_only_xr;

if (!lock->class) {
ret = BFS_EINVALIDNODE;
@@ -1602,10 +1667,26 @@ static enum bfs_result __bfs(struct lock_list *source_entry,

head = get_dep_list(lock, offset);

- DEBUG_LOCKS_WARN_ON(!irqs_disabled());
+ prev_only_xr = lock->only_xr;

list_for_each_entry_rcu(entry, head, entry) {
unsigned int cq_depth;
+ u8 dep = entry->dep;
+
+ /*
+ * Mask out all -(S*)-> if we only have *R in previous
+ * step, because -(*R)-> -(S*)-> don't make up a strong
+ * dependency.
+ */
+ if (prev_only_xr)
+ dep &= ~(DEP_SR_MASK | DEP_SN_MASK);
+
+ /* If nothing left, we skip */
+ if (!dep)
+ continue;
+
+ /* If there are only -(*R)-> left, set that for the next step */
+ entry->only_xr = !(dep & (DEP_SN_MASK | DEP_EN_MASK));

visit_lock_entry(entry, lock);
if (match(entry, data)) {
@@ -1827,8 +1908,7 @@ unsigned long lockdep_count_forward_deps(struct lock_class *class)
unsigned long ret, flags;
struct lock_list this;

- this.parent = NULL;
- this.class = class;
+ __bfs_init_root(&this, class);

raw_local_irq_save(flags);
lockdep_lock();
@@ -1854,8 +1934,7 @@ unsigned long lockdep_count_backward_deps(struct lock_class *class)
unsigned long ret, flags;
struct lock_list this;

- this.parent = NULL;
- this.class = class;
+ __bfs_init_root(&this, class);

raw_local_irq_save(flags);
lockdep_lock();
@@ -1898,10 +1977,9 @@ check_noncircular(struct held_lock *src, struct held_lock *target,
{
enum bfs_result ret;
struct lock_list *target_entry;
- struct lock_list src_entry = {
- .class = hlock_class(src),
- .parent = NULL,
- };
+ struct lock_list src_entry;
+
+ bfs_init_root(&src_entry, src);

debug_atomic_inc(nr_cyclic_checks);

@@ -1937,10 +2015,9 @@ check_redundant(struct held_lock *src, struct held_lock *target)
{
enum bfs_result ret;
struct lock_list *target_entry;
- struct lock_list src_entry = {
- .class = hlock_class(src),
- .parent = NULL,
- };
+ struct lock_list src_entry;
+
+ bfs_init_root(&src_entry, src);

debug_atomic_inc(nr_redundant_checks);

@@ -3556,8 +3633,7 @@ check_usage_forwards(struct task_struct *curr, struct held_lock *this,
struct lock_list root;
struct lock_list *target_entry;

- root.parent = NULL;
- root.class = hlock_class(this);
+ bfs_init_root(&root, this);
ret = find_usage_forwards(&root, lock_flag(bit), &target_entry);
if (bfs_error(ret)) {
print_bfs_bug(ret);
@@ -3583,8 +3659,7 @@ check_usage_backwards(struct task_struct *curr, struct held_lock *this,
struct lock_list root;
struct lock_list *target_entry;

- root.parent = NULL;
- root.class = hlock_class(this);
+ bfs_init_rootb(&root, this);
ret = find_usage_backwards(&root, lock_flag(bit), &target_entry);
if (bfs_error(ret)) {
print_bfs_bug(ret);
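
As a side note, the per-edge filtering described above can be condensed into a
few lines. The sketch below (standalone userspace C; the DEP_*_MASK values are
illustrative, not the kernel's exact encoding) mirrors steps 1) and 2) from the
changelog: mask out the -(S*)-> bits when the previous step was -(*R)-> only,
skip the edge if nothing is left, and compute only_xr for the next step:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative bit assignment; the kernel derives these masks differently. */
#define DEP_SR_MASK 0x1
#define DEP_ER_MASK 0x2
#define DEP_SN_MASK 0x4
#define DEP_EN_MASK 0x8

/*
 * Return the usable dependency bits of the next edge (0 means "skip this
 * edge") and compute whether the next step is -(*R)-> only.
 */
static uint8_t bfs_filter_dep(uint8_t dep, bool prev_only_xr, bool *next_only_xr)
{
	if (prev_only_xr)
		dep &= ~(DEP_SR_MASK | DEP_SN_MASK);	/* 1) drop all -(S*)-> */
	/* 2) is only -(*R)-> left? (only meaningful when dep != 0) */
	*next_only_xr = dep && !(dep & (DEP_SN_MASK | DEP_EN_MASK));
	return dep;
}

int main(void)
{
	bool only_xr;

	/* The previous step was -(*R)-> only: a pure -(SN)-> edge gets skipped. */
	printf("%#x\n", (unsigned int)bfs_filter_dep(DEP_SN_MASK, true, &only_xr));	/* 0 */

	/*
	 * An SR|ER edge after an -(*R)->-only step: -(SR)-> is dropped,
	 * -(ER)-> survives and the next step is again -(*R)-> only.
	 */
	printf("%#x only_xr=%d\n",
	       (unsigned int)bfs_filter_dep(DEP_SR_MASK | DEP_ER_MASK, true, &only_xr),
	       only_xr);
	return 0;
}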

Subject: [tip: locking/core] lockdep: Make __bfs() visit every dependency until a match

The following commit has been merged into the locking/core branch of tip:

Commit-ID: d563bc6ead9e79be37067d58509a605b67378184
Gitweb: https://git.kernel.org/tip/d563bc6ead9e79be37067d58509a605b67378184
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:23 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:03 +02:00

lockdep: Make __bfs() visit every dependency until a match

Currently, __bfs() will do a breadth-first search in the dependency
graph and visit each lock class in the graph exactly once, so for
example, in the following graph:

A ---------> B
| ^
| |
+----------> C

a __bfs() call starts at A, will visit B through dependency A -> B and
visit C through dependency A -> C and that's it, IOW, __bfs() will not
visit dependency C -> B.

This is OK for now, as we only have strong dependencies in the
dependency graph, so whenever there is a traverse path from A to B in
__bfs(), it means A has strong dependencies to B (IOW, B depends on A
strongly). So no need to visit all dependencies in the graph.

However, since we are going to add recursive-read locks into the dependency
graph, not all paths will carry strong dependencies: in the same example
above, dependency A -> B may be a weak dependency while the path
A -> C -> B may be a strong dependency path. With the old way of __bfs()
(i.e. visiting every lock class exactly once), we would miss the strong
dependency path, which would result in failing to find a deadlock. To cure
this for the future, we need to find a way for __bfs() to visit each
dependency, rather than each class, exactly once in the search until we
find a match.

The solution is simple:

We used to mark lock_class::lockdep_dependency_gen_id to indicate a
class has been visited in __bfs(); now we change the semantics a little
bit: we now mark lock_class::lockdep_dependency_gen_id to indicate that
_all the dependencies_ in its locks_{after,before} have been visited in
__bfs() (note we only take one direction in a __bfs() search). In this
way, every dependency is guaranteed to be visited until we find a match.

Note: the checks in mark_lock_accessed() and lock_accessed() are
removed, because after this modification, we may call these two
functions on @source_entry of __bfs(), which may not be an entry in
"list_entries".

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/locking/lockdep.c | 61 ++++++++++++++++++++++-----------------
1 file changed, 35 insertions(+), 26 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 462c68c..150686a 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1421,23 +1421,19 @@ static inline unsigned int __cq_get_elem_count(struct circular_queue *cq)
return (cq->rear - cq->front) & CQ_MASK;
}

-static inline void mark_lock_accessed(struct lock_list *lock,
- struct lock_list *parent)
+static inline void mark_lock_accessed(struct lock_list *lock)
{
- unsigned long nr;
+ lock->class->dep_gen_id = lockdep_dependency_gen_id;
+}

- nr = lock - list_entries;
- WARN_ON(nr >= ARRAY_SIZE(list_entries)); /* Out-of-bounds, input fail */
+static inline void visit_lock_entry(struct lock_list *lock,
+ struct lock_list *parent)
+{
lock->parent = parent;
- lock->class->dep_gen_id = lockdep_dependency_gen_id;
}

static inline unsigned long lock_accessed(struct lock_list *lock)
{
- unsigned long nr;
-
- nr = lock - list_entries;
- WARN_ON(nr >= ARRAY_SIZE(list_entries)); /* Out-of-bounds, input fail */
return lock->class->dep_gen_id == lockdep_dependency_gen_id;
}

@@ -1540,26 +1536,39 @@ static enum bfs_result __bfs(struct lock_list *source_entry,
goto exit;
}

+ /*
+ * If we have visited all the dependencies from this @lock to
+ * others (iow, if we have visited all lock_list entries in
+ * @lock->class->locks_{after,before}) we skip, otherwise go
+ * and visit all the dependencies in the list and mark this
+ * list accessed.
+ */
+ if (lock_accessed(lock))
+ continue;
+ else
+ mark_lock_accessed(lock);
+
head = get_dep_list(lock, offset);

+ DEBUG_LOCKS_WARN_ON(!irqs_disabled());
+
list_for_each_entry_rcu(entry, head, entry) {
- if (!lock_accessed(entry)) {
- unsigned int cq_depth;
- mark_lock_accessed(entry, lock);
- if (match(entry, data)) {
- *target_entry = entry;
- ret = BFS_RMATCH;
- goto exit;
- }
-
- if (__cq_enqueue(cq, entry)) {
- ret = BFS_EQUEUEFULL;
- goto exit;
- }
- cq_depth = __cq_get_elem_count(cq);
- if (max_bfs_queue_depth < cq_depth)
- max_bfs_queue_depth = cq_depth;
+ unsigned int cq_depth;
+
+ visit_lock_entry(entry, lock);
+ if (match(entry, data)) {
+ *target_entry = entry;
+ ret = BFS_RMATCH;
+ goto exit;
+ }
+
+ if (__cq_enqueue(cq, entry)) {
+ ret = BFS_EQUEUEFULL;
+ goto exit;
}
+ cq_depth = __cq_get_elem_count(cq);
+ if (max_bfs_queue_depth < cq_depth)
+ max_bfs_queue_depth = cq_depth;
}
}
exit:
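
To make the "visit every dependency rather than every class" point concrete,
here is a small self-contained sketch (plain userspace C with a toy adjacency
matrix, not the kernel's data structures) that runs a BFS over the A/B/C graph
above under both marking policies and counts how many dependencies actually
get visited:

#include <stdbool.h>
#include <stdio.h>

#define NR_NODES 3	/* A = 0, B = 1, C = 2 */

static const int edges[NR_NODES][NR_NODES] = {
	/* A */ { 0, 1, 1 },	/* A -> B, A -> C */
	/* B */ { 0, 0, 0 },
	/* C */ { 0, 1, 0 },	/* C -> B */
};

/*
 * BFS from @src. If @mark_on_enqueue, a node is marked visited when it is
 * first reached (the old behaviour: each class visited once); otherwise it is
 * marked when dequeued, i.e. once its whole edge list has been scanned (the
 * new behaviour: each dependency visited once).
 */
static int count_visited_deps(int src, bool mark_on_enqueue)
{
	int queue[NR_NODES * NR_NODES + 1], head = 0, tail = 0, visited_deps = 0;
	bool visited[NR_NODES] = { false };
	int n, m;

	queue[tail++] = src;
	if (mark_on_enqueue)
		visited[src] = true;

	while (head < tail) {
		n = queue[head++];

		if (!mark_on_enqueue) {
			if (visited[n])
				continue;
			visited[n] = true;
		}

		for (m = 0; m < NR_NODES; m++) {
			if (!edges[n][m])
				continue;
			if (mark_on_enqueue) {
				if (visited[m])
					continue;	/* match() never runs on n -> m */
				visited[m] = true;
			}
			visited_deps++;			/* match() would run on n -> m here */
			queue[tail++] = m;
		}
	}
	return visited_deps;
}

int main(void)
{
	printf("mark on enqueue: %d deps visited\n", count_visited_deps(0, true));	/* 2: C -> B is missed */
	printf("mark on dequeue: %d deps visited\n", count_visited_deps(0, false));	/* 3: every dependency seen */
	return 0;
}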

Subject: [tip: locking/core] lockdep: Support deadlock detection for recursive read locks in check_noncircular()

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 9de0c9bbcedf752e762c67f105bff342e30f9105
Gitweb: https://git.kernel.org/tip/9de0c9bbcedf752e762c67f105bff342e30f9105
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:28 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:05 +02:00

lockdep: Support deadlock detection for recursive read locks in check_noncircular()

Currently, lockdep only has limited support for deadlock detection for
recursive read locks.

This patch supports deadlock detection for recursive read locks. The
basic idea is:

We are about to add dependency B -> A into the dependency graph, and we use
check_noncircular() to find whether there is a strong dependency path
A -> .. -> B, in which case we have a strong dependency circle (a closed
strong dependency path):

A -> .. -> B -> A

, which doesn't have two adjacent dependencies as -(*R)-> L -(S*)->.

Since A -> .. -> B is already a strong dependency path, if either B -> A is
-(E*)-> or A -> .. -> B ends with a -(*N)-> dependency, the circle
A -> .. -> B -> A is strong, otherwise it is not. So we introduce a new match
function hlock_conflict() to replace class_equal() for the deadlock check in
check_noncircular().

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/locking/lockdep.c | 43 +++++++++++++++++++++++++++++++--------
1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 78cd74d..9160f1d 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1838,10 +1838,37 @@ static inline bool class_equal(struct lock_list *entry, void *data)
return entry->class == data;
}

+/*
+ * We are about to add B -> A into the dependency graph, and in __bfs() a
+ * strong dependency path A -> .. -> B is found: hlock_class equals
+ * entry->class.
+ *
+ * We will have a deadlock case (conflict) if A -> .. -> B -> A is a strong
+ * dependency cycle, that means:
+ *
+ * Either
+ *
+ * a) B -> A is -(E*)->
+ *
+ * or
+ *
+ * b) A -> .. -> B is -(*N)-> (i.e. A -> .. -(*N)-> B)
+ *
+ * as then we don't have -(*R)-> -(S*)-> in the cycle.
+ */
+static inline bool hlock_conflict(struct lock_list *entry, void *data)
+{
+ struct held_lock *hlock = (struct held_lock *)data;
+
+ return hlock_class(hlock) == entry->class && /* Found A -> .. -> B */
+ (hlock->read == 0 || /* B -> A is -(E*)-> */
+ !entry->only_xr); /* A -> .. -> B is -(*N)-> */
+}
+
static noinline void print_circular_bug(struct lock_list *this,
- struct lock_list *target,
- struct held_lock *check_src,
- struct held_lock *check_tgt)
+ struct lock_list *target,
+ struct held_lock *check_src,
+ struct held_lock *check_tgt)
{
struct task_struct *curr = current;
struct lock_list *parent;
@@ -1950,13 +1977,13 @@ unsigned long lockdep_count_backward_deps(struct lock_class *class)
* <target> or not.
*/
static noinline enum bfs_result
-check_path(struct lock_class *target, struct lock_list *src_entry,
+check_path(struct held_lock *target, struct lock_list *src_entry,
+ bool (*match)(struct lock_list *entry, void *data),
struct lock_list **target_entry)
{
enum bfs_result ret;

- ret = __bfs_forwards(src_entry, (void *)target, class_equal,
- target_entry);
+ ret = __bfs_forwards(src_entry, target, match, target_entry);

if (unlikely(bfs_error(ret)))
print_bfs_bug(ret);
@@ -1983,7 +2010,7 @@ check_noncircular(struct held_lock *src, struct held_lock *target,

debug_atomic_inc(nr_cyclic_checks);

- ret = check_path(hlock_class(target), &src_entry, &target_entry);
+ ret = check_path(target, &src_entry, hlock_conflict, &target_entry);

if (unlikely(ret == BFS_RMATCH)) {
if (!*trace) {
@@ -2021,7 +2048,7 @@ check_redundant(struct held_lock *src, struct held_lock *target)

debug_atomic_inc(nr_redundant_checks);

- ret = check_path(hlock_class(target), &src_entry, &target_entry);
+ ret = check_path(target, &src_entry, class_equal, &target_entry);

if (ret == BFS_RMATCH)
debug_atomic_inc(nr_redundant);
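
The conflict condition in hlock_conflict() boils down to a two-input boolean.
A tiny standalone restatement (userspace C; the names are illustrative and not
the kernel's) may make cases a) and b) above easier to see:

#include <stdbool.h>
#include <stdio.h>

/*
 * We found a strong path A -> .. -> B and are about to add B -> A. The
 * resulting circle is a deadlock iff either B -> A is -(E*)-> (B is held as
 * a writer, i.e. read == 0) or the found path does not reach B through
 * -(*R)-> only (i.e. !only_xr).
 */
static bool circle_is_deadlock(int b_read, bool path_only_xr)
{
	return b_read == 0 || !path_only_xr;
}

int main(void)
{
	/*
	 * B held as a recursive reader (read == 2) and A -> .. -(*R)-> B:
	 * the circle would contain -(*R)-> -(S*)->, hence no deadlock.
	 */
	printf("%d\n", circle_is_deadlock(2, true));	/* 0 */

	/* B -> A is -(E*)->: a deadlock regardless of how the path ends. */
	printf("%d\n", circle_is_deadlock(0, true));	/* 1 */
	return 0;
}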

Subject: [tip: locking/core] lockdep: Take read/write status in consideration when generate chainkey

The following commit has been merged into the locking/core branch of tip:

Commit-ID: f611e8cf98ec908b9c2c0da6064a660fc6022487
Gitweb: https://git.kernel.org/tip/f611e8cf98ec908b9c2c0da6064a660fc6022487
Author: Boqun Feng <[email protected]>
AuthorDate: Fri, 07 Aug 2020 15:42:33 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 26 Aug 2020 12:42:06 +02:00

lockdep: Take read/write status in consideration when generate chainkey

Currently, the chainkey of a lock chain is a hash sum of the class_idx
of all the held locks; the read/write status is not taken into
consideration while generating the chainkey. This can result in a
problem, if we have:

P1()
{
read_lock(B);
lock(A);
}

P2()
{
lock(A);
read_lock(B);
}

P3()
{
lock(A);
write_lock(B);
}

, and P1(), P2(), P3() run one by one. When running P2(), lockdep
detects that the lock chain A -> B is not a deadlock, so it's added to
the chain cache; then when running P3(), even though it is a deadlock, we
could miss it because of the chain cache hit. This can be confirmed by
the selftest "chain cached mixed R-L/L-W ABBA".

To resolve this, we use the concept of "hlock_id" to generate the chainkey:
an hlock_id is a tuple (hlock->class_idx, hlock->read), which fits in a u16
type. With this, the chainkeys are different if the lock sequences have
the same locks but different read/write status.

Besides, since we use "hlock_id" to generate chainkeys, the chain_hlocks
array now stores the "hlock_id"s rather than lock_class indexes.

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/locking/lockdep.c | 53 +++++++++++++++++++++++++--------------
1 file changed, 35 insertions(+), 18 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index b87766e..cccf4bc 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -372,6 +372,21 @@ static struct hlist_head classhash_table[CLASSHASH_SIZE];
static struct hlist_head chainhash_table[CHAINHASH_SIZE];

/*
+ * the id of held_lock
+ */
+static inline u16 hlock_id(struct held_lock *hlock)
+{
+ BUILD_BUG_ON(MAX_LOCKDEP_KEYS_BITS + 2 > 16);
+
+ return (hlock->class_idx | (hlock->read << MAX_LOCKDEP_KEYS_BITS));
+}
+
+static inline unsigned int chain_hlock_class_idx(u16 hlock_id)
+{
+ return hlock_id & (MAX_LOCKDEP_KEYS - 1);
+}
+
+/*
* The hash key of the lock dependency chains is a hash itself too:
* it's a hash of all locks taken up to that lock, including that lock.
* It's a 64-bit hash, because it's important for the keys to be
@@ -3202,7 +3217,10 @@ static inline void free_chain_hlocks(int base, int size)

struct lock_class *lock_chain_get_class(struct lock_chain *chain, int i)
{
- return lock_classes + chain_hlocks[chain->base + i];
+ u16 chain_hlock = chain_hlocks[chain->base + i];
+ unsigned int class_idx = chain_hlock_class_idx(chain_hlock);
+
+ return lock_classes + class_idx - 1;
}

/*
@@ -3228,12 +3246,12 @@ static inline int get_first_held_lock(struct task_struct *curr,
/*
* Returns the next chain_key iteration
*/
-static u64 print_chain_key_iteration(int class_idx, u64 chain_key)
+static u64 print_chain_key_iteration(u16 hlock_id, u64 chain_key)
{
- u64 new_chain_key = iterate_chain_key(chain_key, class_idx);
+ u64 new_chain_key = iterate_chain_key(chain_key, hlock_id);

- printk(" class_idx:%d -> chain_key:%016Lx",
- class_idx,
+ printk(" hlock_id:%d -> chain_key:%016Lx",
+ (unsigned int)hlock_id,
(unsigned long long)new_chain_key);
return new_chain_key;
}
@@ -3250,12 +3268,12 @@ print_chain_keys_held_locks(struct task_struct *curr, struct held_lock *hlock_ne
hlock_next->irq_context);
for (; i < depth; i++) {
hlock = curr->held_locks + i;
- chain_key = print_chain_key_iteration(hlock->class_idx, chain_key);
+ chain_key = print_chain_key_iteration(hlock_id(hlock), chain_key);

print_lock(hlock);
}

- print_chain_key_iteration(hlock_next->class_idx, chain_key);
+ print_chain_key_iteration(hlock_id(hlock_next), chain_key);
print_lock(hlock_next);
}

@@ -3263,14 +3281,14 @@ static void print_chain_keys_chain(struct lock_chain *chain)
{
int i;
u64 chain_key = INITIAL_CHAIN_KEY;
- int class_id;
+ u16 hlock_id;

printk("depth: %u\n", chain->depth);
for (i = 0; i < chain->depth; i++) {
- class_id = chain_hlocks[chain->base + i];
- chain_key = print_chain_key_iteration(class_id, chain_key);
+ hlock_id = chain_hlocks[chain->base + i];
+ chain_key = print_chain_key_iteration(hlock_id, chain_key);

- print_lock_name(lock_classes + class_id);
+ print_lock_name(lock_classes + chain_hlock_class_idx(hlock_id) - 1);
printk("\n");
}
}
@@ -3319,7 +3337,7 @@ static int check_no_collision(struct task_struct *curr,
}

for (j = 0; j < chain->depth - 1; j++, i++) {
- id = curr->held_locks[i].class_idx;
+ id = hlock_id(&curr->held_locks[i]);

if (DEBUG_LOCKS_WARN_ON(chain_hlocks[chain->base + j] != id)) {
print_collision(curr, hlock, chain);
@@ -3368,7 +3386,6 @@ static inline int add_chain_cache(struct task_struct *curr,
struct held_lock *hlock,
u64 chain_key)
{
- struct lock_class *class = hlock_class(hlock);
struct hlist_head *hash_head = chainhashentry(chain_key);
struct lock_chain *chain;
int i, j;
@@ -3411,11 +3428,11 @@ static inline int add_chain_cache(struct task_struct *curr,

chain->base = j;
for (j = 0; j < chain->depth - 1; j++, i++) {
- int lock_id = curr->held_locks[i].class_idx;
+ int lock_id = hlock_id(curr->held_locks + i);

chain_hlocks[chain->base + j] = lock_id;
}
- chain_hlocks[chain->base + j] = class - lock_classes;
+ chain_hlocks[chain->base + j] = hlock_id(hlock);
hlist_add_head_rcu(&chain->entry, hash_head);
debug_atomic_inc(chain_lookup_misses);
inc_chains(chain->irq_context);
@@ -3602,7 +3619,7 @@ static void check_chain_key(struct task_struct *curr)
if (prev_hlock && (prev_hlock->irq_context !=
hlock->irq_context))
chain_key = INITIAL_CHAIN_KEY;
- chain_key = iterate_chain_key(chain_key, hlock->class_idx);
+ chain_key = iterate_chain_key(chain_key, hlock_id(hlock));
prev_hlock = hlock;
}
if (chain_key != curr->curr_chain_key) {
@@ -4749,7 +4766,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
chain_key = INITIAL_CHAIN_KEY;
chain_head = 1;
}
- chain_key = iterate_chain_key(chain_key, class_idx);
+ chain_key = iterate_chain_key(chain_key, hlock_id(hlock));

if (nest_lock && !__lock_is_held(nest_lock, -1)) {
print_lock_nested_lock_not_held(curr, hlock, ip);
@@ -5648,7 +5665,7 @@ static void remove_class_from_lock_chain(struct pending_free *pf,
int i;

for (i = chain->base; i < chain->base + chain->depth; i++) {
- if (chain_hlocks[i] != class - lock_classes)
+ if (chain_hlock_class_idx(chain_hlocks[i]) != class - lock_classes)
continue;
/*
* Each lock class occurs at most once in a lock chain so once
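
To see the packing in isolation, here is a small userspace sketch of the
hlock_id encode/decode above. MAX_LOCKDEP_KEYS_BITS is given an illustrative
value here; in the kernel it is configuration-dependent:

#include <stdint.h>
#include <stdio.h>

#define MAX_LOCKDEP_KEYS_BITS	13	/* illustrative; config-dependent in the kernel */
#define MAX_LOCKDEP_KEYS	(1U << MAX_LOCKDEP_KEYS_BITS)

/* Pack (class_idx, read) into a u16, mirroring hlock_id() in the patch. */
static uint16_t hlock_id(unsigned int class_idx, unsigned int read)
{
	return class_idx | (read << MAX_LOCKDEP_KEYS_BITS);
}

/* Recover the class index, mirroring chain_hlock_class_idx() in the patch. */
static unsigned int chain_hlock_class_idx(uint16_t id)
{
	return id & (MAX_LOCKDEP_KEYS - 1);
}

int main(void)
{
	/*
	 * Same class, different read status: different ids, so the two lock
	 * sequences hash to different chainkeys.
	 */
	printf("write: %u, recursive read: %u\n",
	       (unsigned int)hlock_id(42, 0), (unsigned int)hlock_id(42, 2));
	printf("class back out: %u\n", chain_hlock_class_idx(hlock_id(42, 2)));
	return 0;
}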

2020-09-14 18:18:06

by Qian Cai

[permalink] [raw]
Subject: Re: [RFC v7 12/19] lockdep: Add recursive read locks into dependency graph

On Fri, 2020-08-07 at 15:42 +0800, Boqun Feng wrote:
> Since we have all the fundamental to handle recursive read locks, we now
> add them into the dependency graph.
>
> Signed-off-by: Boqun Feng <[email protected]>

Reverting this patch and its dependency:

[14/19] lockdep: Take read/write status in consideration when generate chainkey

fixed a splat below. IOW, this patch introduced this new splat, which looks like
a false positive because of the existing locking dependency chain here:

&s->seqcount#2 ---> pidmap_lock

[ 528.078061][ T7867] -> #1 (pidmap_lock){....}-{2:2}:
[ 528.078078][ T7867] lock_acquire+0x10c/0x560
[ 528.078089][ T7867] _raw_spin_lock_irqsave+0x64/0xb0
[ 528.078108][ T7867] free_pid+0x5c/0x160
free_pid at kernel/pid.c:131
[ 528.078127][ T7867] release_task.part.40+0x59c/0x7f0
__unhash_process at kernel/exit.c:76
(inlined by) __exit_signal at kernel/exit.c:147
(inlined by) release_task at kernel/exit.c:198
[ 528.078145][ T7867] do_exit+0x77c/0xda0
exit_notify at kernel/exit.c:679
(inlined by) do_exit at kernel/exit.c:826
[ 528.078163][ T7867] kthread+0x148/0x1d0
[ 528.078182][ T7867] ret_from_kernel_thread+0x5c/0x80

It is write_seqlock(&sig->stats_lock) in __exit_signal(), but the seqcount in
read_mems_allowed_begin() is read_seqcount_begin(&current->mems_allowed_seq), so
there should be no deadlock?

[ 528.077900][ T7867] WARNING: possible circular locking dependency detected
[ 528.077912][ T7867] 5.9.0-rc5-next-20200914 #1 Not tainted
[ 528.077921][ T7867] ------------------------------------------------------
[ 528.077931][ T7867] runc:[1:CHILD]/7867 is trying to acquire lock:
[ 528.077942][ T7867] c000001fce5570c8 (&s->seqcount#2){....}-{0:0}, at: __slab_alloc+0x34/0xf0
[ 528.077972][ T7867]
[ 528.077972][ T7867] but task is already holding lock:
[ 528.077983][ T7867] c0000000056b0198 (pidmap_lock){....}-{2:2}, at: alloc_pid+0x258/0x590
[ 528.078009][ T7867]
[ 528.078009][ T7867] which lock already depends on the new lock.
[ 528.078009][ T7867]
[ 528.078031][ T7867]
[ 528.078031][ T7867] the existing dependency chain (in reverse order) is:
[ 528.078061][ T7867]
[ 528.078061][ T7867] -> #1 (pidmap_lock){....}-{2:2}:
[ 528.078078][ T7867] lock_acquire+0x10c/0x560
[ 528.078089][ T7867] _raw_spin_lock_irqsave+0x64/0xb0
[ 528.078108][ T7867] free_pid+0x5c/0x160
free_pid at kernel/pid.c:131
[ 528.078127][ T7867] release_task.part.40+0x59c/0x7f0
__unhash_process at kernel/exit.c:76
(inlined by) __exit_signal at kernel/exit.c:147
(inlined by) release_task at kernel/exit.c:198
[ 528.078145][ T7867] do_exit+0x77c/0xda0
exit_notify at kernel/exit.c:679
(inlined by) do_exit at kernel/exit.c:826
[ 528.078163][ T7867] kthread+0x148/0x1d0
[ 528.078182][ T7867] ret_from_kernel_thread+0x5c/0x80
[ 528.078208][ T7867]
[ 528.078208][ T7867] -> #0 (&s->seqcount#2){....}-{0:0}:
[ 528.078241][ T7867] check_prevs_add+0x1c4/0x1120
check_prev_add at kernel/locking/lockdep.c:2820
(inlined by) check_prevs_add at kernel/locking/lockdep.c:2944
[ 528.078260][ T7867] __lock_acquire+0x176c/0x1c00
validate_chain at kernel/locking/lockdep.c:3562
(inlined by) __lock_acquire at kernel/locking/lockdep.c:4796
[ 528.078278][ T7867] lock_acquire+0x10c/0x560
[ 528.078297][ T7867] ___slab_alloc+0xa40/0xb40
seqcount_lockdep_reader_access at include/linux/seqlock.h:103
(inlined by) read_mems_allowed_begin at include/linux/cpuset.h:135
(inlined by) get_any_partial at mm/slub.c:2035
(inlined by) get_partial at mm/slub.c:2078
(inlined by) new_slab_objects at mm/slub.c:2577
(inlined by) ___slab_alloc at mm/slub.c:2745
[ 528.078324][ T7867] __slab_alloc+0x34/0xf0
[ 528.078342][ T7867] kmem_cache_alloc+0x2d4/0x470
[ 528.078362][ T7867] create_object+0x74/0x430
[ 528.078381][ T7867] slab_post_alloc_hook+0xa4/0x670
[ 528.078399][ T7867] kmem_cache_alloc+0x1b4/0x470
[ 528.078418][ T7867] radix_tree_node_alloc.constprop.19+0xe4/0x160
[ 528.078438][ T7867] idr_get_free+0x298/0x360
[ 528.078456][ T7867] idr_alloc_u32+0x84/0x130
[ 528.078474][ T7867] idr_alloc_cyclic+0x7c/0x150
[ 528.078493][ T7867] alloc_pid+0x27c/0x590
[ 528.078511][ T7867] copy_process+0xc90/0x1930
copy_process at kernel/fork.c:2104
[ 528.078529][ T7867] kernel_clone+0x120/0xa10
[ 528.078546][ T7867] __do_sys_clone+0x88/0xd0
[ 528.078565][ T7867] system_call_exception+0xf8/0x1d0
[ 528.078592][ T7867] system_call_common+0xe8/0x218
[ 528.078609][ T7867]
[ 528.078609][ T7867] other info that might help us debug this:
[ 528.078609][ T7867]
[ 528.078650][ T7867] Possible unsafe locking scenario:
[ 528.078650][ T7867]
[ 528.078670][ T7867] CPU0 CPU1
[ 528.078695][ T7867] ---- ----
[ 528.078713][ T7867] lock(pidmap_lock);
[ 528.078730][ T7867] lock(&s->seqcount#2);
[ 528.078751][ T7867] lock(pidmap_lock);
[ 528.078770][ T7867] lock(&s->seqcount#2);
[ 528.078788][ T7867]
[ 528.078788][ T7867] *** DEADLOCK ***
[ 528.078788][ T7867]
[ 528.078800][ T7867] 2 locks held by runc:[1:CHILD]/7867:
[ 528.078808][ T7867] #0: c000001ffea6f4f0 (lock#2){+.+.}-{2:2}, at: __radix_tree_preload+0x8/0x370
__radix_tree_preload at lib/radix-tree.c:322
[ 528.078844][ T7867] #1: c0000000056b0198 (pidmap_lock){....}-{2:2}, at: alloc_pid+0x258/0x590
[ 528.078870][ T7867]
[ 528.078870][ T7867] stack backtrace:
[ 528.078890][ T7867] CPU: 46 PID: 7867 Comm: runc:[1:CHILD] Not tainted 5.9.0-rc5-next-20200914 #1
[ 528.078921][ T7867] Call Trace:
[ 528.078940][ T7867] [c000001ff07eefc0] [c00000000063f8c8] dump_stack+0xec/0x144 (unreliable)
[ 528.078964][ T7867] [c000001ff07ef000] [c00000000013f44c] print_circular_bug.isra.43+0x2dc/0x350
[ 528.078978][ T7867] [c000001ff07ef0a0] [c00000000013f640] check_noncircular+0x180/0x1b0
[ 528.079000][ T7867] [c000001ff07ef170] [c000000000140b84] check_prevs_add+0x1c4/0x1120
[ 528.079022][ T7867] [c000001ff07ef280] [c0000000001446ec] __lock_acquire+0x176c/0x1c00
[ 528.079043][ T7867] [c000001ff07ef3a0] [c00000000014578c] lock_acquire+0x10c/0x560
[ 528.079066][ T7867] [c000001ff07ef490] [c0000000003565f0] ___slab_alloc+0xa40/0xb40
[ 528.079079][ T7867] [c000001ff07ef590] [c000000000356724] __slab_alloc+0x34/0xf0
[ 528.079100][ T7867] [c000001ff07ef5e0] [c000000000356ab4] kmem_cache_alloc+0x2d4/0x470
[ 528.079122][ T7867] [c000001ff07ef670] [c000000000397e14] create_object+0x74/0x430
[ 528.079144][ T7867] [c000001ff07ef720] [c000000000351944] slab_post_alloc_hook+0xa4/0x670
[ 528.079165][ T7867] [c000001ff07ef7e0] [c000000000356994] kmem_cache_alloc+0x1b4/0x470
[ 528.079187][ T7867] [c000001ff07ef870] [c00000000064e004] radix_tree_node_alloc.constprop.19+0xe4/0x160
radix_tree_node_alloc at lib/radix-tree.c:252
[ 528.079219][ T7867] [c000001ff07ef8e0] [c00000000064f2b8] idr_get_free+0x298/0x360
idr_get_free at lib/radix-tree.c:1507
[ 528.079249][ T7867] [c000001ff07ef970] [c000000000645db4] idr_alloc_u32+0x84/0x130
idr_alloc_u32 at lib/idr.c:46 (discriminator 4)
[ 528.079271][ T7867] [c000001ff07ef9e0] [c000000000645f8c] idr_alloc_cyclic+0x7c/0x150
idr_alloc_cyclic at lib/idr.c:126 (discriminator 1)
[ 528.079301][ T7867] [c000001ff07efa40] [c0000000000e48ac] alloc_pid+0x27c/0x590
[ 528.079342][ T7867] [c000001ff07efb20] [c0000000000acc60] copy_process+0xc90/0x1930
[ 528.079404][ T7867] [c000001ff07efc40] [c0000000000adc00] kernel_clone+0x120/0xa10
[ 528.079499][ T7867] [c000001ff07efd00] [c0000000000ae578] __do_sys_clone+0x88/0xd0
[ 528.079579][ T7867] [c000001ff07efdc0] [c000000000029c48] system_call_exception+0xf8/0x1d0
[ 528.079691][ T7867] [c000001ff07efe20] [c00000000000d0a8] system_call_common+0xe8/0x218

> ---
> kernel/locking/lockdep.c | 19 ++-----------------
> 1 file changed, 2 insertions(+), 17 deletions(-)
>
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 040509667798..867199c4b85d 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -2808,16 +2808,6 @@ check_prev_add(struct task_struct *curr, struct
> held_lock *prev,
> if (!check_irq_usage(curr, prev, next))
> return 0;
>
> - /*
> - * For recursive read-locks we do all the dependency checks,
> - * but we dont store read-triggered dependencies (only
> - * write-triggered dependencies). This ensures that only the
> - * write-side dependencies matter, and that if for example a
> - * write-lock never takes any other locks, then the reads are
> - * equivalent to a NOP.
> - */
> - if (next->read == 2 || prev->read == 2)
> - return 1;
> /*
> * Is the <prev> -> <next> dependency already present?
> *
> @@ -2935,13 +2925,8 @@ check_prevs_add(struct task_struct *curr, struct
> held_lock *next)
> u16 distance = curr->lockdep_depth - depth + 1;
> hlock = curr->held_locks + depth - 1;
>
> - /*
> - * Only non-recursive-read entries get new dependencies
> - * added:
> - */
> - if (hlock->read != 2 && hlock->check) {
> - int ret = check_prev_add(curr, hlock, next, distance,
> - &trace);
> + if (hlock->check) {
> + int ret = check_prev_add(curr, hlock, next, distance,
> &trace);
> if (!ret)
> return 0;
>

2020-09-14 22:06:59

by Qian Cai

[permalink] [raw]
Subject: Re: [RFC v7 12/19] lockdep: Add recursive read locks into dependency graph

On Mon, 2020-09-14 at 14:16 -0400, Qian Cai wrote:
> On Fri, 2020-08-07 at 15:42 +0800, Boqun Feng wrote:
> > Since we have all the fundamental to handle recursive read locks, we now
> > add them into the dependency graph.
> >
> > Signed-off-by: Boqun Feng <[email protected]>
>
> Reverting this patch and its dependency:
>
> [14/19] lockdep: Take read/write status in consideration when generate
> chainkey
>
> fixed a splat below. IOW, this patch introduced this new splat which looks

Sorry, it turned out my previous reproducer was not so reliable, but now I have
found a new, reliable reproducer that triggers this on both x86 and powerpc
every time, and it so far points to the culprit being the patchset:

"[PATCH v1 0/5] seqlock: Introduce PREEMPT_RT support" [1].

It also matches exactly the timing (today) between when the issue showed up and
when the patchset was merged into linux-next. I'll do a bit more confirmation
and report it there.

[1] https://lore.kernel.org/lkml/[email protected]/

2020-09-15 18:37:09

by Qian Cai

[permalink] [raw]
Subject: Re: [RFC v7 11/19] lockdep: Fix recursive read lock related safe->unsafe detection

On Fri, 2020-08-07 at 15:42 +0800, Boqun Feng wrote:
> Currently, in safe->unsafe detection, lockdep misses the fact that a
> LOCK_ENABLED_IRQ_*_READ usage and a LOCK_USED_IN_IRQ_*_READ usage may
> cause deadlock too, for example:
>
> P1 P2
> <irq disabled>
> write_lock(l1); <irq enabled>
> read_lock(l2);
> write_lock(l2);
> <in irq>
> read_lock(l1);
>
> Actually, all of the following cases may cause deadlocks:
>
> LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*
> LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*
> LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*_READ
> LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*_READ
>
> To fix this, we need to 1) change the calculation of exclusive_mask() so
> that READ bits are not dropped and 2) always call usage() in
> mark_lock_irq() to check usage deadlocks, even when the new usage of the
> lock is READ.
>
> Besides, adjust usage_match() and usage_acculumate() to recursive read
> lock changes.
>
> Signed-off-by: Boqun Feng <[email protected]>

So our daily CI starts to trigger a warning (graph corruption?) below. From the
call traces, this recent patchset changed a few related things here and there.
Does it ring any bells?

[14828.805563][T145183] lockdep bfs error:-1
[14828.826015][T145183] WARNING: CPU: 28 PID: 145183 at kernel/locking/lockdep.c:1960 print_bfs_bug+0xfc/0x180
[14828.871595][T145183] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio loop nls_ascii nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables sd_mod bnx2x hpsa mdio scsi_transport_sas firmware_class dm_mirror dm_region_hash dm_log dm_mod efivarfs [last unloaded: dummy_del_mod]
[14828.994188][T145183] CPU: 28 PID: 145183 Comm: trinity-c28 Tainted: G O 5.9.0-rc5-next-20200915+ #2
[14829.041983][T145183] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018
[14829.075779][T145183] RIP: 0010:print_bfs_bug+0xfc/0x180
[14829.099551][T145183] Code: 04 08 00 00 01 48 c7 05 4e 02 75 07 00 00 00 00 c6 05 87 02 75 07 00 45 85 e4 74 10 89 ee 48 c7 c7 e0 71 45 90 e8 78 15 0a 01 <0f> 0b 5b 5d 41 5c c3 e8 a8 74 0d 01 85 c0 74 dd 48 c7 c7 18 9f 59
[14829.189430][T145183] RSP: 0018:ffffc90023d7ed90 EFLAGS: 00010082
[14829.217056][T145183] RAX: 0000000000000000 RBX: ffff888ac6238040 RCX: 0000000000000027
[14829.253274][T145183] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffff88881e29fe08
[14829.289767][T145183] RBP: 00000000ffffffff R08: ffffed1103c53fc2 R09: ffffed1103c53fc2
[14829.328689][T145183] R10: ffff88881e29fe0b R11: ffffed1103c53fc1 R12: 0000000000000001
[14829.367921][T145183] R13: 0000000000000000 R14: ffff888ac6238040 R15: ffff888ac62388e8
[14829.404156][T145183] FS: 00007f850d4a0740(0000) GS:ffff88881e280000(0000) knlGS:0000000000000000
[14829.444478][T145183] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14829.474221][T145183] CR2: 00007f850c3b00fc CR3: 0000000a3634a001 CR4: 00000000001706e0
[14829.511287][T145183] DR0: 00007f850ae99000 DR1: 00007f850b399000 DR2: 0000000000000000
[14829.548612][T145183] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[14829.586621][T145183] Call Trace:
[14829.602266][T145183] check_irq_usage+0x6a1/0xc30
check_irq_usage at kernel/locking/lockdep.c:2586
[14829.624092][T145183] ? print_usage_bug+0x1e0/0x1e0
[14829.646334][T145183] ? mark_lock.part.47+0x109/0x1920
[14829.670176][T145183] ? print_irq_inversion_bug+0x210/0x210
[14829.695950][T145183] ? print_usage_bug+0x1e0/0x1e0
[14829.718164][T145183] ? hlock_conflict+0x54/0x1f0
[14829.739717][T145183] ? __bfs+0x7d/0x580
[14829.757562][T145183] ? mark_lock.part.47+0x109/0x1920
[14829.780950][T145183] ? check_path.constprop.52+0x22/0x40
[14829.805551][T145183] ? check_noncircular+0x14b/0x320
[14829.831831][T145183] ? print_circular_bug.isra.42+0x360/0x360
[14829.860945][T145183] ? mark_lock.part.47+0x109/0x1920
[14829.884384][T145183] ? print_usage_bug+0x1e0/0x1e0
[14829.906638][T145183] ? check_prevs_add+0x3a2/0x2720
[14829.929349][T145183] check_prevs_add+0x3a2/0x2720
check_prev_add at kernel/locking/lockdep.c:2823
(inlined by) check_prevs_add at kernel/locking/lockdep.c:2944
[14829.951604][T145183] ? mark_lock.part.47+0x109/0x1920
[14829.975179][T145183] ? __thaw_task+0x70/0x70
[14829.995132][T145183] ? arch_stack_walk+0xa0/0xf0
[14830.016534][T145183] ? check_irq_usage+0xc30/0xc30
[14830.039256][T145183] __lock_acquire+0x29e0/0x39c0
[14830.061128][T145183] ? lockdep_hardirqs_on_prepare+0x4d0/0x4d0
[14830.088154][T145183] ? rcu_read_lock_sched_held+0x9c/0xd0
[14830.113159][T145183] lock_acquire+0x1bc/0x8e0
[14830.133453][T145183] ? __debug_object_init+0x598/0xf50
[14830.157250][T145183] ? rcu_read_unlock+0x40/0x40
[14830.178756][T145183] ? rwlock_bug.part.1+0x90/0x90
[14830.201096][T145183] ? rcu_read_lock_sched_held+0x9c/0xd0
[14830.226074][T145183] _raw_spin_lock+0x27/0x40
[14830.245905][T145183] ? __debug_object_init+0x598/0xf50
[14830.269943][T145183] __debug_object_init+0x598/0xf50
[14830.293271][T145183] ? lock_downgrade+0x730/0x730
[14830.315316][T145183] ? mark_held_locks+0xb0/0x110
[14830.340602][T145183] ? debug_object_fixup+0x30/0x30
[14830.365561][T145183] ? lockdep_hardirqs_on_prepare+0x32b/0x4d0
[14830.392895][T145183] ? _raw_spin_unlock_irqrestore+0x34/0x40
[14830.420261][T145183] debug_object_activate+0x25c/0x4a0
[14830.444531][T145183] ? __delete_object+0xb3/0x100
[14830.466404][T145183] ? debug_object_assert_init+0x380/0x380
[14830.492090][T145183] ? mark_held_locks+0xb0/0x110
[14830.513920][T145183] ? get_object+0x90/0x90
[14830.533650][T145183] ? __xfs_trans_commit+0x435/0xf30
[14830.557084][T145183] call_rcu+0x2c/0x7a0
[14830.575319][T145183] ? __xfs_trans_commit+0x435/0xf30
[14830.598880][T145183] slab_free_freelist_hook+0xed/0x1a0
[14830.623165][T145183] ? __xfs_trans_commit+0x435/0xf30
[14830.646649][T145183] kmem_cache_free+0xec/0x590
[14830.667871][T145183] __xfs_trans_commit+0x435/0xf30
[14830.690410][T145183] ? xfs_trans_free_items+0x360/0x360
[14830.714802][T145183] ? xfs_trans_ichgtime+0x120/0x120
[14830.738451][T145183] ? _down_write_nest_lock+0x150/0x150
[14830.763030][T145183] xfs_vn_update_time+0x345/0x5e0
[14830.785614][T145183] ? xfs_init_security.isra.12+0x10/0x10
[14830.811114][T145183] ? __sb_start_write+0x115/0x2d0
[14830.835603][T145183] touch_atime+0x187/0x1d0
[14830.858241][T145183] ? atime_needs_update+0x560/0x560
[14830.883515][T145183] generic_file_buffered_read+0x1064/0x16d0
[14830.910022][T145183] ? pagecache_get_page+0x940/0x940
[14830.933510][T145183] ? rcu_read_lock_bh_held+0xb0/0xb0
[14830.957767][T145183] ? rcu_read_lock_sched_held+0x9c/0xd0
[14830.982837][T145183] ? xfs_file_buffered_aio_read+0x107/0x380
[14831.009468][T145183] xfs_file_buffered_aio_read+0x112/0x380
[14831.035969][T145183] ? find_held_lock+0x33/0x1c0
[14831.057327][T145183] xfs_file_read_iter+0x215/0x490
[14831.080041][T145183] generic_file_splice_read+0x36b/0x570
[14831.105078][T145183] ? pipe_to_user+0x150/0x150
[14831.126097][T145183] ? lockdep_init_map_waits+0x267/0x7c0
[14831.151114][T145183] ? debug_mutex_init+0x31/0x60
[14831.172903][T145183] splice_direct_to_actor+0x2cd/0x8c0
[14831.197133][T145183] ? pipe_to_sendpage+0x410/0x410
[14831.219301][T145183] ? do_splice_to+0x140/0x140
[14831.240356][T145183] ? lock_acquire+0x1bc/0x8e0
[14831.261820][T145183] ? do_sendfile+0x7c4/0xc10
[14831.282583][T145183] do_splice_direct+0x153/0x250
[14831.304418][T145183] ? rcu_read_lock_any_held+0xcd/0xf0
[14831.328727][T145183] ? splice_direct_to_actor+0x8c0/0x8c0
[14831.356609][T145183] ? __sb_start_write+0x229/0x2d0
[14831.381737][T145183] do_sendfile+0x397/0xc10
[14831.402277][T145183] ? do_pwritev+0x140/0x140
[14831.422139][T145183] ? __task_pid_nr_ns+0x127/0x3a0
[14831.445366][T145183] ? lock_downgrade+0x730/0x730
[14831.468357][T145183] __x64_sys_sendfile64+0x188/0x1d0
[14831.492008][T145183] ? __x64_sys_sendfile+0x1d0/0x1d0
[14831.515492][T145183] ? lockdep_hardirqs_on_prepare+0x32b/0x4d0
[14831.542701][T145183] ? syscall_enter_from_user_mode+0x1c/0x50
[14831.569356][T145183] do_syscall_64+0x33/0x40
[14831.589219][T145183] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[14831.615913][T145183] RIP: 0033:0x7f850cdb36ed
[14831.636624][T145183] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 57 2c 00 f7 d8 64 89 01 48
[14831.730414][T145183] RSP: 002b:00007ffe7f0106a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
[14831.769912][T145183] RAX: ffffffffffffffda RBX: 0000000000000028 RCX: 00007f850cdb36ed
[14831.807052][T145183] RDX: 0000000000000000 RSI: 00000000000001a9 RDI: 000000000000014e
[14831.846148][T145183] RBP: 0000000000000028 R08: 00000000ffffefff R09: 00000000ad1ac000
[14831.885391][T145183] R10: 00000000000000de R11: 0000000000000246 R12: 0000000000000002
[14831.925901][T145183] R13: 00007f850d3dc058 R14: 00007f850d4a06c0 R15: 00007f850d3dc000
[14831.964253][T145183] CPU: 28 PID: 145183 Comm: trinity-c28 Tainted: G O 5.9.0-rc5-next-20200915+ #2
[14832.013770][T145183] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018
[14832.049176][T145183] Call Trace:
[14832.064213][T145183] dump_stack+0x99/0xcb
[14832.083561][T145183] __warn.cold.13+0xe/0x55
[14832.104413][T145183] ? print_bfs_bug+0xfc/0x180
[14832.126583][T145183] report_bug+0x1af/0x260
[14832.146927][T145183] handle_bug+0x44/0x80
[14832.166010][T145183] exc_invalid_op+0x13/0x40
[14832.186752][T145183] asm_exc_invalid_op+0x12/0x20
[14832.209419][T145183] RIP: 0010:print_bfs_bug+0xfc/0x180
[14832.233888][T145183] Code: 04 08 00 00 01 48 c7 05 4e 02 75 07 00 00 00 00 c6 05 87 02 75 07 00 45 85 e4 74 10 89 ee 48 c7 c7 e0 71 45 90 e8 78 15 0a 01 <0f> 0b 5b 5d 41 5c c3 e8 a8 74 0d 01 85 c0 74 dd 48 c7 c7 18 9f 59
[14832.326993][T145183] RSP: 0018:ffffc90023d7ed90 EFLAGS: 00010082
[14832.356293][T145183] RAX: 0000000000000000 RBX: ffff888ac6238040 RCX: 0000000000000027
[14832.396354][T145183] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffff88881e29fe08
[14832.435379][T145183] RBP: 00000000ffffffff R08: ffffed1103c53fc2 R09: ffffed1103c53fc2
[14832.472535][T145183] R10: ffff88881e29fe0b R11: ffffed1103c53fc1 R12: 0000000000000001
[14832.509784][T145183] R13: 0000000000000000 R14: ffff888ac6238040 R15: ffff888ac62388e8
[14832.546738][T145183] check_irq_usage+0x6a1/0xc30
[14832.568603][T145183] ? print_usage_bug+0x1e0/0x1e0
[14832.591199][T145183] ? mark_lock.part.47+0x109/0x1920
[14832.616207][T145183] ? print_irq_inversion_bug+0x210/0x210
[14832.642117][T145183] ? print_usage_bug+0x1e0/0x1e0
[14832.665399][T145183] ? hlock_conflict+0x54/0x1f0
[14832.688110][T145183] ? __bfs+0x7d/0x580
[14832.706997][T145183] ? mark_lock.part.47+0x109/0x1920
[14832.731832][T145183] ? check_path.constprop.52+0x22/0x40
[14832.757443][T145183] ? check_noncircular+0x14b/0x320
[14832.781647][T145183] ? print_circular_bug.isra.42+0x360/0x360
[14832.809574][T145183] ? mark_lock.part.47+0x109/0x1920
[14832.833910][T145183] ? print_usage_bug+0x1e0/0x1e0
[14832.857359][T145183] ? check_prevs_add+0x3a2/0x2720
[14832.882082][T145183] check_prevs_add+0x3a2/0x2720
[14832.906347][T145183] ? mark_lock.part.47+0x109/0x1920
[14832.931692][T145183] ? __thaw_task+0x70/0x70
[14832.952667][T145183] ? arch_stack_walk+0xa0/0xf0
[14832.975451][T145183] ? check_irq_usage+0xc30/0xc30
[14832.998744][T145183] __lock_acquire+0x29e0/0x39c0
[14833.021580][T145183] ? lockdep_hardirqs_on_prepare+0x4d0/0x4d0
[14833.050223][T145183] ? rcu_read_lock_sched_held+0x9c/0xd0
[14833.076024][T145183] lock_acquire+0x1bc/0x8e0
[14833.097001][T145183] ? __debug_object_init+0x598/0xf50
[14833.121814][T145183] ? rcu_read_unlock+0x40/0x40
[14833.144434][T145183] ? rwlock_bug.part.1+0x90/0x90
[14833.167064][T145183] ? rcu_read_lock_sched_held+0x9c/0xd0
[14833.192689][T145183] _raw_spin_lock+0x27/0x40
[14833.214048][T145183] ? __debug_object_init+0x598/0xf50
[14833.239071][T145183] __debug_object_init+0x598/0xf50
[14833.263192][T145183] ? lock_downgrade+0x730/0x730
[14833.288021][T145183] ? mark_held_locks+0xb0/0x110
[14833.311062][T145183] ? debug_object_fixup+0x30/0x30
[14833.334906][T145183] ? lockdep_hardirqs_on_prepare+0x32b/0x4d0
[14833.363264][T145183] ? _raw_spin_unlock_irqrestore+0x34/0x40
[14833.391110][T145183] debug_object_activate+0x25c/0x4a0
[14833.417688][T145183] ? __delete_object+0xb3/0x100
[14833.441037][T145183] ? debug_object_assert_init+0x380/0x380
[14833.467770][T145183] ? mark_held_locks+0xb0/0x110
[14833.490579][T145183] ? get_object+0x90/0x90
[14833.510655][T145183] ? __xfs_trans_commit+0x435/0xf30
[14833.535194][T145183] call_rcu+0x2c/0x7a0
[14833.554375][T145183] ? __xfs_trans_commit+0x435/0xf30
[14833.578969][T145183] slab_free_freelist_hook+0xed/0x1a0
[14833.603774][T145183] ? __xfs_trans_commit+0x435/0xf30
[14833.627632][T145183] kmem_cache_free+0xec/0x590
[14833.649621][T145183] __xfs_trans_commit+0x435/0xf30
[14833.672732][T145183] ? xfs_trans_free_items+0x360/0x360
[14833.698161][T145183] ? xfs_trans_ichgtime+0x120/0x120
[14833.722806][T145183] ? _down_write_nest_lock+0x150/0x150
[14833.748799][T145183] xfs_vn_update_time+0x345/0x5e0
[14833.772690][T145183] ? xfs_init_security.isra.12+0x10/0x10
[14833.799516][T145183] ? __sb_start_write+0x115/0x2d0
[14833.823737][T145183] touch_atime+0x187/0x1d0
[14833.844875][T145183] ? atime_needs_update+0x560/0x560
[14833.870314][T145183] generic_file_buffered_read+0x1064/0x16d0
[14833.898953][T145183] ? pagecache_get_page+0x940/0x940
[14833.923985][T145183] ? rcu_read_lock_bh_held+0xb0/0xb0
[14833.950166][T145183] ? rcu_read_lock_sched_held+0x9c/0xd0
[14833.977192][T145183] ? xfs_file_buffered_aio_read+0x107/0x380
[14834.005370][T145183] xfs_file_buffered_aio_read+0x112/0x380
[14834.032303][T145183] ? find_held_lock+0x33/0x1c0
[14834.055381][T145183] xfs_file_read_iter+0x215/0x490
[14834.078876][T145183] generic_file_splice_read+0x36b/0x570
[14834.104809][T145183] ? pipe_to_user+0x150/0x150
[14834.126650][T145183] ? lockdep_init_map_waits+0x267/0x7c0
[14834.152610][T145183] ? debug_mutex_init+0x31/0x60
[14834.176068][T145183] splice_direct_to_actor+0x2cd/0x8c0
[14834.201764][T145183] ? pipe_to_sendpage+0x410/0x410
[14834.226171][T145183] ? do_splice_to+0x140/0x140
[14834.248455][T145183] ? lock_acquire+0x1bc/0x8e0
[14834.270058][T145183] ? do_sendfile+0x7c4/0xc10
[14834.291651][T145183] do_splice_direct+0x153/0x250
[14834.314677][T145183] ? rcu_read_lock_any_held+0xcd/0xf0
[14834.339852][T145183] ? splice_direct_to_actor+0x8c0/0x8c0
[14834.367457][T145183] ? __sb_start_write+0x229/0x2d0
[14834.391636][T145183] do_sendfile+0x397/0xc10
[14834.413480][T145183] ? do_pwritev+0x140/0x140
[14834.436208][T145183] ? __task_pid_nr_ns+0x127/0x3a0
[14834.461228][T145183] ? lock_downgrade+0x730/0x730
[14834.484859][T145183] __x64_sys_sendfile64+0x188/0x1d0
[14834.510071][T145183] ? __x64_sys_sendfile+0x1d0/0x1d0
[14834.535398][T145183] ? lockdep_hardirqs_on_prepare+0x32b/0x4d0
[14834.563792][T145183] ? syscall_enter_from_user_mode+0x1c/0x50
[14834.592028][T145183] do_syscall_64+0x33/0x40
[14834.612944][T145183] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[14834.640650][T145183] RIP: 0033:0x7f850cdb36ed
[14834.661464][T145183] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6b 57 2c 00 f7 d8 64 89 01 48
[14834.755364][T145183] RSP: 002b:00007ffe7f0106a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
[14834.795590][T145183] RAX: ffffffffffffffda RBX: 0000000000000028 RCX: 00007f850cdb36ed
[14834.833840][T145183] RDX: 0000000000000000 RSI: 00000000000001a9 RDI: 000000000000014e
[14834.871998][T145183] RBP: 0000000000000028 R08: 00000000ffffefff R09: 00000000ad1ac000
[14834.910057][T145183] R10: 00000000000000de R11: 0000000000000246 R12: 0000000000000002
[14834.951023][T145183] R13: 00007f850d3dc058 R14: 00007f850d4a06c0 R15: 00007f850d3dc000
[14834.989980][T145183] irq event stamp: 5176
[14835.009480][T145183] hardirqs last enabled at (5175): [<ffffffff8ff00654>] _raw_spin_unlock_irqrestore+0x34/0x40
[14835.058951][T145183] hardirqs last disabled at (5176): [<ffffffff8ff00474>] _raw_spin_lock_irqsave+0x44/0x50
[14835.105634][T145183] softirqs last enabled at (4510): [<ffffffff9020061b>] __do_softirq+0x61b/0x95d
[14835.148617][T145183] softirqs last disabled at (4503): [<ffffffff90000ec2>] asm_call_on_stack+0x12/0x20
[14835.193016][T145183] ---[ end trace c18653e36a41b0d8 ]---

2020-09-16 08:14:03

by Boqun Feng

[permalink] [raw]
Subject: Re: [RFC v7 11/19] lockdep: Fix recursive read lock related safe->unsafe detection

On Tue, Sep 15, 2020 at 02:32:51PM -0400, Qian Cai wrote:
> On Fri, 2020-08-07 at 15:42 +0800, Boqun Feng wrote:
> > Currently, in safe->unsafe detection, lockdep misses the fact that a
> > LOCK_ENABLED_IRQ_*_READ usage and a LOCK_USED_IN_IRQ_*_READ usage may
> > cause deadlock too, for example:
> >
> >     P1                          P2
> >     <irq disabled>
> >     write_lock(l1);             <irq enabled>
> >                                 read_lock(l2);
> >     write_lock(l2);
> >                                 <in irq>
> >                                 read_lock(l1);
> >
> > Actually, all of the following cases may cause deadlocks:
> >
> > LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*
> > LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*
> > LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*_READ
> > LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*_READ
> >
> > To fix this, we need to 1) change the calculation of exclusive_mask() so
> > that READ bits are not dropped and 2) always call usage() in
> > mark_lock_irq() to check usage deadlocks, even when the new usage of the
> > lock is READ.
> >
> > Besides, adjust usage_match() and usage_accumulate() for the recursive
> > read lock changes.
> >
> > Signed-off-by: Boqun Feng <[email protected]>
>
> So our daily CI has started to trigger the warning (graph corruption?) below. From
> the call traces, this recent patchset changed a few related things here and there.
> Does it ring any bells?
>
> [14828.805563][T145183] lockdep bfs error:-1

-1 is BFS_EQUEUEFULL, which means we hit the size limit of the queue used
in lockdep's search. That is possible now, since recursive read deadlock
detection tries to search all the edges (dependencies). So maybe we
should switch from BFS to DFS; I will look into this. In the meantime,
could you try the following and see whether it helps with the warnings
you got?

Regards,
Boqun

------->8
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 454355c033d2..8f07bf37ab62 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1365,7 +1365,7 @@ static int add_lock_to_list(struct lock_class *this,
/*
* For good efficiency of modular, we use power of 2
*/
-#define MAX_CIRCULAR_QUEUE_SIZE 4096UL
+#define MAX_CIRCULAR_QUEUE_SIZE 8192UL
#define CQ_MASK (MAX_CIRCULAR_QUEUE_SIZE-1)

/*
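
For context, the limit being bumped above is the capacity of the fixed-size
circular queue that __bfs() uses as its worklist. A minimal sketch of how such
a power-of-2 ring detects the "full" condition that __bfs() turns into
BFS_EQUEUEFULL might look as follows; the type and helper names here are
illustrative, not the exact lockdep internals:

#include <linux/types.h>

#define CQ_SIZE		8192UL			/* must stay a power of 2 */
#define CQ_MASK		(CQ_SIZE - 1)

struct cq_sketch {
	void		*element[CQ_SIZE];
	unsigned int	front, rear;		/* front == rear means empty */
};

static inline bool cq_sketch_full(struct cq_sketch *cq)
{
	/* Advancing rear onto front would overwrite the head: the ring is full. */
	return ((cq->rear + 1) & CQ_MASK) == cq->front;
}

static inline int cq_sketch_enqueue(struct cq_sketch *cq, void *elem)
{
	if (cq_sketch_full(cq))
		return -1;	/* the failure that surfaces as BFS_EQUEUEFULL */
	cq->element[cq->rear] = elem;
	cq->rear = (cq->rear + 1) & CQ_MASK;	/* cheap wrap-around via the mask */
	return 0;
}

Because the wrap-around uses CQ_MASK, the size has to stay a power of 2 when
it is raised, which is why the hunk goes from 4096UL to 8192UL rather than to
an arbitrary value.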


> [ rest of the quoted splat trimmed; it duplicates the report above verbatim ]
>

2020-09-16 18:05:38

by Boqun Feng

[permalink] [raw]
Subject: Re: [RFC v7 11/19] lockdep: Fix recursive read lock related safe->unsafe detection

On Wed, Sep 16, 2020 at 04:10:46PM +0800, Boqun Feng wrote:
> On Tue, Sep 15, 2020 at 02:32:51PM -0400, Qian Cai wrote:
> > On Fri, 2020-08-07 at 15:42 +0800, Boqun Feng wrote:
> > > Currently, in safe->unsafe detection, lockdep misses the fact that a
> > > LOCK_ENABLED_IRQ_*_READ usage and a LOCK_USED_IN_IRQ_*_READ usage may
> > > cause deadlock too, for example:
> > >
> > >     P1                          P2
> > >     <irq disabled>
> > >     write_lock(l1);             <irq enabled>
> > >                                 read_lock(l2);
> > >     write_lock(l2);
> > >                                 <in irq>
> > >                                 read_lock(l1);
> > >
> > > Actually, all of the following cases may cause deadlocks:
> > >
> > > LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*
> > > LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*
> > > LOCK_USED_IN_IRQ_* -> LOCK_ENABLED_IRQ_*_READ
> > > LOCK_USED_IN_IRQ_*_READ -> LOCK_ENABLED_IRQ_*_READ
> > >
> > > To fix this, we need to 1) change the calculation of exclusive_mask() so
> > > that READ bits are not dropped and 2) always call usage() in
> > > mark_lock_irq() to check usage deadlocks, even when the new usage of the
> > > lock is READ.
> > >
> > > Besides, adjust usage_match() and usage_accumulate() for the recursive
> > > read lock changes.
> > >
> > > Signed-off-by: Boqun Feng <[email protected]>
> >
> > So our daily CI has started to trigger the warning (graph corruption?) below. From
> > the call traces, this recent patchset changed a few related things here and there.
> > Does it ring any bells?
> >
> > [14828.805563][T145183] lockdep bfs error:-1
>
> -1 is BFS_EQUEUEFULL, which means we hit the size limit of the queue used
> in lockdep's search. That is possible now, since recursive read deadlock
> detection tries to search all the edges (dependencies). So maybe we
> should switch from BFS to DFS; I will look into this. In the meantime,
> could you try the following and see whether it helps with the warnings
> you got?
>

Found a way to resolve this while still keeping the BFS. Every time we
want to enqueue a lock_list, we basically enqueue the whole dep list of
entries from the previous lock_list, so we can use a trick here: instead
of enqueueing all the entries, we only enqueue the first one and fetch
the sibling entries with list_next_or_null_rcu(). Patch as below; I also
took the chance to clean the code up and add more comments. I could see
this number (in /proc/lockdep_stats):

max bfs queue depth: 201

down to (after apply this patch)

max bfs queue depth: 61

with x86_64_defconfig along with lockdep and selftest configs.

Qian, could you give it a try?

Regards,
Boqun

---------------------->8
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 454355c033d2..1cc1302bf319 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1640,35 +1640,22 @@ static enum bfs_result __bfs(struct lock_list *source_entry,
int offset)
{
struct lock_list *entry;
- struct lock_list *lock;
+ struct lock_list *lock = NULL;
struct list_head *head;
struct circular_queue *cq = &lock_cq;
- enum bfs_result ret = BFS_RNOMATCH;

lockdep_assert_locked();

- if (match(source_entry, data)) {
- *target_entry = source_entry;
- ret = BFS_RMATCH;
- goto exit;
- }
-
- head = get_dep_list(source_entry, offset);
- if (list_empty(head))
- goto exit;
-
__cq_init(cq);
__cq_enqueue(cq, source_entry);

- while ((lock = __cq_dequeue(cq))) {
- bool prev_only_xr;
-
- if (!lock->class) {
- ret = BFS_EINVALIDNODE;
- goto exit;
- }
+ while (lock || (lock = __cq_dequeue(cq))) {
+ if (!lock->class)
+ return BFS_EINVALIDNODE;

/*
+ * Step 1: check whether we have already finished with this one.
+ *
* If we have visited all the dependencies from this @lock to
* others (iow, if we have visited all lock_list entries in
* @lock->class->locks_{after,before}) we skip, otherwise go
@@ -1676,17 +1663,17 @@ static enum bfs_result __bfs(struct lock_list *source_entry,
* list accessed.
*/
if (lock_accessed(lock))
- continue;
+ goto next;
else
mark_lock_accessed(lock);

- head = get_dep_list(lock, offset);
-
- prev_only_xr = lock->only_xr;
-
- list_for_each_entry_rcu(entry, head, entry) {
- unsigned int cq_depth;
- u8 dep = entry->dep;
+ /*
+ * Step 2: check whether the previous dependency and this one form a
+ * strong dependency path.
+ */
+ if (lock->parent) { /* Parent exists, check prev dependency */
+ u8 dep = lock->dep;
+ bool prev_only_xr = lock->parent->only_xr;

/*
* Mask out all -(S*)-> if we only have *R in previous
@@ -1698,29 +1685,68 @@ static enum bfs_result __bfs(struct lock_list *source_entry,

/* If nothing left, we skip */
if (!dep)
- continue;
+ goto next;

/* If there are only -(*R)-> left, set that for the next step */
- entry->only_xr = !(dep & (DEP_SN_MASK | DEP_EN_MASK));
+ lock->only_xr = !(dep & (DEP_SN_MASK | DEP_EN_MASK));
+ }

- visit_lock_entry(entry, lock);
- if (match(entry, data)) {
- *target_entry = entry;
- ret = BFS_RMATCH;
- goto exit;
- }
+ /*
+ * Step 3: we haven't visited this and there is a strong
+ * dependency path to this, so check with @match.
+ */
+ if (match(lock, data)) {
+ *target_entry = lock;
+ return BFS_RMATCH;
+ }
+
+ /*
+ * Step 4: if there is no match, expand the path by adding the
+ * afterwards or backwards dependencies in the search.
+ *
+ * Note we only enqueue the first entry of the list into the queue,
+ * because we can always reach its sibling entries from it (see the
+ * 'next' label below), and as a result the queue space is saved.
+ */
+ head = get_dep_list(lock, offset);
+ entry = list_first_or_null_rcu(head, struct lock_list, entry);
+ if (entry) {
+ unsigned int cq_depth;
+
+ if (__cq_enqueue(cq, entry))
+ return BFS_EQUEUEFULL;

- if (__cq_enqueue(cq, entry)) {
- ret = BFS_EQUEUEFULL;
- goto exit;
- }
cq_depth = __cq_get_elem_count(cq);
if (max_bfs_queue_depth < cq_depth)
max_bfs_queue_depth = cq_depth;
}
+
+ /*
+ * Update the ->parent, so when @entry is iterated, we know the
+ * previous dependency.
+ */
+ list_for_each_entry_rcu(entry, head, entry)
+ visit_lock_entry(entry, lock);
+next:
+ /*
+ * Step 5: fetch the next dependency to process.
+ *
+ * If there is a previous dependency, we fetch the sibling
+ * dependency from the dep list of the previous dependency.
+ *
+ * Otherwise, set @lock to NULL to fetch the next entry from
+ * the queue.
+ */
+ if (lock->parent) {
+ head = get_dep_list(lock->parent, offset);
+ lock = list_next_or_null_rcu(head, &lock->entry,
+ struct lock_list, entry);
+ } else {
+ lock = NULL;
+ }
}
-exit:
- return ret;
+
+ return BFS_RNOMATCH;
}

static inline enum bfs_result
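
To make the space saving concrete, here is a minimal, standalone sketch of the
traversal shape the patch above moves to. The node type and the enqueue() /
dequeue() / visit() helpers are made up for illustration, and visited-marking
plus the strong-dependency filtering are elided; only list_first_or_null_rcu(),
list_next_or_null_rcu() and list_for_each_entry_rcu() are the real rculist
helpers. Instead of enqueueing every entry of a dep list, only the first entry
goes on the queue and its siblings are reached sideways through the parent:

#include <linux/rculist.h>
#include <linux/types.h>

struct bfs_node {
	struct list_head entry;		/* links siblings in the parent's dep list */
	struct list_head children;	/* this node's own dep list */
	struct bfs_node *parent;	/* set when the node is first reached */
};

/* Hypothetical helpers, assumed to exist for the sketch: */
extern struct bfs_node *dequeue(void);
extern void enqueue(struct bfs_node *n);
extern bool visit(struct bfs_node *n);		/* returns true on a match */

static void bfs_sketch(struct bfs_node *root)
{
	struct bfs_node *n = root;
	struct bfs_node *first, *child;

	while (n || (n = dequeue())) {
		if (visit(n))
			return;

		/* Enqueue only the head of the dep list ... */
		first = list_first_or_null_rcu(&n->children, struct bfs_node, entry);
		if (first)
			enqueue(first);

		/* ... but record the parent so the siblings stay reachable. */
		list_for_each_entry_rcu(child, &n->children, entry)
			child->parent = n;

		/* Walk sideways to the next sibling instead of enqueueing it. */
		if (n->parent)
			n = list_next_or_null_rcu(&n->parent->children, &n->entry,
						  struct bfs_node, entry);
		else
			n = NULL;	/* the root has no siblings; go back to the queue */
	}
}

Under the old scheme every entry of the dep list was enqueued, so the queue
had to hold an entire BFS frontier; here each visited node contributes at most
one queue entry, which is why the "max bfs queue depth" number drops.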

2020-09-16 22:17:09

by Qian Cai

[permalink] [raw]
Subject: Re: [RFC v7 11/19] lockdep: Fix recursive read lock related safe->unsafe detection

On Thu, 2020-09-17 at 00:14 +0800, Boqun Feng wrote:
> Found a way to resolve this while still keeping the BFS. Every time we
> want to enqueue a lock_list, we basically enqueue the whole dep list of
> entries from the previous lock_list, so we can use a trick here: instead
> of enqueueing all the entries, we only enqueue the first one and fetch
> the sibling entries with list_next_or_null_rcu(). Patch as below; I also
> took the chance to clean the code up and add more comments. I could see
> this number (in /proc/lockdep_stats):
>
> max bfs queue depth: 201
>
> down to (after apply this patch)
>
> max bfs queue depth: 61
>
> with x86_64_defconfig along with lockdep and selftest configs.
>
> Qian, could you give it a try?

It works fine as the number went down from around 3000 to 500 on our workloads.

2020-09-17 02:06:23

by Boqun Feng

[permalink] [raw]
Subject: Re: [RFC v7 11/19] lockdep: Fix recursive read lock related safe->unsafe detection

On Wed, Sep 16, 2020 at 05:11:59PM -0400, Qian Cai wrote:
> On Thu, 2020-09-17 at 00:14 +0800, Boqun Feng wrote:
> > Found a way to resolve this while still keeping the BFS. Every time we
> > want to enqueue a lock_list, we basically enqueue the whole dep list of
> > entries from the previous lock_list, so we can use a trick here: instead
> > of enqueueing all the entries, we only enqueue the first one and fetch
> > the sibling entries with list_next_or_null_rcu(). Patch as below; I also
> > took the chance to clean the code up and add more comments. I could see
> > this number (in /proc/lockdep_stats):
> >
> > max bfs queue depth: 201
> >
> > down to (after apply this patch)
> >
> > max bfs queue depth: 61
> >
> > with x86_64_defconfig along with lockdep and selftest configs.
> >
> > Qian, could you give it a try?
>
> It works fine as the number went down from around 3000 to 500 on our workloads.
>

Thanks, let me send a proper patch. I will add a Reported-by tag from
you.

Regards,
Boqun