2017-08-07 07:15:20

by Byungchul Park

Subject: [PATCH v8 00/14] lockdep: Implement crossrelease feature

Change from v7
- rebase on latest tip/sched/core (Jul 26 2017)
- apply peterz's suggestions
- simplify code of crossrelease_{hist/soft/hard}_{start/end}
- exclude a patch avoiding redundant links
- exclude a patch already applied onto the base

Change from v6
- unwind the ring buffer instead of tagging for 'work' context
- introduce hist_id to distinguish every entry of ring buffer
- change the point calling crossrelease_work_start()
- handle cases the ring buffer was overwritten
- change LOCKDEP_CROSSRELEASE config in Kconfig
(select PROVE_LOCKING -> depends on PROVE_LOCKING)
- rename xhlock_used() -> xhlock_valid()
- simplify several pieces of code (e.g. traversal of the ring buffer)
- add/enhance several comments and changelogs

Change from v5
- force XHLOCKS_SIZE to be power of 2 and simplify code
- remove nmi check
- separate an optimization using prev_gen_id with a full changelog
- separate non(multi)-acquisition handling with a full changelog
- replace vmalloc with kmalloc(GFP_KERNEL) for xhlocks
- select PROVE_LOCKING when choosing CROSSRELEASE
- clean up several pieces of code (e.g. loosen some ifdeffery)
- enhance several comments and changelogs

Change from v4
- rebase on vanilla v4.9 tag
- re-name pend_lock(plock) to hist_lock(xhlock)
- allow overwriting ring buffer for hist_lock
- unwind ring buffer instead of tagging id for each irq
- introduce lockdep_map_cross embedding cross_lock
- make each work of workqueue distinguishable
- enhance comments
(I will update the document at the next spin.)

Change from v3
- revised document

Change from v2
- rebase on vanilla v4.7 tag
- move lockdep data for page lock from struct page to page_ext
- allocate plocks buffer via vmalloc instead of in struct task
- enhanced comments and document
- optimize performance
- make reporting function crossrelease-aware

Change from v1
- enhanced the document
- removed save_stack_trace() optimizing patch
- made this based on the separated save_stack_trace patchset
https://www.mail-archive.com/[email protected]/msg1182242.html

Can we detect deadlocks below with original lockdep?

Example 1)

   PROCESS X                  PROCESS Y
   --------------             --------------
   mutex_lock A
                              lock_page B
   lock_page B
                              mutex_lock A // DEADLOCK
                              unlock_page B
                              mutex_unlock A
   mutex_unlock A
   unlock_page B

where A and B are different lock classes.

No, we cannot.

Example 2)

   PROCESS X              PROCESS Y              PROCESS Z
   --------------         --------------         --------------
                          mutex_lock A
   lock_page B
                          lock_page B
                                                 mutex_lock A // DEADLOCK
                                                 mutex_unlock A
                                                 unlock_page B
                                                 (B was held by PROCESS X)
                          unlock_page B
                          mutex_unlock A

where A and B are different lock classes.

No, we cannot.

Example 3)

   PROCESS X                  PROCESS Y
   --------------             --------------
   mutex_lock A
                              mutex_lock A
   wait_for_complete B // DEADLOCK
   mutex_unlock A
                              complete B
                              mutex_unlock A

where A is a lock class and B is a completion variable.

No, we cannot.

Not only lock operations, but also any operation that makes a context
wait or spin for something can take part in a deadlock, unless that
something is eventually *released* by someone. The important point here
is that the waiting or spinning must be *released* by someone.

Using the crossrelease feature, we can check dependencies and detect
deadlock possibilities not only for typical locks, but also for
lock_page(), wait_for_xxx() and so on, which might be released in any
context.

See the last patch including the document for more information.

Byungchul Park (14):
lockdep: Refactor lookup_chain_cache()
lockdep: Add a function building a chain between two classes
lockdep: Change the meaning of check_prev_add()'s return value
lockdep: Make check_prev_add() able to handle external stack_trace
lockdep: Implement crossrelease feature
lockdep: Detect and handle hist_lock ring buffer overwrite
lockdep: Handle non(or multi)-acquisition of a crosslock
lockdep: Make print_circular_bug() aware of crossrelease
lockdep: Apply crossrelease to completions
pagemap.h: Remove trailing white space
lockdep: Apply crossrelease to PG_locked locks
lockdep: Apply lock_acquire(release) on __Set(__Clear)PageLocked
lockdep: Move data of CONFIG_LOCKDEP_PAGELOCK from page to page_ext
lockdep: Crossrelease feature documentation

Documentation/locking/crossrelease.txt | 874 +++++++++++++++++++++++++++++++++
include/linux/completion.h | 118 ++++-
include/linux/irqflags.h | 24 +-
include/linux/lockdep.h | 150 +++++-
include/linux/mm_types.h | 4 +
include/linux/page-flags.h | 43 +-
include/linux/page_ext.h | 4 +
include/linux/pagemap.h | 125 ++++-
include/linux/sched.h | 11 +
kernel/exit.c | 1 +
kernel/fork.c | 4 +
kernel/locking/lockdep.c | 862 ++++++++++++++++++++++++++++----
kernel/sched/completion.c | 56 ++-
kernel/workqueue.c | 2 +
lib/Kconfig.debug | 29 ++
mm/filemap.c | 73 ++-
mm/page_ext.c | 4 +
17 files changed, 2233 insertions(+), 151 deletions(-)
create mode 100644 Documentation/locking/crossrelease.txt

--
1.9.1


2017-08-07 07:14:16

by Byungchul Park

Subject: [PATCH v8 02/14] lockdep: Add a function building a chain between two classes

Crossrelease needs to build a chain between two classes regardless of
their contexts. However, add_chain_cache() cannot be used for that
purpose since it assumes that it's called in the acquisition context
of the hlock. So this patch introduces a new function doing it.

Signed-off-by: Byungchul Park <[email protected]>
---
kernel/locking/lockdep.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 70 insertions(+)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 9260b40..9d16723 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2126,6 +2126,76 @@ static int check_no_collision(struct task_struct *curr,
}

/*
+ * This is for building a chain between just two different classes,
+ * instead of adding a new hlock upon current, which is done by
+ * add_chain_cache().
+ *
+ * This can be called in any context with two classes, while
+ * add_chain_cache() must be done within the lock owner's context
+ * since it uses hlock which might be racy in another context.
+ */
+static inline int add_chain_cache_classes(unsigned int prev,
+					  unsigned int next,
+					  unsigned int irq_context,
+					  u64 chain_key)
+{
+	struct hlist_head *hash_head = chainhashentry(chain_key);
+	struct lock_chain *chain;
+
+	/*
+	 * Allocate a new chain entry from the static array, and add
+	 * it to the hash:
+	 */
+
+	/*
+	 * We might need to take the graph lock, ensure we've got IRQs
+	 * disabled to make this an IRQ-safe lock.. for recursion reasons
+	 * lockdep won't complain about its own locking errors.
+	 */
+	if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+		return 0;
+
+	if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
+		if (!debug_locks_off_graph_unlock())
+			return 0;
+
+		print_lockdep_off("BUG: MAX_LOCKDEP_CHAINS too low!");
+		dump_stack();
+		return 0;
+	}
+
+	chain = lock_chains + nr_lock_chains++;
+	chain->chain_key = chain_key;
+	chain->irq_context = irq_context;
+	chain->depth = 2;
+	if (likely(nr_chain_hlocks + chain->depth <= MAX_LOCKDEP_CHAIN_HLOCKS)) {
+		chain->base = nr_chain_hlocks;
+		nr_chain_hlocks += chain->depth;
+		chain_hlocks[chain->base] = prev - 1;
+		chain_hlocks[chain->base + 1] = next - 1;
+	}
+#ifdef CONFIG_DEBUG_LOCKDEP
+	/*
+	 * Important for check_no_collision().
+	 */
+	else {
+		if (!debug_locks_off_graph_unlock())
+			return 0;
+
+		print_lockdep_off("BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!");
+		dump_stack();
+		return 0;
+	}
+#endif
+
+	hlist_add_head_rcu(&chain->entry, hash_head);
+	debug_atomic_inc(chain_lookup_misses);
+	inc_chains();
+
+	return 1;
+}
+
+/*
* Adds a dependency chain into chain hashtable. And must be called with
* graph_lock held.
*
--
1.9.1

2017-08-07 07:14:26

by Byungchul Park

Subject: [PATCH v8 14/14] lockdep: Crossrelease feature documentation

This document describes the concept of crossrelease feature.

Signed-off-by: Byungchul Park <[email protected]>
---
Documentation/locking/crossrelease.txt | 874 +++++++++++++++++++++++++++++++++
1 file changed, 874 insertions(+)
create mode 100644 Documentation/locking/crossrelease.txt

diff --git a/Documentation/locking/crossrelease.txt b/Documentation/locking/crossrelease.txt
new file mode 100644
index 0000000..bdf1423
--- /dev/null
+++ b/Documentation/locking/crossrelease.txt
@@ -0,0 +1,874 @@
+Crossrelease
+============
+
+Started by Byungchul Park <[email protected]>
+
+Contents:
+
+ (*) Background
+
+ - What causes deadlock
+ - How lockdep works
+
+ (*) Limitation
+
+ - Limit lockdep
+ - Pros from the limitation
+ - Cons from the limitation
+ - Relax the limitation
+
+ (*) Crossrelease
+
+ - Introduce crossrelease
+ - Introduce commit
+
+ (*) Implementation
+
+ - Data structures
+ - How crossrelease works
+
+ (*) Optimizations
+
+ - Avoid duplication
+ - Lockless for hot paths
+
+ (*) APPENDIX A: What lockdep does to work aggressively
+
+ (*) APPENDIX B: How to avoid adding false dependencies
+
+
+==========
+Background
+==========
+
+What causes deadlock
+--------------------
+
+A deadlock occurs when a context is waiting for an event to happen,
+which is impossible because another (or the same) context that can
+trigger the event is itself waiting for another (or the same) event to
+happen, which is likewise impossible for the same reason.
+
+For example:
+
+ A context going to trigger event C is waiting for event A to happen.
+ A context going to trigger event A is waiting for event B to happen.
+ A context going to trigger event B is waiting for event C to happen.
+
+A deadlock occurs when these three wait operations run at the same
+time, because event C cannot be triggered if event A does not happen,
+which in turn cannot be triggered if event B does not happen, which in
+turn cannot be triggered if event C does not happen. After all, no
+event can ever be triggered, since none of them ever meets its
+condition to wake up.
+
+A dependency might exist between two waiters, and a deadlock might
+happen due to a circular relationship between dependencies. Thus, we
+must first define what a dependency is. A dependency exists between
+them if:
+
+ 1. There are two waiters waiting for each event at a given time.
+ 2. The only way to wake up each waiter is to trigger its event.
+ 3. Whether one can be woken up depends on whether the other can.
+
+Each wait in the example creates its dependency like:
+
+ Event C depends on event A.
+ Event A depends on event B.
+ Event B depends on event C.
+
+ NOTE: Precisely speaking, a dependency is one between whether a
+ waiter for an event can be woken up and whether another waiter for
+ another event can be woken up. However from now on, we will describe
+ a dependency as if it's one between an event and another event for
+ simplicity.
+
+And they form circular dependencies like:
+
+    -> C -> A -> B -
+   /                \
+   \                /
+    ----------------
+
+ where 'A -> B' means that event A depends on event B.
+
+Such circular dependencies lead to a deadlock since no waiter can meet
+its condition to wake up as described.
+
+CONCLUSION
+
+Circular dependencies cause a deadlock.
+
+
+How lockdep works
+-----------------
+
+Lockdep tries to detect a deadlock by checking dependencies created by
+lock operations, acquire and release. Waiting for a lock corresponds to
+waiting for an event, and releasing a lock corresponds to triggering an
+event in the previous section.
+
+In short, lockdep does:
+
+ 1. Detect a new dependency.
+ 2. Add the dependency into a global graph.
+ 3. Check if that makes dependencies circular.
+ 4. Report a deadlock or its possibility if so.
+
+For example, consider a graph built by lockdep that looks like:
+
+   A -> B -
+           \
+            -> E
+           /
+   C -> D -
+
+ where A, B,..., E are different lock classes.
+
+Lockdep will add a dependency into the graph on detection of a new
+dependency. For example, it will add a dependency 'E -> C' when a new
+dependency between lock E and lock C is detected. Then the graph will be:
+
+   A -> B -
+           \
+            -> E -
+           /      \
+    -> C -> D -    \
+   /              /
+   \             /
+    -------------
+
+ where A, B,..., E are different lock classes.
+
+This graph contains a subgraph which demonstrates circular dependencies:
+
+            -> E -
+           /      \
+    -> C -> D -    \
+   /              /
+   \             /
+    -------------
+
+ where C, D and E are different lock classes.
+
+This is the condition under which a deadlock might occur. Lockdep
+reports it on detection, right after adding the new dependency. This is
+how lockdep works.
+
+CONCLUSION
+
+Lockdep detects a deadlock or its possibility by checking if circular
+dependencies were created after adding each new dependency.
+
+
+==========
+Limitation
+==========
+
+Limit lockdep
+-------------
+
+If lockdep is limited to working only on typical locks, e.g. spin
+locks and mutexes, which are released within the acquire context, the
+implementation becomes simple but its capacity for detection becomes
+limited. Let's check the pros and cons in the next sections.
+
+
+Pros from the limitation
+------------------------
+
+Given the limitation, when acquiring a lock, the locks in held_locks
+cannot be released if the context has to wait for the lock it is
+acquiring, which means every waiter for any lock in held_locks is
+stuck as well. This is exactly the situation that creates dependencies
+between each lock in held_locks and the lock being acquired.
+
+For example:
+
+ CONTEXT X
+ ---------
+ acquire A
+ acquire B /* Add a dependency 'A -> B' */
+ release B
+ release A
+
+ where A and B are different lock classes.
+
+When acquiring lock A, the held_locks of CONTEXT X is empty, thus no
+dependency is added. But when acquiring lock B, lockdep detects and
+adds a new dependency 'A -> B' between lock A in held_locks and lock
+B. Dependencies can simply be added whenever each lock is acquired.
+
+And the data required by lockdep exists in a local structure,
+held_locks, embedded in task_struct. By forcing the data to be
+accessed only within its owner context, lockdep can avoid races
+without explicit locks while handling the local data.
+
+Lastly, lockdep only needs to keep the locks currently being held in
+order to build a dependency graph. However, if the limitation is
+relaxed, it needs to keep even locks that have already been released,
+because the decision as to whether they created dependencies might be
+deferred for a long time.
+
+To sum up, we can expect several advantages from the limitation:
+
+ 1. Lockdep can easily identify a dependency when acquiring a lock.
+   2. Races are avoidable while accessing the local locks in held_locks.
+ 3. Lockdep only needs to keep locks currently being held.
+
+CONCLUSION
+
+Given the limitation, the implementation becomes simple and efficient.
+
+
+Cons from the limitation
+------------------------
+
+Given the limitation, lockdep is applicable only to typical locks. For
+example, page locks for page access or completions for synchronization
+cannot work with lockdep.
+
+Can we detect deadlocks below, under the limitation?
+
+Example 1:
+
+   CONTEXT X                 CONTEXT Y                 CONTEXT Z
+   ---------                 ---------                 ---------
+                             mutex_lock A
+   lock_page B
+                             lock_page B
+                                                       mutex_lock A /* DEADLOCK */
+                                                       unlock_page B held by X
+                             unlock_page B
+                             mutex_unlock A
+                                                       mutex_unlock A
+
+ where A and B are different lock classes.
+
+No, we cannot.
+
+Example 2:
+
+   CONTEXT X                 CONTEXT Y
+   ---------                 ---------
+                             mutex_lock A
+   mutex_lock A
+                             wait_for_complete B /* DEADLOCK */
+   complete B
+                             mutex_unlock A
+   mutex_unlock A
+
+ where A is a lock class and B is a completion variable.
+
+No, we cannot.
+
+CONCLUSION
+
+Given the limitation, lockdep cannot detect a deadlock or its
+possibility caused by page locks or completions.
+
+
+Relax the limitation
+--------------------
+
+Under the limitation, the things that create dependencies are limited
+to typical locks. However, synchronization primitives like page locks
+and completions, which are allowed to be released in any context, also
+create dependencies and can cause deadlocks. So lockdep should track
+these primitives, too, to do a better job, and we have to relax the
+limitation for them to work with lockdep.
+
+Detecting dependencies is very important for lockdep, because adding a
+dependency means adding an opportunity to check whether it causes a
+deadlock. The more dependencies lockdep adds, the more thoroughly it
+works. Thus lockdep has to do its best to detect and add as many true
+dependencies into the graph as possible.
+
+For example, considering only typical locks, lockdep builds a graph like:
+
+   A -> B -
+           \
+            -> E
+           /
+   C -> D -
+
+ where A, B,..., E are different lock classes.
+
+On the other hand, under the relaxation, additional dependencies might
+be created and added. Assuming additional 'FX -> C' and 'E -> GX' are
+added thanks to the relaxation, the graph will be:
+
+   A -> B -
+           \
+            -> E -> GX
+           /
+   FX -> C -> D -
+
+ where A, B,..., E, FX and GX are different lock classes, and a suffix
+ 'X' is added on non-typical locks.
+
+The latter graph gives us more chances to check for circular
+dependencies than the former. However, it might suffer performance
+degradation: relaxing the limitation, which is what allows the design
+and implementation of lockdep to be efficient, inevitably introduces
+some inefficiency. So lockdep should provide two options: strong
+detection and efficient detection.
+
+Choosing efficient detection:
+
+ Lockdep works with only locks restricted to be released within the
+ acquire context. However, lockdep works efficiently.
+
+Choosing strong detection:
+
+ Lockdep works with all synchronization primitives. However, lockdep
+ suffers performance degradation.
+
+CONCLUSION
+
+Relaxing the limitation, lockdep can add additional dependencies giving
+additional opportunities to check circular dependencies.
+
+
+============
+Crossrelease
+============
+
+Introduce crossrelease
+----------------------
+
+In order to allow lockdep to handle the additional dependencies
+created by what might be released in any context, namely a
+'crosslock', we have to be able to identify the dependencies created
+by crosslocks. The proposed 'crossrelease' feature provides a way to
+do that.
+
+Crossrelease feature has to do:
+
+ 1. Identify dependencies created by crosslocks.
+ 2. Add the dependencies into a dependency graph.
+
+That's all. Once a meaningful dependency is added into the graph,
+lockdep works with the graph as it always did. The most important
+thing the crossrelease feature has to do is to correctly identify and
+add true dependencies into the global graph.
+
+A dependency, e.g. 'A -> B', can be identified only in A's release
+context, because the decision required to identify the dependency,
+namely whether A can be released so that a waiter for A can be woken
+up, can be made only there.
+
+This is not a problem for typical locks, because each acquire context
+is the same as its release context, so lockdep can decide in the
+acquire context whether a lock can be released. For crosslocks,
+however, lockdep cannot make the decision in the acquire context but
+has to wait until the release context is identified.
+
+Therefore, deadlocks caused by crosslocks cannot be detected at the
+very moment they happen, because the dependencies cannot be identified
+until the crosslocks are released. However, deadlock possibilities can
+still be detected, and that is very valuable. See the 'APPENDIX A'
+section for why.
+
+CONCLUSION
+
+Using crossrelease feature, lockdep can work with what might be released
+in any context, namely crosslock.
+
+
+Introduce commit
+----------------
+
+Since crossrelease defers the work of adding true dependencies of
+crosslocks until they are actually released, crossrelease has to queue
+all acquisitions which might create dependencies with the crosslocks.
+Then it identifies the dependencies using the queued data in batches
+at a proper time. We call this 'commit'.
+
+There are four types of dependencies:
+
+1. TT type: 'typical lock A -> typical lock B'
+
+ Just when acquiring B, lockdep can see it's in the A's release
+ context. So the dependency between A and B can be identified
+ immediately. Commit is unnecessary.
+
+2. TC type: 'typical lock A -> crosslock BX'
+
+ Just when acquiring BX, lockdep can see it's in the A's release
+ context. So the dependency between A and BX can be identified
+ immediately. Commit is unnecessary, too.
+
+3. CT type: 'crosslock AX -> typical lock B'
+
+ When acquiring B, lockdep cannot identify the dependency because
+ there's no way to know if it's in the AX's release context. It has
+ to wait until the decision can be made. Commit is necessary.
+
+4. CC type: 'crosslock AX -> crosslock BX'
+
+ When acquiring BX, lockdep cannot identify the dependency because
+ there's no way to know if it's in the AX's release context. It has
+ to wait until the decision can be made. Commit is necessary.
+   However, handling the CC type is not implemented yet; it remains
+   future work.
+
+Lockdep can work without commit for typical locks, but the commit step
+becomes necessary once crosslocks are involved. With commit
+introduced, lockdep performs three steps. What lockdep does in each
+step is:
+
+1. Acquisition: For typical locks, lockdep does what it originally did
+ and queues the lock so that CT type dependencies can be checked using
+ it at the commit step. For crosslocks, it saves data which will be
+ used at the commit step and increases a reference count for it.
+
+2. Commit: No action is required for typical locks. For crosslocks,
+ lockdep adds CT type dependencies using the data saved at the
+ acquisition step.
+
+3. Release: No changes are required for typical locks. When a crosslock
+ is released, it decreases a reference count for it.
+
+CONCLUSION
+
+Crossrelease introduces commit step to handle dependencies of crosslocks
+in batches at a proper time.
+
+
+==============
+Implementation
+==============
+
+Data structures
+---------------
+
+Crossrelease introduces two main data structures.
+
+1. hist_lock
+
+   This is an array embedded in task_struct, keeping the lock history
+   so that dependencies can be added using it at the commit step.
+   Since it's local data, it can be accessed locklessly in the owner
+   context. The array is filled at the acquisition step and consumed
+   at the commit step, and it's managed in a circular manner.
+
+2. cross_lock
+
+   One cross_lock exists per lockdep_map. It keeps the data of a
+   crosslock, which is used at the commit step.
+
+
+How crossrelease works
+----------------------
+
+The key to how crossrelease works is to defer the necessary work to an
+appropriate point in time and perform it all at once at the commit
+step. Let's take a look step by step with examples, starting from how
+lockdep works without crossrelease for typical locks.
+
+ acquire A /* Push A onto held_locks */
+ acquire B /* Push B onto held_locks and add 'A -> B' */
+ acquire C /* Push C onto held_locks and add 'B -> C' */
+ release C /* Pop C from held_locks */
+ release B /* Pop B from held_locks */
+ release A /* Pop A from held_locks */
+
+ where A, B and C are different lock classes.
+
+   NOTE: This document assumes that readers already understand how
+   lockdep works without crossrelease, and thus omits details. But
+   there's one thing to note. Lockdep pretends to pop a lock from
+   held_locks when releasing it. But it's subtly different from the
+   original pop operation, because lockdep allows entries other than
+   the top to be popped.
+
+In this case, lockdep adds a 'top of held_locks -> lock to acquire'
+dependency every time a lock is acquired.
+
+After adding 'A -> B', a dependency graph will be:
+
+ A -> B
+
+ where A and B are different lock classes.
+
+And after adding 'B -> C', the graph will be:
+
+ A -> B -> C
+
+ where A, B and C are different lock classes.
+
+Let's perform the commit step even for typical locks to add
+dependencies. Of course, the commit step is not necessary for them;
+however, it still works correctly, because it is the more general way.
+
+ acquire A
+ /*
+ * Queue A into hist_locks
+ *
+ * In hist_locks: A
+ * In graph: Empty
+ */
+
+ acquire B
+ /*
+ * Queue B into hist_locks
+ *
+ * In hist_locks: A, B
+ * In graph: Empty
+ */
+
+ acquire C
+ /*
+ * Queue C into hist_locks
+ *
+ * In hist_locks: A, B, C
+ * In graph: Empty
+ */
+
+ commit C
+ /*
+ * Add 'C -> ?'
+ * Answer the following to decide '?'
+ * What has been queued since acquire C: Nothing
+ *
+ * In hist_locks: A, B, C
+ * In graph: Empty
+ */
+
+ release C
+
+ commit B
+ /*
+ * Add 'B -> ?'
+ * Answer the following to decide '?'
+ * What has been queued since acquire B: C
+ *
+ * In hist_locks: A, B, C
+ * In graph: 'B -> C'
+ */
+
+ release B
+
+ commit A
+ /*
+ * Add 'A -> ?'
+ * Answer the following to decide '?'
+ * What has been queued since acquire A: B, C
+ *
+ * In hist_locks: A, B, C
+ * In graph: 'B -> C', 'A -> B', 'A -> C'
+ */
+
+ release A
+
+ where A, B and C are different lock classes.
+
+In this case, dependencies are added at the commit step as described.
+
+After commits for A, B and C, the graph will be:
+
+ A -> B -> C
+
+ where A, B and C are different lock classes.
+
+ NOTE: A dependency 'A -> C' is optimized out.
+
+We can see that the former graph, built without the commit step, is
+the same as the latter graph, built using commit steps. Of course the
+former way finishes building the graph earlier, which means a deadlock
+or its possibility can be detected sooner, so the former way is
+preferred when possible. But we cannot avoid using the latter way for
+crosslocks.
+
+Let's look at how commit steps work for crosslocks. In this case, the
+commit step is performed only on the crosslock BX for real. And it is
+assumed that the BX release context is different from the BX acquire
+context.
+
+   BX RELEASE CONTEXT              BX ACQUIRE CONTEXT
+   ------------------              ------------------
+                                   acquire A
+                                   /*
+                                    * Push A onto held_locks
+                                    * Queue A into hist_locks
+                                    *
+                                    * In held_locks: A
+                                    * In hist_locks: A
+                                    * In graph: Empty
+                                    */
+
+                                   acquire BX
+                                   /*
+                                    * Add 'the top of held_locks -> BX'
+                                    *
+                                    * In held_locks: A
+                                    * In hist_locks: A
+                                    * In graph: 'A -> BX'
+                                    */
+
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+   It must be guaranteed that the following operations are seen after
+   acquiring BX globally. It can be done by things like barrier.
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+   acquire C
+   /*
+    * Push C onto held_locks
+    * Queue C into hist_locks
+    *
+    * In held_locks: C
+    * In hist_locks: C
+    * In graph: 'A -> BX'
+    */
+
+   release C
+   /*
+    * Pop C from held_locks
+    *
+    * In held_locks: Empty
+    * In hist_locks: C
+    * In graph: 'A -> BX'
+    */
+                                   acquire D
+                                   /*
+                                    * Push D onto held_locks
+                                    * Queue D into hist_locks
+                                    * Add 'the top of held_locks -> D'
+                                    *
+                                    * In held_locks: A, D
+                                    * In hist_locks: A, D
+                                    * In graph: 'A -> BX', 'A -> D'
+                                    */
+   acquire E
+   /*
+    * Push E onto held_locks
+    * Queue E into hist_locks
+    *
+    * In held_locks: E
+    * In hist_locks: C, E
+    * In graph: 'A -> BX', 'A -> D'
+    */
+
+   release E
+   /*
+    * Pop E from held_locks
+    *
+    * In held_locks: Empty
+    * In hist_locks: C, E
+    * In graph: 'A -> BX', 'A -> D'
+    */
+                                   release D
+                                   /*
+                                    * Pop D from held_locks
+                                    *
+                                    * In held_locks: A
+                                    * In hist_locks: A, D
+                                    * In graph: 'A -> BX', 'A -> D'
+                                    */
+   commit BX
+   /*
+    * Add 'BX -> ?'
+    * What has been queued since acquire BX: C, E
+    *
+    * In held_locks: Empty
+    * In hist_locks: C, E
+    * In graph: 'A -> BX', 'A -> D',
+    *           'BX -> C', 'BX -> E'
+    */
+
+   release BX
+   /*
+    * In held_locks: Empty
+    * In hist_locks: C, E
+    * In graph: 'A -> BX', 'A -> D',
+    *           'BX -> C', 'BX -> E'
+    */
+                                   release A
+                                   /*
+                                    * Pop A from held_locks
+                                    *
+                                    * In held_locks: Empty
+                                    * In hist_locks: A, D
+                                    * In graph: 'A -> BX', 'A -> D',
+                                    *           'BX -> C', 'BX -> E'
+                                    */
+
+ where A, BX, C,..., E are different lock classes, and a suffix 'X' is
+ added on crosslocks.
+
+Crossrelease considers all acquisitions after acquiring BX as
+candidates which might create dependencies with BX. True dependencies
+are determined when the release context of BX is identified.
+Meanwhile, all typical locks are queued so that they can be used at
+the commit step. Then the two dependencies 'BX -> C' and 'BX -> E' are
+added at the commit step, when the release context is identified.
+
+The final graph will be, with crossrelease:
+
+        -> C
+       /
+    -> BX -
+   /        \
+  A -         -> E
+    \
+     -> D
+
+ where A, BX, C,..., E are different lock classes, and a suffix 'X' is
+ added on crosslocks.
+
+However, the final graph will be, without crossrelease:
+
+ A -> D
+
+ where A and D are different lock classes.
+
+The former graph has three more dependencies, 'A -> BX', 'BX -> C' and
+'BX -> E' giving additional opportunities to check if they cause
+deadlocks. This way lockdep can detect a deadlock or its possibility
+caused by crosslocks.
+
+CONCLUSION
+
+We checked how crossrelease works with several examples.
+
+
+=============
+Optimizations
+=============
+
+Avoid duplication
+-----------------
+
+The crossrelease feature uses a cache like the one lockdep already
+uses for dependency chains, but this time for caching CT type
+dependencies. Once a dependency is cached, the same dependency will
+never be added again.
+
+
+Lockless for hot paths
+----------------------
+
+To keep all locks for later use at the commit step, crossrelease
+adopts a local array embedded in task_struct, which makes access to
+the data lockless by forcing it to happen only within the owner
+context. It's like how lockdep handles held_locks. A lockless
+implementation is important, since typical locks are acquired and
+released very frequently.
+
+
+==================================================
+APPENDIX A: What lockdep does to work aggressively
+==================================================
+
+A deadlock actually occurs only when all the wait operations creating
+the circular dependencies run at the same time. But even when they
+don't, a potential deadlock exists if the problematic dependencies
+exist. Thus it's meaningful to detect not only an actual deadlock but
+also its possibility, and the latter is the more valuable. When a
+deadlock actually occurs, we can identify what happens in the system
+by some means or other even without lockdep. However, there's no way
+to detect a possibility without lockdep, unless the whole code is
+audited by hand, which is terrible. Lockdep does both, and
+crossrelease focuses only on the latter.
+
+Whether or not a deadlock actually occurs depends on several factors.
+For example, the order in which contexts are switched is one factor.
+Assuming circular dependencies exist, a deadlock would occur when
+contexts are switched so that all the wait operations creating the
+dependencies run simultaneously. Thus, to detect a deadlock
+possibility even when it has not occurred yet, lockdep should consider
+all possible combinations of dependencies, trying to:
+
+1. Use a global dependency graph.
+
+   Lockdep combines all dependencies into one global graph and uses
+   them, regardless of which context generated them or the order in
+   which contexts are switched. Only the aggregated dependencies are
+   considered, so they are prone to form a circle if a problem exists.
+
+2. Check dependencies between classes instead of instances.
+
+   What actually causes a deadlock are lock instances. However,
+   lockdep checks dependencies between classes instead of instances.
+   This way lockdep can detect a deadlock which has not happened yet,
+   but might happen in the future involving other instances of the
+   same class.
+
+3. Assume all acquisitions lead to waiting.
+
+   Although locks might be acquired without waiting, and waiting is
+   essential to create a dependency, lockdep assumes all acquisitions
+   lead to waiting, since they might do so at some time or other.
+
+CONCLUSION
+
+Lockdep detects not only an actual deadlock but also its possibility,
+and the latter is more valuable.
+
+
+==================================================
+APPENDIX B: How to avoid adding false dependencies
+==================================================
+
+Recall what a dependency is. A dependency exists if:
+
+ 1. There are two waiters waiting for each event at a given time.
+ 2. The only way to wake up each waiter is to trigger its event.
+ 3. Whether one can be woken up depends on whether the other can.
+
+For example:
+
+ acquire A
+ acquire B /* A dependency 'A -> B' exists */
+ release B
+ release A
+
+ where A and B are different lock classes.
+
+A dependency 'A -> B' exists since:
+
+ 1. A waiter for A and a waiter for B might exist when acquiring B.
+ 2. The only way to wake up each waiter is to release what it waits for.
+ 3. Whether the waiter for A can be woken up depends on whether the
+ other can. IOW, TASK X cannot release A if it fails to acquire B.
+
+For another example:
+
+ TASK X TASK Y
+ ------ ------
+ acquire AX
+ acquire B /* A dependency 'AX -> B' exists */
+ release B
+ release AX held by Y
+
+ where AX and B are different lock classes, and a suffix 'X' is added
+ on crosslocks.
+
+Even in this case involving crosslocks, the same rule can be applied. A
+dependency 'AX -> B' exists since:
+
+ 1. A waiter for AX and a waiter for B might exist when acquiring B.
+ 2. The only way to wake up each waiter is to release what it waits for.
+ 3. Whether the waiter for AX can be woken up depends on whether the
+ other can. IOW, TASK X cannot release AX if it fails to acquire B.
+
+Let's take a look at a more complicated example:
+
+ TASK X TASK Y
+ ------ ------
+ acquire B
+ release B
+ fork Y
+ acquire AX
+ acquire C /* A dependency 'AX -> C' exists */
+ release C
+ release AX held by Y
+
+ where AX, B and C are different lock classes, and a suffix 'X' is
+ added on crosslocks.
+
+Does a dependency 'AX -> B' exist? Nope.
+
+Two waiters are essential to create a dependency. However, in this
+example, a waiter for AX and a waiter for B cannot exist at the same
+time. Thus the dependency 'AX -> B' cannot be created.
+
+It would be ideal if the full set of true dependencies could be
+considered. But we can be certain of nothing but what actually
+happened. By relying on what actually happens at runtime, we add only
+true dependencies, though they might be a subset of all true ones.
+It's similar to how lockdep works for typical locks: there might be
+more true dependencies than what lockdep has detected at runtime, but
+lockdep has no choice but to rely on what actually happens.
+Crossrelease also relies on it.
+
+CONCLUSION
+
+Relying on what actually happens, lockdep can avoid adding false
+dependencies.
--
1.9.1

2017-08-07 07:14:14

by Byungchul Park

Subject: [PATCH v8 08/14] lockdep: Make print_circular_bug() aware of crossrelease

print_circular_bug(), which reports a circular bug, assumes that the
target hlock is owned by the current task. However, in crossrelease,
the target hlock can be owned by a task other than the current one. So
the report format needs to be changed to reflect this.

Signed-off-by: Byungchul Park <[email protected]>
---
kernel/locking/lockdep.c | 67 +++++++++++++++++++++++++++++++++---------------
1 file changed, 47 insertions(+), 20 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 4eae7dc..a2b6aaa 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1141,22 +1141,41 @@ static inline int __bfs_backwards(struct lock_list *src_entry,
printk(KERN_CONT "\n\n");
}

- printk(" Possible unsafe locking scenario:\n\n");
- printk(" CPU0 CPU1\n");
- printk(" ---- ----\n");
- printk(" lock(");
- __print_lock_name(target);
- printk(KERN_CONT ");\n");
- printk(" lock(");
- __print_lock_name(parent);
- printk(KERN_CONT ");\n");
- printk(" lock(");
- __print_lock_name(target);
- printk(KERN_CONT ");\n");
- printk(" lock(");
- __print_lock_name(source);
- printk(KERN_CONT ");\n");
- printk("\n *** DEADLOCK ***\n\n");
+ if (cross_lock(tgt->instance)) {
+ printk(" Possible unsafe locking scenario by crosslock:\n\n");
+ printk(" CPU0 CPU1\n");
+ printk(" ---- ----\n");
+ printk(" lock(");
+ __print_lock_name(parent);
+ printk(KERN_CONT ");\n");
+ printk(" lock(");
+ __print_lock_name(target);
+ printk(KERN_CONT ");\n");
+ printk(" lock(");
+ __print_lock_name(source);
+ printk(KERN_CONT ");\n");
+ printk(" unlock(");
+ __print_lock_name(target);
+ printk(KERN_CONT ");\n");
+ printk("\n *** DEADLOCK ***\n\n");
+ } else {
+ printk(" Possible unsafe locking scenario:\n\n");
+ printk(" CPU0 CPU1\n");
+ printk(" ---- ----\n");
+ printk(" lock(");
+ __print_lock_name(target);
+ printk(KERN_CONT ");\n");
+ printk(" lock(");
+ __print_lock_name(parent);
+ printk(KERN_CONT ");\n");
+ printk(" lock(");
+ __print_lock_name(target);
+ printk(KERN_CONT ");\n");
+ printk(" lock(");
+ __print_lock_name(source);
+ printk(KERN_CONT ");\n");
+ printk("\n *** DEADLOCK ***\n\n");
+ }
}

/*
@@ -1181,7 +1200,12 @@ static inline int __bfs_backwards(struct lock_list *src_entry,
pr_warn("%s/%d is trying to acquire lock:\n",
curr->comm, task_pid_nr(curr));
print_lock(check_src);
- pr_warn("\nbut task is already holding lock:\n");
+
+ if (cross_lock(check_tgt->instance))
+ pr_warn("\nbut now in release context of a crosslock acquired at the following:\n");
+ else
+ pr_warn("\nbut task is already holding lock:\n");
+
print_lock(check_tgt);
pr_warn("\nwhich lock already depends on the new lock.\n\n");
pr_warn("\nthe existing dependency chain (in reverse order) is:\n");
@@ -1199,7 +1223,8 @@ static inline int class_equal(struct lock_list *entry, void *data)
static noinline int print_circular_bug(struct lock_list *this,
struct lock_list *target,
struct held_lock *check_src,
- struct held_lock *check_tgt)
+ struct held_lock *check_tgt,
+ struct stack_trace *trace)
{
struct task_struct *curr = current;
struct lock_list *parent;
@@ -1209,7 +1234,9 @@ static noinline int print_circular_bug(struct lock_list *this,
if (!debug_locks_off_graph_unlock() || debug_locks_silent)
return 0;

- if (!save_trace(&this->trace))
+ if (cross_lock(check_tgt->instance))
+ this->trace = *trace;
+ else if (!save_trace(&this->trace))
return 0;

depth = get_lock_depth(target);
@@ -1853,7 +1880,7 @@ static inline void inc_chains(void)
this.parent = NULL;
ret = check_noncircular(&this, hlock_class(prev), &target_entry);
if (unlikely(!ret))
- return print_circular_bug(&this, target_entry, next, prev);
+ return print_circular_bug(&this, target_entry, next, prev, trace);
else if (unlikely(ret < 0))
return print_bfs_bug(ret);

--
1.9.1

2017-08-07 07:14:45

by Byungchul Park

Subject: [PATCH v8 13/14] lockdep: Move data of CONFIG_LOCKDEP_PAGELOCK from page to page_ext

CONFIG_LOCKDEP_PAGELOCK needs to keep a lockdep_map_cross per page.
Since it's a debug feature, it is preferable to keep it in struct
page_ext rather than in struct page. Move it to struct page_ext.

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/mm_types.h | 4 ---
include/linux/page-flags.h | 19 +++++++++++--
include/linux/page_ext.h | 4 +++
include/linux/pagemap.h | 28 ++++++++++++++++---
lib/Kconfig.debug | 1 +
mm/filemap.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++
mm/page_alloc.c | 3 --
mm/page_ext.c | 4 +++
8 files changed, 118 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index f1e3dba..ac3121c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -220,10 +220,6 @@ struct page {
#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
int _last_cpupid;
#endif
-
-#ifdef CONFIG_LOCKDEP_PAGELOCK
- struct lockdep_map_cross map;
-#endif
}
/*
* The struct page can be forced to be double word aligned so that atomic ops
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index b793342..879dd0d 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -374,28 +374,41 @@ static __always_inline int PageSwapCache(struct page *page)

#ifdef CONFIG_LOCKDEP_PAGELOCK
#include <linux/lockdep.h>
+#include <linux/page_ext.h>

TESTPAGEFLAG(Locked, locked, PF_NO_TAIL)

static __always_inline void __SetPageLocked(struct page *page)
{
+ struct page_ext *e;
+
__set_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);

page = compound_head(page);
- lock_acquire_exclusive((struct lockdep_map *)&page->map, 0, 1, NULL, _RET_IP_);
+ e = lookup_page_ext(page);
+ if (unlikely(!e))
+ return;
+
+ lock_acquire_exclusive((struct lockdep_map *)&e->map, 0, 1, NULL, _RET_IP_);
}

static __always_inline void __ClearPageLocked(struct page *page)
{
+ struct page_ext *e;
+
__clear_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);

page = compound_head(page);
+ e = lookup_page_ext(page);
+ if (unlikely(!e))
+ return;
+
/*
* lock_commit_crosslock() is necessary for crosslock
* when the lock is released, before lock_release().
*/
- lock_commit_crosslock((struct lockdep_map *)&page->map);
- lock_release((struct lockdep_map *)&page->map, 0, _RET_IP_);
+ lock_commit_crosslock((struct lockdep_map *)&e->map);
+ lock_release((struct lockdep_map *)&e->map, 0, _RET_IP_);
}
#else
__PAGEFLAG(Locked, locked, PF_NO_TAIL)
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index 9298c39..d1c52c8c 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -44,6 +44,10 @@ enum page_ext_flags {
*/
struct page_ext {
unsigned long flags;
+
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+ struct lockdep_map_cross map;
+#endif
};

extern void pgdat_page_ext_init(struct pglist_data *pgdat);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 9f448c6..b75b8bc 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -16,6 +16,7 @@
#include <linux/hugetlb_inline.h>
#ifdef CONFIG_LOCKDEP_PAGELOCK
#include <linux/lockdep.h>
+#include <linux/page_ext.h>
#endif

/*
@@ -454,28 +455,47 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
}

#ifdef CONFIG_LOCKDEP_PAGELOCK
+extern struct page_ext_operations lockdep_pagelock_ops;
+
#define lock_page_init(p) \
do { \
static struct lock_class_key __key; \
- lockdep_init_map_crosslock((struct lockdep_map *)&(p)->map, \
+ struct page_ext *e = lookup_page_ext(p); \
+ \
+ if (unlikely(!e)) \
+ break; \
+ \
+ lockdep_init_map_crosslock((struct lockdep_map *)&(e)->map, \
"(PG_locked)" #p, &__key, 0); \
} while (0)

static inline void lock_page_acquire(struct page *page, int try)
{
+ struct page_ext *e;
+
page = compound_head(page);
- lock_acquire_exclusive((struct lockdep_map *)&page->map, 0,
+ e = lookup_page_ext(page);
+ if (unlikely(!e))
+ return;
+
+ lock_acquire_exclusive((struct lockdep_map *)&e->map, 0,
try, NULL, _RET_IP_);
}

static inline void lock_page_release(struct page *page)
{
+ struct page_ext *e;
+
page = compound_head(page);
+ e = lookup_page_ext(page);
+ if (unlikely(!e))
+ return;
+
/*
* lock_commit_crosslock() is necessary for crosslocks.
*/
- lock_commit_crosslock((struct lockdep_map *)&page->map);
- lock_release((struct lockdep_map *)&page->map, 0, _RET_IP_);
+ lock_commit_crosslock((struct lockdep_map *)&e->map);
+ lock_release((struct lockdep_map *)&e->map, 0, _RET_IP_);
}
#else
static inline void lock_page_init(struct page *page) {}
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 99b5f76..3a890fb 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1096,6 +1096,7 @@ config LOCKDEP_COMPLETE
config LOCKDEP_PAGELOCK
bool "Lock debugging: allow PG_locked lock to use deadlock detector"
select LOCKDEP_CROSSRELEASE
+ select PAGE_EXTENSION
default n
help
PG_locked lock is a kind of crosslock. Using crossrelease feature,
diff --git a/mm/filemap.c b/mm/filemap.c
index 0d83bf0..6372bd8 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -36,6 +36,9 @@
#include <linux/memcontrol.h>
#include <linux/cleancache.h>
#include <linux/rmap.h>
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include <linux/page_ext.h>
+#endif
#include "internal.h"

#define CREATE_TRACE_POINTS
@@ -1210,6 +1213,72 @@ int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
}
}

+#ifdef CONFIG_LOCKDEP_PAGELOCK
+static bool need_lockdep_pagelock(void) { return true; }
+
+static void init_pages_in_zone(pg_data_t *pgdat, struct zone *zone)
+{
+ struct page *page;
+ struct page_ext *page_ext;
+ unsigned long pfn = zone->zone_start_pfn;
+ unsigned long end_pfn = pfn + zone->spanned_pages;
+ unsigned long count = 0;
+
+ for (; pfn < end_pfn; pfn++) {
+ if (!pfn_valid(pfn)) {
+ pfn = ALIGN(pfn + 1, MAX_ORDER_NR_PAGES);
+ continue;
+ }
+
+ if (!pfn_valid_within(pfn))
+ continue;
+
+ page = pfn_to_page(pfn);
+
+ if (page_zone(page) != zone)
+ continue;
+
+ page_ext = lookup_page_ext(page);
+ if (unlikely(!page_ext))
+ continue;
+
+ lock_page_init(page);
+ count++;
+ }
+
+ pr_info("Node %d, zone %8s: lockdep pagelock found early allocated %lu pages\n",
+ pgdat->node_id, zone->name, count);
+}
+
+static void init_zones_in_node(pg_data_t *pgdat)
+{
+ struct zone *zone;
+ struct zone *node_zones = pgdat->node_zones;
+ unsigned long flags;
+
+ for (zone = node_zones; zone - node_zones < MAX_NR_ZONES; ++zone) {
+ if (!populated_zone(zone))
+ continue;
+
+ spin_lock_irqsave(&zone->lock, flags);
+ init_pages_in_zone(pgdat, zone);
+ spin_unlock_irqrestore(&zone->lock, flags);
+ }
+}
+
+static void init_lockdep_pagelock(void)
+{
+ pg_data_t *pgdat;
+ for_each_online_pgdat(pgdat)
+ init_zones_in_node(pgdat);
+}
+
+struct page_ext_operations lockdep_pagelock_ops = {
+ .need = need_lockdep_pagelock,
+ .init = init_lockdep_pagelock,
+};
+#endif
+
/**
* page_cache_next_hole - find the next hole (not-present entry)
* @mapping: mapping
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2cbf412..6d30e91 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5406,9 +5406,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
} else {
__init_single_pfn(pfn, zone, nid);
}
-#ifdef CONFIG_LOCKDEP_PAGELOCK
- lock_page_init(pfn_to_page(pfn));
-#endif
}
}

diff --git a/mm/page_ext.c b/mm/page_ext.c
index 88ccc044..2ac1fb1 100644
--- a/mm/page_ext.c
+++ b/mm/page_ext.c
@@ -7,6 +7,7 @@
#include <linux/kmemleak.h>
#include <linux/page_owner.h>
#include <linux/page_idle.h>
+#include <linux/pagemap.h>

/*
* struct page extension
@@ -65,6 +66,9 @@
#if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
&page_idle_ops,
#endif
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+ &lockdep_pagelock_ops,
+#endif
};

static unsigned long total_usage;
--
1.9.1

2017-08-07 07:14:43

by Byungchul Park

Subject: [PATCH v8 04/14] lockdep: Make check_prev_add() able to handle external stack_trace

Currently, the space for a stack_trace is pinned in check_prev_add(),
which prevents us from using an external stack_trace. The simplest way
to achieve that would be to pass an external stack_trace as an
argument.

A more suitable solution is to additionally pass a callback along with
the stack_trace, so that callers can decide how to save it, or whether
to save it at all. Crossrelease actually needs to do something other
than just saving a stack_trace. So pass check_prev_add() a stack_trace
and a callback to handle it.

Signed-off-by: Byungchul Park <[email protected]>
---
kernel/locking/lockdep.c | 40 +++++++++++++++++++---------------------
1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index b23e930..22a13f9 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1813,20 +1813,13 @@ static inline void inc_chains(void)
*/
static int
check_prev_add(struct task_struct *curr, struct held_lock *prev,
- struct held_lock *next, int distance, int *stack_saved)
+ struct held_lock *next, int distance, struct stack_trace *trace,
+ int (*save)(struct stack_trace *trace))
{
struct lock_list *entry;
int ret;
struct lock_list this;
struct lock_list *uninitialized_var(target_entry);
- /*
- * Static variable, serialized by the graph_lock().
- *
- * We use this static variable to save the stack trace in case
- * we call into this function multiple times due to encountering
- * trylocks in the held lock stack.
- */
- static struct stack_trace trace;

/*
* Prove that the new <prev> -> <next> dependency would not
@@ -1874,11 +1867,8 @@ static inline void inc_chains(void)
}
}

- if (!*stack_saved) {
- if (!save_trace(&trace))
- return 0;
- *stack_saved = 1;
- }
+ if (save && !save(trace))
+ return 0;

/*
* Ok, all validations passed, add the new lock
@@ -1886,14 +1876,14 @@ static inline void inc_chains(void)
*/
ret = add_lock_to_list(hlock_class(next),
&hlock_class(prev)->locks_after,
- next->acquire_ip, distance, &trace);
+ next->acquire_ip, distance, trace);

if (!ret)
return 0;

ret = add_lock_to_list(hlock_class(prev),
&hlock_class(next)->locks_before,
- next->acquire_ip, distance, &trace);
+ next->acquire_ip, distance, trace);
if (!ret)
return 0;

@@ -1901,8 +1891,6 @@ static inline void inc_chains(void)
* Debugging printouts:
*/
if (verbose(hlock_class(prev)) || verbose(hlock_class(next))) {
- /* We drop graph lock, so another thread can overwrite trace. */
- *stack_saved = 0;
graph_unlock();
printk("\n new dependency: ");
print_lock_name(hlock_class(prev));
@@ -1926,8 +1914,9 @@ static inline void inc_chains(void)
check_prevs_add(struct task_struct *curr, struct held_lock *next)
{
int depth = curr->lockdep_depth;
- int stack_saved = 0;
struct held_lock *hlock;
+ struct stack_trace trace;
+ int (*save)(struct stack_trace *trace) = save_trace;

/*
* Debugging checks.
@@ -1952,9 +1941,18 @@ static inline void inc_chains(void)
* added:
*/
if (hlock->read != 2 && hlock->check) {
- if (!check_prev_add(curr, hlock, next,
- distance, &stack_saved))
+ int ret = check_prev_add(curr, hlock, next,
+ distance, &trace, save);
+ if (!ret)
return 0;
+
+ /*
+ * Stop saving stack_trace if save_trace() was
+ * called at least once:
+ */
+ if (save && ret == 2)
+ save = NULL;
+
/*
* Stop after the first non-trylock entry,
* as non-trylock entries have added their
--
1.9.1

2017-08-07 07:15:18

by Byungchul Park

Subject: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

Although wait_for_completion() and its family can cause deadlock, the
lock correctness validator could not be applied to them until now,
because things like complete() are usually called in a different
context from the waiting context, which violates lockdep's assumption.

Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
detector to those completion operations. Apply it.

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/completion.h | 118 +++++++++++++++++++++++++++++++++++++++++----
kernel/sched/completion.c | 56 ++++++++++++---------
lib/Kconfig.debug | 8 +++
3 files changed, 149 insertions(+), 33 deletions(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 5d5aaae..6b3bcfc 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -9,6 +9,9 @@
*/

#include <linux/wait.h>
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#include <linux/lockdep.h>
+#endif

/*
* struct completion - structure used to maintain state for a "completion"
@@ -25,10 +28,50 @@
struct completion {
unsigned int done;
wait_queue_head_t wait;
+#ifdef CONFIG_LOCKDEP_COMPLETE
+ struct lockdep_map_cross map;
+#endif
};

+#ifdef CONFIG_LOCKDEP_COMPLETE
+static inline void complete_acquire(struct completion *x)
+{
+ lock_acquire_exclusive((struct lockdep_map *)&x->map, 0, 0, NULL, _RET_IP_);
+}
+
+static inline void complete_release(struct completion *x)
+{
+ lock_release((struct lockdep_map *)&x->map, 0, _RET_IP_);
+}
+
+static inline void complete_release_commit(struct completion *x)
+{
+ lock_commit_crosslock((struct lockdep_map *)&x->map);
+}
+
+#define init_completion(x) \
+do { \
+ static struct lock_class_key __key; \
+ lockdep_init_map_crosslock((struct lockdep_map *)&(x)->map, \
+ "(complete)" #x, \
+ &__key, 0); \
+ __init_completion(x); \
+} while (0)
+#else
+#define init_completion(x) __init_completion(x)
+static inline void complete_acquire(struct completion *x) {}
+static inline void complete_release(struct completion *x) {}
+static inline void complete_release_commit(struct completion *x) {}
+#endif
+
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#define COMPLETION_INITIALIZER(work) \
+ { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait), \
+ STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
+#else
#define COMPLETION_INITIALIZER(work) \
{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+#endif

#define COMPLETION_INITIALIZER_ONSTACK(work) \
({ init_completion(&work); work; })
@@ -70,7 +113,7 @@ struct completion {
* This inline function will initialize a dynamically created completion
* structure.
*/
-static inline void init_completion(struct completion *x)
+static inline void __init_completion(struct completion *x)
{
x->done = 0;
init_waitqueue_head(&x->wait);
@@ -88,18 +131,75 @@ static inline void reinit_completion(struct completion *x)
x->done = 0;
}

-extern void wait_for_completion(struct completion *);
-extern void wait_for_completion_io(struct completion *);
-extern int wait_for_completion_interruptible(struct completion *x);
-extern int wait_for_completion_killable(struct completion *x);
-extern unsigned long wait_for_completion_timeout(struct completion *x,
+extern void __wait_for_completion(struct completion *);
+extern void __wait_for_completion_io(struct completion *);
+extern int __wait_for_completion_interruptible(struct completion *x);
+extern int __wait_for_completion_killable(struct completion *x);
+extern unsigned long __wait_for_completion_timeout(struct completion *x,
unsigned long timeout);
-extern unsigned long wait_for_completion_io_timeout(struct completion *x,
+extern unsigned long __wait_for_completion_io_timeout(struct completion *x,
unsigned long timeout);
-extern long wait_for_completion_interruptible_timeout(
+extern long __wait_for_completion_interruptible_timeout(
struct completion *x, unsigned long timeout);
-extern long wait_for_completion_killable_timeout(
+extern long __wait_for_completion_killable_timeout(
struct completion *x, unsigned long timeout);
+
+static inline void wait_for_completion(struct completion *x)
+{
+ complete_acquire(x);
+ __wait_for_completion(x);
+ complete_release(x);
+}
+
+static inline void wait_for_completion_io(struct completion *x)
+{
+ complete_acquire(x);
+ __wait_for_completion_io(x);
+ complete_release(x);
+}
+
+static inline int wait_for_completion_interruptible(struct completion *x)
+{
+ int ret;
+ complete_acquire(x);
+ ret = __wait_for_completion_interruptible(x);
+ complete_release(x);
+ return ret;
+}
+
+static inline int wait_for_completion_killable(struct completion *x)
+{
+ int ret;
+ complete_acquire(x);
+ ret = __wait_for_completion_killable(x);
+ complete_release(x);
+ return ret;
+}
+
+static inline unsigned long wait_for_completion_timeout(struct completion *x,
+ unsigned long timeout)
+{
+ return __wait_for_completion_timeout(x, timeout);
+}
+
+static inline unsigned long wait_for_completion_io_timeout(struct completion *x,
+ unsigned long timeout)
+{
+ return __wait_for_completion_io_timeout(x, timeout);
+}
+
+static inline long wait_for_completion_interruptible_timeout(
+ struct completion *x, unsigned long timeout)
+{
+ return __wait_for_completion_interruptible_timeout(x, timeout);
+}
+
+static inline long wait_for_completion_killable_timeout(
+ struct completion *x, unsigned long timeout)
+{
+ return __wait_for_completion_killable_timeout(x, timeout);
+}
+
extern bool try_wait_for_completion(struct completion *x);
extern bool completion_done(struct completion *x);

diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index 13fc5ae..0b5f16b 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -32,6 +32,12 @@ void complete(struct completion *x)
unsigned long flags;

spin_lock_irqsave(&x->wait.lock, flags);
+
+ /*
+ * Perform commit of crossrelease here.
+ */
+ complete_release_commit(x);
+
if (x->done != UINT_MAX)
x->done++;
__wake_up_locked(&x->wait, TASK_NORMAL, 1);
@@ -111,7 +117,7 @@ void complete_all(struct completion *x)
}

/**
- * wait_for_completion: - waits for completion of a task
+ * __wait_for_completion: - waits for completion of a task
* @x: holds the state of this particular completion
*
* This waits to be signaled for completion of a specific task. It is NOT
@@ -120,14 +126,14 @@ void complete_all(struct completion *x)
* See also similar routines (i.e. wait_for_completion_timeout()) with timeout
* and interrupt capability. Also see complete().
*/
-void __sched wait_for_completion(struct completion *x)
+void __sched __wait_for_completion(struct completion *x)
{
wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
}
-EXPORT_SYMBOL(wait_for_completion);
+EXPORT_SYMBOL(__wait_for_completion);

/**
- * wait_for_completion_timeout: - waits for completion of a task (w/timeout)
+ * __wait_for_completion_timeout: - waits for completion of a task (w/timeout)
* @x: holds the state of this particular completion
* @timeout: timeout value in jiffies
*
@@ -139,28 +145,28 @@ void __sched wait_for_completion(struct completion *x)
* till timeout) if completed.
*/
unsigned long __sched
-wait_for_completion_timeout(struct completion *x, unsigned long timeout)
+__wait_for_completion_timeout(struct completion *x, unsigned long timeout)
{
return wait_for_common(x, timeout, TASK_UNINTERRUPTIBLE);
}
-EXPORT_SYMBOL(wait_for_completion_timeout);
+EXPORT_SYMBOL(__wait_for_completion_timeout);

/**
- * wait_for_completion_io: - waits for completion of a task
+ * __wait_for_completion_io: - waits for completion of a task
* @x: holds the state of this particular completion
*
* This waits to be signaled for completion of a specific task. It is NOT
* interruptible and there is no timeout. The caller is accounted as waiting
* for IO (which traditionally means blkio only).
*/
-void __sched wait_for_completion_io(struct completion *x)
+void __sched __wait_for_completion_io(struct completion *x)
{
wait_for_common_io(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
}
-EXPORT_SYMBOL(wait_for_completion_io);
+EXPORT_SYMBOL(__wait_for_completion_io);

/**
- * wait_for_completion_io_timeout: - waits for completion of a task (w/timeout)
+ * __wait_for_completion_io_timeout: - waits for completion of a task (w/timeout)
* @x: holds the state of this particular completion
* @timeout: timeout value in jiffies
*
@@ -173,14 +179,14 @@ void __sched wait_for_completion_io(struct completion *x)
* till timeout) if completed.
*/
unsigned long __sched
-wait_for_completion_io_timeout(struct completion *x, unsigned long timeout)
+__wait_for_completion_io_timeout(struct completion *x, unsigned long timeout)
{
return wait_for_common_io(x, timeout, TASK_UNINTERRUPTIBLE);
}
-EXPORT_SYMBOL(wait_for_completion_io_timeout);
+EXPORT_SYMBOL(__wait_for_completion_io_timeout);

/**
- * wait_for_completion_interruptible: - waits for completion of a task (w/intr)
+ * __wait_for_completion_interruptible: - waits for completion of a task (w/intr)
* @x: holds the state of this particular completion
*
* This waits for completion of a specific task to be signaled. It is
@@ -188,17 +194,18 @@ void __sched wait_for_completion_io(struct completion *x)
*
* Return: -ERESTARTSYS if interrupted, 0 if completed.
*/
-int __sched wait_for_completion_interruptible(struct completion *x)
+int __sched __wait_for_completion_interruptible(struct completion *x)
{
long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE);
+
if (t == -ERESTARTSYS)
return t;
return 0;
}
-EXPORT_SYMBOL(wait_for_completion_interruptible);
+EXPORT_SYMBOL(__wait_for_completion_interruptible);

/**
- * wait_for_completion_interruptible_timeout: - waits for completion (w/(to,intr))
+ * __wait_for_completion_interruptible_timeout: - waits for completion (w/(to,intr))
* @x: holds the state of this particular completion
* @timeout: timeout value in jiffies
*
@@ -209,15 +216,15 @@ int __sched wait_for_completion_interruptible(struct completion *x)
* or number of jiffies left till timeout) if completed.
*/
long __sched
-wait_for_completion_interruptible_timeout(struct completion *x,
+__wait_for_completion_interruptible_timeout(struct completion *x,
unsigned long timeout)
{
return wait_for_common(x, timeout, TASK_INTERRUPTIBLE);
}
-EXPORT_SYMBOL(wait_for_completion_interruptible_timeout);
+EXPORT_SYMBOL(__wait_for_completion_interruptible_timeout);

/**
- * wait_for_completion_killable: - waits for completion of a task (killable)
+ * __wait_for_completion_killable: - waits for completion of a task (killable)
* @x: holds the state of this particular completion
*
* This waits to be signaled for completion of a specific task. It can be
@@ -225,17 +232,18 @@ int __sched wait_for_completion_interruptible(struct completion *x)
*
* Return: -ERESTARTSYS if interrupted, 0 if completed.
*/
-int __sched wait_for_completion_killable(struct completion *x)
+int __sched __wait_for_completion_killable(struct completion *x)
{
long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_KILLABLE);
+
if (t == -ERESTARTSYS)
return t;
return 0;
}
-EXPORT_SYMBOL(wait_for_completion_killable);
+EXPORT_SYMBOL(__wait_for_completion_killable);

/**
- * wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
+ * __wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
* @x: holds the state of this particular completion
* @timeout: timeout value in jiffies
*
@@ -247,12 +255,12 @@ int __sched wait_for_completion_killable(struct completion *x)
* or number of jiffies left till timeout) if completed.
*/
long __sched
-wait_for_completion_killable_timeout(struct completion *x,
+__wait_for_completion_killable_timeout(struct completion *x,
unsigned long timeout)
{
return wait_for_common(x, timeout, TASK_KILLABLE);
}
-EXPORT_SYMBOL(wait_for_completion_killable_timeout);
+EXPORT_SYMBOL(__wait_for_completion_killable_timeout);

/**
* try_wait_for_completion - try to decrement a completion without blocking
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 037e813..4ba8adc 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1085,6 +1085,14 @@ config LOCKDEP_CROSSRELEASE
such as page locks or completions can use the lock correctness
detector, lockdep.

+config LOCKDEP_COMPLETE
+ bool "Lock debugging: allow completions to use deadlock detector"
+ select LOCKDEP_CROSSRELEASE
+ default n
+ help
+ A deadlock caused by wait_for_completion() and complete() can be
+ detected by lockdep using crossrelease feature.
+
config PROVE_LOCKING
bool "Lock debugging: prove locking correctness"
depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
--
1.9.1

2017-08-07 07:15:21

by Byungchul Park

Subject: [PATCH v8 12/14] lockdep: Apply lock_acquire(release) on __Set(__Clear)PageLocked

Usually the PG_locked bit is updated by lock_page() or unlock_page().
However, it can also be updated through __SetPageLocked() or
__ClearPageLocked(). These have to be considered as well, so that
acquires and releases get properly paired.

Furthermore, e.g. __SetPageLocked() in add_to_page_cache_lru() is
called frequently. We might miss many chances to check for deadlock if
we ignore it. Make __Set(__Clear)PageLocked considered as well.

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/page-flags.h | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index d33e328..b793342 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -261,7 +261,6 @@ static __always_inline int PageCompound(struct page *page)
#define TESTSCFLAG_FALSE(uname) \
TESTSETFLAG_FALSE(uname) TESTCLEARFLAG_FALSE(uname)

-__PAGEFLAG(Locked, locked, PF_NO_TAIL)
PAGEFLAG(Waiters, waiters, PF_ONLY_HEAD) __CLEARPAGEFLAG(Waiters, waiters, PF_ONLY_HEAD)
PAGEFLAG(Error, error, PF_NO_COMPOUND) TESTCLEARFLAG(Error, error, PF_NO_COMPOUND)
PAGEFLAG(Referenced, referenced, PF_HEAD)
@@ -373,6 +372,35 @@ static __always_inline int PageSwapCache(struct page *page)
PAGEFLAG(Idle, idle, PF_ANY)
#endif

+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include <linux/lockdep.h>
+
+TESTPAGEFLAG(Locked, locked, PF_NO_TAIL)
+
+static __always_inline void __SetPageLocked(struct page *page)
+{
+ __set_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);
+
+ page = compound_head(page);
+ lock_acquire_exclusive((struct lockdep_map *)&page->map, 0, 1, NULL, _RET_IP_);
+}
+
+static __always_inline void __ClearPageLocked(struct page *page)
+{
+ __clear_bit(PG_locked, &PF_NO_TAIL(page, 1)->flags);
+
+ page = compound_head(page);
+ /*
+ * lock_commit_crosslock() is necessary for crosslock
+ * when the lock is released, before lock_release().
+ */
+ lock_commit_crosslock((struct lockdep_map *)&page->map);
+ lock_release((struct lockdep_map *)&page->map, 0, _RET_IP_);
+}
+#else
+__PAGEFLAG(Locked, locked, PF_NO_TAIL)
+#endif
+
/*
* On an anonymous page mapped into a user virtual memory area,
* page->mapping points to its anon_vma, not to a struct address_space;
--
1.9.1

2017-08-07 07:15:18

by Byungchul Park

Subject: [PATCH v8 05/14] lockdep: Implement crossrelease feature

Lockdep is a runtime locking correctness validator that detects and
reports a deadlock or its possibility by checking dependencies between
locks. It's useful because it reports not just actual deadlocks but
also possible deadlocks that have not happened yet, which enables
problems to be fixed before they affect real systems.

However, this facility is only applicable to typical locks, such as
spinlocks and mutexes, which are normally released within the context
in which they were acquired. Synchronization primitives like page
locks or completions, which may be released in any context, also
create dependencies and can cause deadlocks. So lockdep should track
these locks too, to do a better job. The 'crossrelease' implementation
makes these primitives trackable as well.

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/irqflags.h | 24 ++-
include/linux/lockdep.h | 110 +++++++++-
include/linux/sched.h | 8 +
kernel/exit.c | 1 +
kernel/fork.c | 4 +
kernel/locking/lockdep.c | 508 ++++++++++++++++++++++++++++++++++++++++++++---
kernel/workqueue.c | 2 +
lib/Kconfig.debug | 12 ++
8 files changed, 635 insertions(+), 34 deletions(-)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 5dd1272..e9ed580 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -23,10 +23,26 @@
# define trace_softirq_context(p) ((p)->softirq_context)
# define trace_hardirqs_enabled(p) ((p)->hardirqs_enabled)
# define trace_softirqs_enabled(p) ((p)->softirqs_enabled)
-# define trace_hardirq_enter() do { current->hardirq_context++; } while (0)
-# define trace_hardirq_exit() do { current->hardirq_context--; } while (0)
-# define lockdep_softirq_enter() do { current->softirq_context++; } while (0)
-# define lockdep_softirq_exit() do { current->softirq_context--; } while (0)
+# define trace_hardirq_enter() \
+do { \
+ current->hardirq_context++; \
+ crossrelease_hist_start(HARD); \
+} while (0)
+# define trace_hardirq_exit() \
+do { \
+ current->hardirq_context--; \
+ crossrelease_hist_end(HARD); \
+} while (0)
+# define lockdep_softirq_enter() \
+do { \
+ current->softirq_context++; \
+ crossrelease_hist_start(SOFT); \
+} while (0)
+# define lockdep_softirq_exit() \
+do { \
+ current->softirq_context--; \
+ crossrelease_hist_end(SOFT); \
+} while (0)
# define INIT_TRACE_IRQFLAGS .softirqs_enabled = 1,
#else
# define trace_hardirqs_on() do { } while (0)
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index fffe49f..0c8a1b8 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -155,6 +155,12 @@ struct lockdep_map {
int cpu;
unsigned long ip;
#endif
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+ /*
+ * Whether it's a crosslock.
+ */
+ int cross;
+#endif
};

static inline void lockdep_copy_map(struct lockdep_map *to,
@@ -258,8 +264,62 @@ struct held_lock {
unsigned int hardirqs_off:1;
unsigned int references:12; /* 32 bits */
unsigned int pin_count;
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+ /*
+ * Generation id.
+ *
+ * A value of cross_gen_id will be stored when holding this,
+ * which is globally increased whenever each crosslock is held.
+ */
+ unsigned int gen_id;
+#endif
+};
+
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+#define MAX_XHLOCK_TRACE_ENTRIES 5
+
+/*
+ * This is for keeping locks waiting for commit so that true dependencies
+ * can be added at commit step.
+ */
+struct hist_lock {
+ /*
+ * Separate stack_trace data. This will be used at the commit step.
+ */
+ struct stack_trace trace;
+ unsigned long trace_entries[MAX_XHLOCK_TRACE_ENTRIES];
+
+ /*
+ * Separate hlock instance. This will be used at the commit step.
+ *
+ * TODO: Use a smaller data structure containing only necessary
+ * data. However, we should make lockdep code able to handle the
+ * smaller one first.
+ */
+ struct held_lock hlock;
+};
+
+/*
+ * To initialize a lock as crosslock, lockdep_init_map_crosslock() should
+ * be called instead of lockdep_init_map().
+ */
+struct cross_lock {
+ /*
+ * Separate hlock instance. This will be used at the commit step.
+ *
+ * TODO: Use a smaller data structure containing only necessary
+ * data. However, we should make lockdep code able to handle the
+ * smaller one first.
+ */
+ struct held_lock hlock;
};

+struct lockdep_map_cross {
+ struct lockdep_map map;
+ struct cross_lock xlock;
+};
+#endif
+
/*
* Initialization, self-test and debugging-output methods:
*/
@@ -282,13 +342,6 @@ extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
struct lock_class_key *key, int subclass);

/*
- * To initialize a lockdep_map statically use this macro.
- * Note that _name must not be NULL.
- */
-#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
- { .name = (_name), .key = (void *)(_key), }
-
-/*
* Reinitialize a lock key - for cases where there is special locking or
* special initialization of locks so that the validator gets the scope
* of dependencies wrong: they are either too broad (they need a class-split)
@@ -467,6 +520,49 @@ static inline void lockdep_on(void)

#endif /* !LOCKDEP */

+enum context_t {
+ HARD,
+ SOFT,
+ PROC,
+ CONTEXT_NR,
+};
+
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+extern void lockdep_init_map_crosslock(struct lockdep_map *lock,
+ const char *name,
+ struct lock_class_key *key,
+ int subclass);
+extern void lock_commit_crosslock(struct lockdep_map *lock);
+
+#define STATIC_CROSS_LOCKDEP_MAP_INIT(_name, _key) \
+ { .map.name = (_name), .map.key = (void *)(_key), \
+ .map.cross = 1, }
+
+/*
+ * To initialize a lockdep_map statically use this macro.
+ * Note that _name must not be NULL.
+ */
+#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
+ { .name = (_name), .key = (void *)(_key), .cross = 0, }
+
+extern void crossrelease_hist_start(enum context_t c);
+extern void crossrelease_hist_end(enum context_t c);
+extern void lockdep_init_task(struct task_struct *task);
+extern void lockdep_free_task(struct task_struct *task);
+#else
+/*
+ * To initialize a lockdep_map statically use this macro.
+ * Note that _name must not be NULL.
+ */
+#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
+ { .name = (_name), .key = (void *)(_key), }
+
+static inline void crossrelease_hist_start(enum context_t c) {}
+static inline void crossrelease_hist_end(enum context_t c) {}
+static inline void lockdep_init_task(struct task_struct *task) {}
+static inline void lockdep_free_task(struct task_struct *task) {}
+#endif
+
#ifdef CONFIG_LOCK_STAT

extern void lock_contended(struct lockdep_map *lock, unsigned long ip);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8337e2d..5becef5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -849,6 +849,14 @@ struct task_struct {
gfp_t lockdep_reclaim_gfp;
#endif

+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+#define MAX_XHLOCKS_NR 64UL
+ struct hist_lock *xhlocks; /* Crossrelease history locks */
+ unsigned int xhlock_idx;
+ /* For restoring at history boundaries */
+ unsigned int xhlock_idx_hist[CONTEXT_NR];
+#endif
+
#ifdef CONFIG_UBSAN
unsigned int in_ubsan;
#endif
diff --git a/kernel/exit.c b/kernel/exit.c
index c5548fa..fa72d57 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -920,6 +920,7 @@ void __noreturn do_exit(long code)
exit_rcu();
TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i));

+ lockdep_free_task(tsk);
do_task_dead();
}
EXPORT_SYMBOL_GPL(do_exit);
diff --git a/kernel/fork.c b/kernel/fork.c
index 17921b0..cbf2221 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -484,6 +484,8 @@ void __init fork_init(void)
cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "fork:vm_stack_cache",
NULL, free_vm_stack_cache);
#endif
+
+ lockdep_init_task(&init_task);
}

int __weak arch_dup_task_struct(struct task_struct *dst,
@@ -1691,6 +1693,7 @@ static __latent_entropy struct task_struct *copy_process(
p->lockdep_depth = 0; /* no locks held yet */
p->curr_chain_key = 0;
p->lockdep_recursion = 0;
+ lockdep_init_task(p);
#endif

#ifdef CONFIG_DEBUG_MUTEXES
@@ -1949,6 +1952,7 @@ static __latent_entropy struct task_struct *copy_process(
bad_fork_cleanup_perf:
perf_event_free_task(p);
bad_fork_cleanup_policy:
+ lockdep_free_task(p);
#ifdef CONFIG_NUMA
mpol_put(p->mempolicy);
bad_fork_cleanup_threadgroup_lock:
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 22a13f9..afd6e64 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -58,6 +58,10 @@
#define CREATE_TRACE_POINTS
#include <trace/events/lock.h>

+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+#include <linux/slab.h>
+#endif
+
#ifdef CONFIG_PROVE_LOCKING
int prove_locking = 1;
module_param(prove_locking, int, 0644);
@@ -726,6 +730,18 @@ static int count_matching_names(struct lock_class *new_class)
return is_static || static_obj(lock->key) ? NULL : ERR_PTR(-EINVAL);
}

+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+static void cross_init(struct lockdep_map *lock, int cross);
+static int cross_lock(struct lockdep_map *lock);
+static int lock_acquire_crosslock(struct held_lock *hlock);
+static int lock_release_crosslock(struct lockdep_map *lock);
+#else
+static inline void cross_init(struct lockdep_map *lock, int cross) {}
+static inline int cross_lock(struct lockdep_map *lock) { return 0; }
+static inline int lock_acquire_crosslock(struct held_lock *hlock) { return 2; }
+static inline int lock_release_crosslock(struct lockdep_map *lock) { return 2; }
+#endif
+
/*
* Register a lock's class in the hash-table, if the class is not present
* yet. Otherwise we look it up. We cache the result in the lock object
@@ -1784,6 +1800,9 @@ static inline void inc_chains(void)
if (nest)
return 2;

+ if (cross_lock(prev->instance))
+ continue;
+
return print_deadlock_bug(curr, prev, next);
}
return 1;
@@ -1937,30 +1956,36 @@ static inline void inc_chains(void)
int distance = curr->lockdep_depth - depth + 1;
hlock = curr->held_locks + depth - 1;
/*
- * Only non-recursive-read entries get new dependencies
- * added:
+ * Only non-crosslock entries get new dependencies added.
+ * Crosslock entries will be added by commit later:
*/
- if (hlock->read != 2 && hlock->check) {
- int ret = check_prev_add(curr, hlock, next,
- distance, &trace, save);
- if (!ret)
- return 0;
-
+ if (!cross_lock(hlock->instance)) {
/*
- * Stop saving stack_trace if save_trace() was
- * called at least once:
+ * Only non-recursive-read entries get new dependencies
+ * added:
*/
- if (save && ret == 2)
- save = NULL;
+ if (hlock->read != 2 && hlock->check) {
+ int ret = check_prev_add(curr, hlock, next,
+ distance, &trace, save);
+ if (!ret)
+ return 0;

- /*
- * Stop after the first non-trylock entry,
- * as non-trylock entries have added their
- * own direct dependencies already, so this
- * lock is connected to them indirectly:
- */
- if (!hlock->trylock)
- break;
+ /*
+ * Stop saving stack_trace if save_trace() was
+ * called at least once:
+ */
+ if (save && ret == 2)
+ save = NULL;
+
+ /*
+ * Stop after the first non-trylock entry,
+ * as non-trylock entries have added their
+ * own direct dependencies already, so this
+ * lock is connected to them indirectly:
+ */
+ if (!hlock->trylock)
+ break;
+ }
}
depth--;
/*
@@ -3225,7 +3250,7 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
/*
* Initialize a lock instance's lock-class mapping info:
*/
-void lockdep_init_map(struct lockdep_map *lock, const char *name,
+static void __lockdep_init_map(struct lockdep_map *lock, const char *name,
struct lock_class_key *key, int subclass)
{
int i;
@@ -3283,8 +3308,25 @@ void lockdep_init_map(struct lockdep_map *lock, const char *name,
raw_local_irq_restore(flags);
}
}
+
+void lockdep_init_map(struct lockdep_map *lock, const char *name,
+ struct lock_class_key *key, int subclass)
+{
+ cross_init(lock, 0);
+ __lockdep_init_map(lock, name, key, subclass);
+}
EXPORT_SYMBOL_GPL(lockdep_init_map);

+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+void lockdep_init_map_crosslock(struct lockdep_map *lock, const char *name,
+ struct lock_class_key *key, int subclass)
+{
+ cross_init(lock, 1);
+ __lockdep_init_map(lock, name, key, subclass);
+}
+EXPORT_SYMBOL_GPL(lockdep_init_map_crosslock);
+#endif
+
struct lock_class_key __lockdep_no_validate__;
EXPORT_SYMBOL_GPL(__lockdep_no_validate__);

@@ -3340,6 +3382,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
int chain_head = 0;
int class_idx;
u64 chain_key;
+ int ret;

if (unlikely(!debug_locks))
return 0;
@@ -3388,7 +3431,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,

class_idx = class - lock_classes + 1;

- if (depth) {
+ /* TODO: nest_lock is not implemented for crosslock yet. */
+ if (depth && !cross_lock(lock)) {
hlock = curr->held_locks + depth - 1;
if (hlock->class_idx == class_idx && nest_lock) {
if (hlock->references) {
@@ -3476,6 +3520,14 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
if (!validate_chain(curr, lock, hlock, chain_head, chain_key))
return 0;

+ ret = lock_acquire_crosslock(hlock);
+ /*
+ * 2 means normal acquire operations are needed. Otherwise, it's
+ * ok just to return with '0:fail, 1:success'.
+ */
+ if (ret != 2)
+ return ret;
+
curr->curr_chain_key = chain_key;
curr->lockdep_depth++;
check_chain_key(curr);
@@ -3713,11 +3765,19 @@ static int __lock_downgrade(struct lockdep_map *lock, unsigned long ip)
struct task_struct *curr = current;
struct held_lock *hlock;
unsigned int depth;
- int i;
+ int ret, i;

if (unlikely(!debug_locks))
return 0;

+ ret = lock_release_crosslock(lock);
+ /*
+ * 2 means normal release operations are needed. Otherwise, it's
+ * ok just to return with '0:fail, 1:success'.
+ */
+ if (ret != 2)
+ return ret;
+
depth = curr->lockdep_depth;
/*
* So we're all set to release this lock.. wait what lock? We don't
@@ -4593,6 +4653,13 @@ asmlinkage __visible void lockdep_sys_exit(void)
curr->comm, curr->pid);
lockdep_print_held_locks(curr);
}
+
+ /*
+ * The lock history for each syscall should be independent. So wipe the
+ * slate clean on return to userspace.
+ */
+ crossrelease_hist_end(PROC);
+ crossrelease_hist_start(PROC);
}

void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
@@ -4641,3 +4708,398 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
dump_stack();
}
EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious);
+
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+
+/*
+ * Crossrelease works by recording a lock history for each thread and
+ * connecting those historic locks that were taken after the
+ * wait_for_completion() in the complete() context.
+ *
+ * Task-A Task-B
+ *
+ * mutex_lock(&A);
+ * mutex_unlock(&A);
+ *
+ * wait_for_completion(&C);
+ * lock_acquire_crosslock();
+ * atomic_inc_return(&cross_gen_id);
+ * |
+ * | mutex_lock(&B);
+ * | mutex_unlock(&B);
+ * |
+ * | complete(&C);
+ * `-- lock_commit_crosslock();
+ *
+ * Which will then add a dependency between B and C.
+ */
+
+#define xhlock(i) (current->xhlocks[(i) % MAX_XHLOCKS_NR])
+
+/*
+ * Whenever a crosslock is held, cross_gen_id will be increased.
+ */
+static atomic_t cross_gen_id; /* Can be wrapped */
+
+/*
+ * Lock history stacks; we have 3 nested lock history stacks:
+ *
+ * Hard IRQ
+ * Soft IRQ
+ * History / Task
+ *
+ * The thing is that once we complete a (Hard/Soft) IRQ the future task locks
+ * should not depend on any of the locks observed while running the IRQ.
+ *
+ * So what we do is rewind the history buffer and erase all our knowledge of
+ * that temporal event.
+ */
+
+/*
+ * We need this to annotate lock history boundaries. Take for instance
+ * workqueues; each work is independent of the last. The completion of a future
+ * work does not depend on the completion of a past work (in general).
+ * Therefore we must not carry that (lock) dependency across works.
+ *
+ * This is true for many things; pretty much all kthreads fall into this
+ * pattern, where they have an 'idle' state and future completions do not
+ * depend on past completions. It's just that since they all have the 'same'
+ * form -- the kthread does the same over and over -- it doesn't typically
+ * matter.
+ *
+ * The same is true for system-calls, once a system call is completed (we've
+ * returned to userspace) the next system call does not depend on the lock
+ * history of the previous system call.
+ */
+void crossrelease_hist_start(enum context_t c)
+{
+ if (current->xhlocks)
+ current->xhlock_idx_hist[c] = current->xhlock_idx;
+}
+
+void crossrelease_hist_end(enum context_t c)
+{
+ if (current->xhlocks)
+ current->xhlock_idx = current->xhlock_idx_hist[c];
+}
+
+static int cross_lock(struct lockdep_map *lock)
+{
+ return lock ? lock->cross : 0;
+}
+
+/*
+ * This is needed to decide the ordering between wrappable variables.
+ */
+static inline int before(unsigned int a, unsigned int b)
+{
+ return (int)(a - b) < 0;
+}
+
+static inline struct lock_class *xhlock_class(struct hist_lock *xhlock)
+{
+ return hlock_class(&xhlock->hlock);
+}
+
+static inline struct lock_class *xlock_class(struct cross_lock *xlock)
+{
+ return hlock_class(&xlock->hlock);
+}
+
+/*
+ * Should we check a dependency with previous one?
+ */
+static inline int depend_before(struct held_lock *hlock)
+{
+ return hlock->read != 2 && hlock->check && !hlock->trylock;
+}
+
+/*
+ * Should we check a dependency with next one?
+ */
+static inline int depend_after(struct held_lock *hlock)
+{
+ return hlock->read != 2 && hlock->check;
+}
+
+/*
+ * Check if the xhlock is valid, which would be false if,
+ *
+ * 1. It has not been used since initialization.
+ *
+ * Remember that hist_lock is implemented as a ring buffer.
+ */
+static inline int xhlock_valid(struct hist_lock *xhlock)
+{
+ /*
+ * xhlock->hlock.instance must be !NULL.
+ */
+ return !!xhlock->hlock.instance;
+}
+
+/*
+ * Record a hist_lock entry.
+ *
+ * Irq disable is only required.
+ */
+static void add_xhlock(struct held_lock *hlock)
+{
+ unsigned int idx = ++current->xhlock_idx;
+ struct hist_lock *xhlock = &xhlock(idx);
+
+#ifdef CONFIG_DEBUG_LOCKDEP
+ /*
+ * This can be done locklessly because they are all task-local
+ * state, we must however ensure IRQs are disabled.
+ */
+ WARN_ON_ONCE(!irqs_disabled());
+#endif
+
+ /* Initialize hist_lock's members */
+ xhlock->hlock = *hlock;
+
+ xhlock->trace.nr_entries = 0;
+ xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
+ xhlock->trace.entries = xhlock->trace_entries;
+ xhlock->trace.skip = 3;
+ save_stack_trace(&xhlock->trace);
+}
+
+static inline int same_context_xhlock(struct hist_lock *xhlock)
+{
+ return xhlock->hlock.irq_context == task_irq_context(current);
+}
+
+/*
+ * This should be as lockless as possible because it is
+ * called very frequently.
+ */
+static void check_add_xhlock(struct held_lock *hlock)
+{
+ /*
+ * Record a hist_lock, only in case that acquisitions ahead
+ * could depend on the held_lock. For example, if the held_lock
+ * is a trylock, then acquisitions ahead never depend on it.
+ * In that case, we don't need to record it. Just return.
+ */
+ if (!current->xhlocks || !depend_before(hlock))
+ return;
+
+ add_xhlock(hlock);
+}
+
+/*
+ * For crosslock.
+ */
+static int add_xlock(struct held_lock *hlock)
+{
+ struct cross_lock *xlock;
+ unsigned int gen_id;
+
+ if (!graph_lock())
+ return 0;
+
+ xlock = &((struct lockdep_map_cross *)hlock->instance)->xlock;
+
+ gen_id = (unsigned int)atomic_inc_return(&cross_gen_id);
+ xlock->hlock = *hlock;
+ xlock->hlock.gen_id = gen_id;
+ graph_unlock();
+
+ return 1;
+}
+
+/*
+ * Called for both normal and crosslock acquires. Normal locks will be
+ * pushed on the hist_lock queue. Cross locks will record state and
+ * stop regular lock_acquire() to avoid being placed on the held_lock
+ * stack.
+ *
+ * Return: 0 - failure;
+ * 1 - crosslock, done;
+ * 2 - normal lock, continue to held_lock[] ops.
+ */
+static int lock_acquire_crosslock(struct held_lock *hlock)
+{
+ /*
+ * CONTEXT 1 CONTEXT 2
+ * --------- ---------
+ * lock A (cross)
+ * X = atomic_inc_return(&cross_gen_id)
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * Y = atomic_read_acquire(&cross_gen_id)
+ * lock B
+ *
+ * atomic_read_acquire() is for ordering between A and B,
+ * IOW, A happens before B when CONTEXT 2 sees Y >= X.
+ *
+ * Pairs with atomic_inc_return() in add_xlock().
+ */
+ hlock->gen_id = (unsigned int)atomic_read_acquire(&cross_gen_id);
+
+ if (cross_lock(hlock->instance))
+ return add_xlock(hlock);
+
+ check_add_xhlock(hlock);
+ return 2;
+}
+
+static int copy_trace(struct stack_trace *trace)
+{
+ unsigned long *buf = stack_trace + nr_stack_trace_entries;
+ unsigned int max_nr = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
+ unsigned int nr = min(max_nr, trace->nr_entries);
+
+ trace->nr_entries = nr;
+ memcpy(buf, trace->entries, nr * sizeof(trace->entries[0]));
+ trace->entries = buf;
+ nr_stack_trace_entries += nr;
+
+ if (nr_stack_trace_entries >= MAX_STACK_TRACE_ENTRIES-1) {
+ if (!debug_locks_off_graph_unlock())
+ return 0;
+
+ print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!");
+ dump_stack();
+
+ return 0;
+ }
+
+ return 1;
+}
+
+static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
+{
+ unsigned int xid, pid;
+ u64 chain_key;
+
+ xid = xlock_class(xlock) - lock_classes;
+ chain_key = iterate_chain_key((u64)0, xid);
+ pid = xhlock_class(xhlock) - lock_classes;
+ chain_key = iterate_chain_key(chain_key, pid);
+
+ if (lookup_chain_cache(chain_key))
+ return 1;
+
+ if (!add_chain_cache_classes(xid, pid, xhlock->hlock.irq_context,
+ chain_key))
+ return 0;
+
+ if (!check_prev_add(current, &xlock->hlock, &xhlock->hlock, 1,
+ &xhlock->trace, copy_trace))
+ return 0;
+
+ return 1;
+}
+
+static void commit_xhlocks(struct cross_lock *xlock)
+{
+ unsigned int cur = current->xhlock_idx;
+ unsigned int i;
+
+ if (!graph_lock())
+ return;
+
+ for (i = 0; i < MAX_XHLOCKS_NR; i++) {
+ struct hist_lock *xhlock = &xhlock(cur - i);
+
+ if (!xhlock_valid(xhlock))
+ break;
+
+ if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
+ break;
+
+ if (!same_context_xhlock(xhlock))
+ break;
+
+ /*
+ * commit_xhlock() returns 0 with graph_lock already
+ * released if fail.
+ */
+ if (!commit_xhlock(xlock, xhlock))
+ return;
+ }
+
+ graph_unlock();
+}
+
+void lock_commit_crosslock(struct lockdep_map *lock)
+{
+ struct cross_lock *xlock;
+ unsigned long flags;
+
+ if (unlikely(!debug_locks || current->lockdep_recursion))
+ return;
+
+ if (!current->xhlocks)
+ return;
+
+ /*
+ * Commit hist_locks with the cross_lock, but only in case that
+ * the cross_lock could depend on acquisitions after it.
+ *
+ * For example, if the cross_lock does not have the 'check' flag
+ * then we don't need to check dependencies and commit for that.
+ * Just skip it. In that case, of course, the cross_lock does
+ * not depend on acquisitions ahead, either.
+ *
+ * WARNING: Don't do that in add_xlock() in advance. When an
+ * acquisition context is different from the commit context,
+ * invalid(skipped) cross_lock might be accessed.
+ */
+ if (!depend_after(&((struct lockdep_map_cross *)lock)->xlock.hlock))
+ return;
+
+ raw_local_irq_save(flags);
+ check_flags(flags);
+ current->lockdep_recursion = 1;
+ xlock = &((struct lockdep_map_cross *)lock)->xlock;
+ commit_xhlocks(xlock);
+ current->lockdep_recursion = 0;
+ raw_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(lock_commit_crosslock);
+
+/*
+ * Return: 1 - crosslock, done;
+ * 2 - normal lock, continue to held_lock[] ops.
+ */
+static int lock_release_crosslock(struct lockdep_map *lock)
+{
+ return cross_lock(lock) ? 1 : 2;
+}
+
+static void cross_init(struct lockdep_map *lock, int cross)
+{
+ lock->cross = cross;
+
+ /*
+ * Crossrelease assumes that the ring buffer size of xhlocks
+ * is aligned with power of 2. So force it on build.
+ */
+ BUILD_BUG_ON(MAX_XHLOCKS_NR & (MAX_XHLOCKS_NR - 1));
+}
+
+void lockdep_init_task(struct task_struct *task)
+{
+ int i;
+
+ task->xhlock_idx = UINT_MAX;
+
+ for (i = 0; i < CONTEXT_NR; i++)
+ task->xhlock_idx_hist[i] = UINT_MAX;
+
+ task->xhlocks = kzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR,
+ GFP_KERNEL);
+}
+
+void lockdep_free_task(struct task_struct *task)
+{
+ if (task->xhlocks) {
+ void *tmp = task->xhlocks;
/* Disable crossrelease for current */
+ task->xhlocks = NULL;
+ kfree(tmp);
+ }
+}
+#endif
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a86688f..eb03c4f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2093,6 +2093,7 @@ static void process_one_work(struct worker *worker, struct work_struct *work)

lock_map_acquire_read(&pwq->wq->lockdep_map);
lock_map_acquire(&lockdep_map);
+ crossrelease_hist_start(PROC);
trace_workqueue_execute_start(work);
worker->current_func(work);
/*
@@ -2100,6 +2101,7 @@ static void process_one_work(struct worker *worker, struct work_struct *work)
* point will only record its address.
*/
trace_workqueue_execute_end(work);
+ crossrelease_hist_end(PROC);
lock_map_release(&lockdep_map);
lock_map_release(&pwq->wq->lockdep_map);

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 98fe715..037e813 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1073,6 +1073,18 @@ config DEBUG_LOCK_ALLOC
spin_lock_init()/mutex_init()/etc., or whether there is any lock
held during task exit.

+config LOCKDEP_CROSSRELEASE
+ bool "Lock debugging: make lockdep work for crosslocks"
+ depends on PROVE_LOCKING
+ default n
+ help
+ This makes lockdep work for crosslocks, i.e. locks that are
+ allowed to be released in a different context from the one that
+ acquired them. Normally a lock must be released in the context
+ that acquired it. However, relaxing this constraint lets
+ synchronization primitives such as page locks or completions use
+ the lock correctness detector, lockdep.
+
config PROVE_LOCKING
bool "Lock debugging: prove locking correctness"
depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
--
1.9.1

2017-08-07 07:16:12

by Byungchul Park

Subject: [PATCH v8 03/14] lockdep: Change the meaning of check_prev_add()'s return value

Firstly, return 1 instead of 2 when the 'prev -> next' dependency
already exists. Since the value 2 is not referenced anywhere, just
return 1 indicating success in this case.

Secondly, return 2 instead of 1 when a lock_list entry was successfully
added with its stack_trace saved. With that, a caller can decide
whether to avoid a redundant save_trace() at the call site.

Signed-off-by: Byungchul Park <[email protected]>
---
kernel/locking/lockdep.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 9d16723..b23e930 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1870,7 +1870,7 @@ static inline void inc_chains(void)
if (entry->class == hlock_class(next)) {
if (distance == 1)
entry->distance = 1;
- return 2;
+ return 1;
}
}

@@ -1910,9 +1910,10 @@ static inline void inc_chains(void)
print_lock_name(hlock_class(next));
printk(KERN_CONT "\n");
dump_stack();
- return graph_lock();
+ if (!graph_lock())
+ return 0;
}
- return 1;
+ return 2;
}

/*
--
1.9.1

2017-08-07 07:16:16

by Byungchul Park

Subject: [PATCH v8 07/14] lockdep: Handle non(or multi)-acquisition of a crosslock

No acquisition might be in progress at the time a crosslock is
committed. Completion operations using crossrelease can hit this case,
like:

CONTEXT X CONTEXT Y
--------- ---------
trigger completion context
complete AX
commit AX
wait_for_complete AX
acquire AX
wait

where AX is a crosslock.

When no acquisition is in progress, we should not perform the commit
because the lock might not exist any more, which could cause an
incorrect memory access. So we have to track the number of
acquisitions of a crosslock to handle it.

Moreover, consider the case where more than one acquisition of a
crosslock overlap, like:

CONTEXT W CONTEXT X CONTEXT Y CONTEXT Z
--------- --------- --------- ---------
acquire AX (gen_id: 1)
acquire A
acquire AX (gen_id: 10)
acquire B
commit AX
acquire C
commit AX

where A, B and C are typical locks and AX is a crosslock.

The current crossrelease code performs the commits in Y and Z with
gen_id = 10. However, we can use gen_id = 1 instead, since not only
'acquire AX in X' but also 'acquire AX in W' depends on each
acquisition in Y and Z until their commits. So make it use gen_id = 1
instead of 10 for those commits, which adds the additional dependency
'AX -> A' in the example above.

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/lockdep.h | 22 ++++++++++++-
kernel/locking/lockdep.c | 82 +++++++++++++++++++++++++++++++++---------------
2 files changed, 77 insertions(+), 27 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 48c244c..54916f7 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -325,6 +325,19 @@ struct hist_lock {
*/
struct cross_lock {
/*
+ * When more than one acquisition of crosslocks are overlapped,
+ * we have to perform commit for them based on cross_gen_id of
+ * the first acquisition, which allows us to add more true
+ * dependencies.
+ *
+ * Moreover, when no acquisition of a crosslock is in progress,
+ * we should not perform commit because the lock might not exist
+ * any more, which might cause incorrect memory access. So we
+ * have to track the number of acquisitions of a crosslock.
+ */
+ int nr_acquire;
+
+ /*
+ * Separate hlock instance. This will be used at the commit step.
*
* TODO: Use a smaller data structure containing only necessary
@@ -554,9 +567,16 @@ extern void lockdep_init_map_crosslock(struct lockdep_map *lock,
int subclass);
extern void lock_commit_crosslock(struct lockdep_map *lock);

+/*
+ * What we essentially have to initialize is 'nr_acquire'. Other members
+ * will be initialized in add_xlock().
+ */
+#define STATIC_CROSS_LOCK_INIT() \
+ { .nr_acquire = 0,}
+
#define STATIC_CROSS_LOCKDEP_MAP_INIT(_name, _key) \
{ .map.name = (_name), .map.key = (void *)(_key), \
- .map.cross = 1, }
+ .map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }

/*
* To initialize a lockdep_map statically use this macro.
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 5168dac..4eae7dc 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4928,11 +4928,28 @@ static int add_xlock(struct held_lock *hlock)

xlock = &((struct lockdep_map_cross *)hlock->instance)->xlock;

+ /*
+ * When acquisitions for a crosslock are overlapped, we use
+ * nr_acquire to perform commit for them, based on cross_gen_id
+ * of the first acquisition, which allows to add additional
+ * dependencies.
+ *
+ * Moreover, when no acquisition of a crosslock is in progress,
+ * we should not perform commit because the lock might not exist
+ * any more, which might cause incorrect memory access. So we
+ * have to track the number of acquisitions of a crosslock.
+ *
+ * depend_after() is necessary to initialize only the first
+ * valid xlock so that the xlock can be used on its commit.
+ */
+ if (xlock->nr_acquire++ && depend_after(&xlock->hlock))
+ goto unlock;
+
gen_id = (unsigned int)atomic_inc_return(&cross_gen_id);
xlock->hlock = *hlock;
xlock->hlock.gen_id = gen_id;
+unlock:
graph_unlock();
-
return 1;
}

@@ -5028,35 +5045,37 @@ static void commit_xhlocks(struct cross_lock *xlock)
if (!graph_lock())
return;

- for (i = 0; i < MAX_XHLOCKS_NR; i++) {
- struct hist_lock *xhlock = &xhlock(cur - i);
+ if (xlock->nr_acquire) {
+ for (i = 0; i < MAX_XHLOCKS_NR; i++) {
+ struct hist_lock *xhlock = &xhlock(cur - i);

- if (!xhlock_valid(xhlock))
- break;
+ if (!xhlock_valid(xhlock))
+ break;

- if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
- break;
+ if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
+ break;

- if (!same_context_xhlock(xhlock))
- break;
+ if (!same_context_xhlock(xhlock))
+ break;

- /*
- * Filter out the cases that the ring buffer was
- * overwritten and the previous entry has a bigger
- * hist_id than the following one, which is impossible
- * otherwise.
- */
- if (unlikely(before(xhlock->hist_id, prev_hist_id)))
- break;
+ /*
+ * Filter out the cases that the ring buffer was
+ * overwritten and the previous entry has a bigger
+ * hist_id than the following one, which is impossible
+ * otherwise.
+ */
+ if (unlikely(before(xhlock->hist_id, prev_hist_id)))
+ break;

- prev_hist_id = xhlock->hist_id;
+ prev_hist_id = xhlock->hist_id;

- /*
- * commit_xhlock() returns 0 with graph_lock already
- * released if fail.
- */
- if (!commit_xhlock(xlock, xhlock))
- return;
+ /*
+ * commit_xhlock() returns 0 with graph_lock already
+ * released if fail.
+ */
+ if (!commit_xhlock(xlock, xhlock))
+ return;
+ }
}

graph_unlock();
@@ -5100,16 +5119,27 @@ void lock_commit_crosslock(struct lockdep_map *lock)
EXPORT_SYMBOL_GPL(lock_commit_crosslock);

/*
- * Return: 1 - crosslock, done;
+ * Return: 0 - failure;
+ * 1 - crosslock, done;
* 2 - normal lock, continue to held_lock[] ops.
*/
static int lock_release_crosslock(struct lockdep_map *lock)
{
- return cross_lock(lock) ? 1 : 2;
+ if (cross_lock(lock)) {
+ if (!graph_lock())
+ return 0;
+ ((struct lockdep_map_cross *)lock)->xlock.nr_acquire--;
+ graph_unlock();
+ return 1;
+ }
+ return 2;
}

static void cross_init(struct lockdep_map *lock, int cross)
{
+ if (cross)
+ ((struct lockdep_map_cross *)lock)->xlock.nr_acquire = 0;
+
lock->cross = cross;

/*
--
1.9.1

2017-08-07 07:16:18

by Byungchul Park

[permalink] [raw]
Subject: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

The ring buffer can be overwritten by hardirq/softirq/work contexts.
Those cases must be considered on rollback or commit. For example,

|<------ hist_lock ring buffer size ----->|
ppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
wrapped > iiiiiiiiiiiiiiiiiiiiiii....................

where 'p' represents an acquisition in process context,
'i' represents an acquisition in irq context.

On irq exit, crossrelease tries to roll back idx to the original
position, but it should not, because the entry has already been
invalidated by the overwriting 'i' entries. Avoid rollback or commit
for overwritten entries.

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/lockdep.h | 20 +++++++++++++++++++
include/linux/sched.h | 3 +++
kernel/locking/lockdep.c | 52 +++++++++++++++++++++++++++++++++++++++++++-----
3 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 0c8a1b8..48c244c 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -284,6 +284,26 @@ struct held_lock {
*/
struct hist_lock {
/*
+ * Id for each entry in the ring buffer. This is used to
+ * decide whether the ring buffer was overwritten or not.
+ *
+ * For example,
+ *
+ * |<----------- hist_lock ring buffer size ------->|
+ * pppppppppppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiii
+ * wrapped > iiiiiiiiiiiiiiiiiiiiiiiiiii.......................
+ *
+ * where 'p' represents an acquisition in process
+ * context, 'i' represents an acquisition in irq
+ * context.
+ *
+ * In this example, the ring buffer was overwritten by
+ * acquisitions in irq context, that should be detected on
+ * rollback or commit.
+ */
+ unsigned int hist_id;
+
+ /*
* Separate stack_trace data. This will be used at commit step.
*/
struct stack_trace trace;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5becef5..373466b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -855,6 +855,9 @@ struct task_struct {
unsigned int xhlock_idx;
/* For restoring at history boundaries */
unsigned int xhlock_idx_hist[CONTEXT_NR];
+ unsigned int hist_id;
+ /* For overwrite check at each context exit */
+ unsigned int hist_id_save[CONTEXT_NR];
#endif

#ifdef CONFIG_UBSAN
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index afd6e64..5168dac 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4742,6 +4742,17 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
static atomic_t cross_gen_id; /* Can be wrapped */

/*
+ * Make an entry of the ring buffer invalid.
+ */
+static inline void invalidate_xhlock(struct hist_lock *xhlock)
+{
+ /*
+ * Normally, xhlock->hlock.instance must be !NULL.
+ */
+ xhlock->hlock.instance = NULL;
+}
+
+/*
* Lock history stacks; we have 3 nested lock history stacks:
*
* Hard IRQ
@@ -4773,14 +4784,28 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
*/
void crossrelease_hist_start(enum context_t c)
{
- if (current->xhlocks)
- current->xhlock_idx_hist[c] = current->xhlock_idx;
+ struct task_struct *cur = current;
+
+ if (cur->xhlocks) {
+ cur->xhlock_idx_hist[c] = cur->xhlock_idx;
+ cur->hist_id_save[c] = cur->hist_id;
+ }
}

void crossrelease_hist_end(enum context_t c)
{
- if (current->xhlocks)
- current->xhlock_idx = current->xhlock_idx_hist[c];
+ struct task_struct *cur = current;
+
+ if (cur->xhlocks) {
+ unsigned int idx = cur->xhlock_idx_hist[c];
+ struct hist_lock *h = &xhlock(idx);
+
+ cur->xhlock_idx = idx;
+
+ /* Check if the ring was overwritten. */
+ if (h->hist_id != cur->hist_id_save[c])
+ invalidate_xhlock(h);
+ }
}

static int cross_lock(struct lockdep_map *lock)
@@ -4826,6 +4851,7 @@ static inline int depend_after(struct held_lock *hlock)
* Check if the xhlock is valid, which would be false if,
*
* 1. Has not been used after initialization yet.
+ * 2. Got invalidated.
*
* Remember that hist_lock is implemented as a ring buffer.
*/
@@ -4857,6 +4883,7 @@ static void add_xhlock(struct held_lock *hlock)

/* Initialize hist_lock's members */
xhlock->hlock = *hlock;
+ xhlock->hist_id = current->hist_id++;

xhlock->trace.nr_entries = 0;
xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
@@ -4995,6 +5022,7 @@ static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
static void commit_xhlocks(struct cross_lock *xlock)
{
unsigned int cur = current->xhlock_idx;
+ unsigned int prev_hist_id = xhlock(cur).hist_id;
unsigned int i;

if (!graph_lock())
@@ -5013,6 +5041,17 @@ static void commit_xhlocks(struct cross_lock *xlock)
break;

/*
+ * Filter out the cases that the ring buffer was
+ * overwritten and the previous entry has a bigger
+ * hist_id than the following one, which is impossible
+ * otherwise.
+ */
+ if (unlikely(before(xhlock->hist_id, prev_hist_id)))
+ break;
+
+ prev_hist_id = xhlock->hist_id;
+
+ /*
* commit_xhlock() returns 0 with graph_lock already
* released if fail.
*/
@@ -5085,9 +5124,12 @@ void lockdep_init_task(struct task_struct *task)
int i;

task->xhlock_idx = UINT_MAX;
+ task->hist_id = 0;

- for (i = 0; i < CONTEXT_NR; i++)
+ for (i = 0; i < CONTEXT_NR; i++) {
task->xhlock_idx_hist[i] = UINT_MAX;
+ task->hist_id_save[i] = 0;
+ }

task->xhlocks = kzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR,
GFP_KERNEL);
--
1.9.1

2017-08-07 07:16:15

by Byungchul Park

[permalink] [raw]
Subject: [PATCH v8 10/14] pagemap.h: Remove trailing white space

Trailing whitespace is not accepted by the kernel coding style.
Remove it.

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/pagemap.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index baa9344..9717ca8 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -506,7 +506,7 @@ static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
extern void wait_on_page_bit(struct page *page, int bit_nr);
extern int wait_on_page_bit_killable(struct page *page, int bit_nr);

-/*
+/*
* Wait for a page to be unlocked.
*
* This must be called with the caller "holding" the page,
@@ -526,7 +526,7 @@ static inline int wait_on_page_locked_killable(struct page *page)
return wait_on_page_bit_killable(compound_head(page), PG_locked);
}

-/*
+/*
* Wait for a page to complete writeback
*/
static inline void wait_on_page_writeback(struct page *page)
--
1.9.1

2017-08-07 07:16:13

by Byungchul Park

[permalink] [raw]
Subject: [PATCH v8 01/14] lockdep: Refactor lookup_chain_cache()

Currently, lookup_chain_cache() provides both 'lookup' and 'add'
functionality in a single function. However, each is useful on its
own. So this patch makes lookup_chain_cache() do only the 'lookup'
part and introduces add_chain_cache() for the 'add' part. It is also
more readable than before.

Signed-off-by: Byungchul Park <[email protected]>
---
kernel/locking/lockdep.c | 132 ++++++++++++++++++++++++++++++-----------------
1 file changed, 86 insertions(+), 46 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 7d2499b..9260b40 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2126,14 +2126,15 @@ static int check_no_collision(struct task_struct *curr,
}

/*
- * Look up a dependency chain. If the key is not present yet then
- * add it and return 1 - in this case the new dependency chain is
- * validated. If the key is already hashed, return 0.
- * (On return with 1 graph_lock is held.)
+ * Add a dependency chain into the chain hashtable. Must be called with
+ * graph_lock held.
+ *
+ * Return 0 on failure, with graph_lock released.
+ * Return 1 on success, with graph_lock held.
*/
-static inline int lookup_chain_cache(struct task_struct *curr,
- struct held_lock *hlock,
- u64 chain_key)
+static inline int add_chain_cache(struct task_struct *curr,
+ struct held_lock *hlock,
+ u64 chain_key)
{
struct lock_class *class = hlock_class(hlock);
struct hlist_head *hash_head = chainhashentry(chain_key);
@@ -2141,49 +2142,18 @@ static inline int lookup_chain_cache(struct task_struct *curr,
int i, j;

/*
+ * Allocate a new chain entry from the static array, and add
+ * it to the hash:
+ */
+
+ /*
* We might need to take the graph lock, ensure we've got IRQs
* disabled to make this an IRQ-safe lock.. for recursion reasons
* lockdep won't complain about its own locking errors.
*/
if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
return 0;
- /*
- * We can walk it lock-free, because entries only get added
- * to the hash:
- */
- hlist_for_each_entry_rcu(chain, hash_head, entry) {
- if (chain->chain_key == chain_key) {
-cache_hit:
- debug_atomic_inc(chain_lookup_hits);
- if (!check_no_collision(curr, hlock, chain))
- return 0;

- if (very_verbose(class))
- printk("\nhash chain already cached, key: "
- "%016Lx tail class: [%p] %s\n",
- (unsigned long long)chain_key,
- class->key, class->name);
- return 0;
- }
- }
- if (very_verbose(class))
- printk("\nnew hash chain, key: %016Lx tail class: [%p] %s\n",
- (unsigned long long)chain_key, class->key, class->name);
- /*
- * Allocate a new chain entry from the static array, and add
- * it to the hash:
- */
- if (!graph_lock())
- return 0;
- /*
- * We have to walk the chain again locked - to avoid duplicates:
- */
- hlist_for_each_entry(chain, hash_head, entry) {
- if (chain->chain_key == chain_key) {
- graph_unlock();
- goto cache_hit;
- }
- }
if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
if (!debug_locks_off_graph_unlock())
return 0;
@@ -2235,6 +2205,75 @@ static inline int lookup_chain_cache(struct task_struct *curr,
return 1;
}

+/*
+ * Look up a dependency chain.
+ */
+static inline struct lock_chain *lookup_chain_cache(u64 chain_key)
+{
+ struct hlist_head *hash_head = chainhashentry(chain_key);
+ struct lock_chain *chain;
+
+ /*
+ * We can walk it lock-free, because entries only get added
+ * to the hash:
+ */
+ hlist_for_each_entry_rcu(chain, hash_head, entry) {
+ if (chain->chain_key == chain_key) {
+ debug_atomic_inc(chain_lookup_hits);
+ return chain;
+ }
+ }
+ return NULL;
+}
+
+/*
+ * If the key is not present yet in dependency chain cache then
+ * add it and return 1 - in this case the new dependency chain is
+ * validated. If the key is already hashed, return 0.
+ * (On return with 1 graph_lock is held.)
+ */
+static inline int lookup_chain_cache_add(struct task_struct *curr,
+ struct held_lock *hlock,
+ u64 chain_key)
+{
+ struct lock_class *class = hlock_class(hlock);
+ struct lock_chain *chain = lookup_chain_cache(chain_key);
+
+ if (chain) {
+cache_hit:
+ if (!check_no_collision(curr, hlock, chain))
+ return 0;
+
+ if (very_verbose(class))
+ printk("\nhash chain already cached, key: "
+ "%016Lx tail class: [%p] %s\n",
+ (unsigned long long)chain_key,
+ class->key, class->name);
+ return 0;
+ }
+
+ if (very_verbose(class))
+ printk("\nnew hash chain, key: %016Lx tail class: [%p] %s\n",
+ (unsigned long long)chain_key, class->key, class->name);
+
+ if (!graph_lock())
+ return 0;
+
+ /*
+ * We have to walk the chain again locked - to avoid duplicates:
+ */
+ chain = lookup_chain_cache(chain_key);
+ if (chain) {
+ graph_unlock();
+ goto cache_hit;
+ }
+
+ if (!add_chain_cache(curr, hlock, chain_key))
+ return 0;
+
+ return 1;
+}
+
static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
struct held_lock *hlock, int chain_head, u64 chain_key)
{
@@ -2245,11 +2284,11 @@ static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
*
* We look up the chain_key and do the O(N^2) check and update of
* the dependencies only if this is a new dependency chain.
- * (If lookup_chain_cache() returns with 1 it acquires
+ * (If lookup_chain_cache_add() returns with 1 it acquires
* graph_lock for us)
*/
if (!hlock->trylock && hlock->check &&
- lookup_chain_cache(curr, hlock, chain_key)) {
+ lookup_chain_cache_add(curr, hlock, chain_key)) {
/*
* Check whether last held lock:
*
@@ -2280,9 +2319,10 @@ static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
if (!chain_head && ret != 2)
if (!check_prevs_add(curr, hlock))
return 0;
+
graph_unlock();
} else
- /* after lookup_chain_cache(): */
+ /* after lookup_chain_cache_add(): */
if (unlikely(!debug_locks))
return 0;

--
1.9.1

2017-08-07 07:16:11

by Byungchul Park

[permalink] [raw]
Subject: [PATCH v8 11/14] lockdep: Apply crossrelease to PG_locked locks

Although lock_page() and its family can cause deadlock, the lock
correctness validator could not be applied to them until now, because
things like unlock_page() might be called in a different context from
the acquisition context, which violates lockdep's assumption.

Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
detector to page locks. Apply it.

Signed-off-by: Byungchul Park <[email protected]>
---
include/linux/mm_types.h | 8 ++++
include/linux/pagemap.h | 101 ++++++++++++++++++++++++++++++++++++++++++++---
lib/Kconfig.debug | 8 ++++
mm/filemap.c | 4 +-
mm/page_alloc.c | 3 ++
5 files changed, 116 insertions(+), 8 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index ff15181..f1e3dba 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -16,6 +16,10 @@

#include <asm/mmu.h>

+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include <linux/lockdep.h>
+#endif
+
#ifndef AT_VECTOR_SIZE_ARCH
#define AT_VECTOR_SIZE_ARCH 0
#endif
@@ -216,6 +220,10 @@ struct page {
#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
int _last_cpupid;
#endif
+
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+ struct lockdep_map_cross map;
+#endif
}
/*
* The struct page can be forced to be double word aligned so that atomic ops
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 9717ca8..9f448c6 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -14,6 +14,9 @@
#include <linux/bitops.h>
#include <linux/hardirq.h> /* for in_interrupt() */
#include <linux/hugetlb_inline.h>
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#include <linux/lockdep.h>
+#endif

/*
* Bits in mapping->flags.
@@ -450,26 +453,91 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
return pgoff;
}

+#ifdef CONFIG_LOCKDEP_PAGELOCK
+#define lock_page_init(p) \
+do { \
+ static struct lock_class_key __key; \
+ lockdep_init_map_crosslock((struct lockdep_map *)&(p)->map, \
+ "(PG_locked)" #p, &__key, 0); \
+} while (0)
+
+static inline void lock_page_acquire(struct page *page, int try)
+{
+ page = compound_head(page);
+ lock_acquire_exclusive((struct lockdep_map *)&page->map, 0,
+ try, NULL, _RET_IP_);
+}
+
+static inline void lock_page_release(struct page *page)
+{
+ page = compound_head(page);
+ /*
+ * lock_commit_crosslock() is necessary for crosslocks.
+ */
+ lock_commit_crosslock((struct lockdep_map *)&page->map);
+ lock_release((struct lockdep_map *)&page->map, 0, _RET_IP_);
+}
+#else
+static inline void lock_page_init(struct page *page) {}
+static inline void lock_page_free(struct page *page) {}
+static inline void lock_page_acquire(struct page *page, int try) {}
+static inline void lock_page_release(struct page *page) {}
+#endif
+
extern void __lock_page(struct page *page);
extern int __lock_page_killable(struct page *page);
extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
unsigned int flags);
-extern void unlock_page(struct page *page);
+extern void do_raw_unlock_page(struct page *page);

-static inline int trylock_page(struct page *page)
+static inline void unlock_page(struct page *page)
+{
+ lock_page_release(page);
+ do_raw_unlock_page(page);
+}
+
+static inline int do_raw_trylock_page(struct page *page)
{
page = compound_head(page);
return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
}

+static inline int trylock_page(struct page *page)
+{
+ if (do_raw_trylock_page(page)) {
+ lock_page_acquire(page, 1);
+ return 1;
+ }
+ return 0;
+}
+
/*
* lock_page may only be called if we have the page's inode pinned.
*/
static inline void lock_page(struct page *page)
{
might_sleep();
- if (!trylock_page(page))
+
+ if (!do_raw_trylock_page(page))
__lock_page(page);
+ /*
+ * acquire() must be after actual lock operation for crosslocks.
+ * This way a crosslock and current lock can be ordered like:
+ *
+ * CONTEXT 1 CONTEXT 2
+ * --------- ---------
+ * lock A (cross)
+ * acquire A
+ * X = atomic_inc_return(&cross_gen_id)
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * acquire B
+ * Y = atomic_read_acquire(&cross_gen_id)
+ * lock B
+ *
+ * so that 'lock A and then lock B' can be seen globally,
+ * if X <= Y.
+ */
+ lock_page_acquire(page, 0);
}

/*
@@ -479,9 +547,20 @@ static inline void lock_page(struct page *page)
*/
static inline int lock_page_killable(struct page *page)
{
+ int ret;
+
might_sleep();
- if (!trylock_page(page))
- return __lock_page_killable(page);
+
+ if (!do_raw_trylock_page(page)) {
+ ret = __lock_page_killable(page);
+ if (ret)
+ return ret;
+ }
+ /*
+ * acquire() must be after actual lock operation for crosslocks.
+ * This way a crosslock and other locks can be ordered.
+ */
+ lock_page_acquire(page, 0);
return 0;
}

@@ -496,7 +575,17 @@ static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
unsigned int flags)
{
might_sleep();
- return trylock_page(page) || __lock_page_or_retry(page, mm, flags);
+
+ if (do_raw_trylock_page(page) || __lock_page_or_retry(page, mm, flags)) {
+ /*
+ * acquire() must be after actual lock operation for crosslocks.
+ * This way a crosslock and other locks can be ordered.
+ */
+ lock_page_acquire(page, 0);
+ return 1;
+ }
+
+ return 0;
}

/*
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4ba8adc..99b5f76 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1093,6 +1093,14 @@ config LOCKDEP_COMPLETE
A deadlock caused by wait_for_completion() and complete() can be
detected by lockdep using crossrelease feature.

+config LOCKDEP_PAGELOCK
+ bool "Lock debugging: allow PG_locked lock to use deadlock detector"
+ select LOCKDEP_CROSSRELEASE
+ default n
+ help
+ PG_locked is a kind of crosslock. Using the crossrelease feature,
+ PG_locked can work with the runtime deadlock detector.
+
config PROVE_LOCKING
bool "Lock debugging: prove locking correctness"
depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
diff --git a/mm/filemap.c b/mm/filemap.c
index a497024..0d83bf0 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1083,7 +1083,7 @@ static inline bool clear_bit_unlock_is_negative_byte(long nr, volatile void *mem
* portably (architectures that do LL/SC can test any bit, while x86 can
* test the sign bit).
*/
-void unlock_page(struct page *page)
+void do_raw_unlock_page(struct page *page)
{
BUILD_BUG_ON(PG_waiters != 7);
page = compound_head(page);
@@ -1091,7 +1091,7 @@ void unlock_page(struct page *page)
if (clear_bit_unlock_is_negative_byte(PG_locked, &page->flags))
wake_up_page_bit(page, PG_locked);
}
-EXPORT_SYMBOL(unlock_page);
+EXPORT_SYMBOL(do_raw_unlock_page);

/**
* end_page_writeback - end writeback against a page
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6d30e91..2cbf412 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5406,6 +5406,9 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
} else {
__init_single_pfn(pfn, zone, nid);
}
+#ifdef CONFIG_LOCKDEP_PAGELOCK
+ lock_page_init(pfn_to_page(pfn));
+#endif
}
}

--
1.9.1

2017-08-07 10:21:41

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

Hi Byungchul,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.13-rc4 next-20170804]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Byungchul-Park/lockdep-Implement-crossrelease-feature/20170807-172617
config: alpha-allmodconfig (attached as .config)
compiler: alpha-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=alpha

All errors (new ones prefixed by >>):

warning: (LOCKDEP_COMPLETE) selects LOCKDEP_CROSSRELEASE which has unmet direct dependencies (PROVE_LOCKING)
warning: (LOCKDEP_COMPLETE) selects LOCKDEP_CROSSRELEASE which has unmet direct dependencies (PROVE_LOCKING)
In file included from include/linux/srcutree.h:28:0,
from include/linux/srcu.h:62,
from include/linux/notifier.h:15,
from include/linux/memory_hotplug.h:6,
from include/linux/mmzone.h:771,
from include/linux/gfp.h:5,
from include/linux/mm.h:9,
from include/linux/pid_namespace.h:6,
from include/linux/ptrace.h:9,
from arch/alpha/kernel/asm-offsets.c:10:
>> include/linux/completion.h:32:27: error: field 'map' has incomplete type
struct lockdep_map_cross map;
^~~
make[2]: *** [arch/alpha/kernel/asm-offsets.s] Error 1
make[2]: Target '__build' not remade because of errors.
make[1]: *** [prepare0] Error 2
make[1]: Target 'prepare' not remade because of errors.
make: *** [sub-make] Error 2

vim +/map +32 include/linux/completion.h

15
16 /*
17 * struct completion - structure used to maintain state for a "completion"
18 *
19 * This is the opaque structure used to maintain the state for a "completion".
20 * Completions currently use a FIFO to queue threads that have to wait for
21 * the "completion" event.
22 *
23 * See also: complete(), wait_for_completion() (and friends _timeout,
24 * _interruptible, _interruptible_timeout, and _killable), init_completion(),
25 * reinit_completion(), and macros DECLARE_COMPLETION(),
26 * DECLARE_COMPLETION_ONSTACK().
27 */
28 struct completion {
29 unsigned int done;
30 wait_queue_head_t wait;
31 #ifdef CONFIG_LOCKDEP_COMPLETE
> 32 struct lockdep_map_cross map;
33 #endif
34 };
35

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2017-08-07 10:36:43

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v8 11/14] lockdep: Apply crossrelease to PG_locked locks

Hi Byungchul,

[auto build test WARNING on linus/master]
[also build test WARNING on v4.13-rc4 next-20170807]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Byungchul-Park/lockdep-Implement-crossrelease-feature/20170807-172617
config: alpha-allmodconfig (attached as .config)
compiler: alpha-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=alpha

All warnings (new ones prefixed by >>):

warning: (LOCKDEP_COMPLETE && LOCKDEP_PAGELOCK) selects LOCKDEP_CROSSRELEASE which has unmet direct dependencies (PROVE_LOCKING)

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2017-08-07 10:43:43

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v8 13/14] lockdep: Move data of CONFIG_LOCKDEP_PAGELOCK from page to page_ext

Hi Byungchul,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.13-rc4 next-20170807]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Byungchul-Park/lockdep-Implement-crossrelease-feature/20170807-172617
config: alpha-allmodconfig (attached as .config)
compiler: alpha-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=alpha

All errors (new ones prefixed by >>):

warning: (LOCKDEP_COMPLETE && LOCKDEP_PAGELOCK) selects LOCKDEP_CROSSRELEASE which has unmet direct dependencies (PROVE_LOCKING)
warning: (LOCKDEP_COMPLETE && LOCKDEP_PAGELOCK) selects LOCKDEP_CROSSRELEASE which has unmet direct dependencies (PROVE_LOCKING)
In file included from include/linux/srcutree.h:28:0,
from include/linux/srcu.h:62,
from include/linux/notifier.h:15,
from include/linux/memory_hotplug.h:6,
from include/linux/mmzone.h:771,
from include/linux/gfp.h:5,
from include/linux/mm.h:9,
from include/linux/pid_namespace.h:6,
from include/linux/ptrace.h:9,
from arch/alpha/kernel/asm-offsets.c:10:
include/linux/completion.h:32:27: error: field 'map' has incomplete type
struct lockdep_map_cross map;
^~~
In file included from include/linux/mm.h:23:0,
from include/linux/pid_namespace.h:6,
from include/linux/ptrace.h:9,
from arch/alpha/kernel/asm-offsets.c:10:
>> include/linux/page_ext.h:49:27: error: field 'map' has incomplete type
struct lockdep_map_cross map;
^~~
make[2]: *** [arch/alpha/kernel/asm-offsets.s] Error 1
make[2]: Target '__build' not remade because of errors.
make[1]: *** [prepare0] Error 2
make[1]: Target 'prepare' not remade because of errors.
make: *** [sub-make] Error 2

vim +/map +49 include/linux/page_ext.h

37
38 /*
39 * Page Extension can be considered as an extended mem_map.
40 * A page_ext page is associated with every page descriptor. The
41 * page_ext helps us add more information about the page.
42 * All page_ext are allocated at boot or memory hotplug event,
43 * then the page_ext for pfn always exists.
44 */
45 struct page_ext {
46 unsigned long flags;
47
48 #ifdef CONFIG_LOCKDEP_PAGELOCK
> 49 struct lockdep_map_cross map;
50 #endif
51 };
52

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2017-08-08 00:19:11

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

Hi Byungchul,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.13-rc4 next-20170804]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Byungchul-Park/lockdep-Implement-crossrelease-feature/20170807-172617
config: cris-allmodconfig (attached as .config)
compiler: cris-linux-gcc (GCC) 6.2.0
reproduce:
wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=cris

All error/warnings (new ones prefixed by >>):

In file included from include/linux/pm.h:29:0,
from include/linux/device.h:25,
from include/linux/pci.h:30,
from drivers/usb/host/ehci-hcd.c:24:
include/linux/completion.h:32:27: error: field 'map' has incomplete type
struct lockdep_map_cross map;
^~~
In file included from include/linux/spinlock_types.h:18:0,
from include/linux/spinlock.h:81,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/linux/stat.h:18,
from include/linux/module.h:10,
from drivers/usb/host/ehci-hcd.c:23:
drivers/usb/host/ehci-hub.c: In function 'ehset_single_step_set_feature':
>> include/linux/lockdep.h:578:4: error: field name not in record or union initializer
{ .map.name = (_name), .map.key = (void *)(_key), \
^
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:578:4: note: (near initialization for 'done.map')
{ .map.name = (_name), .map.key = (void *)(_key), \
^
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:578:25: error: field name not in record or union initializer
{ .map.name = (_name), .map.key = (void *)(_key), \
^
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:578:25: note: (near initialization for 'done.map')
{ .map.name = (_name), .map.key = (void *)(_key), \
^
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:579:4: error: field name not in record or union initializer
.map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }
^
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:579:4: note: (near initialization for 'done.map')
.map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }
^
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:579:20: error: field name not in record or union initializer
.map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }
^
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:579:20: note: (near initialization for 'done.map')
.map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }
^
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:575:4: error: field name not in record or union initializer
{ .nr_acquire = 0,}
^
>> include/linux/lockdep.h:579:29: note: in expansion of macro 'STATIC_CROSS_LOCK_INIT'
.map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }
^~~~~~~~~~~~~~~~~~~~~~
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:575:4: note: (near initialization for 'done.map')
{ .nr_acquire = 0,}
^
>> include/linux/lockdep.h:579:29: note: in expansion of macro 'STATIC_CROSS_LOCK_INIT'
.map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }
^~~~~~~~~~~~~~~~~~~~~~
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
--
In file included from include/linux/pm.h:29:0,
from include/linux/device.h:25,
from include/linux/genhd.h:64,
from include/linux/blkdev.h:10,
from drivers/usb/gadget/function/f_fs.c:21:
include/linux/completion.h:32:27: error: field 'map' has incomplete type
struct lockdep_map_cross map;
^~~
In file included from include/linux/rcupdate.h:42:0,
from include/linux/rculist.h:10,
from include/linux/pid.h:4,
from include/linux/sched.h:13,
from include/linux/blkdev.h:4,
from drivers/usb/gadget/function/f_fs.c:21:
drivers/usb/gadget/function/f_fs.c: In function 'ffs_epfile_io':
>> include/linux/lockdep.h:578:4: error: field name not in record or union initializer
{ .map.name = (_name), .map.key = (void *)(_key), \
^
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/gadget/function/f_fs.c:983:3: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:578:4: note: (near initialization for 'done.map')
{ .map.name = (_name), .map.key = (void *)(_key), \
^
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/gadget/function/f_fs.c:983:3: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:578:25: error: field name not in record or union initializer
{ .map.name = (_name), .map.key = (void *)(_key), \
^
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/gadget/function/f_fs.c:983:3: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:578:25: note: (near initialization for 'done.map')
{ .map.name = (_name), .map.key = (void *)(_key), \
^
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/gadget/function/f_fs.c:983:3: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:579:4: error: field name not in record or union initializer
.map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }
^
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/gadget/function/f_fs.c:983:3: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:579:4: note: (near initialization for 'done.map')
.map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }
^
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/gadget/function/f_fs.c:983:3: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:579:20: error: field name not in record or union initializer
.map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }
^
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/gadget/function/f_fs.c:983:3: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:579:20: note: (near initialization for 'done.map')
.map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }
^
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/gadget/function/f_fs.c:983:3: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:575:4: error: field name not in record or union initializer
{ .nr_acquire = 0,}
^
>> include/linux/lockdep.h:579:29: note: in expansion of macro 'STATIC_CROSS_LOCK_INIT'
.map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }
^~~~~~~~~~~~~~~~~~~~~~
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/gadget/function/f_fs.c:983:3: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/lockdep.h:575:4: note: (near initialization for 'done.map')
{ .nr_acquire = 0,}
^
>> include/linux/lockdep.h:579:29: note: in expansion of macro 'STATIC_CROSS_LOCK_INIT'
.map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }
^~~~~~~~~~~~~~~~~~~~~~
>> include/linux/completion.h:70:2: note: in expansion of macro 'STATIC_CROSS_LOCKDEP_MAP_INIT'
STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER'
struct completion work = COMPLETION_INITIALIZER(work)
^~~~~~~~~~~~~~~~~~~~~~
include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
^~~~~~~~~~~~~~~~~~
drivers/usb/gadget/function/f_fs.c:983:3: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK'
DECLARE_COMPLETION_ONSTACK(done);
^~~~~~~~~~~~~~~~~~~~~~~~~~
..

vim +578 include/linux/lockdep.h

c8ffcc97 Byungchul Park 2017-08-07 569
5ec8f43e Byungchul Park 2017-08-07 570 /*
5ec8f43e Byungchul Park 2017-08-07 571 * What we essencially have to initialize is 'nr_acquire'. Other members
5ec8f43e Byungchul Park 2017-08-07 572 * will be initialized in add_xlock().
5ec8f43e Byungchul Park 2017-08-07 573 */
5ec8f43e Byungchul Park 2017-08-07 574 #define STATIC_CROSS_LOCK_INIT() \
5ec8f43e Byungchul Park 2017-08-07 @575 { .nr_acquire = 0,}
5ec8f43e Byungchul Park 2017-08-07 576
c8ffcc97 Byungchul Park 2017-08-07 577 #define STATIC_CROSS_LOCKDEP_MAP_INIT(_name, _key) \
c8ffcc97 Byungchul Park 2017-08-07 @578 { .map.name = (_name), .map.key = (void *)(_key), \
5ec8f43e Byungchul Park 2017-08-07 @579 .map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }
c8ffcc97 Byungchul Park 2017-08-07 580
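The "field name not in record or union initializer" cascade above follows from the earlier "field 'map' has incomplete type" error: once `struct lockdep_map_cross` is incomplete at the point of use, none of the `.map.*` designators in STATIC_CROSS_LOCKDEP_MAP_INIT can be resolved. The nested designator syntax itself is valid C99 whenever the member's type is complete, which a minimal user-space sketch (with hypothetical stand-in types, not the real kernel definitions) demonstrates:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types in the report above. */
struct lockdep_map { const char *name; const void *key; };
struct lockdep_map_cross { struct lockdep_map map; int nr_acquire; };
struct completion_like { unsigned int done; struct lockdep_map_cross map; };

/*
 * Simplified analogue of STATIC_CROSS_LOCKDEP_MAP_INIT: C99 nested
 * designators such as .map.name are legal, but only resolvable when the
 * type of 'map' is complete -- otherwise GCC emits exactly the
 * "field name not in record or union initializer" errors shown above.
 */
#define MAP_INIT(_name, _key) \
	{ .map.name = (_name), .map.key = (_key), .nr_acquire = 0 }

static const struct completion_like done = {
	.done = 0,
	.map = MAP_INIT("(complete)done", 0),
};

static const char *map_name(const struct completion_like *c)
{
	return c->map.map.name;
}
```

This points at a configuration/ordering problem (the cross type not being defined for this config) rather than at the initializer syntax.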

:::::: The code at line 578 was first introduced by commit
:::::: c8ffcc977b10be9026a251daeec76048b610e3d4 lockdep: Implement crossrelease feature

:::::: TO: Byungchul Park <[email protected]>
:::::: CC: 0day robot <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2017-08-08 00:19:15

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v8 14/14] lockdep: Crossrelease feature documentation

Hi Byungchul,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.13-rc4 next-20170804]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Byungchul-Park/lockdep-Implement-crossrelease-feature/20170807-172617
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

All errors (new ones prefixed by >>):

>> ERROR: "lookup_page_ext" [net/sunrpc/sunrpc.ko] undefined!
>> ERROR: "lookup_page_ext" [mm/zsmalloc.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/xfs/xfs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/ufs/ufs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/udf/udf.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/ubifs/ubifs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/sysv/sysv.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/squashfs/squashfs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/romfs/romfs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/reiserfs/reiserfs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/orangefs/orangefs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/ocfs2/ocfs2.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/ntfs/ntfs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/nilfs2/nilfs2.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/nfs/nfs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/ncpfs/ncpfs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/minix/minix.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/jfs/jfs.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/jffs2/jffs2.ko] undefined!
>> ERROR: "lookup_page_ext" [fs/jbd2/jbd2.ko] undefined!

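The "lookup_page_ext" link failures suggest the page-lock crossrelease patch made modular code call lookup_page_ext() while the symbol is not exported to modules. A sketch of the usual remedy (the exact file and export flavor here are assumptions, not the verbatim kernel source):

```c
/* mm/page_ext.c -- sketch only */
struct page_ext *lookup_page_ext(struct page *page)
{
	/* ... existing lookup logic ... */
}
EXPORT_SYMBOL_GPL(lookup_page_ext);	/* make the symbol visible to modules */
```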
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2017-08-09 09:51:15

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

On Mon, Aug 07, 2017 at 04:12:56PM +0900, Byungchul Park wrote:
> +static inline void wait_for_completion(struct completion *x)
> +{
> + complete_acquire(x);
> + __wait_for_completion(x);
> + complete_release(x);
> +}
> +
> +static inline void wait_for_completion_io(struct completion *x)
> +{
> + complete_acquire(x);
> + __wait_for_completion_io(x);
> + complete_release(x);
> +}
> +
> +static inline int wait_for_completion_interruptible(struct completion *x)
> +{
> + int ret;
> + complete_acquire(x);
> + ret = __wait_for_completion_interruptible(x);
> + complete_release(x);
> + return ret;
> +}
> +
> +static inline int wait_for_completion_killable(struct completion *x)
> +{
> + int ret;
> + complete_acquire(x);
> + ret = __wait_for_completion_killable(x);
> + complete_release(x);
> + return ret;
> +}

I don't understand, why not change __wait_for_common() ?

2017-08-09 10:24:56

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

On Wed, Aug 09, 2017 at 11:51:07AM +0200, Peter Zijlstra wrote:
> On Mon, Aug 07, 2017 at 04:12:56PM +0900, Byungchul Park wrote:
> > +static inline void wait_for_completion(struct completion *x)
> > +{
> > + complete_acquire(x);
> > + __wait_for_completion(x);
> > + complete_release(x);
> > +}
> > +
> > +static inline void wait_for_completion_io(struct completion *x)
> > +{
> > + complete_acquire(x);
> > + __wait_for_completion_io(x);
> > + complete_release(x);
> > +}
> > +
> > +static inline int wait_for_completion_interruptible(struct completion *x)
> > +{
> > + int ret;
> > + complete_acquire(x);
> > + ret = __wait_for_completion_interruptible(x);
> > + complete_release(x);
> > + return ret;
> > +}
> > +
> > +static inline int wait_for_completion_killable(struct completion *x)
> > +{
> > + int ret;
> > + complete_acquire(x);
> > + ret = __wait_for_completion_killable(x);
> > + complete_release(x);
> > + return ret;
> > +}
>
> I don't understand, why not change __wait_for_common() ?

So what is wrong with the below?

Yes, it adds acquire/release to the timeout variants too, but I don't
see why we should exclude those, and even if we'd want to do that, it
would be trivial:

bool timo = (timeout == MAX_SCHEDULE_TIMEOUT);

if (!timo)
complete_acquire(x);

/* ... */

if (!timo)
complete_release(x);

But like said, I think we very much want to annotate waits with timeouts
too. Hitting the max timo doesn't necessarily mean we'll make fwd
progress, we could be stuck in a loop doing something else again before
returning to wait.

Also, even if we'd make fwd progress, hitting that max timo is still not
desirable.

---
Subject: lockdep: Apply crossrelease to completions
From: Byungchul Park <[email protected]>
Date: Mon, 7 Aug 2017 16:12:56 +0900

Although wait_for_completion() and its family can cause deadlock, the
lock correctness validator could not be applied to them until now,
because things like complete() are usually called in a different context
from the waiting context, which violates lockdep's assumption.

Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
detector to those completion operations. Applied it.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Byungchul Park <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
include/linux/completion.h | 45 ++++++++++++++++++++++++++++++++++++++++++++-
kernel/sched/completion.c | 11 +++++++++++
lib/Kconfig.debug | 8 ++++++++
3 files changed, 63 insertions(+), 1 deletion(-)

--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -9,6 +9,9 @@
*/

#include <linux/wait.h>
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#include <linux/lockdep.h>
+#endif

/*
* struct completion - structure used to maintain state for a "completion"
@@ -25,10 +28,50 @@
struct completion {
unsigned int done;
wait_queue_head_t wait;
+#ifdef CONFIG_LOCKDEP_COMPLETE
+ struct lockdep_map_cross map;
+#endif
};

+#ifdef CONFIG_LOCKDEP_COMPLETE
+static inline void complete_acquire(struct completion *x)
+{
+ lock_acquire_exclusive((struct lockdep_map *)&x->map, 0, 0, NULL, _RET_IP_);
+}
+
+static inline void complete_release(struct completion *x)
+{
+ lock_release((struct lockdep_map *)&x->map, 0, _RET_IP_);
+}
+
+static inline void complete_release_commit(struct completion *x)
+{
+ lock_commit_crosslock((struct lockdep_map *)&x->map);
+}
+
+#define init_completion(x) \
+do { \
+ static struct lock_class_key __key; \
+ lockdep_init_map_crosslock((struct lockdep_map *)&(x)->map, \
+ "(complete)" #x, \
+ &__key, 0); \
+ __init_completion(x); \
+} while (0)
+#else
+#define init_completion(x) __init_completion(x)
+static inline void complete_acquire(struct completion *x) {}
+static inline void complete_release(struct completion *x) {}
+static inline void complete_release_commit(struct completion *x) {}
+#endif
+
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#define COMPLETION_INITIALIZER(work) \
+ { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait), \
+ STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
+#else
#define COMPLETION_INITIALIZER(work) \
{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+#endif

#define COMPLETION_INITIALIZER_ONSTACK(work) \
({ init_completion(&work); work; })
@@ -70,7 +113,7 @@ struct completion {
* This inline function will initialize a dynamically created completion
* structure.
*/
-static inline void init_completion(struct completion *x)
+static inline void __init_completion(struct completion *x)
{
x->done = 0;
init_waitqueue_head(&x->wait);
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -32,6 +32,12 @@ void complete(struct completion *x)
unsigned long flags;

spin_lock_irqsave(&x->wait.lock, flags);
+
+ /*
+ * Perform commit of crossrelease here.
+ */
+ complete_release_commit(x);
+
if (x->done != UINT_MAX)
x->done++;
__wake_up_locked(&x->wait, TASK_NORMAL, 1);
@@ -92,9 +98,14 @@ __wait_for_common(struct completion *x,
{
might_sleep();

+ complete_acquire(x);
+
spin_lock_irq(&x->wait.lock);
timeout = do_wait_for_common(x, action, timeout, state);
spin_unlock_irq(&x->wait.lock);
+
+ complete_release(x);
+
return timeout;
}

--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1085,6 +1085,14 @@ config LOCKDEP_CROSSRELEASE
such as page locks or completions can use the lock correctness
detector, lockdep.

+config LOCKDEP_COMPLETE
+ bool "Lock debugging: allow completions to use deadlock detector"
+ select LOCKDEP_CROSSRELEASE
+ default n
+ help
+ A deadlock caused by wait_for_completion() and complete() can be
+ detected by lockdep using crossrelease feature.
+
config PROVE_LOCKING
bool "Lock debugging: prove locking correctness"
depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT

2017-08-09 14:05:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v8 05/14] lockdep: Implement crossrelease feature

On Mon, Aug 07, 2017 at 04:12:52PM +0900, Byungchul Park wrote:
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index fffe49f..0c8a1b8 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -467,6 +520,49 @@ static inline void lockdep_on(void)
>
> #endif /* !LOCKDEP */
>
> +enum context_t {
> + HARD,
> + SOFT,
> + PROC,
> + CONTEXT_NR,
> +};

Since this is the global namespace and those being somewhat generic
names, I've renamed the lot:

+enum xhlock_context_t {
+ XHLOCK_HARD,
+ XHLOCK_SOFT,
+ XHLOCK_PROC,
+ XHLOCK_NR,
+};

2017-08-09 14:16:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Mon, Aug 07, 2017 at 04:12:53PM +0900, Byungchul Park wrote:
> @@ -4773,14 +4784,28 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
> */
> void crossrelease_hist_start(enum context_t c)
> {
> - if (current->xhlocks)
> - current->xhlock_idx_hist[c] = current->xhlock_idx;
> + struct task_struct *cur = current;
> +
> + if (cur->xhlocks) {
> + cur->xhlock_idx_hist[c] = cur->xhlock_idx;
> + cur->hist_id_save[c] = cur->hist_id;
> + }
> }
>
> void crossrelease_hist_end(enum context_t c)
> {
> - if (current->xhlocks)
> - current->xhlock_idx = current->xhlock_idx_hist[c];
> + struct task_struct *cur = current;
> +
> + if (cur->xhlocks) {
> + unsigned int idx = cur->xhlock_idx_hist[c];
> + struct hist_lock *h = &xhlock(idx);
> +
> + cur->xhlock_idx = idx;
> +
> + /* Check if the ring was overwritten. */
> + if (h->hist_id != cur->hist_id_save[c])
> + invalidate_xhlock(h);
> + }
> }
>
> static int cross_lock(struct lockdep_map *lock)
> @@ -4826,6 +4851,7 @@ static inline int depend_after(struct held_lock *hlock)
> * Check if the xhlock is valid, which would be false if,
> *
> * 1. Has not used after initializaion yet.
> + * 2. Got invalidated.
> *
> * Remind hist_lock is implemented as a ring buffer.
> */
> @@ -4857,6 +4883,7 @@ static void add_xhlock(struct held_lock *hlock)
>
> /* Initialize hist_lock's members */
> xhlock->hlock = *hlock;
> + xhlock->hist_id = current->hist_id++;
>
> xhlock->trace.nr_entries = 0;
> xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;


Hehe, _another_ scheme...

Yes I think this works.. but I had just sort of understood the last one.

How about I do this on top? That I think is a combination of what I
proposed last and your single invalidate thing. Combined they solve the
problem with the least amount of extra storage (a single int).


---
Subject: lockdep: Simplify xhlock ring buffer invalidation
From: Peter Zijlstra <[email protected]>
Date: Wed Aug 9 15:31:27 CEST 2017


Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
include/linux/lockdep.h | 20 -----------
include/linux/sched.h | 4 --
kernel/locking/lockdep.c | 82 ++++++++++++++++++++++++++++++-----------------
3 files changed, 54 insertions(+), 52 deletions(-)

--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -284,26 +284,6 @@ struct held_lock {
*/
struct hist_lock {
/*
- * Id for each entry in the ring buffer. This is used to
- * decide whether the ring buffer was overwritten or not.
- *
- * For example,
- *
- * |<----------- hist_lock ring buffer size ------->|
- * pppppppppppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiii
- * wrapped > iiiiiiiiiiiiiiiiiiiiiiiiiii.......................
- *
- * where 'p' represents an acquisition in process
- * context, 'i' represents an acquisition in irq
- * context.
- *
- * In this example, the ring buffer was overwritten by
- * acquisitions in irq context, that should be detected on
- * rollback or commit.
- */
- unsigned int hist_id;
-
- /*
* Seperate stack_trace data. This will be used at commit step.
*/
struct stack_trace trace;
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -855,9 +855,7 @@ struct task_struct {
unsigned int xhlock_idx;
/* For restoring at history boundaries */
unsigned int xhlock_idx_hist[XHLOCK_NR];
- unsigned int hist_id;
- /* For overwrite check at each context exit */
- unsigned int hist_id_save[XHLOCK_NR];
+ unsigned int xhlock_idx_max;
#endif

#ifdef CONFIG_UBSAN
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4818,26 +4818,65 @@ void crossrelease_hist_start(enum contex
{
struct task_struct *cur = current;

- if (cur->xhlocks) {
+ if (cur->xhlocks)
cur->xhlock_idx_hist[c] = cur->xhlock_idx;
- cur->hist_id_save[c] = cur->hist_id;
- }
}

void crossrelease_hist_end(enum context_t c)
{
struct task_struct *cur = current;
+ unsigned int idx;

- if (cur->xhlocks) {
- unsigned int idx = cur->xhlock_idx_hist[c];
- struct hist_lock *h = &xhlock(idx);
-
- cur->xhlock_idx = idx;
-
- /* Check if the ring was overwritten. */
- if (h->hist_id != cur->hist_id_save[c])
- invalidate_xhlock(h);
- }
+ if (!cur->xhlocks)
+ return;
+
+ idx = cur->xhlock_idx_hist[c];
+ cur->xhlock_idx = idx;
+
+ /*
+ * A bit of magic here.. this deals with rewinding the (cyclic) history
+ * array further than its size. IOW. looses the complete history.
+ *
+ * We detect this by tracking the previous oldest entry we've (over)
+ * written in @xhlock_idx_max, this means the next entry is the oldest
+ * entry still in the buffer, ie. its tail.
+ *
+ * So when we restore an @xhlock_idx that is at least MAX_XHLOCKS_NR
+ * older than @xhlock_idx_max we know we've just wiped the entire
+ * history.
+ */
+ if ((cur->xhlock_idx_max - idx) < MAX_XHLOCKS_NR)
+ return;
+
+ /*
+ * Now that we know the buffer is effectively empty, reset our state
+ * such that it appears empty (without in fact clearing the entire
+ * buffer).
+ *
+ * Pick @idx as the 'new' beginning, (re)set all save-points to not
+ * rewind past it and reset the max. Then invalidate this idx such that
+ * commit_xhlocks() will never rewind past it. Since xhlock_idx_inc()
+ * will return the _next_ entry, we'll not overwrite this invalid entry
+ * until the entire buffer is full again.
+ */
+ for (c = 0; c < XHLOCK_NR; c++)
+ cur->xhlock_idx_hist[c] = idx;
+ cur->xhlock_idx_max = idx;
+ invalidate_xhlock(&xhlock(idx));
+}
+
+static inline unsigned int xhlock_idx_inc(void)
+{
+ struct task_struct *cur = current;
+ unsigned int idx = ++cur->xhlock_idx;
+
+ /*
+ * As per the requirement in crossrelease_hist_end(), track the tail.
+ */
+ if ((int)(cur->xhlock_idx_max - idx) < 0)
+ cur->xhlock_idx_max = idx;
+
+ return idx;
}

static int cross_lock(struct lockdep_map *lock)
@@ -4902,7 +4941,7 @@ static inline int xhlock_valid(struct hi
*/
static void add_xhlock(struct held_lock *hlock)
{
- unsigned int idx = ++current->xhlock_idx;
+ unsigned int idx = xhlock_idx_inc();
struct hist_lock *xhlock = &xhlock(idx);

#ifdef CONFIG_DEBUG_LOCKDEP
@@ -4915,7 +4954,6 @@ static void add_xhlock(struct held_lock

/* Initialize hist_lock's members */
xhlock->hlock = *hlock;
- xhlock->hist_id = current->hist_id++;

xhlock->trace.nr_entries = 0;
xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
@@ -5071,7 +5109,6 @@ static int commit_xhlock(struct cross_lo
static void commit_xhlocks(struct cross_lock *xlock)
{
unsigned int cur = current->xhlock_idx;
- unsigned int prev_hist_id = xhlock(cur).hist_id;
unsigned int i;

if (!graph_lock())
@@ -5091,17 +5128,6 @@ static void commit_xhlocks(struct cross_
break;

/*
- * Filter out the cases that the ring buffer was
- * overwritten and the previous entry has a bigger
- * hist_id than the following one, which is impossible
- * otherwise.
- */
- if (unlikely(before(xhlock->hist_id, prev_hist_id)))
- break;
-
- prev_hist_id = xhlock->hist_id;
-
- /*
* commit_xhlock() returns 0 with graph_lock already
* released if fail.
*/
@@ -5186,11 +5212,9 @@ void lockdep_init_task(struct task_struc
int i;

task->xhlock_idx = UINT_MAX;
- task->hist_id = 0;

for (i = 0; i < XHLOCK_NR; i++) {
task->xhlock_idx_hist[i] = UINT_MAX;
- task->hist_id_save[i] = 0;
}

task->xhlocks = kzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR,

2017-08-09 15:51:15

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature



Heh, look what it does...


======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc2-00317-gadc6764a3adf-dirty #797 Tainted: G W
------------------------------------------------------
startpar/582 is trying to acquire lock:
 ((complete)&barr->done){+.+.}, at: [<ffffffff8110de4d>] flush_work+0x1fd/0x2c0

but task is already holding lock:
 (lock#3){+.+.}, at: [<ffffffff8122e866>] lru_add_drain_all_cpuslocked+0x46/0x1a0

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #4 (lock#3){+.+.}:
__lock_acquire+0x10a5/0x1100
lock_acquire+0xea/0x1f0
__mutex_lock+0x6c/0x960
mutex_lock_nested+0x1b/0x20
lru_add_drain_all_cpuslocked+0x46/0x1a0
lru_add_drain_all+0x13/0x20
SyS_mlockall+0xb8/0x1c0
entry_SYSCALL_64_fastpath+0x23/0xc2

-> #3 (cpu_hotplug_lock.rw_sem){++++}:
__lock_acquire+0x10a5/0x1100
lock_acquire+0xea/0x1f0
cpus_read_lock+0x2a/0x90
kmem_cache_create+0x2a/0x1d0
scsi_init_sense_cache+0xa0/0xc0
scsi_add_host_with_dma+0x67/0x360
isci_pci_probe+0x873/0xc90
local_pci_probe+0x42/0xa0
work_for_cpu_fn+0x14/0x20
process_one_work+0x273/0x6b0
worker_thread+0x21b/0x3f0
kthread+0x147/0x180
ret_from_fork+0x2a/0x40

-> #2 (scsi_sense_cache_mutex){+.+.}:
__lock_acquire+0x10a5/0x1100
lock_acquire+0xea/0x1f0
__mutex_lock+0x6c/0x960
mutex_lock_nested+0x1b/0x20
scsi_init_sense_cache+0x3d/0xc0
scsi_add_host_with_dma+0x67/0x360
isci_pci_probe+0x873/0xc90
local_pci_probe+0x42/0xa0
work_for_cpu_fn+0x14/0x20
process_one_work+0x273/0x6b0
worker_thread+0x21b/0x3f0
kthread+0x147/0x180
ret_from_fork+0x2a/0x40

-> #1 ((&wfc.work)){+.+.}:
process_one_work+0x244/0x6b0
worker_thread+0x21b/0x3f0
kthread+0x147/0x180
ret_from_fork+0x2a/0x40
0xffffffffffffffff

-> #0 ((complete)&barr->done){+.+.}:
check_prev_add+0x3be/0x700
__lock_acquire+0x10a5/0x1100
lock_acquire+0xea/0x1f0
wait_for_completion+0x3b/0x130
flush_work+0x1fd/0x2c0
lru_add_drain_all_cpuslocked+0x158/0x1a0
lru_add_drain_all+0x13/0x20
SyS_mlockall+0xb8/0x1c0
entry_SYSCALL_64_fastpath+0x23/0xc2

other info that might help us debug this:

Chain exists of:
(complete)&barr->done --> cpu_hotplug_lock.rw_sem --> lock#3

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(lock#3);
lock(cpu_hotplug_lock.rw_sem);
lock(lock#3);
lock((complete)&barr->done);

*** DEADLOCK ***

2 locks held by startpar/582:
#0: (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8122e9ce>] lru_add_drain_all+0xe/0x20
#1: (lock#3){+.+.}, at: [<ffffffff8122e866>] lru_add_drain_all_cpuslocked+0x46/0x1a0

stack backtrace:
CPU: 23 PID: 582 Comm: startpar Tainted: G W 4.13.0-rc2-00317-gadc6764a3adf-dirty #797
Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
Call Trace:
 dump_stack+0x86/0xcf
 print_circular_bug+0x203/0x2f0
 check_prev_add+0x3be/0x700
 ? add_lock_to_list.isra.30+0xc0/0xc0
 ? is_bpf_text_address+0x82/0xe0
 ? unwind_get_return_address+0x1f/0x30
 __lock_acquire+0x10a5/0x1100
 ? __lock_acquire+0x10a5/0x1100
 ? add_lock_to_list.isra.30+0xc0/0xc0
 lock_acquire+0xea/0x1f0
 ? flush_work+0x1fd/0x2c0
 wait_for_completion+0x3b/0x130
 ? flush_work+0x1fd/0x2c0
 flush_work+0x1fd/0x2c0
 ? flush_workqueue_prep_pwqs+0x1c0/0x1c0
 ? trace_hardirqs_on+0xd/0x10
 lru_add_drain_all_cpuslocked+0x158/0x1a0
 lru_add_drain_all+0x13/0x20
 SyS_mlockall+0xb8/0x1c0
 entry_SYSCALL_64_fastpath+0x23/0xc2
RIP: 0033:0x7f818d2e54c7
RSP: 002b:00007fffcce83798 EFLAGS: 00000246 ORIG_RAX: 0000000000000097
RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f818d2e54c7
RDX: 0000000000000000 RSI: 00007fffcce83650 RDI: 0000000000000003
RBP: 000000000002c010 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000008 R11: 0000000000000246 R12: 000000000002d000
R13: 000000000002c010 R14: 0000000000001000 R15: 00007f818d599b00

2017-08-10 00:57:12

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Wed, Aug 09, 2017 at 05:50:59PM +0200, Peter Zijlstra wrote:
>
>
> Heh, look what it does...

It does not happen on my machine..

I think it happens because of your "Simplify xhlock ring buffer
invalidation" patch.

First of all, could you revert yours and check whether it still happens?
If not, we have to think about the simplification more.

BTW, does your patch consider the possibility that a worker and irqs can
be nested? Is there no problem even in that case?

>
>
> [... lockdep splat snipped, identical to the one above ...]

2017-08-10 01:25:46

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

On Wed, Aug 09, 2017 at 12:24:39PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 09, 2017 at 11:51:07AM +0200, Peter Zijlstra wrote:
> > On Mon, Aug 07, 2017 at 04:12:56PM +0900, Byungchul Park wrote:
> > > +static inline void wait_for_completion(struct completion *x)
> > > +{
> > > + complete_acquire(x);
> > > + __wait_for_completion(x);
> > > + complete_release(x);
> > > +}
> > > +
> > > +static inline void wait_for_completion_io(struct completion *x)
> > > +{
> > > + complete_acquire(x);
> > > + __wait_for_completion_io(x);
> > > + complete_release(x);
> > > +}
> > > +
> > > +static inline int wait_for_completion_interruptible(struct completion *x)
> > > +{
> > > + int ret;
> > > + complete_acquire(x);
> > > + ret = __wait_for_completion_interruptible(x);
> > > + complete_release(x);
> > > + return ret;
> > > +}
> > > +
> > > +static inline int wait_for_completion_killable(struct completion *x)
> > > +{
> > > + int ret;
> > > + complete_acquire(x);
> > > + ret = __wait_for_completion_killable(x);
> > > + complete_release(x);
> > > + return ret;
> > > +}
> >
> > I don't understand, why not change __wait_for_common() ?
>
> That is what is wrong with the below?
>
> Yes, it adds acquire/release to the timeout variants too, but I don't

Yes, I didn't want to involve them in lockdep's _deadlock_ reporting,
since a wait with a timeout is not a dependency causing a deadlock.

> see why we should exclude those, and even if we'd want to do that, it
> would be trivial:
>
> bool timo = (timeout == MAX_SCHEDULE_TIMEOUT);
>
> if (!timo)
> complete_acquire(x);
>
> /* ... */
>
> if (!timo)
> complete_release(x);

Yes, frankly I wanted to use this.. but skipped it.

> But like said, I think we very much want to annotate waits with timeouts
> too. Hitting the max timo doesn't necessarily mean we'll make fwd
> progress, we could be stuck in a loop doing something else again before
> returning to wait.

In that case, it should be detected via the other dependencies that cause
the problem, not via the dependency from wait_for_completion().

> Also, even if we'd make fwd progress, hitting that max timo is still not
> desirable.

It's not desirable, but it's not a dependency causing a deadlock, so I
did not want a _deadlock_ warning in those cases.. I didn't want to
abuse lockdep reports..

However, it's OK if you think it's worth warning even in those cases.

Thank you very much,
Byungchul

2017-08-10 01:32:10

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 05/14] lockdep: Implement crossrelease feature

On Wed, Aug 09, 2017 at 04:05:35PM +0200, Peter Zijlstra wrote:
> On Mon, Aug 07, 2017 at 04:12:52PM +0900, Byungchul Park wrote:
> > diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> > index fffe49f..0c8a1b8 100644
> > --- a/include/linux/lockdep.h
> > +++ b/include/linux/lockdep.h
> > @@ -467,6 +520,49 @@ static inline void lockdep_on(void)
> >
> > #endif /* !LOCKDEP */
> >
> > +enum context_t {
> > + HARD,
> > + SOFT,
> > + PROC,
> > + CONTEXT_NR,
> > +};
>
> Since this is the global namespace and those being somewhat generic
> names, I've renamed the lot:
>
> +enum xhlock_context_t {
> + XHLOCK_HARD,
> + XHLOCK_SOFT,
> + XHLOCK_PROC,
> + XHLOCK_NR,
> +};

I like it. Thank you.

One bit of feedback: it is easy to confuse XHLOCK_NR with
MAX_XHLOCKS_NR. What about the following?

+enum xhlock_context_t {
+ XHLOCK_HARD,
+ XHLOCK_SOFT,
+ XHLOCK_PROC,
+ XHLOCK_CXT_NR,
+};

But it's trivial. I like yours, too.

2017-08-10 01:33:32

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Wed, Aug 09, 2017 at 04:16:05PM +0200, Peter Zijlstra wrote:
> Hehe, _another_ scheme...
>
> Yes I think this works.. but I had just sort of understood the last one.
>
> How about I do this on top? That I think is a combination of what I
> proposed last and your single invalidate thing. Combined they solve the
> problem with the least amount of extra storage (a single int).

I'm sorry to say this, but I'm not sure this works well.

> ---
> Subject: lockdep: Simplify xhlock ring buffer invalidation
> From: Peter Zijlstra <[email protected]>
> Date: Wed Aug 9 15:31:27 CEST 2017
>
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> include/linux/lockdep.h | 20 -----------
> include/linux/sched.h | 4 --
> kernel/locking/lockdep.c | 82 ++++++++++++++++++++++++++++++-----------------
> 3 files changed, 54 insertions(+), 52 deletions(-)
>
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -284,26 +284,6 @@ struct held_lock {
> */
> struct hist_lock {
> /*
> - * Id for each entry in the ring buffer. This is used to
> - * decide whether the ring buffer was overwritten or not.
> - *
> - * For example,
> - *
> - * |<----------- hist_lock ring buffer size ------->|
> - * pppppppppppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiii
> - * wrapped > iiiiiiiiiiiiiiiiiiiiiiiiiii.......................
> - *
> - * where 'p' represents an acquisition in process
> - * context, 'i' represents an acquisition in irq
> - * context.
> - *
> - * In this example, the ring buffer was overwritten by
> - * acquisitions in irq context, that should be detected on
> - * rollback or commit.
> - */
> - unsigned int hist_id;
> -
> - /*
> * Seperate stack_trace data. This will be used at commit step.
> */
> struct stack_trace trace;
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -855,9 +855,7 @@ struct task_struct {
> unsigned int xhlock_idx;
> /* For restoring at history boundaries */
> unsigned int xhlock_idx_hist[XHLOCK_NR];
> - unsigned int hist_id;
> - /* For overwrite check at each context exit */
> - unsigned int hist_id_save[XHLOCK_NR];
> + unsigned int xhlock_idx_max;
> #endif
>
> #ifdef CONFIG_UBSAN
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -4818,26 +4818,65 @@ void crossrelease_hist_start(enum contex
> {
> struct task_struct *cur = current;
>
> - if (cur->xhlocks) {
> + if (cur->xhlocks)
> cur->xhlock_idx_hist[c] = cur->xhlock_idx;
> - cur->hist_id_save[c] = cur->hist_id;
> - }
> }
>
> void crossrelease_hist_end(enum context_t c)
> {
> struct task_struct *cur = current;
> + unsigned int idx;
>
> - if (cur->xhlocks) {
> - unsigned int idx = cur->xhlock_idx_hist[c];
> - struct hist_lock *h = &xhlock(idx);
> -
> - cur->xhlock_idx = idx;
> -
> - /* Check if the ring was overwritten. */
> - if (h->hist_id != cur->hist_id_save[c])
> - invalidate_xhlock(h);
> - }
> + if (!cur->xhlocks)
> + return;
> +
> + idx = cur->xhlock_idx_hist[c];
> + cur->xhlock_idx = idx;
> +
> + /*
> + * A bit of magic here.. this deals with rewinding the (cyclic) history
> + * array further than its size. IOW, it loses the complete history.
> + *
> + * We detect this by tracking the previous oldest entry we've (over)
> + * written in @xhlock_idx_max, this means the next entry is the oldest
> + * entry still in the buffer, ie. its tail.
> + *
> + * So when we restore an @xhlock_idx that is at least MAX_XHLOCKS_NR
> + * older than @xhlock_idx_max we know we've just wiped the entire
> + * history.
> + */
> + if ((cur->xhlock_idx_max - idx) < MAX_XHLOCKS_NR)
> + return;
> +
> + /*
> + * Now that we know the buffer is effectively empty, reset our state
> + * such that it appears empty (without in fact clearing the entire
> + * buffer).
> + *
> + * Pick @idx as the 'new' beginning, (re)set all save-points to not
> + * rewind past it and reset the max. Then invalidate this idx such that
> + * commit_xhlocks() will never rewind past it. Since xhlock_idx_inc()
> + * will return the _next_ entry, we'll not overwrite this invalid entry
> + * until the entire buffer is full again.
> + */
> + for (c = 0; c < XHLOCK_NR; c++)
> + cur->xhlock_idx_hist[c] = idx;
> + cur->xhlock_idx_max = idx;
> + invalidate_xhlock(&xhlock(idx));
> +}
> +
> +static inline unsigned int xhlock_idx_inc(void)
> +{
> + struct task_struct *cur = current;
> + unsigned int idx = ++cur->xhlock_idx;
> +
> + /*
> + * As per the requirement in crossrelease_hist_end(), track the tail.
> + */
> + if ((int)(cur->xhlock_idx_max - idx) < 0)
> + cur->xhlock_idx_max = idx;
> +
> + return idx;
> }
>
> static int cross_lock(struct lockdep_map *lock)
> @@ -4902,7 +4941,7 @@ static inline int xhlock_valid(struct hi
> */
> static void add_xhlock(struct held_lock *hlock)
> {
> - unsigned int idx = ++current->xhlock_idx;
> + unsigned int idx = xhlock_idx_inc();
> struct hist_lock *xhlock = &xhlock(idx);
>
> #ifdef CONFIG_DEBUG_LOCKDEP
> @@ -4915,7 +4954,6 @@ static void add_xhlock(struct held_lock
>
> /* Initialize hist_lock's members */
> xhlock->hlock = *hlock;
> - xhlock->hist_id = current->hist_id++;
>
> xhlock->trace.nr_entries = 0;
> xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
> @@ -5071,7 +5109,6 @@ static int commit_xhlock(struct cross_lo
> static void commit_xhlocks(struct cross_lock *xlock)
> {
> unsigned int cur = current->xhlock_idx;
> - unsigned int prev_hist_id = xhlock(cur).hist_id;
> unsigned int i;
>
> if (!graph_lock())
> @@ -5091,17 +5128,6 @@ static void commit_xhlocks(struct cross_
> break;
>
> /*
> - * Filter out the cases that the ring buffer was
> - * overwritten and the previous entry has a bigger
> - * hist_id than the following one, which is impossible
> - * otherwise.
> - */
> - if (unlikely(before(xhlock->hist_id, prev_hist_id)))
> - break;
> -
> - prev_hist_id = xhlock->hist_id;
> -
> - /*
> * commit_xhlock() returns 0 with graph_lock already
> * released if fail.
> */
> @@ -5186,11 +5212,9 @@ void lockdep_init_task(struct task_struc
> int i;
>
> task->xhlock_idx = UINT_MAX;
> - task->hist_id = 0;
>
> for (i = 0; i < XHLOCK_NR; i++) {
> task->xhlock_idx_hist[i] = UINT_MAX;
> - task->hist_id_save[i] = 0;
> }
>
> task->xhlocks = kzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR,

2017-08-10 01:36:17

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 11/14] lockdep: Apply crossrelease to PG_locked locks

On Mon, Aug 07, 2017 at 04:12:58PM +0900, Byungchul Park wrote:
> Although lock_page() and its family can cause deadlock, the lock
> correctness validator could not be applied to them until now, because
> things like unlock_page() might be called in a different context from
> the acquisition context, which violates lockdep's assumption.
>
> Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
> detector to page locks. Applied it.

Is there any reason for excluding the patch applying it to PG_locked?

2017-08-10 03:48:34

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Thu, Aug 10, 2017 at 09:55:56AM +0900, Byungchul Park wrote:
> On Wed, Aug 09, 2017 at 05:50:59PM +0200, Peter Zijlstra wrote:
> >
> >
> > Heh, look what it does...
>
> It does not happen on my machine..
>
> I think it happens because of your "Simplify xhlock ring buffer
> invalidation" patch.
>
> First of all, could you revert yours and check whether it still happens?
> If not, we have to think about the simplification more.
>
> BTW, does your patch consider the possibility that a worker and irqs can
> be nested? Is there no problem even in that case?

In addition, now that each syscall context is isolated by your suggestion
with crossrelease_hist_end() and crossrelease_hist_start(), contexts can
easily be nested. I want to keep my patches unchanged at first and change
the code carefully.

>
> >
> >
> > [... lockdep splat snipped, identical to the one above ...]

2017-08-10 09:21:53

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v8 05/14] lockdep: Implement crossrelease feature

On Thu, Aug 10, 2017 at 10:30:54AM +0900, Byungchul Park wrote:

> One bit of feedback: it is easy to confuse XHLOCK_NR with
> MAX_XHLOCKS_NR. What about the following?
>
> +enum xhlock_context_t {
> + XHLOCK_HARD,
> + XHLOCK_SOFT,
> + XHLOCK_PROC,
> + XHLOCK_CXT_NR,
> +};
>
> But it's trivial. I like yours, too.

grep -l "XHLOCK_NR" `quilt series` | while read file; do sed -i
's/XHLOCK_NR/XHLOCK_CTX_NR/g' $file; done

:-)

2017-08-10 09:22:53

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Thu, Aug 10, 2017 at 10:32:16AM +0900, Byungchul Park wrote:
> On Wed, Aug 09, 2017 at 04:16:05PM +0200, Peter Zijlstra wrote:
> > Hehe, _another_ scheme...
> >
> > Yes I think this works.. but I had just sort of understood the last one.
> >
> > How about I do this on top? That I think is a combination of what I
> > proposed last and your single invalidate thing. Combined they solve the
> > problem with the least amount of extra storage (a single int).
>
> I'm sorry to say this, but I'm not sure this works well.

OK, I'll sit on the patch a little while, if you could share your
concerns then maybe I can improve the comments ;-)

2017-08-10 09:25:35

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v8 11/14] lockdep: Apply crossrelease to PG_locked locks

On Thu, Aug 10, 2017 at 10:35:02AM +0900, Byungchul Park wrote:
> On Mon, Aug 07, 2017 at 04:12:58PM +0900, Byungchul Park wrote:
> > Although lock_page() and its family can cause deadlock, the lock
> > correctness validator could not be applied to them until now, because
> > things like unlock_page() might be called in a different context from
> > the acquisition context, which violates lockdep's assumption.
> >
> > Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
> > detector to page locks. Applied it.
>
> Is there any reason for excluding the patch applying it to PG_locked?

Wanted to start small..

2017-08-10 09:38:23

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Wed, Aug 09, 2017 at 05:50:59PM +0200, Peter Zijlstra wrote:
>
>
> Heh, look what it does...

Wait.. excuse me, but.. is it a real problem?

>
>
> [... lockdep splat snipped, identical to the one above ...]

2017-08-10 10:34:12

by Byungchul Park

Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Wed, Aug 09, 2017 at 04:16:05PM +0200, Peter Zijlstra wrote:
>
> Hehe, _another_ scheme...
>
> Yes I think this works.. but I had just sort of understood the last one.
>
> How about I do this on top? That I think is a combination of what I
> proposed last and your single invalidate thing. Combined they solve the
> problem with the least amount of extra storage (a single int).
>

I like your attempt because it makes the code simpler, but there are
some cases the patch does not cover.

pppppppppppppppwwwwwwwwwwwwwwwwwwwwwwwwwiiiiiiiiiiiii
wrapped > iiiiiiiiiiiiiiiiiiiii................................

where,
p: process
w: work
i: irq

In this case, your patch cannot detect that the 'w' entries were
overwritten by 'i'. What do you think about it?

> ---
> Subject: lockdep: Simplify xhlock ring buffer invalidation
> From: Peter Zijlstra <[email protected]>
> Date: Wed Aug 9 15:31:27 CEST 2017
>
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> include/linux/lockdep.h | 20 -----------
> include/linux/sched.h | 4 --
> kernel/locking/lockdep.c | 82 ++++++++++++++++++++++++++++++-----------------
> 3 files changed, 54 insertions(+), 52 deletions(-)
>
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -284,26 +284,6 @@ struct held_lock {
> */
> struct hist_lock {
> /*
> - * Id for each entry in the ring buffer. This is used to
> - * decide whether the ring buffer was overwritten or not.
> - *
> - * For example,
> - *
> - * |<----------- hist_lock ring buffer size ------->|
> - * pppppppppppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiii
> - * wrapped > iiiiiiiiiiiiiiiiiiiiiiiiiii.......................
> - *
> - * where 'p' represents an acquisition in process
> - * context, 'i' represents an acquisition in irq
> - * context.
> - *
> - * In this example, the ring buffer was overwritten by
> - * acquisitions in irq context, that should be detected on
> - * rollback or commit.
> - */
> - unsigned int hist_id;
> -
> - /*
> * Seperate stack_trace data. This will be used at commit step.
> */
> struct stack_trace trace;
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -855,9 +855,7 @@ struct task_struct {
> unsigned int xhlock_idx;
> /* For restoring at history boundaries */
> unsigned int xhlock_idx_hist[XHLOCK_NR];
> - unsigned int hist_id;
> - /* For overwrite check at each context exit */
> - unsigned int hist_id_save[XHLOCK_NR];
> + unsigned int xhlock_idx_max;
> #endif
>
> #ifdef CONFIG_UBSAN
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -4818,26 +4818,65 @@ void crossrelease_hist_start(enum contex
> {
> struct task_struct *cur = current;
>
> - if (cur->xhlocks) {
> + if (cur->xhlocks)
> cur->xhlock_idx_hist[c] = cur->xhlock_idx;
> - cur->hist_id_save[c] = cur->hist_id;
> - }
> }
>
> void crossrelease_hist_end(enum context_t c)
> {
> struct task_struct *cur = current;
> + unsigned int idx;
>
> - if (cur->xhlocks) {
> - unsigned int idx = cur->xhlock_idx_hist[c];
> - struct hist_lock *h = &xhlock(idx);
> -
> - cur->xhlock_idx = idx;
> -
> - /* Check if the ring was overwritten. */
> - if (h->hist_id != cur->hist_id_save[c])
> - invalidate_xhlock(h);
> - }
> + if (!cur->xhlocks)
> + return;
> +
> + idx = cur->xhlock_idx_hist[c];
> + cur->xhlock_idx = idx;
> +
> + /*
> + * A bit of magic here.. this deals with rewinding the (cyclic) history
> + * array further than its size. IOW. looses the complete history.
> + *
> + * We detect this by tracking the previous oldest entry we've (over)
> + * written in @xhlock_idx_max, this means the next entry is the oldest
> + * entry still in the buffer, ie. its tail.
> + *
> + * So when we restore an @xhlock_idx that is at least MAX_XHLOCKS_NR
> + * older than @xhlock_idx_max we know we've just wiped the entire
> + * history.
> + */
> + if ((cur->xhlock_idx_max - idx) < MAX_XHLOCKS_NR)
> + return;
> +
> + /*
> + * Now that we know the buffer is effectively empty, reset our state
> + * such that it appears empty (without in fact clearing the entire
> + * buffer).
> + *
> + * Pick @idx as the 'new' beginning, (re)set all save-points to not
> + * rewind past it and reset the max. Then invalidate this idx such that
> + * commit_xhlocks() will never rewind past it. Since xhlock_idx_inc()
> + * will return the _next_ entry, we'll not overwrite this invalid entry
> + * until the entire buffer is full again.
> + */
> + for (c = 0; c < XHLOCK_NR; c++)
> + cur->xhlock_idx_hist[c] = idx;
> + cur->xhlock_idx_max = idx;
> + invalidate_xhlock(&xhlock(idx));
> +}
> +
> +static inline unsigned int xhlock_idx_inc(void)
> +{
> + struct task_struct *cur = current;
> + unsigned int idx = ++cur->xhlock_idx;
> +
> + /*
> + * As per the requirement in crossrelease_hist_end(), track the tail.
> + */
> + if ((int)(cur->xhlock_idx_max - idx) < 0)
> + cur->xhlock_idx_max = idx;
> +
> + return idx;
> }
>
> static int cross_lock(struct lockdep_map *lock)
> @@ -4902,7 +4941,7 @@ static inline int xhlock_valid(struct hi
> */
> static void add_xhlock(struct held_lock *hlock)
> {
> - unsigned int idx = ++current->xhlock_idx;
> + unsigned int idx = xhlock_idx_inc();
> struct hist_lock *xhlock = &xhlock(idx);
>
> #ifdef CONFIG_DEBUG_LOCKDEP
> @@ -4915,7 +4954,6 @@ static void add_xhlock(struct held_lock
>
> /* Initialize hist_lock's members */
> xhlock->hlock = *hlock;
> - xhlock->hist_id = current->hist_id++;
>
> xhlock->trace.nr_entries = 0;
> xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
> @@ -5071,7 +5109,6 @@ static int commit_xhlock(struct cross_lo
> static void commit_xhlocks(struct cross_lock *xlock)
> {
> unsigned int cur = current->xhlock_idx;
> - unsigned int prev_hist_id = xhlock(cur).hist_id;
> unsigned int i;
>
> if (!graph_lock())
> @@ -5091,17 +5128,6 @@ static void commit_xhlocks(struct cross_
> break;
>
> /*
> - * Filter out the cases that the ring buffer was
> - * overwritten and the previous entry has a bigger
> - * hist_id than the following one, which is impossible
> - * otherwise.
> - */
> - if (unlikely(before(xhlock->hist_id, prev_hist_id)))
> - break;
> -
> - prev_hist_id = xhlock->hist_id;
> -
> - /*
> * commit_xhlock() returns 0 with graph_lock already
> * released if fail.
> */
> @@ -5186,11 +5212,9 @@ void lockdep_init_task(struct task_struc
> int i;
>
> task->xhlock_idx = UINT_MAX;
> - task->hist_id = 0;
>
> for (i = 0; i < XHLOCK_NR; i++) {
> task->xhlock_idx_hist[i] = UINT_MAX;
> - task->hist_id_save[i] = 0;
> }
>
> task->xhlocks = kzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR,

2017-08-10 10:52:20

by Peter Zijlstra

Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Thu, Aug 10, 2017 at 06:37:07PM +0900, Byungchul Park wrote:
> On Wed, Aug 09, 2017 at 05:50:59PM +0200, Peter Zijlstra wrote:
> >
> >
> > Heh, look what it does...
>
> Wait.. excuse me but.. is it a real problem?

I've not tried again with my patch removed -- I'm chasing another issue
atm. But note that I'm running this on tip/master which has a bunch of
hotplug lock rework in, and that sequence includes hotplug lock.

2017-08-10 10:53:21

by Byungchul Park

Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Thu, Aug 10, 2017 at 09:55:56AM +0900, Byungchul Park wrote:
> On Wed, Aug 09, 2017 at 05:50:59PM +0200, Peter Zijlstra wrote:
> >
> >
> > Heh, look what it does...
>
> It does not happen in my machine..
>
> I think it happens because of your "Simplify xhlock ring buffer
> invalidation" patch.

I misunderstood your simplification patch. I think it works well unless
overwriting occurs, but it does not work once overwriting occurs.

Anyway, if crossrelease and lockdep report the following in a normal
condition, it would be a desirable result.

What do you think about the following report? Positive? Or negative?

>
> First of all, could you reverse yours and check if it happens, too?
> If not, we have to think the simplification more.
>
> BTW, does your patch consider the possibility that a worker and irqs can
> be nested? Is it no problem even in the case?
>
> >
> >
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 4.13.0-rc2-00317-gadc6764a3adf-dirty #797 Tainted: G W
> > ------------------------------------------------------
> > startpar/582 is trying to acquire lock:
> >  ((complete)&barr->done){+.+.}, at: [<ffffffff8110de4d>] flush_work+0x1fd/0x2c0
> >
> > but task is already holding lock:
> >  (lock#3){+.+.}, at: [<ffffffff8122e866>] lru_add_drain_all_cpuslocked+0x46/0x1a0
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #4 (lock#3){+.+.}:
> > __lock_acquire+0x10a5/0x1100
> > lock_acquire+0xea/0x1f0
> > __mutex_lock+0x6c/0x960
> > mutex_lock_nested+0x1b/0x20
> > lru_add_drain_all_cpuslocked+0x46/0x1a0
> > lru_add_drain_all+0x13/0x20
> > SyS_mlockall+0xb8/0x1c0
> > entry_SYSCALL_64_fastpath+0x23/0xc2
> >
> > -> #3 (cpu_hotplug_lock.rw_sem){++++}:
> > __lock_acquire+0x10a5/0x1100
> > lock_acquire+0xea/0x1f0
> > cpus_read_lock+0x2a/0x90
> > kmem_cache_create+0x2a/0x1d0
> > scsi_init_sense_cache+0xa0/0xc0
> > scsi_add_host_with_dma+0x67/0x360
> > isci_pci_probe+0x873/0xc90
> > local_pci_probe+0x42/0xa0
> > work_for_cpu_fn+0x14/0x20
> > process_one_work+0x273/0x6b0
> > worker_thread+0x21b/0x3f0
> > kthread+0x147/0x180
> > ret_from_fork+0x2a/0x40
> >
> > -> #2 (scsi_sense_cache_mutex){+.+.}:
> > __lock_acquire+0x10a5/0x1100
> > lock_acquire+0xea/0x1f0
> > __mutex_lock+0x6c/0x960
> > mutex_lock_nested+0x1b/0x20
> > scsi_init_sense_cache+0x3d/0xc0
> > scsi_add_host_with_dma+0x67/0x360
> > isci_pci_probe+0x873/0xc90
> > local_pci_probe+0x42/0xa0
> > work_for_cpu_fn+0x14/0x20
> > process_one_work+0x273/0x6b0
> > worker_thread+0x21b/0x3f0
> > kthread+0x147/0x180
> > ret_from_fork+0x2a/0x40
> >
> > -> #1 ((&wfc.work)){+.+.}:
> > process_one_work+0x244/0x6b0
> > worker_thread+0x21b/0x3f0
> > kthread+0x147/0x180
> > ret_from_fork+0x2a/0x40
> > 0xffffffffffffffff
> >
> > -> #0 ((complete)&barr->done){+.+.}:
> > check_prev_add+0x3be/0x700
> > __lock_acquire+0x10a5/0x1100
> > lock_acquire+0xea/0x1f0
> > wait_for_completion+0x3b/0x130
> > flush_work+0x1fd/0x2c0
> > lru_add_drain_all_cpuslocked+0x158/0x1a0
> > lru_add_drain_all+0x13/0x20
> > SyS_mlockall+0xb8/0x1c0
> > entry_SYSCALL_64_fastpath+0x23/0xc2
> >
> > other info that might help us debug this:
> >
> > Chain exists of:
> > (complete)&barr->done --> cpu_hotplug_lock.rw_sem --> lock#3
> >
> > Possible unsafe locking scenario:
> >
> > CPU0 CPU1
> > ---- ----
> > lock(lock#3);
> > lock(cpu_hotplug_lock.rw_sem);
> > lock(lock#3);
> > lock((complete)&barr->done);
> >
> > *** DEADLOCK ***
> >
> > 2 locks held by startpar/582:
> > #0:  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff8122e9ce>] lru_add_drain_all+0xe/0x20
> > #1:  (lock#3){+.+.}, at: [<ffffffff8122e866>] lru_add_drain_all_cpuslocked+0x46/0x1a0
> >
> > stack backtrace:
> > CPU: 23 PID: 582 Comm: startpar Tainted: G W 4.13.0-rc2-00317-gadc6764a3adf-dirty #797
> > Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
> > Call Trace:
> >  dump_stack+0x86/0xcf
> >  print_circular_bug+0x203/0x2f0
> >  check_prev_add+0x3be/0x700
> >  ? add_lock_to_list.isra.30+0xc0/0xc0
> >  ? is_bpf_text_address+0x82/0xe0
> >  ? unwind_get_return_address+0x1f/0x30
> >  __lock_acquire+0x10a5/0x1100
> >  ? __lock_acquire+0x10a5/0x1100
> >  ? add_lock_to_list.isra.30+0xc0/0xc0
> >  lock_acquire+0xea/0x1f0
> >  ? flush_work+0x1fd/0x2c0
> >  wait_for_completion+0x3b/0x130
> >  ? flush_work+0x1fd/0x2c0
> >  flush_work+0x1fd/0x2c0
> >  ? flush_workqueue_prep_pwqs+0x1c0/0x1c0
> >  ? trace_hardirqs_on+0xd/0x10
> >  lru_add_drain_all_cpuslocked+0x158/0x1a0
> >  lru_add_drain_all+0x13/0x20
> >  SyS_mlockall+0xb8/0x1c0
> >  entry_SYSCALL_64_fastpath+0x23/0xc2
> > RIP: 0033:0x7f818d2e54c7
> > RSP: 002b:00007fffcce83798 EFLAGS: 00000246 ORIG_RAX: 0000000000000097
> > RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007f818d2e54c7
> > RDX: 0000000000000000 RSI: 00007fffcce83650 RDI: 0000000000000003
> > RBP: 000000000002c010 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000008 R11: 0000000000000246 R12: 000000000002d000
> > R13: 000000000002c010 R14: 0000000000001000 R15: 00007f818d599b00

2017-08-10 11:10:26

by Ingo Molnar

Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature


* Byungchul Park <[email protected]> wrote:

> Change from v7
> - rebase on latest tip/sched/core (Jul 26 2017)
> - apply peterz's suggestions
> - simplify code of crossrelease_{hist/soft/hard}_{start/end}
> - exclude a patch avoiding redundant links
> - exclude a patch already applied onto the base

Ok, it's looking pretty good here now, there's one thing I'd like you to change,
please remove all the new Kconfig dependencies:

CONFIG_LOCKDEP_CROSSRELEASE=y
CONFIG_LOCKDEP_COMPLETE=y

and make it all part of PROVE_LOCKING, like most of the other lock debugging bits.

Thanks,

Ingo

2017-08-10 11:46:21

by Byungchul Park

Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Thu, Aug 10, 2017 at 01:10:19PM +0200, Ingo Molnar wrote:
>
> * Byungchul Park <[email protected]> wrote:
>
> > Change from v7
> > - rebase on latest tip/sched/core (Jul 26 2017)
> > - apply peterz's suggestions
> > - simplify code of crossrelease_{hist/soft/hard}_{start/end}
> > - exclude a patch avoiding redundant links
> > - exclude a patch already applied onto the base
>
> Ok, it's looking pretty good here now, there's one thing I'd like you to change,
> please remove all the new Kconfig dependencies:
>
> CONFIG_LOCKDEP_CROSSRELEASE=y
> CONFIG_LOCKDEP_COMPLETE=y
>
> and make it all part of PROVE_LOCKING, like most of the other lock debugging bits.

OK. I will remove them. What about CONFIG_LOCKDEP_PAGELOCK? Should I also
remove it?

2017-08-10 11:59:19

by Boqun Feng

Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Mon, Aug 07, 2017 at 04:12:53PM +0900, Byungchul Park wrote:
> The ring buffer can be overwritten by hardirq/softirq/work contexts.
> That cases must be considered on rollback or commit. For example,
>
> |<------ hist_lock ring buffer size ----->|
> ppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
> wrapped > iiiiiiiiiiiiiiiiiiiiiii....................
>
> where 'p' represents an acquisition in process context,
> 'i' represents an acquisition in irq context.
>
> On irq exit, crossrelease tries to rollback idx to original position,
> but it should not because the entry already has been invalid by
> overwriting 'i'. Avoid rollback or commit for entries overwritten.
>
> Signed-off-by: Byungchul Park <[email protected]>
> ---
> include/linux/lockdep.h | 20 +++++++++++++++++++
> include/linux/sched.h | 3 +++
> kernel/locking/lockdep.c | 52 +++++++++++++++++++++++++++++++++++++++++++-----
> 3 files changed, 70 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index 0c8a1b8..48c244c 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -284,6 +284,26 @@ struct held_lock {
> */
> struct hist_lock {
> /*
> + * Id for each entry in the ring buffer. This is used to
> + * decide whether the ring buffer was overwritten or not.
> + *
> + * For example,
> + *
> + * |<----------- hist_lock ring buffer size ------->|
> + * pppppppppppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiii
> + * wrapped > iiiiiiiiiiiiiiiiiiiiiiiiiii.......................
> + *
> + * where 'p' represents an acquisition in process
> + * context, 'i' represents an acquisition in irq
> + * context.
> + *
> + * In this example, the ring buffer was overwritten by
> + * acquisitions in irq context, that should be detected on
> + * rollback or commit.
> + */
> + unsigned int hist_id;
> +
> + /*
> * Seperate stack_trace data. This will be used at commit step.
> */
> struct stack_trace trace;
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 5becef5..373466b 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -855,6 +855,9 @@ struct task_struct {
> unsigned int xhlock_idx;
> /* For restoring at history boundaries */
> unsigned int xhlock_idx_hist[CONTEXT_NR];
> + unsigned int hist_id;
> + /* For overwrite check at each context exit */
> + unsigned int hist_id_save[CONTEXT_NR];
> #endif
>
> #ifdef CONFIG_UBSAN
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index afd6e64..5168dac 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -4742,6 +4742,17 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
> static atomic_t cross_gen_id; /* Can be wrapped */
>
> /*
> + * Make an entry of the ring buffer invalid.
> + */
> +static inline void invalidate_xhlock(struct hist_lock *xhlock)
> +{
> + /*
> + * Normally, xhlock->hlock.instance must be !NULL.
> + */
> + xhlock->hlock.instance = NULL;
> +}
> +
> +/*
> * Lock history stacks; we have 3 nested lock history stacks:
> *
> * Hard IRQ
> @@ -4773,14 +4784,28 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
> */
> void crossrelease_hist_start(enum context_t c)
> {
> - if (current->xhlocks)
> - current->xhlock_idx_hist[c] = current->xhlock_idx;
> + struct task_struct *cur = current;
> +
> + if (cur->xhlocks) {
> + cur->xhlock_idx_hist[c] = cur->xhlock_idx;
> + cur->hist_id_save[c] = cur->hist_id;
> + }
> }
>
> void crossrelease_hist_end(enum context_t c)
> {
> - if (current->xhlocks)
> - current->xhlock_idx = current->xhlock_idx_hist[c];
> + struct task_struct *cur = current;
> +
> + if (cur->xhlocks) {
> + unsigned int idx = cur->xhlock_idx_hist[c];
> + struct hist_lock *h = &xhlock(idx);
> +
> + cur->xhlock_idx = idx;
> +
> + /* Check if the ring was overwritten. */
> + if (h->hist_id != cur->hist_id_save[c])

Could we use:

if (h->hist_id != idx)

here, and

> + invalidate_xhlock(h);
> + }
> }
>
> static int cross_lock(struct lockdep_map *lock)
> @@ -4826,6 +4851,7 @@ static inline int depend_after(struct held_lock *hlock)
> * Check if the xhlock is valid, which would be false if,
> *
> * 1. Has not used after initializaion yet.
> + * 2. Got invalidated.
> *
> * Remind hist_lock is implemented as a ring buffer.
> */
> @@ -4857,6 +4883,7 @@ static void add_xhlock(struct held_lock *hlock)
>
> /* Initialize hist_lock's members */
> xhlock->hlock = *hlock;
> + xhlock->hist_id = current->hist_id++;

use:

xhlock->hist_id = idx;

and,


>
> xhlock->trace.nr_entries = 0;
> xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
> @@ -4995,6 +5022,7 @@ static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
> static void commit_xhlocks(struct cross_lock *xlock)
> {
> unsigned int cur = current->xhlock_idx;
> + unsigned int prev_hist_id = xhlock(cur).hist_id;

use:
unsigned int prev_hist_id = cur;

here.

Then we can get away with the added fields in task_struct at least.

Thought?

Regards,
Boqun

> unsigned int i;
>
> if (!graph_lock())
> @@ -5013,6 +5041,17 @@ static void commit_xhlocks(struct cross_lock *xlock)
> break;
>
> /*
> + * Filter out the cases that the ring buffer was
> + * overwritten and the previous entry has a bigger
> + * hist_id than the following one, which is impossible
> + * otherwise.
> + */
> + if (unlikely(before(xhlock->hist_id, prev_hist_id)))
> + break;
> +
> + prev_hist_id = xhlock->hist_id;
> +
> + /*
> * commit_xhlock() returns 0 with graph_lock already
> * released if fail.
> */
> @@ -5085,9 +5124,12 @@ void lockdep_init_task(struct task_struct *task)
> int i;
>
> task->xhlock_idx = UINT_MAX;
> + task->hist_id = 0;
>
> - for (i = 0; i < CONTEXT_NR; i++)
> + for (i = 0; i < CONTEXT_NR; i++) {
> task->xhlock_idx_hist[i] = UINT_MAX;
> + task->hist_id_save[i] = 0;
> + }
>
> task->xhlocks = kzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR,
> GFP_KERNEL);
> --
> 1.9.1
>


Subject: [tip:locking/core] locking/lockdep: Refactor lookup_chain_cache()

Commit-ID: 545c23f2e954eb3365629b20ceeef4eadb1ff97f
Gitweb: http://git.kernel.org/tip/545c23f2e954eb3365629b20ceeef4eadb1ff97f
Author: Byungchul Park <[email protected]>
AuthorDate: Mon, 7 Aug 2017 16:12:48 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 10 Aug 2017 12:29:05 +0200

locking/lockdep: Refactor lookup_chain_cache()

Currently, lookup_chain_cache() provides both 'lookup' and 'add'
functionalities in a function. However, each is useful. So this
patch makes lookup_chain_cache() only do 'lookup' functionality and
makes add_chain_cache() only do 'add' functionality. And it's more
readable than before.

Signed-off-by: Byungchul Park <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/locking/lockdep.c | 141 +++++++++++++++++++++++++++++++----------------
1 file changed, 93 insertions(+), 48 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index b2dd313..e029f2f 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2151,14 +2151,15 @@ static int check_no_collision(struct task_struct *curr,
}

/*
- * Look up a dependency chain. If the key is not present yet then
- * add it and return 1 - in this case the new dependency chain is
- * validated. If the key is already hashed, return 0.
- * (On return with 1 graph_lock is held.)
+ * Adds a dependency chain into chain hashtable. And must be called with
+ * graph_lock held.
+ *
+ * Return 0 if fail, and graph_lock is released.
+ * Return 1 if succeed, with graph_lock held.
*/
-static inline int lookup_chain_cache(struct task_struct *curr,
- struct held_lock *hlock,
- u64 chain_key)
+static inline int add_chain_cache(struct task_struct *curr,
+ struct held_lock *hlock,
+ u64 chain_key)
{
struct lock_class *class = hlock_class(hlock);
struct hlist_head *hash_head = chainhashentry(chain_key);
@@ -2166,49 +2167,18 @@ static inline int lookup_chain_cache(struct task_struct *curr,
int i, j;

/*
+ * Allocate a new chain entry from the static array, and add
+ * it to the hash:
+ */
+
+ /*
* We might need to take the graph lock, ensure we've got IRQs
* disabled to make this an IRQ-safe lock.. for recursion reasons
* lockdep won't complain about its own locking errors.
*/
if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
return 0;
- /*
- * We can walk it lock-free, because entries only get added
- * to the hash:
- */
- hlist_for_each_entry_rcu(chain, hash_head, entry) {
- if (chain->chain_key == chain_key) {
-cache_hit:
- debug_atomic_inc(chain_lookup_hits);
- if (!check_no_collision(curr, hlock, chain))
- return 0;

- if (very_verbose(class))
- printk("\nhash chain already cached, key: "
- "%016Lx tail class: [%p] %s\n",
- (unsigned long long)chain_key,
- class->key, class->name);
- return 0;
- }
- }
- if (very_verbose(class))
- printk("\nnew hash chain, key: %016Lx tail class: [%p] %s\n",
- (unsigned long long)chain_key, class->key, class->name);
- /*
- * Allocate a new chain entry from the static array, and add
- * it to the hash:
- */
- if (!graph_lock())
- return 0;
- /*
- * We have to walk the chain again locked - to avoid duplicates:
- */
- hlist_for_each_entry(chain, hash_head, entry) {
- if (chain->chain_key == chain_key) {
- graph_unlock();
- goto cache_hit;
- }
- }
if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
if (!debug_locks_off_graph_unlock())
return 0;
@@ -2260,6 +2230,78 @@ cache_hit:
return 1;
}

+/*
+ * Look up a dependency chain.
+ */
+static inline struct lock_chain *lookup_chain_cache(u64 chain_key)
+{
+ struct hlist_head *hash_head = chainhashentry(chain_key);
+ struct lock_chain *chain;
+
+ /*
+ * We can walk it lock-free, because entries only get added
+ * to the hash:
+ */
+ hlist_for_each_entry_rcu(chain, hash_head, entry) {
+ if (chain->chain_key == chain_key) {
+ debug_atomic_inc(chain_lookup_hits);
+ return chain;
+ }
+ }
+ return NULL;
+}
+
+/*
+ * If the key is not present yet in dependency chain cache then
+ * add it and return 1 - in this case the new dependency chain is
+ * validated. If the key is already hashed, return 0.
+ * (On return with 1 graph_lock is held.)
+ */
+static inline int lookup_chain_cache_add(struct task_struct *curr,
+ struct held_lock *hlock,
+ u64 chain_key)
+{
+ struct lock_class *class = hlock_class(hlock);
+ struct lock_chain *chain = lookup_chain_cache(chain_key);
+
+ if (chain) {
+cache_hit:
+ if (!check_no_collision(curr, hlock, chain))
+ return 0;
+
+ if (very_verbose(class)) {
+ printk("\nhash chain already cached, key: "
+ "%016Lx tail class: [%p] %s\n",
+ (unsigned long long)chain_key,
+ class->key, class->name);
+ }
+
+ return 0;
+ }
+
+ if (very_verbose(class)) {
+ printk("\nnew hash chain, key: %016Lx tail class: [%p] %s\n",
+ (unsigned long long)chain_key, class->key, class->name);
+ }
+
+ if (!graph_lock())
+ return 0;
+
+ /*
+ * We have to walk the chain again locked - to avoid duplicates:
+ */
+ chain = lookup_chain_cache(chain_key);
+ if (chain) {
+ graph_unlock();
+ goto cache_hit;
+ }
+
+ if (!add_chain_cache(curr, hlock, chain_key))
+ return 0;
+
+ return 1;
+}
+
static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
struct held_lock *hlock, int chain_head, u64 chain_key)
{
@@ -2270,11 +2312,11 @@ static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
*
* We look up the chain_key and do the O(N^2) check and update of
* the dependencies only if this is a new dependency chain.
- * (If lookup_chain_cache() returns with 1 it acquires
+ * (If lookup_chain_cache_add() return with 1 it acquires
* graph_lock for us)
*/
if (!hlock->trylock && hlock->check &&
- lookup_chain_cache(curr, hlock, chain_key)) {
+ lookup_chain_cache_add(curr, hlock, chain_key)) {
/*
* Check whether last held lock:
*
@@ -2302,14 +2344,17 @@ static int validate_chain(struct task_struct *curr, struct lockdep_map *lock,
* Add dependency only if this lock is not the head
* of the chain, and if it's not a secondary read-lock:
*/
- if (!chain_head && ret != 2)
+ if (!chain_head && ret != 2) {
if (!check_prevs_add(curr, hlock))
return 0;
+ }
+
graph_unlock();
- } else
- /* after lookup_chain_cache(): */
+ } else {
+ /* after lookup_chain_cache_add(): */
if (unlikely(!debug_locks))
return 0;
+ }

return 1;
}

Subject: [tip:locking/core] locking/lockdep: Add a function building a chain between two classes

Commit-ID: 49347a986ab45eb1dafbf25170647c890f8ff192
Gitweb: http://git.kernel.org/tip/49347a986ab45eb1dafbf25170647c890f8ff192
Author: Byungchul Park <[email protected]>
AuthorDate: Mon, 7 Aug 2017 16:12:49 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 10 Aug 2017 12:29:05 +0200

locking/lockdep: Add a function building a chain between two classes

Crossrelease needs to build a chain between two classes regardless of
their contexts. However, add_chain_cache() cannot be used for that
purpose since it assumes that it's called in the acquisition context
of the hlock. So this patch introduces a new function doing it.

Signed-off-by: Byungchul Park <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/locking/lockdep.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 70 insertions(+)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e029f2f..bdf6b31 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2151,6 +2151,76 @@ static int check_no_collision(struct task_struct *curr,
}

/*
+ * This is for building a chain between just two different classes,
+ * instead of adding a new hlock upon current, which is done by
+ * add_chain_cache().
+ *
+ * This can be called in any context with two classes, while
+ * add_chain_cache() must be done within the lock owener's context
+ * since it uses hlock which might be racy in another context.
+ */
+static inline int add_chain_cache_classes(unsigned int prev,
+ unsigned int next,
+ unsigned int irq_context,
+ u64 chain_key)
+{
+ struct hlist_head *hash_head = chainhashentry(chain_key);
+ struct lock_chain *chain;
+
+ /*
+ * Allocate a new chain entry from the static array, and add
+ * it to the hash:
+ */
+
+ /*
+ * We might need to take the graph lock, ensure we've got IRQs
+ * disabled to make this an IRQ-safe lock.. for recursion reasons
+ * lockdep won't complain about its own locking errors.
+ */
+ if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
+ return 0;
+
+ if (unlikely(nr_lock_chains >= MAX_LOCKDEP_CHAINS)) {
+ if (!debug_locks_off_graph_unlock())
+ return 0;
+
+ print_lockdep_off("BUG: MAX_LOCKDEP_CHAINS too low!");
+ dump_stack();
+ return 0;
+ }
+
+ chain = lock_chains + nr_lock_chains++;
+ chain->chain_key = chain_key;
+ chain->irq_context = irq_context;
+ chain->depth = 2;
+ if (likely(nr_chain_hlocks + chain->depth <= MAX_LOCKDEP_CHAIN_HLOCKS)) {
+ chain->base = nr_chain_hlocks;
+ nr_chain_hlocks += chain->depth;
+ chain_hlocks[chain->base] = prev - 1;
+ chain_hlocks[chain->base + 1] = next -1;
+ }
+#ifdef CONFIG_DEBUG_LOCKDEP
+ /*
+ * Important for check_no_collision().
+ */
+ else {
+ if (!debug_locks_off_graph_unlock())
+ return 0;
+
+ print_lockdep_off("BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!");
+ dump_stack();
+ return 0;
+ }
+#endif
+
+ hlist_add_head_rcu(&chain->entry, hash_head);
+ debug_atomic_inc(chain_lookup_misses);
+ inc_chains();
+
+ return 1;
+}
+
+/*
* Adds a dependency chain into chain hashtable. And must be called with
* graph_lock held.
*

Subject: [tip:locking/core] locking/lockdep: Change the meaning of check_prev_add()'s return value

Commit-ID: 70911fdc9576f4eeb3986689a1c9a778a4a4aacb
Gitweb: http://git.kernel.org/tip/70911fdc9576f4eeb3986689a1c9a778a4a4aacb
Author: Byungchul Park <[email protected]>
AuthorDate: Mon, 7 Aug 2017 16:12:50 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 10 Aug 2017 12:29:06 +0200

locking/lockdep: Change the meaning of check_prev_add()'s return value

Firstly, return 1 instead of 2 when 'prev -> next' dependency already
exists. Since the value 2 is not referenced anywhere, just return 1
indicating success in this case.

Secondly, return 2 instead of 1 when successfully added a lock_list
entry with saving stack_trace. With that, a caller can decide whether
to avoid redundant save_trace() on the caller site.

Signed-off-by: Byungchul Park <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/locking/lockdep.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index bdf6b31..7cf02fa 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1881,7 +1881,7 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
if (entry->class == hlock_class(next)) {
if (distance == 1)
entry->distance = 1;
- return 2;
+ return 1;
}
}

@@ -1935,9 +1935,10 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
print_lock_name(hlock_class(next));
printk(KERN_CONT "\n");
dump_stack();
- return graph_lock();
+ if (!graph_lock())
+ return 0;
}
- return 1;
+ return 2;
}

/*

2017-08-10 12:11:44

by Byungchul Park

Subject: RE: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

> -----Original Message-----
> From: Boqun Feng [mailto:[email protected]]
> Sent: Thursday, August 10, 2017 8:59 PM
> To: Byungchul Park
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring
> buffer overwrite
>
> On Mon, Aug 07, 2017 at 04:12:53PM +0900, Byungchul Park wrote:
> > The ring buffer can be overwritten by hardirq/softirq/work contexts.
> > That cases must be considered on rollback or commit. For example,
> >
> > |<------ hist_lock ring buffer size ----->|
> > ppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
> > wrapped > iiiiiiiiiiiiiiiiiiiiiii....................
> >
> > where 'p' represents an acquisition in process context,
> > 'i' represents an acquisition in irq context.
> >
> > On irq exit, crossrelease tries to roll back idx to its original position,
> > but it should not, because the entry has already been invalidated by an
> > overwriting 'i'. Avoid rollback or commit for overwritten entries.
> >
> > Signed-off-by: Byungchul Park <[email protected]>
> > ---
> > include/linux/lockdep.h | 20 +++++++++++++++++++
> > include/linux/sched.h | 3 +++
> > kernel/locking/lockdep.c | 52
> +++++++++++++++++++++++++++++++++++++++++++-----
> > 3 files changed, 70 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> > index 0c8a1b8..48c244c 100644
> > --- a/include/linux/lockdep.h
> > +++ b/include/linux/lockdep.h
> > @@ -284,6 +284,26 @@ struct held_lock {
> > */
> > struct hist_lock {
> > /*
> > + * Id for each entry in the ring buffer. This is used to
> > + * decide whether the ring buffer was overwritten or not.
> > + *
> > + * For example,
> > + *
> > + * |<----------- hist_lock ring buffer size ------->|
> > + * pppppppppppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiii
> > + * wrapped > iiiiiiiiiiiiiiiiiiiiiiiiiii.......................
> > + *
> > + * where 'p' represents an acquisition in process
> > + * context, 'i' represents an acquisition in irq
> > + * context.
> > + *
> > + * In this example, the ring buffer was overwritten by
> > + * acquisitions in irq context, which should be detected on
> > + * rollback or commit.
> > + */
> > + unsigned int hist_id;
> > +
> > + /*
> > * Separate stack_trace data. This will be used at the commit step.
> > */
> > struct stack_trace trace;
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 5becef5..373466b 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -855,6 +855,9 @@ struct task_struct {
> > unsigned int xhlock_idx;
> > /* For restoring at history boundaries */
> > unsigned int xhlock_idx_hist[CONTEXT_NR];
> > + unsigned int hist_id;
> > + /* For overwrite check at each context exit */
> > + unsigned int hist_id_save[CONTEXT_NR];
> > #endif
> >
> > #ifdef CONFIG_UBSAN
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index afd6e64..5168dac 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -4742,6 +4742,17 @@ void lockdep_rcu_suspicious(const char *file,
> const int line, const char *s)
> > static atomic_t cross_gen_id; /* Can be wrapped */
> >
> > /*
> > + * Make an entry of the ring buffer invalid.
> > + */
> > +static inline void invalidate_xhlock(struct hist_lock *xhlock)
> > +{
> > + /*
> > + * Normally, xhlock->hlock.instance must be !NULL.
> > + */
> > + xhlock->hlock.instance = NULL;
> > +}
> > +
> > +/*
> > * Lock history stacks; we have 3 nested lock history stacks:
> > *
> > * Hard IRQ
> > @@ -4773,14 +4784,28 @@ void lockdep_rcu_suspicious(const char *file,
> const int line, const char *s)
> > */
> > void crossrelease_hist_start(enum context_t c)
> > {
> > - if (current->xhlocks)
> > - current->xhlock_idx_hist[c] = current->xhlock_idx;
> > + struct task_struct *cur = current;
> > +
> > + if (cur->xhlocks) {
> > + cur->xhlock_idx_hist[c] = cur->xhlock_idx;
> > + cur->hist_id_save[c] = cur->hist_id;
> > + }
> > }
> >
> > void crossrelease_hist_end(enum context_t c)
> > {
> > - if (current->xhlocks)
> > - current->xhlock_idx = current->xhlock_idx_hist[c];
> > + struct task_struct *cur = current;
> > +
> > + if (cur->xhlocks) {
> > + unsigned int idx = cur->xhlock_idx_hist[c];
> > + struct hist_lock *h = &xhlock(idx);
> > +
> > + cur->xhlock_idx = idx;
> > +
> > + /* Check if the ring was overwritten. */
> > + if (h->hist_id != cur->hist_id_save[c])
>
> Could we use:
>
> if (h->hist_id != idx)

No, we cannot.

hist_id is a kind of timestamp, used to detect whether the data at
the same index of the ring buffer has been overwritten, while idx is
just an index. :) IOW, they mean different things.
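To illustrate the distinction, here is a toy model of the ring buffer: after a multiple of the ring size of inner-context writes, the index comes back to the very same slot, so only a timestamp can reveal the overwrite. Sizes and names below are illustrative, not the kernel's:

```c
#include <assert.h>

#define RB_SIZE 4 /* illustrative; the kernel uses MAX_XHLOCKS_NR */

struct entry {
	unsigned int hist_id; /* timestamp of the write, not the slot index */
	int valid;
};

static struct entry ring[RB_SIZE];
static unsigned int idx;     /* last-written position, free-running */
static unsigned int hist_id; /* global timestamp, bumped per write */

static void push(void)
{
	struct entry *e = &ring[++idx % RB_SIZE];

	e->hist_id = hist_id++;
	e->valid = 1;
}

/* Save state on context entry, like crossrelease_hist_start(). */
static void save_point(unsigned int *idx_save, unsigned int *id_save)
{
	*idx_save = idx;
	*id_save = ring[idx % RB_SIZE].hist_id;
}

/*
 * Roll back on context exit. After any multiple of RB_SIZE inner
 * writes, idx maps back to the same slot, so comparing indexes
 * proves nothing; only the timestamp reveals the overwrite.
 */
static void rollback(unsigned int idx_save, unsigned int id_save)
{
	struct entry *e = &ring[idx_save % RB_SIZE];

	idx = idx_save;
	if (e->hist_id != id_save)
		e->valid = 0; /* overwritten while we were away */
}
```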

>
> here, and
>
> > + invalidate_xhlock(h);
> > + }
> > }
> >
> > static int cross_lock(struct lockdep_map *lock)
> > @@ -4826,6 +4851,7 @@ static inline int depend_after(struct held_lock
> *hlock)
> > * Check if the xhlock is valid, which would be false if,
> > *
> > * 1. Has not been used after initialization yet.
> > + * 2. Got invalidated.
> > *
> > * Remember that hist_lock is implemented as a ring buffer.
> > */
> > @@ -4857,6 +4883,7 @@ static void add_xhlock(struct held_lock *hlock)
> >
> > /* Initialize hist_lock's members */
> > xhlock->hlock = *hlock;
> > + xhlock->hist_id = current->hist_id++;
>
> use:
>
> xhlock->hist_id = idx;
>
> and,

Same.

>
>
> >
> > xhlock->trace.nr_entries = 0;
> > xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
> > @@ -4995,6 +5022,7 @@ static int commit_xhlock(struct cross_lock *xlock,
> struct hist_lock *xhlock)
> > static void commit_xhlocks(struct cross_lock *xlock)
> > {
> > unsigned int cur = current->xhlock_idx;
> > + unsigned int prev_hist_id = xhlock(cur).hist_id;
>
> use:
> unsigned int prev_hist_id = cur;
>
> here.

Same.


Subject: [tip:locking/core] locking/lockdep: Make check_prev_add() able to handle external stack_trace

Commit-ID: ce07a9415f266e181a0a33033a5f7138760240a4
Gitweb: http://git.kernel.org/tip/ce07a9415f266e181a0a33033a5f7138760240a4
Author: Byungchul Park <[email protected]>
AuthorDate: Mon, 7 Aug 2017 16:12:51 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 10 Aug 2017 12:29:06 +0200

locking/lockdep: Make check_prev_add() able to handle external stack_trace

Currently, space for a stack_trace is pinned in check_prev_add(), which
prevents us from using an external stack_trace. The simplest way to
allow one is to pass the external stack_trace as an argument.

A more suitable solution is to additionally pass a callback along with
the stack_trace, so that callers can decide how to save it, or whether
to save it at all. Crossrelease actually needs to do more than save a
stack_trace. So pass a stack_trace and a callback to handle it into
check_prev_add().
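A minimal sketch of the callback shape described above; the types and the pool-backed callback are illustrative stand-ins, loosely modelled on the copy_trace() helper the crossrelease patch introduces later, not the kernel's actual code:

```c
#include <assert.h>
#include <string.h>

#define POOL_MAX 16

/* Minimal stand-ins for the kernel types; names are illustrative. */
struct stack_trace {
	unsigned long *entries;
	unsigned int nr_entries;
};

static unsigned long pool[POOL_MAX];
static unsigned int pool_used;

/*
 * One possible callback: move an already-captured trace into a
 * persistent pool, then point the trace at its new home.
 */
static int copy_into_pool(struct stack_trace *trace)
{
	unsigned int nr = trace->nr_entries;

	if (pool_used + nr > POOL_MAX)
		return 0;
	memcpy(pool + pool_used, trace->entries, nr * sizeof(*trace->entries));
	trace->entries = pool + pool_used;
	pool_used += nr;
	return 1;
}

/*
 * The consumer only sees the callback type, so each caller decides
 * how (or whether) a trace gets saved, as check_prev_add() now does.
 * Returns 0 on failure, 2 on success with the trace consumed.
 */
static int record_dep(struct stack_trace *trace,
		      int (*save)(struct stack_trace *))
{
	if (save && !save(trace))
		return 0;
	return 2;
}
```

Passing a NULL callback lets a caller skip saving entirely, which is exactly what check_prevs_add() does after the first trace is consumed.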

Signed-off-by: Byungchul Park <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/locking/lockdep.c | 40 +++++++++++++++++++---------------------
1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 7cf02fa..841828b 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1824,20 +1824,13 @@ check_deadlock(struct task_struct *curr, struct held_lock *next,
*/
static int
check_prev_add(struct task_struct *curr, struct held_lock *prev,
- struct held_lock *next, int distance, int *stack_saved)
+ struct held_lock *next, int distance, struct stack_trace *trace,
+ int (*save)(struct stack_trace *trace))
{
struct lock_list *entry;
int ret;
struct lock_list this;
struct lock_list *uninitialized_var(target_entry);
- /*
- * Static variable, serialized by the graph_lock().
- *
- * We use this static variable to save the stack trace in case
- * we call into this function multiple times due to encountering
- * trylocks in the held lock stack.
- */
- static struct stack_trace trace;

/*
* Prove that the new <prev> -> <next> dependency would not
@@ -1899,11 +1892,8 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
return print_bfs_bug(ret);


- if (!*stack_saved) {
- if (!save_trace(&trace))
- return 0;
- *stack_saved = 1;
- }
+ if (save && !save(trace))
+ return 0;

/*
* Ok, all validations passed, add the new lock
@@ -1911,14 +1901,14 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
*/
ret = add_lock_to_list(hlock_class(next),
&hlock_class(prev)->locks_after,
- next->acquire_ip, distance, &trace);
+ next->acquire_ip, distance, trace);

if (!ret)
return 0;

ret = add_lock_to_list(hlock_class(prev),
&hlock_class(next)->locks_before,
- next->acquire_ip, distance, &trace);
+ next->acquire_ip, distance, trace);
if (!ret)
return 0;

@@ -1926,8 +1916,6 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
* Debugging printouts:
*/
if (verbose(hlock_class(prev)) || verbose(hlock_class(next))) {
- /* We drop graph lock, so another thread can overwrite trace. */
- *stack_saved = 0;
graph_unlock();
printk("\n new dependency: ");
print_lock_name(hlock_class(prev));
@@ -1951,8 +1939,9 @@ static int
check_prevs_add(struct task_struct *curr, struct held_lock *next)
{
int depth = curr->lockdep_depth;
- int stack_saved = 0;
struct held_lock *hlock;
+ struct stack_trace trace;
+ int (*save)(struct stack_trace *trace) = save_trace;

/*
* Debugging checks.
@@ -1977,9 +1966,18 @@ check_prevs_add(struct task_struct *curr, struct held_lock *next)
* added:
*/
if (hlock->read != 2 && hlock->check) {
- if (!check_prev_add(curr, hlock, next,
- distance, &stack_saved))
+ int ret = check_prev_add(curr, hlock, next,
+ distance, &trace, save);
+ if (!ret)
return 0;
+
+ /*
+ * Stop saving stack_trace if save_trace() was
+ * called at least once:
+ */
+ if (save && ret == 2)
+ save = NULL;
+
/*
* Stop after the first non-trylock entry,
* as non-trylock entries have added their

Subject: [tip:locking/core] locking/lockdep: Implement the 'crossrelease' feature

Commit-ID: b09be676e0ff25bd6d2e7637e26d349f9109ad75
Gitweb: http://git.kernel.org/tip/b09be676e0ff25bd6d2e7637e26d349f9109ad75
Author: Byungchul Park <[email protected]>
AuthorDate: Mon, 7 Aug 2017 16:12:52 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 10 Aug 2017 12:29:07 +0200

locking/lockdep: Implement the 'crossrelease' feature

Lockdep is a runtime locking correctness validator that detects and
reports a deadlock or its possibility by checking dependencies between
locks. It's useful since it does not report just an actual deadlock but
also the possibility of a deadlock that has not actually happened yet.
That enables problems to be fixed before they affect real systems.

However, this facility is only applicable to typical locks, such as
spinlocks and mutexes, which are normally released within the context in
which they were acquired. Synchronization primitives like page locks or
completions, which are allowed to be released in any context, also
create dependencies and can cause a deadlock.

So lockdep should track these locks to do a better job. The 'crossrelease'
implementation makes these primitives trackable as well.
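The ordering test at the heart of this can be sketched with a global generation counter. Names below are illustrative, and the real code additionally uses acquire/release ordering on the counter:

```c
#include <assert.h>

/*
 * Global generation counter, bumped each time a crosslock is held,
 * mirroring cross_gen_id in the patch.
 */
static unsigned int cross_gen_id;

struct xlock { unsigned int gen_id; };
struct hlock { unsigned int gen_id; };

/* Waiter side: holding the crosslock stamps it with a fresh generation. */
static void acquire_cross(struct xlock *x)
{
	x->gen_id = ++cross_gen_id;
}

/* Any ordinary acquisition records the generation it observed. */
static void acquire_normal(struct hlock *h)
{
	h->gen_id = cross_gen_id;
}

/*
 * Signed comparison tolerates counter wrap-around, like before() in
 * the patch: a dependency is recorded only for locks taken after the
 * crosslock started being waited for.
 */
static int depends(const struct xlock *x, const struct hlock *h)
{
	return (int)(x->gen_id - h->gen_id) <= 0;
}
```

At commit time only the history entries whose generation is at least the crosslock's become dependencies; everything acquired before the wait is left alone.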

Signed-off-by: Byungchul Park <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/irqflags.h | 24 ++-
include/linux/lockdep.h | 110 +++++++++-
include/linux/sched.h | 8 +
kernel/exit.c | 1 +
kernel/fork.c | 4 +
kernel/locking/lockdep.c | 508 ++++++++++++++++++++++++++++++++++++++++++++---
kernel/workqueue.c | 2 +
lib/Kconfig.debug | 12 ++
8 files changed, 635 insertions(+), 34 deletions(-)

diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 5dd1272..5fdd93b 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -23,10 +23,26 @@
# define trace_softirq_context(p) ((p)->softirq_context)
# define trace_hardirqs_enabled(p) ((p)->hardirqs_enabled)
# define trace_softirqs_enabled(p) ((p)->softirqs_enabled)
-# define trace_hardirq_enter() do { current->hardirq_context++; } while (0)
-# define trace_hardirq_exit() do { current->hardirq_context--; } while (0)
-# define lockdep_softirq_enter() do { current->softirq_context++; } while (0)
-# define lockdep_softirq_exit() do { current->softirq_context--; } while (0)
+# define trace_hardirq_enter() \
+do { \
+ current->hardirq_context++; \
+ crossrelease_hist_start(XHLOCK_HARD); \
+} while (0)
+# define trace_hardirq_exit() \
+do { \
+ current->hardirq_context--; \
+ crossrelease_hist_end(XHLOCK_HARD); \
+} while (0)
+# define lockdep_softirq_enter() \
+do { \
+ current->softirq_context++; \
+ crossrelease_hist_start(XHLOCK_SOFT); \
+} while (0)
+# define lockdep_softirq_exit() \
+do { \
+ current->softirq_context--; \
+ crossrelease_hist_end(XHLOCK_SOFT); \
+} while (0)
# define INIT_TRACE_IRQFLAGS .softirqs_enabled = 1,
#else
# define trace_hardirqs_on() do { } while (0)
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 0a4c02c..e1e0fcd 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -155,6 +155,12 @@ struct lockdep_map {
int cpu;
unsigned long ip;
#endif
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+ /*
+ * Whether it's a crosslock.
+ */
+ int cross;
+#endif
};

static inline void lockdep_copy_map(struct lockdep_map *to,
@@ -258,8 +264,62 @@ struct held_lock {
unsigned int hardirqs_off:1;
unsigned int references:12; /* 32 bits */
unsigned int pin_count;
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+ /*
+ * Generation id.
+ *
+ * A value of cross_gen_id will be stored when holding this,
+ * which is globally increased whenever each crosslock is held.
+ */
+ unsigned int gen_id;
+#endif
+};
+
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+#define MAX_XHLOCK_TRACE_ENTRIES 5
+
+/*
+ * This is for keeping locks waiting for commit so that true dependencies
+ * can be added at commit step.
+ */
+struct hist_lock {
+ /*
+ * Separate stack_trace data. This will be used at the commit step.
+ */
+ struct stack_trace trace;
+ unsigned long trace_entries[MAX_XHLOCK_TRACE_ENTRIES];
+
+ /*
+ * Separate hlock instance. This will be used at the commit step.
+ *
+ * TODO: Use a smaller data structure containing only necessary
+ * data. However, we should make lockdep code able to handle the
+ * smaller one first.
+ */
+ struct held_lock hlock;
+};
+
+/*
+ * To initialize a lock as crosslock, lockdep_init_map_crosslock() should
+ * be called instead of lockdep_init_map().
+ */
+struct cross_lock {
+ /*
+ * Separate hlock instance. This will be used at the commit step.
+ *
+ * TODO: Use a smaller data structure containing only necessary
+ * data. However, we should make lockdep code able to handle the
+ * smaller one first.
+ */
+ struct held_lock hlock;
};

+struct lockdep_map_cross {
+ struct lockdep_map map;
+ struct cross_lock xlock;
+};
+#endif
+
/*
* Initialization, self-test and debugging-output methods:
*/
@@ -282,13 +342,6 @@ extern void lockdep_init_map(struct lockdep_map *lock, const char *name,
struct lock_class_key *key, int subclass);

/*
- * To initialize a lockdep_map statically use this macro.
- * Note that _name must not be NULL.
- */
-#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
- { .name = (_name), .key = (void *)(_key), }
-
-/*
* Reinitialize a lock key - for cases where there is special locking or
* special initialization of locks so that the validator gets the scope
* of dependencies wrong: they are either too broad (they need a class-split)
@@ -460,6 +513,49 @@ struct pin_cookie { };

#endif /* !LOCKDEP */

+enum xhlock_context_t {
+ XHLOCK_HARD,
+ XHLOCK_SOFT,
+ XHLOCK_PROC,
+ XHLOCK_CTX_NR,
+};
+
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+extern void lockdep_init_map_crosslock(struct lockdep_map *lock,
+ const char *name,
+ struct lock_class_key *key,
+ int subclass);
+extern void lock_commit_crosslock(struct lockdep_map *lock);
+
+#define STATIC_CROSS_LOCKDEP_MAP_INIT(_name, _key) \
+ { .map.name = (_name), .map.key = (void *)(_key), \
+ .map.cross = 1, }
+
+/*
+ * To initialize a lockdep_map statically use this macro.
+ * Note that _name must not be NULL.
+ */
+#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
+ { .name = (_name), .key = (void *)(_key), .cross = 0, }
+
+extern void crossrelease_hist_start(enum xhlock_context_t c);
+extern void crossrelease_hist_end(enum xhlock_context_t c);
+extern void lockdep_init_task(struct task_struct *task);
+extern void lockdep_free_task(struct task_struct *task);
+#else
+/*
+ * To initialize a lockdep_map statically use this macro.
+ * Note that _name must not be NULL.
+ */
+#define STATIC_LOCKDEP_MAP_INIT(_name, _key) \
+ { .name = (_name), .key = (void *)(_key), }
+
+static inline void crossrelease_hist_start(enum xhlock_context_t c) {}
+static inline void crossrelease_hist_end(enum xhlock_context_t c) {}
+static inline void lockdep_init_task(struct task_struct *task) {}
+static inline void lockdep_free_task(struct task_struct *task) {}
+#endif
+
#ifdef CONFIG_LOCK_STAT

extern void lock_contended(struct lockdep_map *lock, unsigned long ip);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 57db70e..5235fba 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -848,6 +848,14 @@ struct task_struct {
struct held_lock held_locks[MAX_LOCK_DEPTH];
#endif

+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+#define MAX_XHLOCKS_NR 64UL
+ struct hist_lock *xhlocks; /* Crossrelease history locks */
+ unsigned int xhlock_idx;
+ /* For restoring at history boundaries */
+ unsigned int xhlock_idx_hist[XHLOCK_CTX_NR];
+#endif
+
#ifdef CONFIG_UBSAN
unsigned int in_ubsan;
#endif
diff --git a/kernel/exit.c b/kernel/exit.c
index c5548fa..fa72d57 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -920,6 +920,7 @@ void __noreturn do_exit(long code)
exit_rcu();
TASKS_RCU(__srcu_read_unlock(&tasks_rcu_exit_srcu, tasks_rcu_i));

+ lockdep_free_task(tsk);
do_task_dead();
}
EXPORT_SYMBOL_GPL(do_exit);
diff --git a/kernel/fork.c b/kernel/fork.c
index 17921b0..cbf2221 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -484,6 +484,8 @@ void __init fork_init(void)
cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "fork:vm_stack_cache",
NULL, free_vm_stack_cache);
#endif
+
+ lockdep_init_task(&init_task);
}

int __weak arch_dup_task_struct(struct task_struct *dst,
@@ -1691,6 +1693,7 @@ static __latent_entropy struct task_struct *copy_process(
p->lockdep_depth = 0; /* no locks held yet */
p->curr_chain_key = 0;
p->lockdep_recursion = 0;
+ lockdep_init_task(p);
#endif

#ifdef CONFIG_DEBUG_MUTEXES
@@ -1949,6 +1952,7 @@ bad_fork_cleanup_audit:
bad_fork_cleanup_perf:
perf_event_free_task(p);
bad_fork_cleanup_policy:
+ lockdep_free_task(p);
#ifdef CONFIG_NUMA
mpol_put(p->mempolicy);
bad_fork_cleanup_threadgroup_lock:
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 841828b..56f69cc 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -58,6 +58,10 @@
#define CREATE_TRACE_POINTS
#include <trace/events/lock.h>

+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+#include <linux/slab.h>
+#endif
+
#ifdef CONFIG_PROVE_LOCKING
int prove_locking = 1;
module_param(prove_locking, int, 0644);
@@ -724,6 +728,18 @@ look_up_lock_class(struct lockdep_map *lock, unsigned int subclass)
return is_static || static_obj(lock->key) ? NULL : ERR_PTR(-EINVAL);
}

+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+static void cross_init(struct lockdep_map *lock, int cross);
+static int cross_lock(struct lockdep_map *lock);
+static int lock_acquire_crosslock(struct held_lock *hlock);
+static int lock_release_crosslock(struct lockdep_map *lock);
+#else
+static inline void cross_init(struct lockdep_map *lock, int cross) {}
+static inline int cross_lock(struct lockdep_map *lock) { return 0; }
+static inline int lock_acquire_crosslock(struct held_lock *hlock) { return 2; }
+static inline int lock_release_crosslock(struct lockdep_map *lock) { return 2; }
+#endif
+
/*
* Register a lock's class in the hash-table, if the class is not present
* yet. Otherwise we look it up. We cache the result in the lock object
@@ -1795,6 +1811,9 @@ check_deadlock(struct task_struct *curr, struct held_lock *next,
if (nest)
return 2;

+ if (cross_lock(prev->instance))
+ continue;
+
return print_deadlock_bug(curr, prev, next);
}
return 1;
@@ -1962,30 +1981,36 @@ check_prevs_add(struct task_struct *curr, struct held_lock *next)
int distance = curr->lockdep_depth - depth + 1;
hlock = curr->held_locks + depth - 1;
/*
- * Only non-recursive-read entries get new dependencies
- * added:
+ * Only non-crosslock entries get new dependencies added.
+ * Crosslock entries will be added by commit later:
*/
- if (hlock->read != 2 && hlock->check) {
- int ret = check_prev_add(curr, hlock, next,
- distance, &trace, save);
- if (!ret)
- return 0;
-
+ if (!cross_lock(hlock->instance)) {
/*
- * Stop saving stack_trace if save_trace() was
- * called at least once:
+ * Only non-recursive-read entries get new dependencies
+ * added:
*/
- if (save && ret == 2)
- save = NULL;
+ if (hlock->read != 2 && hlock->check) {
+ int ret = check_prev_add(curr, hlock, next,
+ distance, &trace, save);
+ if (!ret)
+ return 0;

- /*
- * Stop after the first non-trylock entry,
- * as non-trylock entries have added their
- * own direct dependencies already, so this
- * lock is connected to them indirectly:
- */
- if (!hlock->trylock)
- break;
+ /*
+ * Stop saving stack_trace if save_trace() was
+ * called at least once:
+ */
+ if (save && ret == 2)
+ save = NULL;
+
+ /*
+ * Stop after the first non-trylock entry,
+ * as non-trylock entries have added their
+ * own direct dependencies already, so this
+ * lock is connected to them indirectly:
+ */
+ if (!hlock->trylock)
+ break;
+ }
}
depth--;
/*
@@ -3176,7 +3201,7 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
/*
* Initialize a lock instance's lock-class mapping info:
*/
-void lockdep_init_map(struct lockdep_map *lock, const char *name,
+static void __lockdep_init_map(struct lockdep_map *lock, const char *name,
struct lock_class_key *key, int subclass)
{
int i;
@@ -3234,8 +3259,25 @@ void lockdep_init_map(struct lockdep_map *lock, const char *name,
raw_local_irq_restore(flags);
}
}
+
+void lockdep_init_map(struct lockdep_map *lock, const char *name,
+ struct lock_class_key *key, int subclass)
+{
+ cross_init(lock, 0);
+ __lockdep_init_map(lock, name, key, subclass);
+}
EXPORT_SYMBOL_GPL(lockdep_init_map);

+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+void lockdep_init_map_crosslock(struct lockdep_map *lock, const char *name,
+ struct lock_class_key *key, int subclass)
+{
+ cross_init(lock, 1);
+ __lockdep_init_map(lock, name, key, subclass);
+}
+EXPORT_SYMBOL_GPL(lockdep_init_map_crosslock);
+#endif
+
struct lock_class_key __lockdep_no_validate__;
EXPORT_SYMBOL_GPL(__lockdep_no_validate__);

@@ -3291,6 +3333,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
int chain_head = 0;
int class_idx;
u64 chain_key;
+ int ret;

if (unlikely(!debug_locks))
return 0;
@@ -3339,7 +3382,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,

class_idx = class - lock_classes + 1;

- if (depth) {
+ /* TODO: nest_lock is not implemented for crosslock yet. */
+ if (depth && !cross_lock(lock)) {
hlock = curr->held_locks + depth - 1;
if (hlock->class_idx == class_idx && nest_lock) {
if (hlock->references) {
@@ -3427,6 +3471,14 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
if (!validate_chain(curr, lock, hlock, chain_head, chain_key))
return 0;

+ ret = lock_acquire_crosslock(hlock);
+ /*
+ * 2 means normal acquire operations are needed. Otherwise, it's
+ * ok just to return with '0:fail, 1:success'.
+ */
+ if (ret != 2)
+ return ret;
+
curr->curr_chain_key = chain_key;
curr->lockdep_depth++;
check_chain_key(curr);
@@ -3664,11 +3716,19 @@ __lock_release(struct lockdep_map *lock, int nested, unsigned long ip)
struct task_struct *curr = current;
struct held_lock *hlock;
unsigned int depth;
- int i;
+ int ret, i;

if (unlikely(!debug_locks))
return 0;

+ ret = lock_release_crosslock(lock);
+ /*
+ * 2 means normal release operations are needed. Otherwise, it's
+ * ok just to return with '0:fail, 1:success'.
+ */
+ if (ret != 2)
+ return ret;
+
depth = curr->lockdep_depth;
/*
* So we're all set to release this lock.. wait what lock? We don't
@@ -4532,6 +4592,13 @@ asmlinkage __visible void lockdep_sys_exit(void)
curr->comm, curr->pid);
lockdep_print_held_locks(curr);
}
+
+ /*
+ * The lock history for each syscall should be independent. So wipe the
+ * slate clean on return to userspace.
+ */
+ crossrelease_hist_end(XHLOCK_PROC);
+ crossrelease_hist_start(XHLOCK_PROC);
}

void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
@@ -4580,3 +4647,398 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
dump_stack();
}
EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious);
+
+#ifdef CONFIG_LOCKDEP_CROSSRELEASE
+
+/*
+ * Crossrelease works by recording a lock history for each thread and
+ * connecting those historic locks that were taken after the
+ * wait_for_completion() in the complete() context.
+ *
+ * Task-A Task-B
+ *
+ * mutex_lock(&A);
+ * mutex_unlock(&A);
+ *
+ * wait_for_completion(&C);
+ * lock_acquire_crosslock();
+ * atomic_inc_return(&cross_gen_id);
+ * |
+ * | mutex_lock(&B);
+ * | mutex_unlock(&B);
+ * |
+ * | complete(&C);
+ * `-- lock_commit_crosslock();
+ *
+ * Which will then add a dependency between B and C.
+ */
+
+#define xhlock(i) (current->xhlocks[(i) % MAX_XHLOCKS_NR])
+
+/*
+ * Whenever a crosslock is held, cross_gen_id will be increased.
+ */
+static atomic_t cross_gen_id; /* Can be wrapped */
+
+/*
+ * Lock history stacks; we have 3 nested lock history stacks:
+ *
+ * Hard IRQ
+ * Soft IRQ
+ * History / Task
+ *
+ * The thing is that once we complete a (Hard/Soft) IRQ the future task locks
+ * should not depend on any of the locks observed while running the IRQ.
+ *
+ * So what we do is rewind the history buffer and erase all our knowledge of
+ * that temporal event.
+ */
+
+/*
+ * We need this to annotate lock history boundaries. Take for instance
+ * workqueues; each work is independent of the last. The completion of a future
+ * work does not depend on the completion of a past work (in general).
+ * Therefore we must not carry that (lock) dependency across works.
+ *
+ * This is true for many things; pretty much all kthreads fall into this
+ * pattern, where they have an 'idle' state and future completions do not
+ * depend on past completions. It's just that since they all have the 'same'
+ * form -- the kthread does the same over and over -- it doesn't typically
+ * matter.
+ *
+ * The same is true for system-calls, once a system call is completed (we've
+ * returned to userspace) the next system call does not depend on the lock
+ * history of the previous system call.
+ */
+void crossrelease_hist_start(enum xhlock_context_t c)
+{
+ if (current->xhlocks)
+ current->xhlock_idx_hist[c] = current->xhlock_idx;
+}
+
+void crossrelease_hist_end(enum xhlock_context_t c)
+{
+ if (current->xhlocks)
+ current->xhlock_idx = current->xhlock_idx_hist[c];
+}
+
+static int cross_lock(struct lockdep_map *lock)
+{
+ return lock ? lock->cross : 0;
+}
+
+/*
+ * This is needed to decide the relationship between wrapable variables.
+ */
+static inline int before(unsigned int a, unsigned int b)
+{
+ return (int)(a - b) < 0;
+}
+
+static inline struct lock_class *xhlock_class(struct hist_lock *xhlock)
+{
+ return hlock_class(&xhlock->hlock);
+}
+
+static inline struct lock_class *xlock_class(struct cross_lock *xlock)
+{
+ return hlock_class(&xlock->hlock);
+}
+
+/*
+ * Should we check a dependency with previous one?
+ */
+static inline int depend_before(struct held_lock *hlock)
+{
+ return hlock->read != 2 && hlock->check && !hlock->trylock;
+}
+
+/*
+ * Should we check a dependency with next one?
+ */
+static inline int depend_after(struct held_lock *hlock)
+{
+ return hlock->read != 2 && hlock->check;
+}
+
+/*
+ * Check if the xhlock is valid, which would be false if,
+ *
+ * 1. Has not been used after initialization yet.
+ *
+ * Remember that hist_lock is implemented as a ring buffer.
+ */
+static inline int xhlock_valid(struct hist_lock *xhlock)
+{
+ /*
+ * xhlock->hlock.instance must be !NULL.
+ */
+ return !!xhlock->hlock.instance;
+}
+
+/*
+ * Record a hist_lock entry.
+ *
+ * Only IRQ disabling is required.
+ */
+static void add_xhlock(struct held_lock *hlock)
+{
+ unsigned int idx = ++current->xhlock_idx;
+ struct hist_lock *xhlock = &xhlock(idx);
+
+#ifdef CONFIG_DEBUG_LOCKDEP
+ /*
+ * This can be done locklessly because they are all task-local
+ * state, we must however ensure IRQs are disabled.
+ */
+ WARN_ON_ONCE(!irqs_disabled());
+#endif
+
+ /* Initialize hist_lock's members */
+ xhlock->hlock = *hlock;
+
+ xhlock->trace.nr_entries = 0;
+ xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
+ xhlock->trace.entries = xhlock->trace_entries;
+ xhlock->trace.skip = 3;
+ save_stack_trace(&xhlock->trace);
+}
+
+static inline int same_context_xhlock(struct hist_lock *xhlock)
+{
+ return xhlock->hlock.irq_context == task_irq_context(current);
+}
+
+/*
+ * This should be lockless as far as possible because this would be
+ * called very frequently.
+ */
+static void check_add_xhlock(struct held_lock *hlock)
+{
+ /*
+ * Record a hist_lock, only in case that acquisitions ahead
+ * could depend on the held_lock. For example, if the held_lock
+ * is a trylock, then acquisitions ahead never depend on it.
+ * In that case, we don't need to record it. Just return.
+ */
+ if (!current->xhlocks || !depend_before(hlock))
+ return;
+
+ add_xhlock(hlock);
+}
+
+/*
+ * For crosslock.
+ */
+static int add_xlock(struct held_lock *hlock)
+{
+ struct cross_lock *xlock;
+ unsigned int gen_id;
+
+ if (!graph_lock())
+ return 0;
+
+ xlock = &((struct lockdep_map_cross *)hlock->instance)->xlock;
+
+ gen_id = (unsigned int)atomic_inc_return(&cross_gen_id);
+ xlock->hlock = *hlock;
+ xlock->hlock.gen_id = gen_id;
+ graph_unlock();
+
+ return 1;
+}
+
+/*
+ * Called for both normal and crosslock acquires. Normal locks will be
+ * pushed on the hist_lock queue. Cross locks will record state and
+ * stop regular lock_acquire() to avoid being placed on the held_lock
+ * stack.
+ *
+ * Return: 0 - failure;
+ * 1 - crosslock, done;
+ * 2 - normal lock, continue to held_lock[] ops.
+ */
+static int lock_acquire_crosslock(struct held_lock *hlock)
+{
+ /*
+ * CONTEXT 1 CONTEXT 2
+ * --------- ---------
+ * lock A (cross)
+ * X = atomic_inc_return(&cross_gen_id)
+ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ * Y = atomic_read_acquire(&cross_gen_id)
+ * lock B
+ *
+ * atomic_read_acquire() is for ordering between A and B,
+ * IOW, A happens before B, when CONTEXT 2 sees Y >= X.
+ *
+ * Pairs with atomic_inc_return() in add_xlock().
+ */
+ hlock->gen_id = (unsigned int)atomic_read_acquire(&cross_gen_id);
+
+ if (cross_lock(hlock->instance))
+ return add_xlock(hlock);
+
+ check_add_xhlock(hlock);
+ return 2;
+}
+
+static int copy_trace(struct stack_trace *trace)
+{
+ unsigned long *buf = stack_trace + nr_stack_trace_entries;
+ unsigned int max_nr = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries;
+ unsigned int nr = min(max_nr, trace->nr_entries);
+
+ trace->nr_entries = nr;
+ memcpy(buf, trace->entries, nr * sizeof(trace->entries[0]));
+ trace->entries = buf;
+ nr_stack_trace_entries += nr;
+
+ if (nr_stack_trace_entries >= MAX_STACK_TRACE_ENTRIES-1) {
+ if (!debug_locks_off_graph_unlock())
+ return 0;
+
+ print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!");
+ dump_stack();
+
+ return 0;
+ }
+
+ return 1;
+}
+
+static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
+{
+ unsigned int xid, pid;
+ u64 chain_key;
+
+ xid = xlock_class(xlock) - lock_classes;
+ chain_key = iterate_chain_key((u64)0, xid);
+ pid = xhlock_class(xhlock) - lock_classes;
+ chain_key = iterate_chain_key(chain_key, pid);
+
+ if (lookup_chain_cache(chain_key))
+ return 1;
+
+ if (!add_chain_cache_classes(xid, pid, xhlock->hlock.irq_context,
+ chain_key))
+ return 0;
+
+ if (!check_prev_add(current, &xlock->hlock, &xhlock->hlock, 1,
+ &xhlock->trace, copy_trace))
+ return 0;
+
+ return 1;
+}
+
+static void commit_xhlocks(struct cross_lock *xlock)
+{
+ unsigned int cur = current->xhlock_idx;
+ unsigned int i;
+
+ if (!graph_lock())
+ return;
+
+ for (i = 0; i < MAX_XHLOCKS_NR; i++) {
+ struct hist_lock *xhlock = &xhlock(cur - i);
+
+ if (!xhlock_valid(xhlock))
+ break;
+
+ if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
+ break;
+
+ if (!same_context_xhlock(xhlock))
+ break;
+
+ /*
+ * commit_xhlock() returns 0 with graph_lock already
+ * released if fail.
+ */
+ if (!commit_xhlock(xlock, xhlock))
+ return;
+ }
+
+ graph_unlock();
+}
+
+void lock_commit_crosslock(struct lockdep_map *lock)
+{
+ struct cross_lock *xlock;
+ unsigned long flags;
+
+ if (unlikely(!debug_locks || current->lockdep_recursion))
+ return;
+
+ if (!current->xhlocks)
+ return;
+
+ /*
+ * Commit hist_locks with the cross_lock, but only if the
+ * cross_lock could depend on acquisitions made after it.
+ *
+ * For example, if the cross_lock does not have the 'check' flag
+ * then we don't need to check dependencies and commit for that.
+ * Just skip it. In that case, of course, the cross_lock does
+ * not depend on acquisitions ahead, either.
+ *
+ * WARNING: Don't do this in add_xlock() in advance. When the
+ * acquisition context differs from the commit context, an
+ * invalid (skipped) cross_lock might be accessed.
+ */
+ if (!depend_after(&((struct lockdep_map_cross *)lock)->xlock.hlock))
+ return;
+
+ raw_local_irq_save(flags);
+ check_flags(flags);
+ current->lockdep_recursion = 1;
+ xlock = &((struct lockdep_map_cross *)lock)->xlock;
+ commit_xhlocks(xlock);
+ current->lockdep_recursion = 0;
+ raw_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(lock_commit_crosslock);
+
+/*
+ * Return: 1 - crosslock, done;
+ * 2 - normal lock, continue to held_lock[] ops.
+ */
+static int lock_release_crosslock(struct lockdep_map *lock)
+{
+ return cross_lock(lock) ? 1 : 2;
+}
+
+static void cross_init(struct lockdep_map *lock, int cross)
+{
+ lock->cross = cross;
+
+ /*
+ * Crossrelease assumes that the ring buffer size of xhlocks
+ * is aligned with power of 2. So force it on build.
+ */
+ BUILD_BUG_ON(MAX_XHLOCKS_NR & (MAX_XHLOCKS_NR - 1));
+}
+
+void lockdep_init_task(struct task_struct *task)
+{
+ int i;
+
+ task->xhlock_idx = UINT_MAX;
+
+ for (i = 0; i < XHLOCK_CTX_NR; i++)
+ task->xhlock_idx_hist[i] = UINT_MAX;
+
+ task->xhlocks = kzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR,
+ GFP_KERNEL);
+}
+
+void lockdep_free_task(struct task_struct *task)
+{
+ if (task->xhlocks) {
+ void *tmp = task->xhlocks;
+ /* Disable crossrelease for current */
+ task->xhlocks = NULL;
+ kfree(tmp);
+ }
+}
+#endif
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ca937b0..e86733a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2093,6 +2093,7 @@ __acquires(&pool->lock)

lock_map_acquire_read(&pwq->wq->lockdep_map);
lock_map_acquire(&lockdep_map);
+ crossrelease_hist_start(XHLOCK_PROC);
trace_workqueue_execute_start(work);
worker->current_func(work);
/*
@@ -2100,6 +2101,7 @@ __acquires(&pool->lock)
* point will only record its address.
*/
trace_workqueue_execute_end(work);
+ crossrelease_hist_end(XHLOCK_PROC);
lock_map_release(&lockdep_map);
lock_map_release(&pwq->wq->lockdep_map);

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 98fe715..c6038f2 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1150,6 +1150,18 @@ config LOCK_STAT
CONFIG_LOCK_STAT defines "contended" and "acquired" lock events.
(CONFIG_LOCKDEP defines "acquire" and "release" events.)

+config LOCKDEP_CROSSRELEASE
+ bool "Lock debugging: make lockdep work for crosslocks"
+ depends on PROVE_LOCKING
+ default n
+ help
+ This makes lockdep work for crosslocks, which are locks allowed to
+ be released in a different context from the acquisition context.
+ Normally a lock must be released in the context that acquired it.
+ However, relaxing this constraint lets synchronization primitives
+ such as page locks or completions use the lock correctness
+ detector, lockdep.
+
config DEBUG_LOCKDEP
bool "Lock dependency engine debugging"
depends on DEBUG_KERNEL && LOCKDEP

Subject: [tip:locking/core] locking/lockdep: Detect and handle hist_lock ring buffer overwrite

Commit-ID: 23f873d8f9526ed7e49a1a02a45f8afb9ae5fb84
Gitweb: http://git.kernel.org/tip/23f873d8f9526ed7e49a1a02a45f8afb9ae5fb84
Author: Byungchul Park <[email protected]>
AuthorDate: Mon, 7 Aug 2017 16:12:53 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 10 Aug 2017 12:29:08 +0200

locking/lockdep: Detect and handle hist_lock ring buffer overwrite

The ring buffer can be overwritten by hardirq/softirq/work contexts.
Those cases must be considered on rollback or commit. For example,

|<------ hist_lock ring buffer size ----->|
ppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
wrapped > iiiiiiiiiiiiiiiiiiiiiii....................

where 'p' represents an acquisition in process context,
'i' represents an acquisition in irq context.

On irq exit, crossrelease tries to roll back idx to its original
position, but it should not, because the entry has already been
invalidated by an overwriting 'i'. Avoid rollback or commit for
overwritten entries.

Signed-off-by: Byungchul Park <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/lockdep.h | 20 +++++++++++++++++++
include/linux/sched.h | 3 +++
kernel/locking/lockdep.c | 52 +++++++++++++++++++++++++++++++++++++++++++-----
3 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index e1e0fcd..c75eedd 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -284,6 +284,26 @@ struct held_lock {
*/
struct hist_lock {
/*
+ * Id for each entry in the ring buffer. This is used to
+ * decide whether the ring buffer was overwritten or not.
+ *
+ * For example,
+ *
+ * |<----------- hist_lock ring buffer size ------->|
+ * pppppppppppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiii
+ * wrapped > iiiiiiiiiiiiiiiiiiiiiiiiiii.......................
+ *
+ * where 'p' represents an acquisition in process
+ * context, 'i' represents an acquisition in irq
+ * context.
+ *
+ * In this example, the ring buffer was overwritten by
+ * acquisitions in irq context, that should be detected on
+ * rollback or commit.
+ */
+ unsigned int hist_id;
+
+ /*
* Separate stack_trace data. This will be used at the commit step.
*/
struct stack_trace trace;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5235fba..772c5f6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -854,6 +854,9 @@ struct task_struct {
unsigned int xhlock_idx;
/* For restoring at history boundaries */
unsigned int xhlock_idx_hist[XHLOCK_CTX_NR];
+ unsigned int hist_id;
+ /* For overwrite check at each context exit */
+ unsigned int hist_id_save[XHLOCK_CTX_NR];
#endif

#ifdef CONFIG_UBSAN
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 56f69cc..eda8114 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4681,6 +4681,17 @@ EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious);
static atomic_t cross_gen_id; /* Can be wrapped */

/*
+ * Make an entry of the ring buffer invalid.
+ */
+static inline void invalidate_xhlock(struct hist_lock *xhlock)
+{
+ /*
+ * Normally, xhlock->hlock.instance must be !NULL.
+ */
+ xhlock->hlock.instance = NULL;
+}
+
+/*
* Lock history stacks; we have 3 nested lock history stacks:
*
* Hard IRQ
@@ -4712,14 +4723,28 @@ static atomic_t cross_gen_id; /* Can be wrapped */
*/
void crossrelease_hist_start(enum xhlock_context_t c)
{
- if (current->xhlocks)
- current->xhlock_idx_hist[c] = current->xhlock_idx;
+ struct task_struct *cur = current;
+
+ if (cur->xhlocks) {
+ cur->xhlock_idx_hist[c] = cur->xhlock_idx;
+ cur->hist_id_save[c] = cur->hist_id;
+ }
}

void crossrelease_hist_end(enum xhlock_context_t c)
{
- if (current->xhlocks)
- current->xhlock_idx = current->xhlock_idx_hist[c];
+ struct task_struct *cur = current;
+
+ if (cur->xhlocks) {
+ unsigned int idx = cur->xhlock_idx_hist[c];
+ struct hist_lock *h = &xhlock(idx);
+
+ cur->xhlock_idx = idx;
+
+ /* Check if the ring was overwritten. */
+ if (h->hist_id != cur->hist_id_save[c])
+ invalidate_xhlock(h);
+ }
}

static int cross_lock(struct lockdep_map *lock)
@@ -4765,6 +4790,7 @@ static inline int depend_after(struct held_lock *hlock)
* Check if the xhlock is valid, which would be false if,
*
* 1. Has not been used since initialization.
+ * 2. Got invalidated.
*
* Remember that hist_lock is implemented as a ring buffer.
*/
@@ -4796,6 +4822,7 @@ static void add_xhlock(struct held_lock *hlock)

/* Initialize hist_lock's members */
xhlock->hlock = *hlock;
+ xhlock->hist_id = current->hist_id++;

xhlock->trace.nr_entries = 0;
xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
@@ -4934,6 +4961,7 @@ static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
static void commit_xhlocks(struct cross_lock *xlock)
{
unsigned int cur = current->xhlock_idx;
+ unsigned int prev_hist_id = xhlock(cur).hist_id;
unsigned int i;

if (!graph_lock())
@@ -4952,6 +4980,17 @@ static void commit_xhlocks(struct cross_lock *xlock)
break;

/*
+ * Filter out the cases that the ring buffer was
+ * overwritten and the previous entry has a bigger
+ * hist_id than the following one, which is impossible
+ * otherwise.
+ */
+ if (unlikely(before(xhlock->hist_id, prev_hist_id)))
+ break;
+
+ prev_hist_id = xhlock->hist_id;
+
+ /*
* commit_xhlock() returns 0 with graph_lock already
* released if fail.
*/
@@ -5024,9 +5063,12 @@ void lockdep_init_task(struct task_struct *task)
int i;

task->xhlock_idx = UINT_MAX;
+ task->hist_id = 0;

- for (i = 0; i < XHLOCK_CTX_NR; i++)
+ for (i = 0; i < XHLOCK_CTX_NR; i++) {
task->xhlock_idx_hist[i] = UINT_MAX;
+ task->hist_id_save[i] = 0;
+ }

task->xhlocks = kzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR,
GFP_KERNEL);

Subject: [tip:locking/core] locking/lockdep: Handle non(or multi)-acquisition of a crosslock

Commit-ID: 28a903f63ec0811ead70ad0f8665e838d207a25e
Gitweb: http://git.kernel.org/tip/28a903f63ec0811ead70ad0f8665e838d207a25e
Author: Byungchul Park <[email protected]>
AuthorDate: Mon, 7 Aug 2017 16:12:54 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 10 Aug 2017 12:29:08 +0200

locking/lockdep: Handle non(or multi)-acquisition of a crosslock

No acquisition might be in progress on commit of a crosslock. Completion
operations enabling crossrelease are the case like:

CONTEXT X CONTEXT Y
--------- ---------
trigger completion context
complete AX
commit AX
wait_for_complete AX
acquire AX
wait

where AX is a crosslock.

When no acquisition is in progress, we should not perform commit because
the lock might no longer exist, which could cause incorrect memory
access. So we have to track the number of acquisitions of a crosslock.

Moreover, when multiple acquisitions of a crosslock overlap, as in:

CONTEXT W CONTEXT X CONTEXT Y CONTEXT Z
--------- --------- --------- ---------
acquire AX (gen_id: 1)
acquire A
acquire AX (gen_id: 10)
acquire B
commit AX
acquire C
commit AX

where A, B and C are typical locks and AX is a crosslock.

The current crossrelease code performs the commits in Y and Z with
gen_id = 10. However, gen_id = 1 can be used instead, since not only
'acquire AX in X' but also 'acquire AX in W' depends on each acquisition
in Y and Z until their commits. So use gen_id = 1 instead of 10 for
those commits, which adds an additional dependency 'AX -> A' in the
example above.

Signed-off-by: Byungchul Park <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/lockdep.h | 22 ++++++++++++-
kernel/locking/lockdep.c | 82 +++++++++++++++++++++++++++++++++---------------
2 files changed, 77 insertions(+), 27 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index c75eedd..651cc61 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -325,6 +325,19 @@ struct hist_lock {
*/
struct cross_lock {
/*
+ * When more than one acquisition of crosslocks are overlapped,
+ * we have to perform commit for them based on cross_gen_id of
+ * the first acquisition, which allows us to add more true
+ * dependencies.
+ *
+ * Moreover, when no acquisition of a crosslock is in progress,
+ * we should not perform commit because the lock might not exist
+ * any more, which might cause incorrect memory access. So we
+ * have to track the number of acquisitions of a crosslock.
+ */
+ int nr_acquire;
+
+ /*
* Separate hlock instance. This will be used at the commit step.
*
* TODO: Use a smaller data structure containing only necessary
@@ -547,9 +560,16 @@ extern void lockdep_init_map_crosslock(struct lockdep_map *lock,
int subclass);
extern void lock_commit_crosslock(struct lockdep_map *lock);

+/*
+ * What we essentially have to initialize is 'nr_acquire'. Other members
+ * will be initialized in add_xlock().
+ */
+#define STATIC_CROSS_LOCK_INIT() \
+ { .nr_acquire = 0,}
+
#define STATIC_CROSS_LOCKDEP_MAP_INIT(_name, _key) \
{ .map.name = (_name), .map.key = (void *)(_key), \
- .map.cross = 1, }
+ .map.cross = 1, .xlock = STATIC_CROSS_LOCK_INIT(), }

/*
* To initialize a lockdep_map statically use this macro.
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index eda8114..7f97871 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4867,11 +4867,28 @@ static int add_xlock(struct held_lock *hlock)

xlock = &((struct lockdep_map_cross *)hlock->instance)->xlock;

+ /*
+ * When acquisitions for a crosslock are overlapped, we use
+ * nr_acquire to perform commit for them, based on cross_gen_id
+ * of the first acquisition, which allows us to add additional
+ * dependencies.
+ *
+ * Moreover, when no acquisition of a crosslock is in progress,
+ * we should not perform commit because the lock might not exist
+ * any more, which might cause incorrect memory access. So we
+ * have to track the number of acquisitions of a crosslock.
+ *
+ * depend_after() is necessary to initialize only the first
+ * valid xlock so that the xlock can be used on its commit.
+ */
+ if (xlock->nr_acquire++ && depend_after(&xlock->hlock))
+ goto unlock;
+
gen_id = (unsigned int)atomic_inc_return(&cross_gen_id);
xlock->hlock = *hlock;
xlock->hlock.gen_id = gen_id;
+unlock:
graph_unlock();
-
return 1;
}

@@ -4967,35 +4984,37 @@ static void commit_xhlocks(struct cross_lock *xlock)
if (!graph_lock())
return;

- for (i = 0; i < MAX_XHLOCKS_NR; i++) {
- struct hist_lock *xhlock = &xhlock(cur - i);
+ if (xlock->nr_acquire) {
+ for (i = 0; i < MAX_XHLOCKS_NR; i++) {
+ struct hist_lock *xhlock = &xhlock(cur - i);

- if (!xhlock_valid(xhlock))
- break;
+ if (!xhlock_valid(xhlock))
+ break;

- if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
- break;
+ if (before(xhlock->hlock.gen_id, xlock->hlock.gen_id))
+ break;

- if (!same_context_xhlock(xhlock))
- break;
+ if (!same_context_xhlock(xhlock))
+ break;

- /*
- * Filter out the cases that the ring buffer was
- * overwritten and the previous entry has a bigger
- * hist_id than the following one, which is impossible
- * otherwise.
- */
- if (unlikely(before(xhlock->hist_id, prev_hist_id)))
- break;
+ /*
+ * Filter out the cases that the ring buffer was
+ * overwritten and the previous entry has a bigger
+ * hist_id than the following one, which is impossible
+ * otherwise.
+ */
+ if (unlikely(before(xhlock->hist_id, prev_hist_id)))
+ break;

- prev_hist_id = xhlock->hist_id;
+ prev_hist_id = xhlock->hist_id;

- /*
- * commit_xhlock() returns 0 with graph_lock already
- * released if fail.
- */
- if (!commit_xhlock(xlock, xhlock))
- return;
+ /*
+ * commit_xhlock() returns 0 with graph_lock already
+ * released if fail.
+ */
+ if (!commit_xhlock(xlock, xhlock))
+ return;
+ }
}

graph_unlock();
@@ -5039,16 +5058,27 @@ void lock_commit_crosslock(struct lockdep_map *lock)
EXPORT_SYMBOL_GPL(lock_commit_crosslock);

/*
- * Return: 1 - crosslock, done;
+ * Return: 0 - failure;
+ * 1 - crosslock, done;
* 2 - normal lock, continue to held_lock[] ops.
*/
static int lock_release_crosslock(struct lockdep_map *lock)
{
- return cross_lock(lock) ? 1 : 2;
+ if (cross_lock(lock)) {
+ if (!graph_lock())
+ return 0;
+ ((struct lockdep_map_cross *)lock)->xlock.nr_acquire--;
+ graph_unlock();
+ return 1;
+ }
+ return 2;
}

static void cross_init(struct lockdep_map *lock, int cross)
{
+ if (cross)
+ ((struct lockdep_map_cross *)lock)->xlock.nr_acquire = 0;
+
lock->cross = cross;

/*

Subject: [tip:locking/core] locking/lockdep: Make print_circular_bug() aware of crossrelease

Commit-ID: 383a4bc88849b804385162e81bf704f8f9690a87
Gitweb: http://git.kernel.org/tip/383a4bc88849b804385162e81bf704f8f9690a87
Author: Byungchul Park <[email protected]>
AuthorDate: Mon, 7 Aug 2017 16:12:55 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 10 Aug 2017 12:29:09 +0200

locking/lockdep: Make print_circular_bug() aware of crossrelease

print_circular_bug(), which reports a circular bug, assumes that the
target hlock is owned by the current context. However, with
crossrelease, the target hlock can be owned by a context other than the
current one. So the report format needs to change to reflect that.

Signed-off-by: Byungchul Park <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/locking/lockdep.c | 67 +++++++++++++++++++++++++++++++++---------------
1 file changed, 47 insertions(+), 20 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 7f97871..1114dc4 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1139,22 +1139,41 @@ print_circular_lock_scenario(struct held_lock *src,
printk(KERN_CONT "\n\n");
}

- printk(" Possible unsafe locking scenario:\n\n");
- printk(" CPU0 CPU1\n");
- printk(" ---- ----\n");
- printk(" lock(");
- __print_lock_name(target);
- printk(KERN_CONT ");\n");
- printk(" lock(");
- __print_lock_name(parent);
- printk(KERN_CONT ");\n");
- printk(" lock(");
- __print_lock_name(target);
- printk(KERN_CONT ");\n");
- printk(" lock(");
- __print_lock_name(source);
- printk(KERN_CONT ");\n");
- printk("\n *** DEADLOCK ***\n\n");
+ if (cross_lock(tgt->instance)) {
+ printk(" Possible unsafe locking scenario by crosslock:\n\n");
+ printk(" CPU0 CPU1\n");
+ printk(" ---- ----\n");
+ printk(" lock(");
+ __print_lock_name(parent);
+ printk(KERN_CONT ");\n");
+ printk(" lock(");
+ __print_lock_name(target);
+ printk(KERN_CONT ");\n");
+ printk(" lock(");
+ __print_lock_name(source);
+ printk(KERN_CONT ");\n");
+ printk(" unlock(");
+ __print_lock_name(target);
+ printk(KERN_CONT ");\n");
+ printk("\n *** DEADLOCK ***\n\n");
+ } else {
+ printk(" Possible unsafe locking scenario:\n\n");
+ printk(" CPU0 CPU1\n");
+ printk(" ---- ----\n");
+ printk(" lock(");
+ __print_lock_name(target);
+ printk(KERN_CONT ");\n");
+ printk(" lock(");
+ __print_lock_name(parent);
+ printk(KERN_CONT ");\n");
+ printk(" lock(");
+ __print_lock_name(target);
+ printk(KERN_CONT ");\n");
+ printk(" lock(");
+ __print_lock_name(source);
+ printk(KERN_CONT ");\n");
+ printk("\n *** DEADLOCK ***\n\n");
+ }
}

/*
@@ -1179,7 +1198,12 @@ print_circular_bug_header(struct lock_list *entry, unsigned int depth,
pr_warn("%s/%d is trying to acquire lock:\n",
curr->comm, task_pid_nr(curr));
print_lock(check_src);
- pr_warn("\nbut task is already holding lock:\n");
+
+ if (cross_lock(check_tgt->instance))
+ pr_warn("\nbut now in release context of a crosslock acquired at the following:\n");
+ else
+ pr_warn("\nbut task is already holding lock:\n");
+
print_lock(check_tgt);
pr_warn("\nwhich lock already depends on the new lock.\n\n");
pr_warn("\nthe existing dependency chain (in reverse order) is:\n");
@@ -1197,7 +1221,8 @@ static inline int class_equal(struct lock_list *entry, void *data)
static noinline int print_circular_bug(struct lock_list *this,
struct lock_list *target,
struct held_lock *check_src,
- struct held_lock *check_tgt)
+ struct held_lock *check_tgt,
+ struct stack_trace *trace)
{
struct task_struct *curr = current;
struct lock_list *parent;
@@ -1207,7 +1232,9 @@ static noinline int print_circular_bug(struct lock_list *this,
if (!debug_locks_off_graph_unlock() || debug_locks_silent)
return 0;

- if (!save_trace(&this->trace))
+ if (cross_lock(check_tgt->instance))
+ this->trace = *trace;
+ else if (!save_trace(&this->trace))
return 0;

depth = get_lock_depth(target);
@@ -1864,7 +1891,7 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
this.parent = NULL;
ret = check_noncircular(&this, hlock_class(prev), &target_entry);
if (unlikely(!ret))
- return print_circular_bug(&this, target_entry, next, prev);
+ return print_circular_bug(&this, target_entry, next, prev, trace);
else if (unlikely(ret < 0))
return print_bfs_bug(ret);


Subject: [tip:locking/core] locking/lockdep: Apply crossrelease to completions

Commit-ID: cd8084f91c02c1afd256a39aa833bff737631304
Gitweb: http://git.kernel.org/tip/cd8084f91c02c1afd256a39aa833bff737631304
Author: Byungchul Park <[email protected]>
AuthorDate: Mon, 7 Aug 2017 16:12:56 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 10 Aug 2017 12:29:10 +0200

locking/lockdep: Apply crossrelease to completions

Although wait_for_completion() and its family can cause deadlock, the
lock correctness validator could not be applied to them until now,
because things like complete() are usually called in a different context
from the waiting context, which violates lockdep's assumption.

Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
detector to those completion operations. Apply it.

Signed-off-by: Byungchul Park <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/completion.h | 45 ++++++++++++++++++++++++++++++++++++++++++++-
kernel/sched/completion.c | 11 +++++++++++
lib/Kconfig.debug | 9 +++++++++
3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 5d5aaae..9bcebf5 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -9,6 +9,9 @@
*/

#include <linux/wait.h>
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#include <linux/lockdep.h>
+#endif

/*
* struct completion - structure used to maintain state for a "completion"
@@ -25,10 +28,50 @@
struct completion {
unsigned int done;
wait_queue_head_t wait;
+#ifdef CONFIG_LOCKDEP_COMPLETE
+ struct lockdep_map_cross map;
+#endif
};

+#ifdef CONFIG_LOCKDEP_COMPLETE
+static inline void complete_acquire(struct completion *x)
+{
+ lock_acquire_exclusive((struct lockdep_map *)&x->map, 0, 0, NULL, _RET_IP_);
+}
+
+static inline void complete_release(struct completion *x)
+{
+ lock_release((struct lockdep_map *)&x->map, 0, _RET_IP_);
+}
+
+static inline void complete_release_commit(struct completion *x)
+{
+ lock_commit_crosslock((struct lockdep_map *)&x->map);
+}
+
+#define init_completion(x) \
+do { \
+ static struct lock_class_key __key; \
+ lockdep_init_map_crosslock((struct lockdep_map *)&(x)->map, \
+ "(complete)" #x, \
+ &__key, 0); \
+ __init_completion(x); \
+} while (0)
+#else
+#define init_completion(x) __init_completion(x)
+static inline void complete_acquire(struct completion *x) {}
+static inline void complete_release(struct completion *x) {}
+static inline void complete_release_commit(struct completion *x) {}
+#endif
+
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#define COMPLETION_INITIALIZER(work) \
+ { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait), \
+ STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
+#else
#define COMPLETION_INITIALIZER(work) \
{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+#endif

#define COMPLETION_INITIALIZER_ONSTACK(work) \
({ init_completion(&work); work; })
@@ -70,7 +113,7 @@ struct completion {
* This inline function will initialize a dynamically created completion
* structure.
*/
-static inline void init_completion(struct completion *x)
+static inline void __init_completion(struct completion *x)
{
x->done = 0;
init_waitqueue_head(&x->wait);
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index 13fc5ae..566b6ec 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -32,6 +32,12 @@ void complete(struct completion *x)
unsigned long flags;

spin_lock_irqsave(&x->wait.lock, flags);
+
+ /*
+ * Perform commit of crossrelease here.
+ */
+ complete_release_commit(x);
+
if (x->done != UINT_MAX)
x->done++;
__wake_up_locked(&x->wait, TASK_NORMAL, 1);
@@ -92,9 +98,14 @@ __wait_for_common(struct completion *x,
{
might_sleep();

+ complete_acquire(x);
+
spin_lock_irq(&x->wait.lock);
timeout = do_wait_for_common(x, action, timeout, state);
spin_unlock_irq(&x->wait.lock);
+
+ complete_release(x);
+
return timeout;
}

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c6038f2..ebd40d3 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1162,6 +1162,15 @@ config LOCKDEP_CROSSRELEASE
such as page locks or completions use the lock correctness
detector, lockdep.

+config LOCKDEP_COMPLETE
+ bool "Lock debugging: allow completions to use deadlock detector"
+ depends on PROVE_LOCKING
+ select LOCKDEP_CROSSRELEASE
+ default n
+ help
+ A deadlock caused by wait_for_completion() and complete() can be
+ detected by lockdep using the crossrelease feature.
+
config DEBUG_LOCKDEP
bool "Lock dependency engine debugging"
depends on DEBUG_KERNEL && LOCKDEP

Subject: [tip:locking/core] locking/lockdep: Add 'crossrelease' feature documentation

Commit-ID: ef0758dd0fd70b98b889af26e27f003656952db8
Gitweb: http://git.kernel.org/tip/ef0758dd0fd70b98b889af26e27f003656952db8
Author: Byungchul Park <[email protected]>
AuthorDate: Mon, 7 Aug 2017 16:13:01 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 10 Aug 2017 12:32:37 +0200

locking/lockdep: Add 'crossrelease' feature documentation

This document describes the concept of crossrelease feature.

Signed-off-by: Byungchul Park <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
Documentation/locking/crossrelease.txt | 874 +++++++++++++++++++++++++++++++++
1 file changed, 874 insertions(+)

diff --git a/Documentation/locking/crossrelease.txt b/Documentation/locking/crossrelease.txt
new file mode 100644
index 0000000..bdf1423
--- /dev/null
+++ b/Documentation/locking/crossrelease.txt
@@ -0,0 +1,874 @@
+Crossrelease
+============
+
+Started by Byungchul Park <[email protected]>
+
+Contents:
+
+ (*) Background
+
+ - What causes deadlock
+ - How lockdep works
+
+ (*) Limitation
+
+ - Limit lockdep
+ - Pros from the limitation
+ - Cons from the limitation
+ - Relax the limitation
+
+ (*) Crossrelease
+
+ - Introduce crossrelease
+ - Introduce commit
+
+ (*) Implementation
+
+ - Data structures
+ - How crossrelease works
+
+ (*) Optimizations
+
+ - Avoid duplication
+ - Lockless for hot paths
+
+ (*) APPENDIX A: What lockdep does to work aggressively
+
+ (*) APPENDIX B: How to avoid adding false dependencies
+
+
+==========
+Background
+==========
+
+What causes deadlock
+--------------------
+
+A deadlock occurs when a context is waiting for an event to happen,
+which is impossible because another context (or the same one) that can
+trigger the event is also waiting for another event (or the same one)
+to happen, which is likewise impossible for the same reason.
+
+For example:
+
+ A context going to trigger event C is waiting for event A to happen.
+ A context going to trigger event A is waiting for event B to happen.
+ A context going to trigger event B is waiting for event C to happen.
+
+A deadlock occurs when these three wait operations run at the same time,
+because event C cannot be triggered if event A does not happen, which in
+turn cannot be triggered if event B does not happen, which in turn
+cannot be triggered if event C does not happen. In the end, no event can
+be triggered since none of them ever meets its condition to wake up.
+
+A dependency might exist between two waiters and a deadlock might happen
+due to an incorrect relationship between dependencies. Thus, we must
+first define what a dependency is. A dependency exists between them if:
+
+ 1. There are two waiters waiting for each event at a given time.
+ 2. The only way to wake up each waiter is to trigger its event.
+ 3. Whether one can be woken up depends on whether the other can.
+
+Each wait in the example creates its dependency like:
+
+ Event C depends on event A.
+ Event A depends on event B.
+ Event B depends on event C.
+
+ NOTE: Precisely speaking, a dependency is one between whether a
+ waiter for an event can be woken up and whether another waiter for
+ another event can be woken up. However from now on, we will describe
+ a dependency as if it's one between an event and another event for
+ simplicity.
+
+And they form circular dependencies like:
+
+ -> C -> A -> B -
+ / \
+ \ /
+ ----------------
+
+ where 'A -> B' means that event A depends on event B.
+
+Such circular dependencies lead to a deadlock since no waiter can meet
+its condition to wake up as described.
+
+CONCLUSION
+
+Circular dependencies cause a deadlock.
+
+
+How lockdep works
+-----------------
+
+Lockdep tries to detect a deadlock by checking dependencies created by
+lock operations, acquire and release. Waiting for a lock corresponds to
+waiting for an event, and releasing a lock corresponds to triggering an
+event in the previous section.
+
+In short, lockdep does:
+
+ 1. Detect a new dependency.
+ 2. Add the dependency into a global graph.
+ 3. Check if that makes dependencies circular.
+ 4. Report a deadlock or its possibility if so.
+
+For example, consider a graph built by lockdep that looks like:
+
+ A -> B -
+ \
+ -> E
+ /
+ C -> D -
+
+ where A, B,..., E are different lock classes.
+
+Lockdep adds a dependency to the graph whenever it detects a new one.
+For example, it will add the dependency 'E -> C' when a new dependency
+between lock E and lock C is detected. Then the graph will be:
+
+ A -> B -
+ \
+ -> E -
+ / \
+ -> C -> D - \
+ / /
+ \ /
+ ------------------
+
+ where A, B,..., E are different lock classes.
+
+This graph contains a subgraph which demonstrates circular dependencies:
+
+ -> E -
+ / \
+ -> C -> D - \
+ / /
+ \ /
+ ------------------
+
+ where C, D and E are different lock classes.
+
+This is the condition under which a deadlock might occur. Lockdep
+reports it on detection after adding a new dependency. This is how
+lockdep works.
+
+CONCLUSION
+
+Lockdep detects a deadlock or its possibility by checking if circular
+dependencies were created after adding each new dependency.
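+
The check in steps 2 and 3 can be sketched as a reachability test on the dependency graph: a new dependency 'a -> b' would create a cycle exactly when 'a' is already reachable from 'b'. Below is a minimal toy model of that idea; it is not the kernel implementation, which works on lock classes with cached dependency chains and bounded search:

```c
#include <assert.h>

#define NCLASS 8

static int edge[NCLASS][NCLASS];        /* edge[a][b] != 0 means 'a -> b' */

/* Is 'to' reachable from 'from' by following dependency edges? */
static int reachable(int from, int to, int *seen)
{
    if (from == to)
        return 1;
    seen[from] = 1;
    for (int i = 0; i < NCLASS; i++)
        if (edge[from][i] && !seen[i] && reachable(i, to, seen))
            return 1;
    return 0;
}

/*
 * Add the dependency 'a -> b'. Return 1 if doing so would close a
 * cycle, i.e. 'a' is already reachable from 'b', which is the
 * condition lockdep reports as a deadlock possibility.
 */
static int add_dependency(int a, int b)
{
    int seen[NCLASS] = { 0 };

    if (reachable(b, a, seen))
        return 1;               /* circular dependency detected */
    edge[a][b] = 1;
    return 0;
}
```

With classes A..E numbered 0..4, reproducing the example graph above and then adding 'E -> C' makes the check fire, matching the cycle shown in the subgraph.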
+
+
+==========
+Limitation
+==========
+
+Limit lockdep
+-------------
+
+By limiting lockdep to work only on typical locks, e.g. spin locks and
+mutexes, which are released within their acquire context, the
+implementation becomes simple but its detection capacity becomes
+limited. Let's check the pros and cons in the next sections.
+
+
+Pros from the limitation
+------------------------
+
+Given the limitation, while a context is waiting to acquire a lock, none
+of the locks in its held_locks can be released until the acquisition
+succeeds, which means all waiters for those held locks are stuck as
+well. This is exactly the situation that creates dependencies between
+each lock in held_locks and the lock being acquired.
+
+For example:
+
+ CONTEXT X
+ ---------
+ acquire A
+ acquire B /* Add a dependency 'A -> B' */
+ release B
+ release A
+
+ where A and B are different lock classes.
+
+When acquiring lock A, the held_locks of CONTEXT X is empty, thus no
+dependency is added. But when acquiring lock B, lockdep detects and adds
+the new dependency 'A -> B' between lock A in held_locks and lock B.
+Such dependencies can simply be added at each lock acquisition.
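+
The rule above, where acquiring a lock creates a dependency from each lock in held_locks to the lock being acquired, can be sketched as a toy model. The flat arrays and names here are illustrative, not the kernel's held_locks handling:

```c
#include <assert.h>

#define NCLASS   8
#define MAX_HELD 8

static int held[MAX_HELD];              /* stand-in for held_locks */
static int nheld;
static int dep[NCLASS][NCLASS];         /* dep[a][b]: 'a -> b' was added */

static void acquire(int cls)
{
    /*
     * No lock already held can be released until 'cls' is granted,
     * so each held lock gains a dependency on 'cls'.
     */
    for (int i = 0; i < nheld; i++)
        dep[held[i]][cls] = 1;
    held[nheld++] = cls;
}

static void release(int cls)
{
    /* Lockdep allows non-top release, so search for the entry. */
    for (int i = 0; i < nheld; i++)
        if (held[i] == cls) {
            for (int j = i; j < nheld - 1; j++)
                held[j] = held[j + 1];
            nheld--;
            return;
        }
}
```

Running the CONTEXT X sequence (acquire A, acquire B, release B, release A) records exactly one dependency, 'A -> B'.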
+
+Moreover, the data lockdep requires lives in a local structure,
+held_locks, embedded in task_struct. By forcing all accesses to happen
+within the owning context, lockdep avoids races on this local data
+without taking explicit locks.
+
+Lastly, lockdep only needs to keep the locks currently held to build a
+dependency graph. Were the limitation relaxed, it would also have to
+keep locks already released, because the decision whether they created
+dependencies might be deferred for a long time.
+
+To sum up, we can expect several advantages from the limitation:
+
+ 1. Lockdep can easily identify a dependency when acquiring a lock.
+ 2. Races are avoidable while accessing local locks in a held_locks.
+ 3. Lockdep only needs to keep locks currently being held.
+
+CONCLUSION
+
+Given the limitation, the implementation becomes simple and efficient.
+
+
+Cons from the limitation
+------------------------
+
+Given the limitation, lockdep is applicable only to typical locks. For
+example, page locks for page access or completions for synchronization
+cannot work with lockdep.
+
+Can we detect the deadlocks below under the limitation?
+
+Example 1:
+
+ CONTEXT X CONTEXT Y CONTEXT Z
+ --------- --------- ----------
+ mutex_lock A
+ lock_page B
+ lock_page B
+ mutex_lock A /* DEADLOCK */
+ unlock_page B held by X
+ unlock_page B
+ mutex_unlock A
+ mutex_unlock A
+
+ where A and B are different lock classes.
+
+No, we cannot.
+
+Example 2:
+
+ CONTEXT X CONTEXT Y
+ --------- ---------
+ mutex_lock A
+ mutex_lock A
+ wait_for_complete B /* DEADLOCK */
+ complete B
+ mutex_unlock A
+ mutex_unlock A
+
+ where A is a lock class and B is a completion variable.
+
+No, we cannot.
+
+CONCLUSION
+
+Given the limitation, lockdep cannot detect a deadlock or its
+possibility caused by page locks or completions.
+
+
+Relax the limitation
+--------------------
+
+Under the limitation, the things that create dependencies are limited to
+typical locks. However, synchronization primitives like page locks and
+completions, which are allowed to be released in any context, also
+create dependencies and can cause a deadlock. So lockdep should track
+these locks to do a better job, and we have to relax the limitation for
+them to work with lockdep.
+
+Detecting dependencies is very important for lockdep to work, because
+adding a dependency means adding an opportunity to check whether it
+causes a deadlock. The more dependencies lockdep adds, the more
+thoroughly it works. Thus lockdep has to do its best to detect and add
+as many true dependencies to the graph as possible.
+
+For example, considering only typical locks, lockdep builds a graph like:
+
+ A -> B -
+ \
+ -> E
+ /
+ C -> D -
+
+ where A, B,..., E are different lock classes.
+
+On the other hand, under the relaxation, additional dependencies might
+be created and added. Assuming additional 'FX -> C' and 'E -> GX' are
+added thanks to the relaxation, the graph will be:
+
+ A -> B -
+ \
+ -> E -> GX
+ /
+ FX -> C -> D -
+
+ where A, B,..., E, FX and GX are different lock classes, and a suffix
+ 'X' is added on non-typical locks.
+
+The latter graph gives us more chances to check circular dependencies
+than the former. However, it might suffer performance degradation, since
+relaxing the limitation, which is what lets lockdep's design and
+implementation be efficient, inevitably introduces some inefficiency. So
+lockdep should provide two options, strong detection and efficient
+detection.
+
+Choosing efficient detection:
+
+ Lockdep works with only locks restricted to be released within the
+ acquire context. However, lockdep works efficiently.
+
+Choosing strong detection:
+
+ Lockdep works with all synchronization primitives. However, lockdep
+ suffers performance degradation.
+
+CONCLUSION
+
+Relaxing the limitation, lockdep can add additional dependencies giving
+additional opportunities to check circular dependencies.
+
+
+============
+Crossrelease
+============
+
+Introduce crossrelease
+----------------------
+
+In order to allow lockdep to handle additional dependencies created by
+locks that might be released in any context, hereafter called
+'crosslocks', we have to be able to identify the dependencies they
+create. The proposed 'crossrelease' feature provides a way to do that.
+
+The crossrelease feature has to:
+
+ 1. Identify dependencies created by crosslocks.
+ 2. Add the dependencies into a dependency graph.
+
+That's all. Once a meaningful dependency is added to the graph, lockdep
+works with the graph as it did before. The most important thing the
+crossrelease feature has to do is to correctly identify true
+dependencies and add them to the global graph.
+
+A dependency, e.g. 'A -> B', can be identified only in A's release
+context, because the decision required to identify it, namely whether A
+can be released so that a waiter for A can be woken up, can be made
+nowhere but in A's release context.
+
+This does not matter for typical locks, because each acquire context is
+the same as its release context, so lockdep can decide whether a lock
+can be released right in the acquire context. For crosslocks, however,
+lockdep cannot make the decision in the acquire context and has to wait
+until the release context is identified.
+
+Therefore, a deadlock caused by crosslocks cannot be detected at the
+moment it happens, because the dependencies involved cannot be
+identified until the crosslocks are released. However, deadlock
+possibilities can still be detected, and that is very valuable. See the
+'APPENDIX A' section for why.
+
+CONCLUSION
+
+Using the crossrelease feature, lockdep can work with locks that might
+be released in any context, namely crosslocks.
+
+
+Introduce commit
+----------------
+
+Since crossrelease defers adding the true dependencies of crosslocks
+until they are actually released, it has to queue all acquisitions that
+might create dependencies with the crosslocks. It then identifies
+dependencies from the queued data in batches at a proper time. We call
+this step 'commit'.
+
+There are four types of dependencies:
+
+1. TT type: 'typical lock A -> typical lock B'
+
+ Just when acquiring B, lockdep can see it's in A's release
+ context. So the dependency between A and B can be identified
+ immediately. Commit is unnecessary.
+
+2. TC type: 'typical lock A -> crosslock BX'
+
+ Just when acquiring BX, lockdep can see it's in A's release
+ context. So the dependency between A and BX can be identified
+ immediately. Commit is unnecessary, too.
+
+3. CT type: 'crosslock AX -> typical lock B'
+
+ When acquiring B, lockdep cannot identify the dependency because
+ there's no way to know whether it's in AX's release context. It has
+ to wait until the decision can be made. Commit is necessary.
+
+4. CC type: 'crosslock AX -> crosslock BX'
+
+ When acquiring BX, lockdep cannot identify the dependency because
+ there's no way to know whether it's in AX's release context. It has
+ to wait until the decision can be made. Commit is necessary. But
+ handling the CC type is not implemented yet; it's future work.
+
+Lockdep can work without commit for typical locks, but the commit step
+becomes necessary once crosslocks are involved. With commit introduced,
+lockdep performs three steps. What lockdep does in each step is:
+
+1. Acquisition: For typical locks, lockdep does what it originally did
+ and queues the lock so that CT type dependencies can be checked using
+ it at the commit step. For crosslocks, it saves data which will be
+ used at the commit step and increases a reference count for it.
+
+2. Commit: No action is required for typical locks. For crosslocks,
+   lockdep adds CT type dependencies using the data saved at the
+   acquisition step.
+
+3. Release: No changes are required for typical locks. When a crosslock
+   is released, lockdep decreases its reference count.
+
+CONCLUSION
+
+Crossrelease introduces commit step to handle dependencies of crosslocks
+in batches at a proper time.
+
+
+==============
+Implementation
+==============
+
+Data structures
+---------------
+
+Crossrelease introduces two main data structures.
+
+1. hist_lock
+
+ This is an array embedded in task_struct, keeping lock history so
+ that dependencies can be added at the commit step. Since it's
+ local data, it can be accessed locklessly in the owner context.
+ The array is filled at the acquisition step and consumed at the
+ commit step, and it's managed as a ring buffer.
+
+2. cross_lock
+
+ One exists per lockdep_map. It keeps data about crosslocks and is
+ used at the commit step.
+
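As a rough sketch, the ring buffer and its wrapping index might look like the toy model below. The names mirror those used in the patches, but the structures are heavily simplified; the real hist_lock also embeds a held_lock and a stack trace, and XHLOCKS_SIZE is forced to be a power of two:

```c
#include <assert.h>

#define XHLOCKS_SIZE 64                 /* must be a power of 2 */

struct hist_lock {
    int class_id;                       /* stands in for the held_lock data */
    unsigned int hist_id;               /* timestamp to detect overwrites */
};

struct task {
    struct hist_lock xhlocks[XHLOCKS_SIZE];
    unsigned int xhlock_idx;            /* ever-increasing write position */
    unsigned int hist_id;
};

/* Map an ever-increasing index onto a slot of the ring. */
static struct hist_lock *xhlock(struct task *t, unsigned int idx)
{
    return &t->xhlocks[idx % XHLOCKS_SIZE];
}

/* Queue an acquisition, possibly overwriting the oldest entry. */
static void add_xhlock(struct task *t, int class_id)
{
    struct hist_lock *h = xhlock(t, t->xhlock_idx++);

    h->class_id = class_id;
    h->hist_id = t->hist_id++;
}
```

Once more entries are queued than the ring can hold, old slots are silently reused; comparing a slot's hist_id against the expected value is what lets later patches detect such overwrites.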
+
+How crossrelease works
+----------------------
+
+The key to how crossrelease works is to defer the necessary work to an
+appropriate point in time and perform it all at once at the commit step.
+Let's take a look at examples step by step, starting from how lockdep
+works without crossrelease for typical locks.
+
+ acquire A /* Push A onto held_locks */
+ acquire B /* Push B onto held_locks and add 'A -> B' */
+ acquire C /* Push C onto held_locks and add 'B -> C' */
+ release C /* Pop C from held_locks */
+ release B /* Pop B from held_locks */
+ release A /* Pop A from held_locks */
+
+ where A, B and C are different lock classes.
+
+ NOTE: This document assumes that readers already understand how
+ lockdep works without crossrelease, thus omits details. But there's
+ one thing to note. Lockdep pretends to pop a lock from held_locks
+ when releasing it. But it's subtly different from a genuine pop
+ operation, because lockdep allows entries other than the top to be
+ popped.
+
+In this case, lockdep adds a dependency 'the top of held_locks -> the
+lock to acquire' every time a lock is acquired.
+
+After adding 'A -> B', a dependency graph will be:
+
+ A -> B
+
+ where A and B are different lock classes.
+
+And after adding 'B -> C', the graph will be:
+
+ A -> B -> C
+
+ where A, B and C are different lock classes.
+
+Let's perform the commit step even for typical locks to add
+dependencies. Of course, the commit step is not necessary for them;
+however, it still works correctly because it is the more general way.
+
+ acquire A
+ /*
+ * Queue A into hist_locks
+ *
+ * In hist_locks: A
+ * In graph: Empty
+ */
+
+ acquire B
+ /*
+ * Queue B into hist_locks
+ *
+ * In hist_locks: A, B
+ * In graph: Empty
+ */
+
+ acquire C
+ /*
+ * Queue C into hist_locks
+ *
+ * In hist_locks: A, B, C
+ * In graph: Empty
+ */
+
+ commit C
+ /*
+ * Add 'C -> ?'
+ * Answer the following to decide '?'
+ * What has been queued since acquire C: Nothing
+ *
+ * In hist_locks: A, B, C
+ * In graph: Empty
+ */
+
+ release C
+
+ commit B
+ /*
+ * Add 'B -> ?'
+ * Answer the following to decide '?'
+ * What has been queued since acquire B: C
+ *
+ * In hist_locks: A, B, C
+ * In graph: 'B -> C'
+ */
+
+ release B
+
+ commit A
+ /*
+ * Add 'A -> ?'
+ * Answer the following to decide '?'
+ * What has been queued since acquire A: B, C
+ *
+ * In hist_locks: A, B, C
+ * In graph: 'B -> C', 'A -> B', 'A -> C'
+ */
+
+ release A
+
+ where A, B and C are different lock classes.
+
+In this case, dependencies are added at the commit step as described.
+
+After commits for A, B and C, the graph will be:
+
+ A -> B -> C
+
+ where A, B and C are different lock classes.
+
+ NOTE: A dependency 'A -> C' is optimized out.
+
+We can see that the former graph, built without the commit step, is the
+same as the latter graph, built using commit steps. Of course the former
+way finishes building the graph earlier, which means a deadlock or its
+possibility can be detected sooner, so the former way is preferred when
+possible. But we cannot avoid using the latter way for crosslocks.
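+
The walkthrough above can be simulated: commit(l) scans everything queued into the history after l's own acquisition and adds a dependency from l to each such lock. This is a toy model of that idea, not the kernel code:

```c
#include <assert.h>

#define NCLASS 8
#define HIST   32

static int hist[HIST];          /* what was queued, in order */
static int nhist;
static int acq_pos[NCLASS];     /* history position right after each acquire */
static int dep[NCLASS][NCLASS];

static void acquire(int c)
{
    hist[nhist++] = c;          /* queue into hist_locks */
    acq_pos[c] = nhist;         /* everything after this was queued later */
}

/* Add 'c -> x' for every lock x queued since c was acquired. */
static void commit(int c)
{
    for (int i = acq_pos[c]; i < nhist; i++)
        dep[c][hist[i]] = 1;
}
```

With A, B and C numbered 0, 1 and 2, the commit sequence from the walkthrough yields 'B -> C', 'A -> B' and 'A -> C'; as noted above, real lockdep then optimizes the transitive 'A -> C' out.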
+
+Let's look at how commit steps work for crosslocks. In this case, the
+commit step is performed only on the crosslock BX. And it is assumed
+that the BX release context is different from the BX acquire context.
+
+ BX RELEASE CONTEXT BX ACQUIRE CONTEXT
+ ------------------ ------------------
+ acquire A
+ /*
+ * Push A onto held_locks
+ * Queue A into hist_locks
+ *
+ * In held_locks: A
+ * In hist_locks: A
+ * In graph: Empty
+ */
+
+ acquire BX
+ /*
+ * Add 'the top of held_locks -> BX'
+ *
+ * In held_locks: A
+ * In hist_locks: A
+ * In graph: 'A -> BX'
+ */
+
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ It must be guaranteed that the following operations are globally seen
+ as happening after acquiring BX. This can be done by means such as a
+ barrier.
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ acquire C
+ /*
+ * Push C onto held_locks
+ * Queue C into hist_locks
+ *
+ * In held_locks: C
+ * In hist_locks: C
+ * In graph: 'A -> BX'
+ */
+
+ release C
+ /*
+ * Pop C from held_locks
+ *
+ * In held_locks: Empty
+ * In hist_locks: C
+ * In graph: 'A -> BX'
+ */
+ acquire D
+ /*
+ * Push D onto held_locks
+ * Queue D into hist_locks
+ * Add 'the top of held_locks -> D'
+ *
+ * In held_locks: A, D
+ * In hist_locks: A, D
+ * In graph: 'A -> BX', 'A -> D'
+ */
+ acquire E
+ /*
+ * Push E onto held_locks
+ * Queue E into hist_locks
+ *
+ * In held_locks: E
+ * In hist_locks: C, E
+ * In graph: 'A -> BX', 'A -> D'
+ */
+
+ release E
+ /*
+ * Pop E from held_locks
+ *
+ * In held_locks: Empty
+ * In hist_locks: C, E
+ * In graph: 'A -> BX', 'A -> D'
+ */
+ release D
+ /*
+ * Pop D from held_locks
+ *
+ * In held_locks: A
+ * In hist_locks: A, D
+ * In graph: 'A -> BX', 'A -> D'
+ */
+ commit BX
+ /*
+ * Add 'BX -> ?'
+ * What has been queued since acquire BX: C, E
+ *
+ * In held_locks: Empty
+ * In hist_locks: C, E
+ * In graph: 'A -> BX', 'A -> D',
+ * 'BX -> C', 'BX -> E'
+ */
+
+ release BX
+ /*
+ * In held_locks: Empty
+ * In hist_locks: C, E
+ * In graph: 'A -> BX', 'A -> D',
+ * 'BX -> C', 'BX -> E'
+ */
+ release A
+ /*
+ * Pop A from held_locks
+ *
+ * In held_locks: Empty
+ * In hist_locks: A, D
+ * In graph: 'A -> BX', 'A -> D',
+ * 'BX -> C', 'BX -> E'
+ */
+
+ where A, BX, C,..., E are different lock classes, and a suffix 'X' is
+ added on crosslocks.
+
+Crossrelease considers all acquisitions after acquiring BX as
+candidates that might create dependencies with BX. True dependencies
+are determined when the release context of BX is identified. Meanwhile,
+all typical locks are queued so that they can be used at the commit
+step. Then the two dependencies 'BX -> C' and 'BX -> E' are added at
+the commit step, once the release context has been identified.
+
+The final graph will be, with crossrelease:
+
+ -> C
+ /
+ -> BX -
+ / \
+ A - -> E
+ \
+ -> D
+
+ where A, BX, C,..., E are different lock classes, and a suffix 'X' is
+ added on crosslocks.
+
+However, the final graph will be, without crossrelease:
+
+ A -> D
+
+ where A and D are different lock classes.
+
+The former graph has three more dependencies, 'A -> BX', 'BX -> C' and
+'BX -> E', giving additional opportunities to check whether they cause
+deadlocks. This way lockdep can detect a deadlock or its possibility
+caused by crosslocks.
+
+CONCLUSION
+
+We checked how crossrelease works with several examples.
+
+
+=============
+Optimizations
+=============
+
+Avoid duplication
+-----------------
+
+The crossrelease feature uses a cache like the one lockdep already uses
+for dependency chains, but this time for caching CT type dependencies.
+Once a dependency is cached, the same dependency will never be added
+again.
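+
A minimal sketch of such a cache is shown below; the flat table and the function name are illustrative only, and the kernel actually reuses its dependency-chain hashing rather than a table like this:

```c
#include <assert.h>

#define NCLASS 16

static unsigned char cached[NCLASS][NCLASS];

/*
 * Return 1 and record the pair the first time the CT dependency
 * 'xlock -> lock' is seen; return 0 afterwards, so the commit step
 * can skip re-adding the same dependency to the graph.
 */
static int cache_dependency(int xlock, int lock)
{
    if (cached[xlock][lock])
        return 0;
    cached[xlock][lock] = 1;
    return 1;
}
```

The point of the cache is that commit may revisit the same (crosslock, lock) pair many times, but only the first visit should pay the cost of a graph insertion.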
+
+
+Lockless for hot paths
+----------------------
+
+To keep all locks for later use at the commit step, crossrelease adopts
+a local array embedded in task_struct, which makes access to the data
+lockless by forcing it to happen only within the owner context. This is
+like how lockdep handles held_locks. A lockless implementation is
+important since typical locks are acquired and released very frequently.
+
+
+==================================================
+APPENDIX A: What lockdep does to work aggressively
+==================================================
+
+A deadlock actually occurs only when all the wait operations creating
+circular dependencies run at the same time. Even when they don't, a
+potential deadlock exists as long as the problematic dependencies exist.
+Thus it's meaningful to detect not only an actual deadlock but also its
+possibility, and the latter is the more valuable. When a deadlock
+actually occurs, we can identify what happened in the system by some
+means or other, even without lockdep. However, there is no way to detect
+a mere possibility without lockdep, short of reviewing all the code by
+hand. Lockdep does both, and crossrelease focuses only on the latter.
+
+Whether or not a deadlock actually occurs depends on several factors.
+For example, the order in which contexts are switched is a factor:
+assuming circular dependencies exist, a deadlock would occur when
+contexts are switched so that all the wait operations creating the
+dependencies run simultaneously. Thus, to detect a deadlock possibility
+even when it has not occurred yet, lockdep should consider all possible
+combinations of dependencies, trying to:
+
+1. Use a global dependency graph.
+
+ Lockdep combines all dependencies into one global graph and uses them,
+ regardless of which context generated them or what order contexts were
+ switched in. Since only the aggregated dependencies are considered,
+ they are prone to form a circle whenever a problem exists.
+
+2. Check dependencies between classes instead of instances.
+
+ What actually causes a deadlock are lock instances. However, lockdep
+ checks dependencies between classes instead of instances. This way
+ lockdep can detect a deadlock which has not happened yet but might
+ happen in the future with other instances of the same class.
+
+3. Assume all acquisitions lead to waiting.
+
+ Although a lock might be acquired without waiting, and waiting is
+ essential to creating a dependency, lockdep assumes all acquisitions
+ lead to waiting, since any of them might do so at some time or other.
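+
Point 2 above can be illustrated with a toy model: dependencies are recorded by class, so an inverted locking order is caught even when the later acquisitions involve different lock instances of the same classes. The structure and names here are illustrative only, not the kernel's lock_class machinery:

```c
#include <assert.h>

#define NCLASS 8

struct lock {
    int class_id;       /* assigned per initialization site, not per object */
};

static int dep[NCLASS][NCLASS];

/* Record 'held -> next' keyed by class, whatever the instances are. */
static void add_dep(const struct lock *held, const struct lock *next)
{
    dep[held->class_id][next->class_id] = 1;
}

/* Would acquiring 'next' while holding 'held' invert a recorded order? */
static int inverts_order(const struct lock *held, const struct lock *next)
{
    return dep[next->class_id][held->class_id];
}
```

Because the table is indexed by class, an A-then-B order observed on one pair of objects flags a later B-then-A attempt on a completely different pair of objects of the same classes.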
+
+CONCLUSION
+
+Lockdep detects not only an actual deadlock but also its possibility,
+and the latter is more valuable.
+
+
+==================================================
+APPENDIX B: How to avoid adding false dependencies
+==================================================
+
+Recall what a dependency is. A dependency exists if:
+
+ 1. There are two waiters waiting for each event at a given time.
+ 2. The only way to wake up each waiter is to trigger its event.
+ 3. Whether one can be woken up depends on whether the other can.
+
+For example:
+
+ acquire A
+ acquire B /* A dependency 'A -> B' exists */
+ release B
+ release A
+
+ where A and B are different lock classes.
+
+A dependency 'A -> B' exists since:
+
+ 1. A waiter for A and a waiter for B might exist when acquiring B.
+ 2. The only way to wake up each is to release what it waits for.
+ 3. Whether the waiter for A can be woken up depends on whether the
+    other can. IOW, the task cannot release A if it fails to acquire B.
+
+For another example:
+
+ TASK X TASK Y
+ ------ ------
+ acquire AX
+ acquire B /* A dependency 'AX -> B' exists */
+ release B
+ release AX held by Y
+
+ where AX and B are different lock classes, and a suffix 'X' is added
+ on crosslocks.
+
+Even in this case involving crosslocks, the same rule can be applied. A
+dependency 'AX -> B' exists since:
+
+ 1. A waiter for AX and a waiter for B might exist when acquiring B.
+ 2. The only way to wake up each is to release what it waits for.
+ 3. Whether the waiter for AX can be woken up depends on whether the
+ other can. IOW, TASK X cannot release AX if it fails to acquire B.
+
+Let's take a look at a more complicated example:
+
+ TASK X TASK Y
+ ------ ------
+ acquire B
+ release B
+ fork Y
+ acquire AX
+ acquire C /* A dependency 'AX -> C' exists */
+ release C
+ release AX held by Y
+
+ where AX, B and C are different lock classes, and a suffix 'X' is
+ added on crosslocks.
+
+Does a dependency 'AX -> B' exist? Nope.
+
+Two waiters are essential to create a dependency. However, waiters for
+AX and B to create 'AX -> B' cannot exist at the same time in this
+example. Thus the dependency 'AX -> B' cannot be created.
+
+It would be ideal if the full set of true dependencies could be
+considered. But we can be sure of nothing except what actually
+happened. By relying on what actually happens at runtime, we add only
+true dependencies, though they might be a subset of all the true ones.
+It's similar to how lockdep works for typical locks: there might be
+more true dependencies than lockdep has detected at runtime, but
+lockdep has no choice except to rely on what actually happens.
+Crossrelease also relies on it.
+
+CONCLUSION
+
+Relying on what actually happens, lockdep can avoid adding false
+dependencies.

2017-08-10 12:51:22

by Boqun Feng

Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Thu, Aug 10, 2017 at 09:11:32PM +0900, Byungchul Park wrote:
> > -----Original Message-----
> > From: Boqun Feng [mailto:[email protected]]
> > Sent: Thursday, August 10, 2017 8:59 PM
> > To: Byungchul Park
> > Cc: [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]
> > Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring
> > buffer overwrite
> >
> > On Mon, Aug 07, 2017 at 04:12:53PM +0900, Byungchul Park wrote:
> > > The ring buffer can be overwritten by hardirq/softirq/work contexts.
> > > That cases must be considered on rollback or commit. For example,
> > >
> > > |<------ hist_lock ring buffer size ----->|
> > > ppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
> > > wrapped > iiiiiiiiiiiiiiiiiiiiiii....................
> > >
> > > where 'p' represents an acquisition in process context,
> > > 'i' represents an acquisition in irq context.
> > >
> > > On irq exit, crossrelease tries to rollback idx to original position,
> > > but it should not because the entry already has been invalid by
> > > overwriting 'i'. Avoid rollback or commit for entries overwritten.
> > >
> > > Signed-off-by: Byungchul Park <[email protected]>
> > > ---
> > > include/linux/lockdep.h | 20 +++++++++++++++++++
> > > include/linux/sched.h | 3 +++
> > > kernel/locking/lockdep.c | 52
> > +++++++++++++++++++++++++++++++++++++++++++-----
> > > 3 files changed, 70 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> > > index 0c8a1b8..48c244c 100644
> > > --- a/include/linux/lockdep.h
> > > +++ b/include/linux/lockdep.h
> > > @@ -284,6 +284,26 @@ struct held_lock {
> > > */
> > > struct hist_lock {
> > > /*
> > > + * Id for each entry in the ring buffer. This is used to
> > > + * decide whether the ring buffer was overwritten or not.
> > > + *
> > > + * For example,
> > > + *
> > > + * |<----------- hist_lock ring buffer size ------->|
> > > + * pppppppppppppppppppppiiiiiiiiiiiiiiiiiiiiiiiiiiiii
> > > + * wrapped > iiiiiiiiiiiiiiiiiiiiiiiiiii.......................
> > > + *
> > > + * where 'p' represents an acquisition in process
> > > + * context, 'i' represents an acquisition in irq
> > > + * context.
> > > + *
> > > + * In this example, the ring buffer was overwritten by
> > > + * acquisitions in irq context, that should be detected on
> > > + * rollback or commit.
> > > + */
> > > + unsigned int hist_id;
> > > +
> > > + /*
> > > * Seperate stack_trace data. This will be used at commit step.
> > > */
> > > struct stack_trace trace;
> > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > index 5becef5..373466b 100644
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -855,6 +855,9 @@ struct task_struct {
> > > unsigned int xhlock_idx;
> > > /* For restoring at history boundaries */
> > > unsigned int xhlock_idx_hist[CONTEXT_NR];
> > > + unsigned int hist_id;
> > > + /* For overwrite check at each context exit */
> > > + unsigned int hist_id_save[CONTEXT_NR];
> > > #endif
> > >
> > > #ifdef CONFIG_UBSAN
> > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > > index afd6e64..5168dac 100644
> > > --- a/kernel/locking/lockdep.c
> > > +++ b/kernel/locking/lockdep.c
> > > @@ -4742,6 +4742,17 @@ void lockdep_rcu_suspicious(const char *file,
> > const int line, const char *s)
> > > static atomic_t cross_gen_id; /* Can be wrapped */
> > >
> > > /*
> > > + * Make an entry of the ring buffer invalid.
> > > + */
> > > +static inline void invalidate_xhlock(struct hist_lock *xhlock)
> > > +{
> > > + /*
> > > + * Normally, xhlock->hlock.instance must be !NULL.
> > > + */
> > > + xhlock->hlock.instance = NULL;
> > > +}
> > > +
> > > +/*
> > > * Lock history stacks; we have 3 nested lock history stacks:
> > > *
> > > * Hard IRQ
> > > @@ -4773,14 +4784,28 @@ void lockdep_rcu_suspicious(const char *file,
> > const int line, const char *s)
> > > */
> > > void crossrelease_hist_start(enum context_t c)
> > > {
> > > - if (current->xhlocks)
> > > - current->xhlock_idx_hist[c] = current->xhlock_idx;
> > > + struct task_struct *cur = current;
> > > +
> > > + if (cur->xhlocks) {
> > > + cur->xhlock_idx_hist[c] = cur->xhlock_idx;
> > > + cur->hist_id_save[c] = cur->hist_id;
> > > + }
> > > }
> > >
> > > void crossrelease_hist_end(enum context_t c)
> > > {
> > > - if (current->xhlocks)
> > > - current->xhlock_idx = current->xhlock_idx_hist[c];
> > > + struct task_struct *cur = current;
> > > +
> > > + if (cur->xhlocks) {
> > > + unsigned int idx = cur->xhlock_idx_hist[c];
> > > + struct hist_lock *h = &xhlock(idx);
> > > +
> > > + cur->xhlock_idx = idx;
> > > +
> > > + /* Check if the ring was overwritten. */
> > > + if (h->hist_id != cur->hist_id_save[c])
> >
> > Could we use:
> >
> > if (h->hist_id != idx)
>
> No, we cannot.
>

Hey, I'm not buying it. task_struct::hist_id and task_struct::xhlock_idx
are increased at the same place(in add_xhlock()), right?

And, yes, xhlock_idx will get decreased when we do ring-buffer
unwinding, but that's OK, because we need to throw away those recently
added items.

And xhlock_idx always points to the most recently added valid item,
right? Any other item's idx must "before()" the most recently added
one's, right? So ::xhlock_idx acts just like a timestamp, doesn't it?

Maybe I'm missing something subtle, but could you show me an example,
that could end up being a problem if we use xhlock_idx as the hist_id?

> hist_id is a kind of timestamp and used to detect overwriting
> data into places of same indexes of the ring buffer. And idx is
> just an index. :) IOW, they mean different things.
>
> >
> > here, and
> >
> > > + invalidate_xhlock(h);
> > > + }
> > > }
> > >
> > > static int cross_lock(struct lockdep_map *lock)
> > > @@ -4826,6 +4851,7 @@ static inline int depend_after(struct held_lock
> > *hlock)
> > > * Check if the xhlock is valid, which would be false if,
> > > *
> > > * 1. Has not used after initializaion yet.
> > > + * 2. Got invalidated.
> > > *
> > > * Remind hist_lock is implemented as a ring buffer.
> > > */
> > > @@ -4857,6 +4883,7 @@ static void add_xhlock(struct held_lock *hlock)
> > >
> > > /* Initialize hist_lock's members */
> > > xhlock->hlock = *hlock;
> > > + xhlock->hist_id = current->hist_id++;

Besides, is this code correct? Does this just make xhlock->hist_id
one-less-than the curr->hist_id, which cause the invalidation every time
you do ring buffer unwinding?

Regards,
Boqun

> >
> > use:
> >
> > xhlock->hist_id = idx;
> >
> > and,
>
> Same.
>
> >
> >
> > >
> > > xhlock->trace.nr_entries = 0;
> > > xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
> > > @@ -4995,6 +5022,7 @@ static int commit_xhlock(struct cross_lock *xlock,
> > struct hist_lock *xhlock)
> > > static void commit_xhlocks(struct cross_lock *xlock)
> > > {
> > > unsigned int cur = current->xhlock_idx;
> > > + unsigned int prev_hist_id = xhlock(cur).hist_id;
> >
> > use:
> > unsigned int prev_hist_id = cur;
> >
> > here.
>
> Same.
>
>



2017-08-10 13:17:26

by Boqun Feng

Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Thu, Aug 10, 2017 at 08:51:33PM +0800, Boqun Feng wrote:
[...]
> > > > + /* Check if the ring was overwritten. */
> > > > + if (h->hist_id != cur->hist_id_save[c])
> > >
> > > Could we use:
> > >
> > > if (h->hist_id != idx)
> >
> > No, we cannot.
> >
>
> Hey, I'm not buying it. task_struct::hist_id and task_struct::xhlock_idx
> are increased at the same place(in add_xhlock()), right?
>
> And, yes, xhlock_idx will get decreased when we do ring-buffer
> unwinding, but that's OK, because we need to throw away those recently
> added items.
>
> And xhlock_idx always points to the most recently added valid item,
> right? Any other item's idx must "before()" the most recently added
> one's, right? So ::xhlock_idx acts just like a timestamp, doesn't it?
>
> Maybe I'm missing something subtle, but could you show me an example,
> that could end up being a problem if we use xhlock_idx as the hist_id?
>
> > hist_id is a kind of timestamp and used to detect overwriting
> > data into places of same indexes of the ring buffer. And idx is
> > just an index. :) IOW, they mean different things.
> >
> > >
> > > here, and
> > >
> > > > + invalidate_xhlock(h);
> > > > + }
> > > > }
> > > >
> > > > static int cross_lock(struct lockdep_map *lock)
> > > > @@ -4826,6 +4851,7 @@ static inline int depend_after(struct held_lock
> > > *hlock)
> > > > * Check if the xhlock is valid, which would be false if,
> > > > *
> > > > * 1. Has not used after initializaion yet.
> > > > + * 2. Got invalidated.
> > > > *
> > > > * Remind hist_lock is implemented as a ring buffer.
> > > > */
> > > > @@ -4857,6 +4883,7 @@ static void add_xhlock(struct held_lock *hlock)
> > > >
> > > > /* Initialize hist_lock's members */
> > > > xhlock->hlock = *hlock;
> > > > + xhlock->hist_id = current->hist_id++;
>
> Besides, is this code correct? Does this just make xhlock->hist_id
> one-less-than the curr->hist_id, which cause the invalidation every time
> you do ring buffer unwinding?
>
> Regards,
> Boqun
>

So basically, I'm suggesting we do this on top of your patch. There is
also a fix in commit_xhlocks(): I think you should swap the parameters
of before(...), whether you use task_struct::hist_id or
task_struct::xhlock_idx as the timestamp.

Hope this makes my point clearer, and if I do miss something, please
point it out, thanks ;-)

Regards,
Boqun
------------>8

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 074872f016f8..886ba79bfc38 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -854,9 +854,6 @@ struct task_struct {
 	unsigned int xhlock_idx;
 	/* For restoring at history boundaries */
 	unsigned int xhlock_idx_hist[XHLOCK_NR];
-	unsigned int hist_id;
-	/* For overwrite check at each context exit */
-	unsigned int hist_id_save[XHLOCK_NR];
 #endif
 
 #ifdef CONFIG_UBSAN
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 699fbeab1920..04c6c8d68e18 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4752,10 +4752,8 @@ void crossrelease_hist_start(enum xhlock_context_t c)
 {
 	struct task_struct *cur = current;
 
-	if (cur->xhlocks) {
+	if (cur->xhlocks)
 		cur->xhlock_idx_hist[c] = cur->xhlock_idx;
-		cur->hist_id_save[c] = cur->hist_id;
-	}
 }
 
 void crossrelease_hist_end(enum xhlock_context_t c)
@@ -4769,7 +4767,7 @@ void crossrelease_hist_end(enum xhlock_context_t c)
 		cur->xhlock_idx = idx;
 
 		/* Check if the ring was overwritten. */
-		if (h->hist_id != cur->hist_id_save[c])
+		if (h->hist_id != idx)
 			invalidate_xhlock(h);
 	}
 }
@@ -4849,7 +4847,7 @@ static void add_xhlock(struct held_lock *hlock)
 
 	/* Initialize hist_lock's members */
 	xhlock->hlock = *hlock;
-	xhlock->hist_id = current->hist_id++;
+	xhlock->hist_id = idx;
 
 	xhlock->trace.nr_entries = 0;
 	xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
@@ -5005,7 +5003,7 @@ static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
 static void commit_xhlocks(struct cross_lock *xlock)
 {
 	unsigned int cur = current->xhlock_idx;
-	unsigned int prev_hist_id = xhlock(cur).hist_id;
+	unsigned int prev_hist_id = cur + 1;
 	unsigned int i;
 
 	if (!graph_lock())
@@ -5030,7 +5028,7 @@ static void commit_xhlocks(struct cross_lock *xlock)
 		 * hist_id than the following one, which is impossible
 		 * otherwise.
 		 */
-		if (unlikely(before(xhlock->hist_id, prev_hist_id)))
+		if (unlikely(before(prev_hist_id, xhlock->hist_id)))
 			break;
 
 		prev_hist_id = xhlock->hist_id;
@@ -5120,12 +5118,9 @@ void lockdep_init_task(struct task_struct *task)
 	int i;
 
 	task->xhlock_idx = UINT_MAX;
-	task->hist_id = 0;
 
-	for (i = 0; i < XHLOCK_NR; i++) {
+	for (i = 0; i < XHLOCK_NR; i++)
 		task->xhlock_idx_hist[i] = UINT_MAX;
-		task->hist_id_save[i] = 0;
-	}
 
 	task->xhlocks = kzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR,
 				GFP_KERNEL);

2017-08-11 00:41:39

by Byungchul Park

Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Thu, Aug 10, 2017 at 08:51:33PM +0800, Boqun Feng wrote:
> > > > void crossrelease_hist_end(enum context_t c)
> > > > {
> > > > - if (current->xhlocks)
> > > > - current->xhlock_idx = current->xhlock_idx_hist[c];
> > > > + struct task_struct *cur = current;
> > > > +
> > > > + if (cur->xhlocks) {
> > > > + unsigned int idx = cur->xhlock_idx_hist[c];
> > > > + struct hist_lock *h = &xhlock(idx);
> > > > +
> > > > + cur->xhlock_idx = idx;
> > > > +
> > > > + /* Check if the ring was overwritten. */
> > > > + if (h->hist_id != cur->hist_id_save[c])
> > >
> > > Could we use:
> > >
> > > if (h->hist_id != idx)
> >
> > No, we cannot.
> >
>
> Hey, I'm not buying it. task_struct::hist_id and task_struct::xhlock_idx
> are increased at the same place(in add_xhlock()), right?

Right.

> And, yes, xhlock_idx will get decreased when we do ring-buffer

This is why we should keep both of them.

> unwinding, but that's OK, because we need to throw away those recently
> added items.
>
> And xhlock_idx always points to the most recently added valid item,

No, that's not true in the case where the ring buffer has wrapped, like:

          ppppppppppppppppppppppppiiiiiiiiiiiiiiiiiiiiiiii
wrapped > iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii................
                    ^
                    xhlock_idx points here after unwinding,
                    and it's not a valid one.

where p represents an acquisition in process context, and
i represents an acquisition in irq context.

> right? Any other item's idx must "before()" the most recently added
> one's, right? So ::xhlock_idx acts just like a timestamp, doesn't it?

The answer to both questions is _no_.

> Maybe I'm missing something subtle, but could you show me an example,
> that could end up being a problem if we use xhlock_idx as the hist_id?

See the example above. We cannot detect whether the buffer was wrapped
using xhlock_idx alone.

>
> > hist_id is a kind of timestamp and used to detect overwriting
> > data into places of same indexes of the ring buffer. And idx is
> > just an index. :) IOW, they mean different things.
> >
> > >
> > > here, and
> > >
> > > > + invalidate_xhlock(h);
> > > > + }
> > > > }
> > > >
> > > > static int cross_lock(struct lockdep_map *lock)
> > > > @@ -4826,6 +4851,7 @@ static inline int depend_after(struct held_lock
> > > *hlock)
> > > > * Check if the xhlock is valid, which would be false if,
> > > > *
> > > > * 1. Has not used after initializaion yet.
> > > > + * 2. Got invalidated.
> > > > *
> > > > * Remind hist_lock is implemented as a ring buffer.
> > > > */
> > > > @@ -4857,6 +4883,7 @@ static void add_xhlock(struct held_lock *hlock)
> > > >
> > > > /* Initialize hist_lock's members */
> > > > xhlock->hlock = *hlock;
> > > > + xhlock->hist_id = current->hist_id++;
>
> Besides, is this code correct? Does this just make xhlock->hist_id
> one-less-than the curr->hist_id, which cause the invalidation every time
> you do ring buffer unwinding?

Right. "save = hist_id++" should be "save = ++hist_id". Could you fix it?

Thank you,
Byungchul

2017-08-11 00:46:09

by Byungchul Park

Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Thu, Aug 10, 2017 at 09:17:37PM +0800, Boqun Feng wrote:
> So basically, I'm suggesting do this on top of your patch, there is also
> a fix in commit_xhlocks(), which I think you should swap the parameters
> in before(...), no matter using task_struct::hist_id or using
> task_struct::xhlock_idx as the timestamp.
>
> Hope this could make my point more clear, and if I do miss something,
> please point it out, thanks ;-)

I think I fully explained why we cannot use xhlock_idx as the timestamp
in another reply. Please let me know if it's not enough. :)

Thank you,
Byungchul

> Regards,
> Boqun
> ------------>8
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 074872f016f8..886ba79bfc38 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -854,9 +854,6 @@ struct task_struct {
> unsigned int xhlock_idx;
> /* For restoring at history boundaries */
> unsigned int xhlock_idx_hist[XHLOCK_NR];
> - unsigned int hist_id;
> - /* For overwrite check at each context exit */
> - unsigned int hist_id_save[XHLOCK_NR];
> #endif
>
> #ifdef CONFIG_UBSAN
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 699fbeab1920..04c6c8d68e18 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -4752,10 +4752,8 @@ void crossrelease_hist_start(enum xhlock_context_t c)
> {
> struct task_struct *cur = current;
>
> - if (cur->xhlocks) {
> + if (cur->xhlocks)
> cur->xhlock_idx_hist[c] = cur->xhlock_idx;
> - cur->hist_id_save[c] = cur->hist_id;
> - }
> }
>
> void crossrelease_hist_end(enum xhlock_context_t c)
> @@ -4769,7 +4767,7 @@ void crossrelease_hist_end(enum xhlock_context_t c)
> cur->xhlock_idx = idx;
>
> /* Check if the ring was overwritten. */
> - if (h->hist_id != cur->hist_id_save[c])
> + if (h->hist_id != idx)
> invalidate_xhlock(h);
> }
> }
> @@ -4849,7 +4847,7 @@ static void add_xhlock(struct held_lock *hlock)
>
> /* Initialize hist_lock's members */
> xhlock->hlock = *hlock;
> - xhlock->hist_id = current->hist_id++;
> + xhlock->hist_id = idx;
>
> xhlock->trace.nr_entries = 0;
> xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
> @@ -5005,7 +5003,7 @@ static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
> static void commit_xhlocks(struct cross_lock *xlock)
> {
> unsigned int cur = current->xhlock_idx;
> - unsigned int prev_hist_id = xhlock(cur).hist_id;
> + unsigned int prev_hist_id = cur + 1;
> unsigned int i;
>
> if (!graph_lock())
> @@ -5030,7 +5028,7 @@ static void commit_xhlocks(struct cross_lock *xlock)
> * hist_id than the following one, which is impossible
> * otherwise.
> */
> - if (unlikely(before(xhlock->hist_id, prev_hist_id)))
> + if (unlikely(before(prev_hist_id, xhlock->hist_id)))
> break;
>
> prev_hist_id = xhlock->hist_id;
> @@ -5120,12 +5118,9 @@ void lockdep_init_task(struct task_struct *task)
> int i;
>
> task->xhlock_idx = UINT_MAX;
> - task->hist_id = 0;
>
> - for (i = 0; i < XHLOCK_NR; i++) {
> + for (i = 0; i < XHLOCK_NR; i++)
> task->xhlock_idx_hist[i] = UINT_MAX;
> - task->hist_id_save[i] = 0;
> - }
>
> task->xhlocks = kzalloc(sizeof(struct hist_lock) * MAX_XHLOCKS_NR,
> GFP_KERNEL);

2017-08-11 01:02:51

by Boqun Feng

Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Fri, Aug 11, 2017 at 09:40:21AM +0900, Byungchul Park wrote:
> On Thu, Aug 10, 2017 at 08:51:33PM +0800, Boqun Feng wrote:
> > > > > void crossrelease_hist_end(enum context_t c)
> > > > > {
> > > > > - if (current->xhlocks)
> > > > > - current->xhlock_idx = current->xhlock_idx_hist[c];
> > > > > + struct task_struct *cur = current;
> > > > > +
> > > > > + if (cur->xhlocks) {
> > > > > + unsigned int idx = cur->xhlock_idx_hist[c];
> > > > > + struct hist_lock *h = &xhlock(idx);
> > > > > +
> > > > > + cur->xhlock_idx = idx;
> > > > > +
> > > > > + /* Check if the ring was overwritten. */
> > > > > + if (h->hist_id != cur->hist_id_save[c])
> > > >
> > > > Could we use:
> > > >
> > > > if (h->hist_id != idx)
> > >
> > > No, we cannot.
> > >
> >
> > Hey, I'm not buying it. task_struct::hist_id and task_struct::xhlock_idx
> > are increased at the same place(in add_xhlock()), right?
>
> Right.
>
> > And, yes, xhlock_idx will get decreased when we do ring-buffer
>
> This is why we should keep both of them.
>
> > unwinding, but that's OK, because we need to throw away those recently
> > added items.
> >
> > And xhlock_idx always points to the most recently added valid item,
>
> No, it's not true in case that the ring buffer was wrapped like:
>
> ppppppppppppppppppppppppiiiiiiiiiiiiiiiiiiiiiiii
> wrapped > iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii................
> ^
> xhlock_idx points here after unwinding,
> and it's not a valid one.
>
> where p represents an acquisition in process context,
> i represents an acquisition in irq context.
>

Yeah, but we can detect this by comparing hist_lock::hist_id with
task_struct::xhlock_idx in commit_xhlocks() (see my patch), no?

Regards,
Boqun

> > right? Any other item's idx must "before()" the most recently added
> > one's, right? So ::xhlock_idx acts just like a timestamp, doesn't it?
>
> Both of two answers are _no_.
>
> > Maybe I'm missing something subtle, but could you show me an example,
> > that could end up being a problem if we use xhlock_idx as the hist_id?
>
> See the example above. We cannot detect whether it was wrapped or not using
> xhlock_idx.
>
> >
> > > hist_id is a kind of timestamp and used to detect overwriting
> > > data into places of same indexes of the ring buffer. And idx is
> > > just an index. :) IOW, they mean different things.
> > >
> > > >
> > > > here, and
> > > >
> > > > > + invalidate_xhlock(h);
> > > > > + }
> > > > > }
> > > > >
> > > > > static int cross_lock(struct lockdep_map *lock)
> > > > > @@ -4826,6 +4851,7 @@ static inline int depend_after(struct held_lock
> > > > *hlock)
> > > > > * Check if the xhlock is valid, which would be false if,
> > > > > *
> > > > > * 1. Has not used after initializaion yet.
> > > > > + * 2. Got invalidated.
> > > > > *
> > > > > * Remind hist_lock is implemented as a ring buffer.
> > > > > */
> > > > > @@ -4857,6 +4883,7 @@ static void add_xhlock(struct held_lock *hlock)
> > > > >
> > > > > /* Initialize hist_lock's members */
> > > > > xhlock->hlock = *hlock;
> > > > > + xhlock->hist_id = current->hist_id++;
> >
> > Besides, is this code correct? Does this just make xhlock->hist_id
> > one-less-than the curr->hist_id, which cause the invalidation every time
> > you do ring buffer unwinding?
>
> Right. "save = hist_id++" should be "save = ++hist_id". Could you fix it?
>
> Thank you,
> Byungchul
>



2017-08-11 03:44:45

by Byungchul Park

Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Thu, Aug 10, 2017 at 09:17:37PM +0800, Boqun Feng wrote:
> > > > > @@ -4826,6 +4851,7 @@ static inline int depend_after(struct held_lock
> > > > *hlock)
> > > > > * Check if the xhlock is valid, which would be false if,
> > > > > *
> > > > > * 1. Has not used after initializaion yet.
> > > > > + * 2. Got invalidated.
> > > > > *
> > > > > * Remind hist_lock is implemented as a ring buffer.
> > > > > */
> > > > > @@ -4857,6 +4883,7 @@ static void add_xhlock(struct held_lock *hlock)
> > > > >
> > > > > /* Initialize hist_lock's members */
> > > > > xhlock->hlock = *hlock;
> > > > > + xhlock->hist_id = current->hist_id++;
> >
> > Besides, is this code correct? Does this just make xhlock->hist_id
> > one-less-than the curr->hist_id, which cause the invalidation every time
> > you do ring buffer unwinding?
> >
> > Regards,
> > Boqun
> >
>
> So basically, I'm suggesting do this on top of your patch, there is also
> a fix in commit_xhlocks(), which I think you should swap the parameters
> in before(...), no matter using task_struct::hist_id or using
> task_struct::xhlock_idx as the timestamp.
>
> Hope this could make my point more clear, and if I do miss something,
> please point it out, thanks ;-)

Sorry for the misunderstanding. I like your patch; I think it works.

Additionally.. See below..

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 074872f016f8..886ba79bfc38 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -854,9 +854,6 @@ struct task_struct {
> unsigned int xhlock_idx;
> /* For restoring at history boundaries */
> unsigned int xhlock_idx_hist[XHLOCK_NR];
> - unsigned int hist_id;
> - /* For overwrite check at each context exit */
> - unsigned int hist_id_save[XHLOCK_NR];
> #endif
>
> #ifdef CONFIG_UBSAN
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 699fbeab1920..04c6c8d68e18 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -4752,10 +4752,8 @@ void crossrelease_hist_start(enum xhlock_context_t c)
> {
> struct task_struct *cur = current;
>
> - if (cur->xhlocks) {
> + if (cur->xhlocks)
> cur->xhlock_idx_hist[c] = cur->xhlock_idx;
> - cur->hist_id_save[c] = cur->hist_id;
> - }
> }
>
> void crossrelease_hist_end(enum xhlock_context_t c)
> @@ -4769,7 +4767,7 @@ void crossrelease_hist_end(enum xhlock_context_t c)
> cur->xhlock_idx = idx;
>
> /* Check if the ring was overwritten. */
> - if (h->hist_id != cur->hist_id_save[c])
> + if (h->hist_id != idx)
> invalidate_xhlock(h);
> }
> }
> @@ -4849,7 +4847,7 @@ static void add_xhlock(struct held_lock *hlock)
>
> /* Initialize hist_lock's members */
> xhlock->hlock = *hlock;
> - xhlock->hist_id = current->hist_id++;
> + xhlock->hist_id = idx;
>
> xhlock->trace.nr_entries = 0;
> xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
> @@ -5005,7 +5003,7 @@ static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
> static void commit_xhlocks(struct cross_lock *xlock)
> {
> unsigned int cur = current->xhlock_idx;
> - unsigned int prev_hist_id = xhlock(cur).hist_id;
> + unsigned int prev_hist_id = cur + 1;

I should have named it something else. Could you suggest a better name?

> unsigned int i;
>
> if (!graph_lock())
> @@ -5030,7 +5028,7 @@ static void commit_xhlocks(struct cross_lock *xlock)
> * hist_id than the following one, which is impossible
> * otherwise.

Or we need to modify the comment so that the word 'prev' does not
confuse readers. It was my mistake.

Thanks,
Byungchul

2017-08-11 08:03:24

by Boqun Feng

Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Fri, Aug 11, 2017 at 12:43:28PM +0900, Byungchul Park wrote:
> On Thu, Aug 10, 2017 at 09:17:37PM +0800, Boqun Feng wrote:
> > > > > > @@ -4826,6 +4851,7 @@ static inline int depend_after(struct held_lock
> > > > > *hlock)
> > > > > > * Check if the xhlock is valid, which would be false if,
> > > > > > *
> > > > > > * 1. Has not used after initializaion yet.
> > > > > > + * 2. Got invalidated.
> > > > > > *
> > > > > > * Remind hist_lock is implemented as a ring buffer.
> > > > > > */
> > > > > > @@ -4857,6 +4883,7 @@ static void add_xhlock(struct held_lock *hlock)
> > > > > >
> > > > > > /* Initialize hist_lock's members */
> > > > > > xhlock->hlock = *hlock;
> > > > > > + xhlock->hist_id = current->hist_id++;
> > >
> > > Besides, is this code correct? Does this just make xhlock->hist_id
> > > one-less-than the curr->hist_id, which cause the invalidation every time
> > > you do ring buffer unwinding?
> > >
> > > Regards,
> > > Boqun
> > >
> >
> > So basically, I'm suggesting do this on top of your patch, there is also
> > a fix in commit_xhlocks(), which I think you should swap the parameters
> > in before(...), no matter using task_struct::hist_id or using
> > task_struct::xhlock_idx as the timestamp.
> >
> > Hope this could make my point more clear, and if I do miss something,
> > please point it out, thanks ;-)
>
> Sorry for mis-understanding. I like your patch. I think it works.
>

Thanks for taking a look at it ;-)

> Additionally.. See below..
>
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 074872f016f8..886ba79bfc38 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -854,9 +854,6 @@ struct task_struct {
> > unsigned int xhlock_idx;
> > /* For restoring at history boundaries */
> > unsigned int xhlock_idx_hist[XHLOCK_NR];
> > - unsigned int hist_id;
> > - /* For overwrite check at each context exit */
> > - unsigned int hist_id_save[XHLOCK_NR];
> > #endif
> >
> > #ifdef CONFIG_UBSAN
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index 699fbeab1920..04c6c8d68e18 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -4752,10 +4752,8 @@ void crossrelease_hist_start(enum xhlock_context_t c)
> > {
> > struct task_struct *cur = current;
> >
> > - if (cur->xhlocks) {
> > + if (cur->xhlocks)
> > cur->xhlock_idx_hist[c] = cur->xhlock_idx;
> > - cur->hist_id_save[c] = cur->hist_id;
> > - }
> > }
> >
> > void crossrelease_hist_end(enum xhlock_context_t c)
> > @@ -4769,7 +4767,7 @@ void crossrelease_hist_end(enum xhlock_context_t c)
> > cur->xhlock_idx = idx;
> >
> > /* Check if the ring was overwritten. */
> > - if (h->hist_id != cur->hist_id_save[c])
> > + if (h->hist_id != idx)
> > invalidate_xhlock(h);
> > }
> > }
> > @@ -4849,7 +4847,7 @@ static void add_xhlock(struct held_lock *hlock)
> >
> > /* Initialize hist_lock's members */
> > xhlock->hlock = *hlock;
> > - xhlock->hist_id = current->hist_id++;
> > + xhlock->hist_id = idx;
> >
> > xhlock->trace.nr_entries = 0;
> > xhlock->trace.max_entries = MAX_XHLOCK_TRACE_ENTRIES;
> > @@ -5005,7 +5003,7 @@ static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
> > static void commit_xhlocks(struct cross_lock *xlock)
> > {
> > unsigned int cur = current->xhlock_idx;
> > - unsigned int prev_hist_id = xhlock(cur).hist_id;
> > + unsigned int prev_hist_id = cur + 1;
>
> I should have named it another. Could you suggest a better one?
>

I think "prev" is fine, because by "previous" I meant the xhlock item
we visited _previously_.

> > unsigned int i;
> >
> > if (!graph_lock())
> > @@ -5030,7 +5028,7 @@ static void commit_xhlocks(struct cross_lock *xlock)
> > * hist_id than the following one, which is impossible
> > * otherwise.
>
> Or we need to modify the comment so that the word 'prev' does not make
> readers confused. It was my mistake.
>

I think the comment needs some help, but before you do that, could you
have another look at what Peter proposed previously? Note you have a
same_context_xhlock() check in commit_xhlocks(), so your previous
overwrite case could actually be detected, I think.

However, one case that may not be detected is this one:

          ppppppppppppppppppppppppppppppppppwwwwwwww
wrapped > wwwwwww

where p: process and w: worker.

, because p and w are in the same task_irq_context(). I discussed this
with Peter yesterday, and he has a good idea: unconditionally reset the
ring buffer whenever we do a crossrelease_hist_end(XHLOCK_PROC).
Basically it means we empty the lock history whenever we finish a
worker function in a worker thread, or when we are about to return to
userspace after finishing a syscall. This could further save some
memory, so I think this may be better than my approach.

How does this sound to you?

Regards,
Boqun

> Thanks,
> Byungchul
>



2017-08-11 08:53:24

by Byungchul Park

Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Fri, Aug 11, 2017 at 04:03:29PM +0800, Boqun Feng wrote:
> Thanks for taking a look at it ;-)

I rather appreciate it.

> > > @@ -5005,7 +5003,7 @@ static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
> > > static void commit_xhlocks(struct cross_lock *xlock)
> > > {
> > > unsigned int cur = current->xhlock_idx;
> > > - unsigned int prev_hist_id = xhlock(cur).hist_id;
> > > + unsigned int prev_hist_id = cur + 1;
> >
> > I should have named it another. Could you suggest a better one?
> >
>
> I think "prev" is fine, because I thought the "previous" means the
> xhlock item we visit _previously_.
>
> > > unsigned int i;
> > >
> > > if (!graph_lock())
> > > @@ -5030,7 +5028,7 @@ static void commit_xhlocks(struct cross_lock *xlock)
> > > * hist_id than the following one, which is impossible
> > > * otherwise.
> >
> > Or we need to modify the comment so that the word 'prev' does not make
> > readers confused. It was my mistake.
> >
>
> I think the comment needs some help, but before you do it, could you
> have another look at what Peter proposed previously? Note you have a
> same_context_xhlock() check in the commit_xhlocks(), so the your
> previous overwrite case actually could be detected, I think.

What is the previous overwrite case?

ppppppppppwwwwwwwwwwwwiiiiiiiii
iiiiiiiiiiiiiii................

Do you mean this one? I missed the same_context_xhlock() check. Yes,
peterz's suggestion also seems to work.

> However, one thing may not be detected is this case:
>
> ppppppppppppppppppppppppppppppppppwwwwwwww
> wrapped > wwwwwww

To be honest, I think your suggestion is more natural, and with it this
case would also be covered.

>
> where p: process and w: worker.
>
> , because p and w are in the same task_irq_context(). I discussed this
> with Peter yesterday, and he has a good idea: unconditionally do a reset
> on the ring buffer whenever we do a crossrelease_hist_end(XHLOCK_PROC).
> Basically it means we empty the lock history whenever we finished a
> worker function in a worker thread or we are about to return to
> userspace after we finish the syscall. This could further save some
> memory and so I think this may be better than my approach.

Do you mean we reset _whenever_ a hard irq, soft irq, or work exits?
Why should we give up the chance to check dependencies of the remaining
xhlocks at each such exit? Am I understanding correctly?

I am just curious. Does your approach have any problems?

Thanks,
Byungchul

2017-08-11 09:46:05

by Byungchul Park

Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Fri, Aug 11, 2017 at 05:52:02PM +0900, Byungchul Park wrote:
> On Fri, Aug 11, 2017 at 04:03:29PM +0800, Boqun Feng wrote:
> > Thanks for taking a look at it ;-)
>
> I rather appriciate it.
>
> > > > @@ -5005,7 +5003,7 @@ static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
> > > > static void commit_xhlocks(struct cross_lock *xlock)
> > > > {
> > > > unsigned int cur = current->xhlock_idx;
> > > > - unsigned int prev_hist_id = xhlock(cur).hist_id;
> > > > + unsigned int prev_hist_id = cur + 1;
> > >
> > > I should have named it another. Could you suggest a better one?
> > >
> >
> > I think "prev" is fine, because I thought the "previous" means the
> > xhlock item we visit _previously_.
> >
> > > > unsigned int i;
> > > >
> > > > if (!graph_lock())
> > > > @@ -5030,7 +5028,7 @@ static void commit_xhlocks(struct cross_lock *xlock)
> > > > * hist_id than the following one, which is impossible
> > > > * otherwise.
> > >
> > > Or we need to modify the comment so that the word 'prev' does not make
> > > readers confused. It was my mistake.
> > >
> >
> > I think the comment needs some help, but before you do it, could you
> > have another look at what Peter proposed previously? Note you have a
> > same_context_xhlock() check in the commit_xhlocks(), so the your
> > previous overwrite case actually could be detected, I think.
>
> What is the previous overwrite case?
>
> ppppppppppwwwwwwwwwwwwiiiiiiiii
> iiiiiiiiiiiiiii................
>
> Do you mean this one? I missed the check of same_context_xhlock(). Yes,
> peterz's suggestion also seems to work.
>
> > However, one thing may not be detected is this case:
> >
> > ppppppppppppppppppppppppppppppppppwwwwwwww
> > wrapped > wwwwwww
>
> To be honest, I think your suggestion is more natual, with which this
> case would be also covered.
>
> >
> > where p: process and w: worker.
> >
> > , because p and w are in the same task_irq_context(). I discussed this
> > with Peter yesterday, and he has a good idea: unconditionally do a reset
> > on the ring buffer whenever we do a crossrelease_hist_end(XHLOCK_PROC).

Ah, OK. You meant 'whenever a _process_ context exits'.

I need more time to be sure, but for now it seems to work, while giving
up some chances to check the remaining xhlocks.

But I am not sure that will still hold in the future, or that the code
can be maintained easily. I think your approach is natural and neat
enough for that purpose. What problem exists with yours?

> > Basically it means we empty the lock history whenever we finished a
> > worker function in a worker thread or we are about to return to
> > userspace after we finish the syscall. This could further save some
> > memory and so I think this may be better than my approach.
>
> Do you mean reset _whenever_ hard irq exit, soft irq exit or work exit?
> Why should we give up chances to check dependencies of remaining xhlocks
> whenever each exit? Am I understanding correctly?
>
> I am just curious. Does your approach have some problems?
>
> Thanks,
> Byungchul

2017-08-11 13:06:42

by Byungchul Park

Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Fri, Aug 11, 2017 at 6:44 PM, Byungchul Park <[email protected]> wrote:
> On Fri, Aug 11, 2017 at 05:52:02PM +0900, Byungchul Park wrote:
>> On Fri, Aug 11, 2017 at 04:03:29PM +0800, Boqun Feng wrote:
>> > Thanks for taking a look at it ;-)
>>
>> I rather appriciate it.
>>
>> > > > @@ -5005,7 +5003,7 @@ static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
>> > > > static void commit_xhlocks(struct cross_lock *xlock)
>> > > > {
>> > > > unsigned int cur = current->xhlock_idx;
>> > > > - unsigned int prev_hist_id = xhlock(cur).hist_id;
>> > > > + unsigned int prev_hist_id = cur + 1;
>> > >
>> > > I should have named it another. Could you suggest a better one?
>> > >
>> >
>> > I think "prev" is fine, because I thought the "previous" means the
>> > xhlock item we visit _previously_.
>> >
>> > > > unsigned int i;
>> > > >
>> > > > if (!graph_lock())
>> > > > @@ -5030,7 +5028,7 @@ static void commit_xhlocks(struct cross_lock *xlock)
>> > > > * hist_id than the following one, which is impossible
>> > > > * otherwise.
>> > >
>> > > Or we need to modify the comment so that the word 'prev' does not make
>> > > readers confused. It was my mistake.
>> > >
>> >
>> > I think the comment needs some help, but before you do it, could you
>> > have another look at what Peter proposed previously? Note you have a
>> > same_context_xhlock() check in the commit_xhlocks(), so the your
>> > previous overwrite case actually could be detected, I think.
>>
>> What is the previous overwrite case?
>>
>> ppppppppppwwwwwwwwwwwwiiiiiiiii
>> iiiiiiiiiiiiiii................
>>
>> Do you mean this one? I missed the check of same_context_xhlock(). Yes,
>> peterz's suggestion also seems to work.
>>
>> > However, one thing may not be detected is this case:
>> >
>> > ppppppppppppppppppppppppppppppppppwwwwwwww
>> > wrapped > wwwwwww
>>
>> To be honest, I think your suggestion is more natual, with which this
>> case would be also covered.
>>
>> >
>> > where p: process and w: worker.
>> >
>> > , because p and w are in the same task_irq_context(). I discussed this
>> > with Peter yesterday, and he has a good idea: unconditionally do a reset
>> > on the ring buffer whenever we do a crossrelease_hist_end(XHLOCK_PROC).
>
> Ah, ok. You meant 'whenever _process_ context exit'.
>
> I need more time to be sure, but anyway for now it seems to work with
> giving up some chances for remaining xhlocks.
>
> But, I am not sure if it's still true even in future and the code can be
> maintained easily. I think your approach is natural and neat enough for
> that purpose. What problem exists with yours?

Let me list the possible approaches:

0. Byungchul's approach
1. Boqun's approach
2. Peterz's approach
3. Reset on process exit

I like Boqun's approach most but, _whatever_, it's OK as long as it solves
the problem. The last one is not bad when it is used for syscall exit, but
we would have to give up valid dependencies unnecessarily in other cases.
And I think Peterz's approach should be modified a bit to make it work
neatly, like:

crossrelease_hist_end(...)
{
	...
	invalidate_xhlock(&xhlock(cur->xhlock_idx_max));

	for (c = 0; c < XHLOCK_CXT_NR; c++)
		if ((cur->xhlock_idx_max - cur->xhlock_idx_hist[c]) >=
		    MAX_XHLOCKS_NR)
			invalidate_xhlock(&xhlock(cur->xhlock_idx_hist[c]));
	...
}

And then Peterz's approach can also work, I think.

---
Thanks,
Byungchul

2017-08-14 07:05:22

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Fri, Aug 11, 2017 at 10:06:37PM +0900, Byungchul Park wrote:
> On Fri, Aug 11, 2017 at 6:44 PM, Byungchul Park <[email protected]> wrote:
> > On Fri, Aug 11, 2017 at 05:52:02PM +0900, Byungchul Park wrote:
> >> On Fri, Aug 11, 2017 at 04:03:29PM +0800, Boqun Feng wrote:
> >> > Thanks for taking a look at it ;-)
> >>
>> I rather appreciate it.
> >>
> >> > > > @@ -5005,7 +5003,7 @@ static int commit_xhlock(struct cross_lock *xlock, struct hist_lock *xhlock)
> >> > > > static void commit_xhlocks(struct cross_lock *xlock)
> >> > > > {
> >> > > > unsigned int cur = current->xhlock_idx;
> >> > > > - unsigned int prev_hist_id = xhlock(cur).hist_id;
> >> > > > + unsigned int prev_hist_id = cur + 1;
> >> > >
> >> > > I should have named it another. Could you suggest a better one?
> >> > >
> >> >
> >> > I think "prev" is fine, because I thought the "previous" means the
> >> > xhlock item we visit _previously_.
> >> >
> >> > > > unsigned int i;
> >> > > >
> >> > > > if (!graph_lock())
> >> > > > @@ -5030,7 +5028,7 @@ static void commit_xhlocks(struct cross_lock *xlock)
> >> > > > * hist_id than the following one, which is impossible
> >> > > > * otherwise.
> >> > >
> >> > > Or we need to modify the comment so that the word 'prev' does not make
> >> > > readers confused. It was my mistake.
> >> > >
> >> >
> >> > I think the comment needs some help, but before you do it, could you
> >> > have another look at what Peter proposed previously? Note you have a
> >> > same_context_xhlock() check in commit_xhlocks(), so your
> >> > previous overwrite case could actually be detected, I think.
> >>
> >> What is the previous overwrite case?
> >>
> >> ppppppppppwwwwwwwwwwwwiiiiiiiii
> >> iiiiiiiiiiiiiii................
> >>
> >> Do you mean this one? I missed the check of same_context_xhlock(). Yes,
> >> peterz's suggestion also seems to work.
> >>
> >> > However, one thing may not be detected is this case:
> >> >
> >> > ppppppppppppppppppppppppppppppppppwwwwwwww
> >> > wrapped > wwwwwww
> >>
> >> To be honest, I think your suggestion is more natural, with which this
> >> case would also be covered.
> >>
> >> >
> >> > where p: process and w: worker.
> >> >
> >> > , because p and w are in the same task_irq_context(). I discussed this
> >> > with Peter yesterday, and he has a good idea: unconditionally do a reset
> >> > on the ring buffer whenever we do a crossrelease_hist_end(XHLOCK_PROC).
> >
> > Ah, ok. You meant 'whenever _process_ context exit'.
> >
> > I need more time to be sure, but anyway for now it seems to work with
> > giving up some chances for remaining xhlocks.
> >
> > But, I am not sure if it's still true even in future and the code can be
> > maintained easily. I think your approach is natural and neat enough for
> > that purpose. What problem exists with yours?
>

My approach works, but it has a bigger memory footprint than Peter's, so I
asked whether you could consider Peter's approach.

> Let me list up the possible approaches:
>
> 0. Byungchul's approach

Your approach requires (additionally):

MAX_XHLOCKS_NR * sizeof(unsigned int) // because of the hist_id field in hist_lock
+
(XHLOCK_CXT_NR + 1) * sizeof(unsigned int) // because of fields in task_struct

bytes per task.

> 1. Boqun's approach

My approach requires (additionally):

MAX_XHLOCKS_NR * sizeof(unsigned int) // because of the hist_id field in hist_lock

bytes per task.

> 2. Peterz's approach

And Peter's approach requires (additionally):

1 * sizeof(unsigned int)

bytes per task.

So basically we need some tradeoff between memory footprints and history
precision here.

> 3. Reset on process exit
>
> I like Boqun's approach most but, _whatever_. It's ok if it solves the problem.
> The last one is not bad when it is used for syscall exit, but we have to give
> up valid dependencies unnecessarily in other cases. And I think Peterz's
> approach should be modified a bit to make it work neatly, like:
>
> crossrelease_hist_end(...)
> {
> 	...
> 	invalidate_xhlock(&xhlock(cur->xhlock_idx_max));
>
> 	for (c = 0; c < XHLOCK_CXT_NR; c++)
> 		if ((cur->xhlock_idx_max - cur->xhlock_idx_hist[c]) >=
> 		    MAX_XHLOCKS_NR)
> 			invalidate_xhlock(&xhlock(cur->xhlock_idx_hist[c]));
> 	...
> }
>

Haven't looked into this deeply, but my gut feeling is that this is
unnecessary; I will take a deeper look.

Regards,
Boqun

> And then Peterz's approach can also work, I think.
>
> ---
> Thanks,
> Byungchul



2017-08-14 07:23:33

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Mon, Aug 14, 2017 at 03:05:22PM +0800, Boqun Feng wrote:
> > I like Boqun's approach most but, _whatever_. It's ok if it solves the problem.
> > The last one is not bad when it is used for syscall exit, but we have to give
> > up valid dependencies unnecessarily in other cases. And I think Peterz's
> > approach should be modified a bit to make it work neatly, like:
> >
> > crossrelease_hist_end(...)
> > {
> > 	...
> > 	invalidate_xhlock(&xhlock(cur->xhlock_idx_max));
> >
> > 	for (c = 0; c < XHLOCK_CXT_NR; c++)
> > 		if ((cur->xhlock_idx_max - cur->xhlock_idx_hist[c]) >=
> > 		    MAX_XHLOCKS_NR)
> > 			invalidate_xhlock(&xhlock(cur->xhlock_idx_hist[c]));
> > 	...
> > }
> >
>
> Haven't looked into this deeply, but my gut feeling is this is
> unnecessary, will have a deep look.

Of course, for now, it looks like we can rely on the same_context_xhlock()
check at commit time, without invalidating entries. But I think that approach
might be dangerous in the future. I think it would be better to do the
invalidation explicitly.

>
> Regards,
> Boqun
>
> > And then Peterz's approach can also work, I think.
> >
> > ---
> > Thanks,
> > Byungchul


2017-08-14 07:30:21

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 06/14] lockdep: Detect and handle hist_lock ring buffer overwrite

On Mon, Aug 14, 2017 at 03:05:22PM +0800, Boqun Feng wrote:
> > 1. Boqun's approach
>
> My approach requires (additionally):
>
> MAX_XHLOCKS_NR * sizeof(unsigned int) // because of the hist_id field in hist_lock
>
> bytes per task.
>
> > 2. Peterz's approach
>
> And Peter's approach requires (additionally):
>
> 1 * sizeof(unsigned int)
>
> bytes per task.
>
> So basically we need some tradeoff between memory footprints and history
> precision here.

I see what you intended. Then, Peterz's one looks better.

2017-08-14 08:50:26

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

On Mon, Aug 7, 2017 at 9:12 AM, Byungchul Park <[email protected]> wrote:
> Although wait_for_completion() and its family can cause deadlock, the
> lock correctness validator could not be applied to them until now,
> because things like complete() are usually called in a different context
> from the waiting context, which violates lockdep's assumption.
>
> Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
> detector to those completion operations. Applied it.
>
> Signed-off-by: Byungchul Park <[email protected]>

This patch introduced a significant growth in kernel stack usage for a small
set of functions. I see two new warnings for functions that get tipped over the
1024 or 2048 byte frame size limit in linux-next (with a few other patches
applied):

Before:

drivers/md/dm-integrity.c: In function 'write_journal':
drivers/md/dm-integrity.c:827:1: error: the frame size of 504 bytes is
larger than xxx bytes [-Werror=frame-larger-than=]
drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 680 bytes
is larger than 104 bytes [-Werror=frame-larger-than=]

After:

drivers/md/dm-integrity.c: In function 'write_journal':
drivers/md/dm-integrity.c:827:1: error: the frame size of 1280 bytes
is larger than 1024 bytes [-Werror=frame-larger-than=]
drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 1072
bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

I have not checked in detail why this happens, but I'm guessing that
there is an overall increase in stack usage with
CONFIG_LOCKDEP_COMPLETE in functions using completions,
and I think it would be good to try to come up with a version that doesn't
add as much.

Arnd

2017-08-14 10:57:54

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature


* Byungchul Park <[email protected]> wrote:

> On Thu, Aug 10, 2017 at 01:10:19PM +0200, Ingo Molnar wrote:
> >
> > * Byungchul Park <[email protected]> wrote:
> >
> > > Change from v7
> > > - rebase on latest tip/sched/core (Jul 26 2017)
> > > - apply peterz's suggestions
> > > - simplify code of crossrelease_{hist/soft/hard}_{start/end}
> > > - exclude a patch avoiding redundant links
> > > - exclude a patch already applied onto the base
> >
> > Ok, it's looking pretty good here now, there's one thing I'd like you to change,
> > please remove all the new Kconfig dependencies:
> >
> > CONFIG_LOCKDEP_CROSSRELEASE=y
> > CONFIG_LOCKDEP_COMPLETE=y
> >
> > and make it all part of PROVE_LOCKING, like most of the other lock debugging bits.
>
> OK. I will remove them. What about CONFIG_LOCKDEP_PAGELOCK? Should I also
> remove it?

So I'd only remove the forced _configurability_ - we can still keep those
variables just fine. They modularize the code, and they might be useful later on
if for some reason some really bad performance aspect requires one of these
lockdep components to be configured out by default.

Just make the user interface sane - i.e. only one switch needed to enable full
lockdep. Internal modularization is fine, as long as it's not ugly and the user is
not burdened with it.

Thanks,

Ingo

2017-08-14 11:12:06

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Mon, Aug 14, 2017 at 12:57:48PM +0200, Ingo Molnar wrote:
>
> * Byungchul Park <[email protected]> wrote:
>
> > On Thu, Aug 10, 2017 at 01:10:19PM +0200, Ingo Molnar wrote:
> > >
> > > * Byungchul Park <[email protected]> wrote:
> > >
> > > > Change from v7
> > > > - rebase on latest tip/sched/core (Jul 26 2017)
> > > > - apply peterz's suggestions
> > > > - simplify code of crossrelease_{hist/soft/hard}_{start/end}
> > > > - exclude a patch avoiding redundant links
> > > > - exclude a patch already applied onto the base
> > >
> > > Ok, it's looking pretty good here now, there's one thing I'd like you to change,
> > > please remove all the new Kconfig dependencies:
> > >
> > > CONFIG_LOCKDEP_CROSSRELEASE=y
> > > CONFIG_LOCKDEP_COMPLETE=y
> > >
> > > and make it all part of PROVE_LOCKING, like most of the other lock debugging bits.
> >
> > OK. I will remove them. What about CONFIG_LOCKDEP_PAGELOCK? Should I also
> > remove it?
>
> So I'd only remove the forced _configurability_ - we can still keep those
> variables just fine. They modularize the code and they might be useful later on if
> for some reason there's some really bad performance aspect that would make one of
> these lockdep components to be configured out by default.
>
> Just make the user interface sane - i.e. only one switch needed to enable full
> lockdep. Internal modularization is fine, as long as it's not ugly and the user is
> not burdened with it.

Agree.

Thank you,
Byungchul

2017-08-15 08:20:29

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature


So with the latest fixes there's a new lockdep warning on one of my testboxes:

[ 11.322487] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)

[ 11.495661] ======================================================
[ 11.502093] WARNING: possible circular locking dependency detected
[ 11.508507] 4.13.0-rc5-00497-g73135c58-dirty #1 Not tainted
[ 11.514313] ------------------------------------------------------
[ 11.520725] umount/533 is trying to acquire lock:
[ 11.525657] ((complete)&barr->done){+.+.}, at: [<ffffffff810fdbb3>] flush_work+0x213/0x2f0
[ 11.534411]
but task is already holding lock:
[ 11.540661] (lock#3){+.+.}, at: [<ffffffff8122678d>] lru_add_drain_all_cpuslocked+0x3d/0x190
[ 11.549613]
which lock already depends on the new lock.

The full splat is below. The kernel config is nothing fancy - distro derived,
pretty close to defconfig, with lockdep enabled.

Thanks,

Ingo

[ 11.322487] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)

[ 11.495661] ======================================================
[ 11.502093] WARNING: possible circular locking dependency detected
[ 11.508507] 4.13.0-rc5-00497-g73135c58-dirty #1 Not tainted
[ 11.514313] ------------------------------------------------------
[ 11.520725] umount/533 is trying to acquire lock:
[ 11.525657] ((complete)&barr->done){+.+.}, at: [<ffffffff810fdbb3>] flush_work+0x213/0x2f0
[ 11.534411]
but task is already holding lock:
[ 11.540661] (lock#3){+.+.}, at: [<ffffffff8122678d>] lru_add_drain_all_cpuslocked+0x3d/0x190
[ 11.549613]
which lock already depends on the new lock.

[ 11.558349]
the existing dependency chain (in reverse order) is:
[ 11.566229]
-> #3 (lock#3){+.+.}:
[ 11.571439] lock_acquire+0xe7/0x1d0
[ 11.575765] __mutex_lock+0x75/0x8e0
[ 11.580086] lru_add_drain_all_cpuslocked+0x3d/0x190
[ 11.585797] lru_add_drain_all+0xf/0x20
[ 11.590402] invalidate_bdev+0x3e/0x60
[ 11.594901] ext4_put_super+0x1f9/0x3d0
[ 11.599485] generic_shutdown_super+0x64/0x110
[ 11.604685] kill_block_super+0x21/0x50
[ 11.609270] deactivate_locked_super+0x39/0x70
[ 11.614462] cleanup_mnt+0x3b/0x70
[ 11.618612] task_work_run+0x72/0x90
[ 11.622955] exit_to_usermode_loop+0x93/0xa0
[ 11.627971] do_syscall_64+0x1a2/0x1c0
[ 11.632470] return_from_SYSCALL_64+0x0/0x7a
[ 11.637487]
-> #2 (cpu_hotplug_lock.rw_sem){++++}:
[ 11.644144] lock_acquire+0xe7/0x1d0
[ 11.648487] cpus_read_lock+0x2b/0x60
[ 11.652897] apply_workqueue_attrs+0x12/0x50
[ 11.657917] __alloc_workqueue_key+0x2f2/0x510
[ 11.663110] scsi_host_alloc+0x353/0x470
[ 11.667780] _scsih_probe+0x5bb/0x7b0
[ 11.672192] local_pci_probe+0x3f/0x90
[ 11.676714] work_for_cpu_fn+0x10/0x20
[ 11.681213] process_one_work+0x1fc/0x670
[ 11.685971] worker_thread+0x219/0x3e0
[ 11.690469] kthread+0x13a/0x170
[ 11.694465] ret_from_fork+0x27/0x40
[ 11.698790]
-> #1 ((&wfc.work)){+.+.}:
[ 11.704433] worker_thread+0x219/0x3e0
[ 11.708930] kthread+0x13a/0x170
[ 11.712908] ret_from_fork+0x27/0x40
[ 11.717234] 0xffffffffffffffff
[ 11.721142]
-> #0 ((complete)&barr->done){+.+.}:
[ 11.727633] __lock_acquire+0x1433/0x14a0
[ 11.732392] lock_acquire+0xe7/0x1d0
[ 11.736715] wait_for_completion+0x4e/0x170
[ 11.741664] flush_work+0x213/0x2f0
[ 11.745919] lru_add_drain_all_cpuslocked+0x149/0x190
[ 11.751718] lru_add_drain_all+0xf/0x20
[ 11.756303] invalidate_bdev+0x3e/0x60
[ 11.760819] ext4_put_super+0x1f9/0x3d0
[ 11.765403] generic_shutdown_super+0x64/0x110
[ 11.770596] kill_block_super+0x21/0x50
[ 11.775181] deactivate_locked_super+0x39/0x70
[ 11.780372] cleanup_mnt+0x3b/0x70
[ 11.784522] task_work_run+0x72/0x90
[ 11.788848] exit_to_usermode_loop+0x93/0xa0
[ 11.793875] do_syscall_64+0x1a2/0x1c0
[ 11.798399] return_from_SYSCALL_64+0x0/0x7a
[ 11.803416]
other info that might help us debug this:

[ 11.811997] Chain exists of:
(complete)&barr->done --> cpu_hotplug_lock.rw_sem --> lock#3

[ 11.823810] Possible unsafe locking scenario:

[ 11.830120]        CPU0                    CPU1
[ 11.834878]        ----                    ----
[ 11.839636]   lock(lock#3);
[ 11.842653]                                lock(cpu_hotplug_lock.rw_sem);
[ 11.849697]                                lock(lock#3);
[ 11.855236]   lock((complete)&barr->done);
[ 11.859560]
*** DEADLOCK ***

[ 11.866054] 3 locks held by umount/533:
[ 11.870117] #0: (&type->s_umount_key#24){+.+.}, at: [<ffffffff8129b7ad>] deactivate_super+0x4d/0x60
[ 11.879737] #1: (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff812268ea>] lru_add_drain_all+0xa/0x20
[ 11.889445] #2: (lock#3){+.+.}, at: [<ffffffff8122678d>] lru_add_drain_all_cpuslocked+0x3d/0x190
[ 11.898805]
stack backtrace:
[ 11.903573] CPU: 12 PID: 533 Comm: umount Not tainted 4.13.0-rc5-00497-g73135c58-dirty #1
[ 11.912169] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 2.0b 03/01/2012
[ 11.920759] Call Trace:
[ 11.923433] dump_stack+0x5e/0x8e
[ 11.926975] print_circular_bug+0x204/0x310
[ 11.931385] ? add_lock_to_list.isra.29+0xb0/0xb0
[ 11.936316] check_prev_add+0x444/0x860
[ 11.940382] ? generic_shutdown_super+0x64/0x110
[ 11.945237] ? add_lock_to_list.isra.29+0xb0/0xb0
[ 11.950168] ? __lock_acquire+0x1433/0x14a0
[ 11.954578] __lock_acquire+0x1433/0x14a0
[ 11.958818] lock_acquire+0xe7/0x1d0
[ 11.962621] ? flush_work+0x213/0x2f0
[ 11.966506] wait_for_completion+0x4e/0x170
[ 11.970915] ? flush_work+0x213/0x2f0
[ 11.974807] ? flush_work+0x1e6/0x2f0
[ 11.978699] flush_work+0x213/0x2f0
[ 11.982416] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[ 11.987610] ? mark_held_locks+0x66/0x90
[ 11.991778] ? queue_work_on+0x41/0x70
[ 11.995755] lru_add_drain_all_cpuslocked+0x149/0x190
[ 12.001034] lru_add_drain_all+0xf/0x20
[ 12.005124] invalidate_bdev+0x3e/0x60
[ 12.009094] ext4_put_super+0x1f9/0x3d0
[ 12.013159] generic_shutdown_super+0x64/0x110
[ 12.017856] kill_block_super+0x21/0x50
[ 12.021922] deactivate_locked_super+0x39/0x70
[ 12.026591] cleanup_mnt+0x3b/0x70
[ 12.030242] task_work_run+0x72/0x90
[ 12.034063] exit_to_usermode_loop+0x93/0xa0
[ 12.038561] do_syscall_64+0x1a2/0x1c0
[ 12.042541] entry_SYSCALL64_slow_path+0x25/0x25
[ 12.047384] RIP: 0033:0x7fc3f2854a37
[ 12.051189] RSP: 002b:00007fff660582b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 12.059162] RAX: 0000000000000000 RBX: 00000074471c14e0 RCX: 00007fc3f2854a37
[ 12.066530] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000074471c22e0
[ 12.073895] RBP: 00000074471c22e0 R08: 0000000000000000 R09: 0000000000000002
[ 12.081264] R10: 00007fff66058050 R11: 0000000000000246 R12: 00007fc3f35e6890
[ 12.088656] R13: 0000000000000000 R14: 00000074471c1660 R15: 0000000000000000
[ 12.110307] dracut: Checking ext4: /dev/sda2

2017-08-16 00:18:01

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Tue, Aug 15, 2017 at 10:20:20AM +0200, Ingo Molnar wrote:
>
> So with the latest fixes there's a new lockdep warning on one of my testboxes:
>
> [ 11.322487] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
>
> [ 11.495661] ======================================================
> [ 11.502093] WARNING: possible circular locking dependency detected
> [ 11.508507] 4.13.0-rc5-00497-g73135c58-dirty #1 Not tainted
> [ 11.514313] ------------------------------------------------------
> [ 11.520725] umount/533 is trying to acquire lock:
> [ 11.525657] ((complete)&barr->done){+.+.}, at: [<ffffffff810fdbb3>] flush_work+0x213/0x2f0
> [ 11.534411]
> but task is already holding lock:
> [ 11.540661] (lock#3){+.+.}, at: [<ffffffff8122678d>] lru_add_drain_all_cpuslocked+0x3d/0x190
> [ 11.549613]
> which lock already depends on the new lock.
>
> The full splat is below. The kernel config is nothing fancy - distro derived,
> pretty close to defconfig, with lockdep enabled.

I see...

Worker A : acquired wfc.work -> waits for cpu_hotplug_lock to be released
Task B   : acquired cpu_hotplug_lock -> waits for lock#3 to be released
Task C   : acquired lock#3 -> waits for completion of barr->done
Worker D : waits for wfc.work to be released -> will complete barr->done

The report below is telling us that a deadlock would happen if the four tasks
ran simultaneously. Here, I wonder whether wfc.work should be acquired with a
write version. I am not familiar with workqueue internals. Could anyone
explain it to me?

Thank you,
Byungchul

> Thanks,
>
> Ingo
>
> [ 11.322487] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
>
> [ 11.495661] ======================================================
> [ 11.502093] WARNING: possible circular locking dependency detected
> [ 11.508507] 4.13.0-rc5-00497-g73135c58-dirty #1 Not tainted
> [ 11.514313] ------------------------------------------------------
> [ 11.520725] umount/533 is trying to acquire lock:
> [ 11.525657] ((complete)&barr->done){+.+.}, at: [<ffffffff810fdbb3>] flush_work+0x213/0x2f0
> [ 11.534411]
> but task is already holding lock:
> [ 11.540661] (lock#3){+.+.}, at: [<ffffffff8122678d>] lru_add_drain_all_cpuslocked+0x3d/0x190
> [ 11.549613]
> which lock already depends on the new lock.
>
> [ 11.558349]
> the existing dependency chain (in reverse order) is:
> [ 11.566229]
> -> #3 (lock#3){+.+.}:
> [ 11.571439] lock_acquire+0xe7/0x1d0
> [ 11.575765] __mutex_lock+0x75/0x8e0
> [ 11.580086] lru_add_drain_all_cpuslocked+0x3d/0x190
> [ 11.585797] lru_add_drain_all+0xf/0x20
> [ 11.590402] invalidate_bdev+0x3e/0x60
> [ 11.594901] ext4_put_super+0x1f9/0x3d0
> [ 11.599485] generic_shutdown_super+0x64/0x110
> [ 11.604685] kill_block_super+0x21/0x50
> [ 11.609270] deactivate_locked_super+0x39/0x70
> [ 11.614462] cleanup_mnt+0x3b/0x70
> [ 11.618612] task_work_run+0x72/0x90
> [ 11.622955] exit_to_usermode_loop+0x93/0xa0
> [ 11.627971] do_syscall_64+0x1a2/0x1c0
> [ 11.632470] return_from_SYSCALL_64+0x0/0x7a
> [ 11.637487]
> -> #2 (cpu_hotplug_lock.rw_sem){++++}:
> [ 11.644144] lock_acquire+0xe7/0x1d0
> [ 11.648487] cpus_read_lock+0x2b/0x60
> [ 11.652897] apply_workqueue_attrs+0x12/0x50
> [ 11.657917] __alloc_workqueue_key+0x2f2/0x510
> [ 11.663110] scsi_host_alloc+0x353/0x470
> [ 11.667780] _scsih_probe+0x5bb/0x7b0
> [ 11.672192] local_pci_probe+0x3f/0x90
> [ 11.676714] work_for_cpu_fn+0x10/0x20
> [ 11.681213] process_one_work+0x1fc/0x670
> [ 11.685971] worker_thread+0x219/0x3e0
> [ 11.690469] kthread+0x13a/0x170
> [ 11.694465] ret_from_fork+0x27/0x40
> [ 11.698790]
> -> #1 ((&wfc.work)){+.+.}:
> [ 11.704433] worker_thread+0x219/0x3e0
> [ 11.708930] kthread+0x13a/0x170
> [ 11.712908] ret_from_fork+0x27/0x40
> [ 11.717234] 0xffffffffffffffff
> [ 11.721142]
> -> #0 ((complete)&barr->done){+.+.}:
> [ 11.727633] __lock_acquire+0x1433/0x14a0
> [ 11.732392] lock_acquire+0xe7/0x1d0
> [ 11.736715] wait_for_completion+0x4e/0x170
> [ 11.741664] flush_work+0x213/0x2f0
> [ 11.745919] lru_add_drain_all_cpuslocked+0x149/0x190
> [ 11.751718] lru_add_drain_all+0xf/0x20
> [ 11.756303] invalidate_bdev+0x3e/0x60
> [ 11.760819] ext4_put_super+0x1f9/0x3d0
> [ 11.765403] generic_shutdown_super+0x64/0x110
> [ 11.770596] kill_block_super+0x21/0x50
> [ 11.775181] deactivate_locked_super+0x39/0x70
> [ 11.780372] cleanup_mnt+0x3b/0x70
> [ 11.784522] task_work_run+0x72/0x90
> [ 11.788848] exit_to_usermode_loop+0x93/0xa0
> [ 11.793875] do_syscall_64+0x1a2/0x1c0
> [ 11.798399] return_from_SYSCALL_64+0x0/0x7a
> [ 11.803416]
> other info that might help us debug this:
>
> [ 11.811997] Chain exists of:
> (complete)&barr->done --> cpu_hotplug_lock.rw_sem --> lock#3
>
> [ 11.823810] Possible unsafe locking scenario:
>
> [ 11.830120]        CPU0                    CPU1
> [ 11.834878]        ----                    ----
> [ 11.839636]   lock(lock#3);
> [ 11.842653]                                lock(cpu_hotplug_lock.rw_sem);
> [ 11.849697]                                lock(lock#3);
> [ 11.855236]   lock((complete)&barr->done);
> [ 11.859560]
> *** DEADLOCK ***
>
> [ 11.866054] 3 locks held by umount/533:
> [ 11.870117] #0: (&type->s_umount_key#24){+.+.}, at: [<ffffffff8129b7ad>] deactivate_super+0x4d/0x60
> [ 11.879737] #1: (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff812268ea>] lru_add_drain_all+0xa/0x20
> [ 11.889445] #2: (lock#3){+.+.}, at: [<ffffffff8122678d>] lru_add_drain_all_cpuslocked+0x3d/0x190
> [ 11.898805]
> stack backtrace:
> [ 11.903573] CPU: 12 PID: 533 Comm: umount Not tainted 4.13.0-rc5-00497-g73135c58-dirty #1
> [ 11.912169] Hardware name: Supermicro H8DG6/H8DGi/H8DG6/H8DGi, BIOS 2.0b 03/01/2012
> [ 11.920759] Call Trace:
> [ 11.923433] dump_stack+0x5e/0x8e
> [ 11.926975] print_circular_bug+0x204/0x310
> [ 11.931385] ? add_lock_to_list.isra.29+0xb0/0xb0
> [ 11.936316] check_prev_add+0x444/0x860
> [ 11.940382] ? generic_shutdown_super+0x64/0x110
> [ 11.945237] ? add_lock_to_list.isra.29+0xb0/0xb0
> [ 11.950168] ? __lock_acquire+0x1433/0x14a0
> [ 11.954578] __lock_acquire+0x1433/0x14a0
> [ 11.958818] lock_acquire+0xe7/0x1d0
> [ 11.962621] ? flush_work+0x213/0x2f0
> [ 11.966506] wait_for_completion+0x4e/0x170
> [ 11.970915] ? flush_work+0x213/0x2f0
> [ 11.974807] ? flush_work+0x1e6/0x2f0
> [ 11.978699] flush_work+0x213/0x2f0
> [ 11.982416] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
> [ 11.987610] ? mark_held_locks+0x66/0x90
> [ 11.991778] ? queue_work_on+0x41/0x70
> [ 11.995755] lru_add_drain_all_cpuslocked+0x149/0x190
> [ 12.001034] lru_add_drain_all+0xf/0x20
> [ 12.005124] invalidate_bdev+0x3e/0x60
> [ 12.009094] ext4_put_super+0x1f9/0x3d0
> [ 12.013159] generic_shutdown_super+0x64/0x110
> [ 12.017856] kill_block_super+0x21/0x50
> [ 12.021922] deactivate_locked_super+0x39/0x70
> [ 12.026591] cleanup_mnt+0x3b/0x70
> [ 12.030242] task_work_run+0x72/0x90
> [ 12.034063] exit_to_usermode_loop+0x93/0xa0
> [ 12.038561] do_syscall_64+0x1a2/0x1c0
> [ 12.042541] entry_SYSCALL64_slow_path+0x25/0x25
> [ 12.047384] RIP: 0033:0x7fc3f2854a37
> [ 12.051189] RSP: 002b:00007fff660582b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> [ 12.059162] RAX: 0000000000000000 RBX: 00000074471c14e0 RCX: 00007fc3f2854a37
> [ 12.066530] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000074471c22e0
> [ 12.073895] RBP: 00000074471c22e0 R08: 0000000000000000 R09: 0000000000000002
> [ 12.081264] R10: 00007fff66058050 R11: 0000000000000246 R12: 00007fc3f35e6890
> [ 12.088656] R13: 0000000000000000 R14: 00000074471c1660 R15: 0000000000000000
> [ 12.110307] dracut: Checking ext4: /dev/sda2

2017-08-16 04:05:18

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Wed, Aug 16, 2017 at 09:16:37AM +0900, Byungchul Park wrote:
> On Tue, Aug 15, 2017 at 10:20:20AM +0200, Ingo Molnar wrote:
> >
> > So with the latest fixes there's a new lockdep warning on one of my testboxes:
> >
> > [ 11.322487] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
> >
> > [ 11.495661] ======================================================
> > [ 11.502093] WARNING: possible circular locking dependency detected
> > [ 11.508507] 4.13.0-rc5-00497-g73135c58-dirty #1 Not tainted
> > [ 11.514313] ------------------------------------------------------
> > [ 11.520725] umount/533 is trying to acquire lock:
> > [ 11.525657] ((complete)&barr->done){+.+.}, at: [<ffffffff810fdbb3>] flush_work+0x213/0x2f0
> > [ 11.534411]
> > but task is already holding lock:
> > [ 11.540661] (lock#3){+.+.}, at: [<ffffffff8122678d>] lru_add_drain_all_cpuslocked+0x3d/0x190
> > [ 11.549613]
> > which lock already depends on the new lock.
> >
> > The full splat is below. The kernel config is nothing fancy - distro derived,
> > pretty close to defconfig, with lockdep enabled.
>
> I see...
>
> Worker A : acquired of wfc.work -> wait for cpu_hotplug_lock to be released
> Task B : acquired of cpu_hotplug_lock -> wait for lock#3 to be released
> Task C : acquired of lock#3 -> wait for completion of barr->done

From the stack trace below, this barr->done is for flush_work() in
lru_add_drain_all_cpuslocked(), i.e. for work "per_cpu(lru_add_drain_work)"

> Worker D : wait for wfc.work to be released -> will complete barr->done

and this barr->done is for work "wfc.work".

So those two barr->done could not be the same instance, IIUC. Therefore
the deadlock case is not possible.

The problem here is that all barr->done instances are initialized at
insert_wq_barrier() and belong to the same lock class; to fix this, we
need to give barr->done different lock classes based on the
corresponding works.

How about this (compile-tested only):

----------------->8
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e86733a8b344..d14067942088 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2431,6 +2431,27 @@ struct wq_barrier {
 	struct task_struct	*task;	/* purely informational */
 };
 
+#ifdef CONFIG_LOCKDEP_COMPLETE
+# define INIT_WQ_BARRIER_ONSTACK(barr, func, target)			\
+do {									\
+	INIT_WORK_ONSTACK(&(barr)->work, func);				\
+	__set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
+	lockdep_init_map_crosslock((struct lockdep_map *)&(barr)->done.map, \
+				   "(complete)" #barr,			\
+				   (target)->lockdep_map.key, 1);	\
+	__init_completion(&barr->done);					\
+	barr->task = current;						\
+} while (0)
+#else
+# define INIT_WQ_BARRIER_ONSTACK(barr, func, target)			\
+do {									\
+	INIT_WORK_ONSTACK(&(barr)->work, func);				\
+	__set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
+	init_completion(&barr->done);					\
+	barr->task = current;						\
+} while (0)
+#endif
+
 static void wq_barrier_func(struct work_struct *work)
 {
 	struct wq_barrier *barr = container_of(work, struct wq_barrier, work);
@@ -2474,10 +2495,7 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
 	 * checks and call back into the fixup functions where we
 	 * might deadlock.
 	 */
-	INIT_WORK_ONSTACK(&barr->work, wq_barrier_func);
-	__set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&barr->work));
-	init_completion(&barr->done);
-	barr->task = current;
+	INIT_WQ_BARRIER_ONSTACK(barr, wq_barrier_func, target);
 
 	/*
 	 * If @target is currently being executed, schedule the

2017-08-16 04:39:09

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Wed, Aug 16, 2017 at 12:05:31PM +0800, Boqun Feng wrote:
> On Wed, Aug 16, 2017 at 09:16:37AM +0900, Byungchul Park wrote:
> > On Tue, Aug 15, 2017 at 10:20:20AM +0200, Ingo Molnar wrote:
> > >
> > > So with the latest fixes there's a new lockdep warning on one of my testboxes:
> > >
> > > [ 11.322487] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
> > >
> > > [ 11.495661] ======================================================
> > > [ 11.502093] WARNING: possible circular locking dependency detected
> > > [ 11.508507] 4.13.0-rc5-00497-g73135c58-dirty #1 Not tainted
> > > [ 11.514313] ------------------------------------------------------
> > > [ 11.520725] umount/533 is trying to acquire lock:
> > > [ 11.525657] ((complete)&barr->done){+.+.}, at: [<ffffffff810fdbb3>] flush_work+0x213/0x2f0
> > > [ 11.534411]
> > > but task is already holding lock:
> > > [ 11.540661] (lock#3){+.+.}, at: [<ffffffff8122678d>] lru_add_drain_all_cpuslocked+0x3d/0x190
> > > [ 11.549613]
> > > which lock already depends on the new lock.
> > >
> > > The full splat is below. The kernel config is nothing fancy - distro derived,
> > > pretty close to defconfig, with lockdep enabled.
> >
> > I see...
> >
> > Worker A : acquired of wfc.work -> wait for cpu_hotplug_lock to be released
> > Task B : acquired of cpu_hotplug_lock -> wait for lock#3 to be released
> > Task C : acquired of lock#3 -> wait for completion of barr->done
>
> From the stack trace below, this barr->done is for flush_work() in
> lru_add_drain_all_cpuslocked(), i.e. for work "per_cpu(lru_add_drain_work)"
>
> > Worker D : wait for wfc.work to be released -> will complete barr->done
>
> and this barr->done is for work "wfc.work".

I think they can be the same instance. wait_for_completion() in flush_work(),
e.g. at task C in my example, waits for a completion that we expect to be
completed by a worker, e.g. worker D in my example.

I think the problem is caused by a write-acquisition of wfc.work in
process_one_work(). The acquisition of wfc.work should be reentrant,
that is, a read-acquisition, shouldn't it?

I might be wrong... Please correct me if so.

Thank you,
Byungchul

> So those two barr->done could not be the same instance, IIUC. Therefore
> the deadlock case is not possible.
>
> The problem here is all barr->done instances are initialized at
> insert_wq_barrier() and they belongs to the same lock class, to fix
> this, we need to differ barr->done with different lock classes based on
> the corresponding works.
>
> How about the this(only compilation test):
>
> ----------------->8
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index e86733a8b344..d14067942088 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2431,6 +2431,27 @@ struct wq_barrier {
> struct task_struct *task; /* purely informational */
> };
>
> +#ifdef CONFIG_LOCKDEP_COMPLETE
> +# define INIT_WQ_BARRIER_ONSTACK(barr, func, target) \
> +do { \
> + INIT_WORK_ONSTACK(&(barr)->work, func); \
> + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
> + lockdep_init_map_crosslock((struct lockdep_map *)&(barr)->done.map, \
> + "(complete)" #barr, \
> + (target)->lockdep_map.key, 1); \
> + __init_completion(&barr->done); \
> + barr->task = current; \
> +} while (0)
> +#else
> +# define INIT_WQ_BARRIER_ONSTACK(barr, func, target) \
> +do { \
> + INIT_WORK_ONSTACK(&(barr)->work, func); \
> + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
> + init_completion(&barr->done); \
> + barr->task = current; \
> +} while (0)
> +#endif
> +
> static void wq_barrier_func(struct work_struct *work)
> {
> struct wq_barrier *barr = container_of(work, struct wq_barrier, work);
> @@ -2474,10 +2495,7 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
> * checks and call back into the fixup functions where we
> * might deadlock.
> */
> - INIT_WORK_ONSTACK(&barr->work, wq_barrier_func);
> - __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&barr->work));
> - init_completion(&barr->done);
> - barr->task = current;
> + INIT_WQ_BARRIER_ONSTACK(barr, wq_barrier_func, target);
>
> /*
> * If @target is currently being executed, schedule the

2017-08-16 05:06:27

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Wed, Aug 16, 2017 at 12:05:31PM +0800, Boqun Feng wrote:
> > I see...
> >
> > Worker A : acquired of wfc.work -> wait for cpu_hotplug_lock to be released
> > Task B : acquired of cpu_hotplug_lock -> wait for lock#3 to be released
> > Task C : acquired of lock#3 -> wait for completion of barr->done
>
> From the stack trace below, this barr->done is for flush_work() in
> lru_add_drain_all_cpuslocked(), i.e. for work "per_cpu(lru_add_drain_work)"
>
> > Worker D : wait for wfc.work to be released -> will complete barr->done
>
> and this barr->done is for work "wfc.work".
>
> So those two barr->done could not be the same instance, IIUC. Therefore
> the deadlock case is not possible.
>
> The problem here is all barr->done instances are initialized at
> insert_wq_barrier() and they belongs to the same lock class, to fix

I'm not sure this is what caused the lockdep warning, but if they belong
to the same class even though they could never be the same instance, as
you said, I also think that is another problem and should be fixed.

> this, we need to differ barr->done with different lock classes based on
> the corresponding works.
>
> How about the this(only compilation test):
>
> ----------------->8
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index e86733a8b344..d14067942088 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2431,6 +2431,27 @@ struct wq_barrier {
> struct task_struct *task; /* purely informational */
> };
>
> +#ifdef CONFIG_LOCKDEP_COMPLETE
> +# define INIT_WQ_BARRIER_ONSTACK(barr, func, target) \
> +do { \
> + INIT_WORK_ONSTACK(&(barr)->work, func); \
> + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
> + lockdep_init_map_crosslock((struct lockdep_map *)&(barr)->done.map, \
> + "(complete)" #barr, \
> + (target)->lockdep_map.key, 1); \
> + __init_completion(&barr->done); \
> + barr->task = current; \
> +} while (0)
> +#else
> +# define INIT_WQ_BARRIER_ONSTACK(barr, func, target) \
> +do { \
> + INIT_WORK_ONSTACK(&(barr)->work, func); \
> + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
> + init_completion(&barr->done); \
> + barr->task = current; \
> +} while (0)
> +#endif
> +
> static void wq_barrier_func(struct work_struct *work)
> {
> struct wq_barrier *barr = container_of(work, struct wq_barrier, work);
> @@ -2474,10 +2495,7 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
> * checks and call back into the fixup functions where we
> * might deadlock.
> */
> - INIT_WORK_ONSTACK(&barr->work, wq_barrier_func);
> - __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&barr->work));
> - init_completion(&barr->done);
> - barr->task = current;
> + INIT_WQ_BARRIER_ONSTACK(barr, wq_barrier_func, target);
>
> /*
> * If @target is currently being executed, schedule the

2017-08-16 05:40:35

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Wed, Aug 16, 2017 at 01:37:46PM +0900, Byungchul Park wrote:
> On Wed, Aug 16, 2017 at 12:05:31PM +0800, Boqun Feng wrote:
> > On Wed, Aug 16, 2017 at 09:16:37AM +0900, Byungchul Park wrote:
> > > On Tue, Aug 15, 2017 at 10:20:20AM +0200, Ingo Molnar wrote:
> > > >
> > > > So with the latest fixes there's a new lockdep warning on one of my testboxes:
> > > >
> > > > [ 11.322487] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)
> > > >
> > > > [ 11.495661] ======================================================
> > > > [ 11.502093] WARNING: possible circular locking dependency detected
> > > > [ 11.508507] 4.13.0-rc5-00497-g73135c58-dirty #1 Not tainted
> > > > [ 11.514313] ------------------------------------------------------
> > > > [ 11.520725] umount/533 is trying to acquire lock:
> > > > [ 11.525657] ((complete)&barr->done){+.+.}, at: [<ffffffff810fdbb3>] flush_work+0x213/0x2f0
> > > > [ 11.534411]
> > > > but task is already holding lock:
> > > > [ 11.540661] (lock#3){+.+.}, at: [<ffffffff8122678d>] lru_add_drain_all_cpuslocked+0x3d/0x190
> > > > [ 11.549613]
> > > > which lock already depends on the new lock.
> > > >
> > > > The full splat is below. The kernel config is nothing fancy - distro derived,
> > > > pretty close to defconfig, with lockdep enabled.
> > >
> > > I see...
> > >
> > > Worker A : acquired of wfc.work -> wait for cpu_hotplug_lock to be released
> > > Task B : acquired of cpu_hotplug_lock -> wait for lock#3 to be released
> > > Task C : acquired of lock#3 -> wait for completion of barr->done
> >
> > From the stack trace below, this barr->done is for flush_work() in
> > lru_add_drain_all_cpuslocked(), i.e. for work "per_cpu(lru_add_drain_work)"
> >
> > > Worker D : wait for wfc.work to be released -> will complete barr->done
> >
> > and this barr->done is for work "wfc.work".
>
> I think it can be the same instance. wait_for_completion() in flush_work()
> e.g. at task C in my example, waits for completion which we expect to be
> done by a worker e.g. worker D in my example.
>
> I think the problem is caused by a write-acquisition of wfc.work in
> process_one_work(). The acquisition of wfc.work should be reenterable,
> that is, read-acquisition, shouldn't it?
>

The only thing is that wfc.work is not a real lock; please see the code in
flush_work(). And if a task C does a flush_work() for "wfc.work" with
lock#3 held, it needs to "acquire" wfc.work before it calls
wait_for_completion(), which is already a deadlock case:

lock#3 -> wfc.work -> cpu_hotplug_lock -+
  ^                                     |
  |                                     |
  +-------------------------------------+

, without crossrelease enabled. So the task C didn't flush work wfc.work
in the previous case, which implies barr->done in Task C and Worker D
are not the same instance.

Make sense?

Regards,
Boqun

> I might be wrong... Please fix me if so.
>
> Thank you,
> Byungchul
>
> > So those two barr->done could not be the same instance, IIUC. Therefore
> > the deadlock case is not possible.
> >
> > The problem here is all barr->done instances are initialized at
> > insert_wq_barrier() and they belongs to the same lock class, to fix
> > this, we need to differ barr->done with different lock classes based on
> > the corresponding works.
> >
> > How about the this(only compilation test):
> >
> > ----------------->8
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index e86733a8b344..d14067942088 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -2431,6 +2431,27 @@ struct wq_barrier {
> > struct task_struct *task; /* purely informational */
> > };
> >
> > +#ifdef CONFIG_LOCKDEP_COMPLETE
> > +# define INIT_WQ_BARRIER_ONSTACK(barr, func, target) \
> > +do { \
> > + INIT_WORK_ONSTACK(&(barr)->work, func); \
> > + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
> > + lockdep_init_map_crosslock((struct lockdep_map *)&(barr)->done.map, \
> > + "(complete)" #barr, \
> > + (target)->lockdep_map.key, 1); \
> > + __init_completion(&barr->done); \
> > + barr->task = current; \
> > +} while (0)
> > +#else
> > +# define INIT_WQ_BARRIER_ONSTACK(barr, func, target) \
> > +do { \
> > + INIT_WORK_ONSTACK(&(barr)->work, func); \
> > + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
> > + init_completion(&barr->done); \
> > + barr->task = current; \
> > +} while (0)
> > +#endif
> > +
> > static void wq_barrier_func(struct work_struct *work)
> > {
> > struct wq_barrier *barr = container_of(work, struct wq_barrier, work);
> > @@ -2474,10 +2495,7 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
> > * checks and call back into the fixup functions where we
> > * might deadlock.
> > */
> > - INIT_WORK_ONSTACK(&barr->work, wq_barrier_func);
> > - __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&barr->work));
> > - init_completion(&barr->done);
> > - barr->task = current;
> > + INIT_WQ_BARRIER_ONSTACK(barr, wq_barrier_func, target);
> >
> > /*
> > * If @target is currently being executed, schedule the


Attachments:
(No filename) (5.12 kB)
signature.asc (488.00 B)
Download all attachments

2017-08-16 05:57:53

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Wed, Aug 16, 2017 at 02:05:06PM +0900, Byungchul Park wrote:
> On Wed, Aug 16, 2017 at 12:05:31PM +0800, Boqun Feng wrote:
> > > I see...
> > >
> > > Worker A : acquired of wfc.work -> wait for cpu_hotplug_lock to be released
> > > Task B : acquired of cpu_hotplug_lock -> wait for lock#3 to be released
> > > Task C : acquired of lock#3 -> wait for completion of barr->done
> >
> > From the stack trace below, this barr->done is for flush_work() in
> > lru_add_drain_all_cpuslocked(), i.e. for work "per_cpu(lru_add_drain_work)"
> >
> > > Worker D : wait for wfc.work to be released -> will complete barr->done
> >
> > and this barr->done is for work "wfc.work".
> >
> > So those two barr->done could not be the same instance, IIUC. Therefore
> > the deadlock case is not possible.
> >
> > The problem here is all barr->done instances are initialized at
> > insert_wq_barrier() and they belongs to the same lock class, to fix
>
> I'm not sure this caused the lockdep warning but, if they belongs to the
> same class even though they couldn't be the same instance as you said, I
> also think that is another problem and should be fixed.
>

My point was more that this is a false positive case, which we should
avoid as hard as we can, because this very case doesn't look like a
deadlock to me.

Maybe the pattern above does exist in the current kernel, but we need to
guide/adjust lockdep to find the real case showing it's happening.

Regards,
Boqun

> > this, we need to differ barr->done with different lock classes based on
> > the corresponding works.
> >
> > How about the this(only compilation test):
> >
> > ----------------->8
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index e86733a8b344..d14067942088 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -2431,6 +2431,27 @@ struct wq_barrier {
> > struct task_struct *task; /* purely informational */
> > };
> >
> > +#ifdef CONFIG_LOCKDEP_COMPLETE
> > +# define INIT_WQ_BARRIER_ONSTACK(barr, func, target) \
> > +do { \
> > + INIT_WORK_ONSTACK(&(barr)->work, func); \
> > + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
> > + lockdep_init_map_crosslock((struct lockdep_map *)&(barr)->done.map, \
> > + "(complete)" #barr, \
> > + (target)->lockdep_map.key, 1); \
> > + __init_completion(&barr->done); \
> > + barr->task = current; \
> > +} while (0)
> > +#else
> > +# define INIT_WQ_BARRIER_ONSTACK(barr, func, target) \
> > +do { \
> > + INIT_WORK_ONSTACK(&(barr)->work, func); \
> > + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
> > + init_completion(&barr->done); \
> > + barr->task = current; \
> > +} while (0)
> > +#endif
> > +
> > static void wq_barrier_func(struct work_struct *work)
> > {
> > struct wq_barrier *barr = container_of(work, struct wq_barrier, work);
> > @@ -2474,10 +2495,7 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
> > * checks and call back into the fixup functions where we
> > * might deadlock.
> > */
> > - INIT_WORK_ONSTACK(&barr->work, wq_barrier_func);
> > - __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&barr->work));
> > - init_completion(&barr->done);
> > - barr->task = current;
> > + INIT_WQ_BARRIER_ONSTACK(barr, wq_barrier_func, target);
> >
> > /*
> > * If @target is currently being executed, schedule the


Attachments:
(No filename) (3.35 kB)
signature.asc (488.00 B)
Download all attachments

2017-08-16 06:38:58

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Wed, Aug 16, 2017 at 01:40:51PM +0800, Boqun Feng wrote:
> > > > Worker A : acquired of wfc.work -> wait for cpu_hotplug_lock to be released
> > > > Task B : acquired of cpu_hotplug_lock -> wait for lock#3 to be released
> > > > Task C : acquired of lock#3 -> wait for completion of barr->done
> > >
> > > From the stack trace below, this barr->done is for flush_work() in
> > > lru_add_drain_all_cpuslocked(), i.e. for work "per_cpu(lru_add_drain_work)"
> > >
> > > > Worker D : wait for wfc.work to be released -> will complete barr->done
> > >
> > > and this barr->done is for work "wfc.work".
> >
> > I think it can be the same instance. wait_for_completion() in flush_work()
> > e.g. at task C in my example, waits for completion which we expect to be
> > done by a worker e.g. worker D in my example.
> >
> > I think the problem is caused by a write-acquisition of wfc.work in
> > process_one_work(). The acquisition of wfc.work should be reenterable,
> > that is, read-acquisition, shouldn't it?
> >
>
> The only thing is that wfc.work is not a real lock; please see the code in
> flush_work(). And if a task C does a flush_work() for "wfc.work" with
> lock#3 held, it needs to "acquire" wfc.work before it calls
> wait_for_completion(), which is already a deadlock case:
>
> lock#3 -> wfc.work -> cpu_hotplug_lock -+
>   ^                                     |
>   |                                     |
>   +-------------------------------------+
>
> , without crossrelease enabled. So the task C didn't flush work wfc.work
> in the previous case, which implies barr->done in Task C and Worker D
> are not the same instance.
>
> Make sense?

Thank you very much for your explanation. I misunderstood how flush_work()
works. Yes, it seems to be caused by an incorrect completion class.

Thanks,
Byungchul

>
> Regards,
> Boqun
>
> > I might be wrong... Please fix me if so.
> >
> > Thank you,
> > Byungchul
> >
> > > So those two barr->done could not be the same instance, IIUC. Therefore
> > > the deadlock case is not possible.
> > >
> > > The problem here is all barr->done instances are initialized at
> > > insert_wq_barrier() and they belongs to the same lock class, to fix
> > > this, we need to differ barr->done with different lock classes based on
> > > the corresponding works.
> > >
> > > How about the this(only compilation test):
> > >
> > > ----------------->8
> > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > > index e86733a8b344..d14067942088 100644
> > > --- a/kernel/workqueue.c
> > > +++ b/kernel/workqueue.c
> > > @@ -2431,6 +2431,27 @@ struct wq_barrier {
> > > struct task_struct *task; /* purely informational */
> > > };
> > >
> > > +#ifdef CONFIG_LOCKDEP_COMPLETE
> > > +# define INIT_WQ_BARRIER_ONSTACK(barr, func, target) \
> > > +do { \
> > > + INIT_WORK_ONSTACK(&(barr)->work, func); \
> > > + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
> > > + lockdep_init_map_crosslock((struct lockdep_map *)&(barr)->done.map, \
> > > + "(complete)" #barr, \
> > > + (target)->lockdep_map.key, 1); \
> > > + __init_completion(&barr->done); \
> > > + barr->task = current; \
> > > +} while (0)
> > > +#else
> > > +# define INIT_WQ_BARRIER_ONSTACK(barr, func, target) \
> > > +do { \
> > > + INIT_WORK_ONSTACK(&(barr)->work, func); \
> > > + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
> > > + init_completion(&barr->done); \
> > > + barr->task = current; \
> > > +} while (0)
> > > +#endif
> > > +
> > > static void wq_barrier_func(struct work_struct *work)
> > > {
> > > struct wq_barrier *barr = container_of(work, struct wq_barrier, work);
> > > @@ -2474,10 +2495,7 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
> > > * checks and call back into the fixup functions where we
> > > * might deadlock.
> > > */
> > > - INIT_WORK_ONSTACK(&barr->work, wq_barrier_func);
> > > - __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&barr->work));
> > > - init_completion(&barr->done);
> > > - barr->task = current;
> > > + INIT_WQ_BARRIER_ONSTACK(barr, wq_barrier_func, target);
> > >
> > > /*
> > > * If @target is currently being executed, schedule the


2017-08-16 07:15:46

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Wed, Aug 16, 2017 at 01:58:08PM +0800, Boqun Feng wrote:
> > I'm not sure this caused the lockdep warning but, if they belongs to the
> > same class even though they couldn't be the same instance as you said, I
> > also think that is another problem and should be fixed.
> >
>
> My point was more like this is a false positive case, which we should
> avoid as hard as we can, because this very case doesn't look like a
> deadlock to me.
>
> Maybe the pattern above does exist in current kernel, but we need to
> guide/adjust lockdep to find the real case showing it's happening.

As long as they are initialized as the same class, there's no way to
distinguish between them within lockdep.

And I also think we should avoid false positive cases. Do you think
there are many places where completions are initialized at the same
place even though they could never be the same instance?

If no, it would be better to fix it whenever we face it, as you did.

If yes, we have to change it for completion, for example:

1. Do not apply crossrelease into completions initialized on stack.

or

2. Use the full call path instead of a call site as a lockdep_map key.

or

3. And so on.

Could you let me know your opinion about it?

Thanks,
Byungchul

> Regards,
> Boqun
>
> > > this, we need to differ barr->done with different lock classes based on
> > > the corresponding works.
> > >
> > > How about the this(only compilation test):
> > >
> > > ----------------->8
> > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > > index e86733a8b344..d14067942088 100644
> > > --- a/kernel/workqueue.c
> > > +++ b/kernel/workqueue.c
> > > @@ -2431,6 +2431,27 @@ struct wq_barrier {
> > > struct task_struct *task; /* purely informational */
> > > };
> > >
> > > +#ifdef CONFIG_LOCKDEP_COMPLETE
> > > +# define INIT_WQ_BARRIER_ONSTACK(barr, func, target) \
> > > +do { \
> > > + INIT_WORK_ONSTACK(&(barr)->work, func); \
> > > + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
> > > + lockdep_init_map_crosslock((struct lockdep_map *)&(barr)->done.map, \
> > > + "(complete)" #barr, \
> > > + (target)->lockdep_map.key, 1); \
> > > + __init_completion(&barr->done); \
> > > + barr->task = current; \
> > > +} while (0)
> > > +#else
> > > +# define INIT_WQ_BARRIER_ONSTACK(barr, func, target) \
> > > +do { \
> > > + INIT_WORK_ONSTACK(&(barr)->work, func); \
> > > + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
> > > + init_completion(&barr->done); \
> > > + barr->task = current; \
> > > +} while (0)
> > > +#endif
> > > +
> > > static void wq_barrier_func(struct work_struct *work)
> > > {
> > > struct wq_barrier *barr = container_of(work, struct wq_barrier, work);
> > > @@ -2474,10 +2495,7 @@ static void insert_wq_barrier(struct pool_workqueue *pwq,
> > > * checks and call back into the fixup functions where we
> > > * might deadlock.
> > > */
> > > - INIT_WORK_ONSTACK(&barr->work, wq_barrier_func);
> > > - __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&barr->work));
> > > - init_completion(&barr->done);
> > > - barr->task = current;
> > > + INIT_WQ_BARRIER_ONSTACK(barr, wq_barrier_func, target);
> > >
> > > /*
> > > * If @target is currently being executed, schedule the


2017-08-16 08:07:49

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Wed, Aug 16, 2017 at 04:14:21PM +0900, Byungchul Park wrote:
> On Wed, Aug 16, 2017 at 01:58:08PM +0800, Boqun Feng wrote:
> > > I'm not sure this caused the lockdep warning but, if they belongs to the
> > > same class even though they couldn't be the same instance as you said, I
> > > also think that is another problem and should be fixed.
> > >
> >
> > My point was more like this is a false positive case, which we should
> > avoid as hard as we can, because this very case doesn't look like a
> > deadlock to me.
> >
> > Maybe the pattern above does exist in current kernel, but we need to
> > guide/adjust lockdep to find the real case showing it's happening.
>
> As long as they are initialized as a same class, there's no way to
> distinguish between them within lockdep.
>
> And I also think we should avoid false positive cases. Do you think
> there are many places where completions are initialized in a same place
> even though they could never be the same instance?
>
> If no, it would be better to fix it whenever we face it, as you did.

BTW, of course, the same problem would have occurred when applying
lockdep for the first time. How did you solve it?

I mean that lockdep basically identifies classes, even for typical locks,
by the call site. So two locks could end up in the same class even though
they should not be the same. Of course, for now, we avoid the problematic
cases with sub-classes. Anyway, the problems certainly would have arisen
the first time. I want to follow the solution you used then.

Thanks,
Byungchul

2017-08-16 09:39:59

by Byungchul Park

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Wed, Aug 16, 2017 at 05:06:23PM +0900, Byungchul Park wrote:
> On Wed, Aug 16, 2017 at 04:14:21PM +0900, Byungchul Park wrote:
> > On Wed, Aug 16, 2017 at 01:58:08PM +0800, Boqun Feng wrote:
> > > > I'm not sure this caused the lockdep warning but, if they belongs to the
> > > > same class even though they couldn't be the same instance as you said, I
> > > > also think that is another problem and should be fixed.
> > > >
> > >
> > > My point was more like this is a false positive case, which we should
> > > avoid as hard as we can, because this very case doesn't look like a
> > > deadlock to me.
> > >
> > > Maybe the pattern above does exist in current kernel, but we need to
> > > guide/adjust lockdep to find the real case showing it's happening.
> >
> > As long as they are initialized as a same class, there's no way to
> > distinguish between them within lockdep.
> >
> > And I also think we should avoid false positive cases. Do you think
> > there are many places where completions are initialized in a same place
> > even though they could never be the same instance?
> >
> > If no, it would be better to fix it whenever we face it, as you did.
>
> BTW, of course, the same problem would have occured when applying
> lockdep for the first time. How did you solve it?
>
> I mean that lockdep basically identifies classes even for typical locks
> with the call site. So two locks could be the same class even though
> they should not be the same. Of course, for now, we avoid the problemaic
> cases with sub-class. Anyway, the problems certainly would have arised
               ^
               or setting a class or re-design code like what Boqun
               suggested. And so on...

> for the first time. I want to follow that solution you did.
>
> Thanks,
> Byungchul

2017-08-17 07:48:16

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature


* Boqun Feng <[email protected]> wrote:

> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2431,6 +2431,27 @@ struct wq_barrier {
> struct task_struct *task; /* purely informational */
> };
>
> +#ifdef CONFIG_LOCKDEP_COMPLETE
> +# define INIT_WQ_BARRIER_ONSTACK(barr, func, target) \
> +do { \
> + INIT_WORK_ONSTACK(&(barr)->work, func); \
> + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
> + lockdep_init_map_crosslock((struct lockdep_map *)&(barr)->done.map, \
> + "(complete)" #barr, \
> + (target)->lockdep_map.key, 1); \
> + __init_completion(&barr->done); \
> + barr->task = current; \
> +} while (0)
> +#else
> +# define INIT_WQ_BARRIER_ONSTACK(barr, func, target) \
> +do { \
> + INIT_WORK_ONSTACK(&(barr)->work, func); \
> + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
> + init_completion(&barr->done); \
> + barr->task = current; \
> +} while (0)
> +#endif

Is there any progress with this bug? This false positive warning regression is
blocking the locking tree.

BTW., I don't think the #ifdef is necessary: lockdep_init_map_crosslock should map
to nothing when lockdep is disabled, right?

Thanks,

Ingo

2017-08-17 08:03:54

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Thu, Aug 17, 2017 at 09:48:11AM +0200, Ingo Molnar wrote:
>
> * Boqun Feng <[email protected]> wrote:
>
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -2431,6 +2431,27 @@ struct wq_barrier {
> > struct task_struct *task; /* purely informational */
> > };
> >
> > +#ifdef CONFIG_LOCKDEP_COMPLETE
> > +# define INIT_WQ_BARRIER_ONSTACK(barr, func, target) \
> > +do { \
> > + INIT_WORK_ONSTACK(&(barr)->work, func); \
> > + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
> > + lockdep_init_map_crosslock((struct lockdep_map *)&(barr)->done.map, \
> > + "(complete)" #barr, \
> > + (target)->lockdep_map.key, 1); \
> > + __init_completion(&barr->done); \
> > + barr->task = current; \
> > +} while (0)
> > +#else
> > +# define INIT_WQ_BARRIER_ONSTACK(barr, func, target) \
> > +do { \
> > + INIT_WORK_ONSTACK(&(barr)->work, func); \
> > + __set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(&(barr)->work)); \
> > + init_completion(&barr->done); \
> > + barr->task = current; \
> > +} while (0)
> > +#endif
>
> Is there any progress with this bug? This false positive warning regression is
> blocking the locking tree.
>

I have been trying to reproduce the false positive on my machine, but
haven't succeeded. ;-( Have you tried this?

But I have been using this patch for a day and haven't shot my foot yet.

> BTW., I don't think the #ifdef is necessary: lockdep_init_map_crosslock should map
> to nothing when lockdep is disabled, right?

IIUC, lockdep_init_map_crosslock is only defined when
CONFIG_LOCKDEP_CROSSRELEASE=y; moreover, completion::map, which is used
as the parameter of lockdep_init_map_crosslock(), is only defined when
CONFIG_LOCKDEP_COMPLETE=y. So the #ifdef is necessary, but maybe we can
clean this up in the future.

I will send a proper patch so things can move forward. Just a
minute ;-)

Regards,
Boqun

>
> Thanks,
>
> Ingo


Attachments:
(No filename) (1.94 kB)
signature.asc (488.00 B)
Download all attachments

2017-08-17 08:12:31

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature


* Boqun Feng <[email protected]> wrote:

> > BTW., I don't think the #ifdef is necessary: lockdep_init_map_crosslock should map
> > to nothing when lockdep is disabled, right?
>
> IIUC, lockdep_init_map_crosslock is only defined when
> CONFIG_LOCKDEP_CROSSRELEASE=y,

Then lockdep_init_map_crosslock() should be defined in the !LOCKDEP case as well.

> [...] moreover, completion::map, which used as
> the parameter of lockdep_init_map_crosslock(), is only defined when
> CONFIG_LOCKDEP_COMPLETE=y.

If the !LOCKDEP wrapper is a CPP macro then it can ignore that parameter just
fine, and it won't be built.

Thanks,

Ingo

2017-08-17 08:32:56

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH v8 00/14] lockdep: Implement crossrelease feature

On Thu, Aug 17, 2017 at 10:12:24AM +0200, Ingo Molnar wrote:
>
> * Boqun Feng <[email protected]> wrote:
>
> > > BTW., I don't think the #ifdef is necessary: lockdep_init_map_crosslock should map
> > > to nothing when lockdep is disabled, right?
> >
> > IIUC, lockdep_init_map_crosslock is only defined when
> > CONFIG_LOCKDEP_CROSSRELEASE=y,
>
> Then lockdep_init_map_crosslock() should be defined in the !LOCKDEP case as well.
>
> > [...] moreover, completion::map, which used as
> > the parameter of lockdep_init_map_crosslock(), is only defined when
> > CONFIG_LOCKDEP_COMPLETE=y.
>
> If the !LOCKDEP wrapper is a CPP macro then it can ignore that parameter just
> fine, and it won't be built.
>

Oops, I missed this part.. so I will cook a patch defining
lockdep_init_map_crosslock() when !LOCKDEP, and I think based on that,
there is no need to introduce INIT_WQ_BARRIER_ONSTACK(); we can simply do:

lockdep_init_map_crosslock(...);
__init_completion();

in insert_wq_barrier(). Simpler.

Thanks for your suggestion.

Regards,
Boqun

> Thanks,
>
> Ingo


Attachments:
(No filename) (1.04 kB)
signature.asc (484.00 B)
Download all attachments

2017-08-18 23:43:31

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

Hi Arnd,

On Mon, Aug 14, 2017 at 10:50:24AM +0200, Arnd Bergmann wrote:
> On Mon, Aug 7, 2017 at 9:12 AM, Byungchul Park <[email protected]> wrote:
> > Although wait_for_completion() and its family can cause deadlock, the
> > lock correctness validator could not be applied to them until now,
> > because things like complete() are usually called in a different context
> > from the waiting context, which violates lockdep's assumption.
> >
> > Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
> > detector to those completion operations. Applied it.
> >
> > Signed-off-by: Byungchul Park <[email protected]>
>
> This patch introduced a significant growth in kernel stack usage for a small
> set of functions. I see two new warnings for functions that get tipped over the
> 1024 or 2048 byte frame size limit in linux-next (with a few other patches
> applied):
>
> Before:
>
> drivers/md/dm-integrity.c: In function 'write_journal':
> drivers/md/dm-integrity.c:827:1: error: the frame size of 504 bytes is
> larger than xxx bytes [-Werror=frame-larger-than=]
> drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
> drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 680 bytes
> is larger than 104 bytes [-Werror=frame-larger-than=]
>
> After:
>
> drivers/md/dm-integrity.c: In function 'write_journal':
> drivers/md/dm-integrity.c:827:1: error: the frame size of 1280 bytes
> is larger than 1024 bytes [-Werror=frame-larger-than=]
> drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
> drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 1072
> bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
>
> I have not checked in detail why this happens, but I'm guessing that
> there is an overall increase in stack usage with
> CONFIG_LOCKDEP_COMPLETE in functions using completions,
> and I think it would be good to try to come up with a version that doesn't
> add as much.
>

So I have been staring at this for a while, and below is what I found:

(BTW, Arnd, may I know your compiler version? Mine is 7.1.1)

In write_journal(), I can see the code generated like this on x86:

io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
2462: e8 00 00 00 00 callq 2467 <write_journal+0x47>
2467: 48 8d 85 80 fd ff ff lea -0x280(%rbp),%rax
246e: 48 c7 c6 00 00 00 00 mov $0x0,%rsi
2475: 48 c7 c2 00 00 00 00 mov $0x0,%rdx
x->done = 0;
247c: c7 85 90 fd ff ff 00 movl $0x0,-0x270(%rbp)
2483: 00 00 00
init_waitqueue_head(&x->wait);
2486: 48 8d 78 18 lea 0x18(%rax),%rdi
248a: e8 00 00 00 00 callq 248f <write_journal+0x6f>
if (commit_start + commit_sections <= ic->journal_sections) {
248f: 41 8b 87 a8 00 00 00 mov 0xa8(%r15),%eax
io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
2496: 48 8d bd e8 f9 ff ff lea -0x618(%rbp),%rdi
249d: 48 8d b5 90 fd ff ff lea -0x270(%rbp),%rsi
24a4: b9 17 00 00 00 mov $0x17,%ecx
24a9: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
if (commit_start + commit_sections <= ic->journal_sections) {
24ac: 41 39 c6 cmp %eax,%r14d
io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
24af: 48 8d bd 90 fd ff ff lea -0x270(%rbp),%rdi
24b6: 48 8d b5 e8 f9 ff ff lea -0x618(%rbp),%rsi
24bd: b9 17 00 00 00 mov $0x17,%ecx
24c2: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)

Those two "rep movsq"s are very suspicious, because
COMPLETION_INITIALIZER_ONSTACK() should initialize the data in-place,
rather than move it to some temporary variable and copy it back.

I tried to reduce the size of completion struct, and the "rep movsq" did
go away, however it seemed the compiler still allocated the memory for
the temporary variables on the stack, because whenever I
increased/decreased the size of completion, the stack size of
write_journal() got increased/decreased *7* times, but there are only
3 journal_completion structures in write_journal(). So the *4* callsites
of COMPLETION_INITIALIZER_ONSTACK() looked very suspicious.

So I came up with the following patch, trying to teach the compiler not
to do the unnecessary allocation; could you give it a try?

Besides, I could also observe the stack size reduction of
write_journal() even for a !LOCKDEP kernel.

-------------------------->8
Subject: [PATCH] completion: Avoid unnecessary stack allocation for
COMPLETION_INITIALIZER_ONSTACK()

In theory, COMPLETION_INITIALIZER_ONSTACK() should never affect the
stack allocation of the caller. However, on some compilers, a temporary
structure was allocated for the return value of
COMPLETION_INITIALIZER_ONSTACK(), for example in write_journal() with
LOCKDEP_COMPLETIONS=y (gcc 7.1.1):

io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
2462: e8 00 00 00 00 callq 2467 <write_journal+0x47>
2467: 48 8d 85 80 fd ff ff lea -0x280(%rbp),%rax
246e: 48 c7 c6 00 00 00 00 mov $0x0,%rsi
2475: 48 c7 c2 00 00 00 00 mov $0x0,%rdx
x->done = 0;
247c: c7 85 90 fd ff ff 00 movl $0x0,-0x270(%rbp)
2483: 00 00 00
init_waitqueue_head(&x->wait);
2486: 48 8d 78 18 lea 0x18(%rax),%rdi
248a: e8 00 00 00 00 callq 248f <write_journal+0x6f>
if (commit_start + commit_sections <= ic->journal_sections) {
248f: 41 8b 87 a8 00 00 00 mov 0xa8(%r15),%eax
io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
2496: 48 8d bd e8 f9 ff ff lea -0x618(%rbp),%rdi
249d: 48 8d b5 90 fd ff ff lea -0x270(%rbp),%rsi
24a4: b9 17 00 00 00 mov $0x17,%ecx
24a9: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
if (commit_start + commit_sections <= ic->journal_sections) {
24ac: 41 39 c6 cmp %eax,%r14d
io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
24af: 48 8d bd 90 fd ff ff lea -0x270(%rbp),%rdi
24b6: 48 8d b5 e8 f9 ff ff lea -0x618(%rbp),%rsi
24bd: b9 17 00 00 00 mov $0x17,%ecx
24c2: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)

We can obviously see the temporary structure being allocated, and the
compiler also emits two meaningless memcpys with "rep movsq".

To fix this, make the brace block in COMPLETION_INITIALIZER_ONSTACK()
return a pointer and dereference it outside the block rather than
return the whole structure. This way, we can teach the compiler not
to do the unnecessary stack allocation.

This also reduces the stack size even if !LOCKDEP. For example, for
write_journal() compiled with gcc 7.1.1, the result of the command:

objdump -d drivers/md/dm-integrity.o | ./scripts/checkstack.pl x86

before:

0x0000246a write_journal [dm-integrity.o]: 696

after:

0x00002b7a write_journal [dm-integrity.o]: 296

Reported-by: Arnd Bergmann <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
---
include/linux/completion.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/completion.h b/include/linux/completion.h
index 791f053f28b7..cae5400022a3 100644
--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -74,7 +74,7 @@ static inline void complete_release_commit(struct completion *x) {}
#endif

#define COMPLETION_INITIALIZER_ONSTACK(work) \
- ({ init_completion(&work); work; })
+ (*({ init_completion(&work); &work; }))

/**
* DECLARE_COMPLETION - declare and initialize a completion structure
--
2.14.1


2017-08-19 12:51:20

by Arnd Bergmann

Subject: Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

On Sat, Aug 19, 2017 at 1:43 AM, Boqun Feng <[email protected]> wrote:
> Hi Arnd,
>
> On Mon, Aug 14, 2017 at 10:50:24AM +0200, Arnd Bergmann wrote:
>> On Mon, Aug 7, 2017 at 9:12 AM, Byungchul Park <[email protected]> wrote:
>> > Although wait_for_completion() and its family can cause deadlock, the
>> > lock correctness validator could not be applied to them until now,
>> > because things like complete() are usually called in a different context
>> > from the waiting context, which violates lockdep's assumption.
>> >
>> > Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
>> > detector to those completion operations. Applied it.
>> >
>> > Signed-off-by: Byungchul Park <[email protected]>
>>
>> This patch introduced a significant growth in kernel stack usage for a small
>> set of functions. I see two new warnings for functions that get tipped over the
>> 1024 or 2048 byte frame size limit in linux-next (with a few other patches
>> applied):
>>
>> Before:
>>
>> drivers/md/dm-integrity.c: In function 'write_journal':
>> drivers/md/dm-integrity.c:827:1: error: the frame size of 504 bytes is
>> larger than xxx bytes [-Werror=frame-larger-than=]
>> drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
>> drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 680 bytes
>> is larger than 104 bytes [-Werror=frame-larger-than=]
>>
>> After:
>>
>> drivers/md/dm-integrity.c: In function 'write_journal':
>> drivers/md/dm-integrity.c:827:1: error: the frame size of 1280 bytes
>> is larger than 1024 bytes [-Werror=frame-larger-than=]
>> drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
>> drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 1072
>> bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
>>
>> I have not checked in detail why this happens, but I'm guessing that
>> there is an overall increase in stack usage with
>> CONFIG_LOCKDEP_COMPLETE in functions using completions,
>> and I think it would be good to try to come up with a version that doesn't
>> add as much.
>>
>
> So I have been staring at this for a while, and below is what I found:
>
> (BTW, Arnd, may I know your compiler version? Mine is 7.1.1)

That is what I used as well, on x86, arm32 and arm64.

> In write_journal(), I can see the code generated like this on x86:
>
> io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
> 2462: e8 00 00 00 00 callq 2467 <write_journal+0x47>
> 2467: 48 8d 85 80 fd ff ff lea -0x280(%rbp),%rax
> 246e: 48 c7 c6 00 00 00 00 mov $0x0,%rsi
> 2475: 48 c7 c2 00 00 00 00 mov $0x0,%rdx
> x->done = 0;
> 247c: c7 85 90 fd ff ff 00 movl $0x0,-0x270(%rbp)
> 2483: 00 00 00
> init_waitqueue_head(&x->wait);
> 2486: 48 8d 78 18 lea 0x18(%rax),%rdi
> 248a: e8 00 00 00 00 callq 248f <write_journal+0x6f>
> if (commit_start + commit_sections <= ic->journal_sections) {
> 248f: 41 8b 87 a8 00 00 00 mov 0xa8(%r15),%eax
> io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
> 2496: 48 8d bd e8 f9 ff ff lea -0x618(%rbp),%rdi
> 249d: 48 8d b5 90 fd ff ff lea -0x270(%rbp),%rsi
> 24a4: b9 17 00 00 00 mov $0x17,%ecx
> 24a9: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
> if (commit_start + commit_sections <= ic->journal_sections) {
> 24ac: 41 39 c6 cmp %eax,%r14d
> io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
> 24af: 48 8d bd 90 fd ff ff lea -0x270(%rbp),%rdi
> 24b6: 48 8d b5 e8 f9 ff ff lea -0x618(%rbp),%rsi
> 24bd: b9 17 00 00 00 mov $0x17,%ecx
> 24c2: f3 48 a5 rep movsq %ds:(%rsi),%es:(%rdi)
>
> Those two "rep movsq"s are very suspicious, because
> COMPLETION_INITIALIZER_ONSTACK() should initialize the data in-place,
> rather than move it to some temporary variable and copy it back.

Right. I've seen this behavior before when using c99 compound
literals, but I was surprised to see it here.

I also submitted a patch for the one driver that turned up a new
warning because of this behavior:

https://www.spinics.net/lists/raid/msg58766.html

In case of the mmc driver, the behavior was as expected, it was
just a little too large and I sent the obvious workaround for it

https://patchwork.kernel.org/patch/9902063/

> I tried to reduce the size of completion struct, and the "rep movsq" did
> go away, however it seemed the compiler still allocated the memory for
> the temporary variables on the stack, because whenever I
> increased/decreased the size of completion, the stack size of
> write_journal() got increased/decreased *7* times, but there are only
> 3 journal_completion structures in write_journal(). So the *4* callsites
> of COMPLETION_INITIALIZER_ONSTACK() looked very suspicious.
>
> So I come up with the following patch, trying to teach the compiler not
> to do the unnecessary allocation, could you give it a try?
>
> Besides, I could also observe the stack size reduction of
> write_journal() even for !LOCKDEP kernel.

Ok.

> -------------------
> Reported-by: Arnd Bergmann <[email protected]>
> Signed-off-by: Boqun Feng <[email protected]>
> ---
> include/linux/completion.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/completion.h b/include/linux/completion.h
> index 791f053f28b7..cae5400022a3 100644
> --- a/include/linux/completion.h
> +++ b/include/linux/completion.h
> @@ -74,7 +74,7 @@ static inline void complete_release_commit(struct completion *x) {}
> #endif
>
> #define COMPLETION_INITIALIZER_ONSTACK(work) \
> - ({ init_completion(&work); work; })
> + (*({ init_completion(&work); &work; }))
>
> /**
> * DECLARE_COMPLETION - declare and initialize a completion structure

Nice hack. Any idea why that's different to the compiler?

I've applied that one to my test tree now, and reverted my own patch,
will let you know if anything else shows up. I think we probably want
to merge both patches to mainline.

Arnd

2017-08-19 13:34:03

by Arnd Bergmann

Subject: Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

On Sat, Aug 19, 2017 at 2:51 PM, Arnd Bergmann <[email protected]> wrote:

>> --- a/include/linux/completion.h
>> +++ b/include/linux/completion.h
>> @@ -74,7 +74,7 @@ static inline void complete_release_commit(struct completion *x) {}
>> #endif
>>
>> #define COMPLETION_INITIALIZER_ONSTACK(work) \
>> - ({ init_completion(&work); work; })
>> + (*({ init_completion(&work); &work; }))
>>
>> /**
>> * DECLARE_COMPLETION - declare and initialize a completion structure
>
> Nice hack. Any idea why that's different to the compiler?
>
> I've applied that one to my test tree now, and reverted my own patch,
> will let you know if anything else shows up. I think we probably want
> to merge both patches to mainline.

There is apparently one user of COMPLETION_INITIALIZER_ONSTACK
that causes a regression with the patch above:

drivers/acpi/nfit/core.c: In function 'acpi_nfit_flush_probe':
include/linux/completion.h:77:3: error: value computed is not used
[-Werror=unused-value]
(*({ init_completion(&work); &work; }))

It would be trivial to convert to init_completion(), which seems to be
what was intended there.

Arnd

2017-08-20 03:17:49

by Boqun Feng

Subject: Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

On Sat, Aug 19, 2017 at 02:51:17PM +0200, Arnd Bergmann wrote:
[...]
> > Those two "rep movsq"s are very suspicious, because
> > COMPLETION_INITIALIZER_ONSTACK() should initialize the data in-place,
> > rather than move it to some temporary variable and copy it back.
>
> Right. I've seen this behavior before when using c99 compound
> literals, but I was surprised to see it here.
>
> I also submitted a patch for the one driver that turned up a new
> warning because of this behavior:
>
> https://www.spinics.net/lists/raid/msg58766.html
>

This solution also came to my mind, but then I found there are
several callsites of COMPLETION_INITIALIZER_ONSTACK(), so I tried
to find a way to fix the macro itself instead. But your patch looks
good to me ;-)

> In case of the mmc driver, the behavior was as expected, it was
> just a little too large and I sent the obvious workaround for it
>
> https://patchwork.kernel.org/patch/9902063/
>

Yep.

> > I tried to reduce the size of completion struct, and the "rep movsq" did
> > go away, however it seemed the compiler still allocated the memory for
> > the temporary variables on the stack, because whenever I
> > increased/decreased the size of completion, the stack size of
> > write_journal() got increased/decreased *7* times, but there are only
> > 3 journal_completion structures in write_journal(). So the *4* callsites
> > of COMPLETION_INITIALIZER_ONSTACK() looked very suspicious.
> >
> > So I come up with the following patch, trying to teach the compiler not
> > to do the unnecessary allocation, could you give it a try?
> >
> > Besides, I could also observe the stack size reduction of
> > write_journal() even for !LOCKDEP kernel.
>
> Ok.
>
> > -------------------
> > Reported-by: Arnd Bergmann <[email protected]>
> > Signed-off-by: Boqun Feng <[email protected]>
> > ---
> > include/linux/completion.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/linux/completion.h b/include/linux/completion.h
> > index 791f053f28b7..cae5400022a3 100644
> > --- a/include/linux/completion.h
> > +++ b/include/linux/completion.h
> > @@ -74,7 +74,7 @@ static inline void complete_release_commit(struct completion *x) {}
> > #endif
> >
> > #define COMPLETION_INITIALIZER_ONSTACK(work) \
> > - ({ init_completion(&work); work; })
> > + (*({ init_completion(&work); &work; }))
> >
> > /**
> > * DECLARE_COMPLETION - declare and initialize a completion structure
>
> Nice hack. Any idea why that's different to the compiler?
>

So *I think* the block {init_completion(&work); &work;} now returns
a pointer rather than the whole structure, and a pointer fits in a
register, so the compiler won't bother to allocate stack memory for it.

> I've applied that one to my test tree now, and reverted my own patch,
> will let you know if anything else shows up. I think we probably want

Thanks ;-)

> to merge both patches to mainline.
>

Agreed! Unless we want to remove COMPLETION_INITIALIZER_ONSTACK() for
some reason, in which case my patch is not needed.

Regards,
Boqun

> Arnd



2017-08-23 14:43:14

by Boqun Feng

Subject: Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

On Sat, Aug 19, 2017 at 03:34:01PM +0200, Arnd Bergmann wrote:
> On Sat, Aug 19, 2017 at 2:51 PM, Arnd Bergmann <[email protected]> wrote:
>
> >> --- a/include/linux/completion.h
> >> +++ b/include/linux/completion.h
> >> @@ -74,7 +74,7 @@ static inline void complete_release_commit(struct completion *x) {}
> >> #endif
> >>
> >> #define COMPLETION_INITIALIZER_ONSTACK(work) \
> >> - ({ init_completion(&work); work; })
> >> + (*({ init_completion(&work); &work; }))
> >>
> >> /**
> >> * DECLARE_COMPLETION - declare and initialize a completion structure
> >
> > Nice hack. Any idea why that's different to the compiler?
> >

So I find this link:

https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html

it says:

"In G++, the result value of a statement expression undergoes array and
function pointer decay, and is returned by value to the enclosing
expression. "

I think this is why the temporary variable is constructed (or at least
allocated). Lemme put this in my commit log.

> > I've applied that one to my test tree now, and reverted my own patch,
> > will let you know if anything else shows up. I think we probably want
> > to merge both patches to mainline.
>
> There is apparently one user of COMPLETION_INITIALIZER_ONSTACK
> that causes a regression with the patch above:
>
> drivers/acpi/nfit/core.c: In function 'acpi_nfit_flush_probe':
> include/linux/completion.h:77:3: error: value computed is not used
> [-Werror=unused-value]
> (*({ init_completion(&work); &work; }))
>
> It would be trivial to convert to init_completion(), which seems to be
> what was intended there.
>

Thanks. Will send the conversion as a separate patch along with my
patch.

Regards,
Boqun

> Arnd



2017-09-05 01:04:10

by Byungchul Park

Subject: Re: [PATCH v8 11/14] lockdep: Apply crossrelease to PG_locked locks

On Mon, Aug 07, 2017 at 04:12:58PM +0900, Byungchul Park wrote:
> Although lock_page() and its family can cause deadlock, the lock
> correctness validator could not be applied to them until now, because
> things like unlock_page() might be called in a different context from
> the acquisition context, which violates lockdep's assumption.
>
> Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
> detector to page locks. Applied it.

I expect applying this to lock_page() will be even more useful than
wait_for_completion(). Could you consider taking this one next?

> Signed-off-by: Byungchul Park <[email protected]>
> ---
> include/linux/mm_types.h | 8 ++++
> include/linux/pagemap.h | 101 ++++++++++++++++++++++++++++++++++++++++++++---
> lib/Kconfig.debug | 8 ++++
> mm/filemap.c | 4 +-
> mm/page_alloc.c | 3 ++
> 5 files changed, 116 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index ff15181..f1e3dba 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -16,6 +16,10 @@
>
> #include <asm/mmu.h>
>
> +#ifdef CONFIG_LOCKDEP_PAGELOCK
> +#include <linux/lockdep.h>
> +#endif
> +
> #ifndef AT_VECTOR_SIZE_ARCH
> #define AT_VECTOR_SIZE_ARCH 0
> #endif
> @@ -216,6 +220,10 @@ struct page {
> #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
> int _last_cpupid;
> #endif
> +
> +#ifdef CONFIG_LOCKDEP_PAGELOCK
> + struct lockdep_map_cross map;
> +#endif
> }
> /*
> * The struct page can be forced to be double word aligned so that atomic ops
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 9717ca8..9f448c6 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -14,6 +14,9 @@
> #include <linux/bitops.h>
> #include <linux/hardirq.h> /* for in_interrupt() */
> #include <linux/hugetlb_inline.h>
> +#ifdef CONFIG_LOCKDEP_PAGELOCK
> +#include <linux/lockdep.h>
> +#endif
>
> /*
> * Bits in mapping->flags.
> @@ -450,26 +453,91 @@ static inline pgoff_t linear_page_index(struct vm_area_struct *vma,
> return pgoff;
> }
>
> +#ifdef CONFIG_LOCKDEP_PAGELOCK
> +#define lock_page_init(p) \
> +do { \
> + static struct lock_class_key __key; \
> + lockdep_init_map_crosslock((struct lockdep_map *)&(p)->map, \
> + "(PG_locked)" #p, &__key, 0); \
> +} while (0)
> +
> +static inline void lock_page_acquire(struct page *page, int try)
> +{
> + page = compound_head(page);
> + lock_acquire_exclusive((struct lockdep_map *)&page->map, 0,
> + try, NULL, _RET_IP_);
> +}
> +
> +static inline void lock_page_release(struct page *page)
> +{
> + page = compound_head(page);
> + /*
> + * lock_commit_crosslock() is necessary for crosslocks.
> + */
> + lock_commit_crosslock((struct lockdep_map *)&page->map);
> + lock_release((struct lockdep_map *)&page->map, 0, _RET_IP_);
> +}
> +#else
> +static inline void lock_page_init(struct page *page) {}
> +static inline void lock_page_free(struct page *page) {}
> +static inline void lock_page_acquire(struct page *page, int try) {}
> +static inline void lock_page_release(struct page *page) {}
> +#endif
> +
> extern void __lock_page(struct page *page);
> extern int __lock_page_killable(struct page *page);
> extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm,
> unsigned int flags);
> -extern void unlock_page(struct page *page);
> +extern void do_raw_unlock_page(struct page *page);
>
> -static inline int trylock_page(struct page *page)
> +static inline void unlock_page(struct page *page)
> +{
> + lock_page_release(page);
> + do_raw_unlock_page(page);
> +}
> +
> +static inline int do_raw_trylock_page(struct page *page)
> {
> page = compound_head(page);
> return (likely(!test_and_set_bit_lock(PG_locked, &page->flags)));
> }
>
> +static inline int trylock_page(struct page *page)
> +{
> + if (do_raw_trylock_page(page)) {
> + lock_page_acquire(page, 1);
> + return 1;
> + }
> + return 0;
> +}
> +
> /*
> * lock_page may only be called if we have the page's inode pinned.
> */
> static inline void lock_page(struct page *page)
> {
> might_sleep();
> - if (!trylock_page(page))
> +
> + if (!do_raw_trylock_page(page))
> __lock_page(page);
> + /*
> + * acquire() must be after actual lock operation for crosslocks.
> + * This way a crosslock and current lock can be ordered like:
> + *
> + * CONTEXT 1 CONTEXT 2
> + * --------- ---------
> + * lock A (cross)
> + * acquire A
> + * X = atomic_inc_return(&cross_gen_id)
> + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> + * acquire B
> + * Y = atomic_read_acquire(&cross_gen_id)
> + * lock B
> + *
> + * so that 'lock A and then lock B' can be seen globally,
> + * if X <= Y.
> + */
> + lock_page_acquire(page, 0);
> }
>
> /*
> @@ -479,9 +547,20 @@ static inline void lock_page(struct page *page)
> */
> static inline int lock_page_killable(struct page *page)
> {
> + int ret;
> +
> might_sleep();
> - if (!trylock_page(page))
> - return __lock_page_killable(page);
> +
> + if (!do_raw_trylock_page(page)) {
> + ret = __lock_page_killable(page);
> + if (ret)
> + return ret;
> + }
> + /*
> + * acquire() must be after actual lock operation for crosslocks.
> + * This way a crosslock and other locks can be ordered.
> + */
> + lock_page_acquire(page, 0);
> return 0;
> }
>
> @@ -496,7 +575,17 @@ static inline int lock_page_or_retry(struct page *page, struct mm_struct *mm,
> unsigned int flags)
> {
> might_sleep();
> - return trylock_page(page) || __lock_page_or_retry(page, mm, flags);
> +
> + if (do_raw_trylock_page(page) || __lock_page_or_retry(page, mm, flags)) {
> + /*
> + * acquire() must be after actual lock operation for crosslocks.
> + * This way a crosslock and other locks can be ordered.
> + */
> + lock_page_acquire(page, 0);
> + return 1;
> + }
> +
> + return 0;
> }
>
> /*
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 4ba8adc..99b5f76 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1093,6 +1093,14 @@ config LOCKDEP_COMPLETE
> A deadlock caused by wait_for_completion() and complete() can be
> detected by lockdep using crossrelease feature.
>
> +config LOCKDEP_PAGELOCK
> + bool "Lock debugging: allow PG_locked lock to use deadlock detector"
> + select LOCKDEP_CROSSRELEASE
> + default n
> + help
> + PG_locked lock is a kind of crosslock. Using crossrelease feature,
> + PG_locked lock can work with runtime deadlock detector.
> +
> config PROVE_LOCKING
> bool "Lock debugging: prove locking correctness"
> depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
> diff --git a/mm/filemap.c b/mm/filemap.c
> index a497024..0d83bf0 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1083,7 +1083,7 @@ static inline bool clear_bit_unlock_is_negative_byte(long nr, volatile void *mem
> * portably (architectures that do LL/SC can test any bit, while x86 can
> * test the sign bit).
> */
> -void unlock_page(struct page *page)
> +void do_raw_unlock_page(struct page *page)
> {
> BUILD_BUG_ON(PG_waiters != 7);
> page = compound_head(page);
> @@ -1091,7 +1091,7 @@ void unlock_page(struct page *page)
> if (clear_bit_unlock_is_negative_byte(PG_locked, &page->flags))
> wake_up_page_bit(page, PG_locked);
> }
> -EXPORT_SYMBOL(unlock_page);
> +EXPORT_SYMBOL(do_raw_unlock_page);
>
> /**
> * end_page_writeback - end writeback against a page
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6d30e91..2cbf412 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5406,6 +5406,9 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> } else {
> __init_single_pfn(pfn, zone, nid);
> }
> +#ifdef CONFIG_LOCKDEP_PAGELOCK
> + lock_page_init(pfn_to_page(pfn));
> +#endif
> }
> }
>
> --
> 1.9.1