Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S935589AbcKWLZx (ORCPT );
	Wed, 23 Nov 2016 06:25:53 -0500
Received: from mail-wm0-f65.google.com ([74.125.82.65]:33965 "EHLO
	mail-wm0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S933941AbcKWLZv (ORCPT );
	Wed, 23 Nov 2016 06:25:51 -0500
From: =?UTF-8?q?Nicolai=20H=C3=A4hnle?=
To: linux-kernel@vger.kernel.org
Cc: =?UTF-8?q?Nicolai=20H=C3=A4hnle?=, Peter Zijlstra, Ingo Molnar,
	Chris Wilson, Maarten Lankhorst, dri-devel@lists.freedesktop.org,
	stable@vger.kernel.org, =?UTF-8?q?Nicolai=20H=C3=A4hnle?=
Subject: [PATCH 1/4] locking/ww_mutex: Fix a deadlock affecting ww_mutexes
Date: Wed, 23 Nov 2016 12:25:22 +0100
Message-Id: <1479900325-28358-1-git-send-email-nhaehnle@gmail.com>
X-Mailer: git-send-email 2.7.4
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 5057
Lines: 137

From: Nicolai Hähnle

Fix a race condition involving 4 threads and 2 ww_mutexes as indicated in
the following example. Acquire context stamps are ordered like the thread
numbers, i.e. thread #1 should back off when it encounters a mutex locked
by thread #0 etc.

Thread #0    Thread #1    Thread #2    Thread #3
---------    ---------    ---------    ---------
                                       lock(ww)
                                       success
             lock(ww')
             success
                          lock(ww)
             lock(ww)        .
                .            .         unlock(ww) part 1
lock(ww)        .            .            .
success         .            .            .
                .            .         unlock(ww) part 2
                .         back off
lock(ww')       .
   .            .
(stuck)      (stuck)

Here, unlock(ww) part 1 is the part that sets lock->base.count to 1
(without being protected by lock->base.wait_lock), meaning that thread #0
can acquire ww in the fast path or, much more likely, the medium path
in mutex_optimistic_spin. Since lock->base.count == 0, thread #0 then
won't wake up any of the waiters in ww_mutex_set_context_fastpath.

Then, unlock(ww) part 2 wakes up _only_the_first_ waiter of ww.
This is thread #2, since waiters are added at the tail. Thread #2 wakes
up and backs off since it sees ww owned by a context with a lower stamp.

Meanwhile, thread #1 is never woken up, and so it won't back off its lock
on ww'. So thread #0 gets stuck waiting for ww' to be released.

This patch fixes the deadlock by waking up all waiters in the slow path
of ww_mutex_unlock.

We have an internal test case for amdgpu which continuously submits
command streams from tens of threads, where all command streams reference
hundreds of GPU buffer objects with a lot of overlap in the buffer lists
between command streams. This test reliably caused a deadlock, and while I
haven't completely confirmed that it is exactly the scenario outlined
above, this patch does fix the test case.

v2:
- use wake_q_add
- add additional explanations

Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Chris Wilson
Cc: Maarten Lankhorst
Cc: dri-devel@lists.freedesktop.org
Cc: stable@vger.kernel.org
Reviewed-by: Christian König (v1)
Signed-off-by: Nicolai Hähnle
---
 kernel/locking/mutex.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index a70b90d..7fbf9b4 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -409,6 +409,9 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 __visible __used noinline
 void __sched __mutex_unlock_slowpath(atomic_t *lock_count);
 
+static __used noinline
+void __sched __mutex_unlock_slowpath_wakeall(atomic_t *lock_count);
+
 /**
  * mutex_unlock - release the mutex
  * @lock: the mutex to be released
@@ -473,7 +476,14 @@ void __sched ww_mutex_unlock(struct ww_mutex *lock)
 	 */
 	mutex_clear_owner(&lock->base);
 #endif
-	__mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
+	/*
+	 * A previously _not_ waiting task may acquire the lock via the fast
+	 * path during our unlock. In that case, already waiting tasks may have
+	 * to back off to avoid a deadlock. Wake up all waiters so that they
+	 * can check their acquire context stamp against the new owner.
+	 */
+	__mutex_fastpath_unlock(&lock->base.count,
+				__mutex_unlock_slowpath_wakeall);
 }
 EXPORT_SYMBOL(ww_mutex_unlock);
 
@@ -716,7 +726,7 @@ EXPORT_SYMBOL_GPL(__ww_mutex_lock_interruptible);
  * Release the lock, slowpath:
  */
 static inline void
-__mutex_unlock_common_slowpath(struct mutex *lock, int nested)
+__mutex_unlock_common_slowpath(struct mutex *lock, int nested, int wake_all)
 {
 	unsigned long flags;
 	WAKE_Q(wake_q);
@@ -740,7 +750,14 @@ __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
 	mutex_release(&lock->dep_map, nested, _RET_IP_);
 	debug_mutex_unlock(lock);
 
-	if (!list_empty(&lock->wait_list)) {
+	if (wake_all) {
+		struct mutex_waiter *waiter;
+
+		list_for_each_entry(waiter, &lock->wait_list, list) {
+			debug_mutex_wake_waiter(lock, waiter);
+			wake_q_add(&wake_q, waiter->task);
+		}
+	} else if (!list_empty(&lock->wait_list)) {
 		/* get the first entry from the wait-list: */
 		struct mutex_waiter *waiter =
 				list_entry(lock->wait_list.next,
@@ -762,7 +779,15 @@ __mutex_unlock_slowpath(atomic_t *lock_count)
 {
 	struct mutex *lock = container_of(lock_count, struct mutex, count);
 
-	__mutex_unlock_common_slowpath(lock, 1);
+	__mutex_unlock_common_slowpath(lock, 1, 0);
+}
+
+static void
+__mutex_unlock_slowpath_wakeall(atomic_t *lock_count)
+{
+	struct mutex *lock = container_of(lock_count, struct mutex, count);
+
+	__mutex_unlock_common_slowpath(lock, 1, 1);
 }
 
 #ifndef CONFIG_DEBUG_LOCK_ALLOC
-- 
2.7.4
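
For readers who want to replay the scenario from the commit message outside
the kernel, the following is a minimal user-space sketch, not kernel code.
The names sim_waiter and sim_unlock are hypothetical and exist only for this
illustration. The sketch models the wait list as an array in wake-up order
and assumes that a woken waiter backs off exactly when the new owner holds a
lower stamp. It counts the waiters that ought to back off but stay asleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task blocked on a ww_mutex. */
struct sim_waiter {
	int thread_id;		/* thread number from the example table */
	unsigned int stamp;	/* acquire context stamp; lower stamp wins */
	bool woken;		/* did the unlock wake this waiter? */
	bool backed_off;	/* did it back off after being woken? */
};

/*
 * Simulate the slow path of an unlock after a task with stamp
 * owner_stamp has already taken the lock. With wake_all false only
 * the head of the wait list is woken; with wake_all true every
 * waiter is woken. Returns how many waiters should back off but
 * were never woken (each of these can take part in a deadlock).
 */
static size_t sim_unlock(struct sim_waiter *w, size_t n,
			 unsigned int owner_stamp, bool wake_all)
{
	size_t limit = wake_all ? n : (n > 0 ? 1 : 0);
	size_t missed = 0;
	size_t i;

	assert(w != NULL || n == 0);

	for (i = 0; i < n; i++) {
		bool must_back_off = owner_stamp < w[i].stamp;

		w[i].woken = i < limit;
		w[i].backed_off = w[i].woken && must_back_off;
		if (must_back_off && !w[i].woken)
			missed++;
	}
	return missed;
}
```

With the wait list from the example (thread #2 at the head, then thread #1)
and thread #0 as the new owner, sim_unlock() reports one missed waiter when
only the head is woken and none when all waiters are woken.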