From: Waiman Long <Waiman.Long@hpe.com>
To: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>, linux-kernel@vger.kernel.org,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Ding Tianhong <dingtianhong@huawei.com>,
        Jason Low <jason.low2@hpe.com>, Davidlohr Bueso <dave@stgolabs.net>,
        "Paul E. McKenney" <paulmck@us.ibm.com>,
        Thomas Gleixner <tglx@linutronix.de>,
        Will Deacon <Will.Deacon@arm.com>,
        Tim Chen <tim.c.chen@linux.intel.com>,
        Waiman Long <Waiman.Long@hpe.com>, Waiman Long <Waiman.Long@hp.com>
Subject: [PATCH v3 3/3] locking/mutex: Avoid missed wakeup of mutex waiter
Date: Tue, 22 Mar 2016 13:46:44 -0400
Message-Id: <1458668804-10138-4-git-send-email-Waiman.Long@hpe.com>
In-Reply-To: <1458668804-10138-1-git-send-email-Waiman.Long@hpe.com>
References: <1458668804-10138-1-git-send-email-Waiman.Long@hpe.com>
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2970
Lines: 78

The current mutex code sets count to -1 and then sets the task
state. This is the same sequence that the mutex unlock path is checking
count and task state. That could lead to a missed wakeup even though
the problem will be cleared when a new waiter enters the waiting queue.

This patch reverses the order in the locking slowpath so that the task
state is set first before setting the count. This should eliminate
the potential missed wakeup and improve latency.

Signed-off-by: Waiman Long <Waiman.Long@hp.com>
---
 kernel/locking/mutex.c |   44 +++++++++++++++++++++++++++++---------------
 1 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 5c0acee..c9af28c 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -571,20 +571,6 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 
 	while (!acquired) {
 		/*
-		 * Lets try to take the lock again - this is needed even if
-		 * we get here for the first time (shortly after failing to
-		 * acquire the lock), to make sure that we get a wakeup once
-		 * it's unlocked. Later on, if we sleep, this is the
-		 * operation that gives us the lock. We xchg it to -1, so
-		 * that when we release the lock, we properly wake up the
-		 * other waiters. We only attempt the xchg if the count is
-		 * non-negative in order to avoid unnecessary xchg operations:
-		 */
-		if (atomic_read(&lock->count) >= 0 &&
-		    (atomic_xchg_acquire(&lock->count, -1) == 1))
-			break;
-
-		/*
 		 * got a signal? (This code gets eliminated in the
 		 * TASK_UNINTERRUPTIBLE case.)
 		 */
@@ -599,7 +585,35 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 				goto err;
 		}
 
-		__set_task_state(task, state);
+
+		/*
+		 * We need to set the state first before changing the count
+		 * to -1 to avoid missed wakeup even though the problem can
+		 * be cleared by a new waiter entering the queue.
+		 *
+		 *	Sleep			Wakeup
+		 *	-----			------
+		 *   [S] p->state = state [RmW] count = 1
+		 *	 MB			MB
+		 * [RmW] count = -1	    [L] if ((prev_count < 0) &&
+		 *	 if (prev_count < 1)	    (p->state & state))
+		 *	     sleep		    wakeup
+		 */
+		set_task_state(task, state);
+
+		/*
+		 * Lets try to take the lock again - this is needed even if
+		 * we get here for the first time (shortly after failing to
+		 * acquire the lock), to make sure that we get a wakeup once
+		 * it's unlocked. Later on, if we sleep, this is the
+		 * operation that gives us the lock. We xchg it to -1, so
+		 * that when we release the lock, we properly wake up the
+		 * other waiters. We only attempt the xchg if the count is
+		 * non-negative in order to avoid unnecessary xchg operations:
+		 */
+		if (atomic_read(&lock->count) >= 0 &&
+		   (atomic_xchg_acquire(&lock->count, -1) == 1))
+			break;
 
 		/* didn't get the lock, go to sleep: */
 		spin_unlock_mutex(&lock->wait_lock, flags);
-- 
1.7.1