From: Davidlohr Bueso
To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
Cc: Darren Hart, Steven Rostedt, Mike Galbraith, "Paul E. McKenney",
    Sebastian Andrzej Siewior, Davidlohr Bueso, linux-kernel@vger.kernel.org,
    Davidlohr Bueso
Subject: [PATCH -tip v2 1/2] locking/rtmutex: Support spin on owner
Date: Wed, 1 Jul 2015 13:29:47 -0700
Message-Id: <1435782588-4177-1-git-send-email-dave@stgolabs.net>
X-Mailer: git-send-email 2.1.4

Similar to what we have in other locks, particularly regular mutexes, the
idea is that as long as the owner is running, there is a fair chance it'll
release the lock soon, and thus a task trying to acquire the rtmutex is
better off spinning instead of blocking immediately after the fastpath.
Conditions to stop spinning and enter the slowpath are simple:

(1) Upon need_resched()
(2) Current lock owner blocks

Because rtmutexes track the lock owner atomically, we can extend the
fastpath to continue polling on the lock owner via
cmpxchg(lock->owner, NULL, current).

However, this is a conservative approach: if there are any waiters in
line, we stop spinning and immediately take the traditional slowpath.
This allows priority boosting to take precedence over spinning, as
otherwise we could starve a higher-priority queued-up task (i.e. the top
waiter) if spinners constantly steal the lock. This is done by locklessly
checking the rbtree for any waiters (which is safe: if we race, all we do
is iterate again). Obviously this means that spinners ignore the "has
waiters" flag/bit, and thus tasks can extend their spinning time until the
first task queues itself into the tree.

In addition, tasks that failed their spinning and are in the slowpath
still sync with unlocking tasks, respecting try_to_take_rt_mutex() --
spinning only allows possible lock stealing.

Because of its rt nature, there are only limited benefits to this
approach, as most systems that really need real-time locks will commonly
end up boosting priority, which means we have queued waiters.

Passes futextests, and was seen to improve the total inversions performed
on a 60-core box running pi_stress (with a 10 minute window) by ~5% --
which, given the program characteristics, is a non-trivial speedup. More
importantly, it shows that we continue to avoid the deadlock scenarios the
benchmark purposely introduces and avoids via priority boosting.

Signed-off-by: Davidlohr Bueso
---
Changes from v1 (from tglx):
 o Removed the osq usage; too bad we cannot have it as-is due to queue order.
 o Changed the waiter check to use the rbtree instead of the has-waiters bit.
 o Comment cleanups.
 o Rebased & more testing.
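For reference, and not part of this patch: the owner word that spinners
poll is the same one the existing lockless fastpath cmpxchg()es, and the
waiter check is against the rbtree root rather than the "has waiters" bit.
Paraphrasing the existing helpers in kernel/locking/rtmutex.c (exact
definitions may differ slightly by tree):

	/* Owner word update used by both the fastpath and the spinners. */
	#define rt_mutex_cmpxchg(l, c, n)	(cmpxchg(&(l)->owner, c, n) == c)

	/*
	 * Lockless waiter check: looks at the rbtree root, not at the
	 * "has waiters" flag encoded in the low bit of lock->owner.
	 */
	static inline int rt_mutex_has_waiters(struct rt_mutex *lock)
	{
		return !RB_EMPTY_ROOT(&lock->waiters);
	}
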
 kernel/Kconfig.locks     |   4 ++
 kernel/locking/rtmutex.c | 114 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index ebdb004..38ea896 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -227,6 +227,10 @@ config MUTEX_SPIN_ON_OWNER
 	def_bool y
 	depends on SMP && !DEBUG_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW
 
+config RT_MUTEX_SPIN_ON_OWNER
+	def_bool y
+	depends on SMP && RT_MUTEXES && !DEBUG_RT_MUTEXES
+
 config RWSEM_SPIN_ON_OWNER
 	def_bool y
 	depends on SMP && RWSEM_XCHGADD_ALGORITHM && ARCH_SUPPORTS_ATOMIC_RMW
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 5674b07..66b8aa3 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1150,6 +1150,116 @@ static void rt_mutex_handle_deadlock(int res, int detect_deadlock,
 	}
 }
 
+#ifdef CONFIG_RT_MUTEX_SPIN_ON_OWNER
+static noinline
+bool rt_mutex_spin_on_owner(struct rt_mutex *lock, struct task_struct *owner)
+{
+	bool ret = true;
+
+	rcu_read_lock();
+	while (rt_mutex_owner(lock) == owner) {
+		/*
+		 * Ensure we emit the owner->on_cpu, dereference _after_
+		 * checking lock->owner still matches owner. If that fails,
+		 * owner might point to freed memory. If it still matches,
+		 * the rcu_read_lock() ensures the memory stays valid.
+		 */
+		barrier();
+
+		if (!owner->on_cpu || need_resched()) {
+			ret = false;
+			break;
+		}
+
+		cpu_relax_lowlatency();
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
+/*
+ * Initial check for entering the mutex spinning loop.
+ * If no owner, the lock is free.
+ */
+static inline bool rt_mutex_can_spin_on_owner(struct rt_mutex *lock)
+{
+	struct task_struct *owner;
+	int ret = true;
+
+	if (need_resched())
+		return 0;
+
+	rcu_read_lock();
+	owner = rt_mutex_owner(lock);
+	if (owner)
+		ret = owner->on_cpu;
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static bool rt_mutex_optimistic_spin(struct rt_mutex *lock)
+{
+	bool taken = false;
+
+	preempt_disable();
+
+	if (!rt_mutex_can_spin_on_owner(lock))
+		goto done;
+
+	while (true) {
+		struct task_struct *owner;
+
+		/*
+		 * When another task is competing for the lock in the
+		 * slowpath avoid the cmpxchg and immediately block.
+		 *
+		 * This is a lockless check against the waiters rb
+		 * root, meaning that only _real_ waiters are
+		 * acknowledged. The sometimes-phony "has waiters"
+		 * bit is only set unconditionally, in the slowpath
+		 * acquire path, to sync up with any owner releasing
+		 * the lock.
+		 */
+		if (rt_mutex_has_waiters(lock))
+			goto done;
+
+		/*
+		 * If there's an owner, wait for it to either
+		 * release the lock or go to sleep.
+		 */
+		owner = rt_mutex_owner(lock);
+		if (owner && !rt_mutex_spin_on_owner(lock, owner))
+			break;
+
+		/* Try to acquire the lock, if it is unlocked. */
+		if (rt_mutex_cmpxchg(lock, NULL, current)) {
+			taken = true;
+			goto done;
+		}
+
+		/*
+		 * The cpu_relax() call is a compiler barrier which forces
+		 * everything in this loop to be re-loaded. We don't need
+		 * memory barriers as we'll eventually observe the right
+		 * values at the cost of a few extra spins.
+		 */
+		cpu_relax_lowlatency();
+	}
+
+done:
+	preempt_enable();
+	return taken;
+}
+
+#else
+static bool rt_mutex_optimistic_spin(struct rt_mutex *lock)
+{
+	return false;
+}
+#endif
+
 /*
  * Slow path lock function:
  */
@@ -1161,6 +1271,9 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
 	struct rt_mutex_waiter waiter;
 	int ret = 0;
 
+	if (rt_mutex_optimistic_spin(lock))
+		return ret;
+
 	debug_rt_mutex_init_waiter(&waiter);
 	RB_CLEAR_NODE(&waiter.pi_tree_entry);
 	RB_CLEAR_NODE(&waiter.tree_entry);
@@ -1524,7 +1637,6 @@ void __rt_mutex_init(struct rt_mutex *lock, const char *name)
 	raw_spin_lock_init(&lock->wait_lock);
 	lock->waiters = RB_ROOT;
 	lock->waiters_leftmost = NULL;
-
 	debug_rt_mutex_init(lock, name);
 }
 EXPORT_SYMBOL_GPL(__rt_mutex_init);
-- 
2.1.4
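There are no caller-visible API changes: rtmutex users keep the regular
interface and may simply spin briefly before sleeping. Illustrative sketch
only -- the lock name and function below are made up, not from this patch:

	#include <linux/rtmutex.h>

	static DEFINE_RT_MUTEX(example_lock);	/* hypothetical lock, for illustration */

	static void example_user(void)
	{
		rt_mutex_lock(&example_lock);	/* may now optimistically spin before blocking */
		/* ... critical section ... */
		rt_mutex_unlock(&example_lock);
	}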