From: Davidlohr Bueso
To: Ingo Molnar, Peter Zijlstra, Thomas Gleixner
Cc: Darren Hart, Steven Rostedt, Mike Galbraith, "Paul E. McKenney",
    Sebastian Andrzej Siewior, Davidlohr Bueso, linux-kernel@vger.kernel.org,
    Davidlohr Bueso
Subject: [PATCH -tip v2 1/2] locking/rtmutex: Support spin on owner
Date: Wed, 1 Jul 2015 13:29:47 -0700
Message-Id: <1435782588-4177-1-git-send-email-dave@stgolabs.net>
X-Mailer: git-send-email 2.1.4

Similar to what we have in other locks, particularly regular mutexes, the
idea is that as long as the owner is running, there is a fair chance it'll
release the lock soon, and thus a task trying to acquire the rtmutex is
better off spinning instead of blocking immediately after the fastpath.
Conditions to stop spinning and enter the slowpath are simple:

(1) Upon need_resched()
(2) Current lock owner blocks

Because rtmutexes track the lock owner atomically, we can extend the
fastpath to continue polling on the lock owner via
cmpxchg(lock->owner, NULL, current).

However, this is a conservative approach: if there are any waiters in
line, we stop spinning and immediately take the traditional slowpath.
This allows priority boosting to take precedence over spinning, as
otherwise we could starve a higher-priority queued-up task (i.e. the top
waiter) if spinners constantly steal the lock. This is done by locklessly
checking the rbtree for any waiters (which is safe: if we race, all we do
is iterate again). Obviously this means that spinners ignore the "has
waiters" flag/bit, and thus tasks can extend their spinning time until the
first task queues itself into the tree.

In addition, tasks that failed their spinning and are in the slowpath
still sync with unlocking tasks, respecting try_to_take_rt_mutex() --
spinning only allows possible lock stealing.

Because of its rt nature, there are only limited benefits to this
approach, as most systems that really need real-time locks will commonly
end up boosting priority, which means we have queued waiters.

Passes futextests, and was seen to improve the total inversions performed
on a 60-core box running pi_stress (with a 10 minute window) by ~5% --
which, given the program characteristics, is a non-trivial speedup. More
importantly, it shows that we continue to avoid the deadlock scenarios the
benchmark purposely introduces and avoids via priority boosting.

Signed-off-by: Davidlohr Bueso
---
Changes from v1 (from tglx):
 o Removed the osq usage; too bad we cannot have it as-is due to queue order.
 o Changed the waiter check to use the rbtree instead of the has-waiters bit.
 o Comment cleanups.
 o Rebased & more testing.
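For reference, and not part of this patch: the owner word that spinners
poll is the same one the existing lockless fastpath cmpxchg()es, and the
waiter check is against the rbtree root rather than the "has waiters" bit.
Paraphrasing the existing helpers in kernel/locking/rtmutex.c (exact
definitions may differ slightly by tree):

	/* Owner word update used by both the fastpath and the spinners. */
	#define rt_mutex_cmpxchg(l, c, n)	(cmpxchg(&(l)->owner, c, n) == c)

	/*
	 * Lockless waiter check: looks at the rbtree root, not at the
	 * "has waiters" flag encoded in the low bit of lock->owner.
	 */
	static inline int rt_mutex_has_waiters(struct rt_mutex *lock)
	{
		return !RB_EMPTY_ROOT(&lock->waiters);
	}
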
 kernel/Kconfig.locks     |   4 ++
 kernel/locking/rtmutex.c | 114 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index ebdb004..38ea896 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -227,6 +227,10 @@ config MUTEX_SPIN_ON_OWNER
 	def_bool y
 	depends on SMP && !DEBUG_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW
 
+config RT_MUTEX_SPIN_ON_OWNER
+	def_bool y
+	depends on SMP && RT_MUTEXES && !DEBUG_RT_MUTEXES
+
 config RWSEM_SPIN_ON_OWNER
 	def_bool y
 	depends on SMP && RWSEM_XCHGADD_ALGORITHM && ARCH_SUPPORTS_ATOMIC_RMW
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 5674b07..66b8aa3 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1150,6 +1150,116 @@ static void rt_mutex_handle_deadlock(int res, int detect_deadlock,
 	}
 }
 
+#ifdef CONFIG_RT_MUTEX_SPIN_ON_OWNER
+static noinline
+bool rt_mutex_spin_on_owner(struct rt_mutex *lock, struct task_struct *owner)
+{
+	bool ret = true;
+
+	rcu_read_lock();
+	while (rt_mutex_owner(lock) == owner) {
+		/*
+		 * Ensure we emit the owner->on_cpu, dereference _after_
+		 * checking lock->owner still matches owner. If that fails,
+		 * owner might point to freed memory. If it still matches,
+		 * the rcu_read_lock() ensures the memory stays valid.
+		 */
+		barrier();
+
+		if (!owner->on_cpu || need_resched()) {
+			ret = false;
+			break;
+		}
+
+		cpu_relax_lowlatency();
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
+/*
+ * Initial check for entering the mutex spinning loop.
+ * If no owner, the lock is free.
+ */
+static inline bool rt_mutex_can_spin_on_owner(struct rt_mutex *lock)
+{
+	struct task_struct *owner;
+	int ret = true;
+
+	if (need_resched())
+		return 0;
+
+	rcu_read_lock();
+	owner = rt_mutex_owner(lock);
+	if (owner)
+		ret = owner->on_cpu;
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static bool rt_mutex_optimistic_spin(struct rt_mutex *lock)
+{
+	bool taken = false;
+
+	preempt_disable();
+
+	if (!rt_mutex_can_spin_on_owner(lock))
+		goto done;
+
+	while (true) {
+		struct task_struct *owner;
+
+		/*
+		 * When another task is competing for the lock in the
+		 * slowpath avoid the cmpxchg and immediately block.
+		 *
+		 * This is a lockless check against the waiters rb
+		 * root, meaning that only _real_ waiters are
+		 * acknowledged. The sometimes-phony "has waiters"
+		 * bit is only set unconditionally, in the slowpath
+		 * acquire path, to sync up with any owner releasing
+		 * the lock.
+		 */
+		if (rt_mutex_has_waiters(lock))
+			goto done;
+
+		/*
+		 * If there's an owner, wait for it to either
+		 * release the lock or go to sleep.
+		 */
+		owner = rt_mutex_owner(lock);
+		if (owner && !rt_mutex_spin_on_owner(lock, owner))
+			break;
+
+		/* Try to acquire the lock, if it is unlocked. */
+		if (rt_mutex_cmpxchg(lock, NULL, current)) {
+			taken = true;
+			goto done;
+		}
+
+		/*
+		 * The cpu_relax() call is a compiler barrier which forces
+		 * everything in this loop to be re-loaded. We don't need
+		 * memory barriers as we'll eventually observe the right
+		 * values at the cost of a few extra spins.
+		 */
+		cpu_relax_lowlatency();
+	}
+
+done:
+	preempt_enable();
+	return taken;
+}
+
+#else
+static bool rt_mutex_optimistic_spin(struct rt_mutex *lock)
+{
+	return false;
+}
+#endif
+
 /*
  * Slow path lock function:
  */
@@ -1161,6 +1271,9 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
 	struct rt_mutex_waiter waiter;
 	int ret = 0;
 
+	if (rt_mutex_optimistic_spin(lock))
+		return ret;
+
 	debug_rt_mutex_init_waiter(&waiter);
 	RB_CLEAR_NODE(&waiter.pi_tree_entry);
 	RB_CLEAR_NODE(&waiter.tree_entry);
@@ -1524,7 +1637,6 @@ void __rt_mutex_init(struct rt_mutex *lock, const char *name)
 	raw_spin_lock_init(&lock->wait_lock);
 	lock->waiters = RB_ROOT;
 	lock->waiters_leftmost = NULL;
-
 	debug_rt_mutex_init(lock, name);
 }
 EXPORT_SYMBOL_GPL(__rt_mutex_init);
-- 
2.1.4
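There are no caller-visible API changes: rtmutex users keep the regular
interface and may simply spin briefly before sleeping. Illustrative sketch
only -- the lock name and function below are made up, not from this patch:

	#include <linux/rtmutex.h>

	static DEFINE_RT_MUTEX(example_lock);	/* hypothetical lock, for illustration */

	static void example_user(void)
	{
		rt_mutex_lock(&example_lock);	/* may now optimistically spin before blocking */
		/* ... critical section ... */
		rt_mutex_unlock(&example_lock);
	}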