Subject: [PATCH -v6][RFC]: mutex: implement adaptive spinning
From: Peter Zijlstra
To: Linus Torvalds
Cc: Steven Rostedt, paulmck@linux.vnet.ibm.com, Gregory Haskins,
	Ingo Molnar, Matthew Wilcox, Andi Kleen, Chris Mason, Andrew Morton,
	Linux Kernel Mailing List, linux-fsdevel, linux-btrfs, Thomas Gleixner,
	Nick Piggin, Peter Morreale, Sven Dietrich
References: <87r63ljzox.fsf@basil.nowhere.org>
	<20090103191706.GA2002@parisc-linux.org>
	<4963584A.4090805@novell.com>
	<20090106131643.GA15228@elte.hu>
	<1231248041.11687.107.camel@twins>
	<49636799.1010109@novell.com>
	<20090106214229.GD6741@linux.vnet.ibm.com>
	<1231278275.11687.111.camel@twins>
	<1231279660.11687.121.camel@twins>
	<1231281801.11687.125.camel@twins>
	<1231283778.11687.136.camel@twins>
	<1231329783.11687.287.camel@twins>
	<1231347442.11687.344.camel@twins>
	<1231365115.11687.361.camel@twins>
	<1231366716.11687.377.camel@twins>
Date: Thu, 08 Jan 2009 10:58:38 +0100
Message-Id: <1231408718.11687.400.camel@twins>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2009-01-07 at 15:10 -0800, Linus Torvalds wrote:

> Please take all my patches to be pseudo-code.
> They've neither been compiled nor tested, and I'm just posting them in the
> hope that somebody else will then do things in the direction I think is the
> proper one ;)

Linux opteron 2.6.28-tip #585 SMP PREEMPT Thu Jan 8 10:38:09 CET 2009 x86_64 x86_64 x86_64 GNU/Linux

[root@opteron bench]# echo NO_OWNER_SPIN > /debug/sched_features; ./timec -e -5,-4,-3,-2 ./test-mutex V 16 10
2 CPUs, running 16 parallel test-tasks.
checking VFS performance.
| loops/sec: 69415

avg ops/sec:               74996
average cost per op:       0.00 usecs
average cost per lock:     0.00 usecs
average cost per unlock:   0.00 usecs
max cost per op:           0.00 usecs
max cost per lock:         0.00 usecs
max cost per unlock:       0.00 usecs
average deviance per op:   0.00 usecs

 Performance counter stats for './test-mutex':

   12098.324578  task clock ticks     (msecs)

           1081  CPU migrations       (events)
           7102  context switches     (events)
           2763  pagefaults           (events)
   12098.324578  task clock ticks     (msecs)

 Wall-clock time elapsed: 12026.804839 msecs

[root@opteron bench]# echo OWNER_SPIN > /debug/sched_features; ./timec -e -5,-4,-3,-2 ./test-mutex V 16 10
2 CPUs, running 16 parallel test-tasks.
checking VFS performance.
| loops/sec: 208147

avg ops/sec:               228126
average cost per op:       0.00 usecs
average cost per lock:     0.00 usecs
average cost per unlock:   0.00 usecs
max cost per op:           0.00 usecs
max cost per lock:         0.00 usecs
max cost per unlock:       0.00 usecs
average deviance per op:   0.00 usecs

 Performance counter stats for './test-mutex':

   22280.283224  task clock ticks     (msecs)

            117  CPU migrations       (events)
           5711  context switches     (events)
           2781  pagefaults           (events)
   22280.283224  task clock ticks     (msecs)

 Wall-clock time elapsed: 12307.053737 msecs

* WOW *

---
Subject: mutex: implement adaptive spin
From: Peter Zijlstra
Date: Thu Jan 08 09:41:22 CET 2009

Signed-off-by: Peter Zijlstra
---
 include/linux/mutex.h   |    4 +-
 include/linux/sched.h   |    1
 kernel/mutex-debug.c    |    7 ----
 kernel/mutex-debug.h    |   18 +++++-----
 kernel/mutex.c          |   81 ++++++++++++++++++++++++++++++++++++++++++------
 kernel/mutex.h          |   22 +++++++++++--
 kernel/sched.c          |   63 +++++++++++++++++++++++++++++++++++++
 kernel/sched_features.h |    1
 8 files changed, 170 insertions(+), 27 deletions(-)

Index: linux-2.6/include/linux/mutex.h
===================================================================
--- linux-2.6.orig/include/linux/mutex.h
+++ linux-2.6/include/linux/mutex.h
@@ -50,8 +50,10 @@ struct mutex {
 	atomic_t		count;
 	spinlock_t		wait_lock;
 	struct list_head	wait_list;
-#ifdef CONFIG_DEBUG_MUTEXES
+#if defined(CONFIG_DEBUG_MUTEXES) || defined(CONFIG_SMP)
 	struct thread_info	*owner;
+#endif
+#ifdef CONFIG_DEBUG_MUTEXES
 	const char		*name;
 	void			*magic;
 #endif
Index: linux-2.6/kernel/mutex-debug.c
===================================================================
--- linux-2.6.orig/kernel/mutex-debug.c
+++ linux-2.6/kernel/mutex-debug.c
@@ -26,11 +26,6 @@
 /*
  * Must be called with lock->wait_lock held.
  */
-void debug_mutex_set_owner(struct mutex *lock, struct thread_info *new_owner)
-{
-	lock->owner = new_owner;
-}
-
 void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
 {
 	memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
@@ -82,7 +77,6 @@ void debug_mutex_unlock(struct mutex *lo
 	DEBUG_LOCKS_WARN_ON(lock->magic != lock);
 	DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
 	DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
-	DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
 }
 
 void debug_mutex_init(struct mutex *lock, const char *name,
@@ -95,7 +89,6 @@ void debug_mutex_init(struct mutex *lock
 	debug_check_no_locks_freed((void *)lock, sizeof(*lock));
 	lockdep_init_map(&lock->dep_map, name, key, 0);
 #endif
-	lock->owner = NULL;
 	lock->magic = lock;
 }
 
Index: linux-2.6/kernel/mutex-debug.h
===================================================================
--- linux-2.6.orig/kernel/mutex-debug.h
+++ linux-2.6/kernel/mutex-debug.h
@@ -13,14 +13,6 @@
 /*
  * This must be called with lock->wait_lock held.
  */
-extern void
-debug_mutex_set_owner(struct mutex *lock, struct thread_info *new_owner);
-
-static inline void debug_mutex_clear_owner(struct mutex *lock)
-{
-	lock->owner = NULL;
-}
-
 extern void debug_mutex_lock_common(struct mutex *lock,
 				    struct mutex_waiter *waiter);
 extern void debug_mutex_wake_waiter(struct mutex *lock,
@@ -35,6 +27,16 @@ extern void debug_mutex_unlock(struct mu
 extern void debug_mutex_init(struct mutex *lock, const char *name,
 			     struct lock_class_key *key);
 
+static inline void mutex_set_owner(struct mutex *lock)
+{
+	lock->owner = current_thread_info();
+}
+
+static inline void mutex_clear_owner(struct mutex *lock)
+{
+	lock->owner = NULL;
+}
+
 #define spin_lock_mutex(lock, flags)			\
 	do {						\
 		struct mutex *l = container_of(lock, struct mutex, wait_lock); \
Index: linux-2.6/kernel/mutex.c
===================================================================
--- linux-2.6.orig/kernel/mutex.c
+++ linux-2.6/kernel/mutex.c
@@ -10,6 +10,11 @@
  * Many thanks to Arjan van de Ven, Thomas Gleixner, Steven Rostedt and
  * David Howells for suggestions and improvements.
  *
+ * - Adaptive spinning for mutexes by Peter Zijlstra. (Ported to mainline
+ *   from the -rt tree, where it was originally implemented for rtmutexes
+ *   by Steven Rostedt, based on work by Gregory Haskins, Peter Morreale
+ *   and Sven Dietrich.)
+ *
  * Also see Documentation/mutex-design.txt.
  */
 #include
@@ -46,6 +51,7 @@ __mutex_init(struct mutex *lock, const c
 	atomic_set(&lock->count, 1);
 	spin_lock_init(&lock->wait_lock);
 	INIT_LIST_HEAD(&lock->wait_list);
+	mutex_clear_owner(lock);
 
 	debug_mutex_init(lock, name, key);
 }
@@ -91,6 +97,7 @@ void inline __sched mutex_lock(struct mu
 	 * 'unlocked' into 'locked' state.
 	 */
 	__mutex_fastpath_lock(&lock->count, __mutex_lock_slowpath);
+	mutex_set_owner(lock);
 }
 
 EXPORT_SYMBOL(mutex_lock);
@@ -115,11 +122,21 @@ void __sched mutex_unlock(struct mutex *
 	 * The unlocking fastpath is the 0->1 transition from 'locked'
 	 * into 'unlocked' state:
 	 */
+#ifndef CONFIG_DEBUG_MUTEXES
+	/*
+	 * When debugging is enabled we must not clear the owner before time,
+	 * the slow path will always be taken, and that clears the owner field
+	 * after verifying that it was indeed current.
+	 */
+	mutex_clear_owner(lock);
+#endif
 	__mutex_fastpath_unlock(&lock->count, __mutex_unlock_slowpath);
 }
 
 EXPORT_SYMBOL(mutex_unlock);
 
+#define MUTEX_SLEEPERS (-1000)
+
 /*
  * Lock a mutex (possibly interruptible), slowpath:
  */
@@ -132,6 +149,34 @@ __mutex_lock_common(struct mutex *lock,
 	unsigned int old_val;
 	unsigned long flags;
 
+#ifdef CONFIG_SMP
+	/* Optimistic spinning.. */
+	for (;;) {
+		struct thread_info *owner;
+		int oldval = atomic_read(&lock->count);
+
+		if (oldval <= MUTEX_SLEEPERS)
+			break;
+		if (oldval == 1) {
+			oldval = atomic_cmpxchg(&lock->count, oldval, 0);
+			if (oldval == 1) {
+				mutex_set_owner(lock);
+				return 0;
+			}
+		} else {
+			/* See who owns it, and spin on him if anybody */
+			owner = ACCESS_ONCE(lock->owner);
+			if (owner && !spin_on_owner(lock, owner))
+				break;
+		}
+
+		if (need_resched())
+			break;
+
+		cpu_relax();
+	}
+#endif
+
 	spin_lock_mutex(&lock->wait_lock, flags);
 
 	debug_mutex_lock_common(lock, &waiter);
@@ -142,7 +187,7 @@ __mutex_lock_common(struct mutex *lock,
 	list_add_tail(&waiter.list, &lock->wait_list);
 	waiter.task = task;
 
-	old_val = atomic_xchg(&lock->count, -1);
+	old_val = atomic_xchg(&lock->count, MUTEX_SLEEPERS);
 	if (old_val == 1)
 		goto done;
 
@@ -158,7 +203,7 @@ __mutex_lock_common(struct mutex *lock,
 		 * that when we release the lock, we properly wake up the
 		 * other waiters:
 		 */
-		old_val = atomic_xchg(&lock->count, -1);
+		old_val = atomic_xchg(&lock->count, MUTEX_SLEEPERS);
 		if (old_val == 1)
 			break;
 
@@ -187,7 +232,7 @@ done:
 	lock_acquired(&lock->dep_map, ip);
 	/* got the lock - rejoice! */
 	mutex_remove_waiter(lock, &waiter, task_thread_info(task));
-	debug_mutex_set_owner(lock, task_thread_info(task));
+	mutex_set_owner(lock);
 
 	/* set it to 0 if there are no waiters left: */
 	if (likely(list_empty(&lock->wait_list)))
@@ -260,7 +305,7 @@ __mutex_unlock_common_slowpath(atomic_t
 		wake_up_process(waiter->task);
 	}
 
-	debug_mutex_clear_owner(lock);
+	mutex_clear_owner(lock);
 
 	spin_unlock_mutex(&lock->wait_lock, flags);
 }
@@ -298,18 +343,30 @@ __mutex_lock_interruptible_slowpath(atom
  */
 int __sched mutex_lock_interruptible(struct mutex *lock)
 {
+	int ret;
+
 	might_sleep();
-	return __mutex_fastpath_lock_retval
+	ret = __mutex_fastpath_lock_retval
 			(&lock->count, __mutex_lock_interruptible_slowpath);
+	if (!ret)
+		mutex_set_owner(lock);
+
+	return ret;
 }
 
 EXPORT_SYMBOL(mutex_lock_interruptible);
 
 int __sched mutex_lock_killable(struct mutex *lock)
 {
+	int ret;
+
 	might_sleep();
-	return __mutex_fastpath_lock_retval
+	ret = __mutex_fastpath_lock_retval
 			(&lock->count, __mutex_lock_killable_slowpath);
+	if (!ret)
+		mutex_set_owner(lock);
+
+	return ret;
 }
 EXPORT_SYMBOL(mutex_lock_killable);
 
@@ -352,9 +409,10 @@ static inline int __mutex_trylock_slowpa
 
 	prev = atomic_xchg(&lock->count, -1);
 	if (likely(prev == 1)) {
-		debug_mutex_set_owner(lock, current_thread_info());
+		mutex_set_owner(lock);
 		mutex_acquire(&lock->dep_map, 0, 1, _RET_IP_);
 	}
+
 	/* Set it back to 0 if there are no waiters: */
 	if (likely(list_empty(&lock->wait_list)))
 		atomic_set(&lock->count, 0);
@@ -380,8 +438,13 @@ static inline int __mutex_trylock_slowpa
  */
 int __sched mutex_trylock(struct mutex *lock)
 {
-	return __mutex_fastpath_trylock(&lock->count,
-					__mutex_trylock_slowpath);
+	int ret;
+
+	ret = __mutex_fastpath_trylock(&lock->count, __mutex_trylock_slowpath);
+	if (ret)
+		mutex_set_owner(lock);
+
+	return ret;
 }
 
 EXPORT_SYMBOL(mutex_trylock);
Index: linux-2.6/kernel/mutex.h
===================================================================
--- linux-2.6.orig/kernel/mutex.h
+++ linux-2.6/kernel/mutex.h
@@ -16,8 +16,26 @@
 #define mutex_remove_waiter(lock, waiter, ti) \
 		__list_del((waiter)->list.prev, (waiter)->list.next)
 
-#define debug_mutex_set_owner(lock, new_owner)		do { } while (0)
-#define debug_mutex_clear_owner(lock)			do { } while (0)
+#ifdef CONFIG_SMP
+static inline void mutex_set_owner(struct mutex *lock)
+{
+	lock->owner = current_thread_info();
+}
+
+static inline void mutex_clear_owner(struct mutex *lock)
+{
+	lock->owner = NULL;
+}
+#else
+static inline void mutex_set_owner(struct mutex *lock)
+{
+}
+
+static inline void mutex_clear_owner(struct mutex *lock)
+{
+}
+#endif
+
 #define debug_mutex_wake_waiter(lock, waiter)		do { } while (0)
 #define debug_mutex_free_waiter(waiter)			do { } while (0)
 #define debug_mutex_add_waiter(lock, waiter, ti)	do { } while (0)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -4672,6 +4672,69 @@ need_resched_nonpreemptible:
 }
 EXPORT_SYMBOL(schedule);
 
+#ifdef CONFIG_SMP
+/*
+ * Look out! "owner" is an entirely speculative pointer
+ * access and not reliable.
+ */
+int spin_on_owner(struct mutex *lock, struct thread_info *owner)
+{
+	unsigned int cpu;
+	struct rq *rq;
+
+	if (unlikely(!sched_feat(OWNER_SPIN)))
+		return 0;
+
+	preempt_disable();
+#ifdef CONFIG_DEBUG_PAGEALLOC
+	/*
+	 * Need to access the cpu field knowing that
+	 * DEBUG_PAGEALLOC could have unmapped it if
+	 * the mutex owner just released it and exited.
+	 */
+	if (probe_kernel_address(&owner->cpu, cpu))
+		goto out;
+#else
+	cpu = owner->cpu;
+#endif
+
+	/*
+	 * Even if the access succeeded (likely case),
+	 * the cpu field may no longer be valid.
+	 */
+	if (cpu >= nr_cpumask_bits)
+		goto out;
+
+	/*
+	 * We need to validate that we can do a
+	 * get_cpu() and that we have the percpu area.
+	 */
+	if (!cpu_online(cpu))
+		goto out;
+
+	rq = cpu_rq(cpu);
+
+	for (;;) {
+		if (lock->owner != owner)
+			break;
+
+		/*
+		 * Is that owner really running on that cpu?
+		 */
+		if (task_thread_info(rq->curr) != owner)
+			break;
+
+		if (need_resched())
+			break;
+
+		cpu_relax();
+	}
+out:
+	preempt_enable_no_resched();
+	return 1;
+}
+#endif
+
 #ifdef CONFIG_PREEMPT
 /*
  * this is the entry point to schedule() from in-kernel preemption
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -330,6 +330,7 @@ extern signed long schedule_timeout_inte
 extern signed long schedule_timeout_killable(signed long timeout);
 extern signed long schedule_timeout_uninterruptible(signed long timeout);
 asmlinkage void schedule(void);
+extern int spin_on_owner(struct mutex *lock, struct thread_info *owner);
 
 struct nsproxy;
 struct user_namespace;
Index: linux-2.6/kernel/sched_features.h
===================================================================
--- linux-2.6.orig/kernel/sched_features.h
+++ linux-2.6/kernel/sched_features.h
@@ -13,3 +13,4 @@ SCHED_FEAT(LB_WAKEUP_UPDATE, 1)
 SCHED_FEAT(ASYM_EFF_LOAD, 1)
 SCHED_FEAT(WAKEUP_OVERLAP, 0)
 SCHED_FEAT(LAST_BUDDY, 1)
+SCHED_FEAT(OWNER_SPIN, 1)
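
For anyone who wants to play with the idea outside the kernel, the policy of the optimistic loop in __mutex_lock_common() (try the 1->0 cmpxchg, otherwise keep spinning only while the recorded owner is actually running) can be sketched in userspace C11. This is an illustration under assumptions, not part of the patch: struct owner_info, its on_cpu flag, the am_*() functions and the max_spins budget are all hypothetical stand-ins. The patch itself compares rq->curr against the owner and bails out on need_resched(), neither of which is available to userspace.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread record; the patch uses struct thread_info. */
struct owner_info {
	atomic_bool on_cpu;	/* stand-in for "rq->curr is the owner" */
};

struct adaptive_mutex {
	atomic_int count;			/* 1 = unlocked, 0 = locked */
	_Atomic(struct owner_info *) owner;	/* NULL when nobody holds it */
};

static void am_init(struct adaptive_mutex *m)
{
	atomic_init(&m->count, 1);
	atomic_init(&m->owner, NULL);
}

/*
 * Optimistic acquire. Returns true if the lock was taken by spinning,
 * false if the caller should fall back to a sleeping slow path.
 * max_spins is an assumed budget that stands in for need_resched().
 */
static bool am_try_spin(struct adaptive_mutex *m, struct owner_info *self,
			unsigned int max_spins)
{
	for (unsigned int i = 0; i < max_spins; i++) {
		int expected = 1;

		/* The 1 -> 0 transition takes the lock. */
		if (atomic_compare_exchange_strong(&m->count, &expected, 0)) {
			atomic_store(&m->owner, self);
			return true;
		}

		/* See who owns it; stop spinning if that owner is not running. */
		struct owner_info *o = atomic_load(&m->owner);
		if (o != NULL && !atomic_load(&o->on_cpu))
			return false;
	}
	return false;
}

static void am_unlock(struct adaptive_mutex *m, struct owner_info *self)
{
	/* Same check the debug code makes: only the owner may unlock. */
	assert(atomic_load(&m->owner) == self);
	/* Clear the owner before releasing, as the unlock fastpath does. */
	atomic_store(&m->owner, NULL);
	atomic_store(&m->count, 1);
}
```

A caller would try am_try_spin() first and only queue itself and sleep when it returns false. The sketch leaves out the wait list and the MUTEX_SLEEPERS state entirely, so it shows the spinning decision only.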