Date: Tue, 6 Jan 2009 12:37:10 -0500 (EST)
From: Steven Rostedt
To: Andrew Morton
cc: Peter Zijlstra, Matthew Wilcox, Andi Kleen, Chris Mason,
    linux-kernel@vger.kernel.org, linux-fsdevel, linux-btrfs,
    Ingo Molnar, Thomas Gleixner, Gregory Haskins, Nick Piggin,
    Linus Torvalds
Subject: Re: [PATCH][RFC]: mutex: adaptive spin


On Tue, 6 Jan 2009, Andrew Morton wrote:

> On Tue, 06 Jan 2009 12:40:31 +0100 Peter Zijlstra wrote:
>
> > Subject: mutex: adaptive spin
> > From: Peter Zijlstra
> > Date: Tue Jan 06 12:32:12 CET 2009
> >
> > Based on the code in -rt, provide adaptive spins on generic mutexes.
> >
>
> How dumb is it to send a lump of uncommented, un-changelogged code as
> an rfc, only to have Ingo reply, providing the changelog for you?

Sounds smart to me. He was able to get someone else to do the work. ;-)

> Sigh.
>
> > --- linux-2.6.orig/kernel/mutex.c
> > +++ linux-2.6/kernel/mutex.c
> > @@ -46,6 +46,7 @@ __mutex_init(struct mutex *lock, const c
> >  	atomic_set(&lock->count, 1);
> >  	spin_lock_init(&lock->wait_lock);
> >  	INIT_LIST_HEAD(&lock->wait_list);
> > +	lock->owner = NULL;
> >
> >  	debug_mutex_init(lock, name, key);
> >  }
> > @@ -120,6 +121,28 @@ void __sched mutex_unlock(struct mutex *
> >
> >  EXPORT_SYMBOL(mutex_unlock);
> >
> > +#ifdef CONFIG_SMP
> > +static int adaptive_wait(struct mutex_waiter *waiter,
> > +			 struct task_struct *owner, long state)
> > +{
> > +	for (;;) {
> > +		if (signal_pending_state(state, waiter->task))
> > +			return 0;
> > +		if (waiter->lock->owner != owner)
> > +			return 0;
> > +		if (!task_is_current(owner))
> > +			return 1;
> > +		cpu_relax();
> > +	}
> > +}
>
> Each of the tests in this function should be carefully commented. It's
> really the core piece of the design.

Yep, I agree here too.
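Something like this, perhaps (just a sketch of the kind of comments I
mean; Peter knows the -rt history and can word them better). The return
value means: 1 = go to sleep, 0 = loop back and retry the acquisition.

static int adaptive_wait(struct mutex_waiter *waiter,
			 struct task_struct *owner, long state)
{
	for (;;) {
		/*
		 * A pending signal (for INTERRUPTIBLE/KILLABLE
		 * waiters) must stop the spin so that the slow
		 * path can back out with -EINTR.
		 */
		if (signal_pending_state(state, waiter->task))
			return 0;
		/*
		 * The lock changed hands (or was released): stop
		 * spinning and retry the acquisition.
		 */
		if (waiter->lock->owner != owner)
			return 0;
		/*
		 * The owner was preempted and is not running on
		 * any CPU; spinning would burn cycles for nothing,
		 * so go to sleep instead.
		 */
		if (!task_is_current(owner))
			return 1;
		cpu_relax();
	}
}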
> > +#else
> > +static int adaptive_wait(struct mutex_waiter *waiter,
> > +			 struct task_struct *owner, long state)
> > +{
> > +	return 1;
> > +}
> > +#endif
> > +
> >  /*
> >   * Lock a mutex (possibly interruptible), slowpath:
> >   */
> > @@ -127,7 +150,7 @@ static inline int __sched
> >  __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> >  		unsigned long ip)
> >  {
> > -	struct task_struct *task = current;
> > +	struct task_struct *owner, *task = current;
> >  	struct mutex_waiter waiter;
> >  	unsigned int old_val;
> >  	unsigned long flags;
> > @@ -141,6 +164,7 @@ __mutex_lock_common(struct mutex *lock,
> >  	/* add waiting tasks to the end of the waitqueue (FIFO): */
> >  	list_add_tail(&waiter.list, &lock->wait_list);
> >  	waiter.task = task;
> > +	waiter.lock = lock;
> >
> >  	old_val = atomic_xchg(&lock->count, -1);
> >  	if (old_val == 1)
> > @@ -175,11 +199,19 @@ __mutex_lock_common(struct mutex *lock,
> >  			debug_mutex_free_waiter(&waiter);
> >  			return -EINTR;
> >  		}
> > -		__set_task_state(task, state);
> >
> > -		/* didnt get the lock, go to sleep: */
> > +		owner = lock->owner;
>
> What prevents *owner from exiting right here?

Yeah, this should be commented. The reason this is not a race is that
the owner holds the mutex here, and we hold the lock->wait_lock. When
the owner releases the mutex, it must go into the slow unlock path and
grab the lock->wait_lock spinlock. Thus it will block there and not go
away.

Of course, if a bug elsewhere in the kernel lets a task exit without
releasing a mutex, this will fail. But in that case there are bigger
problems that need to be fixed.

> > +		get_task_struct(owner);
> >  		spin_unlock_mutex(&lock->wait_lock, flags);

Here, we take a reference on the owner before releasing the wait_lock.
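To spell out the lifetime rules (the code is from the patch above; the
comments are only my sketch of what could go there):

	/*
	 * lock->owner is stable at this point: the owner cannot
	 * release the mutex without first taking lock->wait_lock,
	 * which we are holding.
	 */
	owner = lock->owner;

	/*
	 * Pin the owner's task_struct so it cannot be freed once
	 * we drop the wait_lock and start spinning on it.
	 */
	get_task_struct(owner);
	spin_unlock_mutex(&lock->wait_lock, flags);

	/* ... adaptive_wait() spins, or we schedule() ... */

	put_task_struct(owner);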
> > -		schedule();
> > +
> > +		if (adaptive_wait(&waiter, owner, state)) {
> > +			put_task_struct(owner);
> > +			__set_task_state(task, state);
> > +			/* didnt get the lock, go to sleep: */
> > +			schedule();
> > +		} else
> > +			put_task_struct(owner);
> > +
> >  		spin_lock_mutex(&lock->wait_lock, flags);
> >  	}
> >
> > @@ -187,7 +219,7 @@ done:
> >  	lock_acquired(&lock->dep_map, ip);
> >  	/* got the lock - rejoice! */
> >  	mutex_remove_waiter(lock, &waiter, task_thread_info(task));
> > -	debug_mutex_set_owner(lock, task_thread_info(task));
> > +	lock->owner = task;
> >
> >  	/* set it to 0 if there are no waiters left: */
> >  	if (likely(list_empty(&lock->wait_list)))
> > @@ -260,7 +292,7 @@ __mutex_unlock_common_slowpath(atomic_t
> >  		wake_up_process(waiter->task);
> >  	}
> >
> > -	debug_mutex_clear_owner(lock);
> > +	lock->owner = NULL;
> >
> >  	spin_unlock_mutex(&lock->wait_lock, flags);
> >  }
> > @@ -352,7 +384,7 @@ static inline int __mutex_trylock_slowpa
> >
> >  	prev = atomic_xchg(&lock->count, -1);
> >  	if (likely(prev == 1)) {
> > -		debug_mutex_set_owner(lock, current_thread_info());
> > +		lock->owner = current;
> >  		mutex_acquire(&lock->dep_map, 0, 1, _RET_IP_);
> >  	}
> >  	/* Set it back to 0 if there are no waiters: */
> > Index: linux-2.6/kernel/sched.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched.c
> > +++ linux-2.6/kernel/sched.c
> > @@ -697,6 +697,11 @@ int runqueue_is_locked(void)
> >  	return ret;
> >  }
> >
> > +int task_is_current(struct task_struct *p)
> > +{
> > +	return task_rq(p)->curr == p;
> > +}
>
> Please don't add kernel-wide infrastructure and leave it completely
> undocumented. Particularly functions which are as vague and dangerous
> as this one.
>
> What locking must the caller provide? What are the semantics of the
> return value?
>
> What must the caller do to avoid oopses if *p is concurrently exiting?
>
> etc.
>
> The overall design intent seems very smart to me, as long as the races
> can be plugged, if they're indeed present.

Thanks for the compliment on the design. It came about through lots of
work by many people in the -rt tree. What we need here is to comment
the code for those who did not see the evolution of the changes that
were made.

-- Steve
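P.S. For task_is_current(), a comment along these lines might answer
the questions above (this is my reading of the semantics, offered as a
sketch, not Peter's final word):

/*
 * Return non-zero if @p is currently running on a CPU.
 *
 * The answer is inherently racy: @p may be preempted or scheduled
 * back in immediately after the check, so this is only usable as a
 * heuristic (here: spin vs. sleep).  No locks are required, but the
 * caller must hold a reference on @p (get_task_struct()) so the
 * task cannot be freed while we look at its runqueue.
 */
int task_is_current(struct task_struct *p)
{
	return task_rq(p)->curr == p;
}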