Date: Tue, 6 Jan 2009 09:08:33 -0800
From: Andrew Morton
To: Peter Zijlstra
Cc: Matthew Wilcox, Andi Kleen, Chris Mason, linux-kernel@vger.kernel.org, linux-fsdevel, linux-btrfs, Ingo Molnar, Thomas Gleixner, Steven Rostedt, Gregory Haskins, Nick Piggin, Linus Torvalds
Subject: Re: [PATCH][RFC]: mutex: adaptive spin
Message-Id: <20090106090833.78482f5d.akpm@linux-foundation.org>
In-Reply-To: <1231242031.11687.97.camel@twins>
References: <1230722935.4680.5.camel@think.oraclecorp.com> <20081231104533.abfb1cf9.akpm@linux-foundation.org> <1230765549.7538.8.camel@think.oraclecorp.com> <87r63ljzox.fsf@basil.nowhere.org> <20090103191706.GA2002@parisc-linux.org> <1231093310.27690.5.camel@twins> <20090104184103.GE2002@parisc-linux.org> <1231242031.11687.97.camel@twins>

On Tue, 06 Jan 2009 12:40:31 +0100 Peter Zijlstra wrote:

> Subject: mutex: adaptive spin
> From: Peter Zijlstra
> Date: Tue Jan 06 12:32:12 CET 2009
>
> Based on the code in -rt, provide adaptive spins on generic mutexes.
>

How dumb is it to send a lump of uncommented, changelog-free code as an
rfc, only to have Ingo reply, providing the changelog for you?  Sigh.
> --- linux-2.6.orig/kernel/mutex.c
> +++ linux-2.6/kernel/mutex.c
> @@ -46,6 +46,7 @@ __mutex_init(struct mutex *lock, const c
>  	atomic_set(&lock->count, 1);
>  	spin_lock_init(&lock->wait_lock);
>  	INIT_LIST_HEAD(&lock->wait_list);
> +	lock->owner = NULL;
>
>  	debug_mutex_init(lock, name, key);
>  }
> @@ -120,6 +121,28 @@ void __sched mutex_unlock(struct mutex *
>
>  EXPORT_SYMBOL(mutex_unlock);
>
> +#ifdef CONFIG_SMP
> +static int adaptive_wait(struct mutex_waiter *waiter,
> +			 struct task_struct *owner, long state)
> +{
> +	for (;;) {
> +		if (signal_pending_state(state, waiter->task))
> +			return 0;
> +		if (waiter->lock->owner != owner)
> +			return 0;
> +		if (!task_is_current(owner))
> +			return 1;
> +		cpu_relax();
> +	}
> +}

Each of the tests in this function should be carefully commented.  It's
really the core piece of the design.

> +#else
> +static int adaptive_wait(struct mutex_waiter *waiter,
> +			 struct task_struct *owner, long state)
> +{
> +	return 1;
> +}
> +#endif
> +
>  /*
>   * Lock a mutex (possibly interruptible), slowpath:
>   */
> @@ -127,7 +150,7 @@ static inline int __sched
>  __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  		    unsigned long ip)
>  {
> -	struct task_struct *task = current;
> +	struct task_struct *owner, *task = current;
>  	struct mutex_waiter waiter;
>  	unsigned int old_val;
>  	unsigned long flags;
> @@ -141,6 +164,7 @@ __mutex_lock_common(struct mutex *lock,
>  	/* add waiting tasks to the end of the waitqueue (FIFO): */
>  	list_add_tail(&waiter.list, &lock->wait_list);
>  	waiter.task = task;
> +	waiter.lock = lock;
>
>  	old_val = atomic_xchg(&lock->count, -1);
>  	if (old_val == 1)
> @@ -175,11 +199,19 @@ __mutex_lock_common(struct mutex *lock,
>  			debug_mutex_free_waiter(&waiter);
>  			return -EINTR;
>  		}
> -		__set_task_state(task, state);
>
> -		/* didnt get the lock, go to sleep: */
> +		owner = lock->owner;

What prevents *owner from exiting right here?
> +		get_task_struct(owner);
>  		spin_unlock_mutex(&lock->wait_lock, flags);
> -		schedule();
> +
> +		if (adaptive_wait(&waiter, owner, state)) {
> +			put_task_struct(owner);
> +			__set_task_state(task, state);
> +			/* didnt get the lock, go to sleep: */
> +			schedule();
> +		} else
> +			put_task_struct(owner);
> +
>  		spin_lock_mutex(&lock->wait_lock, flags);
>  	}
>
> @@ -187,7 +219,7 @@ done:
>  	lock_acquired(&lock->dep_map, ip);
>  	/* got the lock - rejoice! */
>  	mutex_remove_waiter(lock, &waiter, task_thread_info(task));
> -	debug_mutex_set_owner(lock, task_thread_info(task));
> +	lock->owner = task;
>
>  	/* set it to 0 if there are no waiters left: */
>  	if (likely(list_empty(&lock->wait_list)))
> @@ -260,7 +292,7 @@ __mutex_unlock_common_slowpath(atomic_t
>  			wake_up_process(waiter->task);
>  	}
>
> -	debug_mutex_clear_owner(lock);
> +	lock->owner = NULL;
>
>  	spin_unlock_mutex(&lock->wait_lock, flags);
>  }
> @@ -352,7 +384,7 @@ static inline int __mutex_trylock_slowpa
>
>  	prev = atomic_xchg(&lock->count, -1);
>  	if (likely(prev == 1)) {
> -		debug_mutex_set_owner(lock, current_thread_info());
> +		lock->owner = current;
>  		mutex_acquire(&lock->dep_map, 0, 1, _RET_IP_);
>  	}
>  	/* Set it back to 0 if there are no waiters: */
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -697,6 +697,11 @@ int runqueue_is_locked(void)
>  	return ret;
>  }
>
> +int task_is_current(struct task_struct *p)
> +{
> +	return task_rq(p)->curr == p;
> +}

Please don't add kernel-wide infrastructure and leave it completely
undocumented.  Particularly functions which are as vague and dangerous as
this one.

What locking must the caller provide?  What are the semantics of the
return value?  What must the caller do to avoid oopses if *p is
concurrently exiting?  etc.

The overall design intent seems very smart to me, as long as the races can
be plugged, if they're indeed present.
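One way to answer those questions in the code itself would be a kerneldoc
comment on the new helper.  The wording below is only a sketch of the
caller contract the patch appears to assume (a get_task_struct() reference
to keep *p alive, and a racy, hint-only return value), not something the
patch states:

```c
/**
 * task_is_current - test whether @p is executing on a CPU right now
 * @p: the task to test
 *
 * The caller must guarantee that @p cannot be freed out from under us,
 * e.g. by holding a reference obtained with get_task_struct(); otherwise
 * the task_rq(p) dereference can oops against a concurrently exiting
 * task.
 *
 * Returns non-zero if @p is currently running.  The result is only a
 * hint: no runqueue lock is taken, so @p may be scheduled in or out
 * before the caller acts on it.  Callers must tolerate both false
 * positives and false negatives.
 */
```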