Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755165AbbBFQkD (ORCPT ); Fri, 6 Feb 2015 11:40:03 -0500 Received: from bombadil.infradead.org ([198.137.202.9]:53309 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751485AbbBFQkA (ORCPT ); Fri, 6 Feb 2015 11:40:00 -0500 Date: Fri, 6 Feb 2015 17:39:47 +0100 From: Peter Zijlstra To: Linus Torvalds Cc: Bruno =?iso-8859-1?Q?Pr=E9mont?= , "Rafael J. Wysocki" , Ingo Molnar , Peter Hurley , Davidlohr Bueso , Linux Kernel Mailing List , Thomas Gleixner , Ilya Dryomov , Mike Galbraith , Oleg Nesterov Subject: Re: Linux 3.19-rc5 Message-ID: <20150206163947.GR21418@twins.programming.kicks-ass.net> References: <1421878320.4903.17.camel@stgolabs.net> <54C02E08.4080405@hurleysoftware.com> <1861286.x5DC37NGWz@vostro.rjw.lan> <20150205221436.01bc24f2@neptune.home> <20150206115051.GL23123@twins.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2012-12-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4389 Lines: 132 On Fri, Feb 06, 2015 at 08:02:41AM -0800, Linus Torvalds wrote: > On Fri, Feb 6, 2015 at 3:50 AM, Peter Zijlstra wrote: > > > > Also, set_current_state(TASK_RUNNING) is almost always pointless, nobody > > cares about that barrier, so make it go away. > > I'd rather not mix this with the patch, and wonder if we should just > do that globally with some preprocessor magic. We do have a fair > number of "set_current_state(TASK_RUNNING)" 138 > and at least for the > *documented* reason for the memory barrier, all of them could/should > be barrier-less. > > So something like > > if (__is_constant_p(state) && state == TASK_RUNNING) > tsk->state = state; > else > set_mb(tsk->state, state); > > might be more general solution than randomly doing one at a time when > changing code around it.. Yeah, or we could do the coccinelle thing and do a mass conversion. I like the macro one though; I worry a wee bit about non-documented cases through. If someone is doing something way subtle we'll break it :/ --- Subject: sched: Avoid the full memory-barrier for set_current_state(TASK_RUNNING) One should never need the full memory barrier implied by set_current_state() to set TASK_RUNNING for the documented reason of avoiding races against wakeup. Suggested-by: Linus Torvalds Signed-off-by: Peter Zijlstra (Intel) --- diff --git a/include/linux/sched.h b/include/linux/sched.h index 8db31ef98d2f..aea44c4eeed8 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -243,6 +243,27 @@ extern char ___assert_task_state[1 - 2*!!( ((task->state & TASK_UNINTERRUPTIBLE) != 0 && \ (task->flags & PF_FROZEN) == 0) +/* + * set_current_state() includes a barrier so that the write of current->state + * is correctly serialised wrt the caller's subsequent test of whether to + * actually sleep: + * + * set_current_state(TASK_UNINTERRUPTIBLE); + * if (do_i_need_to_sleep()) + * schedule(); + * + * If the caller does not need such serialisation then use + * __set_current_state(). This is always true for TASK_RUNNING since + * there is no race against wakeup -- both write the same value. + */ +#define ___set_current_state(state) \ +do { \ + if (__is_constant_p(state) && (state) == TASK_RUNNING) \ + current->state = (state); \ + else \ + set_mb(current->state, (state)); \ +} while (0) + #ifdef CONFIG_DEBUG_ATOMIC_SLEEP #define __set_task_state(tsk, state_value) \ @@ -256,17 +277,6 @@ extern char ___assert_task_state[1 - 2*!!( set_mb((tsk)->state, (state_value)); \ } while (0) -/* - * set_current_state() includes a barrier so that the write of current->state - * is correctly serialised wrt the caller's subsequent test of whether to - * actually sleep: - * - * set_current_state(TASK_UNINTERRUPTIBLE); - * if (do_i_need_to_sleep()) - * schedule(); - * - * If the caller does not need such serialisation then use __set_current_state() - */ #define __set_current_state(state_value) \ do { \ current->task_state_change = _THIS_IP_; \ @@ -275,7 +285,7 @@ extern char ___assert_task_state[1 - 2*!!( #define set_current_state(state_value) \ do { \ current->task_state_change = _THIS_IP_; \ - set_mb(current->state, (state_value)); \ + ___set_current_state(state_value); \ } while (0) #else @@ -285,21 +295,10 @@ extern char ___assert_task_state[1 - 2*!!( #define set_task_state(tsk, state_value) \ set_mb((tsk)->state, (state_value)) -/* - * set_current_state() includes a barrier so that the write of current->state - * is correctly serialised wrt the caller's subsequent test of whether to - * actually sleep: - * - * set_current_state(TASK_UNINTERRUPTIBLE); - * if (do_i_need_to_sleep()) - * schedule(); - * - * If the caller does not need such serialisation then use __set_current_state() - */ #define __set_current_state(state_value) \ do { current->state = (state_value); } while (0) #define set_current_state(state_value) \ - set_mb(current->state, (state_value)) + ___set_current_state(state_value); #endif -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/