Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755103AbbDTNz6 (ORCPT ); Mon, 20 Apr 2015 09:55:58 -0400 Received: from cantor2.suse.de ([195.135.220.15]:35487 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752132AbbDTNzy (ORCPT ); Mon, 20 Apr 2015 09:55:54 -0400 Message-ID: <1429538139.2042.10.camel@stgolabs.net> Subject: Re: [PATCH 2/2] futex: lockless wakeups From: Davidlohr Bueso To: Ingo Molnar Cc: Peter Zijlstra , Thomas Gleixner , Ingo Molnar , Sebastian Andrzej Siewior , Linus Torvalds , Chris Mason , Steven Rostedt , fredrik.markstrom@windriver.com, linux-kernel@vger.kernel.org, Andrew Morton Date: Mon, 20 Apr 2015 06:55:39 -0700 In-Reply-To: <20150420061836.GA11191@gmail.com> References: <1429471060-21271-1-git-send-email-dave@stgolabs.net> <1429471060-21271-3-git-send-email-dave@stgolabs.net> <20150420061836.GA11191@gmail.com> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.12.11 Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3811 Lines: 110 On Mon, 2015-04-20 at 08:18 +0200, Ingo Molnar wrote: > Please write a small description we can cite to driver authors once > the (inevitable) breakages appear, outlining this new behavior and its > implications, so that we can fix any remaining bugs ASAP. I wouldn't call this new behavior, simply because changing a critical region should not be labeled as such imho. Other than asking driver authors to put their schedule() in a loop to confirm that the expected condition has in fact occurred, I'm not sure what else we can ask them to do -- as you know, this is not just about futexes. > I'll also let this pending a bit longer than other changes, to make > sure we shake out any bugs/regressions triggered by this change. > > Third, it might make sense to add a new 'spurious wakeup injection > debug mechanism' that, if enabled in the .config, automatically and > continuously inserts spurious wakeups at a given, slightly randomized > rate - which would ensure that all kernel facilities can robustly > handle spurious wakeups. I have been using this from Peter to test against: diff --git a/include/linux/sched.h b/include/linux/sched.h index 6d77432..fdf1f68 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -214,9 +214,10 @@ print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq); #define TASK_WAKEKILL 128 #define TASK_WAKING 256 #define TASK_PARKED 512 -#define TASK_STATE_MAX 1024 +#define TASK_YIELD 1024 +#define TASK_STATE_MAX 2048 -#define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWP" +#define TASK_STATE_TO_CHAR_STR "RSDTtXZxKWPY" extern char ___assert_task_state[1 - 2*!!( sizeof(TASK_STATE_TO_CHAR_STR)-1 != ilog2(TASK_STATE_MAX)+1)]; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index f0f831e..2c938ae 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1005,7 +1005,7 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu) * ttwu() will sort out the placement. */ WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING && - !p->on_rq); + !p->on_rq && !(p->state & TASK_YIELD)); #ifdef CONFIG_LOCKDEP /* @@ -2743,6 +2743,23 @@ static void __sched __schedule(void) if (unlikely(signal_pending_state(prev->state, prev))) { prev->state = TASK_RUNNING; } else { + + /* + * Provide an auto-yield feature on schedule(). + * + * The thought is to avoid a sleep+wakeup cycle + * if simply yielding the cpu will suffice to + * satisfy the required condition. + * + * Assumes the calling schedule() site can deal + * with spurious wakeups. + */ + if (prev->state & TASK_YIELD) { + prev->state &= ~TASK_YIELD; + if (rq->nr_running > 1) + goto no_deactivate; + } + deactivate_task(rq, prev, DEQUEUE_SLEEP); prev->on_rq = 0; @@ -2759,6 +2776,7 @@ static void __sched __schedule(void) try_to_wake_up_local(to_wakeup); } } + no_deactivate: switch_count = &prev->nvcsw; } > My guess would be that most remaining fragilities against spurious > wakeups ought to be in the boot/init phase, so I'd keep an eye out for > suspend/resume regressions. Correct, which is why I'm not that concerned anymore about spurious wakups, in fact that code above now boots and handles correctly on rather large systems. > > > [...] However there is core code that cannot handle them afaict, and > > furthermore tglx does have the point that other events can already > > trigger them anyway. > > s/there is core code/there is no core code heh yes. Thanks, Davidlohr -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/