Date: Mon, 3 Dec 2007 12:37:16 +0100
From: Ingo Molnar <mingo@elte.hu>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>,
       Arjan van de Ven <arjan@infradead.org>,
       Andrew Morton <akpm@linux-foundation.org>,
       LKML <linux-kernel@vger.kernel.org>
Subject: Re: sched_yield: delete sysctl_sched_compat_yield
Message-ID: <20071203113716.GA8432@elte.hu>
References: <1196155985.25646.31.camel@ymzhang> <200712032115.43275.nickpiggin@yahoo.com.au> <20071203103326.GD30050@elte.hu> <200712032202.37975.nickpiggin@yahoo.com.au>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200712032202.37975.nickpiggin@yahoo.com.au>
User-Agent: Mutt/1.5.17 (2007-11-01)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3416
Lines: 73


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> > given how poorly sched_yield() is/was defined the only "compatible" 
> > solution would be to go back to the old yield code.
> 
> While it is technically allowed to do anything with SCHED_OTHER class, 
> putting the thread to the back of the runnable tasks, or at least 
> having it give up _some_ priority (like the old scheduler) is less 
> surprising than having it do _nothing_.

wrong: it's not "nothing" that the new code does - run two yield-ing 
loops and they'll happily switch to each other, at a rate of a few 
million context switches per second.
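
To make this concrete, here is a minimal sketch of the "two yield-ing 
loops" experiment in Python. The function names, the iteration count and 
the optional CPU pinning are assumptions of this sketch, not something 
taken from this thread; it only times the loops and does not itself 
count context switches.

```python
import os
import time


def yield_loop(iterations):
    """Call sched_yield() `iterations` times; return the number of calls made."""
    calls = 0
    for _ in range(iterations):
        os.sched_yield()  # ask the scheduler to run another runnable task, if any
        calls += 1
    return calls


def run_two_yielders(iterations=100_000, cpu=None):
    """Fork one child so two tasks yield to each other; return (calls, seconds).

    If `cpu` is given, both tasks are pinned to that CPU so they compete for
    it (pinning is an assumption of this sketch, not part of the thread).
    """
    if cpu is not None:
        os.sched_setaffinity(0, {cpu})  # affinity is inherited across fork()
    start = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        # Child: run the same loop, then exit without parent-side cleanup.
        yield_loop(iterations)
        os._exit(0)
    calls = yield_loop(iterations)
    os.waitpid(pid, 0)  # wait for the child so the timing covers both loops
    return calls, time.perf_counter() - start
```

Calling run_two_yielders(cpu=0) pins both tasks to CPU 0, so each yield 
has another runnable task to switch to; a tool such as vmstat can be 
used to watch the context-switch rate while it runs.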

( Note that the old scheduler's yield code did not actually change a 
  task's priority - so if an interactive task (such as firefox) yielded, 
  it got different behavior than CPU hogs. )

> Whereas JVMs (e.g. that have garbage collectors call yield), presumably 
> get quite a lot of tuning, and that was probably done with the less 
> surprising (and more common) sched_yield behaviour.

i disagree. To some of them, having a _more_ aggressive yield than 2.6.22 
might increase latencies and jitter - which can be seen as a regression 
as well. All tests i've seen so far show dramatically lower jitter in 
v2.6.23 and upwards kernels.

anyway, right now what we have is a closed-source benchmark (and quite a 
silly one at that) against a popular open-source desktop app, and in 
that case the choice is obvious. Actual Java app server benchmarks did 
not show any regression, so maybe Java's use of yield for locking is not 
that significant after all and it's only Volanomark that is doing extra 
(unnecessary) yields. (and Java benchmarks are part of the upstream 
kernel test grid anyway, so we'd have noticed any serious regression)

if you insist on flipping the default then that just shows a blatant 
disregard for desktop performance - i personally care quite a bit about 
desktop performance. (and deterministic scheduling in particular)

> > i think the sanest long-term solution is to strongly discourage the 
> > use of SCHED_OTHER::yield, because there's just no sane definition 
> > for yield that apps could rely upon. (well Linus suggested a pretty 
> > sane definition but that would necessitate the burdening of the 
> > scheduler fastpath - we don't want to do that.) New ideas are welcome 
> > of course.
> 
> sched_yield is defined to put the calling task at the end of the queue 
> for the given priority level as you know (ie. at the end of all other 
> priority 0 tasks, for SCHED_OTHER).

almost: substitute "priority" with "nice level". One problem is, that's 
not what the old scheduler did.
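
A toy model of the definition quoted above, assuming one FIFO queue per 
nice level. This is only an illustration: the task names and the helper 
are hypothetical, and neither the old nor the new scheduler is 
implemented this way.

```python
from collections import deque


def yield_to_back(queue, task):
    """Move `task` from its current position in `queue` to the tail."""
    queue.remove(task)
    queue.append(task)
    return queue


# One queue per nice level; the task names are hypothetical.
runqueues = {0: deque(["A", "B", "C"])}

# "A" is at the head (running) and calls yield: it goes behind B and C.
yield_to_back(runqueues[0], "A")
print(list(runqueues[0]))  # ['B', 'C', 'A']
```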

> > [ also, actual technical feedback on the SCHED_BATCH patch i sent
> >   (which was the only "forward looking" moment in this thread so far
> >    ;-) would be nice too. ]
> 
> I dislike a wholesale change in behaviour like that. Especially when 
> it is changing behaviour of yield among SCHED_BATCH tasks versus yield 
> among SCHED_OTHER tasks.

There's no wholesale change in behavior: SCHED_BATCH tasks come with a 
clear expectation of favoring throughput over latency, hence the patch 
makes quite a bit of sense to me. YMMV.
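
For reference, a minimal sketch of how a throughput-oriented process 
could opt in to SCHED_BATCH from user space, using Python's os module on 
Linux. The helper name is an assumption of this sketch, and the patch 
itself is not shown in this message.

```python
import os


def switch_to_batch():
    """Try to move the calling process to SCHED_BATCH; return True on success."""
    try:
        # pid 0 means "the calling process"; SCHED_BATCH uses priority 0.
        os.sched_setscheduler(0, os.SCHED_BATCH, os.sched_param(0))
    except OSError:
        # The call can be refused, e.g. by a sandbox or security policy.
        return False
    # Read the policy back to confirm the change took effect.
    return os.sched_getscheduler(0) == os.SCHED_BATCH
```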

	Ingo