From: Mike Galbraith
To: Avi Kivity
Cc: Rik van Riel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Srivatsa Vaddagiri, Peter Zijlstra, Chris Wright
Subject: Re: [RFC -v2 PATCH 2/3] sched: add yield_to function
Date: Mon, 20 Dec 2010 10:30:40 +0100
Message-ID: <1292837440.8948.60.camel@marge.simson.net>
In-Reply-To: <4D0F1BD8.20601@redhat.com>

On Mon, 2010-12-20 at 11:03 +0200, Avi Kivity wrote:
> On 12/20/2010 10:55 AM, Mike Galbraith wrote:
> > > > > I don't want it to run now.  I want it to run before some
> > > > > other task.  I don't care if N other tasks run before both.
> > > > > So no godlike powers needed, simply a courteous "after you".
> > > >
> > > > Ponders that...
> > > >
> > > > What if: we test that both tasks are in the same thread group,
> > > > if so,
> > >
> > > In my use case, both tasks are in the same thread group, so that
> > > works.  I don't see why the requirement is needed though.
> >
> > Because preempting a perfect stranger is not courteous, all tasks
> > have to play nice.
>
> I don't want to preempt anybody, simply make the task run before me.

I thought you wanted to get the target to the cpu asap?  You just can't
have "he runs before me" cross-cpu.

> Further, this is a kernel internal API, so no need for these types of
> restrictions.  If we expose it to userspace, sure.

Doesn't matter whether it's kernel or not afaict.  If virtualization
has to coexist peacefully with other loads, it can't just say "my hints
are the only ones that count", and thus shred other loads' throughput.

> > > > use cfs_rq->next to pass the scheduler a HINT of what you would
> > > > LIKE to happen.
> > >
> > > Hint is fine, so long as the scheduler seriously considers it.
> >
> > It will take the hint if the target hasn't had too much cpu.
>
> Since I'm running and the target isn't, it's clear the scheduler
> thinks the target had more cpu than I did [73].  That's why I want to
> donate cpu time.

That's not necessarily true; in fact, it's very often false.  Last/next
buddy will allow a task to run ahead of leftmost, so we don't always
blindly select leftmost and shred cache.

> [73] at least it'd be clear if the scheduler were globally fair.  As
> it is, I might be the only task running on my cpu, therefore in a cpu
> glut, while the other task shares the cpu with some other task and is
> currently waiting for its turn.

> > > > If the current task on that rq is also in your group, resched
> > > > it, then IFF the task you would like to run isn't too far right,
> > > > it'll be selected.
> > > > If the current task is not one of yours, tough, you can set
> > > > cfs_rq->next and hope it doesn't get overwritten, but you may
> > > > not preempt a stranger.  If you happen to be sharing an rq,
> > > > cool, you accomplished your yield_to().  If not, there's no
> > > > practical way (I can think of) to ensure that the target runs
> > > > before you run again if you try to yield, but you did your best
> > > > to try to get him to the cpu sooner, and in a manner that
> > > > preserves fairness without dangerous vruntime diddling.
> > > >
> > > > Would that be good enough to stop (or seriously improve) cpu
> > > > wastage?
> > >
> > > The cross-cpu limitation is bothersome.  Since there are many cpus
> > > in modern machines, particularly ones used for virt, the
> > > probability of the two tasks being on the same cpu is quite low.
> >
> > What would you suggest?  There is no global execution timeline, so
> > if you want to definitely run after this task, you're stuck with
> > moving to his timezone or moving him to yours.  Well, you could
> > sleep a while, but we know how productive sleeping is.
>
> I don't know.  The whole idea of donating runtime was predicated on
> CFS being completely fair.  Now I find that (a) it isn't (b) donating
> runtime between tasks on different cpus is problematic.

True and true.  However, would you _want_ the scheduler to hold
runnable tasks hostage, and thus let CPU go to waste in the name of
perfect fairness?  "Perfect is the enemy of good" applies to that idea
imho.

> Moving tasks between cpus is expensive and sometimes prohibited by
> pinning.  I'd like to avoid it if possible, but it's better than
> nothing.

Expensive in many ways, so let's try not to do that.

So why do you need this other task to run before you do, even
cross-cpu?  If he's a lock holder, getting him to the cpu will give him
a chance to drop it, no?  Isn't that what you want to get done?  Drop
that lock so you or someone else can get something other than spinning
done?
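For concreteness, the hint-based scheme sketched in this exchange could look roughly like the toy model below.  This is illustrative only: the struct layouts, the granularity value, and the function bodies are assumptions for the sake of the example, not the actual CFS code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a hint-based yield_to(); names, fields, and the
 * "not too far right" margin are hypothetical, not real kernel code. */
#define WAKEUP_GRAN 4000000UL  /* assumed margin, in vruntime units */

struct task {
	int tgid;              /* thread group id */
	unsigned long vruntime;
};

struct cfs_rq {
	struct task *curr;     /* task currently running on this rq */
	struct task *next;     /* "run this soon, please" buddy hint */
	bool need_resched;     /* resched flag for curr */
};

/* Courteous yield: plant a hint, never force.  The hint is only
 * planted for a same-thread-group target, and curr is only marked
 * for resched if it too is one of ours -- strangers are never
 * preempted.  Returns true if the hint was planted. */
static bool yield_to(struct task *me, struct task *target,
		     struct cfs_rq *rq)
{
	if (me->tgid != target->tgid)
		return false;              /* play nice with strangers */

	rq->next = target;                 /* pass the scheduler a HINT */

	if (rq->curr && rq->curr->tgid == me->tgid)
		rq->need_resched = true;   /* ok to resched one of ours */

	return true;
}

/* Pick-next that takes the hint IFF the hinted task isn't too far
 * right of leftmost, i.e. hasn't had too much cpu already. */
static struct task *pick_next(struct task *leftmost, struct cfs_rq *rq)
{
	if (rq->next &&
	    rq->next->vruntime <= leftmost->vruntime + WAKEUP_GRAN)
		return rq->next;           /* take the hint */
	return leftmost;                   /* refused: too far right */
}
```

On a shared rq the target then gets selected next whenever its vruntime is close enough to leftmost; cross-cpu, the hint can still get a lock holder onto the cpu sooner without any vruntime diddling.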
	-Mike