Message-ID: <50B5B738.50700@linux.vnet.ibm.com>
Date: Wed, 28 Nov 2012 12:33:20 +0530
From: Raghavendra K T
Organization: IBM
To: habanero@linux.vnet.ibm.com
Cc: Andrew Jones, Marcelo Tosatti, Gleb Natapov, Chegu Vinod,
 Peter Zijlstra, "H. Peter Anvin", Ingo Molnar, Avi Kivity,
 Rik van Riel, Srikar, "Nikunj A. Dadhania", KVM, Jiannan Ouyang,
 LKML, Srivatsa Vaddagiri
Subject: Re: [PATCH V3 RFC 1/2] sched: Bail out of yield_to when source
 and target runqueue has one task
In-Reply-To: <1354025096.31820.886.camel@oc6622382223.ibm.com>
References: <20121126120740.2595.33651.sendpatchset@codeblue>
 <20121126120754.2595.37316.sendpatchset@codeblue>
 <20121126133501.GA9830@turtle.usersys.redhat.com>
 <50B49658.7080507@linux.vnet.ibm.com>
 <1354025096.31820.886.camel@oc6622382223.ibm.com>

On 11/27/2012 07:34 PM, Andrew Theurer wrote:
> On Tue, 2012-11-27 at 16:00 +0530, Raghavendra K T wrote:
>> On 11/26/2012 07:05 PM, Andrew Jones wrote:
>>> On Mon, Nov 26, 2012 at 05:37:54PM +0530, Raghavendra K T wrote:
>>>> From: Peter Zijlstra
>>>>
>>>> In undercommitted scenarios, especially in large guests, yield_to
>>>> overhead is significantly high. When the run queue length of both
>>>> source and target is one, take the opportunity to bail out and
>>>> return -ESRCH. This return condition can be further exploited to
>>>> quickly come out of the PLE handler.
>>>>
>>>> (History: Raghavendra initially worked on breaking out of the KVM
>>>> PLE handler upon seeing source runqueue length = 1, but that
>>>> required exporting the rq length. Peter came up with the elegant
>>>> idea of returning -ESRCH from the scheduler core.)
>>>>
>>>> Signed-off-by: Peter Zijlstra
>>>> [Raghavendra: added the check on the target vcpu's rq length (thanks Avi)]
>>>> Reviewed-by: Srikar Dronamraju
>>>> Signed-off-by: Raghavendra K T
>>>> ---
>>>>
>>>>  kernel/sched/core.c | 25 +++++++++++++++++++------
>>>>  1 file changed, 19 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>> index 2d8927f..fc219a5 100644
>>>> --- a/kernel/sched/core.c
>>>> +++ b/kernel/sched/core.c
>>>> @@ -4289,7 +4289,10 @@ EXPORT_SYMBOL(yield);
>>>>   * It's the caller's job to ensure that the target task struct
>>>>   * can't go away on us before we can do any checks.
>>>>   *
>>>> - * Returns true if we indeed boosted the target task.
>>>> + * Returns:
>>>> + *	true (>0) if we indeed boosted the target task.
>>>> + *	false (0) if we failed to boost the target.
>>>> + *	-ESRCH if there's no task to yield to.
>>>>   */
>>>>  bool __sched yield_to(struct task_struct *p, bool preempt)
>>>>  {
>>>> @@ -4303,6 +4306,15 @@ bool __sched yield_to(struct task_struct *p, bool preempt)
>>>>
>>>>  again:
>>>>  	p_rq = task_rq(p);
>>>> +	/*
>>>> +	 * If we're the only runnable task on the rq and target rq also
>>>> +	 * has only one task, there's absolutely no point in yielding.
>>>> +	 */
>>>> +	if (rq->nr_running == 1 && p_rq->nr_running == 1) {
>>>> +		yielded = -ESRCH;
>>>> +		goto out_irq;
>>>> +	}
>>>> +
>>>>  	double_rq_lock(rq, p_rq);
>>>>  	while (task_rq(p) != p_rq) {
>>>>  		double_rq_unlock(rq, p_rq);
>>>> @@ -4310,13 +4322,13 @@ again:
>>>>  	}
>>>>
>>>>  	if (!curr->sched_class->yield_to_task)
>>>> -		goto out;
>>>> +		goto out_unlock;
>>>>
>>>>  	if (curr->sched_class != p->sched_class)
>>>> -		goto out;
>>>> +		goto out_unlock;
>>>>
>>>>  	if (task_running(p_rq, p) || p->state)
>>>> -		goto out;
>>>> +		goto out_unlock;
>>>>
>>>>  	yielded = curr->sched_class->yield_to_task(rq, p, preempt);
>>>>  	if (yielded) {
>>>> @@ -4329,11 +4341,12 @@ again:
>>>>  		resched_task(p_rq->curr);
>>>>  	}
>>>>
>>>> -out:
>>>> +out_unlock:
>>>>  	double_rq_unlock(rq, p_rq);
>>>> +out_irq:
>>>>  	local_irq_restore(flags);
>>>>
>>>> -	if (yielded)
>>>> +	if (yielded > 0)
>>>>  		schedule();
>>>>
>>>>  	return yielded;
>>>>
>>>
>>> Acked-by: Andrew Jones
>>>
>>
>> Thank you, Drew.
>>
>> Marcelo, Gleb: please let me know if you have comments / concerns on
>> the patches.
>>
>> Andrew, Vinod: IMO the patch set looks good for undercommit scenarios,
>> especially for large guests, where we do have the overhead of vcpu
>> iteration in the PLE handler.
>
> I agree, looks fine for undercommit scenarios. I do wonder what happens
> with 1.5x overcommit, where we might see 1/2 the host cpus with a
> runqueue of 2 and 1/2 of the host cpus with a runqueue of 1. Even with
> this change that scenario still might be fine, but it would be nice to
> see a comparison.

Hi Andrew, yes, thanks for pointing out the 1.5x case, which should be
the theoretical worst case. I tried with two 24-vcpu guests on the same
32-core machine. Here is the result:

Ebizzy (rec/sec, higher is better)
x base, + patched
	N	Avg		Stddev
x	10	2688.6		347.55917
+	10	2707.6		260.93728
Improvement: 0.706%

dbench (throughput in MB/sec, higher is better)
x base, + patched
	N	Avg		Stddev
x	10	3164.712	140.24468
+	10	3244.021	185.92434
Improvement: 2.5%

So there is no significant improvement / degradation seen at 1.5x.
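
For readers without patch 2/2 of the series handy (it is not quoted in
this mail), here is a minimal sketch of how the new -ESRCH return can
be consumed to "quickly come out of the PLE handler", as the commit
message puts it. The loop shape mirrors kvm_vcpu_on_spin() in
virt/kvm/kvm_main.c, and kvm_vcpu_yield_to() is assumed here to forward
yield_to()'s return value; treat this as an illustration under those
assumptions, not the actual 2/2 patch.

#include <linux/kvm_host.h>	/* struct kvm_vcpu, kvm_for_each_vcpu() */
#include <linux/errno.h>	/* ESRCH */

/*
 * Sketch only: bail out of the candidate-vcpu scan as soon as
 * yield_to() reports -ESRCH, i.e. both source and target runqueues
 * hold a single task (the undercommit case).
 */
static void ple_handler_sketch(struct kvm_vcpu *me)
{
	struct kvm *kvm = me->kvm;
	struct kvm_vcpu *vcpu;
	int i;

	kvm_for_each_vcpu(i, vcpu, kvm) {
		int yielded;

		if (vcpu == me)
			continue;

		/* Assumed to resolve vcpu's task and call yield_to(). */
		yielded = kvm_vcpu_yield_to(vcpu);
		if (yielded > 0) {
			/* Boosted a likely lock holder; we are done. */
			kvm->last_boosted_vcpu = i;
			break;
		} else if (yielded == -ESRCH) {
			/*
			 * Undercommitted: no other runnable task on either
			 * runqueue, so scanning more vcpus is wasted work.
			 */
			break;
		}
		/* yielded == 0: target unsuitable, try the next vcpu. */
	}
}

The point of the scheduler-level return code is visible here: KVM can
stop iterating over vcpus without the scheduler having to export
runqueue lengths, which is exactly the trade-off the history note in
the commit message describes.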