Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752000AbYLOM0S (ORCPT ); Mon, 15 Dec 2008 07:26:18 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1750802AbYLOM0J (ORCPT ); Mon, 15 Dec 2008 07:26:09 -0500 Received: from E23SMTP05.au.ibm.com ([202.81.18.174]:34534 "EHLO e23smtp05.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750769AbYLOM0H (ORCPT ); Mon, 15 Dec 2008 07:26:07 -0500 Date: Mon, 15 Dec 2008 17:55:29 +0530 From: Vaidyanathan Srinivasan To: Peter Zijlstra , Linux Kernel , Suresh B Siddha , Venkatesh Pallipadi , Ingo Molnar , Dipankar Sarma , Vatsa , Gautham R Shenoy , Andi Kleen , David Collier-Brown , Tim Connors , Max Krasnyansky , Gregory Haskins Subject: Re: [RFC PATCH v5 4/7] sched: bias task wakeups to preferred semi-idle packages Message-ID: <20081215122529.GR5457@dirshya.in.ibm.com> Reply-To: svaidy@linux.vnet.ibm.com Mail-Followup-To: Peter Zijlstra , Linux Kernel , Suresh B Siddha , Venkatesh Pallipadi , Ingo Molnar , Dipankar Sarma , Vatsa , Gautham R Shenoy , Andi Kleen , David Collier-Brown , Tim Connors , Max Krasnyansky , Gregory Haskins References: <20081211173831.2020.57550.stgit@drishya.in.ibm.com> <20081211174304.2020.14746.stgit@drishya.in.ibm.com> <20081215070139.GE18403@balbir.in.ibm.com> <1229329521.14605.17.camel@twins> <1229329984.14605.18.camel@twins> <20081215084642.GJ18403@balbir.in.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <20081215084642.GJ18403@balbir.in.ibm.com> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3286 Lines: 79 * Balbir Singh [2008-12-15 14:16:42]: > * Peter Zijlstra [2008-12-15 09:33:04]: > > > On Mon, 2008-12-15 at 09:25 +0100, Peter Zijlstra wrote: > > > On Mon, 2008-12-15 at 12:31 +0530, Balbir Singh wrote: > > > > > > > > kernel/sched_fair.c | 17 +++++++++++++++++ > > > > > 1 files changed, 17 insertions(+), 0 deletions(-) > > > > > > > > > > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c > > > > > index 98345e4..939f2a1 100644 > > > > > --- a/kernel/sched_fair.c > > > > > +++ b/kernel/sched_fair.c > > > > > @@ -1027,6 +1027,23 @@ static int wake_idle(int cpu, struct task_struct *p) > > > > > cpumask_t tmp; > > > > > struct sched_domain *sd; > > > > > int i; > > > > > + unsigned int chosen_wakeup_cpu; > > > > > + int this_cpu; > > > > > + > > > > > + /* > > > > > + * At POWERSAVINGS_BALANCE_WAKEUP level, if both this_cpu and prev_cpu > > > > > + * are idle and this is not a kernel thread and this task's affinity > > > > > + * allows it to be moved to preferred cpu, then just move! > > > > > + */ > > > > > + > > > > > + this_cpu = smp_processor_id(); > > > > > + chosen_wakeup_cpu = > > > > > + cpu_rq(this_cpu)->rd->sched_mc_preferred_wakeup_cpu; > > > > > + > > > > > + if (sched_mc_power_savings >= POWERSAVINGS_BALANCE_WAKEUP && > > > > > + idle_cpu(cpu) && idle_cpu(this_cpu) && p->mm && > > > > > > > > The p->mm check is racy, it needs to be done under task_lock(). The > > > > best way to check for a kernel thread is get_task_mm(), followed by > > > > put_task_mm() is the mm is not NULL. We also need to check to see if > > > > the task is _hot_ on cpu. We should negate this optimization in case > > > > chosen_wakeup_cpu is idle, so check for that as well. > > > > > > Sure its racy, but so what? > > > > > > The worst I can see it that we exclude a dying task from this logic, > > > which isn't a problem at all, since its dying anyway. > > > > At which point I seriously doubt it'd still be on the rq anyway. > > > > I forgot to mention that, the check should be (p->mm && !(p->flags & PF_KTHREAD)) I can check for PF_KTHREAD for now. However, I should reduce the number of checks since this may slow down wake_idle for sched_mc=2. We can tolerate p->mm check on a dying process as Peter has suggested, hence we don't need to protect it. We are not going to access any contents of the mm struct. If PF_KTHREAD is only being used by AIO, then I feel we can drop the check since the threads will not have affinity and they can be moved to other cpus anyway. The main reason for skipping kthread is that they may be using per-cpu variables and sleep/preempted. I did not want the wake_idle() logic to move them around forcefully. This is not the general case and this situation should not happen. Second reason is to optimise on the affinity check since most of the kthreads have affinity and cannot be moved. This condition check needs optimisation after getting the framework functionally correct and useful. --Vaidy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/