Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757202Ab2BICTO (ORCPT ); Wed, 8 Feb 2012 21:19:14 -0500 Received: from mail-qw0-f53.google.com ([209.85.216.53]:47688 "EHLO mail-qw0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752066Ab2BICTN (ORCPT ); Wed, 8 Feb 2012 21:19:13 -0500 Date: Thu, 9 Feb 2012 10:18:55 +0800 From: Yong Zhang To: Venki Pallipadi Cc: Peter Zijlstra , Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , Suresh Siddha , Aaron Durbin , Paul Turner , linux-kernel@vger.kernel.org Subject: Re: [RFC] Extend mwait idle to optimize away IPIs when possible Message-ID: <20120209021855.GA26152@zhy> Reply-To: Yong Zhang References: <1328560933-3037-1-git-send-email-venki@google.com> <20120208065115.GA19691@zhy> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4889 Lines: 136 On Wed, Feb 08, 2012 at 03:28:45PM -0800, Venki Pallipadi wrote: > On Tue, Feb 7, 2012 at 10:51 PM, Yong Zhang wrote: > > On Mon, Feb 06, 2012 at 12:42:13PM -0800, Venkatesh Pallipadi wrote: > >> smp_call_function_single and ttwu_queue_remote sends unconditional IPI > >> to target CPU. However, if the target CPU is in mwait based idle, we can > >> do IPI-less wakeups using the magical powers of monitor-mwait. > >> Doing this has certain advantages: > > > > Actually I'm trying to do the similar thing on MIPS. > > > > The difference is that I want task_is_polling() to do something. The basic > > idea is: > > > >> + ? ? ? ? ? ? ? ? ? ? if (ipi_pending()) { > >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? clear_ipi_pending(); > >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? local_bh_disable(); > >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? local_irq_disable(); > >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? generic_smp_call_function_single_interrupt(); > >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? scheduler_wakeup_self_check(); > >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? local_irq_enable(); > >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? local_bh_enable(); > > > > I let cpu_idle() check if there is anything to do as your above code. > > > > And task_is_polling() handle the others with below patch: > > --- > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index 5255c9d..09f633d 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -527,15 +527,16 @@ void resched_task(struct task_struct *p) > > ? ? ? ? ? ? ? ?smp_send_reschedule(cpu); > > ?} > > > > -void resched_cpu(int cpu) > > +int resched_cpu(int cpu) > > ?{ > > ? ? ? ?struct rq *rq = cpu_rq(cpu); > > ? ? ? ?unsigned long flags; > > > > ? ? ? ?if (!raw_spin_trylock_irqsave(&rq->lock, flags)) > > - ? ? ? ? ? ? ? return; > > + ? ? ? ? ? ? ? return 0; > > ? ? ? ?resched_task(cpu_curr(cpu)); > > ? ? ? ?raw_spin_unlock_irqrestore(&rq->lock, flags); > > + ? ? ? return 1; > > ?} > I assume we are talking about 'return from idle' but seems I don't make it clear. > Two points - > rq->lock: I tried something similar first. One hurdle with checking > task_is_polling() is that you need rq->lock to check it. And adding > lock+unlock (without wait) in wakeup path ended up being no net gain > compared to IPI. And when we actually end up spinning on that lock, > thats going to add overhead in the common path. That is the reason I > switched to atomic compare exchange and moving any wait onto the > target CPU coming out of idle. I see. But actually we will not spinning on that lock because we use 'trylock' in resched_cpu(). And you are right there is indeed a little overhead (resched_task()) if we hold the lock but it can be tolerated IMHO. BTW, mind showing you test case thus we can collect some common data? > > resched_task: ttwu_queue_remote() does not imply that the remote CPU > will do a resched. Today there is a IPI and IPI handler calls onto > check_preempt_wakeup() and if the current task has higher precedence > than the waking up task, then there will be just an activation of new > task and no resched. Using resched_task above breaks > check_preempt_wakeup() and always calls a resched on remote CPU after > the IPI, which would be change in behavior. Yeah, if the remote cpu is not idle, mine will change the behavior; but if the remote cpu is idle, it will always rescheduled, right? So maybe we could introduce resched_idle_cpu() to make things more clear: int resched_idle_cpu(int cpu) { struct rq *rq = cpu_rq(cpu); unsigned long flags; int ret = 0; if (!raw_spin_trylock_irqsave(&rq->lock, flags)) goto out; if (!idle_cpu(cpu)) goto out_unlock; resched_task(cpu_curr(cpu)); ret = 1; out_unlock: raw_spin_unlock_irqrestore(&rq->lock, flags); out: return ret; } Thanks, Yong > > Thanks, > Venki > > > > > ?#ifdef CONFIG_NO_HZ > > @@ -1484,7 +1485,8 @@ void scheduler_ipi(void) > > > > ?static void ttwu_queue_remote(struct task_struct *p, int cpu) > > ?{ > > - ? ? ? if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list)) > > + ? ? ? if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list) && > > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? !resched_cpu(cpu)) > > ? ? ? ? ? ? ? ?smp_send_reschedule(cpu); > > ?} > > > > Thought? > > > > Thanks, > > Yong > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Only stand for myself -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/