Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A1EA6C433F5 for ; Thu, 25 Nov 2021 15:32:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1356008AbhKYPfZ (ORCPT ); Thu, 25 Nov 2021 10:35:25 -0500 Received: from foss.arm.com ([217.140.110.172]:52408 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237654AbhKYPdY (ORCPT ); Thu, 25 Nov 2021 10:33:24 -0500 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 16B051FB; Thu, 25 Nov 2021 07:30:13 -0800 (PST) Received: from e113632-lin (usa-sjc-imap-foss1.foss.arm.com [10.121.207.14]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 13DAE3F66F; Thu, 25 Nov 2021 07:30:11 -0800 (PST) From: Valentin Schneider To: Vincent Guittot Cc: Vincent Donnefort , peterz@infradead.org, mingo@redhat.com, linux-kernel@vger.kernel.org, mgorman@techsingularity.net, dietmar.eggemann@arm.com Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task In-Reply-To: References: <20211124154239.3191366-1-vincent.donnefort@arm.com> <8735nkcwov.mognet@arm.com> Date: Thu, 25 Nov 2021 15:30:05 +0000 Message-ID: <87zgpsb6de.mognet@arm.com> MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 25/11/21 14:23, Vincent Guittot wrote: > On Thu, 25 Nov 2021 at 12:16, Valentin Schneider > wrote: >> I think you can still hit this on a symmetric system; let me try to >> reformulate my other email. >> >> If this (non-patched) condition evaluates to true, it means the previous >> condition >> >> (available_idle_cpu(target) || sched_idle_cpu(target)) && >> asym_fits_capacity(task_util, target) >> >> evaluated to false, so for a symmetric system target sure isn't idle. >> >> prev == smp_processor_id() implies prev == target, IOW prev isn't >> idle. Now, consider: >> >> p0.prev = CPU1 >> p1.prev = CPU1 >> >> CPU0 CPU1 >> current = don't care current = swapper/1 >> >> ttwu(p1) >> ttwu_queue(p1, CPU1) >> // or >> ttwu_queue_wakelist(p1, CPU1) >> >> hrtimer_wakeup() >> wake_up_process() >> ttwu() >> idle_cpu(CPU1)? no >> >> is_per_cpu_kthread(current)? yes >> prev == smp_processor_id()? yes >> this_rq()->nr_running <= 1? yes >> => self enqueue >> >> ... >> schedule_idle() >> >> This works if CPU0 does either a full enqueue (rq->nr_running == 1) or just >> a wakelist enqueue (rq->ttwu_pending > 0). If there was an idle CPU3 >> around, we'd still be stacking p0 and p1 onto CPU1. >> >> IOW this opens a window between a remote ttwu() and the idle task invoking >> schedule_idle() where the idle task can stack more tasks onto its CPU. > > Your use case above is out of the scope of this patch and has always > been there, even for other per cpu kthreads. In such case, the wake up > is not triggered by current (idle or another per cpu kthread) but by > an interrupt (hrtimer in your case). Technically the idle task didn't pass is_per_cpu_kthread(p) when that condition was added, this is somewhat of a "new development" - but you're right on the hardirq side of things. > If we want to filter wakeup > generated by interrupt context while a per cpu kthread is running, it > would be better to fix all cases and test the running context like > this > I think that could make sense - though can the idle task issue wakeups in process context? If so that won't be sufficient. A quick audit tells me: o rcu_nocb_flush_deferred_wakeup() happens before calling into cpuidle o I didn't see any wakeup issued from the cpu_pm_notifier call chain o I'm not entirely sure about flush_smp_call_function_from_idle(). I found this thing in RCU: smp_call_function_single(cpu, rcu_exp_handler) rcu_exp_handler() rcu_report_exp_rdp() rcu_report_exp_cpu_mult() __rcu_report_exp_rnp() swake_up_one() IIUC if set_nr_if_polling() then the smp_call won't send an IPI and should be handled in that flush_foo_from_idle() call. I'd be tempted to stick your VincentD's conditions together, just to be safe... > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -6397,7 +6397,8 @@ static int select_idle_sibling(struct > task_struct *p, int prev, int target) > * essentially a sync wakeup. An obvious example of this > * pattern is IO completions. > */ > - if (is_per_cpu_kthread(current) && > + if (!in_interrupt() && > + is_per_cpu_kthread(current) && > prev == smp_processor_id() && > this_rq()->nr_running <= 1) { > return prev; > >> >> > >> >> -- >> >> 2.25.1 >> >>