Date: Fri, 15 May 2020 14:03:46 +0100
From: Mel Gorman
To: Peter Zijlstra
Cc: Jirka Hladky, Phil Auld, Ingo Molnar, Vincent Guittot, Juri Lelli,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Valentin Schneider,
	Hillf Danton, LKML, Douglas Shakshober, Waiman Long, Joe Mario,
	Bill Gray
Subject: Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load balancer v6
Message-ID: <20200515130346.GM3758@techsingularity.net>
References: <20200507155422.GD3758@techsingularity.net>
 <20200508092212.GE3758@techsingularity.net>
 <20200513153023.GF3758@techsingularity.net>
 <20200514153122.GE2978@hirez.programming.kicks-ass.net>
 <20200515084740.GJ3758@techsingularity.net>
 <20200515111732.GS2957@hirez.programming.kicks-ass.net>
In-Reply-To: <20200515111732.GS2957@hirez.programming.kicks-ass.net>

On Fri, May 15, 2020 at 01:17:32PM +0200, Peter Zijlstra wrote:
> On Fri, May 15, 2020 at 09:47:40AM +0100, Mel Gorman wrote:
>
> > However, the wakeups are so rapid that the wakeup
> > happens while the server is descheduling. That forces the waker to spin
> > on smp_cond_load_acquire for longer. In this case, it can be cheaper to
> > add the task to the rq->wake_list even if that potentially requires an IPI.
>
> Right, I think Rik ran into that as well at some point. He wanted to
> make ->on_cpu do a hand-off, but simply queueing the wakeup on the prev
> cpu (which is currently in the middle of schedule()) should be an easier
> proposition.
>
> Maybe something like this untested thing... could explode most mighty,
> didn't think too hard.
>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index fa6c19d38e82..c07b92a0ee5d 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2312,7 +2312,7 @@ static void wake_csd_func(void *info)
>  	sched_ttwu_pending();
>  }
>
> -static void ttwu_queue_remote(struct task_struct *p, int cpu, int wake_flags)
> +static void __ttwu_queue_remote(struct task_struct *p, int cpu, int wake_flags)
>  {
>  	struct rq *rq = cpu_rq(cpu);
>
> @@ -2354,6 +2354,17 @@ bool cpus_share_cache(int this_cpu, int that_cpu)
>  {
>  	return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
>  }
> +
> +static bool ttwu_queue_remote(struct task_struct *p, int cpu, int wake_flags)
> +{
> +	if (sched_feat(TTWU_QUEUE) && !cpus_share_cache(smp_processor_id(), cpu)) {
> +		sched_clock_cpu(cpu); /* Sync clocks across CPUs */
> +		__ttwu_queue_remote(p, cpu, wake_flags);
> +		return true;
> +	}
> +
> +	return false;
> +}
>  #endif /* CONFIG_SMP */
>
>  static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
> @@ -2362,11 +2373,8 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
>  	struct rq_flags rf;
>
>  #if defined(CONFIG_SMP)
> -	if (sched_feat(TTWU_QUEUE) && !cpus_share_cache(smp_processor_id(), cpu)) {
> -		sched_clock_cpu(cpu); /* Sync clocks across CPUs */
> -		ttwu_queue_remote(p, cpu, wake_flags);
> +	if (ttwu_queue_remote(p, cpu, wake_flags))
>  		return;
> -	}
>  #endif
>
>  	rq_lock(rq, &rf);
> @@ -2550,7 +2558,15 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  	if (p->on_rq && ttwu_remote(p, wake_flags))
>  		goto unlock;
>
> +	if (p->in_iowait) {
> +		delayacct_blkio_end(p);
> +		atomic_dec(&task_rq(p)->nr_iowait);
> +	}
> +
>  #ifdef CONFIG_SMP
> +	p->sched_contributes_to_load = !!task_contributes_to_load(p);
> +	p->state = TASK_WAKING;
> +
>  	/*
>  	 * Ensure we load p->on_cpu _after_ p->on_rq, otherwise it would be
>  	 * possible to, falsely, observe p->on_cpu == 0.
> @@ -2581,15 +2597,10 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  	 * This ensures that tasks getting woken will be fully ordered against
>  	 * their previous state and preserve Program Order.
>  	 */
> -	smp_cond_load_acquire(&p->on_cpu, !VAL);
> -
> -	p->sched_contributes_to_load = !!task_contributes_to_load(p);
> -	p->state = TASK_WAKING;
> +	if (READ_ONCE(p->on_cpu) && __ttwu_queue_remote(p, cpu, wake_flags))
> +		goto unlock;
>
> -	if (p->in_iowait) {
> -		delayacct_blkio_end(p);
> -		atomic_dec(&task_rq(p)->nr_iowait);
> -	}
> +	smp_cond_load_acquire(&p->on_cpu, !VAL);
>
>  	cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
>  	if (task_cpu(p) != cpu) {

I don't see a problem with moving the update of p->state to the other
side of the barrier, but I'm relying on the comment that the barrier
only relates to on_rq and on_cpu.

However, I'm less sure about what exactly you intended to do.
__ttwu_queue_remote() is void, so maybe you meant to use
ttwu_queue_remote(). In that case, we potentially avoid spinning on
->on_cpu for wakeups between tasks that do not share cache, but it's
not clear why it would be specific to remote tasks.
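For reference, here is a minimal sketch of the variant I'm assuming you
meant, i.e. using the conditional bool-returning helper from earlier in
the patch rather than the unconditional void one (untested, same caveats
as your version):

	/*
	 * Sketch only: queue the wakeup on the previous CPU while it is
	 * still inside schedule(), but via the conditional helper so the
	 * sched_feat(TTWU_QUEUE) and cpus_share_cache() checks still
	 * apply; otherwise fall through and spin on ->on_cpu as before.
	 */
	if (READ_ONCE(p->on_cpu) && ttwu_queue_remote(p, cpu, wake_flags))
		goto unlock;

	smp_cond_load_acquire(&p->on_cpu, !VAL);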
If you meant to call __ttwu_queue_remote() unconditionally, it's not
clear why that's now safe when smp_cond_load_acquire() previously waited
for ->on_cpu to reach 0 before queueing a task for wakeup or waking it
up directly.

Also, because __ttwu_queue_remote() now happens before select_task_rq(),
is there not a risk that in some cases we end up stacking tasks
unnecessarily?

> @@ -2597,14 +2608,6 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  		psi_ttwu_dequeue(p);
>  		set_task_cpu(p, cpu);
>  	}
> -
> -#else /* CONFIG_SMP */
> -
> -	if (p->in_iowait) {
> -		delayacct_blkio_end(p);
> -		atomic_dec(&task_rq(p)->nr_iowait);
> -	}
> -
>  #endif /* CONFIG_SMP */
>
>  	ttwu_queue(p, cpu, wake_flags);

-- 
Mel Gorman
SUSE Labs