Date: Tue, 8 Feb 2022 10:51:16 +0000
From: Mel Gorman <mgorman@techsingularity.net>
To: K Prateek Nayak
Cc: peterz@infradead.org, aubrey.li@linux.intel.com, efault@gmx.de,
    gautham.shenoy@amd.com, linux-kernel@vger.kernel.org, mingo@kernel.org,
    song.bao.hua@hisilicon.com, srikar@linux.vnet.ibm.com,
    valentin.schneider@arm.com, vincent.guittot@linaro.org
Subject: Re: [PATCH] sched/fair: Consider cpu affinity when allowing NUMA imbalance in find_idlest_group
Message-ID: <20220208105116.GO3366@techsingularity.net>
References: <20220207155921.21321-1-kprateek.nayak@amd.com>
In-Reply-To: <20220207155921.21321-1-kprateek.nayak@amd.com>

On Mon, Feb 07, 2022 at 09:29:21PM +0530, K Prateek Nayak wrote:
> Neither the sched/tip nor Mel's v5 patchset [1] provides an optimal
> new-task wakeup strategy when the tasks are affined to a subset of cpus,
> which can result in piling of tasks on the same set of CPUs in a NUMA
> group despite there being other cpus in a different NUMA group where
> the task could have run. A good placement makes a difference especially
> in the case of short-lived tasks, where the delay in the load balancer
> kicking in can cause degradation in performance.
>

Thanks.

V6 was posted based on previous feedback. While this patch is building
on top of it, please add Acked-by or Tested-by if the imbalance series
helps the general problem of handling imbalances when there are
multiple last-level caches.

>
>
> Aggressive NUMA balancing is only done when needed. We select the
> minimum of the number of allowed cpus in the sched group and the
> calculated sd.imb_numa_nr as our imbalance threshold, and the default
> behaviour of mel-v5 is only modified when the former is smaller than
> the latter.
>

In this context, it should be safe to reuse select_idle_mask, as in
this build-tested patch:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 538756bd8e7f..1e759c21371b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9128,6 +9128,8 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
 
         case group_has_spare:
                 if (sd->flags & SD_NUMA) {
+                        struct cpumask *cpus;
+                        int imb;
 #ifdef CONFIG_NUMA_BALANCING
                         int idlest_cpu;
                         /*
@@ -9145,10 +9147,15 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
                          * Otherwise, keep the task close to the wakeup source
                          * and improve locality if the number of running tasks
                          * would remain below threshold where an imbalance is
-                         * allowed. If there is a real need of migration,
-                         * periodic load balance will take care of it.
+                         * allowed while accounting for the possibility the
+                         * task is pinned to a subset of CPUs. If there is a
+                         * real need of migration, periodic load balance will
+                         * take care of it.
                          */
-                        if (allow_numa_imbalance(local_sgs.sum_nr_running + 1, sd->imb_numa_nr))
+                        cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
+                        cpumask_and(cpus, sched_group_span(local), p->cpus_ptr);
+                        imb = min(cpumask_weight(cpus), sd->imb_numa_nr);
+                        if (allow_numa_imbalance(local_sgs.sum_nr_running + 1, imb))
                                 return NULL;
                 }
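
For context, allow_numa_imbalance() is essentially a threshold check on
the local group's running-task count. A minimal sketch of that check is
below; the body is illustrative only and the exact comparison in the
posted series may differ, but the signature matches the call sites in
the hunk above:

/*
 * Illustrative sketch only: allow keeping the waking task in the local
 * group while the group's running-task count (including the waking
 * task) stays within the allowed-imbalance threshold.
 */
static inline bool allow_numa_imbalance(int running, int imb_numa_nr)
{
        return running <= imb_numa_nr;
}

With the hunk above, the threshold passed to this check is clamped to
min(cpumask_weight(cpus), sd->imb_numa_nr), so a task pinned to a small
subset of CPUs in the local group no longer gets the whole group's
imb_numa_nr worth of headroom before find_idlest_group() stops returning
NULL and considers other groups.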