Subject: Re: [PATCH -tip 14/32] sched: migration changes for core scheduling
To: Balbir Singh, "Joel Fernandes (Google)"
Cc: Nishanth Aravamudan, Julien Desfossez, Peter Zijlstra, Tim Chen, Vineeth Pillai, Aaron Lu, Aubrey Li, tglx@linutronix.de, linux-kernel@vger.kernel.org, mingo@kernel.org,
    torvalds@linux-foundation.org, fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, vineeth@bitbyteword.org, Chen Yu, Christian Brauner, Agata Gruza, Antonio Gomez Iglesias, graf@amazon.com, konrad.wilk@oracle.com, dfaggioli@suse.com, pjt@google.com, rostedt@goodmis.org, derkling@google.com, benbjiang@tencent.com, Alexandre Chartre, James.Bottomley@hansenpartnership.com, OWeisse@umich.edu, Dhaval Giani, Junaid Shahid, jsbarnes@google.com, chris.hyser@oracle.com, Ben Segall, Josh Don, Hao Luo, Tom Lendacky, Aubrey Li, "Paul E. McKenney", Tim Chen
References: <20201117232003.3580179-1-joel@joelfernandes.org> <20201117232003.3580179-15-joel@joelfernandes.org> <20201122235456.GF110669@balbir-desktop>
From: "Li, Aubrey"
Message-ID: <0b2514ef-6cc3-c1a3-280b-5d9062c80a31@linux.intel.com>
Date: Mon, 23 Nov 2020 12:36:10 +0800
In-Reply-To: <20201122235456.GF110669@balbir-desktop>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2020/11/23 7:54, Balbir Singh wrote:
> On Tue, Nov 17, 2020 at 06:19:44PM -0500, Joel Fernandes (Google) wrote:
>> From: Aubrey Li
>>
>> - Don't migrate if there is a cookie mismatch
>>     Load balance tries to move task from busiest CPU to the
>>     destination CPU. When core scheduling is enabled, if the
>>     task's cookie does not match with the destination CPU's
>>     core cookie, this task will be skipped by this CPU. This
>>     mitigates the forced idle time on the destination CPU.
>>
>> - Select cookie matched idle CPU
>>     In the fast path of task wakeup, select the first cookie matched
>>     idle CPU instead of the first idle CPU.
>>
>> - Find cookie matched idlest CPU
>>     In the slow path of task wakeup, find the idlest CPU whose core
>>     cookie matches with task's cookie
>>
>> - Don't migrate task if cookie not match
>>     For the NUMA load balance, don't migrate task to the CPU whose
>>     core cookie does not match with task's cookie
>>
>> Tested-by: Julien Desfossez
>> Signed-off-by: Aubrey Li
>> Signed-off-by: Tim Chen
>> Signed-off-by: Vineeth Remanan Pillai
>> Signed-off-by: Joel Fernandes (Google)
>> ---
>>  kernel/sched/fair.c  | 64 ++++++++++++++++++++++++++++++++++++++++----
>>  kernel/sched/sched.h | 29 ++++++++++++++++++++
>>  2 files changed, 88 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index de82f88ba98c..ceb3906c9a8a 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -1921,6 +1921,15 @@ static void task_numa_find_cpu(struct task_numa_env *env,
>>  		if (!cpumask_test_cpu(cpu, env->p->cpus_ptr))
>>  			continue;
>>  
>> +#ifdef CONFIG_SCHED_CORE
>> +		/*
>> +		 * Skip this cpu if source task's cookie does not match
>> +		 * with CPU's core cookie.
>> +		 */
>> +		if (!sched_core_cookie_match(cpu_rq(cpu), env->p))
>> +			continue;
>> +#endif
>> +
>
> Any reason this is under an #ifdef? In sched_core_cookie_match() won't
> the check for sched_core_enabled() do the right thing even when
> CONFIG_SCHED_CORE is not enabled?
>

Yes, sched_core_enabled() works properly when CONFIG_SCHED_CORE is not
enabled. But in that configuration it does not make sense to leave a
core-scheduler-specific call here at all, even at compile time. Also,
for the call sites on hot paths, the #ifdef saves the CPU cycles that
the extra conditional check would otherwise cost.

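To make the compile-time point concrete, here is a minimal user-space sketch, not kernel code. Everything in it is a hypothetical stand-in: `struct rq_model`, `struct task_model`, `model_cookie_match()` and `cpu_allowed()` are invented for illustration, and the model leaves out the idle-core special case that the real helper handles. It only shows that, with CONFIG_SCHED_CORE undefined, the preprocessor drops the check so the caller has no branch left to evaluate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for struct rq and struct task_struct. */
struct rq_model {
	bool core_enabled;		/* models sched_core_enabled(rq) */
	unsigned long core_cookie;	/* models rq->core->core_cookie */
};

struct task_model {
	unsigned long core_cookie;	/* models p->core_cookie */
};

#ifdef CONFIG_SCHED_CORE
/*
 * Simplified cookie check (assumption: no idle-core special case).
 * When core scheduling is off at run time, every task matches.
 */
static inline bool model_cookie_match(const struct rq_model *rq,
				      const struct task_model *p)
{
	if (!rq->core_enabled)
		return true;
	return rq->core_cookie == p->core_cookie;
}
#endif

/* Models one candidate-CPU filter step such as the one in the NUMA path. */
static inline bool cpu_allowed(const struct rq_model *rq,
			       const struct task_model *p)
{
	assert(rq && p);
#ifdef CONFIG_SCHED_CORE
	/* Compiled in only when the feature is configured. */
	if (!model_cookie_match(rq, p))
		return false;
#endif
	return true;
}
```

Built without -DCONFIG_SCHED_CORE, cpu_allowed() reduces to "return true" after the sanity assert. Built with it, a task whose cookie differs from the core cookie is rejected.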
>>  		env->dst_cpu = cpu;
>>  		if (task_numa_compare(env, taskimp, groupimp, maymove))
>>  			break;
>> @@ -5867,11 +5876,17 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
>>  
>>  	/* Traverse only the allowed CPUs */
>>  	for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
>> +		struct rq *rq = cpu_rq(i);
>> +
>> +#ifdef CONFIG_SCHED_CORE
>> +		if (!sched_core_cookie_match(rq, p))
>> +			continue;
>> +#endif
>> +
>>  		if (sched_idle_cpu(i))
>>  			return i;
>>  
>>  		if (available_idle_cpu(i)) {
>> -			struct rq *rq = cpu_rq(i);
>>  			struct cpuidle_state *idle = idle_get_state(rq);
>>  			if (idle && idle->exit_latency < min_exit_latency) {
>>  				/*
>> @@ -6129,8 +6144,18 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>>  	for_each_cpu_wrap(cpu, cpus, target) {
>>  		if (!--nr)
>>  			return -1;
>> -		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
>> -			break;
>> +
>> +		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu)) {
>> +#ifdef CONFIG_SCHED_CORE
>> +			/*
>> +			 * If Core Scheduling is enabled, select this cpu
>> +			 * only if the process cookie matches core cookie.
>> +			 */
>> +			if (sched_core_enabled(cpu_rq(cpu)) &&
>> +			    p->core_cookie == cpu_rq(cpu)->core->core_cookie)
>> +#endif
>> +				break;
>> +		}
>>  	}
>>  
>>  	time = cpu_clock(this) - time;
>> @@ -7530,8 +7555,9 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>>  	 * We do not migrate tasks that are:
>>  	 * 1) throttled_lb_pair, or
>>  	 * 2) cannot be migrated to this CPU due to cpus_ptr, or
>> -	 * 3) running (obviously), or
>> -	 * 4) are cache-hot on their current CPU.
>> +	 * 3) task's cookie does not match with this CPU's core cookie
>> +	 * 4) running (obviously), or
>> +	 * 5) are cache-hot on their current CPU.
>>  	 */
>>  	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
>>  		return 0;
>> @@ -7566,6 +7592,15 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>>  		return 0;
>>  	}
>>  
>> +#ifdef CONFIG_SCHED_CORE
>> +	/*
>> +	 * Don't migrate task if the task's cookie does not match
>> +	 * with the destination CPU's core cookie.
>> +	 */
>> +	if (!sched_core_cookie_match(cpu_rq(env->dst_cpu), p))
>> +		return 0;
>> +#endif
>> +
>>  	/* Record that we found atleast one task that could run on dst_cpu */
>>  	env->flags &= ~LBF_ALL_PINNED;
>>  
>> @@ -8792,6 +8827,25 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu)
>>  					p->cpus_ptr))
>>  			continue;
>>  
>> +#ifdef CONFIG_SCHED_CORE
>> +		if (sched_core_enabled(cpu_rq(this_cpu))) {
>> +			int i = 0;
>> +			bool cookie_match = false;
>> +
>> +			for_each_cpu(i, sched_group_span(group)) {
>> +				struct rq *rq = cpu_rq(i);
>> +
>> +				if (sched_core_cookie_match(rq, p)) {
>> +					cookie_match = true;
>> +					break;
>> +				}
>> +			}
>> +			/* Skip over this group if no cookie matched */
>> +			if (!cookie_match)
>> +				continue;
>> +		}
>> +#endif
>> +
>>  		local_group = cpumask_test_cpu(this_cpu,
>>  					       sched_group_span(group));
>>  
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index e72942a9ee11..de553d39aa40 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -1135,6 +1135,35 @@ static inline raw_spinlock_t *rq_lockp(struct rq *rq)
>>  
>>  bool cfs_prio_less(struct task_struct *a, struct task_struct *b);
>>  
>> +/*
>> + * Helper to check if the CPU's core cookie matches with the task's cookie
>> + * when core scheduling is enabled.
>> + * A special case is that the task's cookie always matches with CPU's core
>> + * cookie if the CPU is in an idle core.
>> + */
>> +static inline bool sched_core_cookie_match(struct rq *rq, struct task_struct *p)
>> +{
>> +	bool idle_core = true;
>> +	int cpu;
>> +
>> +	/* Ignore cookie match if core scheduler is not enabled on the CPU. */
>> +	if (!sched_core_enabled(rq))
>> +		return true;
>> +
>> +	for_each_cpu(cpu, cpu_smt_mask(cpu_of(rq))) {
>> +		if (!available_idle_cpu(cpu)) {
>
> I was looking at this snippet and comparing this to is_core_idle(), the
> major difference is the check for vcpu_is_preempted(). Do we want to
> call the core as non idle if any vcpu was preempted on this CPU?

Yes. If a vCPU was preempted on this CPU, it is better not to place the
task on this core, because the vCPU may be holding a spinlock and will
want to run again ASAP.

Thanks,
-Aubrey
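As a rough illustration of the vCPU point, the following user-space model treats a core as idle only when every SMT sibling is idle and is not a preempted vCPU. It is a sketch under stated assumptions, not the patch itself: `struct cpu_model`, `cpus[]`, `NR_CPUS_MODEL`, `SMT_WIDTH` and all `model_*()` functions are hypothetical, two threads per core is assumed, and the final "idle core always matches" rule is taken from the helper's comment because the quoted hunk is cut off before the end of the function.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL	4	/* assumption: tiny 4-CPU machine */
#define SMT_WIDTH	2	/* assumption: two hardware threads per core */

/* Hypothetical per-CPU state used only by this model. */
struct cpu_model {
	bool idle;		/* models idle_cpu(cpu) */
	bool vcpu_preempted;	/* models vcpu_is_preempted(cpu) */
};

static struct cpu_model cpus[NR_CPUS_MODEL];

/* Models available_idle_cpu(): idle and not a preempted vCPU. */
static bool model_available_idle_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	return cpus[cpu].idle && !cpus[cpu].vcpu_preempted;
}

/* A core counts as idle only if all of its SMT siblings are available. */
static bool model_core_is_idle(int cpu)
{
	int first = cpu - (cpu % SMT_WIDTH);
	int i;

	for (i = 0; i < SMT_WIDTH; i++) {
		if (!model_available_idle_cpu(first + i))
			return false;
	}
	return true;
}

/*
 * Cookie check with the idle-core special case described in the
 * helper's comment: an idle core matches any task cookie.
 */
static bool model_cookie_match(int cpu, unsigned long core_cookie,
			       unsigned long task_cookie)
{
	return model_core_is_idle(cpu) || core_cookie == task_cookie;
}
```

In this model, marking one sibling as a preempted vCPU makes the whole core non-idle, so a task with a different cookie is no longer placed there, which is the behaviour described in the reply above.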