From: Cheng Lin <cheng.lin130@zte.com.cn>
To: mingo@redhat.com, peterz@infradead.org
Cc: linux-kernel@vger.kernel.org, jiang.biao2@zte.com.cn, zhong.weidong@zte.com.cn, tan.hu@zte.com.cn, Cheng Lin <cheng.lin130@zte.com.cn>
Subject: [PATCH v2] sched/numa: do not balance tasks onto isolated cpus
Date: Thu, 26 Jul 2018 16:19:08 +0800
Message-Id: <1532593148-106266-1-git-send-email-cheng.lin130@zte.com.cn>
X-Mailer: git-send-email 1.8.3.1
By default, there is one sched domain covering all CPUs, including those isolated with the "isolcpus=" boot parameter. The isolated CPUs do not participate in load balancing and never have tasks running on them unless explicitly assigned by CPU affinity.

However, NUMA balancing does not take isolcpus (isolated CPUs) into consideration. It may migrate tasks onto isolated CPUs, and the migrated tasks never escape from them, which breaks the isolation provided by the isolcpus boot parameter and introduces various problems.

The typical scenario: to use the isolated CPUs in a cgroup (e.g. in a container), the cpuset must include them. In that case, the CPU affinity of tasks in the cgroup includes the isolated CPUs by default. If we pin a task onto an isolated CPU, or onto a CPU on the same NUMA node as an isolated CPU, and another task shares memory with the pinned task, NUMA balancing may migrate that task to the same NUMA node for better performance, and the isolated CPU may be chosen as the target CPU. Load balancing never migrates a task onto an isolated CPU, but NUMA balancing currently does not consider isolated CPUs at all.
This patch ensures NUMA balancing does not balance tasks onto isolated CPUs.

Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
Reviewed-by: Tan Hu <tan.hu@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
---
v2:
 * rework and retest on latest kernel
 * detail the scenario in the commit log
 * fix the SoB chain

 kernel/sched/core.c | 9 ++++++---
 kernel/sched/fair.c | 3 ++-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fe365c9..170a673 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1302,10 +1302,12 @@ int migrate_swap(struct task_struct *cur, struct task_struct *p)
 	if (!cpu_active(arg.src_cpu) || !cpu_active(arg.dst_cpu))
 		goto out;

-	if (!cpumask_test_cpu(arg.dst_cpu, &arg.src_task->cpus_allowed))
+	if ((!cpumask_test_cpu(arg.dst_cpu, &arg.src_task->cpus_allowed))
+	    || !housekeeping_test_cpu(arg.dst_cpu, HK_FLAG_DOMAIN))
 		goto out;

-	if (!cpumask_test_cpu(arg.src_cpu, &arg.dst_task->cpus_allowed))
+	if ((!cpumask_test_cpu(arg.src_cpu, &arg.dst_task->cpus_allowed))
+	    || !housekeeping_test_cpu(arg.src_cpu, HK_FLAG_DOMAIN))
 		goto out;

 	trace_sched_swap_numa(cur, arg.src_cpu, p, arg.dst_cpu);
@@ -5508,7 +5510,8 @@ int migrate_task_to(struct task_struct *p, int target_cpu)
 	if (curr_cpu == target_cpu)
 		return 0;

-	if (!cpumask_test_cpu(target_cpu, &p->cpus_allowed))
+	if ((!cpumask_test_cpu(target_cpu, &p->cpus_allowed))
+	    || !housekeeping_test_cpu(target_cpu, HK_FLAG_DOMAIN))
 		return -EINVAL;

 	/* TODO: This is not properly updating schedstats */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f0a0be..1ea2953 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1724,7 +1724,8 @@ static void task_numa_find_cpu(struct task_numa_env *env,
 	for_each_cpu(cpu, cpumask_of_node(env->dst_nid)) {
 		/* Skip this CPU if the source task cannot migrate */
-		if (!cpumask_test_cpu(cpu, &env->p->cpus_allowed))
+		if ((!cpumask_test_cpu(cpu, &env->p->cpus_allowed))
+		    || !housekeeping_test_cpu(cpu, HK_FLAG_DOMAIN))
 			continue;

 		env->dst_cpu = cpu;
--
1.8.3.1