From: Xunlei Pang <xlpang@linux.alibaba.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Jiang Biao
Cc: Wetp Zhang, linux-kernel@vger.kernel.org
Subject: [PATCH RESEND] sched/fair: Fix wrong cpu selecting from isolated domain
Date: Thu, 24 Sep 2020 14:48:47 +0800
Message-Id: <1600930127-76857-1-git-send-email-xlpang@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
X-Mailing-List: linux-kernel@vger.kernel.org

In our production environment we have occasionally seen tasks with a
full cpumask (e.g. tasks placed into a cpuset or set to full affinity)
migrated to our isolated CPUs.

After some analysis, we found that this happens because the current
select_idle_smt() does not consider the sched_domain mask, so an idle
SMT sibling of the target can be selected even when it lies outside
the domain.

Steps to reproduce on my 32-CPU hyperthreaded machine (CPUs 0-31):
1. Boot with the parameter "isolcpus=domain,2-31"
   (thread lists: 0,16 and 1,17).
2. cgcreate -g cpu:test; cgexec -g cpu:test "test_threads"
3. Some threads are migrated to the isolated CPUs 16 and 17.

Fix it by checking the valid domain mask in select_idle_smt().

Fixes: 10e2f1acd010 ("sched/core: Rewrite and improve select_idle_siblings()")
Reported-by: Wetp Zhang
Reviewed-by: Jiang Biao
Signed-off-by: Xunlei Pang
---
 kernel/sched/fair.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1a68a05..fa942c4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6075,7 +6075,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 /*
  * Scan the local SMT mask for idle CPUs.
  */
-static int select_idle_smt(struct task_struct *p, int target)
+static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
 {
 	int cpu;
 
@@ -6083,7 +6083,8 @@ static int select_idle_smt(struct task_struct *p, int target)
 		return -1;
 
 	for_each_cpu(cpu, cpu_smt_mask(target)) {
-		if (!cpumask_test_cpu(cpu, p->cpus_ptr))
+		if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
+		    !cpumask_test_cpu(cpu, sched_domain_span(sd)))
 			continue;
 		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
 			return cpu;
@@ -6099,7 +6100,7 @@ static inline int select_idle_core(struct task_struct *p, struct sched_domain *s
 	return -1;
 }
 
-static inline int select_idle_smt(struct task_struct *p, int target)
+static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
 {
 	return -1;
 }
@@ -6274,7 +6275,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 
-	i = select_idle_smt(p, target);
+	i = select_idle_smt(p, sd, target);
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 
-- 
1.8.3.1
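
For illustration only, and not part of the patch: the effect of the extra
domain check can be sketched in user space with plain 64-bit masks. The
names cpumask_model_t, cpu_bit(), mask_test_cpu() and
select_idle_smt_model() are hypothetical stand-ins invented for this
sketch; they are not kernel APIs, and the idle state is passed in as a
mask instead of being queried per CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model type: one bit per CPU, at most 64 CPUs. */
typedef uint64_t cpumask_model_t;

/* Build a mask with only the given CPU set. */
static cpumask_model_t cpu_bit(int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return (cpumask_model_t)1 << cpu;
}

/* Return 1 if the given CPU is set in the mask, 0 otherwise. */
static int mask_test_cpu(int cpu, cpumask_model_t mask)
{
	return (int)((mask >> cpu) & 1u);
}

/*
 * Model of the patched scan: walk the SMT siblings of the target and
 * return the first one that is allowed for the task, lies inside the
 * sched domain span, and is idle. Return -1 if there is none.
 */
static int select_idle_smt_model(cpumask_model_t smt_siblings,
				 cpumask_model_t task_allowed,
				 cpumask_model_t domain_span,
				 cpumask_model_t idle)
{
	int cpu;

	for (cpu = 0; cpu < 64; cpu++) {
		if (!mask_test_cpu(cpu, smt_siblings))
			continue;
		/* The second test is the check this patch adds. */
		if (!mask_test_cpu(cpu, task_allowed) ||
		    !mask_test_cpu(cpu, domain_span))
			continue;
		if (mask_test_cpu(cpu, idle))
			return cpu;
	}

	return -1;
}
```

With the reproduction setup above, the siblings of CPU 0 are {0,16}, the
task mask is full, and the non-isolated domain span is {0,1}. If only
CPU 16 is idle, the model returns -1. Passing a full mask as the span,
which models the old behaviour without the domain check, returns the
isolated CPU 16 instead.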