From: Waiman Long
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider, Tejun Heo,
    Zefan Li, Johannes Weiner
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Waiman Long
Subject: [PATCH 1/2] cgroup/cpuset: Keep current cpus list if cpus affinity was explicitly set
Date: Wed, 27 Jul 2022 20:58:14 -0400
Message-Id: <20220728005815.1715522-1-longman@redhat.com>

It was found that any change to the current cpuset hierarchy may reset
the cpus_allowed list of the tasks in the affected cpusets to the
default cpuset value, even if those tasks had their CPU affinity
explicitly set by users beforehand. That is especially easy to trigger
in a cgroup v2 environment, where writing "+cpuset" to the root
cgroup's cgroup.subtree_control file will reset the CPU affinity of
all the processes in the system.

That is especially problematic in a nohz_full environment, where the
tasks running on the nohz_full CPUs usually have their CPU affinity
explicitly set and will behave incorrectly if that affinity changes.

Fix this problem by adding a flag to the task structure to indicate
that a task has had its CPU affinity explicitly set, and make the
cpuset code leave that task's cpus_allowed list unchanged unless the
user-chosen CPU list is no longer a subset of the cpus_allowed list
of the cpuset itself.

With that change in place, it was verified that tasks that have their
CPU affinity explicitly set are not affected by changes made to the
v2 cgroup.subtree_control files.

Signed-off-by: Waiman Long
---
 include/linux/sched.h  |  1 +
 kernel/cgroup/cpuset.c | 18 ++++++++++++++++--
 kernel/sched/core.c    |  1 +
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c46f3a63b758..60ae022fa842 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -815,6 +815,7 @@ struct task_struct {
 
 	unsigned int			policy;
 	int				nr_cpus_allowed;
+	int				cpus_affinity_set;
 	const cpumask_t			*cpus_ptr;
 	cpumask_t			*user_cpus_ptr;
 	cpumask_t			cpus_mask;
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 71a418858a5e..c47757c61f39 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -704,6 +704,20 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
 	return ret;
 }
 
+/*
+ * Don't change the cpus_allowed list if cpus affinity has been explicitly
+ * set before unless the current cpu list is not a subset of the new cpu list.
+ */
+static int cpuset_set_cpus_allowed_ptr(struct task_struct *p,
+				       const struct cpumask *new_mask)
+{
+	if (p->cpus_affinity_set && cpumask_subset(p->cpus_ptr, new_mask))
+		return 0;
+
+	p->cpus_affinity_set = 0;
+	return set_cpus_allowed_ptr(p, new_mask);
+}
+
 #ifdef CONFIG_SMP
 /*
  * Helper routine for generate_sched_domains().
@@ -1130,7 +1144,7 @@ static void update_tasks_cpumask(struct cpuset *cs)
 
 	css_task_iter_start(&cs->css, 0, &it);
 	while ((task = css_task_iter_next(&it)))
-		set_cpus_allowed_ptr(task, cs->effective_cpus);
+		cpuset_set_cpus_allowed_ptr(task, cs->effective_cpus);
 	css_task_iter_end(&it);
 }
 
@@ -2303,7 +2317,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 		 * can_attach beforehand should guarantee that this doesn't
 		 * fail. TODO: have a better way to handle failure here
 		 */
-		WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach));
+		WARN_ON_ONCE(cpuset_set_cpus_allowed_ptr(task, cpus_attach));
 
 		cpuset_change_task_nodemask(task, &cpuset_attach_nodemask_to);
 		cpuset_update_task_spread_flag(cs, task);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da0bf6fe9ecd..ab8ea6fa92db 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8034,6 +8034,7 @@ __sched_setaffinity(struct task_struct *p, const struct cpumask *mask)
 	if (retval)
 		goto out_free_new_mask;
 
+	p->cpus_affinity_set = 1;
 	cpuset_cpus_allowed(p, cpus_allowed);
 	if (!cpumask_subset(new_mask, cpus_allowed)) {
 		/*
-- 
2.31.1
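
For readers who want to trace the intended behaviour outside the kernel, the decision made by cpuset_set_cpus_allowed_ptr() can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: struct model_task, mask_subset(), model_user_setaffinity() and model_cpuset_update() are hypothetical names that are not part of the patch, a CPU mask is reduced to one bit per CPU in a uint64_t, and the validation and error handling done by the real __sched_setaffinity() and set_cpus_allowed_ptr() are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the task_struct fields the patch touches. */
struct model_task {
	uint64_t cpus_mask;	/* one bit per CPU; models p->cpus_ptr */
	bool cpus_affinity_set;	/* models the new p->cpus_affinity_set */
};

/* Models cpumask_subset(): true if every CPU in a is also in b. */
static bool mask_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/*
 * Models the user-facing affinity path: the flag is raised and the
 * requested mask is applied. The real code also validates the mask
 * against the cpuset, which this sketch skips.
 */
static void model_user_setaffinity(struct model_task *t, uint64_t mask)
{
	t->cpus_affinity_set = true;
	t->cpus_mask = mask;
}

/*
 * Models cpuset_set_cpus_allowed_ptr(): keep a user-chosen mask as
 * long as it is still a subset of the cpuset's new mask; otherwise
 * drop the flag and fall back to the cpuset's mask.
 */
static void model_cpuset_update(struct model_task *t, uint64_t new_mask)
{
	if (t->cpus_affinity_set && mask_subset(t->cpus_mask, new_mask))
		return;

	t->cpus_affinity_set = false;
	t->cpus_mask = new_mask;
}
```

In this model, a task pinned to CPUs 2-3 (mask 0x0c) keeps that mask when the cpuset is updated to CPUs 0-7 (0xff). An update to CPUs 0-1 (0x03) replaces the mask and clears the flag, so later cpuset updates are applied to the task unconditionally again.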