From: Waiman Long
To: Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, Pierre Gondois, Waiman Long
Subject: [PATCH-cgroup v2] cgroup/cpuset: Enable invalid to valid local partition transition
Date: Tue, 3 Oct 2023 10:44:20 -0400
Message-Id: <20231003144420.2895515-1-longman@redhat.com>

When a local partition becomes invalid, it won't transition back to a
valid partition automatically even if a proper "cpuset.cpus.exclusive"
or "cpuset.cpus" change is made.
Instead, system administrators have to explicitly echo "root" or
"isolated" into the "cpuset.cpus.partition" file at the partition root.

This patch enables the automatic transition of an invalid local
partition back to a valid one when there is a proper
"cpuset.cpus.exclusive" or "cpuset.cpus" change.

Automatic transition of an invalid remote partition to a valid one,
however, is not covered by this patch. Invalid remote partitions still
need an explicit write to "cpuset.cpus.partition" to become valid again.

The test_cpuset_prs.sh test script is updated with new test cases for
this automatic state transition.

Reported-by: Pierre Gondois
Link: https://lore.kernel.org/lkml/9777f0d2-2fdf-41cb-bd01-19c52939ef42@arm.com
Signed-off-by: Waiman Long
---
 kernel/cgroup/cpuset.c                        | 79 +++++++++++--------
 .../selftests/cgroup/test_cpuset_prs.sh       | 21 +++--
 2 files changed, 62 insertions(+), 38 deletions(-)

 v2: Add documentation about the new X designator in test_cpuset_prs.sh.

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 15f399153a2e..93facdab513c 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1806,17 +1806,28 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 		 *
 		 * Compute add/delete mask to/from effective_cpus
 		 *
-		 * addmask = effective_xcpus & ~newmask & parent->effective_xcpus
-		 * delmask = newmask & ~cs->effective_xcpus
-		 *           & parent->effective_xcpus
+		 * For valid partition:
+		 *   addmask = exclusive_cpus & ~newmask
+		 *             & parent->effective_xcpus
+		 *   delmask = newmask & ~exclusive_cpus
+		 *             & parent->effective_xcpus
+		 *
+		 * For invalid partition:
+		 *   delmask = newmask & parent->effective_xcpus
 		 */
-		cpumask_andnot(tmp->addmask, xcpus, newmask);
-		adding = cpumask_and(tmp->addmask, tmp->addmask,
-				     parent->effective_xcpus);
+		if (is_prs_invalid(old_prs)) {
+			adding = false;
+			deleting = cpumask_and(tmp->delmask,
+					newmask, parent->effective_xcpus);
+		} else {
+			cpumask_andnot(tmp->addmask, xcpus, newmask);
+			adding = cpumask_and(tmp->addmask, tmp->addmask,
+					     parent->effective_xcpus);
 
-		cpumask_andnot(tmp->delmask, newmask, xcpus);
-		deleting = cpumask_and(tmp->delmask, tmp->delmask,
-				       parent->effective_xcpus);
+			cpumask_andnot(tmp->delmask, newmask, xcpus);
+			deleting = cpumask_and(tmp->delmask, tmp->delmask,
+					       parent->effective_xcpus);
+		}
 		/*
 		 * Make partition invalid if parent's effective_cpus could
 		 * become empty and there are tasks in the parent.
@@ -1910,9 +1921,11 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 
 	/*
 	 * Transitioning between invalid to valid or vice versa may require
-	 * changing CS_CPU_EXCLUSIVE.
+	 * changing CS_CPU_EXCLUSIVE. In the case of partcmd_update,
+	 * validate_change() has already been successfully called and
+	 * CPU lists in cs haven't been updated yet. So defer it to later.
 	 */
-	if (old_prs != new_prs) {
+	if ((old_prs != new_prs) && (cmd != partcmd_update)) {
 		int err = update_partition_exclusive(cs, new_prs);
 
 		if (err)
@@ -1960,6 +1973,9 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 
 	spin_unlock_irq(&callback_lock);
 
+	if ((old_prs != new_prs) && (cmd == partcmd_update))
+		update_partition_exclusive(cs, new_prs);
+
 	if (adding || deleting) {
 		update_tasks_cpumask(parent, tmp->addmask);
 		update_sibling_cpumasks(parent, cs, tmp);
@@ -2356,8 +2372,9 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (alloc_cpumasks(NULL, &tmp))
 		return -ENOMEM;
 
-	if (is_partition_valid(cs)) {
-		if (cpumask_empty(trialcs->effective_xcpus)) {
+	if (old_prs) {
+		if (is_partition_valid(cs) &&
+		    cpumask_empty(trialcs->effective_xcpus)) {
 			invalidate = true;
 			cs->prs_err = PERR_INVCPUS;
 		} else if (prstate_housekeeping_conflict(old_prs, trialcs->effective_xcpus)) {
@@ -2391,13 +2408,16 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 		 */
 		invalidate = true;
 		rcu_read_lock();
-		cpuset_for_each_child(cp, css, parent)
+		cpuset_for_each_child(cp, css, parent) {
+			struct cpumask *xcpus = fetch_xcpus(trialcs);
+
 			if (is_partition_valid(cp) &&
-			    cpumask_intersects(trialcs->effective_xcpus, cp->effective_xcpus)) {
+			    cpumask_intersects(xcpus, cp->effective_xcpus)) {
 				rcu_read_unlock();
 				update_parent_effective_cpumask(cp, partcmd_invalidate, NULL, &tmp);
 				rcu_read_lock();
 			}
+		}
 		rcu_read_unlock();
 		retval = 0;
 	}
@@ -2405,18 +2425,24 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (retval < 0)
 		goto out_free;
 
-	if (is_partition_valid(cs)) {
+	if (is_partition_valid(cs) ||
+	   (is_partition_invalid(cs) && !invalidate)) {
+		struct cpumask *xcpus = trialcs->effective_xcpus;
+
+		if (cpumask_empty(xcpus) && is_partition_invalid(cs))
+			xcpus = trialcs->cpus_allowed;
+
 		/*
 		 * Call remote_cpus_update() to handle valid remote partition
 		 */
 		if (is_remote_partition(cs))
-			remote_cpus_update(cs, trialcs->effective_xcpus, &tmp);
+			remote_cpus_update(cs, xcpus, &tmp);
 		else if (invalidate)
 			update_parent_effective_cpumask(cs, partcmd_invalidate,
 							NULL, &tmp);
 		else
 			update_parent_effective_cpumask(cs, partcmd_update,
-						trialcs->effective_xcpus, &tmp);
+							xcpus, &tmp);
 	} else if (!cpumask_empty(cs->exclusive_cpus)) {
 		/*
 		 * Use trialcs->effective_cpus as a temp cpumask
@@ -2493,7 +2519,7 @@ static int update_exclusive_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	if (retval)
 		return retval;
 
-	if (is_partition_valid(cs)) {
+	if (old_prs) {
 		if (cpumask_empty(trialcs->effective_xcpus)) {
 			invalidate = true;
 			cs->prs_err = PERR_INVCPUS;
@@ -2927,19 +2953,10 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 		return 0;
 
 	/*
-	 * For a previously invalid partition root with valid partition root
-	 * parent, treat it as if it is a "member". Otherwise, reject it as
-	 * remote partition cannot currently self-recover from an invalid
-	 * state.
+	 * Treat a previously invalid partition root as if it is a "member".
 	 */
-	if (new_prs && is_prs_invalid(old_prs)) {
-		if (is_partition_valid(parent)) {
-			old_prs = PRS_MEMBER;
-		} else {
-			cs->partition_root_state = -new_prs;
-			return 0;
-		}
-	}
+	if (new_prs && is_prs_invalid(old_prs))
+		old_prs = PRS_MEMBER;
 
 	if (alloc_cpumasks(NULL, &tmpmask))
 		return -ENOMEM;
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index 0f4f4a57ae12..a6e9848189d6 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -220,7 +220,8 @@ test_isolated()
 # +- B1
 #
 # P = set cpus.partition (0:member, 1:root, 2:isolated)
-# C = add cpu-list
+# C = add cpu-list to cpuset.cpus
+# X = add cpu-list to cpuset.cpus.exclusive
 # S = use prefix in subtree_control
 # T = put a task into cgroup
 # O= = Write to CPU online file of
@@ -318,16 +319,19 @@ TEST_MATRIX=(
 	" C0-3:S+ C1-3:S+ C2 . X2-3 X2-3 T:P2:O2=0 O2=1 0 A1:0-3,A2:1-3,A3:2 A1:P0,A3:P-2"
 
 	# cpus.exclusive.effective clearing test
-	" C0-3:S+ C1-3:S+ C2 . X2-3:X . . . 0 A1:0-3,A2:1-3,A3:2,XA1:"
+	" C0-3:S+ C1-3:S+ C2 . X2-3:X . . . 0 A1:0-3,A2:1-3,A3:2,XA1:"
 
-	# Invalid to valid remote partition indirect transition test via member
-	" C0-3:S+ C1-3 . . . X3:P2 . . 0 A1:0-3,A2:1-3,XA2: A2:P-2"
+	# Invalid to valid remote partition transition test
+	" C0-3:S+ C1-3 . . . X3:P2 . . 0 A1:0-3,A2:1-3,XA2: A2:P-2"
 	" C0-3:S+ C1-3:X3:P2
-	. . X2-3 P0:P2 . . 0 A1:0-2,A2:3,XA2:3 A2:P2 3"
+	. . X2-3 P2 . . 0 A1:0-2,A2:3,XA2:3 A2:P2 3"
 
 	# Invalid to valid local partition direct transition tests
-	" C1-3:S+:P2 C2-3:X1:P2 . . . . . . 0 A1:1-3,XA1:1-3,A2:2-3:XA2: A1:P2,A2:P-2 1-3"
-	" C1-3:S+:P2 C2-3:X1:P2 . . . X3:P2 . . 0 A1:1-2,XA1:1-3,A2:3:XA2:3 A1:P2,A2:P2 1-3"
+	" C1-3:S+:P2 C2-3:X1:P2 . . . . . . 0 A1:1-3,XA1:1-3,A2:2-3:XA2: A1:P2,A2:P-2 1-3"
+	" C1-3:S+:P2 C2-3:X1:P2 . . . X3:P2 . . 0 A1:1-2,XA1:1-3,A2:3:XA2:3 A1:P2,A2:P2 1-3"
+	" C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4,B1:4-6 A1:P-2,B1:P0"
+	" C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3,B1:4-6 A1:P2,B1:P0 0-3"
+	" C0-3:P2 . . C3-5:C4-5 . . . . 0 A1:0-3,B1:4-5 A1:P2,B1:P0 0-3"
 
 	# Local partition invalidation tests
 	" C0-3:X1-3:S+:P2 C1-3:X2-3:S+:P2 C2-3:X3:P2 \
@@ -336,6 +340,9 @@ TEST_MATRIX=(
 	. . X4 . . 0 A1:1-3,A2:1-3,A3:2-3,XA2:,XA3: A1:P2,A2:P-2,A3:P-2 1-3"
 	" C0-3:X1-3:S+:P2 C1-3:X2-3:S+:P2 C2-3:X3:P2 \
 	. . C4 . . 0 A1:1-3,A2:1-3,A3:2-3,XA2:,XA3: A1:P2,A2:P-2,A3:P-2 1-3"
+	# Local partition CPU change tests
+	" C0-5:S+:P2 C4-5:S+:P1 . . . C3-5 . . 0 A1:0-2,A2:3-5 A1:P2,A2:P1 0-2"
+	" C0-5:S+:P2 C4-5:S+:P1 . . C1-5 . . . 0 A1:1-3,A2:4-5 A1:P2,A2:P1 1-3"
 
 	# cpus_allowed/exclusive_cpus update tests
 	" C0-3:X2-3:S+ C1-3:X2-3:S+ C2-3:X2-3 \
-- 
2.39.3
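The add/delete mask formulas in the update_parent_effective_cpumask() comment, together with the new xcpus fallback in update_cpumask(), can be modeled in plain C for experimentation. This is a minimal user-space sketch, not kernel code: it assumes a hypothetical 64-bit word with one bit per CPU in place of the kernel's struct cpumask API, and the names cpubits_t, struct mask_delta, compute_delta() and pick_xcpus() are all hypothetical. The xcpus parameter stands for the partition's current exclusive CPUs, as in the patch's formulas.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: one bit per CPU in a 64-bit word. This is NOT the
 * kernel's struct cpumask API, just enough to exercise the mask formulas.
 */
typedef uint64_t cpubits_t;

struct mask_delta {
	cpubits_t addmask;	/* CPUs to add back to the parent's effective CPUs */
	cpubits_t delmask;	/* CPUs to delete from the parent's effective CPUs */
	bool adding;		/* addmask is non-empty */
	bool deleting;		/* delmask is non-empty */
};

/*
 * Models the partcmd_update formulas from the patch comment:
 *   valid:   addmask = xcpus & ~newmask & parent_xcpus
 *            delmask = newmask & ~xcpus & parent_xcpus
 *   invalid: addmask is empty, delmask = newmask & parent_xcpus
 */
struct mask_delta compute_delta(bool old_invalid, cpubits_t xcpus,
				cpubits_t newmask, cpubits_t parent_xcpus)
{
	struct mask_delta d = { 0 };

	if (old_invalid) {
		d.delmask = newmask & parent_xcpus;
	} else {
		d.addmask = xcpus & ~newmask & parent_xcpus;
		d.delmask = newmask & ~xcpus & parent_xcpus;
	}
	d.adding = d.addmask != 0;
	d.deleting = d.delmask != 0;

	/*
	 * addmask never intersects newmask, while delmask is always a
	 * subset of newmask, so the two masks are disjoint.
	 */
	assert((d.addmask & d.delmask) == 0);
	return d;
}

/*
 * Models the fallback added to update_cpumask(): an invalid partition whose
 * effective exclusive CPUs are empty uses its cpus_allowed instead.
 */
cpubits_t pick_xcpus(bool invalid, cpubits_t effective_xcpus,
		     cpubits_t cpus_allowed)
{
	if (effective_xcpus == 0 && invalid)
		return cpus_allowed;
	return effective_xcpus;
}
```

For example, with the parent's effective exclusive CPUs at 0-7 (0xff), a valid partition holding CPUs 2-3 (0x0c) that is changed to CPUs 3-4 (0x18) gives CPU 2 back to the parent (addmask 0x04) and takes CPU 4 from it (delmask 0x10). If the same partition were invalid, nothing is added back and delmask covers CPUs 3-4 (0x18), presumably because an invalid partition does not currently hold any CPUs taken from its parent.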