From: Waiman Long
To: Tejun Heo, Zefan Li, Johannes Weiner, Christian Brauner, Jonathan Corbet, Shuah Khan
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Michal Koutný, Giuseppe Scrivano, Waiman Long
Subject: [PATCH v8 0/7] cgroup/cpuset: Support remote partitions
Date: Tue, 5 Sep 2023 09:32:36 -0400
Message-Id: <20230905133243.91107-1-longman@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

v8:
 - Add a new patch 1 to fix a load balance state problem.
 - Add new test cases to the test script and fix some bugs in error
   handling.

v7:
 - https://lore.kernel.org/lkml/d0380dfa-ee2e-e492-38e3-31bf6644e511@redhat.com/
 - Fix a compilation problem in patch 1 and a memory allocation bug in
   patch 2.
 - Change exclusive_cpus type to cpumask_var_t to match other cpumasks
   and make code more consistent.

v6:
 - https://lore.kernel.org/lkml/20230713172601.3285847-1-longman@redhat.com/
 - Add another read-only cpuset.cpus.exclusive.effective control file
   to expose the effective set of exclusive CPUs.
 - Update the documentation and test accordingly.

This patch series introduces new cpuset control files
"cpuset.cpus.exclusive" (read-write) and
"cpuset.cpus.exclusive.effective" (read-only) for better control of
which exclusive CPUs are distributed down the cgroup hierarchy for
creating cpuset partitions.

Any one of the exclusive CPUs can be distributed to at most one child
cpuset. Input to "cpuset.cpus.exclusive" that violates the sibling
exclusivity rule will be rejected.

This new control file has no effect on the behavior of a cpuset until
the cpuset becomes a partition root.
At that point, its effective CPUs will be set to its exclusive CPUs
unless some of them are offline.

This patch series also introduces a new category of cpuset partition
called remote partitions. The existing partition category, where the
partition roots have to be clustered around the root cgroup in a
hierarchical way, is now referred to as local partitions. A remote
partition can be formed far from the root cgroup with no partition
root parent. Local partitions can be created without touching
"cpuset.cpus.exclusive", as it can be set automatically when a cpuset
becomes a local partition root. Creating a remote partition, however,
requires "cpuset.cpus.exclusive" values to be set properly down the
hierarchy. Both scheduling and isolated partitions can be formed as
remote partitions. A local partition can be created under a remote
partition. A remote partition, however, cannot be formed under a local
partition for now.

Modern container orchestration tools like Kubernetes use the cgroup
hierarchy to manage different containers, and they rely on other
middleware like systemd to help manage it. If a container needs to use
isolated CPUs, it is hard to get them with local partitions, because
that requires the administrative parent cgroup to be a partition root
too, which a tool like systemd may not be ready to manage.

With this patch series, we allow the creation of a remote partition
far from the root. The container management tool can manage the
"cpuset.cpus.exclusive" file without impacting the other cpuset files
that are managed by other middleware. Of course, invalid
"cpuset.cpus.exclusive" values will be rejected.
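
For illustration only, the sibling exclusivity rule and the effective
CPU computation described above can be modelled in a few lines of
Python. This is a simplified sketch, not the kernel implementation:
the function names are hypothetical, the cpuset names "A" and "B" are
made up, and it assumes that "unless some of them are offline" means
offline CPUs are simply dropped from the effective set.

```python
from __future__ import annotations


def parse_cpulist(text: str) -> set[int]:
    """Parse a CPU list string such as "0-3,8" into a set of CPU numbers."""
    cpus: set[int] = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def overlapping_siblings(proposed: set[int],
                         siblings: dict[str, set[int]]) -> list[str]:
    """Return the names of siblings whose exclusive CPUs overlap `proposed`.

    Hypothetical helper: a non-empty result models a write to
    "cpuset.cpus.exclusive" that would be rejected.
    """
    return sorted(name for name, cpus in siblings.items() if proposed & cpus)


def effective_cpus(exclusive: set[int], online: set[int]) -> set[int]:
    """Model a partition root's effective CPUs (assumption: the exclusive
    CPUs with any offline ones removed)."""
    return exclusive & online


if __name__ == "__main__":
    # Sibling "A" already holds CPUs 0-3 as exclusive.
    siblings = {"A": parse_cpulist("0-3")}

    # Sibling "B" asks for 2-5: overlaps "A", so it would be rejected.
    print(overlapping_siblings(parse_cpulist("2-5"), siblings))  # ['A']

    # Sibling "B" asks for 4-5 instead: no overlap, so it is acceptable.
    print(overlapping_siblings(parse_cpulist("4-5"), siblings))  # []

    # CPU 5 is offline, so only CPU 4 ends up effective.
    print(sorted(effective_cpus(parse_cpulist("4-5"), parse_cpulist("0-4"))))  # [4]
```

The model only covers the overlap check between siblings and the
offline filtering. It does not model partition roots or the
local/remote distinction.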

Waiman Long (7):
  cgroup/cpuset: Fix load balance state in update_partition_sd_lb()
  cgroup/cpuset: Add cpuset.cpus.exclusive.effective for v2
  cgroup/cpuset: Add cpuset.cpus.exclusive for v2
  cgroup/cpuset: Introduce remote partition
  cgroup/cpuset: Check partition conflict with housekeeping setup
  cgroup/cpuset: Documentation update for partition
  cgroup/cpuset: Extend test_cpuset_prs.sh to test remote partition

 Documentation/admin-guide/cgroup-v2.rst       |  123 +-
 kernel/cgroup/cpuset.c                        | 1279 ++++++++++++-----
 .../selftests/cgroup/test_cpuset_prs.sh       |  458 ++++--
 3 files changed, 1366 insertions(+), 494 deletions(-)

-- 
2.31.1