Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp17815533rwd; Tue, 27 Jun 2023 07:56:50 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4gEdiMi+/W9sjTUVIhh/2zqsbg2vzjOVkl70OEQ6/ahfmHO8XkNUTgayDFvq7LoHyTAkKk X-Received: by 2002:a50:e613:0:b0:51a:532d:2fd7 with SMTP id y19-20020a50e613000000b0051a532d2fd7mr17210073edm.42.1687877809959; Tue, 27 Jun 2023 07:56:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1687877809; cv=none; d=google.com; s=arc-20160816; b=ciR6vBBMlnPQmnsPFFFiK8ZlrVJORGfHgMukCrlpgU2SSv82BnTG2P/hghFdQ//S6M 97Lb+gCY7E5YMtIdVU+/I+qMn8bOjYc4MfH50VUjGP0kulpzktzK/y8ESHPL/4u+XuLB 9/B9HrAZlxQcyszj36d1HgRjWf3MJ+srhg1Fam/yY+jW7hH9HtuMpdlups6oPk7Aito6 yI5hJ7D0WyYDkKZZjdqmuVkNB4RO+afEXDVHH2ZMAlD8rlnveF/8P0dVcKducBoid284 Z572WL7wolCWdR7rRUi8H0icCXslTy86/V6sr0XqIrIMSqPNiV56drnLR+2FIJ5uIdqw ZaNw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=Vt7eR5Warck2Ap0bmlsSY+DSBbefbEPsBcMz8g8y+lE=; fh=e85PI7WvowV/CVl68S8nEijXq1ovh2xW2v/QK0gu/Hc=; b=RUiBbGFIaBLcLmVjqPXYr2ExvxIKjzyBBbAUKPmyHR4PEOsCbkhKj8g/wTTcagN+TN d+c0O/nVdafaRHrmMaFQ1g+cqvvyaKfZ6XGAQBu1XduqTEoT03bFGI4UIbG99DPZu5Qn D2JjUfrh8+CwYwRaxQOVeZ8wQUC4pxQsZh6byBhhjikFkkwaB+hvl6Cm/jvSYZxCwFuS BirW9be4d3pETQatYBuLIgQnMDDJ5q0nHgGearfjQ8AOzsxW1BFGuKrTtMG8+/daoXHT JX474MtMug6Pvrnny2UIFlJsCgYAh/k7TMmZCcioCPJ6Z2mPG2GCn72MEN0GaE2FLqQj NATA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=TePE6k0G; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id g16-20020a50ee10000000b0051d554a84b0si3758691eds.67.2023.06.27.07.56.25; Tue, 27 Jun 2023 07:56:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=TePE6k0G; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231533AbjF0OhX (ORCPT + 99 others); Tue, 27 Jun 2023 10:37:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38174 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231566AbjF0OhK (ORCPT ); Tue, 27 Jun 2023 10:37:10 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6364E35A6 for ; Tue, 27 Jun 2023 07:36:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1687876571; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding; bh=Vt7eR5Warck2Ap0bmlsSY+DSBbefbEPsBcMz8g8y+lE=; b=TePE6k0GtFW9aPQSNkaDciZn+rHKCEffrc1BdNri1yCULOQYwTHL8PCTAXdUsW0xa+r2wG BfElZUXEHzt2AZ7W8b+2NQlrxm6nYy5+3K5GcITAq2ldWm5gLBFfjFDpPVT5lNclyZNuNj zd9/iU+Zu5/SlDIvMAA/cDhn5SvrmkY= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-649-80VyoFvJNiu59IAEKwFGwA-1; Tue, 27 Jun 2023 10:36:08 -0400 X-MC-Unique: 80VyoFvJNiu59IAEKwFGwA-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id CF69E280AA50; Tue, 27 Jun 2023 14:35:31 +0000 (UTC) Received: from llong.com (unknown [10.22.10.32]) by smtp.corp.redhat.com (Postfix) with ESMTP id CBED140C2063; Tue, 27 Jun 2023 14:35:30 +0000 (UTC) From: Waiman Long To: Tejun Heo , Zefan Li , Johannes Weiner , Jonathan Corbet , Shuah Khan Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Juri Lelli , Valentin Schneider , Frederic Weisbecker , Mrunal Patel , Ryan Phillips , Brent Rowsell , Peter Hunt , Phil Auld , Waiman Long Subject: [PATCH v4 0/9] cgroup/cpuset: Support remote partitions Date: Tue, 27 Jun 2023 10:34:59 -0400 Message-Id: <20230627143508.1576882-1-longman@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H5,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org v4: - [v3] https://lore.kernel.org/lkml/20230627005529.1564984-1-longman@redhat.com/ - Fix compilation problem reported by kernel test robot. v3: - [v2] https://lore.kernel.org/lkml/20230531163405.2200292-1-longman@redhat.com/ - Change the new control file from root-only "cpuset.cpus.reserve" to non-root "cpuset.cpus.exclusive" which lists the set of exclusive CPUs distributed down the hierarchy. - Add a patch to restrict boot-time isolated CPUs to isolated partitions only. - Update the test_cpuset_prs.sh test script and documentation accordingly. This patch series introduces a new cpuset control file "cpuset.cpus.exclusive" which must be a subset of "cpuset.cpus" and the parent's "cpuset.cpus.exclusive". This control file lists the exclusive CPUs to be distributed down the hierarchy. Any one of the exclusive CPUs can only be distributed to at most one child cpuset. Unlike "cpuset.cpus", invalid input to "cpuset.cpus.exclusive" will be rejected with an error. This new control file has no effect on the behavior of the cpuset until it turns into a partition root. At that point, its effective CPUs will be set to its exclusive CPUs unless some of them are offline. This patch series also introduces a new category of cpuset partition called remote partitions. The existing partition category where the partition roots have to be clustered around the root cgroup in a hierarchical way is now referred to as local partitions. A remote partition can be formed far from the root cgroup with no partition root parent. While local partitions can be created without touching "cpuset.cpus.exclusive" as it can be set automatically if a cpuset becomes a local partition root. Properly set "cpuset.cpus.exclusive" values down the hierarchy are required to create a remote partition. Both scheduling and isolated partitions can be formed in a remote partition. A local partition can be created under a remote partition. A remote partition, however, cannot be formed under a local partition for now. Modern container orchestration tools like Kubernetes use the cgroup hierarchy to manage different containers. And it is relying on other middleware like systemd to help managing it. If a container needs to use isolated CPUs, it is hard to get those with the local partitions as it will require the administrative parent cgroup to be a partition root too which tool like systemd may not be ready to manage. With this patch series, we allow the creation of remote partition far from the root. The container management tool can manage the "cpuset.cpus.exclusive" file without impacting the other cpuset files that are managed by other middlewares. Of course, invalid "cpuset.cpus.exclusive" values will be rejected and changes to "cpuset.cpus" can affect the value of "cpuset.cpus.exclusive" due to the requirement that it has to be a subset of the former control file. Waiman Long (9): cgroup/cpuset: Inherit parent's load balance state in v2 cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE & CS_SCHED_LOAD_BALANCE handling cgroup/cpuset: Improve temporary cpumasks handling cgroup/cpuset: Allow suppression of sched domain rebuild in update_cpumasks_hier() cgroup/cpuset: Add cpuset.cpus.exclusive for v2 cgroup/cpuset: Introduce remote partition cgroup/cpuset: Check partition conflict with housekeeping setup cgroup/cpuset: Documentation update for partition cgroup/cpuset: Extend test_cpuset_prs.sh to test remote partition Documentation/admin-guide/cgroup-v2.rst | 100 +- kernel/cgroup/cpuset.c | 1347 ++++++++++++----- .../selftests/cgroup/test_cpuset_prs.sh | 398 +++-- 3 files changed, 1291 insertions(+), 554 deletions(-) -- 2.31.1