From: Waiman Long
To: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Shuah Khan
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
    Juri Lelli, Valentin Schneider, Frederic Weisbecker, Mrunal Patel,
    Ryan Phillips, Brent Rowsell, Peter Hunt, Phil Auld, Waiman Long
Subject: [PATCH v2 0/6] cgroup/cpuset: Support remote isolated partitions
Date: Wed, 31 May 2023 12:33:59 -0400
Message-Id: <20230531163405.2200292-1-longman@redhat.com>

v2:
 - [v1] https://lore.kernel.org/lkml/20230412153758.3088111-1-longman@redhat.com/
 - Dropped the special "isolcpus" partition in v1
 - Add the root only "cpuset.cpus.reserve" control file for reserving
   CPUs used for remote isolated partitions.
 - Update the test_cpuset_prs.sh test script and documentation
   accordingly.

This patch series introduces a new category of cpuset partition called
remote partitions.
The existing partition category, where the partition roots have to be
clustered around the root cgroup in a hierarchical way, is now referred
to as adjacent partitions. A remote partition can be formed far from
the root cgroup with no partition root parent. The only commonality is
that the CPUs used in the partition, as specified in "cpuset.cpus",
have to be present in the "cpuset.cpus" of all its ancestors.

It is relatively rare to have applications that require creation of a
separate scheduling domain (root). However, it is more common to have
applications that require the use of isolated CPUs (isolated), e.g.
DPDK. One can use the "isolcpus" or "nohz_full" boot command line
options to get that statically. Of course, the "isolated" partition is
another way to achieve that dynamically.

Modern container orchestration tools like Kubernetes use the cgroup
hierarchy to manage different containers, and they rely on other
middleware like systemd to help manage it. If a container needs to use
isolated CPUs, it is hard to get those with adjacent partitions, as
that requires the administrative parent cgroup to be a partition root
too, which tools like systemd may not be ready to manage.

With this patch series, a new root-cgroup-only "cpuset.cpus.reserve"
file is added to specify the set of CPUs that can be used in partitions
(whether remote or adjacent). To create a remote partition, the set of
CPUs to be used in that partition (the "cpuset.cpus" file of the
partition root) has to be reserved by manually adding them to that
control file first. Then that partition can be activated by writing
"isolated" into its "cpuset.cpus.partition". CPU reservation of
adjacent partitions is done automatically without touching
"cpuset.cpus.reserve" at all.

Currently, only remote isolated partitions are supported. We could
support a scheduling partition ("root") in the future if the need
arises.
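For illustration only, the reserve-then-activate sequence described
above for a remote partition can be sketched in Python. This is a
minimal sketch, not code from the series: the helper names
parse_cpu_list() and create_remote_isolated_partition() are
hypothetical, the ancestor check takes the "present in the cpuset.cpus
of all its ancestors" rule literally, and writing the merged CPU list
back to "cpuset.cpus.reserve" is an assumption about the write syntax
of that proposed file.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel-style CPU list such as "0-3,8" into a set of CPU numbers."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def create_remote_isolated_partition(root: Path, child: Path, cpus: str) -> None:
    """Hypothetical helper following the steps in the cover letter.

    root  -- cgroup v2 mount point (or a scratch directory when experimenting)
    child -- cgroup that should become the remote partition root
    cpus  -- CPU list for the partition, e.g. "2-3"
    """
    wanted = parse_cpu_list(cpus)

    # Literal reading of the cover letter: the CPUs must appear in the
    # "cpuset.cpus" of every cgroup between the root and the partition root.
    rel = child.relative_to(root)
    for ancestor in (root / p for p in rel.parents if p != Path(".")):
        allowed = parse_cpu_list((ancestor / "cpuset.cpus").read_text())
        if not wanted <= allowed:
            raise ValueError(f"{ancestor} does not contain CPUs {sorted(wanted - allowed)}")

    # Step 1: set the CPUs of the future partition root.
    (child / "cpuset.cpus").write_text(cpus + "\n")

    # Step 2: reserve those CPUs in the root-only control file.
    # Assumption: writing the full merged list is an accepted way to add CPUs.
    reserve_file = root / "cpuset.cpus.reserve"
    reserved = parse_cpu_list(reserve_file.read_text()) if reserve_file.exists() else set()
    merged = sorted(reserved | wanted)
    reserve_file.write_text(",".join(str(c) for c in merged) + "\n")

    # Step 3: activate the partition.
    (child / "cpuset.cpus.partition").write_text("isolated\n")
```

Against a real hierarchy this would be called with the cgroup v2 mount
point as root, which needs root privileges and a kernel carrying this
series. Pointing root at a scratch directory is enough to try out the
ordering of the writes.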
Additional isolation attributes like those with the "isolcpus" or
"nohz" boot command line options may be supported in the isolated
partitions in the future.

Waiman Long (6):
  cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE & CS_SCHED_LOAD_BALANCE
    handling
  cgroup/cpuset: Improve temporary cpumasks handling
  cgroup/cpuset: Add cpuset.cpus.reserve for top cpuset
  cgroup/cpuset: Introduce remote isolated partition
  cgroup/cpuset: Documentation update for partition
  cgroup/cpuset: Extend test_cpuset_prs.sh to test remote partition

 Documentation/admin-guide/cgroup-v2.rst       |  92 ++-
 kernel/cgroup/cpuset.c                        | 749 +++++++++++++++---
 .../selftests/cgroup/test_cpuset_prs.sh       | 403 ++++++----
 3 files changed, 988 insertions(+), 256 deletions(-)

-- 
2.31.1