From: Waiman Long
Date: Fri, 13 Oct 2023 12:03:18 -0400
Subject: Re: [PATCH v8 0/7] cgroup/cpuset: Support remote partitions
To: Michal Koutný
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Christian Brauner, Jonathan Corbet, Shuah Khan, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Giuseppe Scrivano
References: <20230905133243.91107-1-longman@redhat.com>

On 10/13/23 11:50, Michal Koutný wrote:
> Hello.
>
> (I know this is heading for 6.7. Still I wanted to have a look at this
> after it stabilized somehow to understand the new concept better but I
> still have some questions below.)
>
> On Tue, Sep 05, 2023 at 09:32:36AM -0400, Waiman Long wrote:
>> Both scheduling and isolated partitions can be formed as a remote
>> partition. A local partition can be created under a remote partition.
>> A remote partition, however, cannot be formed under a local partition
>> for now.
>>
>>
>> With this patch series, we allow the creation of remote partition
>> far from the root. The container management tool can manage the
>> "cpuset.cpus.exclusive" file without impacting the other cpuset
>> files that are managed by other middlewares. Of course, invalid
>> "cpuset.cpus.exclusive" values will be rejected.
> I take the example with a nested cgroup `cont` to which I want to
> dedicate two CPUs (0 and 1).
> IIUC, I can do this both with a chain of local root partitions or as a
> single remote partition.
>
>
> [chain]
>   root
>   |                            \
>   mid1a                          mid1b
>     cpuset.cpus=0-1                cpuset.cpus=2-15
>     cpuset.cpus.partition=root
>   |
>   mid2
>     cpuset.cpus=0-1
>     cpuset.cpus.partition=root
>   |
>   cont
>     cpuset.cpus=0-1
>     cpuset.cpus.partition=root

In this case, the effective CPUs of both mid1a and mid2 will be empty.
IOW, you can't have any tasks in these two cpusets.

>
> [remote]
>   root
>   |                            \
>   mid1a                          mid1b
>     cpuset.cpus.exclusive=0-1      cpuset.cpus=2-15
>   |
>   mid2
>     cpuset.cpus.exclusive=0-1
>   |
>   cont
>     cpuset.cpus.exclusive=0-1
>     cpuset.cpus.partition=root
>
> In the former case I must configure cpuset.cpus and
> cpuset.cpus.partition along the whole path and in the second case
> cpuset.cpus.exclusive still along the whole path and root at the bottom
> only.
>
> What is the difference between the two configs above?
> (Or can you please give an example where the remote partitions are
> better illustrated?)

In the remote case, you can have tasks in the intermediate cgroups mid1a
and mid2 as long as their cpuset.cpus contains more CPUs than
cpuset.cpus.exclusive.

>
>> Modern container orchestration tools like Kubernetes use the cgroup
>> hierarchy to manage different containers. And it is relying on other
>> middleware like systemd to help manage it. If a container needs to
>> use isolated CPUs, it is hard to get those with the local partitions
>> as it will require the administrative parent cgroup to be a partition
>> root too which tools like systemd may not be ready to manage.
> Such tools aren't ready to manage cpuset.cpus.exclusive, are they?
> IOW tools need to distinguish exclusive and "shared" CPUs which is equal
> to distinguishing root and member partitions.

They will be ready eventually. The requirement for remote partitions
actually came from our OpenShift team, as local partitions alone did not
meet their needs. Their management daemons don't need access to exclusive
CPUs in the parent cgroup layer, but they do need to activate isolated
partitions in selected child cgroups so that our Telco customers can run
workloads like DPDK. So they will add that support to upstream Kubernetes.

Cheers,
Longman
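A minimal Python sketch of the difference discussed above between the [chain] and [remote] layouts, as seen by tasks left in the intermediate cgroups mid1a and mid2. This is a simplified model, not the kernel's effective-CPU computation: it only encodes the rule stated in the thread, namely that CPUs passed down to a descendant partition are unavailable to tasks in the intermediate cgroup. The helpers `parse_cpulist` and `usable_by_own_tasks` are hypothetical, the list parser handles only plain numbers and ranges (e.g. "0-3,8"), and the cpuset.cpus=0-3 value for mid1a and mid2 in the remote layout is an assumption added for illustration (the diagram does not set cpuset.cpus there).

```python
"""Simplified, hypothetical model of the two cpuset layouts in the thread.

Not the kernel algorithm: it only models the rule that CPUs handed to a
descendant partition are unavailable to tasks in an intermediate cgroup.
"""


def parse_cpulist(text: str) -> set[int]:
    """Parse a simple cpuset list string such as "0-1" or "0-3,8"."""
    cpus: set[int] = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def usable_by_own_tasks(cpus: str, given_away: str) -> set[int]:
    """Hypothetical helper: CPUs left for tasks in an intermediate cgroup.

    cpus:       the cgroup's cpuset.cpus value.
    given_away: CPUs passed down to a descendant partition (the child
                partition's cpuset.cpus in the chain layout, or
                cpuset.cpus.exclusive in the remote layout).
    """
    return parse_cpulist(cpus) - parse_cpulist(given_away)


# [chain] layout: mid1a and mid2 list exactly the CPUs their child
# partition takes, so nothing is left for tasks placed directly in them.
chain_mid1a = usable_by_own_tasks("0-1", "0-1")
chain_mid2 = usable_by_own_tasks("0-1", "0-1")

# [remote] layout, ASSUMING mid1a and mid2 set cpuset.cpus=0-3: only
# CPUs 0-1 go to the remote partition in cont, so tasks in mid1a and
# mid2 can still run on CPUs 2-3.
remote_mid1a = usable_by_own_tasks("0-3", "0-1")
remote_mid2 = usable_by_own_tasks("0-3", "0-1")

print("chain :", sorted(chain_mid1a), sorted(chain_mid2))    # chain : [] []
print("remote:", sorted(remote_mid1a), sorted(remote_mid2))  # remote: [2, 3] [2, 3]
```

The model makes the trade-off explicit: with a chain of local partitions, every intermediate level must hand all of its CPUs down, while with a remote partition the intermediate levels only reserve the exclusive CPUs and keep the rest for their own tasks.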