Date: Mon, 5 Jun 2023 22:47:08 -0400
Subject: Re: [RFC PATCH 0/5] cgroup/cpuset: A new "isolcpus" paritition
To: Tejun Heo
Cc: Michal Koutný, Zefan Li, Johannes Weiner, Jonathan Corbet, Shuah Khan,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Juri Lelli,
    Valentin Schneider, Frederic Weisbecker, Mrunal Patel, Ryan Phillips,
    Brent Rowsell, Peter Hunt, Phil Auld
From: Waiman Long

On 6/5/23 16:27, Tejun Heo wrote:
> Hello,
>
> On Mon, Jun 05, 2023 at 04:00:39PM -0400, Waiman Long wrote:
> ...
>>> file seems hacky to me. e.g. How would it interact with namespacing? Are
>>> there reasons why this can't be properly hierarchical other than the amount
>>> of work needed? For example:
>>>
>>>   cpuset.cpus.exclusive is a per-cgroup file and represents the mask of CPUs
>>>   that the cgroup holds exclusively. The mask is always a subset of
>>>   cpuset.cpus. The parent loses access to a CPU when the CPU is given to a
>>>   child by setting the CPU in the child's cpus.exclusive and the CPU can't
>>>   be given to more than one child. IOW, exclusive CPUs are available only to
>>>   the leaf cgroups that have them set in their .exclusive file.
>>>
>>>   When a cgroup is turned into a partition, its cpuset.cpus and
>>>   cpuset.cpus.exclusive should be the same. For backward compatibility, if
>>>   the cgroup's parent is already a partition, cpuset will automatically
>>>   attempt to add all cpus in cpuset.cpus into cpuset.cpus.exclusive.
>>>
>>> I could well be missing something important but I'd really like to see
>>> something like the above where the reservation feature blends in with the
>>> rest of cpuset.
>> It can certainly be made hierarchical as you suggest. It does increase
>> complexity from both user and kernel point of view.
>>
>> From the user point of view, there is one more knob to manage hierarchically
>> which is not used that often.
> From user pov, this only affects them when they want to create partitions
> down the tree, right?
>
>> From the kernel point of view, we may need to have one more cpumask per
>> cpuset as the current subparts_cpus is used to track automatic reservation.
>> We need another cpumask to contain extra exclusive CPUs not allocated
>> through automatic reservation. You describe this new control file as a
>> list of exclusively owned CPUs for this cgroup. Creating a partition is in
>> fact allocating exclusive CPUs to a cgroup. So it kind of overlaps with the
>> cpuset.cpus.partition file.
>> Can we fail a write to
> Yes, it substitutes and expands on cpuset.cpus.partition behavior.
>
>> cpuset.cpus.exclusive if those exclusive CPUs cannot be granted or will this
>> exclusive list be valid only if a valid partition can be formed? So we need
>> to properly manage the dependency between these 2 control files.
> So, I think cpus.exclusive can become the sole mechanism to arbitrate
> exclusive ownership of CPUs and .partition can depend on .exclusive.
>
>> Alternatively, I have no problem exposing cpuset.cpus.exclusive as a
>> read-only file. It is a bit problematic if we need to make it writable.
> I don't follow. How would remote partitions work then?

I had a different idea about the semantics of cpuset.cpus.exclusive at the
beginning. My original thinking was that it held the actual exclusive CPUs
allocated to the cgroup. If we instead treat it as a hint of which exclusive
CPUs should be used, one that takes effect only if the cgroup can become a
valid partition, then I can see it as a value that can be set hierarchically
throughout the whole cpuset hierarchy.

So a transition to a valid partition is possible iff

1) cpuset.cpus.exclusive is a subset of cpuset.cpus and is a subset of
   cpuset.cpus.exclusive of all its ancestors.

2) If its parent is not a partition root, none of the CPUs in
   cpuset.cpus.exclusive are currently allocated to other partitions. This is
   the same remote partition concept as in my v2 patch. If its parent is a
   partition root, part of the parent's exclusive CPUs will be distributed to
   this child partition, as in the current behavior of cpuset partitions.

I can rework my patch to adopt this model if it is what you have in mind.

Thanks,
Longman
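
The two conditions above can be sketched as a small userspace model. This is only an illustration of the proposed rule, not kernel code: the `Cpuset` class, the `can_become_partition` function, and the `allocated_elsewhere` argument are all hypothetical names, CPU masks are modelled as Python sets, and the root cgroup is assumed to list every CPU in its exclusive set. The case where the parent is a partition root is modelled as "no extra check", which is a simplification.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Cpuset:
    """Hypothetical model of one cpuset cgroup (illustration only)."""
    name: str
    cpus: Set[int]                    # models cpuset.cpus
    exclusive: Set[int]               # models cpuset.cpus.exclusive
    parent: Optional["Cpuset"] = None
    is_partition_root: bool = False


def can_become_partition(cs: Cpuset, allocated_elsewhere: Set[int]) -> bool:
    """Check the two proposed conditions for a valid partition transition.

    allocated_elsewhere is an assumed input: the CPUs currently held by
    other partitions.
    """
    # Condition 1a: cpus.exclusive must be a subset of cpus.
    if not cs.exclusive <= cs.cpus:
        return False

    # Condition 1b: it must also be a subset of every ancestor's
    # cpus.exclusive.
    ancestor = cs.parent
    while ancestor is not None:
        if not cs.exclusive <= ancestor.exclusive:
            return False
        ancestor = ancestor.parent

    # Condition 2: with a non-partition parent (the "remote partition"
    # case), none of the requested CPUs may belong to another partition.
    if cs.parent is not None and not cs.parent.is_partition_root:
        if cs.exclusive & allocated_elsewhere:
            return False

    return True


# Example hierarchy: root (partition root) -> a -> b
root = Cpuset("root", set(range(8)), set(range(8)), is_partition_root=True)
a = Cpuset("a", {0, 1, 2, 3}, {2, 3}, parent=root)
b = Cpuset("b", {2, 3}, {2, 3}, parent=a)

print(can_become_partition(b, set()))   # remote partition, CPUs free
print(can_become_partition(b, {3}))     # CPU 3 already taken elsewhere
```

In this model, `b` sits under a non-partition parent, so it is only allowed to become a partition while CPUs 2 and 3 are not held by any other partition.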