From: Waiman Long
To: Michal Koutný
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Christian Brauner, Jonathan Corbet, Shuah Khan, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Giuseppe Scrivano
Date: Wed, 1 Nov 2023 14:14:38 -0400
Subject: Re: [PATCH v8 0/7] cgroup/cpuset: Support remote partitions
Message-ID: <7dc9cf67-b482-a723-c779-14c7598e1869@redhat.com>
References: <20230905133243.91107-1-longman@redhat.com>

On 10/24/23 12:13, Michal Koutný wrote:
> On Fri, Oct 13, 2023 at 12:03:18PM -0400, Waiman Long wrote:
>>> [chain]
>>>   root
>>>    |                      \
>>>   mid1a                    mid1b
>>>    cpuset.cpus=0-1          cpuset.cpus=2-15
>>>    cpuset.cpus.partition=root
>>>    |
>>>   mid2
>>>    cpuset.cpus=0-1
>>>    cpuset.cpus.partition=root
>>>    |
>>>   cont
>>>    cpuset.cpus=0-1
>>>    cpuset.cpus.partition=root
>> In this case, the effective CPUs of both mid1a and mid2 will be empty. IOW,
>> you can't have any task in these 2 cpusets.
> I see, that is relevant to a threaded subtree only where the admin / app
> can know how to distribute CPUs and place threads to internal nodes.
>
>> For the remote case, you can have intermediate tasks in both mid1a and mid2
>> as long as cpuset.cpus contains more CPUs than cpuset.cpus.exclusive.
> It's obvious that cpuset.cpus.exclusive should be exclusive among
> siblings.
> Should it also be so along the vertical path?

Sorry for the late reply. I forgot to respond earlier.

We don't support that kind of vertical exclusivity check with
cpuset.cpu_exclusive in cgroup v1.

>   root
>    |
>   mid1a
>    cpuset.cpus=0-2
>    cpuset.cpus.exclusive=0
>    |
>   mid2
>    cpuset.cpus=0-2
>    cpuset.cpus.exclusive=1
>    |
>   cont
>    cpuset.cpus=0-2
>    cpuset.cpus.exclusive=2
>    cpuset.cpus.partition=root
>
> IIUC, this should be a valid config regardless of cpuset.cpus.partition
> setting on mid1a and mid2.
> Whereas
>
>   root
>    |
>   mid1a
>    cpuset.cpus=0-2
>    cpuset.cpus.exclusive=0
>    |
>   mid2
>    cpuset.cpus=0-2
>    cpuset.cpus.exclusive=1-2
>    cpuset.cpus.partition=root
>    |
>   cont
>    cpuset.cpus=1-2
>    cpuset.cpus.exclusive=1-2
>    cpuset.cpus.partition=root
>
> Here, I'm hesitating, will mid2 have any exclusively owned cpus?
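To make the quoted chain example concrete, here is a toy model of why mid1a and mid2 end up with no effective CPUs when local partitions are stacked, while intermediate cgroups keep CPUs in the remote case as long as cpuset.cpus contains more CPUs than cpuset.cpus.exclusive. It is a simplified sketch under stated assumptions, not the kernel's cpuset code: a cgroup's effective CPUs are taken to be its own cpuset.cpus minus the CPUs claimed by any partition root below it, and a partition root is assumed to claim its cpuset.cpus.exclusive if set, otherwise its cpuset.cpus. The `Node` class and the `claimed_below()` / `effective_cpus()` helpers are hypothetical names for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a cpuset cgroup (toy model only)."""
    name: str
    cpus: set[int]                     # cpuset.cpus
    exclusive: set[int] | None = None  # cpuset.cpus.exclusive (None = unset)
    partition_root: bool = False       # cpuset.cpus.partition=root
    children: list[Node] = field(default_factory=list)


def claimed_below(node: Node) -> set[int]:
    """CPUs taken by partition roots strictly below `node` (simplified)."""
    taken: set[int] = set()
    for child in node.children:
        if child.partition_root:
            # Assumption: a partition claims its exclusive CPUs, else its cpus.
            taken |= child.exclusive if child.exclusive is not None else child.cpus
        taken |= claimed_below(child)
    return taken


def effective_cpus(node: Node) -> set[int]:
    """Toy rule: own cpuset.cpus minus CPUs handed to partitions underneath."""
    return node.cpus - claimed_below(node)


# Quoted chain example: mid1a, mid2 and cont are all local partition roots.
cont = Node("cont", {0, 1}, partition_root=True)
mid2 = Node("mid2", {0, 1}, partition_root=True, children=[cont])
mid1a = Node("mid1a", {0, 1}, partition_root=True, children=[mid2])
for n in (mid1a, mid2, cont):
    print(n.name, sorted(effective_cpus(n)))
# mid1a []
# mid2 []
# cont [0, 1]

# Hypothetical remote variant: only cont is a partition root, and the
# intermediate cgroups list more CPUs in cpuset.cpus than in the exclusive set.
r_cont = Node("cont", {0, 1}, exclusive={0, 1}, partition_root=True)
r_mid2 = Node("mid2", {0, 1, 2}, exclusive={0, 1}, children=[r_cont])
r_mid1a = Node("mid1a", {0, 1, 2}, exclusive={0, 1}, children=[r_mid2])
print(sorted(effective_cpus(r_mid1a)), sorted(effective_cpus(r_mid2)))
# [2] [2]
```

In this model the leftover CPU 2 is what lets tasks stay in mid1a and mid2 in the remote case; in the stacked local case nothing is left over.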
>
> (I have flashes of understanding cpus.exclusive as being a more
> expressive mechanism than partitions. OTOH, it seems non-intuitive when
> both are combined, thus I'm asking to internalize it better.
> Should partitions be deprecated for simplicity? They're still good to
> provide the notification mechanism of invalidation.
> cpuset.cpus.exclusive.effective don't have that.)

Like cpuset.cpus, cpuset.cpus.exclusive follows the same hierarchical
rule. IOW, the CPUs in a cgroup's cpuset.cpus.exclusive will be ignored
if they are not present in its ancestor nodes. The value of
cpuset.cpus.exclusive shows the user's intent, while
cpuset.cpus.exclusive.effective shows the CPUs that are actually
exclusive once a partition is enabled. So cpuset.cpus.exclusive can't
be used as a replacement for cpuset.cpus.partition. As a result, we
can't actually support the vertical CPU exclusion you suggest above.

>
>> They will be ready eventually. This requirement of remote partition actually
>> came from our OpenShift team as the use of just local partition did not meet
>> their need. They don't need access to exclusive CPUs in the parent cgroup
>> layer for their management daemons. They do need to activate isolated
>> partition in selected child cgroups to support our Telco customers to run
>> workloads like DPDK.
>>
>> So they will add the support to upstream Kubernetes.
> Is it worth implementing anything touching (ancestral)
> cpuset.cpus.partition then?

I don't quite get what you're asking here.

Cheers,
Longman
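The point that cpuset.cpus.exclusive follows the same hierarchical rule as cpuset.cpus can be sketched as a top-down filter, which also shows why vertical exclusion conflicts with it. This is a simplified reading of the rule as described in this reply, not the kernel's actual cpuset.cpus.exclusive.effective computation. It assumes the root owns every CPU (a hypothetical 16-CPU `ALL_CPUS`) and that a cgroup's usable exclusive CPUs are its requested cpuset.cpus.exclusive intersected with its parent's usable set. `usable_exclusive()` and `siblings_exclusive()` are hypothetical helper names.

```python
from __future__ import annotations

ALL_CPUS = set(range(16))  # assumption: hypothetical 16-CPU machine, root owns all


def usable_exclusive(chain: list[tuple[str, set[int]]]) -> dict[str, set[int]]:
    """Walk a root-to-leaf chain of (name, cpuset.cpus.exclusive) pairs.

    Requested CPUs not present in the parent's usable set are ignored,
    mirroring the hierarchical rule (simplified toy model).
    """
    parent = ALL_CPUS
    result: dict[str, set[int]] = {}
    for name, requested in chain:
        usable = requested & parent
        result[name] = usable
        parent = usable  # the next level down is constrained by this one
    return result


def siblings_exclusive(requests: list[set[int]]) -> bool:
    """True if no two sibling cpuset.cpus.exclusive values overlap."""
    seen: set[int] = set()
    for req in requests:
        if seen & req:
            return False
        seen |= req
    return True


# Michal's first "vertical" chain: exclusive CPUs 0, 1 and 2 at successive levels.
print(usable_exclusive([("mid1a", {0}), ("mid2", {1}), ("cont", {2})]))
# {'mid1a': {0}, 'mid2': set(), 'cont': set()}

# Michal's second chain: mid2 and cont request 1-2 below mid1a's 0.
print(usable_exclusive([("mid1a", {0}), ("mid2", {1, 2}), ("cont", {1, 2})]))
# {'mid1a': {0}, 'mid2': set(), 'cont': set()}

# A nested chain, where each level's exclusive CPUs come from its parent's.
print(usable_exclusive([("mid1a", {0, 1, 2}), ("mid2", {1, 2}), ("cont", {2})]))
# {'mid1a': {0, 1, 2}, 'mid2': {1, 2}, 'cont': {2}}

print(siblings_exclusive([{0, 1}, {2, 3}]), siblings_exclusive([{0, 1}, {1, 2}]))
# True False
```

Under this model exclusivity is a horizontal property (checked among siblings), while vertically each level can only draw from what its parent already holds, so a disjoint-per-level chain leaves the lower levels with nothing.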