Date: Mon, 11 Oct 2021 07:42:27 -1000
From: Tejun Heo
To: Michal Koutný
Cc: Christian Brauner, "Pratik R. Sampat", bristot@redhat.com,
	christian@brauner.io, ebiederm@xmission.com, lizefan.x@bytedance.com,
	hannes@cmpxchg.org, mingo@kernel.org, juri.lelli@redhat.com,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	cgroups@vger.kernel.org, containers@lists.linux.dev,
	containers@lists.linux-foundation.org, pratik.r.sampat@gmail.com
Subject: Re: [RFC 0/5] kernel: Introduce CPU Namespace
References: <20211009151243.8825-1-psampat@linux.ibm.com>
 <20211011101124.d5mm7skqfhe5g35h@wittgenstein>
 <20211011141737.GA58758@blackbody.suse.cz>
In-Reply-To: <20211011141737.GA58758@blackbody.suse.cz>

Hello,

On Mon, Oct 11, 2021 at 04:17:37PM +0200, Michal Koutný wrote:
> The problem as I see it is the mapping from a real dedicated HW to a
> cgroup restricted environment ("container"), which can be shared. In
> this instance, the virtualized view would not be able to represent a
> situation when a CPU is assigned non-exclusively to multiple cpusets.

There is a fundamental problem with trying to represent a
resource-shared environment controlled with cgroup through system-wide
interfaces, including procfs: a major goal of cgroup resource control is
work-conservation, which is also one of the main reasons why containers
are more attractive in resource-intensive deployments. System-level
interfaces naturally describe a discrete system and can't express the
dynamic distribution that cgroups provide.

There are aspects of cgroups which are akin to hard partitioning and
thus can be represented by diddling with system-level interfaces.
Whether those are worthwhile to pursue depends on how easy and useful
they are; however, there's no avoiding that each of those is gonna be a
very partial and fragmented thing, which significantly adds to the
default cons list of such attempts.

> > Existing solutions to the problem include userspace tools like LXCFS
> > which can fake the sysfs information by mounting onto the sysfs online
> > file to be in coherence with the limits set through cgroup cpuset.
> > However, LXCFS is an external solution and needs to be explicitly setup
> > for applications that require it. Another concern is also that tools
> > like LXCFS don't handle all the other display mechanism like procfs load
> > stats.
> >
> > Therefore, the need of a clean interface could be advocated for.
>
> I'd like to write something in support of your approach but I'm afraid
> that the problem of the mapping (dedicated vs shared) makes this most
> suitable for some external/separate entity such as the LCXFS already.

This is more of a unit problem than an interface one - ie. the existing
numbers in the system interfaces don't really fit what needs to be
described. One approach that we've found useful in practice is
dynamically changing resource consumption based on shortage, as measured
by PSI, rather than on some number representing what's available - e.g.
for a build service, building a feedback loop which monitors its own
cpu, memory and io pressures and modulates the number of concurrent
jobs.
There are some numbers which would be fundamentally useful - e.g. the
ballpark number of threads needed to saturate the computing capacity
available to the cgroup, or the ballpark bytes of memory available
without noticeable contention. Those, I think we definitely need to work
on, but I don't see much point in trying to bend the existing /proc
numbers for them.

Thanks.

--
tejun