Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp658264pxk; Wed, 9 Sep 2020 15:35:36 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx3fiQDvObpF5RSKXdtv4bROZXDP0X7Ru26xVyPpHF8O0yfj1qE1u3MgH+Qe/8eyyaOVbw+ X-Received: by 2002:a50:9d0a:: with SMTP id v10mr6358678ede.144.1599690935841; Wed, 09 Sep 2020 15:35:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1599690935; cv=none; d=google.com; s=arc-20160816; b=WDWU1QM3jbIYCnvYR7eadCwWFlkY6r9aBWTHmUaSZKVs3AZMrhEeVNENVAdjW7pWB2 ylWKOFF2RP4KhIKmFQVqvLC84H8OCN6LKH8fHJhoBGtWWE/P7OXLkui+G6w5WAAL6+va aWZjRXoe2Fzt+9YI7omuv/ETbG9unhj2XLxh7QDAjfEV46mGy5C4FAJlRdv+630sT+EK GFPo1B9Vv4PWCa+tihfkx7nA67xmcior1pZ+04mQW4jojN8XTWVlM95NcUgdLW+LE0gt O3MjNILixrFxywPGvnzm03b1slMdaIGeh+jc9OHbISbFCnBM54PIPc0uU7Pm4OZYfVGt rk8Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=gU09eftWZvk3qY6Kmx3FrY2pOAhwI0ipXWrTFndQlj8=; b=ECCZxwctyxsuuVGEtV9STXIGkY9kwEecaeIWO/xJBV9BFMDjBgRFahNe9BctQCvUBV EKQLDBAqY0Pqp6a8BdJGF4IZc3twfDzkTSB723/k3MRvt2lx15M2fOHv+dPzWkLCqWSR Z1gxKqA1+jMPBLEhgEBA4/bSCmcdKDDYrQScaxzecBNn5Uqp4B2Gsj4JVisvxRmZ3ld4 ya69TJJ+S6bL+0LtdxX6muiEcbIQpQhBn5/5UyaLXFhi1prm3cLcDULNLDfOJWd5U2NR lhHuj2zyELW20ThzcPEW53/GU7eJSOouwcCjPNkfyyRRFfLiIwAB7x/mQQZL1dldDsOz KW/w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=oLrryTb9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id cd4si2418232ejb.107.2020.09.09.15.35.13; Wed, 09 Sep 2020 15:35:35 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=oLrryTb9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728936AbgIIWeE (ORCPT + 99 others); Wed, 9 Sep 2020 18:34:04 -0400 Received: from mail.kernel.org ([198.145.29.99]:47272 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728207AbgIIWeE (ORCPT ); Wed, 9 Sep 2020 18:34:04 -0400 Received: from localhost (lfbn-ncy-1-588-162.w81-51.abo.wanadoo.fr [81.51.203.162]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 41B4920C09; Wed, 9 Sep 2020 22:34:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1599690843; bh=UQbrIMRndWpCO8XQGf0QGRRyTJhPBhqUbyV2/Xk+ZFE=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=oLrryTb9IizzHDDHwyxBU3y9MV3ayJFgQ/HrRZEwKZe4cGtvxVPAOE7ZJQNOgawQv ZVUtjiiqkbPkpxRDlDDcjAhw1XsWia4dYztX12BRCY8Cfkx+maH5NuYfq4gaGIxxIQ bBnVO9Wq3Z+sOm2otIwve/XcDJp6yTFqlbWwZtzE= Date: Thu, 10 Sep 2020 00:34:01 +0200 From: Frederic Weisbecker To: Marcelo Tosatti Cc: Phil Auld , Peter Zijlstra , Thomas Gleixner , "Paul E. McKenney" , Joel Fernandes , linux-kernel@vger.kernel.org Subject: Re: Requirements to control kernel isolation/nohz_full at runtime Message-ID: <20200909223400.GA20541@lenoir> References: <20200901104640.GA13814@lenoir> <20200903182359.GA1016174@fuller.cnet> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200903182359.GA1016174@fuller.cnet> User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Sep 03, 2020 at 03:23:59PM -0300, Marcelo Tosatti wrote: > On Tue, Sep 01, 2020 at 12:46:41PM +0200, Frederic Weisbecker wrote: > > == Unbound affinity == > > > > Restore kernel threads, workqueue, timers, etc... wide affinity. But take care of cpumasks that have been set through other > > interfaces: sysfs, procfs, etc... > > We were looking at a userspace interface: what would be a proper > (unified, similar to isolcpus= interface) and its implementation: > > The simplest idea for interface seemed to be exposing the integer list of > CPUs and isolation flags to userspace (probably via sysfs). > > The scheme would allow flags to be separately enabled/disabled, > with not all flags being necessary toggable (could for example > disallow nohz_full= toggling until it is implemented, but allow for > other isolation features to be toggable). > > This would require per flag housekeeping_masks (instead of a single). Right, I think cpusets provide exactly. > Back to the userspace interface, you mentioned earlier that cpusets > was a possibility for it. However: > > "Cpusets provide a Linux kernel mechanism to constrain which CPUs and > Memory Nodes are used by a process or set of processes. > > The Linux kernel already has a pair of mechanisms to specify on which > CPUs a task may be scheduled (sched_setaffinity) and on which Memory > Nodes it may obtain memory (mbind, set_mempolicy). > > Cpusets extends these two mechanisms as follows:" > > The isolation flags do not necessarily have anything to do with > tasks, but with CPUs: a given feature is disabled or enabled on a > given CPU. > No? When cpusets are set as exclusive, they become strict CPU properties. I think we'll need to enforce the exclusive property to set the isolated flags. Then you're free to move the tasks you like into any isolated cpusets. > Regarding locking of the masks, since housekeeping_masks can be called > from hot paths (eg: get_nohz_timer_target) it seems RCU is a natural > fit, so userspace would: > > 1) use interface to change cpumask for a given feature: > > -> set_rcu_pointer > -> wait for grace period Yep, could be a solution. > 2) proceed to trigger actions that rely on housekeeping_cpumask, > to validate the cpumask at 1) is being used. Exactly. I guess we can simply call directly to subsystems (timers, workqueue, kthreads, ...) from the isolation code upon cpumask update. This way we avoid ordering surprises that would come with a notifier. > Regarding nohz_full=, a way to get an immediate implementation > (without handling the issues you mention above) would be to boot > with a set of CPUs as "nohz_full toggable" and others not. For > the nohz_full toggable ones, you'd introduce a per-CPU tick > dependency that is enabled/disabled on runtime. Probably better > to avoid this one if possible... Right but you would still have all the overhead that comes with nohz full (kernel entry/exit tracking, RCU userspace extended grace period, RCU callbacks offloaded, vtime accounting, ...). It will become really interesting once we can switch all that overhead off. Thanks.