Date: Thu, 24 May 2018 10:04:30 +0100
From: Patrick Bellasi
To: Waiman Long
Cc: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, kernel-team@fb.com, pjt@google.com,
    luto@amacapital.net, Mike Galbraith, torvalds@linux-foundation.org,
    Roman Gushchin, Juri Lelli
Subject: Re: [PATCH v8 4/6] cpuset: Make generate_sched_domains() recognize isolated_cpus
Message-ID: <20180524090430.GZ30654@e110439-lin>
References: <1526590545-3350-1-git-send-email-longman@redhat.com>
            <1526590545-3350-5-git-send-email-longman@redhat.com>
            <20180523173453.GY30654@e110439-lin>

On 23-May 16:18, Waiman Long wrote:
> On 05/23/2018 01:34 PM, Patrick Bellasi wrote:
> > Hi Waiman,
> >
> > On 17-May 16:55, Waiman Long wrote:
> >
> > [...]
> >
> >> @@ -672,13 +672,14 @@ static int generate_sched_domains(cpumask_var_t **domains,
> >>  	int ndoms = 0;		/* number of sched domains in result */
> >>  	int nslot;		/* next empty doms[] struct cpumask slot */
> >>  	struct cgroup_subsys_state *pos_css;
> >> +	bool root_load_balance = is_sched_load_balance(&top_cpuset);
> >>
> >>  	doms = NULL;
> >>  	dattr = NULL;
> >>  	csa = NULL;
> >>
> >>  	/* Special case for the 99% of systems with one, full, sched domain */
> >> -	if (is_sched_load_balance(&top_cpuset)) {
> >> +	if (root_load_balance && !top_cpuset.isolation_count) {
> >
> > Perhaps I'm missing something but it seems to me that, when the two
> > conditions above are true, we are going to destroy and rebuild the
> > exact same scheduling domains.
> >
> > IOW, on the 99% of systems where:
> >
> >   is_sched_load_balance(&top_cpuset)
> >   top_cpuset.isolation_count == 0
> >
> > since boot time and forever, every time we update a value in
> > cpuset.cpus we keep rebuilding the same SDs.
> >
> > It's not strictly related to this patch; the same already happens in
> > mainline based just on the first condition. But since you are
> > extending that optimization, perhaps you can tell me where I'm wrong
> > or which cases I'm not considering.
> >
> > I'm interested mainly because on Android systems those conditions are
> > always true, and we see SD rebuilds every time we write something
> > into cpuset.cpus, which ultimately accounts for almost all of the
> > 6-7[ms] the write takes to return, depending on the CPU frequency.
> >
> > Cheers Patrick
>
> Yes, that is true. I will look into how to further optimize this.
> Thanks for the suggestion.

FWIW, following is my take on top of your series.

With the following patch applied I see the average execution time of
rebuild_sched_domains_locked() drop from 1.4[ms] to 40[us] while running
60 /tg1/cpuset.cpus switches in a loop on a Juno R2 Arm board using the
performance cpufreq governor.
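For reference, the switching loop was functionally equivalent to the
minimal sketch below. The cgroup mount point and the two CPU masks are
illustrative assumptions of this example, not the exact values behind
the numbers above (which were taken on the kernel side):

/*
 * Sketch of the cpuset.cpus switching loop: alternate two CPU masks
 * on a child cpuset and report the average write latency. The
 * CPUS_FILE path and the "0-3"/"0-5" masks are assumptions.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define CPUS_FILE "/sys/fs/cgroup/cpuset/tg1/cpuset.cpus"
#define LOOPS 60

int main(void)
{
	struct timespec t0, t1;
	long long total_ns = 0;
	int i;

	for (i = 0; i < LOOPS; i++) {
		/* Alternate masks so each write actually changes cpuset.cpus */
		const char *mask = (i % 2) ? "0-5" : "0-3";
		int fd = open(CPUS_FILE, O_WRONLY);

		if (fd < 0) {
			perror(CPUS_FILE);
			return 1;
		}
		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (write(fd, mask, strlen(mask)) < 0)
			perror("write");
		clock_gettime(CLOCK_MONOTONIC, &t1);
		close(fd);
		total_ns += (t1.tv_sec - t0.tv_sec) * 1000000000LL
			  + (t1.tv_nsec - t0.tv_nsec);
	}
	printf("avg cpuset.cpus write: %lld us\n", total_ns / LOOPS / 1000);
	return 0;
}

Since each iteration toggles the mask, every write goes through the
rebuild_sched_domains_locked() path being optimized here.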
---8<---
From 84bb8137ce79f74849d97e30871cf67d06d8d682 Mon Sep 17 00:00:00 2001
From: Patrick Bellasi
Date: Wed, 23 May 2018 16:33:06 +0100
Subject: [PATCH 1/1] cgroup/cpuset: disable sched domain rebuild when not
 required

generate_sched_domains() already addresses the "special case for 99% of
systems": a single, full sched domain at the root, spanning all the
CPUs. However, the current support is based on an expensive sequence of
operations which destroy and recreate the exact same scheduling domain
configuration.

If we notice that:

 1) CPUs in "cpuset.isolcpus" are excluded from load balancing by the
    isolcpus= kernel boot option, and will never be load balanced
    regardless of the value of "cpuset.sched_load_balance" in any
    cpuset, and

 2) the root cpuset has load_balance enabled by default at boot, and
    it's the only parameter which userspace can change at run-time,

then we know that, by default, every system comes up with a complete
and properly configured set of scheduling domains covering all the
CPUs.

Thus, on every system, unless the user explicitly disables load balance
for the top_cpuset, the scheduling domains configured at boot time by
the scheduler/topology code, and updated as a consequence of hotplug
events, are already properly configured for cpuset too.

This configuration is the default one for 99% of systems, and it's also
the one used by most Android devices, which never disable load balance
on the top_cpuset. Thus, while load balance is enabled on the
top_cpuset, destroying and rebuilding the scheduling domains at every
cpuset.cpus reconfiguration is a useless operation which will always
produce the same result.

Let's move this "special case" check earlier, into
rebuild_sched_domains_locked(), thus completely skipping the expensive:

   generate_sched_domains()
   partition_sched_domains()

for all the cases in which we know that the already defined scheduling
domains cannot be affected by whatever value is written to cpuset.cpus.

The proposed solution is the minimal variation which optimizes the case
of systems with load balance enabled at the root level and without
isolated CPUs. As soon as one of these conditions no longer holds, we
fall back to the original behavior.

Signed-off-by: Patrick Bellasi
Cc: Li Zefan
Cc: Tejun Heo
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Frederic Weisbecker
Cc: Johannes Weiner
Cc: Mike Galbraith
Cc: Paul Turner
Cc: Waiman Long
Cc: Juri Lelli
Cc: kernel-team@fb.com
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 kernel/cgroup/cpuset.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 8f586e8bdc98..cff14be94678 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -874,6 +874,11 @@ static void rebuild_sched_domains_locked(void)
 	    !cpumask_subset(top_cpuset.effective_cpus, cpu_active_mask))
 		goto out;
 
+	/* Special case for the 99% of systems with one, full, sched domain */
+	if (!top_cpuset.isolation_count &&
+	    is_sched_load_balance(&top_cpuset))
+		goto out;
+
 	/* Generate domain masks and attrs */
 	ndoms = generate_sched_domains(&doms, &attr);
 
-- 
2.15.1
---8<---

-- 
#include <best/regards.h>
Patrick Bellasi