Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752111AbdHMPN4 (ORCPT ); Sun, 13 Aug 2017 11:13:56 -0400 Received: from mx1.redhat.com ([209.132.183.28]:8952 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750816AbdHMPNz (ORCPT ); Sun, 13 Aug 2017 11:13:55 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 9CA95285097 Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=lcapitulino@redhat.com Date: Sun, 13 Aug 2017 11:13:40 -0400 From: Luiz Capitulino To: Frederic Weisbecker Cc: LKML , Peter Zijlstra , Chris Metcalf , Thomas Gleixner , Christoph Lameter , "Paul E . McKenney" , Ingo Molnar , Mike Galbraith , Rik van Riel , Wanpeng Li Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz Message-ID: <20170813111340.0ade6d58@redhat.com> In-Reply-To: <20170812141004.GA21918@lerouge> References: <1500643290-25842-1-git-send-email-fweisbec@gmail.com> <1500643290-25842-8-git-send-email-fweisbec@gmail.com> <20170811123927.33e094f3@redhat.com> <20170812141004.GA21918@lerouge> Organization: Red Hat MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Sun, 13 Aug 2017 15:13:55 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4636 Lines: 99 On Sat, 12 Aug 2017 16:10:06 +0200 Frederic Weisbecker wrote: > On Fri, Aug 11, 2017 at 03:09:57PM -0400, Luiz Capitulino wrote: > > On Fri, 21 Jul 2017 15:21:28 +0200 > > Frederic Weisbecker wrote: > > > -void __init housekeeping_init(void) > > > +/* Parse the boot-time housekeeping CPU list from the kernel parameters. */ > > > +static int __init housekeeping_setup(char *str) > > > { > > > - if (!tick_nohz_full_enabled()) > > > - return; > > > - > > > - if (!alloc_cpumask_var(&housekeeping_mask, GFP_KERNEL)) { > > > - WARN(1, "NO_HZ: Can't allocate not-full dynticks cpumask\n"); > > > - cpumask_clear(tick_nohz_full_mask); > > > - tick_nohz_full_running = false; > > > - return; > > > + alloc_bootmem_cpumask_var(&housekeeping_mask); > > > + if (cpulist_parse(str, housekeeping_mask) < 0) { > > > + pr_warn("Housekeeping: Incorrect cpumask\n"); > > > + free_bootmem_cpumask_var(housekeeping_mask); > > > + return 1; > > > } > > > > > > - cpumask_andnot(housekeeping_mask, > > > - cpu_possible_mask, tick_nohz_full_mask); > > > - > > > static_branch_enable(&housekeeping_overriden); > > > > > > /* We need at least one CPU to handle housekeeping work */ > > > WARN_ON_ONCE(cpumask_empty(housekeeping_mask)); > > > + > > > + return 1; > > > } > > > +__setup("housekeeping=", housekeeping_setup); > > > > Am I right that from now on nohz_full= users will also have > > to specify housekeeping= in order to get nohz_full working? > > If that's correct, then won't this patch break nohz_full for > > existing setups? > > nohz_full= will still work but will only imply tick stop. A few isolation > details that were enabled by nohz_full= won't be handled anymore such as: > unbound timers affinity, watchdog disablement, rcu threads affinity, sched idle > load balancing... Those are now handled by housekeeping= > > So yes in a sense, this can break some setup that assume nohz_full= does more > than stopping the tick. Yes, the problem is that this is how it has always worked. Also, the breakage will be very subtle and hard to debug. > Perhaps I should remove the nohz_full= parameter altogether and let nohz_full controlled > by housekeeping= only. How much can kernel parameters be considered as kernel ABIs? That's a very good question, I don't have an answer for that. > Also I'm wondering if "housekeeping=" is a clear name for users. "isolation=" or > "cpu_isolation=" would be better and more obvious. Housekeeping based naming would only be > internal implementation detail. And deactivating the tick through "cpu_isolation=" would > be clearer than if we did through "housekeeping=". That's exactly my thinking while I was reviewing the series! > Of course the problem is that we already have "isolcpus=". But re-implementing isolcpus > on top of housekeeping might be a good idea. I believe that the current implementation on > top of NULL domains isn't much beloved. A less controversial implementation might even > allow us to control it though cpusets. You're completely right. Some people don't use isolcpus= because it disables load balancing and that may be a problem for setups where tasks are pinned to a set of CPUs where the number of tasks is greater than the number of CPUs. However, for the cases where you have a single task pinned to a CPU, having load balancing taking place adds an extra latency (I won't remember how much, but I guess it was more than 10us). If there's a way to "disable" load balancing from user-space, say with cpusets, then I think we should keep the isolated CPUs attached to a domain as you suggest. Another detail about isolcpus= is that it doesn't isolate the CPU from kernel threads. That is, unpinned kernel threads are allowed to run on CPUs not isolated with isolcpus=. We might consider changing that for a new isolation option. I know that there are many arguments against isolcpus= and some people advice using cpusets. The problem with that advice is that isolcpus= goes a bit beyond isolating a CPU from user-space tasks. One additional thing is does for example, is pinning the kernel_init() thread to housekeeping CPUs. This is key, because that thread will create timers at early boot that will pin themselves to the CPU they run. Finally, I'm wondering how all this will fit together with TASK_ISOLATION. One of the questions I ask myself is: can/should the things TASK_ISOLATION does be done by a kernel command-line parameter instead? Or should we try to come up with a list of global things to control (eg. the tick, kernel thread affinity, etc) and per-task controls?