Date: Mon, 14 Aug 2017 19:01:09 +0200
From: Frederic Weisbecker
To: Luiz Capitulino
Cc: LKML, Peter Zijlstra, Chris Metcalf, Thomas Gleixner, Christoph Lameter,
    "Paul E. McKenney", Ingo Molnar, Mike Galbraith, Rik van Riel, Wanpeng Li
Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz
Message-ID: <20170814170107.GA27479@lerouge>
In-Reply-To: <20170813111340.0ade6d58@redhat.com>

On Sun, Aug 13, 2017 at 11:13:40AM -0400, Luiz Capitulino wrote:
> On Sat, 12 Aug 2017 16:10:06 +0200
> Frederic Weisbecker wrote:
>
> > > Am I right that from now on nohz_full= users will also have
> > > to specify housekeeping= in order to get nohz_full working?
> > > If that's correct, then won't this patch break nohz_full for
> > > existing setups?
> >
> > nohz_full= will still work but will only imply tick stop. A few isolation
> > details that were enabled by nohz_full= won't be handled anymore such as:
> > unbound timers affinity, watchdog disablement, rcu threads affinity, sched idle
> > load balancing...
> > Those are now handled by housekeeping=
> >
> > So yes in a sense, this can break some setups that assume nohz_full= does more
> > than stopping the tick.
>
> Yes, the problem is that this is how it has always worked. Also,
> the breakage will be very subtle and hard to debug.

[...]

> > Perhaps I should remove the nohz_full= parameter altogether and let nohz_full be controlled
> > by housekeeping= only. How much can kernel parameters be considered as kernel ABIs?
>
> That's a very good question, I don't have an answer for that.

That said, "nohz_full=" never implied many isolation features so far, and
those have often changed over time, as in RCU. I think unbound timer affinity
is the most important one. Perhaps we can keep "nohz_full=1-15" as an alias for
a future "cpu_isolation=nohz,1-15" and at least imply unbound timer affinity
with it.

>
> > Also I'm wondering if "housekeeping=" is a clear name for users. "isolation=" or
> > "cpu_isolation=" would be better and more obvious. Housekeeping based naming would only be
> > an internal implementation detail. And deactivating the tick through "cpu_isolation=" would
> > be clearer than if we did it through "housekeeping=".
>
> That's exactly my thinking while I was reviewing the series!
>
> > Of course the problem is that we already have "isolcpus=". But re-implementing isolcpus
> > on top of housekeeping might be a good idea. I believe that the current implementation on
> > top of NULL domains isn't much beloved. A less controversial implementation might even
> > allow us to control it through cpusets.
>
> You're completely right. Some people don't use isolcpus= because it
> disables load balancing and that may be a problem for setups where
> tasks are pinned to a set of CPUs where the number of tasks is greater
> than the number of CPUs.
> However, for the cases where you have a
> single task pinned to a CPU, having load balancing taking place adds
> an extra latency (I don't remember how much, but I guess it was more
> than 10us).

What is the source of the load balancing inducing such latency when a single
task is affine to a CPU? If this is idle load balancing, it is now affine to
housekeepers. If this is task wakeup then it's surprising because
select_task_rq() is optimized toward single CPU affinity. Is there another
source I'm overlooking?

> If there's a way to "disable" load balancing from user-space, say
> with cpusets, then I think we should keep the isolated CPUs attached
> to a domain as you suggest.

I'm not sure such a solution would be accepted. The most sensible way to
disable load balancing is still to tune the affinity of tasks. If there
is an off-case overhead with load balancing (i.e. when no more than one task
is affine to that CPU) then we should solve that with a fast path.

> Another detail about isolcpus= is that it doesn't isolate the CPU
> from kernel threads. That is, unpinned kernel threads are allowed
> to run on CPUs not isolated with isolcpus=. We might consider changing
> that for a new isolation option.

You mean unpinned kernel threads are allowed to run on isolcpus, right?
That definitely can be solved.

>
> I know that there are many arguments against isolcpus= and some people
> advise using cpusets. The problem with that advice is that isolcpus=
> goes a bit beyond isolating a CPU from user-space tasks. One additional
> thing it does, for example, is pinning the kernel_init() thread to
> housekeeping CPUs. This is key, because that thread will create timers
> at early boot that will pin themselves to the CPU they run on.

Right, but also unbound timers are affine to housekeepers, we needed that
for nohz_full.

> Finally, I'm wondering how all this will fit together with TASK_ISOLATION.
> One of the questions I ask myself is: can/should the things TASK_ISOLATION
> does be done by a kernel command-line parameter instead? Or should we
> try to come up with a list of global things to control (eg. the tick,
> kernel thread affinity, etc) and per-task controls?

So I've been thinking a lot about that lately. I told Chris that TASK_ISOLATION
shouldn't be a CPU feature but a task feature. Then I realized that it doesn't
work either, my bad :-)

In the end I think that most of it must be a CPU property: nohz, task
isolation, timers and workqueue affinity, etc... Then what's left for the
per-task thing is to tell the task when it is unexpectedly interrupted by noise.

Therefore I think most of the isolation features should be controlled by the
command line and cpusets (through a new cpuset subsystem maybe), then
TASK_ISOLATION through prctl() for the noise monitoring.

Thanks.
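
For illustration only, here is a minimal user-space sketch of the "tune the
affinity of tasks" approach mentioned above. It is not taken from the patch
series. The helper names (parse_cpu_list, housekeeping_cpus, pin_current_task)
are hypothetical, the parser handles only simple lists such as "1-15" or
"1-3,8", and treating the housekeeping set as "every available CPU that is not
isolated" is an assumption made for the example. os.sched_setaffinity() and
os.sched_getaffinity() are the standard Python wrappers for the Linux calls.

```python
import os


def parse_cpu_list(spec):
    """Parse a simple kernel-style CPU list such as "1-3,8" into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def housekeeping_cpus(isolated, available):
    """Assumption for this sketch: housekeepers are the non-isolated CPUs."""
    return set(available) - set(isolated)


def pin_current_task(cpu):
    """Restrict the calling task to a single CPU and return its new affinity."""
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling task
    return os.sched_getaffinity(0)
```

With a single task pinned this way, the scheduler has only one CPU to pick for
that task, which is the case the fast path mentioned above would cover.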