Date: Mon, 14 Aug 2017 13:34:40 -0400
From: Luiz Capitulino
To: Frederic Weisbecker
Cc: LKML, Peter Zijlstra, Chris Metcalf, Thomas Gleixner, Christoph Lameter, "Paul E . McKenney", Ingo Molnar, Mike Galbraith, Rik van Riel, Wanpeng Li
Subject: Re: [RFC PATCH 7/9] housekeeping: Use own boot option, independant from nohz
Message-ID: <20170814133440.3dc31bad@redhat.com>
In-Reply-To: <20170814170107.GA27479@lerouge>
References: <1500643290-25842-1-git-send-email-fweisbec@gmail.com> <1500643290-25842-8-git-send-email-fweisbec@gmail.com> <20170811123927.33e094f3@redhat.com> <20170812141004.GA21918@lerouge> <20170813111340.0ade6d58@redhat.com> <20170814170107.GA27479@lerouge>
Organization: Red Hat

On Mon, 14 Aug 2017 19:01:09 +0200 Frederic Weisbecker wrote:

> > > Perhaps I should remove the nohz_full= parameter altogether and let nohz_full be controlled
> > > by housekeeping= only. How much can kernel parameters be considered as kernel ABIs?
> >
> > That's a very good question, I don't have an answer for that.
>
> That said, "nohz_full=" has never implied many isolation features so far, and those have
> often changed over time, as in RCU. I think unbound timer affinity is the most important
> one.
>
> Perhaps we can keep "nohz_full=1-15" as an alias for a future "cpu_isolation=nohz,1-15"
> and at least imply unbound timer affinity with it.

That would work for me.

> > > Also I'm wondering if "housekeeping=" is a clear name for users. "isolation=" or
> > > "cpu_isolation=" would be better and more obvious. Housekeeping-based naming would only be
> > > an internal implementation detail. And deactivating the tick through "cpu_isolation=" would
> > > be clearer than if we did it through "housekeeping=".
> >
> > That's exactly my thinking while I was reviewing the series!
> >
> > > Of course the problem is that we already have "isolcpus=". But re-implementing isolcpus
> > > on top of housekeeping might be a good idea. I believe that the current implementation on
> > > top of NULL domains isn't much beloved. A less controversial implementation might even
> > > allow us to control it through cpusets.
> >
> > You're completely right. Some people don't use isolcpus= because it
> > disables load balancing and that may be a problem for setups where
> > tasks are pinned to a set of CPUs where the number of tasks is greater
> > than the number of CPUs. However, for the cases where you have a
> > single task pinned to a CPU, having load balancing take place adds
> > extra latency (I don't remember how much, but I guess it was more
> > than 10us).
>
> What is the source of the load balancing inducing such latency when a single
> task is affine to a CPU? If this is idle load balancing, it is now affine to
> housekeepers. If this is task wakeup then it's surprising because select_task_rq()
> is optimized toward single CPU affinity.

I guess it was idle load balancing, but I don't remember because this
was a few years ago.
I think this might be reproducible without using isolcpus=. I'll give
it a try shortly and let you know.

> Is there another source I'm overlooking?
>
> > If there's a way to "disable" load balancing from user-space, say
> > with cpusets, then I think we should keep the isolated CPUs attached
> > to a domain as you suggest.
>
> I'm not sure such a solution would be accepted. The most sensible way
> to disable load balancing is still to tune the affinity of tasks. If there
> is an off-case overhead with load balancing (i.e. when no more than one
> task is affine to that CPU) then we should solve that with a fast path.

OK, I'll take a look.

> > Another detail about isolcpus= is that it doesn't isolate the CPU
> > from kernel threads. That is, unpinned kernel threads are allowed
> > to run on CPUs not isolated with isolcpus=. We might consider changing
> > that for a new isolation option.
>
> You mean unpinned kernel threads are allowed to run on isolcpus, right?

Exactly.

> That definitely can be solved.
>
> > I know that there are many arguments against isolcpus= and some people
> > advise using cpusets. The problem with that advice is that isolcpus=
> > goes a bit beyond isolating a CPU from user-space tasks. One additional
> > thing it does, for example, is pinning the kernel_init() thread to
> > housekeeping CPUs. This is key, because that thread will create timers
> > at early boot that will pin themselves to the CPU they run on.
>
> Right, but also unbound timers are affine to housekeepers, we needed that for
> nohz_full.
>
> > Finally, I'm wondering how all this will fit together with TASK_ISOLATION.
> > One of the questions I ask myself is: can/should the things TASK_ISOLATION
> > does be done by a kernel command-line parameter instead? Or should we
> > try to come up with a list of global things to control (e.g. the tick,
> > kernel thread affinity, etc.) and per-task controls?
>
> So I've been thinking a lot about that lately.
> I told Chris that TASK_ISOLATION
> shouldn't be a CPU feature but a task feature. Then I realized that it doesn't work
> either, my bad :-) In the end I think that most of it must be a CPU
> property: nohz, task isolation, timers and workqueue affinity, etc. Then what's
> left for the per-task part is to tell the task when it is unexpectedly interrupted by noise.
>
> Therefore I think most of the isolation features should be controlled by the
> command line and cpusets (through a new cpuset subsystem maybe), and TASK_ISOLATION
> through prctl() for the noise monitoring.

I agree.
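For reference, the "tune the affinity of tasks" approach mentioned above can be sketched from user space roughly as follows. This is a minimal illustration assuming Linux with glibc; the helper names pin_to_first_allowed_cpu() and allowed_cpu_count() are hypothetical, while sched_getaffinity(2), sched_setaffinity(2) and the CPU_* macros are the standard glibc interfaces. Note that a single-CPU mask only prevents the task from being migrated; it does not by itself remove whatever balancing work the kernel may still do on that CPU, which is the overhead discussed above.

```c
#define _GNU_SOURCE /* needed for sched_setaffinity() and the CPU_* macros */
#include <sched.h>
#include <stdio.h>

/*
 * Hypothetical helper: restrict the calling thread to exactly one CPU,
 * the lowest-numbered CPU in its current affinity mask. With a
 * single-CPU mask the scheduler cannot migrate the thread elsewhere.
 * Assumes the system has at most CPU_SETSIZE (1024) CPUs.
 * Returns the chosen CPU, or -1 on error.
 */
static int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, target;

    /* pid 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
        perror("sched_getaffinity");
        return -1;
    }

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;

        /* Build a mask containing only this CPU and apply it. */
        CPU_ZERO(&target);
        CPU_SET(cpu, &target);
        if (sched_setaffinity(0, sizeof(target), &target) != 0) {
            perror("sched_setaffinity");
            return -1;
        }
        return cpu;
    }
    return -1; /* empty mask: should not happen */
}

/* Hypothetical helper: number of CPUs the calling thread may run on. */
static int allowed_cpu_count(void)
{
    cpu_set_t mask;

    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    return CPU_COUNT(&mask);
}
```

From the shell, the same effect is usually obtained with taskset(1), e.g. `taskset -c 3 ./app` to start a program confined to CPU 3.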