Date: Mon, 28 Aug 2017 19:33:19 +0200
From: Frederic Weisbecker
To: Peter Zijlstra
Cc: LKML, Chris Metcalf, Thomas Gleixner, Luiz Capitulino,
    Christoph Lameter, "Paul E. McKenney", Ingo Molnar, Mike Galbraith,
    Rik van Riel, Wanpeng Li
Subject: Re: [RFC PATCH 12/12] housekeeping: Reimplement isolcpus on housekeeping
Message-ID: <20170828173315.GA3631@lerouge>
In-Reply-To: <20170828162416.nmdfvutqaki4sahx@hirez.programming.kicks-ass.net>
References: <1503453071-952-1-git-send-email-fweisbec@gmail.com>
    <1503453071-952-13-git-send-email-fweisbec@gmail.com>
    <20170828100957.jcjhh77ylxvsyisy@hirez.programming.kicks-ass.net>
    <20170828132302.GA32618@lerouge>
    <20170828133116.zu3xujkkmb4cmks2@hirez.programming.kicks-ass.net>
    <20170828152714.GB32618@lerouge>
    <20170828162416.nmdfvutqaki4sahx@hirez.programming.kicks-ass.net>

On Mon, Aug 28, 2017 at 06:24:16PM +0200, Peter Zijlstra wrote:
> On Mon, Aug 28, 2017 at 05:27:15PM +0200, Frederic Weisbecker wrote:
> > On Mon, Aug 28, 2017 at 03:31:16PM +0200, Peter Zijlstra wrote:
>
> > > I'm fairly sure that was very intentional. If you want to isolate stuff
> > > you don't want load-balancing.
> >
> > Yes I guess that was intentional. In fact having NULL domains is convenient
> > as it also isolates from many things: tasks, workqueues, timers.
>
> Huh, what? That's entirely unrelated to the NULL domain.
>
> The reason people like isolcpus= is that it ensures _nothing_ runs on
> those CPUs before you explicitly place something there.
>
> _That_ is what ensures there are no timers etc. placed on those CPUs.

Sure that's what I meant.

>
> Once you run something on that CPU, it stays there.
>
> It is also what I dislike about isolcpus, it's a boot time feature, if
> you want to reconfigure your system you need a reboot.

Indeed.

>
> > Although for example I guess (IIUC) that if you create an unbound
> > timer on a NULL domain, it will be stuck on it for ever as we can't
> > walk any hierarchy from the current CPU domain.
>
> Not sure what you're on about. Timers have their own hierarchy.

Check out get_nohz_timer_target() which relies on scheduler hierarchies to
look up a CPU to enqueue an unpinned timer on.

>
> > I'm not sure how much that can apply to unbound workqueues
> > as well.
>
> Well, unbound workqueues will not immediately end up on those CPUs,
> since they'll have an affinity exclusive of those CPUs per construction.

Ah that's right.

> But IIRC there's an affinity setting for workqueues where you could
> force it on if you wanted to.

Yep: /sys/devices/virtual/workqueue/cpumask

>
> > But the thing is with NULL domains: things can not migrate in and neither
> > can they migrate out, which is not exactly what CPU isolation wants.
>
> No, it's exactly what they want. You get what you put in and nothing
> more. If you want something else, use cpusets.

That's still a subtle behaviour that involves knowledge of some scheduler
core details. I wish we hadn't exposed such a low level scheduler control
as a general purpose kernel parameter.

Anyway at least that confirms one worry we had: kernel parameters are kernel
ABI that we can't break.

>
> > > Now, I completely hate the isolcpus feature and wish it a speedy death,
> > > but replacing it with something sensible is difficult because cgroups
> > > :-(
> >
> > Ah, that would break cgroup somehow?
>
> Well, ideally something like this would start the system with all the
> 'crap' threads in !root cgroup. But that means cgroupfs needs to be
> populated with at least two directories on boot. And current cgroup
> cruft doesn't expect that.

Ah I see.

Thanks!
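
As an aside on the workqueue cpumask file mentioned above: a rough userspace
sketch of how one might check whether the unbound workqueue affinity overlaps
a set of isolated CPUs. It assumes the file holds the usual sysfs bitmap
format (comma-separated 32-bit hex words, most significant first) and that
the isolated set is given in the basic cpulist form accepted by isolcpus=
(e.g. "2-3,6"). The function names are hypothetical and nothing here comes
from the patch series itself.

```python
def parse_cpumask(mask):
    """Turn a sysfs-style hex cpumask (e.g. "00000001,0000000f") into CPU ids."""
    # Assumption: every word after the first is zero-padded to 8 hex digits,
    # so dropping the commas yields one big hexadecimal number.
    value = int(mask.strip().replace(",", ""), 16)
    return {cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1}


def parse_cpulist(cpulist):
    """Turn a basic cpulist such as "2-3,6" into a set of CPU ids."""
    cpus = set()
    for chunk in cpulist.strip().split(","):
        if not chunk:
            continue
        first, _, last = chunk.partition("-")
        # A chunk without "-" names a single CPU, so it is its own upper bound.
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def workqueue_overlap(wq_mask, isolated):
    """CPUs that are both in the workqueue mask and in the isolated list."""
    return parse_cpumask(wq_mask) & parse_cpulist(isolated)


if __name__ == "__main__":
    # Illustrative values only; on a real system wq_mask would be read from
    # /sys/devices/virtual/workqueue/cpumask.
    print(sorted(workqueue_overlap("ff", "2-3")))
```

An empty result would mean unbound work items are kept off the isolated CPUs
by affinity; a non-empty one would mean somebody forced them on.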