Date: Mon, 1 Jul 2019 14:55:52 +0100
From: Patrick Bellasi
To: Peter Zijlstra
Cc: subhra mazumdar, linux-kernel@vger.kernel.org, mingo@redhat.com,
 tglx@linutronix.de, steven.sistare@oracle.com, dhaval.giani@oracle.com,
 daniel.lezcano@linaro.org, vincent.guittot@linaro.org,
 viresh.kumar@linaro.org, tim.c.chen@linux.intel.com,
 mgorman@techsingularity.net, Paul Turner, riel@surriel.com,
 morten.rasmussen@arm.com
Subject: Re: [RESEND PATCH v3 0/7] Improve scheduler scalability for fast path
Message-ID: <20190701135552.kb4os6bxxhh2lyw6@e110439-lin>
References: <20190627012919.4341-1-subhra.mazumdar@oracle.com>
 <20190701090204.GQ3402@hirez.programming.kicks-ass.net>
In-Reply-To:
 <20190701090204.GQ3402@hirez.programming.kicks-ass.net>

On 01-Jul 11:02, Peter Zijlstra wrote:
> On Wed, Jun 26, 2019 at 06:29:12PM -0700, subhra mazumdar wrote:
> > Hi,
> >
> > Resending this patchset, will be good to get some feedback. Any suggestions
> > that will make it more acceptable are welcome. We have been shipping this
> > with Unbreakable Enterprise Kernel in Oracle Linux.
> >
> > Current select_idle_sibling first tries to find a fully idle core using
> > select_idle_core which can potentially search all cores and if it fails it
> > finds any idle cpu using select_idle_cpu. select_idle_cpu can potentially
> > search all cpus in the llc domain. This doesn't scale for large llc domains
> > and will only get worse with more cores in future.
> >
> > This patch solves the scalability problem by:
> >  - Setting an upper and lower limit of idle cpu search in select_idle_cpu
> >    to keep search time low and constant
> >  - Adding a new sched feature SIS_CORE to disable select_idle_core
> >
> > Additionally it also introduces a new per-cpu variable next_cpu to track
> > the limit of search so that every time search starts from where it ended.
> > This rotating search window over cpus in LLC domain ensures that idle
> > cpus are eventually found in case of high load.
>
> Right, so we had a wee conversation about this patch series at OSPM, and
> I don't see any of that reflected here :-(
>
> Specifically, given that some people _really_ want the whole L3 mask
> scanned to reduce tail latency over raw throughput, while you guys
> prefer the other way around, it was proposed to extend the task model.
>
> Specifically something like a latency-nice was mentioned (IIRC) where a

Right, AFAIR PaulT suggested adding support for the concept of a task
being "latency tolerant": meaning we can spend more time searching for
a CPU and/or avoid preempting the current task.

> task can give a bias but not specify specific behaviour. This is very
> important since we don't want to be ABI tied to specific behaviour.

I like the idea of biasing, especially considering we are still in the
domain of the FAIR scheduler. If something more mandatory is required,
there are other classes which are likely more appropriate.

> Some of the things we could tie to this would be:
>
>  - select_idle_siblings; -nice would scan more than +nice,

Just to be sure, you are not proposing to use the nice value we
already have, i.e.

   p->{static,normal}_prio

but instead a new, similar concept, right?

Otherwise, the pro would be that we don't touch userspace, but the con
is that we would have side effects, i.e. on bandwidth allocation.
And I think we don't want to mix "specific behaviors" with "biases".

>  - wakeup preemption; when the wakee has a relative smaller
>    latency-nice value than the current running task, it might preempt
>    sooner and the other way around of course.

I think we currently have a single system-wide parameter for that:

   sched_wakeup_granularity_ns ==> sysctl_sched_min_granularity

which is used in:

   wakeup_gran()         for the wakeup path
   check_preempt_tick()  for the periodic tick

That's where it should be possible to extend the heuristics with some
biasing based on the latency-nice attribute of a task, right?

>  - pack-vs-spread; +nice would pack more with like tasks (since we
>    already spread by default [0] I don't think -nice would affect much
>    here).

That will be very useful for the Android case too.
In Android we used to call it "prefer_idle"; that's probably not the
best name, but it is conceptually similar.
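
The two biasing ideas above (scan width and wakeup preemption) can be
sketched roughly as follows. This is only an illustration, not code
from this series or from the kernel: the latency-nice range of
[-20, 19], the linear scaling, and the helper names sis_scan_limit()
and biased_wakeup_gran() are all assumptions made for the example.

```c
#include <assert.h>

/* Hypothetical range, mirroring the existing nice range. */
#define LATENCY_NICE_MIN	(-20)
#define LATENCY_NICE_MAX	19
#define LATENCY_NICE_SPAN	(LATENCY_NICE_MAX - LATENCY_NICE_MIN)	/* 39 */

/* Clamp a value into the hypothetical latency-nice range. */
static int latency_nice_clamp(int latency_nice)
{
	if (latency_nice < LATENCY_NICE_MIN)
		return LATENCY_NICE_MIN;
	if (latency_nice > LATENCY_NICE_MAX)
		return LATENCY_NICE_MAX;
	return latency_nice;
}

/*
 * Hypothetical helper: how many CPUs of the LLC domain to scan for an
 * idle one. -20 scans the whole LLC, +19 scans a single CPU, values in
 * between are interpolated linearly.
 */
static unsigned int sis_scan_limit(int latency_nice, unsigned int llc_size)
{
	unsigned int weight = LATENCY_NICE_MAX - latency_nice_clamp(latency_nice);

	assert(llc_size > 0);
	return 1 + ((llc_size - 1) * weight) / LATENCY_NICE_SPAN;
}

/*
 * Hypothetical helper: scale the wakeup granularity by the difference
 * between the wakee's and the current task's latency-nice. A wakee with
 * a smaller value gets a smaller granularity (preempts sooner), a wakee
 * with a larger value gets a larger one (preempts later).
 */
static unsigned long long biased_wakeup_gran(unsigned long long base_gran_ns,
					     int wakee_latency_nice,
					     int curr_latency_nice)
{
	int delta = latency_nice_clamp(wakee_latency_nice) -
		    latency_nice_clamp(curr_latency_nice);	/* -39..39 */

	/* Scale factor ranges from 0 (delta = -39) to 2 (delta = +39). */
	return base_gran_ns * (unsigned long long)(LATENCY_NICE_SPAN + delta) /
	       LATENCY_NICE_SPAN;
}
```

With a 40-CPU LLC, this sketch would scan all 40 CPUs for a latency-nice
of -20, 20 CPUs for 0, and a single CPU for +19. Both helpers only bias
an existing heuristic; neither promises a specific behaviour.
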

In Android we would use a latency-nice concept to go for either the fast
(select_idle_siblings) or the slow (energy aware) path.

> Hmmm?

Just one more requirement I think is worth considering from the
beginning: CGroups support.

That would be a very welcome interface, simply because it is so much
more convenient (and safe) to set these biases on a group of tasks
depending on their role in the system.

Do you have any idea how we could expose such a "latency-nice"
property via CGroups? It's very similar to cpu.shares, but it does not
represent a resource which can be partitioned.

Best,
Patrick

-- 
#include

Patrick Bellasi