Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp1880210ybi; Mon, 1 Jul 2019 02:03:30 -0700 (PDT) X-Google-Smtp-Source: APXvYqz4WCcYDvmxClwIqCeNUiKHp4uKGpUrnpcbW3r+vrJcDVipjuQ2Uo6Dn7jfcMINKexUOMLz X-Received: by 2002:a63:fd50:: with SMTP id m16mr14142481pgj.192.1561971810121; Mon, 01 Jul 2019 02:03:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1561971810; cv=none; d=google.com; s=arc-20160816; b=G5dofMgqOKMv9c1FwO34Z/y5cj3jmpL7sKml524VAavCoAccz/wG9U9iAsbZxurZwf dTksbf75QkzYtl+jmDPvWu2x/jE6S/y6/iQKHUpWp9jTeEXOYD02e136MLSWB7iY237V 3hdn3IE+ieRf7bXDPI46p813xGvUZJPMi+6auPgsAiIKAtkL+EjbZrf7NYaj40DFT5NV rBjkrf2M3AKBagm+gfVOmnKjR4v7mYpKalVxXjZ7k2qn7xvgqrbGDIqHUDSaeUmCNJ1s jGY0YHuGp439lYQeZxogKJK0dJmfwzchRFFqcFPb5tzKw//bR5eREi1ZbGTIunWp1kNw 5dIg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=nap7ea9ZuQDK+WQcOoOnWBJHxx44lH5AsQ2GFZK7jcc=; b=BXRW+mMcsdwaf1r03b1xUI6nSOPsmdLNoPkJdANqTUJS0r4vRMUgUnAoeEnOf39G9Q 6yO6xa+VHdSkA0GvhxL+28vhBYBd92sRvtlws0NY2QMKZwbi+YMWqT9oPth7g8YUrWvP B9fEK0+UVz3b3QAUxDs73xtMqU6mOryByl+vWwC7U9DuC0G/g0s8Mdd5txtXhRsIhC4V +jhPXzq7cqSuCx9WCNNJBfz60t32naxoUIgS2A29V0XfN3cApYJvO6JBlZv2+xAvK15g 7RXfLGolHSFEBHt1s6rmCS3BKrb8LRJOczTKI6fO1AYbFqNvCxmSt056fdtD9N6xsISM tI3A== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@infradead.org header.s=bombadil.20170209 header.b=A2o3MTX2; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id o70si9731643pgo.261.2019.07.01.02.03.13; Mon, 01 Jul 2019 02:03:30 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@infradead.org header.s=bombadil.20170209 header.b=A2o3MTX2; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728255AbfGAJCh (ORCPT + 99 others); Mon, 1 Jul 2019 05:02:37 -0400 Received: from bombadil.infradead.org ([198.137.202.133]:35574 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728184AbfGAJCh (ORCPT ); Mon, 1 Jul 2019 05:02:37 -0400 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20170209; h=In-Reply-To:Content-Type:MIME-Version :References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=nap7ea9ZuQDK+WQcOoOnWBJHxx44lH5AsQ2GFZK7jcc=; b=A2o3MTX2dBSo4ZP+AF6wKo1o0 OjeQCaS5lYE0tPjVyEuWkkEmVcJoL3YKVHYK3JevAUqrkF+05SNDF/QKUgCB4LiFYAjvNRsJDyi52 APEBbS2c924gXzNKlGKcHkLiE0ObDiZLNjTXmclcaHvE2w89auV6h+65OICAwRa+Q86vy4sqfftnS eOLz2lTxK5QOpqFuUh/newNO7R4F3NZ9G5VQ5ChZ6NXd4QCMYMAAvQEYYuprQFaqbHTZuAO8Dso3Z 0Hy7FL9ZBUXWOYHlh/21aLbcdmh+5EeYxbYVucXTsr7V3OfCpIZiVkOYdNfNsL1WS7Vg00j8eG+/A xO5dGpHFQ==; Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=hirez.programming.kicks-ass.net) by bombadil.infradead.org with esmtpsa (Exim 4.92 #3 (Red Hat Linux)) id 1hhsCM-0007P9-MX; Mon, 01 Jul 2019 09:02:06 +0000 Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000) id E32FD20AB58E2; Mon, 1 Jul 2019 11:02:04 +0200 (CEST) Date: Mon, 1 Jul 2019 11:02:04 +0200 From: Peter Zijlstra To: subhra mazumdar Cc: linux-kernel@vger.kernel.org, mingo@redhat.com, tglx@linutronix.de, steven.sistare@oracle.com, dhaval.giani@oracle.com, daniel.lezcano@linaro.org, vincent.guittot@linaro.org, viresh.kumar@linaro.org, tim.c.chen@linux.intel.com, mgorman@techsingularity.net, Paul Turner , riel@surriel.com, morten.rasmussen@arm.com Subject: Re: [RESEND PATCH v3 0/7] Improve scheduler scalability for fast path Message-ID: <20190701090204.GQ3402@hirez.programming.kicks-ass.net> References: <20190627012919.4341-1-subhra.mazumdar@oracle.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190627012919.4341-1-subhra.mazumdar@oracle.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Jun 26, 2019 at 06:29:12PM -0700, subhra mazumdar wrote: > Hi, > > Resending this patchset, will be good to get some feedback. Any suggestions > that will make it more acceptable are welcome. We have been shipping this > with Unbreakable Enterprise Kernel in Oracle Linux. > > Current select_idle_sibling first tries to find a fully idle core using > select_idle_core which can potentially search all cores and if it fails it > finds any idle cpu using select_idle_cpu. select_idle_cpu can potentially > search all cpus in the llc domain. This doesn't scale for large llc domains > and will only get worse with more cores in future. > > This patch solves the scalability problem by: > - Setting an upper and lower limit of idle cpu search in select_idle_cpu > to keep search time low and constant > - Adding a new sched feature SIS_CORE to disable select_idle_core > > Additionally it also introduces a new per-cpu variable next_cpu to track > the limit of search so that every time search starts from where it ended. > This rotating search window over cpus in LLC domain ensures that idle > cpus are eventually found in case of high load. Right, so we had a wee conversation about this patch series at OSPM, and I don't see any of that reflected here :-( Specifically, given that some people _really_ want the whole L3 mask scanned to reduce tail latency over raw throughput, while you guys prefer the other way around, it was proposed to extend the task model. Specifically something like a latency-nice was mentioned (IIRC) where a task can give a bias but not specify specific behaviour. This is very important since we don't want to be ABI tied to specific behaviour. Some of the things we could tie to this would be: - select_idle_siblings; -nice would scan more than +nice, - wakeup preemption; when the wakee has a relative smaller latency-nice value than the current running task, it might preempt sooner and the other way around of course. - pack-vs-spread; +nice would pack more with like tasks (since we already spread by default [0] I don't think -nice would affect much here). Hmmm?