Received: by 10.192.165.148 with SMTP id m20csp199938imm; Fri, 20 Apr 2018 05:30:23 -0700 (PDT) X-Google-Smtp-Source: AIpwx48jgTFASLVhIYxg/eOjlPF3AjgBB2j1cGenwBjOTVHnLjo/ybQEHhK7AeXXamQmh+zDkw/9 X-Received: by 10.99.104.71 with SMTP id d68mr8137840pgc.99.1524227423264; Fri, 20 Apr 2018 05:30:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1524227423; cv=none; d=google.com; s=arc-20160816; b=LinHmCsHrFmi4osJZ/hyaFHX272h6uPLD+0d9YLRl14BcBRGN1aNzlbjvIf/PVn1KF CaupvRFedANvbSVupqvlL/xL9EuI9FOACwaZm7E/wEOaFAkHPptCtynxhHMHnNIEqhes 7PSV81OUMoAaV9f3/yCWY9AvjlPqcdAXYddkc/gDqApf2J12dVG/ZYPKTjEJ1lx/Xxbv 0g9IdTizJ3Kxk1LGrXtcF7SbELMht2grSunHOf5o9/UaYKhDadBQZ043EZUrR2PeX1Wh i1NvGpFwtT2qK2mxn88muSgVqktHWIiH8+6yPIjZjlmncGHHj7Fo1DEzvOtsH92evjYj WGgQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :organization:references:in-reply-to:message-id:subject:cc:to:from :date:dkim-signature:arc-authentication-results; bh=K2YRr8JK3mB+QGCH/7qM3SEQ03lkAleT9IIwM9RpW4w=; b=Hqwc7P/yDwrzos2ojc77D8I+EwkEg8iB57sH0Lzyw6OcckSuK5tthzFdOxPaOx5FmP dpWUq16g4R6GOT4vcnOz9S3eN0mZxLsCsx/wB2qWV/PFpoCbZhklhD0dU1FB5Z7I63mY UaExcRpeX752vi4BKe/4FibNLSDMAdBXikIEvDzrkIvtT0EWP6a56ED9piZdI+jzO73o iJ/SG5kw71PJChXrkSeyjf9n5fdpxlr5BfxKaOJPaaXE3sLQ36RE10RMkCZiw6td8iBy L1LUUwa9bzJXXoFfGIncgYUQIgsMyzbhG2T+k4UP64Qh0E8dIgQ+5bLQbvVA1tQt1M2U YeMg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=CHc8YDaQ; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id s16si2022528pgv.596.2018.04.20.05.30.09; Fri, 20 Apr 2018 05:30:23 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=CHc8YDaQ; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754833AbeDTM24 (ORCPT + 99 others); Fri, 20 Apr 2018 08:28:56 -0400 Received: from mail-pl0-f50.google.com ([209.85.160.50]:41149 "EHLO mail-pl0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754726AbeDTM2y (ORCPT ); Fri, 20 Apr 2018 08:28:54 -0400 Received: by mail-pl0-f50.google.com with SMTP id bj1-v6so5189639plb.8 for ; Fri, 20 Apr 2018 05:28:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:in-reply-to:references :organization:mime-version:content-transfer-encoding; bh=K2YRr8JK3mB+QGCH/7qM3SEQ03lkAleT9IIwM9RpW4w=; b=CHc8YDaQLdR6tVHRNG529fMlzJYr7xS5xD2/bohkWJfJEM1gKPjDqXCxIcGLXQZNDH v9qo2dNO0QCAAGopbs9uhMJ8Vk9G5MvGFG75M7mAwm4rSqNs0mbUYF6uZ8nac95ln6q/ 3A7GYqfXXeiR+R6Hwj4qQ+OtP/Pa5VctelMch3/FnBQhwhUK+zUL9mqqPm9+TCxLnodY 2o2Yax48C5sL2oLNRlvKEr1KoxV+KqlraSW/7ayxyOTyEqh2x2i6Jj91wuABu+S1C07e AwzVIGUl49KQs8SfJFswC1zMLjsyfugbIqH3/Z17Nd7vSgVVSrS0x/F2FH/5b96FOZp7 pV5A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to :references:organization:mime-version:content-transfer-encoding; bh=K2YRr8JK3mB+QGCH/7qM3SEQ03lkAleT9IIwM9RpW4w=; b=ROnjumiwm39k69fQzn+Y+VUDYHWRTMLPMyZ18WPyI7KTOD2bVzSs3lA4w1HqYc4eEg DQs0KMJU7Gq7r3voBIagnczcA7e7fWtvBkiXmT5QVcW53SUTJZ4A7H+rZXnt+4a0s5sQ uFsYu2CM8vTSPRlKckTMiMV1o6ConkEdB18ZdVCysrKtDjxix4jB3wlCXvrtCTYzNNvV eGrOp7YsO4eGFiPQ2Re04NghUyt8ll19W6swTTsfv5eyo2FjyjOabzIKrVOUFpV+Eg/P xK2/fDFIFLfzbhmvebGaYHXrLoIkdskAKCSmimEmGk7E6IICy2NWiXm/DydCc+goSNL2 7zxA== X-Gm-Message-State: ALQs6tB/xONaDAfOmrj2875ugT/5g3AItqFnYrnxFeF9NF1FUJpvXAHL 5At1x2UQ036Q5DNlxVcylubWSw== X-Received: by 2002:a17:902:4483:: with SMTP id l3-v6mr10182005pld.282.1524227333385; Fri, 20 Apr 2018 05:28:53 -0700 (PDT) Received: from roar.ozlabs.ibm.com (59-102-70-78.tpgi.com.au. [59.102.70.78]) by smtp.gmail.com with ESMTPSA id e4sm11979906pfa.128.2018.04.20.05.28.49 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Fri, 20 Apr 2018 05:28:52 -0700 (PDT) Date: Fri, 20 Apr 2018 22:28:42 +1000 From: Nicholas Piggin To: Peter Zijlstra Cc: linux-kernel@vger.kernel.org, Ingo Molnar , "Rafael J. Wysocki" , "Paul E . McKenney" Subject: Re: [RFC PATCH] kernel/sched/core: busy wait before going idle Message-ID: <20180420222842.33dc5b33@roar.ozlabs.ibm.com> In-Reply-To: <20180420105827.GK4064@hirez.programming.kicks-ass.net> References: <20180415133149.24112-1-npiggin@gmail.com> <20180420074456.GA4064@hirez.programming.kicks-ass.net> <20180420190126.1644f4cd@roar.ozlabs.ibm.com> <20180420105827.GK4064@hirez.programming.kicks-ass.net> Organization: IBM X-Mailer: Claws Mail 3.16.0 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 20 Apr 2018 12:58:27 +0200 Peter Zijlstra wrote: > On Fri, Apr 20, 2018 at 07:01:47PM +1000, Nicholas Piggin wrote: > > On Fri, 20 Apr 2018 09:44:56 +0200 > > Peter Zijlstra wrote: > > > > > On Sun, Apr 15, 2018 at 11:31:49PM +1000, Nicholas Piggin wrote: > > > > This is a quick hack for comments, but I've always wondered -- > > > > if we have a short term polling idle states in cpuidle for performance > > > > -- why not skip the context switch and entry into all the idle states, > > > > and just wait for a bit to see if something wakes up again. > > > > > > Is that context switch so expensive? > > > > I guess relatively much more than taking one branch mispredict on the > > loop exit when the task wakes. 10s of cycles vs 1000s? > > Sure, just wondering how much. And I'm assuming you're looking at Power > here, right? Well I'll try to get more numbers. Yes, talking about power. It trails x86 on context switches by a bit, but similar orders of magnitude. My skylake is doing ~1900 cycles syscall + context switch with a distro kernel. POWER9 is ~2500. > > > And what kernel did you test on? We recently merged a bunch of patches > > > from Rafael that avoided disabling the tick for short idle predictions. > > > This also has a performance improvements for such workloads. Did your > > > kernel include those? > > > > Yes that actually improved profiles quite a lot, but these numbers were > > with those changes. I'll try to find some fast disks or network and get > > some more more interesting numbers. > > OK, good that you have those patches in. That ensures you're not trying > to fix something that's possibly already addressed elsewhere. Yep. > > > > > It's not uncommon to see various going-to-idle work in kernel profiles. > > > > This might be a way to reduce that (and just the cost of switching > > > > registers and kernel stack to idle thread). This can be an important > > > > path for single thread request-response throughput. > > > > > > So I feel that _if_ we do a spin here, it should only be long enough to > > > amortize the schedule switch context. > > > > > > However, doing busy waits here has the downside that the 'idle' time is > > > not in fact fed into the cpuidle predictor. > > > > That's why I cc'ed Rafael :) > > > > Yes the latency in my hack is probably too long, but I think if we did > > this, the cpuile predictor could become involved here. There is no > > fundamental reason it has to wait for the idle task to be context > > switched for that... it's already become involved in core scheduler > > code. > > Yes, cpuidle/cpufreq are getting more and more intergrated so there is > no objection from that point. > > Growing multiple 'idle' points otoh is a little dodgy and could cause > some maintenance issues. Right, it should be done a bit better than my patch, which is just a hack. > Of course, this loop would have the same idle-duration problems as the > poll_state.c one. We should probably use that code. Also, do we want to > ask the estimator before doing this? If it predicts a very long idle > time, spinning here is just wasting cycles. I would say so, yes. I think if we did go this route, it should take over the the existing polling idle states, so it would make sense to control it in a similar way. (Unless polling idle is the only state available of course we need to switch to it eventually, and we must immediately switch in case of do_task_dead, etc) Anyway I'll wait for the merge window to settle and try to get some more numbers. I basically just wanted to see if there were any fundamental problems with the concept. Thanks, Nick