From: Vincent Guittot
Date: Tue, 30 Jan 2018 14:05:59 +0100
Subject: Re: [RFC PATCH 2/5] sched: Add NOHZ_STATS_KICK
To: Valentin Schneider
Cc: Peter Zijlstra, Morten Rasmussen, Ingo Molnar, linux-kernel,
 Brendan Jackman, Dietmar Eggemann, Morten Rasmussen
In-Reply-To: <4fb17712-3f98-c407-42b3-4601a4f294d9@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 30 January 2018 at 12:41, Valentin Schneider wrote:
> (Resending because I snuck in some HTML... Apologies)
>
> On 01/30/2018 08:32 AM, Vincent Guittot wrote:
>>
>> On 29 January 2018 at 20:31, Valentin Schneider wrote:
>>>
>>> Hi Vincent, Peter,
>>>
>>> I've been running some tests on your patches (Peter's base + the 2 from
>>> Vincent). The results themselves are hosted at [1].
>>> The base of those tests is the same: a task ("accumulator") is run for
>>> 5 seconds (arbitrary value) to accumulate some load, then goes to sleep
>>> for .5 seconds.
>>>
>>> I've set up 3 test scenarios:
>>>
>>> Update by nohz_balancer_kick()
>>> ------------------------------
>>> Right before the "accumulator" task goes to sleep, a CPU-hogging task
>>> (100% utilization) is spawned on another CPU. It won't go idle, so the
>>> only way to update the blocked load generated by "accumulator" is to
>>> kick an ILB (NOHZ_STATS_KICK).
>>>
>>> The test shows that this is behaving nicely - we keep kicking an ILB
>>> every ~36ms (see next test for comments on that) until there is no more
>>> blocked load. I did however notice some interesting scenarios: after
>>> the load has been fully decayed, a tiny background task can spawn and
>>> end in less than a scheduling period. However, it still goes through
>>> nohz_balance_enter_idle(), and thus sets nohz.stats_state, which will
>>> later cause an ILB kick.
>>>
>>> This makes me wonder if it's worth kicking ILBs for such tiny load
>>> values - perhaps it could be worth having a margin to set
>>> rq->has_blocked_load ?
>>
>> It's difficult to know what the load/utilization on the cfs_rq will be
>> once the CPU wakes up. Even if the task runs for a really short time,
>> that doesn't mean its load/utilization is small: it can be a big task
>> that has just migrated and happens to have a very short wakeup this
>> time. That's why I don't make any assumption about the
>> utilization/load value when a CPU goes to sleep.
>>
> Right, hadn't thought about those kinds of migrations.
>
>>> Furthermore, this tiny task will cause the ILB to iterate over all of
>>> the idle CPUs, although only one has stale load. For load update via
>>> NEWLY_IDLE load_balance() we use:
>>>
>>> static bool update_nohz_stats(struct rq *rq)
>>> {
>>>         if (!rq->has_blocked_load)
>>>                 return false;
>>>         [...]
>>> }
>>>
>>> But for load update via _nohz_idle_balance(), we iterate through all
>>> of the nohz CPUs and unconditionally call update_blocked_averages().
>>> This could be avoided by remembering which CPUs have stale load before
>>> going idle. Initially I thought that was what nohz.stats_state was
>>> for, but it isn't. With Vincent's patches it's only ever set to either
>>> 0 or 1, but we could use it as a CPU mask, and use it to skip nohz
>>> CPUs that don't have stale load in _nohz_idle_balance() (when
>>> NOHZ_STATS_KICK).
>>
>> I have studied a way to keep track of how many CPUs still have blocked
>> load, to try to minimize the number of useless ILB kicks, but this
>> adds more atomic operations, which can impact system throughput under
>> heavy load with lots of very short wakeups. That's why I proposed this
>> simpler solution. But it's probably just a matter of where we want to
>> "waste" time: either we accept spending a bit more time checking the
>> state of idle CPUs, or we accept kicking an ILB from time to time for
>> no good reason.
>>
> Agreed. I have the feeling that spending more time doing atomic ops
> could be worth it - I'll try to test this out and see if it's actually
> relevant.
>
>>> Update by idle_balance()
>>> ------------------------
>>> Right before the "accumulator" task goes to sleep, a tiny periodic
>>> (period=32ms) task is spawned on another CPU. It's expected that it
>>> will update the blocked load in idle_balance(), either by running
>>> _nohz_idle_balance() locally or kicking an ILB (the overload flag
>>> shouldn't be set in this test case, so we shouldn't go through the
>>> NEWLY_IDLE load_balance()).
>>>
>>> This also seems to be working fine, but I'm noticing a delay between
>>> load updates that is closer to 64ms than 32ms.
>>> After digging into it I found out that the time checks done in
>>> idle_balance() and nohz_balancer_kick() are
>>> time_after(jiffies, next_stats), but IMHO they should be
>>> time_after_eq(jiffies, next_stats) to have 32ms-based updates. This
>>> also explains the ~36ms periodicity of the updates in the test above.
>>
>> I used 32ms as a minimum interval between updates. We must use
>> time_after() if we want at least 32ms between each update. We will see
>> a 36ms period if the previous update was triggered by the tick (just
>> after it, in fact), but only 32ms if the last update was done during
>> an idle_balance() that happened just before the tick. With
>> time_after_eq(), the update period would be between 28 and 32ms.
>>
>> Then, I mentioned a possible optimization: using time_after_eq() in
>> idle_balance(), so a newly idle CPU has a better chance (a window of 0
>> to 4ms for HZ=250) to do the update before an ILB is kicked.
>>
> IIUC with time_after() the update period should be within ]32, 36] ms,
> but it looks like I'm always on that upper bound in my tests.
>
> When evaluating whether we need to kick_ilb() for load updates, we'll
> always be right after the tick (excluding the case in idle_balance()),
> which explains why we wait for an extra tick in the "update by
> nohz_balancer_kick()" test case.
>
> The tricky part is that, as you say, the update by idle_balance() can
> happen anywhere between [0-4[ ms after a tick (or before, depending on
> how you see it), so using time_after_eq() could make the update period
> < 32ms - and this also impacts a load update by nohz_balancer_kick() if
> the previous update was done by idle_balance()... This is what causes
> the update period to be closer to 64ms in my test case, but it's
> somewhat artificial because I only have a 32ms-periodic task running -
> if there was any other task running, the period could remain in that
> ]32, 36] ms interval.
>
> Did I get that right ?
yes

>> Thanks,
>> Vincent
>>
>>> No update (idle system)
>>> -----------------------
>>> Nothing special here, just making sure nothing happens when the system
>>> is fully idle. On a side note, that's relatively hard to achieve - I
>>> had to switch over to the Juno because my HiKey960 gets interrupts
>>> every 16ms. The Juno still gets woken up every now and then, but it's
>>> a bit quieter.
>>>
>>> [1]: https://gist.github.com/valschneider/a8da7bb8e11fb1ec63a419710f56c0a0
>>>
>>> [snip]