Subject: Re: [PATCH v5 1/3] sched: Stop nohz stats when decayed
From: Valentin Schneider
To: Vincent Guittot, peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org
Cc: morten.rasmussen@foss.arm.com, brendan.jackman@arm.com, dietmar.eggemann@arm.com
Date: Fri, 16 Feb 2018 12:53:36 +0000
In-Reply-To: <1518622006-16089-2-git-send-email-vincent.guittot@linaro.org>
References: <1518622006-16089-1-git-send-email-vincent.guittot@linaro.org> <1518622006-16089-2-git-send-email-vincent.guittot@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/14/2018 03:26 PM, Vincent Guittot wrote:
> Stopped the periodic update of blocked load when all idle CPUs have fully
> decayed. We introduce a new nohz.has_blocked that reflect if some idle
> CPUs has blocked load that have to be periodiccally updated. nohz.has_blocked
> is set everytime that a Idle CPU can have blocked load and it is then clear
> when no more blocked load has been detected during an update. We don't need
> atomic operation but only to make cure of the right ordering when updating
> nohz.idle_cpus_mask and nohz.has_blocked.
>
> Suggested-by: Peter Zijlstra (Intel)
> Signed-off-by: Vincent Guittot
> ---
>  kernel/sched/fair.c  | 122 ++++++++++++++++++++++++++++++++++++++++++---------
>  kernel/sched/sched.h |   1 +
>  2 files changed, 102 insertions(+), 21 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7af1fa9..5a6835e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
>
> [...]
>
> -static void update_nohz_stats(struct rq *rq)
> +static bool update_nohz_stats(struct rq *rq)
>  {
>  #ifdef CONFIG_NO_HZ_COMMON
>  	unsigned int cpu = rq->cpu;
>
> +	if (!rq->has_blocked_load)
> +		return false;
> +
>  	if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
> -		return;
> +		return false;
>
>  	if (!time_after(jiffies, rq->last_blocked_load_update_tick))
> -		return;
> +		return true;
>
>  	update_blocked_averages(cpu);
> +
> +	return rq->has_blocked_load;
> +#else
> +	return false;
>  #endif
>  }
>

(Wrongly thought that this bit was in a different patch, comment should have
been squashed in previous reply...)

I've been thinking about this, and it's a messy one if we want to skip CPUs
in idle_balance() / clear the nohz.has_blocked flag.

AFAICT, the rq->has_blocked_load flag can be wrongly cleared: if we're calling
update_nohz_stats() for CPU A, but CPU A got out/in of idle really quickly in
that same timeframe, I'm not sure you can guarantee the clearing of
rq->has_blocked_load done in update_blocked_averages() will always end up in
memory before the setting of the flag in nohz_balance_enter_idle().

I was going to say we don't have this problem in _nohz_idle_balance() but
actually I think we do. We have the checking of nohz.idle_cpus_mask after the
smp_mb(), which makes sure the clear of nohz.has_blocked will never overwrite
the set in nohz_balance_enter_idle(), but it doesn't guarantee the same for
the rq flag.

So we can have nohz CPUs with blocked load but with rq->has_blocked_load set
to false. Which isn't a problem now but it is if we want to use the flag to
skip CPUs.

Am I correct or am I going crazy?

There's a comment about this in nohz_balance_enter_idle() but I'm confused now:

	/*
	 * Can be set safely without rq->lock held
	 * If a clear happens, it will have evaluated last additions because
	 * rq->lock is held during the check and the clear
	 */
	rq->has_blocked_load = 1;
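
To make the interleaving I'm worried about concrete, here's a rough userspace
model. It is not kernel code: struct model_rq, model_enter_idle(),
model_exit_idle() and model_flag_is_stale() are names made up for
illustration. It replays the two orderings by hand, under the assumption that
the clear decided in update_blocked_averages() can land after the set in
nohz_balance_enter_idle(); whether rq->lock actually rules that out is
exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one runqueue; not the kernel's struct rq. */
struct model_rq {
	bool has_blocked_load;	/* models rq->has_blocked_load */
	bool in_idle_mask;	/* models this CPU's bit in nohz.idle_cpus_mask */
	bool blocked_load;	/* ground truth: blocked load still to decay? */
};

/* Models CPU A going idle: the flag is set without rq->lock held. */
static void model_enter_idle(struct model_rq *rq)
{
	rq->blocked_load = true;	/* going idle may leave blocked load */
	rq->in_idle_mask = true;
	rq->has_blocked_load = true;
}

/* Models CPU A leaving idle. */
static void model_exit_idle(struct model_rq *rq)
{
	rq->in_idle_mask = false;
}

/*
 * Replays one interleaving of CPU B updating CPU A's stats while CPU A
 * bounces out of and back into idle. Returns true if CPU A ends up idle
 * with blocked load but with the flag cleared.
 *
 * clear_lands_late is an assumption of the model: it forces CPU B's clear
 * to take effect after CPU A's set.
 */
static bool model_flag_is_stale(bool clear_lands_late)
{
	struct model_rq rq = { 0 };
	bool decided_to_clear;

	model_enter_idle(&rq);		/* CPU A is idle ... */
	rq.blocked_load = false;	/* ... and its load has fully decayed */

	/* CPU B: the update finds nothing left, so it decides to clear. */
	decided_to_clear = !rq.blocked_load;
	if (decided_to_clear && !clear_lands_late)
		rq.has_blocked_load = false;

	/* CPU A: out of idle and straight back in, setting the flag again. */
	model_exit_idle(&rq);
	model_enter_idle(&rq);

	/* CPU B's clear only takes effect now in the "late" ordering. */
	if (decided_to_clear && clear_lands_late)
		rq.has_blocked_load = false;

	assert(rq.in_idle_mask);
	return rq.blocked_load && !rq.has_blocked_load;
}
```

model_flag_is_stale(false) replays the ordering the comment above relies on
(clear first, then set), and model_flag_is_stale(true) replays the one I can't
convince myself is impossible (set first, then clear).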