From: Dietmar Eggemann
Subject: Re: [RFC PATCH 2/5] sched: Add NOHZ_STATS_KICK
Date: Mon, 29 Jan 2018 19:43:44 +0100
To: Vincent Guittot, Peter Zijlstra, Morten Rasmussen
Cc: Ingo Molnar, linux-kernel, Brendan Jackman, Morten Rasmussen
In-Reply-To: <20180124082536.GA32318@linaro.org>
References: <20171222075934.f6yenvcb2zkf2ysd@hirez.programming.kicks-ass.net>
 <20171222082915.4lcb7xyyooqyjpia@hirez.programming.kicks-ass.net>
 <20171222091221.ow5vn3ydx3hj4nht@hirez.programming.kicks-ass.net>
 <20171222185629.lysjebfifgdwvvhu@hirez.programming.kicks-ass.net>
 <20171222204247.kyc6ugyyu3ei7zhs@hirez.programming.kicks-ass.net>
 <20180115082609.GA6320@linaro.org>
 <20180118103807.GD28799@e105550-lin.cambridge.arm.com>
 <20180124082536.GA32318@linaro.org>

On 01/24/2018 09:25 AM, Vincent Guittot wrote:
> Hi,
>
> On Thursday 18 Jan 2018 at 10:38:07 (+0000), Morten Rasmussen wrote:
>> On Mon, Jan 15, 2018 at 09:26:09AM +0100, Vincent Guittot wrote:
>>> On Wednesday 03 Jan 2018 at 10:16:00 (+0100), Vincent Guittot wrote:

[...]

>>>
>>> Hi Peter,
>>>
>>> With the patch below on top of your branch, the blocked loads are updated
>>> and decayed regularly. The main differences are:
>>> - It doesn't use a timer to trigger the ilb, but the tick and the moment a
>>>   CPU becomes idle. The main drawback of this solution is that the load
>>>   stays blocked when the system is fully idle, with the advantage of not
>>>   waking up a fully idle system. We have to wait for the next tick or a
>>>   newly-idle event to update the blocked load when the system leaves idle
>>>   state, which can take up to a tick. If this is too long, we can check for
>>>   kicking an ilb when a task wakes up, so the blocked load is updated as
>>>   soon as the system leaves idle state. The main advantage is that we don't
>>>   wake up a fully idle system every 32ms to update blocked load that will
>>>   not be used.
>>> - I'm working on one more improvement: using nohz_idle_balance in the
>>>   newly-idle case when the system is not overloaded and
>>>   (this_rq->avg_idle > sysctl_sched_migration_cost). In this case, we can
>>>   try to use nohz_idle_balance with NOHZ_STATS_KICK and abort as soon as it
>>>   exceeds this_rq->avg_idle. This removes some calls to kick_ilb and some
>>>   wake-ups of idle CPUs.
>>
>> This sounds like what I meant in my other reply :-)
>>
>> It seems pointless to have a timer to update PELT if the system is
>> completely idle, and when it isn't we can piggyback other events to
>> make the updates happen.
>
> The patch below implements what has been described above. It calls part of
> nohz_idle_balance when a CPU becomes idle and kicks an ilb if that takes too
> much time. This removes some of the ilbs that are kicked on an idle CPU just
> to update the blocked load, but the ratio really depends on when the tick
> happens relative to a CPU becoming idle and the 32ms boundary. I have an
> additional patch that updates the blocked loads when a CPU becomes idle one
> period before kicking an ilb, and there are far fewer ilbs because we give
> more chance to the newly-idle case (time_after is replaced by time_after_eq
> in idle_balance()).
>
> The patch also uses a function cfs_rq_has_blocked, which only checks the
> util/load_avg, instead of cfs_rq_is_decayed, which checks the *_sum too.
> This significantly reduces the number of updates of blocked load. The *_avg
> will be fully decayed in around 300-400ms, but it takes far longer for the
> *_sum, which have a higher resolution and can easily take close to a second.
> But only the *_avg are used to make decisions, so keeping some blocked *_sum
> is acceptable.
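
As a back-of-envelope check of the decay times quoted above, here is a plain
userspace C sketch, not the kernel's fixed-point PELT code. It assumes only
the standard PELT constants: a signal is multiplied by y per ~1ms period,
with y^32 = 0.5, and LOAD_AVG_MAX is ~47742:

/*
 * Estimate how long a PELT signal takes to truncate to zero.
 * Userspace sketch, not kernel code; build with "cc pelt.c -lm".
 */
#include <stdio.h>
#include <math.h>

static int periods_until_zero(double v)
{
	const double y = pow(0.5, 1.0 / 32.0);	/* per-period factor, y^32 = 0.5 */
	int n = 0;

	while (v >= 1.0) {	/* an integer signal truncates to 0 below 1 */
		v *= y;
		n++;
	}
	return n;	/* one period is ~1ms, so n is roughly in ms */
}

int main(void)
{
	/* util_avg is bounded by 1024; util_sum by LOAD_AVG_MAX (~47742) */
	printf("*_avg: ~%d ms\n", periods_until_zero(1024.0));
	printf("*_sum: ~%d ms\n", periods_until_zero(47742.0));
	return 0;
}

This prints roughly 321ms and 498ms, matching the ~300-400ms quoted for the
*_avg; the load_sum of a high-weight group can start well above LOAD_AVG_MAX,
which is how the *_sum can stretch towards "almost seconds".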
>
> ---
>  kernel/sched/fair.c | 121 +++++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 92 insertions(+), 29 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 898785d..ed90303 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7356,6 +7356,17 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
>  	return true;
>  }
>
> +static inline bool cfs_rq_has_blocked(struct cfs_rq *cfs_rq)
> +{
> +	if (cfs_rq->avg.load_avg)
> +		return true;
> +
> +	if (cfs_rq->avg.util_avg)
> +		return true;
> +
> +	return false;
> +}
> +

Can we not change cfs_rq_is_decayed() to use avg.foo_avg instead of
avg.foo_sum? (A rough sketch of what I mean is at the end of this mail.)

>  #ifdef CONFIG_FAIR_GROUP_SCHED
>
>  static void update_blocked_averages(int cpu)
> @@ -7393,7 +7404,9 @@ static void update_blocked_averages(int cpu)
>  		 */
>  		if (cfs_rq_is_decayed(cfs_rq))
>  			list_del_leaf_cfs_rq(cfs_rq);
> -		else
> +
> +		/* Don't need periodic decay once load/util_avg are null */
> +		if (cfs_rq_has_blocked(cfs_rq))
>  			done = false;
>  	}
>
> @@ -7463,7 +7476,7 @@ static inline void update_blocked_averages(int cpu)
>  	update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
>  #ifdef CONFIG_NO_HZ_COMMON
>  	rq->last_blocked_load_update_tick = jiffies;
> -	if (cfs_rq_is_decayed(cfs_rq))
> +	if (cfs_rq_has_blocked(cfs_rq))

Shouldn't this be !cfs_rq_has_blocked(cfs_rq)?

>  		rq->has_blocked_load = 0;
>  #endif
>  	rq_unlock_irqrestore(rq, &rf);

[...]

> @@ -9438,7 +9451,17 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  		 */
>  		if (need_resched()) {
>  			has_blocked_load = true;
> -			break;
> +			goto abort;
> +		}
> +
> +		/*
> +		 * If the update is done while the CPU becomes idle, we abort
> +		 * the update when its cost is higher than the average idle
> +		 * time in order not to delay a possible wake-up.
> +		 */
> +		if (idle == CPU_NEWLY_IDLE && this_rq->avg_idle < curr_cost) {
> +			has_blocked_load = true;
> +			goto abort;
>  		}
>
>  		rq = cpu_rq(balance_cpu);
> @@ -9453,10 +9476,10 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  		if (time_after_eq(jiffies, rq->next_balance)) {
>  			struct rq_flags rf;
>
> -			rq_lock_irq(rq, &rf);
> +			rq_lock_irqsave(rq, &rf);
>  			update_rq_clock(rq);
>  			cpu_load_update_idle(rq);
> -			rq_unlock_irq(rq, &rf);
> +			rq_unlock_irqrestore(rq, &rf);
>
>  			if (flags & NOHZ_BALANCE_KICK)
>  				rebalance_domains(rq, CPU_IDLE);
> @@ -9466,10 +9489,17 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>  			next_balance = rq->next_balance;
>  			update_next_balance = 1;
>  		}

Why do you do this cpu_load_update_idle(rq) even though this was called with
CPU_NEWLY_IDLE? Wouldn't it be sufficient to jump to the curr_cost
calculation in this case?

> +
> +		domain_cost = sched_clock_cpu(this_cpu) - t0;
> +		curr_cost += domain_cost;
> +
>  	}
>
> -	update_blocked_averages(this_cpu);
> -	has_blocked_load |= this_rq->has_blocked_load;
> +	/* Newly idle CPU doesn't need an update */
> +	if (idle != CPU_NEWLY_IDLE) {
> +		update_blocked_averages(this_cpu);
> +		has_blocked_load |= this_rq->has_blocked_load;
> +	}
>
>  	if (flags & NOHZ_BALANCE_KICK)
>  		rebalance_domains(this_rq, CPU_IDLE);

[...]
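
Regarding the cfs_rq_is_decayed() question above, a rough sketch of the
variant I mean, for illustration only: it is untested, it assumes the
existing load.weight check is kept, and it is not strictly equivalent, since
a cfs_rq could leave the leaf list while its *_sum still hold residue:

static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
{
	if (cfs_rq->load.weight)
		return false;

	/* Test the low-resolution *_avg instead of the *_sum */
	if (cfs_rq->avg.load_avg)
		return false;

	if (cfs_rq->avg.util_avg)
		return false;

	return true;
}

With something like this, cfs_rq_has_blocked() from the patch would largely
collapse into !cfs_rq_is_decayed(), so update_blocked_averages() could keep a
single predicate for both the leaf-list pruning and the has_blocked tracking.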