2019-10-03 00:28:46

by Crystal Wood

Subject: [PATCH] tick-sched: Update nohz load even if tick already stopped

The way loadavg is tracked during nohz only pays attention to the load
upon entering nohz. This can be particularly noticeable if nohz is
entered while non-idle, and then the cpu goes idle and stays that way for
a long time. We've had reports of a loadavg near 150 on a mostly idle
system.

Calling calc_load_nohz_start() regardless of whether the tick is already
stopped addresses the issue when going idle. Tracking load changes when
not going idle (e.g. multiple SCHED_FIFO tasks coming and going) is not
addressed by this patch.

Signed-off-by: Scott Wood <[email protected]>
---
kernel/time/tick-sched.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 955851748dc3..f177d8168400 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -763,6 +763,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
 		ts->do_timer_last = 0;
 	}

+	/* Even if the tick was already stopped, load may have changed */
+	calc_load_nohz_start();
+
 	/* Skip reprogram of event if its not changed */
 	if (ts->tick_stopped && (expires == ts->next_tick)) {
 		/* Sanity check: make sure clockevent is actually programmed */
@@ -783,7 +786,6 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
 	 * the scheduler tick in nohz_restart_sched_tick.
 	 */
 	if (!ts->tick_stopped) {
-		calc_load_nohz_start();
 		quiet_vmstat();

 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
--
1.8.3.1
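
The scenario in the commit message can be sketched with a small user-space toy model. This is not kernel code: every `toy_*` name, the struct layout, and the `patched` flag are hypothetical and only mimic the idea of folding a CPU's change in active tasks into a global count. It assumes the only difference between the two behaviours is whether the fold runs on every call or only when the tick is first stopped.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the kernel's data structures. */
struct toy_cpu {
	long nr_active;      /* tasks counted as active on this CPU right now */
	long folded_active;  /* value this CPU last reported to the global count */
	bool tick_stopped;
};

/* Stands in for the system-wide active count that feeds loadavg. */
static long toy_global_active;

/* Report the change since the last fold; nothing to do if unchanged. */
static void toy_fold_active(struct toy_cpu *c)
{
	long delta = c->nr_active - c->folded_active;

	if (delta) {
		c->folded_active = c->nr_active;
		toy_global_active += delta;
	}
}

/* Stop-tick path; 'patched' selects the ordering introduced by the patch. */
static void toy_stop_tick(struct toy_cpu *c, bool patched)
{
	if (patched)
		toy_fold_active(c);      /* fold even if the tick is already stopped */

	if (!c->tick_stopped) {
		if (!patched)
			toy_fold_active(c);  /* old behaviour: fold only on entry */
		c->tick_stopped = true;
	}
}

/* Tick is stopped while one task runs, then the CPU goes idle. */
static long toy_scenario(bool patched)
{
	struct toy_cpu c = { 0 };

	toy_global_active = 0;
	c.nr_active = 1;             /* busy when the tick is first stopped */
	toy_stop_tick(&c, patched);
	c.nr_active = 0;             /* task goes away, tick remains stopped */
	toy_stop_tick(&c, patched);
	return toy_global_active;
}
```

In this model `toy_scenario(false)` leaves a stale contribution of 1 in the global count, while `toy_scenario(true)` returns 0. When nothing has changed, the fold reduces to one subtraction and one comparison.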


2019-10-09 15:48:26

by Frederic Weisbecker

Subject: Re: [PATCH] tick-sched: Update nohz load even if tick already stopped

On Wed, Oct 02, 2019 at 06:55:35PM -0400, Scott Wood wrote:
> The way loadavg is tracked during nohz only pays attention to the load
> upon entering nohz. This can be particularly noticeable if nohz is
> entered while non-idle, and then the cpu goes idle and stays that way for
> a long time. We've had reports of a loadavg near 150 on a mostly idle
> system.
>
> Calling calc_load_nohz_start() regardless of whether the tick is already
> stopped addresses the issue when going idle. Tracking load changes when
> not going idle (e.g. multiple SCHED_FIFO tasks coming and going) is not
> addressed by this patch.
>
> Signed-off-by: Scott Wood <[email protected]>
> ---
> kernel/time/tick-sched.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 955851748dc3..f177d8168400 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -763,6 +763,9 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
>  		ts->do_timer_last = 0;
>  	}
>
> +	/* Even if the tick was already stopped, load may have changed */
> +	calc_load_nohz_start();
> +
>  	/* Skip reprogram of event if its not changed */
>  	if (ts->tick_stopped && (expires == ts->next_tick)) {
>  		/* Sanity check: make sure clockevent is actually programmed */
> @@ -783,7 +786,6 @@ static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
>  	 * the scheduler tick in nohz_restart_sched_tick.
>  	 */
>  	if (!ts->tick_stopped) {
> -		calc_load_nohz_start();
>  		quiet_vmstat();
>
>  		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);


Thanks. I've pondered over your patch to try to avoid calling
calc_load_nohz_start() unconditionally like that. But in the end the
fast path of this function shouldn't bring much overhead and does pretty
much the same as what I would do to call it conditionally.

So I'm applying it.