From: Brendan Jackman
To: Josef Bacik
Cc: mingo@redhat.com, peterz@infradead.org, linux-kernel@vger.kernel.org, umgwanakikbuti@gmail.com, tj@kernel.org, kernel-team@fb.com, Josef Bacik
Subject: Re: [PATCH 7/7] sched/fair: don't wake affine recently load balanced tasks
Date: Tue, 01 Aug 2017 11:51:05 +0100
Message-ID: <87fudbsgs1.fsf@arm.com>
In-reply-to: <1500038464-8742-8-git-send-email-josef@toxicpanda.com>
References: <1500038464-8742-1-git-send-email-josef@toxicpanda.com> <1500038464-8742-8-git-send-email-josef@toxicpanda.com>
User-agent: mu4e 0.9.17; emacs 25.1.1

Hi Josef,

I happened to be thinking about something like this while investigating a
totally different issue with ARM big.LITTLE. Comment below...

On Fri, Jul 14 2017 at 13:21, Josef Bacik wrote:
> From: Josef Bacik
>
> The wake affinity logic will move tasks between two CPUs that appear to be
> loaded equally at the current time, with a slight bias towards cache
> locality. However on a heavily loaded system the load balancer has better
> insight into what needs to be moved around, so instead keep track of the
> last time a task was migrated by the load balancer. If it was recent, opt
> to let the process stay on its current CPU (or an idle sibling).
>
> Signed-off-by: Josef Bacik
> ---
>  include/linux/sched.h |  1 +
>  kernel/sched/fair.c   | 11 +++++++++++
>  2 files changed, 12 insertions(+)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 1a0eadd..d872780 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -528,6 +528,7 @@ struct task_struct {
>  	unsigned long wakee_flip_decay_ts;
>  	struct task_struct *last_wakee;
>
> +	unsigned long last_balance_ts;
>  	int wake_cpu;
>  #endif
>  	int on_rq;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 034d5df..6a98a38 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5604,6 +5604,16 @@ static int wake_wide(struct task_struct *p)
>  	unsigned int slave = p->wakee_flips;
>  	int factor = this_cpu_read(sd_llc_size);
>
> +	/*
> +	 * If we've balanced this task recently we don't want to undo all of
> +	 * that hard work by the load balancer and move it to the current cpu.
> +	 * Constantly overriding the load balancer's decisions is going to make
> +	 * it question its purpose in life and give it anxiety and self-worth
> +	 * issues, and nobody wants that.
> +	 */
> +	if (time_before(jiffies, p->last_balance_ts + HZ))
> +		return 1;
> +
>  	if (master < slave)
>  		swap(master, slave);
>  	if (slave < factor || master < slave * factor)
> @@ -7097,6 +7107,7 @@ static int detach_tasks(struct lb_env *env)
>  			goto next;
>
>  		detach_task(p, env);
> +		p->last_balance_ts = jiffies;

I guess this timestamp should be set in the active balance path too?

>  		list_add(&p->se.group_node, &env->tasks);
>
>  		detached++;

Cheers,
Brendan