From: Brendan Jackman
To: Joel Fernandes
Cc: Peter Zijlstra, LKML, Andres Oportus, Ingo Molnar, Morten Rasmussen,
 Dietmar Eggemann, Vincent Guittot
Subject: Re: [PATCH 2/2] sched/fair: Fix use of NULL with find_idlest_group
Date: Tue, 22 Aug 2017 11:39:26 +0100
Message-ID: <87efs33mzl.fsf@arm.com>
References: <20170821152128.14418-1-brendan.jackman@arm.com>
 <20170821152128.14418-3-brendan.jackman@arm.com>
 <20170821211400.GF32112@worktop.programming.kicks-ass.net>

On Tue, Aug 22 2017 at 04:34, Joel Fernandes wrote:
> Hi Peter,
>
> On Mon, Aug 21, 2017 at 2:14 PM, Peter Zijlstra wrote:
>> On Mon, Aug 21, 2017 at 04:21:28PM +0100, Brendan Jackman wrote:
>>> The current use of returning NULL from find_idlest_group is broken in
>>> two cases:
>>>
>>> a1) The local group is not allowed.
>>>
>>>     In this case, we currently do not change this_runnable_load or
>>>     this_avg_load from its initial value of 0, which means we return
>>>     NULL regardless of the load of the other, allowed groups. This
>>>     results in pointlessly continuing the find_idlest_group search
>>>     within the local group and then returning prev_cpu from
>>>     select_task_rq_fair.
>>
>>> b) smp_processor_id() is the "idlest" and != prev_cpu.
>>>
>>>    find_idlest_group also returns NULL when the local group is
>>>    allowed and is the idlest. The caller then continues the
>>>    find_idlest_group search at a lower level of the current CPU's
>>>    sched_domain hierarchy.
>>>    However, new_cpu is not updated. This means
>>>    the search is pointless and we return prev_cpu from
>>>    select_task_rq_fair.
>>>
>>
>> I think it's much simpler than that... but it's late, so who knows ;-)
>>
>> Both cases seem predicated on the assumption that we'll return @cpu when
>> we don't find any idler CPU. Consider: if the local group is the idlest,
>> we should stick with @cpu and simply proceed with the child domain.
>>
>> The confusion, and the bugs, seem to have snuck in when we started
>> considering @prev_cpu, whenever that was. The below is mostly code
>> movement to put that whole while(sd) loop into its own function.
>>
>> The effective change is setting @new_cpu = @cpu when we start that loop:
>>
>> ---
>>  kernel/sched/fair.c | 83 +++++++++++++++++++++++++++++++----------------------
>>  1 file changed, 48 insertions(+), 35 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index c77e4b1d51c0..3e77265c480a 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5588,10 +5588,10 @@ static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
>>  }
>>
>>  /*
>> - * find_idlest_cpu - find the idlest cpu among the cpus in group.
>> + * find_idlest_group_cpu - find the idlest cpu among the cpus in group.
>>   */
>>  static int
>> -find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>> +find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>>  {
>>  	unsigned long load, min_load = ULONG_MAX;
>>  	unsigned int min_exit_latency = UINT_MAX;
>> @@ -5640,6 +5640,50 @@ static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
>>  	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
>>  }
>>
>> +static int
>> +find_idlest_cpu(struct sched_domain *sd, struct task_struct *p, int cpu, int sd_flag)
>> +{
>> +	struct sched_domain *tmp;
>> +	int new_cpu = cpu;
>> +
>> +	while (sd) {
>> +		struct sched_group *group;
>> +		int weight;
>> +
>> +		if (!(sd->flags & sd_flag)) {
>> +			sd = sd->child;
>> +			continue;
>> +		}
>> +
>> +		group = find_idlest_group(sd, p, cpu, sd_flag);
>> +		if (!group) {
>> +			sd = sd->child;
>> +			continue;
>
> But this will still have the issue of pointlessly searching in the
> local_group when the idlest CPU is in the non-local group? Stemming
> from the fact that find_idlest_group is broken if the local group is
> not allowed.
>
> I believe this is fixed by Brendan's patch? :
>
> "Initializing this_runnable_load and this_avg_load to ULONG_MAX
> instead of 0. This means in case a1) we now return the idlest
> non-local group."
>

Yeah, I don't think this is enough, firstly because of affinity and
secondly because I _guess_ we don't want to migrate from prev_cpu to
@cpu if none of the sched_domains have sd_flag set (...right? That
would feel like wake balancing even when there's no SD_BALANCE_WAKE
anywhere).

However the code movement helps - I'll combine it with Vincent's
suggestions and post a v2.
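To make case a1) and the quoted ULONG_MAX fix concrete, here is a toy
model of find_idlest_group's NULL-return decision. This is a
hypothetical sketch, not the kernel code: struct toy_group and
pick_idlest() are made-up names, and this_init stands in for the
initial value of this_runnable_load (0 today, ULONG_MAX with the
proposed fix). Returning -1 models returning NULL.

```c
#include <assert.h>
#include <limits.h>

/*
 * Toy model (not kernel code) of find_idlest_group's return decision.
 * Returns the index of the idlest allowed non-local group, or -1 to
 * mean "return NULL", i.e. stick with the local group.
 */
struct toy_group {
	unsigned long runnable_load;
	int allowed;	/* group intersects the task's allowed CPUs? */
	int is_local;	/* group containing the waking CPU? */
};

static int pick_idlest(const struct toy_group *g, int n,
		       unsigned long this_init)
{
	unsigned long this_load = this_init;	/* this_runnable_load stand-in */
	unsigned long min_load = ULONG_MAX;
	int idlest = -1, i;

	for (i = 0; i < n; i++) {
		if (!g[i].allowed)
			continue;	/* a disallowed local group never updates this_load */
		if (g[i].is_local)
			this_load = g[i].runnable_load;
		else if (g[i].runnable_load < min_load) {
			min_load = g[i].runnable_load;
			idlest = i;
		}
	}

	/* "Return NULL": local group looks at least as idle as the rest. */
	if (this_load <= min_load)
		return -1;
	return idlest;
}
```

With this_init == 0 and a disallowed local group, this_load never
changes, so 0 <= min_load always holds and the local group "wins"
regardless of the other groups' loads (case a1). With this_init ==
ULONG_MAX, the idlest allowed remote group is returned instead.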