References: <20170821152128.14418-1-brendan.jackman@arm.com>
 <20170821152128.14418-3-brendan.jackman@arm.com>
 <20170821211400.GF32112@worktop.programming.kicks-ass.net>
 <87efs33mzl.fsf@arm.com>
User-agent: mu4e 0.9.17; emacs 25.1.1
From: Brendan Jackman
To: Joel Fernandes
Cc: Peter Zijlstra, LKML, Andres Oportus, Ingo Molnar, Morten Rasmussen,
 Dietmar Eggemann, Vincent Guittot
Subject: Re: [PATCH 2/2] sched/fair: Fix use of NULL with find_idlest_group
In-reply-to: <87efs33mzl.fsf@arm.com>
Date: Tue, 22 Aug 2017 11:45:42 +0100
Message-ID: <87bmn73mp5.fsf@arm.com>

On Tue, Aug 22 2017 at 10:39, Brendan Jackman wrote:
> On Tue, Aug 22 2017 at 04:34, Joel Fernandes wrote:
>> Hi Peter,
>>
>> On Mon, Aug 21, 2017 at 2:14 PM, Peter Zijlstra wrote:
>>> On Mon, Aug 21, 2017 at 04:21:28PM +0100, Brendan Jackman wrote:
>>>> The current use of returning NULL from find_idlest_group is broken in
>>>> two cases:
>>>>
>>>> a1) The local group is not allowed.
>>>>
>>>>     In this case, we currently do not change this_runnable_load or
>>>>     this_avg_load from its initial value of 0, which means we return
>>>>     NULL regardless of the load of the other, allowed groups. This
>>>>     results in pointlessly continuing the find_idlest_group search
>>>>     within the local group and then returning prev_cpu from
>>>>     select_task_rq_fair.
>>>>
>>>> b) smp_processor_id() is the "idlest" and != prev_cpu.
>>>>
>>>>    find_idlest_group also returns NULL when the local group is
>>>>    allowed and is the idlest. The caller then continues the
>>>>    find_idlest_group search at a lower level of the current CPU's
>>>>    sched_domain hierarchy. However new_cpu is not updated. This means
>>>>    the search is pointless and we return prev_cpu from
>>>>    select_task_rq_fair.
>>>>
>>>
>>> I think its much simpler than that.. but its late, so who knows ;-)
>>>
>>> Both cases seem predicated on the assumption that we'll return @cpu when
>>> we don't find any idler CPU. Consider, if the local group is the idlest,
>>> we should stick with @cpu and simply proceed with the child domain.
>>>
>>> The confusion, and the bugs, seem to have snuck in when we started
>>> considering @prev_cpu, whenever that was. The below is mostly code
>>> movement to put that whole while(sd) loop into its own function.
>>>
>>> The effective change is setting @new_cpu = @cpu when we start that loop:
>>>
>>> ---
>>>  kernel/sched/fair.c | 83 +++++++++++++++++++++++++++++++----------------------
>>>  1 file changed, 48 insertions(+), 35 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index c77e4b1d51c0..3e77265c480a 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -5588,10 +5588,10 @@ static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
>>>  }
>>>
>>>  /*
>>> - * find_idlest_cpu - find the idlest cpu among the cpus in group.
>>> + * find_idlest_group_cpu - find the idlest cpu among the cpus in group.
>>>   */
>>>  static int
>>> -find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>>> +find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>>>  {
>>>  	unsigned long load, min_load = ULONG_MAX;
>>>  	unsigned int min_exit_latency = UINT_MAX;
>>> @@ -5640,6 +5640,50 @@ static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
>>>  	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
>>>  }
>>>
>>> +static int
>>> +find_idlest_cpu(struct sched_domain *sd, struct task_struct *p, int cpu, int sd_flag)
>>> +{
>>> +	struct sched_domain *tmp;
>>> +	int new_cpu = cpu;
>>> +
>>> +	while (sd) {
>>> +		struct sched_group *group;
>>> +		int weight;
>>> +
>>> +		if (!(sd->flags & sd_flag)) {
>>> +			sd = sd->child;
>>> +			continue;
>>> +		}
>>> +
>>> +		group = find_idlest_group(sd, p, cpu, sd_flag);
>>> +		if (!group) {
>>> +			sd = sd->child;
>>> +			continue;
>>
>> But this will still have the issue of pointlessly searching in the
>> local_group when the idlest CPU is in the non-local group? Stemming
>> from the fact that find_idlest_group is broken if the local group is
>> not allowed.
>>
>> I believe this is fixed by Brendan's patch? :
>>
>> "Initializing this_runnable_load and this_avg_load to ULONG_MAX
>>  instead of 0. This means in case a1) we now return the idlest
>>  non-local group."
>>
>
> Yeah, I don't think this is enough, firstly because of affinity and
> secondly because I _guess_ we don't want to migrate from prev_cpu to
> @cpu if none of the sched_domains have sd_flag set (...right? That would
> feel like wake balancing even when there's no SD_BALANCE_WAKE anywhere).

Hmm, actually we always have sd_flag set, because otherwise we wouldn't
have done sd = tmp at the beginning of select_task_rq_fair.

>
> However the code movement helps - I'll combine it with Vincent's
> suggestions and post a v2.
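
To make case a1) concrete, below is a small, standalone C model of the decision
find_idlest_group() makes when the local group is not allowed. It is a deliberate
simplification (the real function also weighs avg_load, spare capacity and an
imbalance margin), and the group layout and loads are invented for illustration;
it only shows why leaving the local load initialized to 0 rather than ULONG_MAX
forces a NULL return even when an allowed, idler remote group exists.

/*
 * Toy userspace model only -- not kernel code. It mimics the shape of the
 * find_idlest_group() decision described in the patch: if the local group
 * is skipped because the task is not allowed there, this_load keeps its
 * initial value, and a remote group is only returned if it beats that value.
 */
#include <stdio.h>
#include <limits.h>

struct group {
	const char *name;
	unsigned long runnable_load;
	int allowed;	/* can the waking task run in this group? */
	int local;	/* is this the waking CPU's own group? */
};

/* Returns the index of the idlest allowed remote group, or -1 ("NULL"). */
static int find_idlest_group_model(const struct group *groups, int n,
				   unsigned long this_load_init)
{
	unsigned long this_load = this_load_init;
	unsigned long min_load = ULONG_MAX;
	int idlest = -1;

	for (int i = 0; i < n; i++) {
		if (!groups[i].allowed)
			continue;	/* case a1): local group skipped here */
		if (groups[i].local) {
			this_load = groups[i].runnable_load;
		} else if (groups[i].runnable_load < min_load) {
			min_load = groups[i].runnable_load;
			idlest = i;
		}
	}

	/* A remote group is only returned if it beats the local load. */
	if (idlest == -1 || min_load >= this_load)
		return -1;
	return idlest;
}

int main(void)
{
	const struct group groups[] = {
		{ "local, not allowed",	 0, 0, 1 },
		{ "remote, allowed",	42, 1, 0 },
	};

	/* Current code: this_load stays 0, so the remote group never wins. */
	printf("init 0:         idlest = %d\n",
	       find_idlest_group_model(groups, 2, 0));

	/* Proposed init: ULONG_MAX lets the allowed remote group be chosen. */
	printf("init ULONG_MAX: idlest = %d\n",
	       find_idlest_group_model(groups, 2, ULONG_MAX));
	return 0;
}

The first call prints -1 (no group returned, so the caller would fall back to
prev_cpu), the second prints 1 (the allowed remote group), which is the
behaviour change the quoted "Initializing this_runnable_load and this_avg_load
to ULONG_MAX" suggestion describes.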