From: Vincent Guittot
Date: Fri, 16 Apr 2021 15:29:30 +0200
Subject: Re: [PATCH 1/2] sched/fair: Filter out locally-unsolvable misfit imbalances
To: Valentin Schneider
Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Dietmar Eggemann, Morten Rasmussen, Qais Yousef, Quentin Perret, Pavan Kondeti, Rik van Riel, Lingutla Chandrasekhar
References: <20210415175846.494385-1-valentin.schneider@arm.com> <20210415175846.494385-2-valentin.schneider@arm.com>
In-Reply-To: <20210415175846.494385-2-valentin.schneider@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 15 Apr 2021 at 19:58, Valentin Schneider wrote:
>
> Consider the following (hypothetical) asymmetric CPU capacity topology,
> with some amount of capacity pressure (RT | DL | IRQ | thermal):
>
>   DIE [          ]
>   MC  [    ][    ]
>        0  1  2  3
>
> | CPU | capacity_orig | capacity |
> |-----+---------------+----------|
> |   0 |           870 |      860 |
> |   1 |           870 |      600 |
> |   2 |          1024 |      850 |
> |   3 |          1024 |      860 |
>
> If CPU1 has a misfit task, then CPU0, CPU2 and CPU3 are valid candidates to
> grant the task an uplift in CPU capacity. Consider CPU0 and CPU3 as
> sufficiently busy, i.e. don't have enough spare capacity to accommodate
> CPU1's misfit task.
> This would then fall on CPU2 to pull the task.
>
> This currently won't happen, because CPU2 will fail
>
>   capacity_greater(capacity_of(CPU2), sg->sgc->max_capacity)
>
> in update_sd_pick_busiest(), where 'sg' is the [0, 1] group at DIE
> level. In this case, the max_capacity is that of CPU0's, which is at this
> point in time greater than that of CPU2's. This comparison doesn't make
> much sense, given that the only CPUs we should care about in this scenario
> are CPU1 (the CPU with the misfit task) and CPU2 (the load-balance
> destination CPU).
>
> Aggregate a misfit task's load into sgs->group_misfit_task_load only if
> env->dst_cpu would grant it a capacity uplift.
>
> Note that the aforementioned capacity vs sgc->max_capacity comparison was
> meant to prevent misfit task downmigration: candidate groups classified as
> group_misfit_task but with a higher (max) CPU capacity than the destination CPU
> would be discarded. This change makes it so said group_misfit_task
> classification can't happen anymore, which may cause some undesired
> downmigrations.
>
> Further tweak find_busiest_queue() to ensure this doesn't happen. Also note
> find_busiest_queue() can now iterate over CPUs with a higher capacity than
> the local CPU's, so add a capacity check there.
>
> Signed-off-by: Valentin Schneider
> ---
>  kernel/sched/fair.c | 63 ++++++++++++++++++++++++++++++++-------------
>  1 file changed, 45 insertions(+), 18 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9b8ae02f1994..d2d1a69d7aa7 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5759,6 +5759,12 @@ static unsigned long capacity_of(int cpu)
>  	return cpu_rq(cpu)->cpu_capacity;
>  }
>
> +/* Is CPU a's capacity noticeably greater than CPU b's? */
> +static inline bool cpu_capacity_greater(int a, int b)
> +{
> +	return capacity_greater(capacity_of(a), capacity_of(b));
> +}
> +
>  static void record_wakee(struct task_struct *p)
>  {
>  	/*
> @@ -7486,6 +7492,7 @@ struct lb_env {
>
>  	enum fbq_type		fbq_type;
>  	enum migration_type	migration_type;
> +	enum group_type		src_grp_type;
>  	struct list_head	tasks;
>  };
>
> @@ -8447,6 +8454,32 @@ static bool update_nohz_stats(struct rq *rq)
>  #endif
>  }
>
> +static inline void update_sg_lb_misfit_stats(struct lb_env *env,
> +					     struct sched_group *group,
> +					     struct sg_lb_stats *sgs,
> +					     int *sg_status,
> +					     int cpu)
> +{
> +	struct rq *rq = cpu_rq(cpu);
> +
> +	if (!(env->sd->flags & SD_ASYM_CPUCAPACITY) ||
> +	    !rq->misfit_task_load)
> +		return;
> +
> +	*sg_status |= SG_OVERLOAD;
> +
> +	/*
> +	 * Don't attempt to maximize load for misfit tasks that can't be
> +	 * granted a CPU capacity uplift.
> +	 */
> +	if (cpu_capacity_greater(env->dst_cpu, cpu)) {
> +		sgs->group_misfit_task_load = max(
> +			sgs->group_misfit_task_load,
> +			rq->misfit_task_load);
> +	}
> +
> +}
> +
>  /**
>   * update_sg_lb_stats - Update sched_group's statistics for load balancing.
>   * @env: The load balancing environment.
> @@ -8498,12 +8531,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>  		if (local_group)
>  			continue;
>
> -		/* Check for a misfit task on the cpu */
> -		if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
> -		    sgs->group_misfit_task_load < rq->misfit_task_load) {
> -			sgs->group_misfit_task_load = rq->misfit_task_load;
> -			*sg_status |= SG_OVERLOAD;
> -		}
> +		update_sg_lb_misfit_stats(env, group, sgs, sg_status, i);
>  	}
>
>  	/* Check if dst CPU is idle and preferred to this group */
> @@ -8550,15 +8578,9 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>  	if (!sgs->sum_h_nr_running)
>  		return false;
>
> -	/*
> -	 * Don't try to pull misfit tasks we can't help.
> -	 * We can use max_capacity here as reduction in capacity on some
> -	 * CPUs in the group should either be possible to resolve
> -	 * internally or be covered by avg_load imbalance (eventually).
> -	 */
> +	/* Don't try to pull misfit tasks we can't help */
>  	if (sgs->group_type == group_misfit_task &&
> -	    (!capacity_greater(capacity_of(env->dst_cpu), sg->sgc->max_capacity) ||
> -	     sds->local_stat.group_type != group_has_spare))
> +	    sds->local_stat.group_type != group_has_spare)
>  		return false;
>
>  	if (sgs->group_type > busiest->group_type)
> @@ -9288,6 +9310,8 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
>  	if (!sds.busiest)
>  		goto out_balanced;
>
> +	env->src_grp_type = busiest->group_type;
> +
>  	/* Misfit tasks should be dealt with regardless of the avg load */
>  	if (busiest->group_type == group_misfit_task)
>  		goto force_balance;
> @@ -9441,8 +9465,8 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>  		 * average load.
>  		 */
>  		if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
> -		    !capacity_greater(capacity_of(env->dst_cpu), capacity) &&
> -		    nr_running == 1)
> +		    env->src_grp_type <= group_fully_busy &&
> +		    !capacity_greater(capacity_of(env->dst_cpu), capacity))
>  			continue;
>
>  		switch (env->migration_type) {
> @@ -9504,15 +9528,18 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>  		case migrate_misfit:
>  			/*
>  			 * For ASYM_CPUCAPACITY domains with misfit tasks we
> -			 * simply seek the "biggest" misfit task.
> +			 * simply seek the "biggest" misfit task we can
> +			 * accommodate.
>  			 */
> +			if (!cpu_capacity_greater(env->dst_cpu, i))

Use the same level of interface as above. This makes the code and the
condition easier to follow in find_busiest_queue():

capacity_greater(capacity_of(env->dst_cpu), capacity_of(i))

> +				continue;
> +
>  			if (rq->misfit_task_load > busiest_load) {
>  				busiest_load = rq->misfit_task_load;
>  				busiest = rq;
>  			}
>
>  			break;
> -
>  		}
>  	}
>
> --
> 2.25.1
>
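
For reference, the numbers in the changelog's capacity table can be checked
with a small userspace sketch. It is not kernel code: capacity_of() and the
cpu_capacity[] array are hypothetical stand-ins fed from the table, the [0, 1]
group is hard-coded, and the capacity_greater() margin (~5%) is an assumption
about the helper's definition, which this thread does not show.

```c
#include <stdbool.h>

/*
 * ASSUMPTION: a ~5% margin, modelled on the kernel's capacity_greater()
 * helper. The constants are not quoted anywhere in this thread.
 */
#define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)

/* "capacity" column of the changelog table, indexed by CPU number. */
static const unsigned long cpu_capacity[] = { 860, 600, 850, 860 };

/* The [0, 1] sched_group seen from CPU2 at DIE level. */
static const int sg_cpus[] = { 0, 1 };
static const int sg_nr_cpus = 2;

/* Hypothetical userspace stand-in for the kernel's capacity_of(). */
static unsigned long capacity_of(int cpu)
{
	return cpu_capacity[cpu];
}

/* Stand-in for sg->sgc->max_capacity: largest capacity in the group. */
static unsigned long group_max_capacity(void)
{
	unsigned long max_cap = 0;

	for (int i = 0; i < sg_nr_cpus; i++) {
		if (capacity_of(sg_cpus[i]) > max_cap)
			max_cap = capacity_of(sg_cpus[i]);
	}
	return max_cap;
}

/* Old check: destination CPU against the whole group's max capacity. */
static bool old_group_check(int dst_cpu)
{
	return capacity_greater(capacity_of(dst_cpu), group_max_capacity());
}

/* New check from the patch: destination CPU against the misfit CPU only. */
static bool cpu_capacity_greater(int a, int b)
{
	return capacity_greater(capacity_of(a), capacity_of(b));
}
```

Under these assumptions old_group_check(2) is false, because CPU2's 850 is
compared against CPU0's 860, while cpu_capacity_greater(2, 1) is true, because
850 is well above CPU1's 600. That matches the changelog's claim that CPU2 is
a valid destination that the group-wide comparison wrongly filters out.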