Message-ID: <063c4695-a9d4-f77a-55a7-7c554b765c7e@arm.com>
Date: Fri, 8 Jul 2022 12:10:40 +0200
Subject: Re: [PATCH] sched/fair: fix case with reduced capacity CPU
To: Vincent Guittot
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, linux-kernel@vger.kernel.org, david.chen@nutanix.com,
 zhangqiao22@huawei.com
References: <20220702045254.22922-1-vincent.guittot@linaro.org> <88fab4b6-8e5c-3a4e-e32b-a0867d51398b@arm.com>
From: Dietmar Eggemann
X-Mailing-List: linux-kernel@vger.kernel.org

On 08/07/2022 09:17, Vincent Guittot wrote:
> On Thu, 7 Jul 2022 at 18:43, Dietmar Eggemann wrote:
>>
>> On 02/07/2022 06:52, Vincent Guittot wrote:

[...]

>>> The rework of the load balance has filterd the case when the CPU is

s/filterd/filtered

>>> classified to be fully busy but its capacity is reduced.
>>>
>>> Check if CPU's capacity is reduced while gathering load balance statistics
>>> and classify it group_misfit_task instead of group_fully_busy so we can

enum group_type {
...
	/*
	 * SD_ASYM_CPUCAPACITY only: One task doesn't fit with CPU's capacity
	 * and must be migrated to a more powerful CPU.
	 */
	group_misfit_task
...

This `SD_ASYM_CPUCAPACITY only:` should be removed now.

[...]

>>> @@ -8798,6 +8798,19 @@ sched_asym(struct lb_env *env, struct sd_lb_stats *sds, struct sg_lb_stats *sgs
>>>  	return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
>>>  }
>>>
>>> +static inline bool
>>> +sched_reduced_capacity(struct rq *rq, struct sched_domain *sd)

minor: Why not `static inline int check_reduced_capacity()` ?

All similar functions like check_cpu_capacity() follow this approach.

[...]
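To make the naming suggestion concrete, here is a minimal user-space sketch, not the code from the patch. The `rq_sketch` and `sd_sketch` structs and the selftest are hypothetical stand-ins for `struct rq` and `struct sched_domain`. The capacity comparison follows check_cpu_capacity() in kernel/sched/fair.c, and the single-task condition is an assumption taken from the `h_nr_running = 1` case discussed further down.

```c
#include <assert.h>

/* Hypothetical stand-in for the few struct rq fields used here. */
struct rq_sketch {
	unsigned int h_nr_running;        /* stands in for rq->cfs.h_nr_running */
	unsigned long cpu_capacity;       /* capacity currently left for CFS tasks */
	unsigned long cpu_capacity_orig;  /* original capacity of the CPU */
};

/* Hypothetical stand-in for struct sched_domain. */
struct sd_sketch {
	unsigned int imbalance_pct;       /* e.g. 117 means a 17% margin */
};

/* Same comparison as check_cpu_capacity(): returns 1 if capacity is reduced. */
static inline int check_cpu_capacity(const struct rq_sketch *rq,
				     const struct sd_sketch *sd)
{
	return (rq->cpu_capacity * sd->imbalance_pct) <
	       (rq->cpu_capacity_orig * 100);
}

/*
 * int-returning variant with the check_*() naming. Assumption: only the
 * single-task case matters, since more tasks are handled elsewhere.
 */
static inline int check_reduced_capacity(const struct rq_sketch *rq,
					 const struct sd_sketch *sd)
{
	if (rq->h_nr_running != 1)
		return 0;

	return check_cpu_capacity(rq, sd);
}

/* Usage example with made-up numbers. */
static inline void check_reduced_capacity_selftest(void)
{
	struct sd_sketch sd = { .imbalance_pct = 117 };
	struct rq_sketch pressured = { 1, 800, 1024 };
	struct rq_sketch healthy = { 1, 1000, 1024 };

	assert(check_reduced_capacity(&pressured, &sd));
	assert(!check_reduced_capacity(&healthy, &sd));
}
```

The only point of the sketch is that the helper would read the same as its siblings at the call site; the logic stays the same as with the bool variant.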
>>> @@ -8851,11 +8865,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>>>  		if (local_group)
>>>  			continue;
>>>
>>> -		/* Check for a misfit task on the cpu */
>>> -		if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
>>> -		    sgs->group_misfit_task_load < rq->misfit_task_load) {
>>> -			sgs->group_misfit_task_load = rq->misfit_task_load;
>>> -			*sg_status |= SG_OVERLOAD;
>>> +		if (env->sd->flags & SD_ASYM_CPUCAPACITY) {
>>> +			/* Check for a misfit task on the cpu */
>>> +			if (sgs->group_misfit_task_load < rq->misfit_task_load) {
>>> +				sgs->group_misfit_task_load = rq->misfit_task_load;
>>> +				*sg_status |= SG_OVERLOAD;
>>> +			}
>>> +		} else if ((env->idle != CPU_NOT_IDLE) &&
>>> +			   sched_reduced_capacity(rq, env->sd) &&
>>> +			   (sgs->group_misfit_task_load < load)) {
>>> +			/* Check for a task running on a CPU with reduced capacity */
>>> +			sgs->group_misfit_task_load = load;
>>>  		}

Minor: This now has

if(A)
	if(B)
else if(C && B')

little bit harder to read.

[...]

>> I'm wondering why you've chosen that hybrid approach `group_misfit_task
>> -> migrate_load` and not `group_misfit_task -> migrate_misfit`.
>
> because, it means enabling the tracking of misfit task on rq at each
> task enqueue/dequeue/tick ... Then mistfit for heterogeneous platform
> checks max_cpu_capacity what we don't care and will trigger unwanted
> misfit migration for smp

Agreed, rq->misfit_task_load can't be used here.

>> It looks like this `rq->cfs.h_nr_running = 1` case almost (since we
>> check `busiest->nr_running > 1`) always ends up in the load_balance()
>> `if (!ld_moved)` condition and need_active_balance() can return 1 in
>> case `if ((env->idle != CPU_NOT_IDLE) && ...` condition. This leads to
>> active load_balance and this
>>
>> IMHO, the same you can achieve when you would stay with
>> `group_misfit_task -> migrate_misfit`.
>>
>> I think cpu_load(rq) can be used instead of `rq->misfit_task_load` in
>> the migrate_misfit case of find_busiest_queue() too.
>
> I don't think because you can have a higher cpu_load() but not being misfit

You're right, I forgot about this. Essentially we would need extra state
(e.g. in lb_env) to save which CPU in the busiest group has the misfit.
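
To illustrate what such extra state could look like, here is a rough user-space sketch, not a proposal for the actual patch. Every name in it (`cpu_sketch`, `lb_env_sketch`, `misfit_cpu`, `record_misfit_cpu()`) is hypothetical, and the misfit condition is simplified to "single task on a CPU with reduced capacity". It only shows why picking by load alone is not enough: the CPU with the highest load need not be the misfit one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU view; not the kernel's struct rq. */
struct cpu_sketch {
	unsigned int h_nr_running;  /* number of runnable CFS tasks */
	unsigned long load;         /* stand-in for cpu_load(rq) */
	int reduced_capacity;       /* result of a reduced-capacity check */
};

/* Hypothetical extra state that could live in lb_env. */
struct lb_env_sketch {
	int misfit_cpu;             /* index of the misfit CPU, -1 if none */
	unsigned long misfit_load;  /* load of that CPU */
};

/*
 * While walking the CPUs of the busiest group, remember which CPU caused
 * the misfit classification instead of only keeping the group-wide load.
 */
static void record_misfit_cpu(struct lb_env_sketch *env,
			      const struct cpu_sketch *cpus, size_t nr_cpus)
{
	size_t i;

	env->misfit_cpu = -1;
	env->misfit_load = 0;

	for (i = 0; i < nr_cpus; i++) {
		/* Simplified misfit condition: one task, reduced capacity. */
		if (cpus[i].h_nr_running != 1 || !cpus[i].reduced_capacity)
			continue;

		if (env->misfit_cpu < 0 || cpus[i].load > env->misfit_load) {
			env->misfit_cpu = (int)i;
			env->misfit_load = cpus[i].load;
		}
	}
}

/* Usage example with made-up numbers. */
static void record_misfit_cpu_selftest(void)
{
	/* CPU0 has the highest load but is not misfit; CPU1 is. */
	const struct cpu_sketch cpus[] = {
		{ 3, 900, 0 },
		{ 1, 400, 1 },
		{ 1, 200, 1 },
	};
	struct lb_env_sketch env;

	record_misfit_cpu(&env, cpus, sizeof(cpus) / sizeof(cpus[0]));
	assert(env.misfit_cpu == 1);
	assert(env.misfit_load == 400);
}
```

With something like this, the migrate_misfit case could pick the recorded CPU directly rather than comparing loads again, at the cost of carrying one more field through the load balance path.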