From: Valentin Schneider <valentin.schneider@arm.com>
To: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra, Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
 Morten Rasmussen, Qais Yousef, Quentin Perret, Pavan Kondeti,
 Rik van Riel, Lingutla Chandrasekhar
Subject: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks
Date: Thu, 15 Apr 2021 18:58:46 +0100
Message-Id: <20210415175846.494385-3-valentin.schneider@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210415175846.494385-1-valentin.schneider@arm.com>
References: <20210415175846.494385-1-valentin.schneider@arm.com>

Consider the following topology:

  DIE [          ]
  MC  [    ][    ]
       0  1  2  3

  capacity_orig_of(x \in {0-1}) < capacity_orig_of(x \in {2-3})

w/ CPUs 2-3 idle and CPUs 0-1 running CPU hogs (util_avg=1024).

When CPU2 goes through load_balance() (via periodic / NOHZ balance), it
should pull one CPU hog from either CPU0 or CPU1 (this is misfit task
upmigration). However, should e.g. a pcpu kworker wake up on CPU0 just
before this load_balance() happens and preempt the CPU hog running there,
we would have, for the [0-1] group at CPU2's DIE level:

  o sgs->sum_nr_running > sgs->group_weight
  o sgs->group_capacity * 100 < sgs->group_util * imbalance_pct

IOW, this group is group_overloaded.

Considering CPU0 is picked by find_busiest_queue(), we would then visit
the preempted CPU hog in detach_tasks(). However, given it has just been
preempted by this pcpu kworker, task_hot() will prevent it from being
detached. We then leave load_balance() without having done anything.

Long story short, preempted misfit tasks are affected by task_hot(), while
currently running misfit tasks are intentionally preempted by the stopper
task to migrate them over to a higher-capacity CPU.

Align detach_tasks() with the active-balance logic and let it pick a
cache-hot misfit task when the destination CPU can provide a capacity
uplift.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/fair.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d2d1a69d7aa7..43fc98d34276 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7493,6 +7493,7 @@ struct lb_env {
 	enum fbq_type		fbq_type;
 	enum migration_type	migration_type;
 	enum group_type		src_grp_type;
+	enum group_type		dst_grp_type;
 	struct list_head	tasks;
 };
 
@@ -7533,6 +7534,31 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
 	return delta < (s64)sysctl_sched_migration_cost;
 }
 
+/*
+ * What does migrating this task do to our capacity-aware scheduling criterion?
+ *
+ * Returns 1, if the task needs more capacity than the dst CPU can provide.
+ * Returns 0, if the task needs the extra capacity provided by the dst CPU
+ * Returns -1, if the task isn't impacted by the migration wrt capacity.
+ */
+static int migrate_degrades_capacity(struct task_struct *p, struct lb_env *env)
+{
+	if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
+		return -1;
+
+	if (!task_fits_capacity(p, capacity_of(env->src_cpu))) {
+		if (cpu_capacity_greater(env->dst_cpu, env->src_cpu))
+			return 0;
+		else if (cpu_capacity_greater(env->src_cpu, env->dst_cpu))
+			return 1;
+		else
+			return -1;
+	}
+
+	return task_fits_capacity(p, capacity_of(env->dst_cpu)) ? -1 : 1;
+}
+
 #ifdef CONFIG_NUMA_BALANCING
 /*
  * Returns 1, if task migration degrades locality
@@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	if (tsk_cache_hot == -1)
 		tsk_cache_hot = task_hot(p, env);
 
+	/*
+	 * On a (sane) asymmetric CPU capacity system, the increase in compute
+	 * capacity should offset any potential performance hit caused by a
+	 * migration.
+	 */
+	if ((env->dst_grp_type == group_has_spare) &&
+	    !migrate_degrades_capacity(p, env))
+		tsk_cache_hot = 0;
+
 	if (tsk_cache_hot <= 0 ||
 	    env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
 		if (tsk_cache_hot == 1) {
@@ -9310,6 +9345,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 	if (!sds.busiest)
 		goto out_balanced;
 
+	env->dst_grp_type = local->group_type;
 	env->src_grp_type = busiest->group_type;
 
 	/* Misfit tasks should be dealt with regardless of the avg load */
-- 
2.25.1
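
[Editor's note] For readers not steeped in the fair-class load balancer, below
is a minimal, standalone userspace sketch of the three-way decision the patch
introduces in migrate_degrades_capacity(). It is an illustration only: the
capacity constants and the task_fits() helper are simplified stand-ins
(assumptions) for the kernel's capacity_of(), task_fits_capacity() and
cpu_capacity_greater(); uclamp and the exact fitting margin are not reproduced.

/* sketch.c - illustrative decision table, not kernel code */
#include <stdio.h>

#define CAPACITY_LITTLE 446U   /* assumed capacity_orig of a LITTLE CPU */
#define CAPACITY_BIG   1024U   /* assumed capacity_orig of a big CPU */

/* Simplified stand-in for task_fits_capacity(): fits if util * 1.25 < capacity. */
static int task_fits(unsigned long util, unsigned long capacity)
{
	return util * 1280 < capacity * 1024;
}

/*
 * Mirrors the return convention of migrate_degrades_capacity():
 *  1  -> migration degrades the task's capacity situation
 *  0  -> migration provides the uplift the (misfit) task needs
 * -1  -> capacity is not a factor for this task
 */
static int migration_vs_capacity(unsigned long util,
				 unsigned long src_cap, unsigned long dst_cap)
{
	if (!task_fits(util, src_cap)) {
		if (dst_cap > src_cap)
			return 0;
		if (src_cap > dst_cap)
			return 1;
		return -1;
	}
	return task_fits(util, dst_cap) ? -1 : 1;
}

int main(void)
{
	/* CPU hog (util ~ 1024) moving LITTLE -> big: uplift, prints 0. */
	printf("hog LITTLE->big : %d\n",
	       migration_vs_capacity(1024, CAPACITY_LITTLE, CAPACITY_BIG));
	/* Same hog moving big -> LITTLE: degradation, prints 1. */
	printf("hog big->LITTLE : %d\n",
	       migration_vs_capacity(1024, CAPACITY_BIG, CAPACITY_LITTLE));
	/* Small task (util ~ 100) fits everywhere: prints -1. */
	printf("small any->any  : %d\n",
	       migration_vs_capacity(100, CAPACITY_BIG, CAPACITY_LITTLE));
	return 0;
}

In the patch, a 0 result combined with dst_grp_type == group_has_spare is what
lets can_migrate_task() clear tsk_cache_hot for a preempted misfit task, so
detach_tasks() no longer skips it.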