From: Tim Chen
To: Peter Zijlstra
Cc: Tim C Chen, Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
    Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
    Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt,
    Valentin Schneider, Ionela Voinescu, x86@kernel.org,
    linux-kernel@vger.kernel.org, Shrikanth Hegde, Srikar Dronamraju,
    naveen.n.rao@linux.vnet.ibm.com, Yicong Yang, Barry Song, Chen Yu,
    Hillf Danton
Subject: [Patch v2 3/6] sched/fair: Implement prefer sibling imbalance calculation between asymmetric groups
Date: Thu, 8 Jun 2023 15:32:29 -0700
X-Mailer: git-send-email 2.32.0

From: Tim C Chen

In the current prefer sibling load balancing code, there is an implicit
assumption that the busiest sched group and the local sched group are
equivalent. Hence the number of tasks to be moved is simply the difference
in the number of tasks between the two groups (i.e. the imbalance) divided
by two.

However, the number of cores may differ between the groups, say when a CPU
is taken offline or when the groups are hybrid. In that case, we should
balance between the two groups such that the #tasks/#cores ratio is the
same in both groups. Hence the computed imbalance needs to reflect this.

Additionally, when we have asymmetric packing, we need to bias the
balancing according to whether the busiest group or the local group is
favored. If the destination group is favored and has idle cores, we should
move at least one task from the busiest group so that no favored core is
left idle. But if the busiest group is favored, we should limit the number
of tasks we move so we do not create idle cores in the busiest group,
which would leave favored cores unused.

Adjust the sibling imbalance computation to take the above considerations
into account (a worked example of the ratio balancing follows below).
Signed-off-by: Tim Chen
---
 kernel/sched/fair.c  | 65 +++++++++++++++++++++++++++++++++++++++++---
 kernel/sched/sched.h |  5 ++++
 2 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 03573362274f..0b0904263d51 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9372,6 +9372,65 @@ static inline bool smt_balance(struct lb_env *env, struct sg_lb_stats *sgs,
         return false;
 }
 
+static inline long sibling_imbalance(struct lb_env *env,
+                               struct sd_lb_stats *sds,
+                               struct sg_lb_stats *busiest,
+                               struct sg_lb_stats *local)
+{
+        int ncores_busiest, ncores_local;
+        long imbalance;
+
+        if (env->idle == CPU_NOT_IDLE)
+                return 0;
+
+        ncores_busiest = sds->busiest->cores;
+        ncores_local = sds->local->cores;
+
+        if (ncores_busiest == ncores_local &&
+            (!(env->sd->flags & SD_ASYM_PACKING) ||
+             sched_asym_equal(env->dst_cpu,
+                              sds->busiest->asym_prefer_cpu))) {
+                imbalance = busiest->sum_nr_running;
+                lsub_positive(&imbalance, local->sum_nr_running);
+                return imbalance;
+        }
+
+        /* Balance such that nr_running/ncores ratio are same on both groups */
+        imbalance = ncores_local * busiest->sum_nr_running;
+        lsub_positive(&imbalance, ncores_busiest * local->sum_nr_running);
+        /* Normalize imbalance to become tasks to be moved to restore balance */
+        imbalance /= ncores_local + ncores_busiest;
+
+        if (env->sd->flags & SD_ASYM_PACKING) {
+                int limit;
+
+                if (!busiest->sum_nr_running)
+                        goto out;
+
+                if (sched_asym_prefer(env->dst_cpu, sds->busiest->asym_prefer_cpu)) {
+                        /* Don't leave preferred core idle */
+                        if (imbalance == 0 && local->sum_nr_running < ncores_local)
+                                imbalance = 1;
+                        goto out;
+                }
+
+                /* Limit tasks moved from preferred group, don't leave cores idle */
+                limit = busiest->sum_nr_running;
+                lsub_positive(&limit, ncores_busiest);
+                if (imbalance > limit)
+                        imbalance = limit;
+
+                goto out;
+        }
+
+        /* Take advantage of resource in an empty sched group */
+        if (imbalance == 0 && local->sum_nr_running == 0 &&
+            busiest->sum_nr_running > 1)
+                imbalance = 1;
+out:
+        return imbalance << 1;
+}
+
 static inline bool
 sched_reduced_capacity(struct rq *rq, struct sched_domain *sd)
 {
@@ -10230,14 +10289,12 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
         }
 
         if (busiest->group_weight == 1 || sds->prefer_sibling) {
-                unsigned int nr_diff = busiest->sum_nr_running;
                 /*
                  * When prefer sibling, evenly spread running tasks on
                  * groups.
                  */
                 env->migration_type = migrate_task;
-                lsub_positive(&nr_diff, local->sum_nr_running);
-                env->imbalance = nr_diff;
+                env->imbalance = sibling_imbalance(env, sds, busiest, local);
         } else {
 
                 /*
@@ -10424,7 +10481,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
          * group's child domain.
          */
         if (sds.prefer_sibling && local->group_type == group_has_spare &&
-            busiest->sum_nr_running > local->sum_nr_running + 1)
+            sibling_imbalance(env, &sds, busiest, local) > 1)
                 goto force_balance;
 
         if (busiest->group_type != group_overloaded) {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5f7f36e45b87..adffe0894cdb 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -804,6 +804,11 @@ static inline bool sched_asym_prefer(int a, int b)
         return arch_asym_cpu_priority(a) > arch_asym_cpu_priority(b);
 }
 
+static inline bool sched_asym_equal(int a, int b)
+{
+        return arch_asym_cpu_priority(a) == arch_asym_cpu_priority(b);
+}
+
 struct perf_domain {
         struct em_perf_domain *em_pd;
         struct perf_domain *next;
-- 
2.32.0