From: Mathieu Desnoyers
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers, Ingo Molnar,
    Valentin Schneider, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Vincent Guittot, Juri Lelli,
    Swapnil Sapkal, Aaron Lu, Chen Yu, Tim Chen, K Prateek Nayak,
    "Gautham R . Shenoy", x86@kernel.org
Subject: [RFC PATCH 1/2] sched/fair: Introduce UTIL_FITS_CAPACITY feature
Date: Wed, 18 Oct 2023 16:45:10 -0400
Message-Id: <20231018204511.1563390-2-mathieu.desnoyers@efficios.com>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20231018204511.1563390-1-mathieu.desnoyers@efficios.com>
References: <20231018204511.1563390-1-mathieu.desnoyers@efficios.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Introduce the UTIL_FITS_CAPACITY scheduler feature. The runqueue
selection picks the previous, target, or recent runqueues if they have
enough remaining capacity to enqueue the task before scanning for an
idle cpu.

This feature is introduced in preparation for the SELECT_BIAS_PREV
scheduler feature.

Its performance benefits are noticeable when combined with the
SELECT_BIAS_PREV feature.

The following benchmarks only cover the UTIL_FITS_CAPACITY feature.
Those are performed on a v6.5.5 kernel with mitigations=off.

The following hackbench workload on a 192-core system with AMD EPYC 9654
96-Core Processors (2 sockets) keeps roughly the same wall time (49s).

  hackbench -g 32 -f 20 --threads --pipe -l 480000 -s 100

We can observe that the number of migrations is reduced significantly
with this patch (improvement):

Baseline:      117M cpu-migrations  (9.355 K/sec)
With patch:     67M cpu-migrations  (5.470 K/sec)

The task-clock utilization is reduced (degradation):

Baseline:      253.275 CPUs utilized
With patch:    223.130 CPUs utilized

The number of context-switches is increased (degradation):

Baseline:      445M context-switches (35.516 K/sec)
With patch:    581M context-switches (47.548 K/sec)

So the improvement due to the reduction of migrations is countered by
the degradation in CPU utilization and context-switches. The following
SELECT_BIAS_PREV feature will address this.

Link: https://lore.kernel.org/r/09e0f469-a3f7-62ef-75a1-e64cec2dcfc5@amd.com
Link: https://lore.kernel.org/lkml/20230725193048.124796-1-mathieu.desnoyers@efficios.com/
Link: https://lore.kernel.org/lkml/20230810140635.75296-1-mathieu.desnoyers@efficios.com/
Link: https://lore.kernel.org/lkml/f6dc1652-bc39-0b12-4b6b-29a2f9cd8484@amd.com/
Link: https://lore.kernel.org/lkml/20230822113133.643238-1-mathieu.desnoyers@efficios.com/
Link: https://lore.kernel.org/lkml/20230823060832.454842-1-aaron.lu@intel.com/
Link: https://lore.kernel.org/lkml/20230905171105.1005672-1-mathieu.desnoyers@efficios.com/
Link: https://lore.kernel.org/lkml/cover.1695704179.git.yu.c.chen@intel.com/
Link: https://lore.kernel.org/lkml/20230929183350.239721-1-mathieu.desnoyers@efficios.com/
Link: https://lore.kernel.org/lkml/20231012203626.1298944-1-mathieu.desnoyers@efficios.com/
Link: https://lore.kernel.org/lkml/20231017221204.1535774-1-mathieu.desnoyers@efficios.com/
Signed-off-by: Mathieu Desnoyers
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Valentin Schneider
Cc: Steven Rostedt
Cc: Ben Segall
Cc: Mel Gorman
Cc: Daniel Bristot de Oliveira
Cc: Vincent Guittot
Cc: Juri Lelli
Cc: Swapnil Sapkal
Cc: Aaron Lu
Cc: Chen Yu
Cc: Tim Chen
Cc: K Prateek Nayak
Cc: Gautham R . Shenoy
Cc: x86@kernel.org
---
 kernel/sched/fair.c     | 49 ++++++++++++++++++++++++++++++++++++-----
 kernel/sched/features.h |  6 +++++
 kernel/sched/sched.h    |  5 +++++
 3 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1d9c2482c5a3..8058058afb11 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4497,6 +4497,37 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	trace_sched_util_est_se_tp(&p->se);
 }
 
+/*
+ * Returns true if adding the task utilization to the estimated
+ * utilization of the runnable tasks on @cpu does not exceed the
+ * capacity of @cpu.
+ *
+ * This considers only the utilization of _runnable_ tasks on the @cpu
+ * runqueue, excluding blocked and sleeping tasks. This is achieved by
+ * using the runqueue util_est.enqueued, and by estimating the capacity
+ * of @cpu based on arch_scale_cpu_capacity and arch_scale_thermal_pressure
+ * rather than capacity_of() because capacity_of() considers
+ * blocked/sleeping tasks in other scheduler classes.
+ *
+ * The utilization vs capacity comparison is done without the margin
+ * provided by fits_capacity(), because fits_capacity() is used to
+ * validate whether the utilization of a task fits within the overall
+ * capacity of a cpu, whereas this function validates whether the task
+ * utilization fits within the _remaining_ capacity of the cpu, which is
+ * more precise.
+ */
+static inline bool task_fits_remaining_cpu_capacity(unsigned long task_util,
+						    int cpu)
+{
+	unsigned long total_util, capacity;
+
+	if (!sched_util_fits_capacity_active())
+		return false;
+	total_util = READ_ONCE(cpu_rq(cpu)->cfs.avg.util_est.enqueued) + task_util;
+	capacity = arch_scale_cpu_capacity(cpu) - arch_scale_thermal_pressure(cpu);
+	return total_util <= capacity;
+}
+
 static inline int util_fits_cpu(unsigned long util,
 				unsigned long uclamp_min,
 				unsigned long uclamp_max,
@@ -7124,12 +7155,15 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	int i, recent_used_cpu;
 
 	/*
-	 * On asymmetric system, update task utilization because we will check
-	 * that the task fits with cpu's capacity.
+	 * With the UTIL_FITS_CAPACITY feature and on asymmetric system,
+	 * update task utilization because we will check that the task
+	 * fits with cpu's capacity.
 	 */
-	if (sched_asym_cpucap_active()) {
+	if (sched_util_fits_capacity_active() || sched_asym_cpucap_active()) {
 		sync_entity_load_avg(&p->se);
 		task_util = task_util_est(p);
+	}
+	if (sched_asym_cpucap_active()) {
 		util_min = uclamp_eff_value(p, UCLAMP_MIN);
 		util_max = uclamp_eff_value(p, UCLAMP_MAX);
 	}
@@ -7139,7 +7173,8 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	 */
 	lockdep_assert_irqs_disabled();
 
-	if ((available_idle_cpu(target) || sched_idle_cpu(target)) &&
+	if ((available_idle_cpu(target) || sched_idle_cpu(target) ||
+	    task_fits_remaining_cpu_capacity(task_util, target)) &&
 	    asym_fits_cpu(task_util, util_min, util_max, target))
 		return target;
 
@@ -7147,7 +7182,8 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	 * If the previous CPU is cache affine and idle, don't be stupid:
 	 */
 	if (prev != target && cpus_share_cache(prev, target) &&
-	    (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
+	    (available_idle_cpu(prev) || sched_idle_cpu(prev) ||
+	    task_fits_remaining_cpu_capacity(task_util, prev)) &&
 	    asym_fits_cpu(task_util, util_min, util_max, prev))
 		return prev;
 
@@ -7173,7 +7209,8 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (recent_used_cpu != prev &&
 	    recent_used_cpu != target &&
 	    cpus_share_cache(recent_used_cpu, target) &&
-	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
+	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu) ||
+	    task_fits_remaining_cpu_capacity(task_util, recent_used_cpu)) &&
 	    cpumask_test_cpu(recent_used_cpu, p->cpus_ptr) &&
 	    asym_fits_cpu(task_util, util_min, util_max, recent_used_cpu)) {
 		return recent_used_cpu;
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index ee7f23c76bd3..9a84a1401123 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -97,6 +97,12 @@ SCHED_FEAT(WA_BIAS, true)
 SCHED_FEAT(UTIL_EST, true)
 SCHED_FEAT(UTIL_EST_FASTUP, true)
 
+/*
+ * Select the previous, target, or recent runqueue if they have enough
+ * remaining capacity to enqueue the task. Requires UTIL_EST.
+ */
+SCHED_FEAT(UTIL_FITS_CAPACITY, true)
+
 SCHED_FEAT(LATENCY_WARN, false)
 
 SCHED_FEAT(ALT_PERIOD, true)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e93e006a942b..463e75084aed 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2090,6 +2090,11 @@ static const_debug __maybe_unused unsigned int sysctl_sched_features =
 
 #endif /* SCHED_DEBUG */
 
+static __always_inline bool sched_util_fits_capacity_active(void)
+{
+	return sched_feat(UTIL_EST) && sched_feat(UTIL_FITS_CAPACITY);
+}
+
 extern struct static_key_false sched_numa_balancing;
 extern struct static_key_false sched_schedstats;
 
-- 
2.39.2
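
For readers who want to experiment with the remaining-capacity check
outside the kernel, here is a minimal user-space sketch of the logic in
task_fits_remaining_cpu_capacity(). It is not part of the patch. The
struct cpu_model type, the two feat_* flags and the
model_task_fits_remaining() name are hypothetical stand-ins for the
per-CPU scheduler state, the sched_feat() bits and the kernel helper.
The assert() on thermal pressure is an assumption of the model only,
added so the unsigned subtraction cannot wrap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the per-CPU state the patch reads. */
struct cpu_model {
	unsigned long capacity_orig;     /* models arch_scale_cpu_capacity() */
	unsigned long thermal_pressure;  /* models arch_scale_thermal_pressure() */
	unsigned long util_est_enqueued; /* models cfs.avg.util_est.enqueued */
};

/* Hypothetical flags modelling sched_feat(UTIL_EST) and
 * sched_feat(UTIL_FITS_CAPACITY). */
static bool feat_util_est = true;
static bool feat_util_fits_capacity = true;

/*
 * Returns true if the runnable utilization already enqueued on the CPU
 * plus the task utilization does not exceed the CPU capacity left after
 * thermal pressure. No fits_capacity()-style margin is applied.
 */
static inline bool model_task_fits_remaining(const struct cpu_model *c,
					     unsigned long task_util)
{
	unsigned long total_util, capacity;

	/* Both features must be enabled, as in sched_util_fits_capacity_active(). */
	if (!(feat_util_est && feat_util_fits_capacity))
		return false;

	/* Model-only assumption: pressure never exceeds the original capacity. */
	assert(c->thermal_pressure <= c->capacity_orig);

	total_util = c->util_est_enqueued + task_util;
	capacity = c->capacity_orig - c->thermal_pressure;
	return total_util <= capacity;
}
```

With capacity_orig = 1024, thermal_pressure = 24 and
util_est_enqueued = 700, a task with utilization 300 fits exactly
(1000 <= 1000), while a task with utilization 301 does not, and
select_idle_sibling() would fall back to the idle-CPU checks.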