From: Chen Yu
To: Peter Zijlstra, Vincent Guittot
Cc: Ingo Molnar, Juri Lelli, Tim Chen, Mel Gorman, Dietmar Eggemann, K Prateek Nayak, "Gautham R. Shenoy", Chen Yu, Aaron Lu, linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC PATCH 4/7] sched/fair: Calculate the scan depth for idle balance based on system utilization
Date: Thu, 27 Jul 2023 22:35:02 +0800
Message-Id: <61e6fce60ca738215b6e5ad9033fb692c3a8fbb1.1690273854.git.yu.c.chen@intel.com>

When a CPU is about to enter idle, it invokes newidle_balance() to pull
some tasks from other runqueues. Although the per-domain
max_newidle_lb_cost already throttles newidle_balance(), it would be good
to further limit the scan based on overall system utilization. The reason
is that nothing prevents newidle_balance() from running simultaneously on
multiple CPUs. Since each newidle_balance() has to traverse all the groups
to calculate their statistics one by one, the total time spent in
newidle_balance() could be O(n^2), where n is the number of groups.
This issue is more severe when there are many groups within one domain,
for example on a system with a large number of cores in an LLC domain.
It hurts both performance and power efficiency.

sqlite spends quite some time in newidle_balance() on Intel Sapphire
Rapids, which has 2 x 56C/112T = 224 CPUs:

     6.69%     0.09%  sqlite3  [kernel.kallsyms]  [k] newidle_balance
     5.39%     4.71%  sqlite3  [kernel.kallsyms]  [k] update_sd_lb_stats

Based on this observation, limit the scan depth of newidle_balance() by
considering the utilization of the sched domain. Let the number of
scanned groups be a linear function of the utilization ratio:

	nr_groups_to_scan = nr_groups * (1 - util_ratio)

Suggested-by: Tim Chen
Signed-off-by: Chen Yu
---
 include/linux/sched/topology.h |  1 +
 kernel/sched/fair.c            | 30 ++++++++++++++++++++++++++++++
 kernel/sched/features.h        |  1 +
 3 files changed, 32 insertions(+)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index d6a64a2c92aa..af2261308529 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -84,6 +84,7 @@ struct sched_domain_shared {
 	int		nr_idle_scan;
 	unsigned long	total_load;
 	unsigned long	total_capacity;
+	int		nr_sg_scan;
 };
 
 struct sched_domain {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index edcfee9965cd..6925813db59b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10153,6 +10153,35 @@ static void ilb_save_stats(struct lb_env *env,
 	WRITE_ONCE(sd_share->total_capacity, sds->total_capacity);
 }
 
+static void update_ilb_group_scan(struct lb_env *env,
+				  unsigned long sum_util,
+				  struct sched_domain_shared *sd_share)
+{
+	u64 tmp, nr_scan;
+
+	if (!sched_feat(ILB_UTIL))
+		return;
+
+	if (!sd_share)
+		return;
+
+	if (env->idle == CPU_NEWLY_IDLE)
+		return;
+
+	/*
+	 * Limit the newidle balance scan depth based on overall system
+	 * utilization:
+	 * nr_groups_scan = nr_groups * (1 - util_ratio)
+	 * and util_ratio = sum_util / (sd_weight * SCHED_CAPACITY_SCALE)
+	 */
+	nr_scan = env->sd->nr_groups * sum_util;
+	tmp = env->sd->span_weight * SCHED_CAPACITY_SCALE;
+	do_div(nr_scan, tmp);
+	nr_scan = env->sd->nr_groups - nr_scan;
+	if ((int)nr_scan != sd_share->nr_sg_scan)
+		WRITE_ONCE(sd_share->nr_sg_scan, (int)nr_scan);
+}
+
 /**
  * update_sd_lb_stats - Update sched_domain's statistics for load balancing.
  * @env: The load balancing environment.
@@ -10231,6 +10260,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 	}
 
 	update_idle_cpu_scan(env, sum_util);
+	update_ilb_group_scan(env, sum_util, sd_share);
 
 	/* save a snapshot of stats during periodic load balance */
 	ilb_save_stats(env, sd_share, sds);
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 3cb71c8cddc0..30f6d1a2f235 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -103,3 +103,4 @@ SCHED_FEAT(ALT_PERIOD, true)
 SCHED_FEAT(BASE_SLICE, true)
 
 SCHED_FEAT(ILB_SNAPSHOT, true)
+SCHED_FEAT(ILB_UTIL, true)
-- 
2.25.1