From: Chen Yu
To: Peter Zijlstra, Vincent Guittot, Ingo Molnar, Juri Lelli
Cc: Tim Chen, Mel Gorman, Dietmar Eggemann, K Prateek Nayak, Abel Wu,
    Gautham R. Shenoy, Len Brown, Chen Yu, Yicong Yang,
    linux-kernel@vger.kernel.org, Chen Yu
Subject: [RFC PATCH 0/4] Limit the scan depth to find the busiest sched group during newidle balance
Date: Tue, 13 Jun 2023 00:17:53 +0800

Hi,

This is an attempt to reduce the cost of newidle balance, which is found to occupy noticeable CPU cycles on some high-core-count systems. For example, when running sqlite on Intel Sapphire Rapids, which has 2 x 56C/112T = 224 CPUs:

  6.69%  0.09%  sqlite3  [kernel.kallsyms]  [k] newidle_balance
  5.39%  4.71%  sqlite3  [kernel.kallsyms]  [k] update_sd_lb_stats

The main idea comes from the following question raised by Tim: do we always have to find the busiest group and pull from it? Would a relatively busy group be enough?

The proposal, ILB_UTIL, mainly adjusts the newidle balance scan depth within the current sched domain, based on the utilization of that domain.
The more spare time there is in the domain, the more time each newidle balance can spend scanning for a busy group. Although newidle balance already has the per-domain max_newidle_lb_cost to decide whether to launch a balance at all, ILB_UTIL provides a finer granularity: it decides how many groups each newidle balance may scan.

Patch 1/4 is a code cleanup.
Patch 2/4 introduces a new variable in sched_domain to record the number of groups; it is used by patches 3 and 4.
Patch 3/4 calculates the scan depth during each periodic load balance.
Patch 4/4 limits the scan depth based on the result of patch 3; the depth is used by newidle_balance() -> find_busiest_group() -> update_sd_lb_stats().

According to the test results, netperf/tbench shows some improvement when the system is underloaded, while hackbench/schbench show no noticeable difference. While I am still running more benchmarks, including some macro-benchmarks, I am sending this draft out to ask the community whether this is the right thing to do and whether we are heading in the right direction.

[We also have other, wilder ideas, such as sorting the groups by load during the periodic load balance so that a later newidle_balance() can fetch the corresponding group in O(1). That change also shows some improvement according to the test results.]

Any comments would be appreciated.

Chen Yu (4):
  sched/fair: Extract the function to get the sd_llc_shared
  sched/topology: Introduce nr_groups in sched_domain to indicate the
    number of groups
  sched/fair: Calculate the scan depth for idle balance based on system
    utilization
  sched/fair: Throttle the busiest group scanning in idle load balance

 include/linux/sched/topology.h |  5 +++
 kernel/sched/fair.c            | 74 +++++++++++++++++++++++++++++-----
 kernel/sched/features.h        |  1 +
 kernel/sched/topology.c        | 10 ++++-
 4 files changed, 79 insertions(+), 11 deletions(-)

-- 
2.25.1