From: Tim Chen
To: Peter Zijlstra
Cc: Tim Chen, Juri Lelli, Vincent Guittot, Ricardo Neri, Ravi V. Shankar,
    Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown,
    Mel Gorman, Rafael J. Wysocki, Srinivas Pandruvada, Steven Rostedt,
    Valentin Schneider, Ionela Voinescu, x86@kernel.org,
    linux-kernel@vger.kernel.org, Shrikanth Hegde, Srikar Dronamraju,
    naveen.n.rao@linux.vnet.ibm.com, Yicong Yang, Barry Song, Chen Yu,
    Hillf Danton
Subject: [Patch v3 0/6] Enable Cluster Scheduling for x86 Hybrid CPUs
Date: Fri, 7 Jul 2023 15:56:59 -0700

This is the third version of the patches that fix the issues blocking
cluster scheduling on x86 hybrid CPUs.
They address the concerns Peter raised about the second version. Please
refer to the cover letter of the first version for the motivation behind
this patch series.

Changes from v2:

1. Peter pointed out that biasing asym packing in the sibling imbalance
   computation is unnecessary: concentrating tasks in the preferred group
   would negate its extra turbo headroom advantage. In v3 we simplify the
   computation so that sibling imbalance is proportional only to the
   number of cores, and remove the asym packing bias. We do not lose any
   performance and do slightly better than v2.

2. Peter asked whether it is better to round the sibling_imbalance()
   computation or to floor it as in the v2 implementation. I found
   rounding to be better for threaded tensor computation, so v3 adopts
   rounding in sibling_imbalance(). The performance of both versions is
   listed in the data below, and a small illustrative sketch of the two
   approaches is appended at the end of this letter.

3. Fixed patch 1 to take SMT thread counts greater than 2 into
   consideration.

4. Various style cleanups suggested by Peter.

Past Versions:
[v1] https://lore.kernel.org/lkml/CAKfTPtD1W6vJQBsNKEt_4tn2EeAs=73CeH4LoCwENrh2JUDwnQ@mail.gmail.com/T/
[v2] https://lore.kernel.org/all/cover.1686263351.git.tim.c.chen@linux.intel.com/

v3 Performance numbers:

Single Threaded       6.3-rc5               This version          Improvement      Alternative           Improvement
Benchmark             Baseline              with cluster          in Performance   implementation        in Performance
                                            scheduling                             (floor imbalance)
                                            (round imbalance)
                      (run-run deviation)   (run-run deviation)                    (run-run deviation)
---------------------------------------------------------------------------------------------------------------------
tjbench               (+/- 0.08%)           (+/- 0.12%)            0.03%           (+/- 0.11%)            0.00%
PhPbench              (+/- 0.31%)           (+/- 0.50%)           +0.19%           (+/- 0.87%)           +0.21%
flac                  (+/- 0.58%)           (+/- 0.41%)           +0.48%           (+/- 0.41%)           +1.02%
pybench               (+/- 3.16%)           (+/- 2.87%)           +2.04%           (+/- 2.22%)           +4.25%

Multi Threaded        6.3-rc5               This version          Improvement      Alternative           Improvement
Benchmark             Baseline              with cluster          in Performance   implementation        in Performance
(-#threads)                                 scheduling                             (floor imbalance)
                                            (round imbalance)
                      (run-run deviation)   (run-run deviation)                    (run-run deviation)
---------------------------------------------------------------------------------------------------------------------
Kbuild-8              (+/- 2.90%)           (+/- 0.23%)           -1.10%           (+/- 0.40%)           -1.01%
Kbuild-10             (+/- 3.08%)           (+/- 0.51%)           -1.93%           (+/- 0.49%)           -1.57%
Kbuild-12             (+/- 3.28%)           (+/- 0.39%)           -1.10%           (+/- 0.23%)           -0.98%
Tensor Lite-8         (+/- 4.84%)           (+/- 0.86%)           -1.32%           (+/- 0.58%)           -0.78%
Tensor Lite-10        (+/- 0.87%)           (+/- 0.30%)           +0.68%           (+/- 1.24%)           -0.13%
Tensor Lite-12        (+/- 1.37%)           (+/- 0.82%)           +4.16%           (+/- 1.65%)           +1.19%

Tim

Peter Zijlstra (Intel) (1):
  sched/debug: Dump domains' sched group flags

Ricardo Neri (1):
  sched/fair: Consider the idle state of the whole core for load balance

Tim C Chen (4):
  sched/fair: Determine active load balance for SMT sched groups
  sched/topology: Record number of cores in sched group
  sched/fair: Implement prefer sibling imbalance calculation between
    asymmetric groups
  sched/x86: Add cluster topology to hybrid CPU

 arch/x86/kernel/smpboot.c |   3 +
 kernel/sched/debug.c      |   1 +
 kernel/sched/fair.c       | 137 +++++++++++++++++++++++++++++++++++---
 kernel/sched/sched.h      |   1 +
 kernel/sched/topology.c   |  10 ++-
 5 files changed, 143 insertions(+), 9 deletions(-)

-- 
2.32.0
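
P.S. For readers unfamiliar with the round-vs-floor distinction in change
2 above, here is a small illustrative sketch. It is not the
sibling_imbalance() code from this series; the function name, parameters,
and numbers are hypothetical and only show how rounding an integer
division proportional to the core count differs from flooring it:

#include <stdbool.h>
#include <stdio.h>

/*
 * Illustrative sketch only -- not the actual sched/fair.c implementation.
 *
 * Split "excess" surplus tasks between asymmetric groups in proportion
 * to one group's share of cores.  v2 effectively floored the integer
 * division; v3 rounds to the nearest integer instead.
 */
unsigned int imbalance_share(unsigned int excess, unsigned int group_cores,
			     unsigned int total_cores, bool round)
{
	unsigned int scaled = excess * group_cores;

	if (round)
		return (scaled + total_cores / 2) / total_cores; /* round */

	return scaled / total_cores;                             /* floor */
}

int main(void)
{
	/* 5 surplus tasks, group owns 1 of 3 cores: floor -> 1, round -> 2 */
	printf("floor: %u\n", imbalance_share(5, 1, 3, false));
	printf("round: %u\n", imbalance_share(5, 1, 3, true));
	return 0;
}

In this hypothetical case rounding hands the group one extra task whenever
its proportional share ends at or above the halfway point, which is the
behavior that measured better in the threaded tensor workloads above.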