From: Suravee Suthikulpanit
To: linux-kernel@vger.kernel.org
Cc: mingo@redhat.com, peterz@infradead.org, bp@suse.de, Suravee Suthikulpanit
Subject: [PATCH v3] sched/topology: Introduce NUMA identity node sched domain
Date: Thu, 7 Sep 2017 02:20:05 -0500
Message-Id: <1504768805-46716-1-git-send-email-suravee.suthikulpanit@amd.com>

On AMD Family17h-based (EPYC) systems, a logical NUMA node can contain
up to 8 cores (16 threads) with the following topology.

             ----------------------------
         C0  | T0 T1 |    ||    | T0 T1 | C4
             --------|    ||    |--------
         C1  | T0 T1 | L3 || L3 | T0 T1 | C5
             --------|    ||    |--------
         C2  | T0 T1 | #0 || #1 | T0 T1 | C6
             --------|    ||    |--------
         C3  | T0 T1 |    ||    | T0 T1 | C7
             ----------------------------

Here, there are 2 last-level (L3) caches per logical NUMA node.
A socket can contain up to 4 NUMA nodes, and a system can support
up to 2 sockets. With a full system configuration, the current
scheduler creates 4 sched domains:

  domain0 SMT  (span a core)
  domain1 MC   (span a last-level-cache)
  domain2 NUMA (span a socket: 4 nodes)
  domain3 NUMA (span a system: 8 nodes)

Note that there is no domain to represent the cpus spanning a logical
NUMA node. With this hierarchy of sched domains, the scheduler does
not balance properly in the following cases:

Case1:
When running 8 tasks, a properly balanced system should schedule
one task per logical NUMA node. The current scheduler does not do this.

Case2:
In some cases, threads are scheduled on the same cpu while other
cpus are idle. This results in run-to-run inconsistency.
For example:

  taskset -c 0-7 sysbench --num-threads=8 --test=cpu \
                          --cpu-max-prime=100000 run

Total execution time ranges from 25.1s to 33.5s depending on thread
placement, where 25.1s is when all 8 threads are balanced properly
across the 8 cpus.

Introduce a NUMA identity node sched domain, based on how the SRAT/SLIT
tables define a logical NUMA node. This results in the following
hierarchy of sched domains on the same system described above.

  domain0 SMT  (span a core)
  domain1 MC   (span a last-level-cache)
  domain2 NODE (span a logical NUMA node)
  domain3 NUMA (span a socket: 4 nodes)
  domain4 NUMA (span a system: 8 nodes)

This fixes the improper load balancing cases mentioned above.

Note that if the cpumasks of the last-level-cache and NODE domains are
the same (e.g. on AMD family10h/15h servers), the NODE domain is
excluded. Therefore, this change does not affect those systems.

Signed-off-by: Suravee Suthikulpanit
---
 kernel/sched/topology.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 79895ae..98a8bbc 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1335,6 +1335,10 @@ void sched_init_numa(void)
 	if (!sched_domains_numa_distance)
 		return;
 
+	/* Includes NUMA identity node at level 0. */
+	sched_domains_numa_distance[level++] = curr_distance;
+	sched_domains_numa_levels = level;
+
 	/*
 	 * O(nr_nodes^2) deduplicating selection sort -- in order to find the
 	 * unique distances in the node_distance() table.
@@ -1382,8 +1386,7 @@ void sched_init_numa(void)
 		return;
 
 	/*
-	 * 'level' contains the number of unique distances, excluding the
-	 * identity distance node_distance(i,i).
+	 * 'level' contains the number of unique distances
 	 *
 	 * The sched_domains_numa_distance[] array includes the actual distance
 	 * numbers.
@@ -1445,9 +1448,26 @@ void sched_init_numa(void)
 		tl[i] = sched_domain_topology[i];
 
 	/*
+	 * Do not set up NUMA node level if it has the same cpumask
+	 * as sched domain at previous level. This is the case for
+	 * systems with:
+	 *   LLC == NODE : LLC (MC) sched domain spans a NUMA node.
+	 *   DIE == NODE : DIE sched domain spans a NUMA node.
+	 *
+	 * Assume all NUMA nodes are identical, so only check node 0.
+	 */
+	if (!cpumask_equal(sched_domains_numa_masks[0][0], tl[i-1].mask(0))) {
+		tl[i++] = (struct sched_domain_topology_level){
+			.mask = sd_numa_mask,
+			.numa_level = 0,
+			SD_INIT_NAME(NODE)
+		};
+	}
+
+	/*
 	 * .. and append 'j' levels of NUMA goodness.
 	 */
-	for (j = 0; j < level; i++, j++) {
+	for (j = 1; j < level; i++, j++) {
 		tl[i] = (struct sched_domain_topology_level){
 			.mask = sd_numa_mask,
 			.sd_flags = cpu_numa_flags,
--
2.7.4