Subject: Re: [PATCH v3] sched/topology: Introduce NUMA identity node sched domain
From: Suravee Suthikulpanit
To: linux-kernel@vger.kernel.org
Cc: mingo@redhat.com, peterz@infradead.org, bp@suse.de
Date: Thu, 14 Sep 2017 09:12:22 -0700
Message-ID: <95039c1a-7839-d758-e882-1baaf1337960@amd.com>
In-Reply-To: <1504768805-46716-1-git-send-email-suravee.suthikulpanit@amd.com>
References: <1504768805-46716-1-git-send-email-suravee.suthikulpanit@amd.com>

Hi,

Are there any other concerns with this patch?

Thanks,
Suravee

On 9/7/17 00:20, Suravee Suthikulpanit wrote:
> On an AMD Family17h-based (EPYC) system, a logical NUMA node can contain
> up to 8 cores (16 threads) with the following topology.
>
>             ----------------------------
>         C0  | T0 T1 |    ||    | T0 T1 | C4
>             --------|    ||    |--------
>         C1  | T0 T1 | L3 || L3 | T0 T1 | C5
>             --------|    ||    |--------
>         C2  | T0 T1 | #0 || #1 | T0 T1 | C6
>             --------|    ||    |--------
>         C3  | T0 T1 |    ||    | T0 T1 | C7
>             ----------------------------
>
> Here, there are 2 last-level (L3) caches per logical NUMA node.
> A socket can contain up to 4 NUMA nodes, and a system can support
> up to 2 sockets. With a full system configuration, the current scheduler
> creates 4 sched domains:
>
>   domain0 SMT  (span a core)
>   domain1 MC   (span a last-level-cache)
>   domain2 NUMA (span a socket: 4 nodes)
>   domain3 NUMA (span a system: 8 nodes)
>
> Note that there is no domain to represent cpus spanning a logical
> NUMA node. With this hierarchy of sched domains, the scheduler does
> not balance properly in the following cases:
>
> Case1:
> When running 8 tasks, a properly balanced system should
> schedule a task per logical NUMA node. This is not the case for
> the current scheduler.
>
> Case2:
> In some cases, threads are scheduled on the same cpu, while other
> cpus are idle. This results in run-to-run inconsistency. For example:
>
>   taskset -c 0-7 sysbench --num-threads=8 --test=cpu \
>                           --cpu-max-prime=100000 run
>
> Total execution time ranges from 25.1s to 33.5s depending on thread
> placement, where 25.1s is when all 8 threads are balanced properly
> on 8 cpus.
>
> Introduce a NUMA identity node sched domain, which is based on how
> the SRAT/SLIT tables define a logical NUMA node. This results in the
> following hierarchy of sched domains on the same system described above.
>
>   domain0 SMT  (span a core)
>   domain1 MC   (span a last-level-cache)
>   domain2 NODE (span a logical NUMA node)
>   domain3 NUMA (span a socket: 4 nodes)
>   domain4 NUMA (span a system: 8 nodes)
>
> This fixes the improper load balancing cases mentioned above.
>
> Note that if the cpumasks of the last-level-cache and NODE domains
> are the same (e.g. on AMD family10h/15h servers), the NODE domain
> will be excluded. Therefore, this change will not affect those systems.
>
> Signed-off-by: Suravee Suthikulpanit
> ---
>  kernel/sched/topology.c | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 79895ae..98a8bbc 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1335,6 +1335,10 @@ void sched_init_numa(void)
>  	if (!sched_domains_numa_distance)
>  		return;
>  
> +	/* Includes NUMA identity node at level 0. */
> +	sched_domains_numa_distance[level++] = curr_distance;
> +	sched_domains_numa_levels = level;
> +
>  	/*
>  	 * O(nr_nodes^2) deduplicating selection sort -- in order to find the
>  	 * unique distances in the node_distance() table.
> @@ -1382,8 +1386,7 @@ void sched_init_numa(void)
>  		return;
>  
>  	/*
> -	 * 'level' contains the number of unique distances, excluding the
> -	 * identity distance node_distance(i,i).
> +	 * 'level' contains the number of unique distances
>  	 *
>  	 * The sched_domains_numa_distance[] array includes the actual distance
>  	 * numbers.
> @@ -1445,9 +1448,26 @@ void sched_init_numa(void)
>  		tl[i] = sched_domain_topology[i];
>  
>  	/*
> +	 * Do not setup NUMA node level if it has the same cpumask
> +	 * as sched domain at previous level. This is the case for
> +	 * system with:
> +	 *	LLC == NODE : LLC (MC) sched domain span a NUMA node.
> +	 *	DIE == NODE : DIE sched domain span a NUMA node.
> +	 *
> +	 * Assume all NUMA nodes are identical, so only check node 0.
> +	 */
> +	if (!cpumask_equal(sched_domains_numa_masks[0][0], tl[i-1].mask(0))) {
> +		tl[i++] = (struct sched_domain_topology_level){
> +			.mask = sd_numa_mask,
> +			.numa_level = 0,
> +			SD_INIT_NAME(NODE)
> +		};
> +	}
> +
> +	/*
>  	 * .. and append 'j' levels of NUMA goodness.
>  	 */
> -	for (j = 0; j < level; i++, j++) {
> +	for (j = 1; j < level; i++, j++) {
>  		tl[i] = (struct sched_domain_topology_level){
>  			.mask = sd_numa_mask,
>  			.sd_flags = cpu_numa_flags,
>
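
For anyone following the thread, here is a rough user-space sketch of the
level selection the patch describes. It is not the kernel code. The helper
names (unique_distances, append_numa_levels), the use of plain bitmasks in
place of cpumasks, and the assumption that the identity distance is the
smallest entry in the table are all mine, for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEVELS 16

/*
 * Hypothetical helper: collect the unique distances of an n x n table
 * (row-major in dist[]) in ascending order. The identity distance
 * dist[0] is stored at level 0, mirroring what the patch does.
 * Assumes the identity distance is the smallest value in the table.
 * Returns the number of levels written to out[].
 */
size_t unique_distances(const int *dist, size_t n, int *out, size_t max)
{
	size_t level = 0;
	int curr = dist[0];

	if (max == 0)
		return 0;
	out[level++] = curr;

	for (;;) {
		int next = curr;

		/* Find the smallest distance strictly greater than curr. */
		for (size_t k = 0; k < n * n; k++) {
			int d = dist[k];

			if (d > curr && (next == curr || d < next))
				next = d;
		}
		if (next == curr || level == max)
			break;
		out[level++] = next;
		curr = next;
	}
	return level;
}

/*
 * Hypothetical helper: append the NUMA part of the hierarchy to names[]
 * starting at index i. node_mask and prev_mask stand in for the cpumask
 * of node 0 and of the previous topology level. NODE is skipped when the
 * two masks are equal; the remaining levels start at j = 1.
 */
size_t append_numa_levels(const char **names, size_t i, size_t levels,
			  unsigned long node_mask, unsigned long prev_mask)
{
	if (node_mask != prev_mask)
		names[i++] = "NODE";
	for (size_t j = 1; j < levels; j++)
		names[i++] = "NUMA";
	return i;
}
```

With a made-up 8-node table that uses three distinct distances (local,
same socket, other socket) and a node mask wider than the MC mask, this
yields SMT, MC, NODE, NUMA, NUMA. When the two masks are equal the NODE
entry is left out, which matches the family10h/15h case in the
description above.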