Message-ID: <570677BD.1000800@hpe.com>
Date: Thu, 7 Apr 2016 11:07:41 -0400
From: Waiman Long
To: Andy Lutomirski
CC: Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", "linux-kernel@vger.kernel.org", X86 ML, Jiang Liu, Borislav Petkov, Andy Lutomirski, Scott J Norton, Douglas Hatch, Randy Wright
Subject: Re: [PATCH] x86/hpet: Reduce HPET counter read contention
References: <1459951324-53339-1-git-send-email-Waiman.Long@hpe.com>
On 04/07/2016 12:58 AM, Andy Lutomirski wrote:
> On Wed, Apr 6, 2016 at 7:02 AM, Waiman Long wrote:
>> On a large system with many CPUs, using HPET as the clock source can
>> have a significant impact on the overall system performance because
>> of the following reasons:
>> 1) There is a single HPET counter shared by all the CPUs.
>> 2) HPET counter reading is a very slow operation.
>>
>> Using HPET as the default clock source may happen when, for example,
>> the TSC clock calibration exceeds the allowable tolerance.
>> Sometimes the performance slowdown can be so severe that the system
>> may crash because of an NMI watchdog soft lockup, for example.
>>
>> This patch attempts to reduce HPET read contention by using the fact
>> that if more than one task is trying to access the HPET at the same
>> time, it is more efficient for one task in the group to read the HPET
>> counter and share it with the rest of the group than for each group
>> member to read the HPET counter individually.
>>
>> This is done by using a combination word with a sequence number and
>> a bit lock. The task that gets the bit lock will be responsible for
>> reading the HPET counter and updating the sequence number. The others
>> will monitor the change in sequence number and grab the HPET counter
>> value accordingly.
>>
>> On a 4-socket Haswell-EX box with 72 cores (HT off), running the
>> AIM7 compute workload (1500 users) on a 4.6-rc1 kernel (HZ=1000)
>> with and without the patch gave the following performance numbers
>> (with HPET or TSC as clock source):
>>
>> TSC             = 646515 jobs/min
>> HPET w/o patch  = 566708 jobs/min
>> HPET with patch = 638791 jobs/min
>>
>> The perf profile showed a reduction of the %CPU time consumed by
>> read_hpet from 4.99% without the patch to 1.41% with it.
>>
>> On a 16-socket IvyBridge-EX system with 240 cores (HT on), on the
>> other hand, the performance numbers of the same benchmark were:
>>
>> TSC             = 3145329 jobs/min
>> HPET w/o patch  = 1108537 jobs/min
>> HPET with patch = 3019934 jobs/min
>>
>> The corresponding perf profile showed a drop in CPU consumption of
>> the read_hpet function from more than 34% to just 2.96%.
>>
>> Signed-off-by: Waiman Long
>> ---
>>  arch/x86/kernel/hpet.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 files changed, 109 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
>> index a1f0e4a..9e3de73 100644
>> --- a/arch/x86/kernel/hpet.c
>> +++ b/arch/x86/kernel/hpet.c
>> @@ -759,11 +759,112 @@ static int hpet_cpuhp_notify(struct notifier_block *n,
>>  #endif
>>
>>  /*
>> + * Reading the HPET counter is a very slow operation. If a large number of
>> + * CPUs are trying to access the HPET counter simultaneously, it can cause
>> + * massive delay and slow down system performance dramatically. This may
>> + * happen when HPET is the default clock source instead of TSC. For a
>> + * really large system with hundreds of CPUs, the slowdown may be so
>> + * severe that it may actually crash the system because of an NMI watchdog
>> + * soft lockup, for example.
>> + *
>> + * If multiple CPUs are trying to access the HPET counter at the same time,
>> + * we don't actually need to read the counter multiple times. Instead, the
>> + * other CPUs can use the counter value read by the first CPU in the group.
>> + *
>> + * A sequence number whose lsb is a lock bit is used to control which CPU
>> + * has the right to read the HPET counter directly and which CPUs are going
>> + * to get the indirect value read by the lock holder. For the latter group,
>> + * if the sequence number differs from the expected locked value, they
>> + * can assume that the saved HPET value is up-to-date and return it.
>> + *
>> + * This mechanism is only activated on systems with a large number of CPUs.
>> + * Currently, it is enabled when nr_cpus > 64.
>> + */
> Reading the HPET is so slow that all the atomic ops in the world won't
> make a dent.  Why not just turn this optimization on unconditionally?
>
> --Andy

I am always on the alert that we should not introduce regressions on lesser systems, like a single-socket machine with a few cores. That is why I put in the check to conditionally enable this optimization. I have no issue with taking that out and letting it be the default, as long as no one objects.

Cheers,
Longman
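[For readers skimming the thread, the sequence-number-plus-lock-bit scheme described in the quoted comment block can be sketched roughly as below. This is a simplified userspace illustration, not the actual patch: hpet_read_raw() is an invented stub standing in for the slow MMIO counter read, the variable and function names are made up, and a real kernel implementation would use the kernel's atomic primitives and cpu_relax() rather than C11 <stdatomic.h> and a bare busy-wait.]

```c
#include <stdatomic.h>
#include <stdint.h>

/* Combination word: a sequence number whose lsb acts as a lock bit.
 * Even value = unlocked, odd value = some CPU is reading the counter. */
static _Atomic uint32_t hpet_save_seq;
static _Atomic uint64_t hpet_save_value; /* last counter value published */

/* Stub standing in for the expensive HPET MMIO read (invented for this
 * sketch; returns a monotonically increasing fake counter). */
static uint64_t hpet_read_raw(void)
{
	static uint64_t fake_counter;
	return fake_counter += 1000;
}

uint64_t read_hpet_coalesced(void)
{
	uint32_t seq = atomic_load(&hpet_save_seq);

	/* Try to grab the lock bit if the word is currently unlocked. */
	if (!(seq & 1) &&
	    atomic_compare_exchange_strong(&hpet_save_seq, &seq, seq + 1)) {
		/* We hold the lock bit: do the slow read, publish the
		 * value, then advance the sequence to release the lock. */
		uint64_t now = hpet_read_raw();

		atomic_store(&hpet_save_value, now);
		atomic_store(&hpet_save_seq, seq + 2); /* even: unlocked */
		return now;
	}

	/* Another CPU holds (or held) the lock: wait until the sequence
	 * moves past the locked value, then use the published counter
	 * value instead of issuing another slow read ourselves. */
	while (atomic_load(&hpet_save_seq) == (seq | 1))
		; /* cpu_relax() in the kernel */
	return atomic_load(&hpet_save_value);
}
```

Under contention, only the CPU that wins the compare-and-exchange pays for the MMIO read; the rest spin briefly on the cheap in-memory sequence word and reuse the published value, which is the coalescing effect the benchmark numbers above reflect.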