Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S941460AbcKPWtg (ORCPT ); Wed, 16 Nov 2016 17:49:36 -0500 Received: from mail-db5eur01on0069.outbound.protection.outlook.com ([104.47.2.69]:47536 "EHLO EUR01-DB5-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753478AbcKPWtd (ORCPT ); Wed, 16 Nov 2016 17:49:33 -0500 Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=cmetcalf@mellanox.com; Subject: Re: [PATCH v2] tile: avoid using clocksource_cyc2ns with absolute cycle count To: John Stultz References: <1479324933-8161-1-git-send-email-cmetcalf@mellanox.com> CC: Thomas Gleixner , Salman Qazi , "Paul Turner" , Tony Lindgren , Steven Miao , lkml , Peter Zijlstra From: Chris Metcalf Message-ID: Date: Wed, 16 Nov 2016 15:16:59 -0500 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [12.216.194.146] X-ClientProxiedBy: DM5PR18CA0002.namprd18.prod.outlook.com (10.175.218.140) To VI1PR0501MB2767.eurprd05.prod.outlook.com (10.172.11.17) X-Microsoft-Exchange-Diagnostics: 1;VI1PR0501MB2767;2:SKY6NZX2pW6aI1Ca0mtg7HQX6E9pBbK0jvuIOaM9KOWbSr5CU7fjEW55uAoQusovQ+ojzH1MTspfEGKYsE/daVXhQteyShTZf8TszsgHMAjVI1j6xpDc9DQ9EKPFnzKXyvO5BZoGshH96rcYncBw882Nmcd7QInEZ/UD3rLMgT4=;3:zcxalgiOxEEhAIFsfFfF5m8TrkCxOBew4G4bSQ9fVKLVHmT8fqo1vEECTgV0rPiu6MbvQOwvAM5Y5zFMBxCaVnlry7OfNMdwnyjHTlFXcyjs5m6AYKcP5AFMPjORl4kY9J3nMK2s1CGNQ/jtayCgxYGFbwak+zVgRIN8396vC8U=;25:laRZZulosn85buF9+RsawhW/Kj4HAXmPf5GZWp4msuVQl3Ptf9ETIoeZ38GySw0hV/zSCo4RuuiTsf3UmxphwMed1Gyb52DCIYQram7MOJYxOsJfqPZdR1C7laprK3plF1Jqa1EB9QSYXgCwmlGSYwUzKn/847H4zO8Mj+YjuaaaDPvQ3WL8RUywIsI1H0c6O1K/Dzauo6SQ/RB2anRXAjrkN9m6Mr74eI/4HfoXIIQvtMQB1+GDxTLK2EBaWA4a+UCN9jNYV7pwDKsUk4IlvvAljL45F/e1AScklVeuNzsyeUE0jLP7IXgd7ajUXJrS9J65clrEugf6agI08FxFYc5c+HKA9dILFxAybcaYOTzvI8jsIS9I8XVf+GZlK0wfWBMQr5KupUL5BY6aH0y0AtWpjq1iGD7deMml4YeSWFIAabDcu/eW1cIdeR8Xci/90MgmgOAYLj0TdoUkztUIrw== X-MS-Office365-Filtering-Correlation-Id: 368541bb-2f95-4770-d974-08d40e5d9085 X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:(22001);SRVR:VI1PR0501MB2767; X-Microsoft-Exchange-Diagnostics: 1;VI1PR0501MB2767;31:0CfuSp/EHJpUdQh8KaF8mnJzrFHCMZqSWAgv6BoPvee7mbkVexWPTNzKhhtza/ZHxVrAK/2YQrEpRnt/GGqeIZuJgn9uL1GEzMS9QvvY09Nbz9sg6ggWc6v7/6fO2Mlu38i6rZn5e1ZR1TZORq/4QlTjKY14R6SCryzF6XiX4P429yZBvJ5JM4P+bP5vFq/ypdVjApwClFVuaqheSoK1Q+o6t7lTaDE4NzrAZiZsYHz19krtmTfP2K7JeWwbuk1qzIKKWONqnWbzlPYWzG4KaZNcn0vBWDruP2mRs9IXx5g=;20:skcWBbk6l2FOF8gUEWs/v+3Rz3jxuJX/1pUcTvldgtOa2qfqdwLYKGWyndt5oZ1JoWOBMUMx1Bc1IPafZsuY4yT8JPXK88CKoPHk8odYIKNdpa4c7gJtGAmKULTpiR3fQIpM0zNEk2NiPXBRBvaKWlSb0gyvBmcXbhldzZbA2n97PCqEXZGO8vA+4JHBXMuneRy5bFA+IsUZ5/VgF/5qvy6Z0IExekJ3nbNX48aIZcw3hHtuY/DBssnSt47wZ3a+dr2oamGLO+DIlbPwMcIamInYR1bywdHdy+JoT2XgYfIZLL6gR3GNiHQtlThVRE1lm1kl5KGJIU6HMOWaSnK8iOi7OIw4SdQG6m4S3ax9414sOHBp4h/pPZZBsC2iSPnjdmdrsEExiIACqb3U0ukJrjeIY+nlnLP4QqApsSyPVBE5bM5Fjsas2q8tvOmYOe9tB1i+HkcJcOdBbI1E12/cJIMk8JI3jrHJu1iidF1yboKX8VODUUyJPzyygUcNTz8x X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:(171992500451332); X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(6040281)(601004)(2401047)(5005006)(8121501046)(10201501046)(3002001)(6055026)(6041223);SRVR:VI1PR0501MB2767;BCL:0;PCL:0;RULEID:;SRVR:VI1PR0501MB2767; X-Microsoft-Exchange-Diagnostics: 1;VI1PR0501MB2767;4:TqL0JzdDtZwAP8yFSAaV06Xyegz7gHQE9YqMwfvA/sp4/TBPKEZHt9CztyW0+8yZpZKCkYwbL2LTny8IIxBVvVxgQ/a02LvFfBdqzupUVzQj1yMdBu8Wi8gsxwPnuhsJbDrcszfP4YNnpf8unn4vSVnEDR7oA+rj0x68Dk739/693FbedXtX8hXQtD2qMCMoc8ZISsZyMeWIoumAswR2mDCP+PRYyLowNDDs2Z2a6DDfHhTGKl78vkbuvKmpn+FkBr4/2acQ7KeMoBMUAcAplnGXNTpMNYcmp3QQ9IOQvFYdt6SDOOhWxn6B6h3KbpUnMKWUAfNzTzmLd2Folz9etZO4IhFhqktxTPhyf49ORC6GP1S1s1Lyd2jp8gU6kQIVVlIZvXoJ6vjsMDs8oJYr2sY17vqZvLR3u0j/FxrW1flXilYwws09E2tA3eiEt482kO7b2OOE6PXOvDjxdxmj85Rl5VpMdN2ZVum0nkLNSJc= X-Forefront-PRVS: 01283822F8 X-Forefront-Antispam-Report: SFV:NSPM;SFS:(10009020)(4630300001)(6049001)(6009001)(7916002)(24454002)(377454003)(189002)(199003)(3846002)(189998001)(6116002)(105586002)(4326007)(2906002)(36756003)(31686004)(23676002)(42186005)(64126003)(106356001)(101416001)(229853002)(230700001)(76176999)(50466002)(54356999)(50986999)(81166006)(8676002)(81156014)(92566002)(65806001)(33646002)(83506001)(305945005)(7846002)(5660300001)(7736002)(2950100002)(6916009)(65826007)(110136003)(86362001)(77096005)(97736004)(4001350100001)(66066001)(6666003)(65956001)(68736007)(31696002)(47776003)(15760500001)(18886065003);DIR:OUT;SFP:1101;SCL:1;SRVR:VI1PR0501MB2767;H:[10.15.7.187];FPR:;SPF:None;PTR:InfoNoRecords;MX:1;A:1;LANG:en; X-Microsoft-Exchange-Diagnostics: =?utf-8?B?MTtWSTFQUjA1MDFNQjI3Njc7MjM6cWJXcVRzTXBjNU11Zmx5dFB4WHM1T2w4?= =?utf-8?B?WGFGSUxzbkk3SGZpRzhNTXRsUG85R2M4UVJLYzRTd2RtZVA2UTIvZlo0V0ZC?= =?utf-8?B?Qjk3NzFJTUJMUUZITEErdk9iNm94ZDBhKzRoRnVJenltc3RDWXR4QXEwQW42?= =?utf-8?B?eWppSEFKUVNoaUUySjhMSFZqV2JPaEhmWVVxMlhLL25adFdZQXVZRTFjL3Vs?= =?utf-8?B?Qkk1OUVwTGF3LzRKOStSczhJbjFBRjZZSGppVHYyblUwc09xUGkzcUp0YjN4?= =?utf-8?B?S0dJWTAxejdnc0FHYnpOMEJxd0dXSDFtUytHR1ZPOVF2ZjhLK1YwMHEydTA2?= =?utf-8?B?dDNEbnA2ZHNUdGJuNmNDUDFTM2JuMWRwN2dRck5IUEV6RHBEdkgrb2FRWXhh?= =?utf-8?B?VHFTY24zdUZwalphUXdkYTdNOTB2VndjaVJtWXhHenBaZjVnSE9GNWdHQTl4?= =?utf-8?B?RCtOb20wWXprTUJWMEdXc1VCVFFYNHVVUG0yQ1lqT0wrMTlzUWszR3piUTEy?= =?utf-8?B?R0RhYUtMV0l0SHJBZFlHdW1oM21ib0MrQ3J2MEV3djlaelNCdWJaczJCbGxK?= =?utf-8?B?NjhSck5IbUx5dGlRYkl3YlNRdDZ3UGlpS3l2NjZQWW4xYU92dlNhVWlleUVW?= =?utf-8?B?cHpFcTRLU053SnZRNmZJRlZWYVlSOHlEWHNpa3ZMOVhTNDhhZFdibVdPR0hz?= =?utf-8?B?YVY5eVRBQmhYWlowUTNsRXpGRE5ybzBuREkzZUs1QTFaWTY2czg4NWNkdDd1?= =?utf-8?B?MEE0VndVajF2ODRiS2IwRW9LT0EzbjJhc3I4SWNMdk53SkpicmRuOE9NUVBo?= =?utf-8?B?UUJWc2V4cHVVUzdKSTNDcHdHb3NXM0k5Y1l1VmNwSEhHOWRKdzBCUm5oWCtM?= =?utf-8?B?aTM4WHZGUnFVUDNxVGJ0OXVhTm9lcUlzSnhJSW5UVWExZ1ZLQ0lGUksrbUxi?= =?utf-8?B?SGR0RGwrYUJnN2NVTmIvWVMydHRJbStiR2dmbzMwTi82TDFuUWVXNEh4eHJr?= =?utf-8?B?eW12Mk5LaW9XZDhCTUdlMkhyS2pWL3BOU1VrQXdiU0x5TkdqWnJHRmVNU3Zr?= =?utf-8?B?aUtqTUVhQ0E0SFVhNnhLK0J6Z0tveFJ3VFRqTWlpMnpTQU92N0l0TGNSZExt?= =?utf-8?B?c2FYZXpBRno0QUtLYWJ5Qk1QNWtwZlhFN1k1Ym1NSWp0SDN1NEZwalNrK1pY?= =?utf-8?B?QTFxd1RGQzB5UU1uZDFQVjgzeTl4WUdKY3BKdlVNVlg4RkowWG9PMmFMMXNk?= =?utf-8?B?dDl5Z2l2ZEhCU1E3YlFwaGRuMTBNSzExc2tVMjY2S01UWVIyRFQ5YUhGV1RQ?= =?utf-8?B?Z1JUcU01b1hXWHFERkZrSFptL3ZiMDBZYXlLSGZGRTNhbTBQSjgyUWVHK2FS?= =?utf-8?B?NlNjY3h0S1BQckNQcFJ4cXBxUUpZQ2Y2QTVBaWxlbG1QRHNnWWFKZzBMak4v?= =?utf-8?B?K21ma25TRjFBc0lwdjlIalA1bGxic2tJeDFKM3JGMFQvcHQxT0d6ZjhVNThx?= =?utf-8?B?VTdscHJIOGQ4MDdabThaOXkyZHY2WFVtWHpMd0lJQzVsSFVMSEIzOVJJNVJJ?= =?utf-8?B?MytsbWRYYTRBcWNCZmdPVEJEM1BsM25YQzdLWW56RnZsd0pxa0pGQkN6WjVo?= =?utf-8?B?N3oxSEtTMWxnVGw1RW83T3ZXK0F6aTFZZWdHVDlKa0IycEpQSWQ1UTYwcjlH?= =?utf-8?B?Tjd2TERuNExkZDgxQ2Y4eWwyOExYeVZVOGpiTmNDeXVTM3pSVDBwZVBuV01x?= =?utf-8?B?LytQUGdGL1FCOWRjcjR6a3BUOHNHa2tXbEt0ZmxzSUxlSGVvcHh5WE1xRzhC?= =?utf-8?Q?1u8tnNZeZtEZiMA?= X-Microsoft-Exchange-Diagnostics: 1;VI1PR0501MB2767;6:3wopGoVcABreMuTRu14Vi729OvFIGBQC3gnCA3xiCru2QNzvNk7sLBQslpZrdbG2cINzEWsznnwRCoQEjr/dG9wuJp6vwXRXd2gygge033x4FbS2eIrMNXli1CDbkRCKEIn0Nq5DU3K731vn7VtyWw+c/ciOx/Ovp0Q/FtxfrlZFsfMsdPk1DvZnCJsgFVaAtJ0PLKbUzMparMe1gx/stzZsXzuJ85XKNJo0BFjCb5srhHByFZ6k7ZyHSsC/pVc2xyYCFZYL3F44AkuGCfss6DfIFzLJUU1ZNF/L2A8OzrNILJASCfKfWjUoTv0bizZogbVdKUptZ3xutDLIGO6vIQ==;5:jhDaTwJe4nlETT39R6wuO72o/jszQp+aenmptz0JPsMFNohMn/VxdwG1s8pbNcL1kUKBFYz67VZeS5Ws0jhm0JKGnYv2X5g2rV6B5Z/c2I4A7D3W1iLRKPiPz8aoX+2TT9773W+G1rITtznZ1v4kzVTneWUPBzpy/WIWxlC629U=;24:aSxTaLb9qHg8XIoipIZ9Sp5hYt5rkHnlmEDWGJVwyvebIeErrHAvPQe1Asv0xc9evCIt57WlaMcE36KR9FSrhJPB/2Quo3xrOzgkF0EglUU= SpamDiagnosticOutput: 1:99 SpamDiagnosticMetadata: NSPM X-Microsoft-Exchange-Diagnostics: 1;VI1PR0501MB2767;7:Pay78PvlRm5dthRjtAqvWzFx54fQ55VhOkA7gJjpXRr+Bw5fb+tnRcLsueahd6q0TBy7y4U1HK5X52OWHDCNRU2rTSdfPsANkIHJYCC1BZOojaeCdOKL2hUV3Pud7tmuLBwmiEmQj8/bspANKVUCvhLA3GafsKmA8m1JIL5BcAWhp+VPTQ7Qa0hdZy7h6ZcBmjt9kyxRZ6yAl32u8SOVZTthBXOUj9pKlqttKHwHJ7mrxvmXEsCHINmoM1Ha6MBtgjYniYCEjnnt+bKd+KzD7NcQLcG8c0MXmC7/Idxvx9+Tciq0jkrAYUkHOF/jMHisihTsVP81qg46i7hKdOPGpo/IopPU9I+fGgtEzFZFUHo= X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 16 Nov 2016 20:17:16.7075 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR0501MB2767 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2915 Lines: 66 On 11/16/2016 2:59 PM, John Stultz wrote: > On Wed, Nov 16, 2016 at 11:35 AM, Chris Metcalf wrote: >> For large values of "mult" and long uptimes, the intermediate >> result of "cycles * mult" can overflow 64 bits. For example, >> the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock; >> we have mult = 853, and after 208.5 days, we overflow 64 bits. >> >> Since clocksource_cyc2ns() is intended to be used for relative >> cycle counts, not absolute cycle counts, performance is more >> importance than accepting a wider range of cycle values. >> So, just use mult_frac() directly in tile's sched_clock(). >> >> Signed-off-by: Chris Metcalf >> --- >> Blackfin should make a similar change in their sched_clock(). >> >> arch/tile/kernel/time.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/arch/tile/kernel/time.c b/arch/tile/kernel/time.c >> index 178989e6d3e3..ea960d660917 100644 >> --- a/arch/tile/kernel/time.c >> +++ b/arch/tile/kernel/time.c >> @@ -218,8 +218,8 @@ void do_timer_interrupt(struct pt_regs *regs, int fault_num) >> */ >> unsigned long long sched_clock(void) >> { >> - return clocksource_cyc2ns(get_cycles(), >> - sched_clock_mult, SCHED_CLOCK_SHIFT); >> + return mult_frac(get_cycles(), >> + sched_clock_mult, 1ULL << SCHED_CLOCK_SHIFT); >> } > So... looking closer at mult_frac(), its a really slow implementation, > doing 2 divs and a mod and a mult. Hopefully the compiler can sort out > the divs are power of two, and optimize it out, but I'm still > hesitant. > > sched_clock() is normally a very hot-path call, so this might have a > real performance impact, especially compared to what its replacing. > > In your earlier patch, you mentioned this was similar to 4cecf6d401a0 > ("sched, x86: Avoid unnecessary overflow in > sched_clock"). It might be better to actually try to use similar logic > there, to make sure the performance impact is minimal. This was the first thing I looked at when I saw the mult_frac() implementation. The modulus operations are indeed converted to bitmasks and the divides to shifts. We do have to do two multiplies instead of one, but that's basically the worst of the cost. Change 4cecf6d401a0 results in essentially identical code for x86 as this proposed change does for tile. In fact a follow-on change by Salman introduced mult_frac() and switched to using it, so it was identical at that point. PeterZ (cc'ed) then improved it to use __int128 math via mul_u64_u32_shr(), but that doesn't help tile; we only do one multiply instead of two, but the multiply is handled by an out-of-line call to __multi3, and the sched_clock() function ends up about 2.5x slower as a result. Thanks for thinking about this! -- Chris Metcalf, Mellanox Technologies http://www.mellanox.com