Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935352AbcLMTx1 (ORCPT ); Tue, 13 Dec 2016 14:53:27 -0500 Received: from mx2.suse.de ([195.135.220.15]:54211 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S935230AbcLMTxP (ORCPT ); Tue, 13 Dec 2016 14:53:15 -0500 X-Amavis-Alert: BAD HEADER SECTION, Duplicate header field: "References" From: Jiri Slaby To: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Chris Metcalf , Jiri Slaby Subject: [PATCH 3.12 15/38] tile: avoid using clocksource_cyc2ns with absolute cycle count Date: Tue, 13 Dec 2016 20:52:41 +0100 Message-Id: <2323e1a9edc8304ed4fc9c3039dfbaf25dbb2f6b.1481658746.git.jslaby@suse.cz> X-Mailer: git-send-email 2.11.0 In-Reply-To: <15034b96ec06ee859b67c6cd4e3be569a4ef286b.1481658746.git.jslaby@suse.cz> References: <15034b96ec06ee859b67c6cd4e3be569a4ef286b.1481658746.git.jslaby@suse.cz> In-Reply-To: References: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2809 Lines: 67 From: Chris Metcalf 3.12-stable review patch. If anyone has any objections, please let me know. =============== commit e658a6f14d7c0243205f035979d0ecf6c12a036f upstream. For large values of "mult" and long uptimes, the intermediate result of "cycles * mult" can overflow 64 bits. For example, the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock; we have mult = 853, and after 208.5 days, we overflow 64 bits. Since clocksource_cyc2ns() is intended to be used for relative cycle counts, not absolute cycle counts, performance is more importance than accepting a wider range of cycle values. So, just use mult_frac() directly in tile's sched_clock(). Commit 4cecf6d401a0 ("sched, x86: Avoid unnecessary overflow in sched_clock") by Salman Qazi results in essentially the same generated code for x86 as this change does for tile. In fact, a follow-on change by Salman introduced mult_frac() and switched to using it, so the C code was largely identical at that point too. Peter Zijlstra then added mul_u64_u32_shr() and switched x86 to use it. This is, in principle, better; by optimizing the 64x64->64 multiplies to be 32x32->64 multiplies we can potentially save some time. However, the compiler piplines the 64x64->64 multiplies pretty well, and the conditional branch in the generic mul_u64_u32_shr() causes some bubbles in execution, with the result that it's pretty much a wash. If tilegx provided its own implementation of mul_u64_u32_shr() without the conditional branch, we could potentially save 3 cycles, but that seems like small gain for a fair amount of additional build scaffolding; no other platform currently provides a mul_u64_u32_shr() override, and tile doesn't currently have an header to put the override in. Additionally, gcc currently has an optimization bug that prevents it from recognizing the opportunity to use a 32x32->64 multiply, and so the result would be no better than the existing mult_frac() until such time as the compiler is fixed. For now, just using mult_frac() seems like the right answer. Signed-off-by: Chris Metcalf Signed-off-by: Jiri Slaby --- arch/tile/kernel/time.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/tile/kernel/time.c b/arch/tile/kernel/time.c index 5d10642db63e..1edda6603cd9 100644 --- a/arch/tile/kernel/time.c +++ b/arch/tile/kernel/time.c @@ -216,8 +216,8 @@ void do_timer_interrupt(struct pt_regs *regs, int fault_num) */ unsigned long long sched_clock(void) { - return clocksource_cyc2ns(get_cycles(), - sched_clock_mult, SCHED_CLOCK_SHIFT); + return mult_frac(get_cycles(), + sched_clock_mult, 1ULL << SCHED_CLOCK_SHIFT); } int setup_profiling_timer(unsigned int multiplier) -- 2.11.0