From: John Stultz
Date: Wed, 16 Nov 2016 11:59:16 -0800
Subject: Re: [PATCH v2] tile: avoid using clocksource_cyc2ns with absolute cycle count
To: Chris Metcalf
Cc: Thomas Gleixner, Salman Qazi, Paul Turner, Tony Lindgren, Steven Miao, lkml

On Wed, Nov 16, 2016 at 11:35 AM, Chris Metcalf wrote:
> For large values of "mult" and long uptimes, the intermediate
> result of "cycles * mult" can overflow 64 bits.  For example,
> the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock;
> we have mult = 853, and after 208.5 days, we overflow 64 bits.
>
> Since clocksource_cyc2ns() is intended to be used for relative
> cycle counts, not absolute cycle counts, performance is more
> important than accepting a wider range of cycle values.  So,
> just use mult_frac() directly in tile's sched_clock().
>
> Signed-off-by: Chris Metcalf
> ---
> Blackfin should make a similar change in their sched_clock().
>
>  arch/tile/kernel/time.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/tile/kernel/time.c b/arch/tile/kernel/time.c
> index 178989e6d3e3..ea960d660917 100644
> --- a/arch/tile/kernel/time.c
> +++ b/arch/tile/kernel/time.c
> @@ -218,8 +218,8 @@ void do_timer_interrupt(struct pt_regs *regs, int fault_num)
>   */
>  unsigned long long sched_clock(void)
>  {
> -	return clocksource_cyc2ns(get_cycles(),
> -				  sched_clock_mult, SCHED_CLOCK_SHIFT);
> +	return mult_frac(get_cycles(),
> +			 sched_clock_mult, 1ULL << SCHED_CLOCK_SHIFT);
>  }

So... looking closer at mult_frac(), it's a really slow implementation,
doing two divs, a mod, and a mult. Hopefully the compiler can work out
that the divisors are powers of two and optimize them away, but I'm
still hesitant. sched_clock() is normally a very hot-path call, so this
might have a real performance impact, especially compared to what it's
replacing.

In your earlier patch, you mentioned this was similar to 4cecf6d401a0
("sched, x86: Avoid unnecessary overflow in sched_clock"). It might be
better to actually try to use similar logic here, to make sure the
performance impact is minimal.
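Something along these lines, perhaps -- just a rough, untested sketch of
the kind of shift/mask split that commit does, reusing the names from
your patch:

unsigned long long sched_clock(void)
{
	unsigned long long cycles = get_cycles();

	/*
	 * Untested sketch of the 4cecf6d401a0-style split: break the
	 * cycle count at the shift boundary so neither intermediate
	 * product overflows 64 bits, using only multiplies, shifts
	 * and masks -- no division.
	 */
	unsigned long long quot = cycles >> SCHED_CLOCK_SHIFT;
	unsigned long long rem = cycles & ((1ULL << SCHED_CLOCK_SHIFT) - 1);

	return quot * sched_clock_mult +
	       ((rem * sched_clock_mult) >> SCHED_CLOCK_SHIFT);
}

That way we don't have to rely on the compiler turning the divisions by
a power-of-two constant into shifts.

thanks
-john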