From: John Stultz
Date: Tue, 15 Nov 2016 17:10:53 -0800
Subject: Re: [PATCH] time: Avoid signed overflow in timekeeping_delta_to_ns()
To: Thomas Gleixner
Cc: Chris Metcalf, Laurent Vivier, David Gibson, "Christopher S . Hall", lkml, Liav Rehana
References: <1479152569-16890-1-git-send-email-cmetcalf@mellanox.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 15, 2016 at 2:03 PM, John Stultz wrote:
> On Tue, Nov 15, 2016 at 1:53 PM, Thomas Gleixner wrote:
>> On Mon, 14 Nov 2016, John Stultz wrote:
>>
>>> On Mon, Nov 14, 2016 at 11:42 AM, Chris Metcalf wrote:
>>> > This bugfix was originally made in commit 35a4933a8959 ("time:
>>> > Avoid signed overflow in timekeeping_get_ns()"). When the code was
>>> > refactored in commit 6bd58f09e1d8 ("time: Add cycles to nanoseconds
>>> > translation") the signed overflow fix was lost. Re-introduce it.
>>> >
>>> > Signed-off-by: Chris Metcalf
>>> > ---
>>> > I happened to be looking for an unrelated fix, found this code,
>>> > realized the tip code didn't match the fixed code, and
>>> > backtracked to where it had gone away.
>>> >
>>> >  kernel/time/timekeeping.c | 3 +--
>>> >  1 file changed, 1 insertion(+), 2 deletions(-)
>>> >
>>> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
>>> > index 37dec7e3db43..57926bc7b7f3 100644
>>> > --- a/kernel/time/timekeeping.c
>>> > +++ b/kernel/time/timekeeping.c
>>> > @@ -304,8 +304,7 @@ static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
>>> >  {
>>> >  	s64 nsec;
>>> >
>>> > -	nsec = delta * tkr->mult + tkr->xtime_nsec;
>>> > -	nsec >>= tkr->shift;
>>> > +	nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
>>>
>>> Ugh.
>>>
>>> So... I think this proves the original fix was *far* too subtle to
>>> maintain. So I think reintroducing it as-is doesn't protect us from
>>> undoing it. If the problem is really using the signed intermediate
>>> nsec value, we should get rid of that.
>>
>> As I told the other guy who submitted something similar: This is not really
>> helpful. It merely drags the overflow case out by a factor of 2.
>
> Well... So lost time (where a VM/gdb caused stall runs past the
> clocksource or causes a mult overflow) is a bit less problematic than
> getting a huge negative nsec value.
>
>> So we really need to figure out under which circumstances this can happen
>> and fixup either the callsites or detect the condition right there, which
>> I'd like to avoid for the hotpath.
>
> I get catching the (delta > TOOBIG) case, but even then I'm not
> sure how we deal with that condition in a way that results in anything
> meaningfully different from the less-problematic unsigned overflow
> (ie, capping it).

So I think I'm going to queue up Liav's fix here, as it has been in my
TOQUEUE folder for a bit longer.

Thomas: I know you didn't like it when it was originally submitted,
preferring to catch the case when it happens, but the signed shift is
more problematic.
Additionally, the CONFIG_DEBUG_TIMEKEEPING checks should already warn on
the next tick when this case triggers (when the offset is larger than
max_cycles).

Sound ok?

thanks
-john
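
For anyone following the thread, the difference between the removed and added lines in the quoted diff can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: the helper names are hypothetical, the parameters merely mirror the tk_read_base fields, and the signed variant relies on gcc/clang behaviour (wrapping unsigned-to-signed conversion, arithmetic right shift of negative values), which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-ins for the two shapes of timekeeping_delta_to_ns()
 * shown in the quoted diff. These helpers are hypothetical; only the
 * parameter names are borrowed from struct tk_read_base.
 */

/* Removed lines: store the sum in a signed variable, then shift it. */
static int64_t delta_to_ns_signed_shift(uint64_t delta, uint32_t mult,
					uint32_t shift, uint64_t xtime_nsec)
{
	int64_t nsec;

	assert(shift < 64);
	/* On gcc/clang this conversion goes negative once bit 63 is set. */
	nsec = (int64_t)(delta * mult + xtime_nsec);
	/* Arithmetic shift preserves the sign, so the result stays negative. */
	nsec >>= shift;
	return nsec;
}

/* Added line: shift the unsigned sum first, then convert to signed. */
static int64_t delta_to_ns_unsigned_shift(uint64_t delta, uint32_t mult,
					  uint32_t shift, uint64_t xtime_nsec)
{
	assert(shift < 64);
	/* Logical shift clears the top bits before the value becomes signed. */
	return (int64_t)((delta * mult + xtime_nsec) >> shift);
}
```

With the made-up inputs delta = 1 << 40, mult = 1 << 23, shift = 8 and xtime_nsec = 0, the product is exactly 2^63: the signed-shift variant returns -(2^55) while the unsigned-shift variant returns 2^55. Once the product passes 2^64 both variants wrap, which is the "factor of 2" limit Thomas points out above.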