Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753741AbcLILPn (ORCPT ); Fri, 9 Dec 2016 06:15:43 -0500 Received: from terminus.zytor.com ([198.137.202.10]:42904 "EHLO terminus.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752641AbcLILPk (ORCPT ); Fri, 9 Dec 2016 06:15:40 -0500 Date: Fri, 9 Dec 2016 03:13:12 -0800 From: tip-bot for Thomas Gleixner Message-ID: Cc: cmetcalf@mellanox.com, linux-kernel@vger.kernel.org, tglx@linutronix.de, david@gibson.dropbear.id.au, john.stultz@linaro.org, christopher.s.hall@intel.com, mingo@kernel.org, lvivier@redhat.com, prarit@redhat.com, hpa@zytor.com, liavr@mellanox.com, richardcochran@gmail.com, peterz@infradead.org Reply-To: lvivier@redhat.com, liavr@mellanox.com, peterz@infradead.org, richardcochran@gmail.com, hpa@zytor.com, prarit@redhat.com, cmetcalf@mellanox.com, linux-kernel@vger.kernel.org, christopher.s.hall@intel.com, mingo@kernel.org, john.stultz@linaro.org, david@gibson.dropbear.id.au, tglx@linutronix.de In-Reply-To: <20161208204228.688545601@linutronix.de> References: <20161208204228.688545601@linutronix.de> To: linux-tip-commits@vger.kernel.org Subject: [tip:timers/core] timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion Git-Commit-ID: 9c1645727b8fa90d07256fdfcc45bf831242a3ab X-Mailer: tip-git-log-daemon Robot-ID: Robot-Unsubscribe: Contact to get blacklisted from these emails MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=UTF-8 Content-Disposition: inline Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5996 Lines: 140 Commit-ID: 9c1645727b8fa90d07256fdfcc45bf831242a3ab Gitweb: http://git.kernel.org/tip/9c1645727b8fa90d07256fdfcc45bf831242a3ab Author: Thomas Gleixner AuthorDate: Thu, 8 Dec 2016 20:49:32 +0000 Committer: Thomas Gleixner CommitDate: Fri, 9 Dec 2016 12:06:41 +0100 timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion The clocksource delta to nanoseconds conversion is using signed math, but the delta is unsigned. This makes the conversion space smaller than necessary and in case of a multiplication overflow the conversion can become negative. The conversion is done with scaled math: s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift; Shifting a signed integer right obvioulsy preserves the sign, which has interesting consequences: - Time jumps backwards - __iter_div_u64_rem() which is used in one of the calling code pathes will take forever to piecewise calculate the seconds/nanoseconds part. This has been reported by several people with different scenarios: David observed that when stopping a VM with a debugger: "It was essentially the stopped by debugger case. I forget exactly why, but the guest was being explicitly stopped from outside, it wasn't just scheduling lag. I think it was something in the vicinity of 10 minutes stopped." When lifting the stop the machine went dead. The stopped by debugger case is not really interesting, but nevertheless it would be a good thing not to die completely. But this was also observed on a live system by Liav: "When the OS is too overloaded, delta will get a high enough value for the msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so after the shift the nsec variable will gain a value similar to 0xffffffffff000000." Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation"). It had been fixed a year ago already in commit 35a4933a8959 ("time: Avoid signed overflow in timekeeping_get_ns()"). Though it's not surprising that the issue has been reintroduced because the function itself and the whole call chain uses s64 for the result and the propagation of it. The change in this recent commit is subtle: s64 nsec; - nsec = (d * m + n) >> s: + nsec = d * m + n; + nsec >>= s; d being type of cycle_t adds another level of obfuscation. This wouldn't have happened if the previous change to unsigned computation would have made the 'nsec' variable u64 right away and a follow up patch had cleaned up the whole call chain. There have been patches submitted which basically did a revert of the above patch leaving everything else unchanged as signed. Back to square one. This spawned a admittedly pointless discussion about potential users which rely on the unsigned behaviour until someone pointed out that it had been fixed before. The changelogs of said patches added further confusion as they made finally false claims about the consequences for eventual users which expect signed results. Despite delta being cycle_t, aka. u64, it's very well possible to hand in a signed negative value and the signed computation will happily return the correct result. But nobody actually sat down and analyzed the code which was added as user after the propably unintended signed conversion. Though in sensitive code like this it's better to analyze it proper and make sure that nothing relies on this than hunting the subtle wreckage half a year later. After analyzing all call chains it stands that no caller can hand in a negative value (which actually would work due to the s64 cast) and rely on the signed math to do the right thing. Change the conversion function to unsigned math. The conversion of all call chains is done in a follow up patch. This solves the starvation issue, which was caused by the negative result, but it does not solve the underlying problem. It merily procrastinates it. When the timekeeper update is deferred long enough that the unsigned multiplication overflows, then time going backwards is observable again. It does neither solve the issue of clocksources with a small counter width which will wrap around possibly several times and cause random time stamps to be generated. But those are usually not found on systems used for virtualization, so this is likely a non issue. I took the liberty to claim authorship for this simply because analyzing all callsites and writing the changelog took substantially more time than just making the simple s/s64/u64/ change and ignore the rest. Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation") Reported-by: David Gibson Reported-by: Liav Rehana Signed-off-by: Thomas Gleixner Reviewed-by: David Gibson Acked-by: Peter Zijlstra (Intel) Cc: Parit Bhargava Cc: Laurent Vivier Cc: "Christopher S. Hall" Cc: Chris Metcalf Cc: Richard Cochran Cc: John Stultz Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.de Signed-off-by: Thomas Gleixner --- kernel/time/timekeeping.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index b2286e9..bfe589e 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -299,10 +299,10 @@ u32 (*arch_gettimeoffset)(void) = default_arch_gettimeoffset; static inline u32 arch_gettimeoffset(void) { return 0; } #endif -static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr, +static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr, cycle_t delta) { - s64 nsec; + u64 nsec; nsec = delta * tkr->mult + tkr->xtime_nsec; nsec >>= tkr->shift;