Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934163AbcKILaN (ORCPT ); Wed, 9 Nov 2016 06:30:13 -0500 Received: from mail.linuxfoundation.org ([140.211.169.12]:54260 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933464AbcKILGn (ORCPT ); Wed, 9 Nov 2016 06:06:43 -0500 From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Ashton Holmes , Michael Thayer , Thomas Gleixner , Michal Necasek , Peter Zijlstra , knut.osmundsen@oracle.com, stern@rowland.harvard.edu, rt@linutronix.de Subject: [PATCH 4.8 042/138] timers: Prevent base clock rewind when forwarding clock Date: Wed, 9 Nov 2016 11:45:25 +0100 Message-Id: <20161109102846.606485321@linuxfoundation.org> X-Mailer: git-send-email 2.10.2 In-Reply-To: <20161109102844.808685475@linuxfoundation.org> References: <20161109102844.808685475@linuxfoundation.org> User-Agent: quilt/0.64 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3804 Lines: 110 4.8-stable review patch. If anyone has any objections, please let me know. ------------------ From: Thomas Gleixner commit 041ad7bc758db259bb960ef795197dd14aab19a6 upstream. Ashton and Michael reported, that kernel versions 4.8 and later suffer from USB timeouts which are caused by the timer wheel rework. This is caused by a bug in the base clock forwarding mechanism, which leads to timers expiring early. The scenario which leads to this is: run_timers() while (jiffies >= base->clk) { collect_expired_timers(); base->clk++; expire_timers(); } So base->clk = jiffies + 1. Now the cpu goes idle: idle() get_next_timer_interrupt() nextevt = __next_time_interrupt(); if (time_after(nextevt, base->clk)) base->clk = jiffies; jiffies has not advanced since run_timers(), so this assignment effectively decrements base->clk by one. base->clk is the index into the timer wheel arrays. So let's assume the following state after the base->clk increment in run_timers(): jiffies = 0 base->clk = 1 A timer gets enqueued with an expiry delta of 63 ticks (which is the case with the USB timeout and HZ=250) so the resulting bucket index is: base->clk + delta = 1 + 63 = 64 The timer goes into the first wheel level. The array size is 64 so it ends up in bucket 0, which is correct as it takes 63 ticks to advance base->clk to index into bucket 0 again. If the cpu goes idle before jiffies advance, then the bug in the forwarding mechanism sets base->clk back to 0, so the next invocation of run_timers() at the next tick will index into bucket 0 and therefore expire the timer 62 ticks too early. Instead of blindly setting base->clk to jiffies we must make the forwarding conditional on jiffies > base->clk, but we cannot use jiffies for this as we might run into the following issue: if (time_after(jiffies, base->clk) { if (time_after(nextevt, base->clk)) base->clk = jiffies; jiffies can increment between the check and the assigment far enough to advance beyond nextevt. So we need to use a stable value for checking. get_next_timer_interrupt() has the basej argument which is the jiffies value snapshot taken in the calling code. So we can just that. Thanks to Ashton for bisecting and providing trace data! Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible") Reported-by: Ashton Holmes Reported-by: Michael Thayer Signed-off-by: Thomas Gleixner Cc: Michal Necasek Cc: Peter Zijlstra Cc: knut.osmundsen@oracle.com Cc: stern@rowland.harvard.edu Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20161022110552.175308322@linutronix.de Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- kernel/time/timer.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1510,12 +1510,16 @@ u64 get_next_timer_interrupt(unsigned lo is_max_delta = (nextevt == base->clk + NEXT_TIMER_MAX_DELTA); base->next_expiry = nextevt; /* - * We have a fresh next event. Check whether we can forward the base: + * We have a fresh next event. Check whether we can forward the + * base. We can only do that when @basej is past base->clk + * otherwise we might rewind base->clk. */ - if (time_after(nextevt, jiffies)) - base->clk = jiffies; - else if (time_after(nextevt, base->clk)) - base->clk = nextevt; + if (time_after(basej, base->clk)) { + if (time_after(nextevt, basej)) + base->clk = basej; + else if (time_after(nextevt, base->clk)) + base->clk = nextevt; + } if (time_before_eq(nextevt, basej)) { expires = basem;