Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757481Ab2ECO67 (ORCPT ); Thu, 3 May 2012 10:58:59 -0400 Received: from mail-wg0-f44.google.com ([74.125.82.44]:55935 "EHLO mail-wg0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755939Ab2ECO5J (ORCPT ); Thu, 3 May 2012 10:57:09 -0400 From: Gilad Ben-Yossef To: linux-kernel@vger.kernel.org Cc: Gilad Ben-Yossef , Thomas Gleixner , Tejun Heo , John Stultz , Andrew Morton , KOSAKI Motohiro , Mel Gorman , Mike Frysinger , David Rientjes , Hugh Dickins , Minchan Kim , Konstantin Khlebnikov , Christoph Lameter , Chris Metcalf , Hakan Akkan , Max Krasnyansky , Frederic Weisbecker , linux-mm@kvack.org Subject: [PATCH v1 1/6] timer: make __next_timer_interrupt explicit about no future event Date: Thu, 3 May 2012 17:55:57 +0300 Message-Id: <1336056962-10465-2-git-send-email-gilad@benyossef.com> X-Mailer: git-send-email 1.7.0.4 In-Reply-To: <1336056962-10465-1-git-send-email-gilad@benyossef.com> References: <1336056962-10465-1-git-send-email-gilad@benyossef.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4785 Lines: 142 Current timer code fails to correctly return a value meaning that there is no future timer event, with the result that the timer keeps getting re-armed in HZ one shot mode even when we could turn it off, generating unneeded interrupts. This patch attempts to fix this problem. What is happening is that when __next_timer_interrupt() wishes to return a value that signifies "there is no future timer event", it returns (base->timer_jiffies + NEXT_TIMER_MAX_DELTA). However, the code in tick_nohz_stop_sched_tick(), which called __next_timer_interrupt() via get_next_timer_interrupt(), compares the return value to (last_jiffies + NEXT_TIMER_MAX_DELTA) to see if the timer needs to be re-armed. base->timer_jiffies != last_jiffies and so tick_nohz_stop_sched_tick() interperts the return value as indication that there is a distant future event 12 days from now and programs the timer to fire next after KIME_MAX nsecs instead of avoiding to arm it. This ends up causesing a needless interrupt once every KTIME_MAX nsecs. I've noticed a similar but slightly different fix to the same problem in the Tilera kernel tree from Chris M. (I've wrote this before seeing that one), so some variation of this fix is in use on real hardware for some time now. Signed-off-by: Gilad Ben-Yossef CC: Thomas Gleixner CC: Tejun Heo CC: John Stultz CC: Andrew Morton CC: KOSAKI Motohiro CC: Mel Gorman CC: Mike Frysinger CC: David Rientjes CC: Hugh Dickins CC: Minchan Kim CC: Konstantin Khlebnikov CC: Christoph Lameter CC: Chris Metcalf CC: Hakan Akkan CC: Max Krasnyansky CC: Frederic Weisbecker CC: linux-kernel@vger.kernel.org CC: linux-mm@kvack.org --- kernel/timer.c | 31 +++++++++++++++++++++---------- 1 files changed, 21 insertions(+), 10 deletions(-) diff --git a/kernel/timer.c b/kernel/timer.c index a297ffc..32ba64a 100644 --- a/kernel/timer.c +++ b/kernel/timer.c @@ -1187,11 +1187,13 @@ static inline void __run_timers(struct tvec_base *base) * is used on S/390 to stop all activity when a CPU is idle. * This function needs to be called with interrupts disabled. */ -static unsigned long __next_timer_interrupt(struct tvec_base *base) +static bool __next_timer_interrupt(struct tvec_base *base, + unsigned long *next_timer) { unsigned long timer_jiffies = base->timer_jiffies; unsigned long expires = timer_jiffies + NEXT_TIMER_MAX_DELTA; - int index, slot, array, found = 0; + int index, slot, array; + bool found = false; struct timer_list *nte; struct tvec *varray[4]; @@ -1202,12 +1204,12 @@ static unsigned long __next_timer_interrupt(struct tvec_base *base) if (tbase_get_deferrable(nte->base)) continue; - found = 1; + found = true; expires = nte->expires; /* Look at the cascade bucket(s)? */ if (!index || slot < index) goto cascade; - return expires; + goto out; } slot = (slot + 1) & TVR_MASK; } while (slot != index); @@ -1233,7 +1235,7 @@ cascade: if (tbase_get_deferrable(nte->base)) continue; - found = 1; + found = true; if (time_before(nte->expires, expires)) expires = nte->expires; } @@ -1245,7 +1247,7 @@ cascade: /* Look at the cascade bucket(s)? */ if (!index || slot < index) break; - return expires; + goto out; } slot = (slot + 1) & TVN_MASK; } while (slot != index); @@ -1254,7 +1256,10 @@ cascade: timer_jiffies += TVN_SIZE - index; timer_jiffies >>= TVN_BITS; } - return expires; +out: + if (found) + *next_timer = expires; + return found; } /* @@ -1317,9 +1322,15 @@ unsigned long get_next_timer_interrupt(unsigned long now) if (cpu_is_offline(smp_processor_id())) return now + NEXT_TIMER_MAX_DELTA; spin_lock(&base->lock); - if (time_before_eq(base->next_timer, base->timer_jiffies)) - base->next_timer = __next_timer_interrupt(base); - expires = base->next_timer; + if (time_before_eq(base->next_timer, base->timer_jiffies)) { + + if (__next_timer_interrupt(base, &expires)) + base->next_timer = expires; + else + expires = now + NEXT_TIMER_MAX_DELTA; + } else + expires = base->next_timer; + spin_unlock(&base->lock); if (time_before_eq(expires, now)) -- 1.7.0.4 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/