Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S638303AbdDZOz0 (ORCPT ); Wed, 26 Apr 2017 10:55:26 -0400 Received: from mail-wr0-f196.google.com ([209.85.128.196]:35940 "EHLO mail-wr0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S638183AbdDZOzT (ORCPT ); Wed, 26 Apr 2017 10:55:19 -0400 Date: Wed, 26 Apr 2017 16:55:15 +0200 From: Frederic Weisbecker To: Ingo Molnar Cc: Thomas Gleixner , LKML , Peter Zijlstra , Rik van Riel , James Hartsock , Tim Wright , Pavel Machek Subject: Re: [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v2 Message-ID: <20170426145514.GB16523@lerouge> References: <1492783255-5051-1-git-send-email-fweisbec@gmail.com> <20170424080835.22yjqtj6xkynx3nm@gmail.com> <20170424140436.GD21353@lerouge> <20170424144523.smldipadtlukkpoc@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170424144523.smldipadtlukkpoc@gmail.com> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3787 Lines: 104 On Mon, Apr 24, 2017 at 04:45:23PM +0200, Ingo Molnar wrote: > > * Frederic Weisbecker wrote: > > > On Mon, Apr 24, 2017 at 10:08:35AM +0200, Ingo Molnar wrote: > > > > > > * Frederic Weisbecker wrote: > > > > > > > As suggested by Thomas Gleixner, the second patch now integrates > > > > a fix in case the sanity check fails and the clockevent isn't programmed > > > > as expected. > > > > > > > > Frederic Weisbecker (2): > > > > nohz: Fix again collision between tick and other hrtimers > > > > tick: Make sure tick timer is active when bypassing reprogramming > > > > > > > > kernel/time/tick-sched.c | 33 ++++++++++++++++++++++++++++++--- > > > > kernel/time/tick-sched.h | 2 ++ > > > > 2 files changed, 32 insertions(+), 3 deletions(-) > > > > > > So I think one of these is causing a new warning on latest -tip: > > > > > > [ 333.341756] ------------[ cut here ]------------ > > > [ 333.346404] WARNING: CPU: 0 PID: 0 at kernel/time/tick-sched.c:874 __tick_nohz_idle_enter+0x461/0x490 > > > > Oh I'll never be done with that bug :) > > > > Ok I just booted your config with tip/master and didn't see the warning. > > But the boot seem to be stalled some time after mounting the root fs. > > > > Can you please try the following patch and tell me what it returns to you? > > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c > > index c47d135..6d72e8b 100644 > > --- a/kernel/time/tick-sched.c > > +++ b/kernel/time/tick-sched.c > > @@ -872,6 +872,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, > > goto out; > > > > WARN_ON_ONCE(1); > > + printk_once("basemono: %llu ts->next_tick: %llu dev->next_event: %llu\n", basemono, ts->next_tick, dev->next_event); > > } > > > > Here's what it prints: > > [ 707.251791] basemono: 706016000000 ts->next_tick: 693216000000 dev->next_event: 706016406127 So weird... Ok I'm going to need serious traces. Can you please add this boot option? trace_event=hrtimer_cancel,hrtimer_start,hrtimer_expire_entry And please also apply the following (on top of tip/tmp.tmp), it would be interesting to see the resulting trace file from the CPU where the warning triggers. Thanks Ingo! diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index b2df684..b4a6dda 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -156,6 +156,7 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs) * to the same deadline. */ ts->next_tick = 0; + trace_printk("ts->next_tick reset (tick)\n"); } #endif update_process_times(user_mode(regs)); @@ -672,6 +673,7 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now) * cached clock deadline. */ ts->next_tick = 0; + trace_printk("ts->next_tick reset (tick restart)\n"); } static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, @@ -789,6 +791,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, goto out; WARN_ON_ONCE(1); + trace_printk("basemono: %llu ts->next_tick: %llu dev->next_event: %llu\n", basemono, ts->next_tick, dev->next_event); + tracing_stop(); printk_once("basemono: %llu ts->next_tick: %llu dev->next_event: %llu\n", basemono, ts->next_tick, dev->next_event); } @@ -810,6 +814,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, } ts->next_tick = tick; + trace_printk("ts->next_tick = %llu\n", ts->next_tick); /* * If the expiration time == KTIME_MAX, then we simply stop @@ -892,6 +897,7 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts) * deadline if it comes back online later. */ ts->next_tick = 0; + trace_printk("ts->next_tick reset (offline)\n"); return false; }