Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S965399AbdDZStZ (ORCPT ); Wed, 26 Apr 2017 14:49:25 -0400
Received: from mail-wm0-f67.google.com ([74.125.82.67]:34746 "EHLO
	mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S965370AbdDZStR (ORCPT ); Wed, 26 Apr 2017 14:49:17 -0400
Date: Wed, 26 Apr 2017 20:49:13 +0200
From: Ingo Molnar
To: Frederic Weisbecker
Cc: Thomas Gleixner, LKML, Peter Zijlstra, Rik van Riel, James Hartsock,
	Tim Wright, Pavel Machek, Mike Galbraith
Subject: Re: [PATCH 0/2] nohz: Deal with clock reprogram skipping issues v2
Message-ID: <20170426184913.pfgcuxcjuyigk4oe@gmail.com>
References: <1492783255-5051-1-git-send-email-fweisbec@gmail.com>
	<20170424080835.22yjqtj6xkynx3nm@gmail.com>
	<20170424140436.GD21353@lerouge>
	<20170424144523.smldipadtlukkpoc@gmail.com>
	<20170426145514.GB16523@lerouge>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170426145514.GB16523@lerouge>
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2539
Lines: 66

* Frederic Weisbecker wrote:

> On Mon, Apr 24, 2017 at 04:45:23PM +0200, Ingo Molnar wrote:
> >
> > * Frederic Weisbecker wrote:
> >
> > > On Mon, Apr 24, 2017 at 10:08:35AM +0200, Ingo Molnar wrote:
> > > >
> > > > * Frederic Weisbecker wrote:
> > > >
> > > > > As suggested by Thomas Gleixner, the second patch now integrates
> > > > > a fix in case the sanity check fails and the clockevent isn't programmed
> > > > > as expected.
> > > > >
> > > > > Frederic Weisbecker (2):
> > > > >   nohz: Fix again collision between tick and other hrtimers
> > > > >   tick: Make sure tick timer is active when bypassing reprogramming
> > > > >
> > > > >  kernel/time/tick-sched.c | 33 ++++++++++++++++++++++++++++++---
> > > > >  kernel/time/tick-sched.h |  2 ++
> > > > >  2 files changed, 32 insertions(+), 3 deletions(-)
> > > >
> > > > So I think one of these is causing a new warning on latest -tip:
> > > >
> > > > [  333.341756] ------------[ cut here ]------------
> > > > [  333.346404] WARNING: CPU: 0 PID: 0 at kernel/time/tick-sched.c:874 __tick_nohz_idle_enter+0x461/0x490
> > >
> > > Oh I'll never be done with that bug :)
> > >
> > > Ok I just booted your config with tip/master and didn't see the warning.
> > > But the boot seem to be stalled some time after mounting the root fs.
> > >
> > > Can you please try the following patch and tell me what it returns to you?
> > >
> > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > > index c47d135..6d72e8b 100644
> > > --- a/kernel/time/tick-sched.c
> > > +++ b/kernel/time/tick-sched.c
> > > @@ -872,6 +872,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > >  			goto out;
> > >
> > >  		WARN_ON_ONCE(1);
> > > +		printk_once("basemono: %llu ts->next_tick: %llu dev->next_event: %llu\n", basemono, ts->next_tick, dev->next_event);
> > >  	}
> > >
> >
> > Here's what it prints:
> >
> > [  707.251791] basemono: 706016000000 ts->next_tick: 693216000000 dev->next_event: 706016406127
>
> So weird...
>
> Ok I'm going to need serious traces. Can you please add this boot option?
>
>     trace_event=hrtimer_cancel,hrtimer_start,hrtimer_expire_entry

Sorry, don't have the time for extensive traces this close to the merge window -
but are you sure you cannot reproduce it?

The warning popped up on all 3 test systems I tried (two Intel servers, one AMD
server), and it also hit Mike's server - with a fairly regular distro-ish config.

Thanks,

	Ingo