From: Wanpeng Li
Date: Wed, 29 Mar 2017 17:56:30 +0800
Subject: Re: [BUG nohz]: wrong user and system time accounting
To: Luiz Capitulino
Cc: Rik van Riel, Frederic Weisbecker, linux-kernel@vger.kernel.org

2017-03-29 5:26 GMT+08:00 Luiz Capitulino:
> On Tue, 28 Mar 2017 17:01:52 -0400
> Rik van Riel wrote:
>
>> On Tue, 2017-03-28 at 16:14 -0400, Luiz Capitulino wrote:
>> > On Tue, 28 Mar 2017 13:24:06 -0400
>> > Luiz Capitulino wrote:
>> > > I'm starting to suspect that the nohz code may be programming
>> > > the tick period to be shorter than 1ms when it re-activates
>> > > the tick.
>> >
>> > And I think I was right, it looks like the nohz code is programming
>> > the tick period incorrectly when restarting the tick. The patch below
>> > fixes things for me, but I still have some homework todo and more
>> > testing before posting a patch for inclusion. Could you guys test it?
>>
>> Your patch seems to work. I don't claim to understand why
>> your patch makes a difference, but for this particular test
>> case, on this particular setup, it seems to work...
>
> I don't fully understand why either yet.
> I was looking for places
> where nohz might be programming the tick period incorrectly and

The bug is still present when I enable CONTEXT_TRACKING_FORCE and boot
with nohz=off on the kernel command line.

Regards,
Wanpeng Li

> I found that there's a case in tick_nohz_stop_sched_tick() where
> tick_nohz_restart() is called only to reprogram the tick timer,
> not cancel the tick. In this case, ts->last_tick seems to be out
> of date. Fixing this fixed accounting for me.
>
>> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
>> > index 7fe53be..9abe979 100644
>> > --- a/kernel/time/tick-sched.c
>> > +++ b/kernel/time/tick-sched.c
>> > @@ -1152,6 +1152,7 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
>> >  	struct pt_regs *regs = get_irq_regs();
>> >  	ktime_t now = ktime_get();
>> >
>> > +	ts->last_tick = now;
>> >  	tick_sched_do_timer(now);
>> >
>> >  	/*
>> >