Date: Tue, 28 Mar 2017 17:26:01 -0400
From: Luiz Capitulino
To: Rik van Riel
Cc: Wanpeng Li, Frederic Weisbecker, linux-kernel@vger.kernel.org
Subject: Re: [BUG nohz]: wrong user and system time accounting
Message-ID: <20170328172601.4d17256c@redhat.com>
In-Reply-To: <1490734912.8850.85.camel@redhat.com>
References: <20170323165512.60945ac6@redhat.com> <1490636129.8850.76.camel@redhat.com> <20170328132406.7d23579c@redhat.com> <20170328161454.4a5d9e8b@redhat.com> <1490734912.8850.85.camel@redhat.com>
Organization: Red Hat

On Tue, 28 Mar 2017 17:01:52 -0400
Rik van Riel wrote:

> On Tue, 2017-03-28 at 16:14 -0400, Luiz Capitulino wrote:
> > On Tue, 28 Mar 2017 13:24:06 -0400
> > Luiz Capitulino wrote:
> > > I'm starting to suspect that the nohz code may be programming
> > > the tick period to be shorter than 1ms when it re-activates
> > > the tick.
> >
> > And I think I was right, it looks like the nohz code is programming
> > the tick period incorrectly when restarting the tick. The patch below
> > fixes things for me, but I still have some homework todo and more
> > testing before posting a patch for inclusion. Could you guys test it?
>
> Your patch seems to work. I don't claim to understand why
> your patch makes a difference, but for this particular test
> case, on this particular setup, it seems to work...

I don't fully understand why either yet. I was looking for
places where nohz might be programming the tick period
incorrectly and I found that there's a case in
tick_nohz_stop_sched_tick() where tick_nohz_restart() is
called only to reprogram the tick timer, not cancel the tick.
In this case, ts->last_tick seems to be out of date. Fixing
this fixed accounting for me.

> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 7fe53be..9abe979 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -1152,6 +1152,7 @@ static enum hrtimer_restart
> > tick_sched_timer(struct hrtimer *timer)
> >         struct pt_regs *regs = get_irq_regs();
> >         ktime_t now = ktime_get();
> >  
> > +       ts->last_tick = now;
> >         tick_sched_do_timer(now);
> >  
> >         /*
>
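
To make the suspicion concrete, here is a small user-space model of
how a stale base timestamp could shorten the first tick after a
restart. It is only a sketch, not kernel code. It assumes that the
restart path arms the timer at the first point after "now" on the grid
last_tick, last_tick + period, last_tick + 2 * period, and so on, and
that the tick period is 1 ms. next_expiry(), delay_until_next_tick()
and TICK_PERIOD_NS are hypothetical names made up for this
illustration.

```c
#include <stdint.h>

/* Hypothetical model; all times are in nanoseconds. */
#define TICK_PERIOD_NS 1000000ULL /* assumption: 1 ms tick period */

/*
 * First expiry strictly after `now` on the grid
 * base, base + period, base + 2 * period, ...
 * If base is already in the future it is returned unchanged.
 */
uint64_t next_expiry(uint64_t base, uint64_t now, uint64_t period)
{
    if (base > now)
        return base;
    return base + ((now - base) / period + 1) * period;
}

/* How long the first tick after a restart at `now` would last. */
uint64_t delay_until_next_tick(uint64_t last_tick, uint64_t now)
{
    return next_expiry(last_tick, now, TICK_PERIOD_NS) - now;
}
```

With last_tick = 7000000 (7.0 ms) and now = 10300000 (10.3 ms) the
model arms the timer for 11.0 ms, so the first tick lasts only 0.7 ms.
With last_tick equal to now, the first tick lasts the full 1 ms.
Whether this is the actual mechanism behind the wrong accounting is
still open.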