From: Wanpeng Li
Date: Thu, 30 Mar 2017 09:58:44 +0800
Subject: Re: [BUG nohz]: wrong user and system time accounting
To: Rik van Riel
Cc: Luiz Capitulino, Frederic Weisbecker, "linux-kernel@vger.kernel.org"
In-Reply-To: <1490818125.28917.11.camel@redhat.com>
References: <20170323165512.60945ac6@redhat.com> <1490636129.8850.76.camel@redhat.com> <20170328132406.7d23579c@redhat.com> <20170329131656.1d6cb743@redhat.com> <1490818125.28917.11.camel@redhat.com>

2017-03-30 4:08 GMT+08:00 Rik van Riel:
> On Wed, 2017-03-29 at 13:16 -0400, Luiz Capitulino wrote:
>> On Tue, 28 Mar 2017 13:24:06 -0400
>> Luiz Capitulino wrote:
>>
>> > 1. In my tracing I'm seeing that sometimes (always?) the
>> > time interval between two timer interrupts is less than 1ms
>>
>> I think that's the root cause.
>>
>> In this trace, we see the following:
>>
>> 1. On CPU15, we transition from user-space to kernel-space because
>> of a timer interrupt (it's the tick)
>>
>> 2. vtime_delta() returns 0, because jiffies didn't change since the
>> last accounting
>>
>> 3. While CPU15 is executing in kernel-space, jiffies is updated
>> by CPU0
>>
>> 4. When going back to user-space, vtime_delta() returns non-zero
>> and the whole time is accounted for system time (observe how
>> the cputime parameter in account_system_time() is less than 1ms)
>
> In other words, the tick on cpu0 is aligned
> with the tick on the nohz_full cpus, and
> jiffies is advanced while the nohz_full cpus
> with an active tick happen to be in kernel
> mode?
>
> Frederic, can you think of any reason why
> the tick on nohz_full CPUs would end up aligned
> with the tick on cpu0, instead of running at some
> random offset?
>
> A random offset, or better yet a somewhat randomized
> tick length to make sure that simultaneous ticks are
> fairly rare and the vtime sampling does not end up
> "in phase" with the jiffies incrementing, could make
> the accounting work right again.
>
> Of course, that assumes the above hypothesis is correct :)

Such a feature already exists: skew_tick. See commit 5307c9556bc
("tick: add tick skew boot option"). With the skew_tick=1 boot
parameter the bug disappears. However, the commit message also notes
that skewing the tick hurts power consumption.

I will try Frederic's proposal, which is similar to my original idea:
"how bad would it be to revert to sched_clock() instead of jiffies in
vtime_delta()? We could use nanosecond granularity to check deltas but
only perform an actual cputime update when that delta >= TICK_NSEC."

Regards,
Wanpeng Li
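
To make the hypothesis above easier to reason about, here is a toy
user-space model of the four steps in the trace and of the
nanosecond-delta idea. It is only a sketch, not kernel code: the names
struct totals, jiffies_at(), simulate_jiffies() and simulate_nsec() are
made up for illustration, HZ=1000 is assumed, and the nanosecond variant
follows the quoted sentence literally (skip the update while the delta
is below TICK_NSEC), which may differ from whatever patch results.

```c
#include <assert.h>
#include <stdint.h>

#define TICK_NSEC 1000000ULL /* assumption: HZ=1000, i.e. a 1 ms tick */

struct totals {
	uint64_t user_ns;
	uint64_t system_ns;
};

/*
 * Hypothetical jiffies model: CPU0 bumps jiffies once per tick,
 * jiffies_offset_ns after each tick boundary.
 */
static uint64_t jiffies_at(uint64_t now_ns, uint64_t jiffies_offset_ns)
{
	if (now_ns < jiffies_offset_ns)
		return 0;
	return (now_ns - jiffies_offset_ns) / TICK_NSEC;
}

/*
 * The simulated CPU runs user code, takes a tick at every tick boundary
 * and spends kernel_ns in the kernel. Time is accounted only when
 * jiffies changed since the last accounting, and is then charged to
 * the context being left.
 */
struct totals simulate_jiffies(unsigned int n_ticks, uint64_t kernel_ns,
			       uint64_t jiffies_offset_ns)
{
	struct totals t = { 0, 0 };
	uint64_t last = jiffies_at(0, jiffies_offset_ns);
	unsigned int k;

	assert(kernel_ns < TICK_NSEC);
	for (k = 1; k <= n_ticks; k++) {
		uint64_t enter = (uint64_t)k * TICK_NSEC;
		uint64_t leave = enter + kernel_ns;
		uint64_t now;

		now = jiffies_at(enter, jiffies_offset_ns); /* user -> kernel */
		t.user_ns += (now - last) * TICK_NSEC;
		last = now;

		now = jiffies_at(leave, jiffies_offset_ns); /* kernel -> user */
		t.system_ns += (now - last) * TICK_NSEC;
		last = now;
	}
	return t;
}

/*
 * Same workload, but deltas come from a nanosecond clock and an update
 * is performed only once the delta reaches TICK_NSEC.
 */
struct totals simulate_nsec(unsigned int n_ticks, uint64_t kernel_ns)
{
	struct totals t = { 0, 0 };
	uint64_t last = 0;
	unsigned int k;

	assert(kernel_ns < TICK_NSEC);
	for (k = 1; k <= n_ticks; k++) {
		uint64_t enter = (uint64_t)k * TICK_NSEC;
		uint64_t leave = enter + kernel_ns;

		if (enter - last >= TICK_NSEC) { /* user -> kernel */
			t.user_ns += enter - last;
			last = enter;
		}
		if (leave - last >= TICK_NSEC) { /* kernel -> user */
			t.system_ns += leave - last;
			last = leave;
		}
	}
	return t;
}
```

With 1000 ticks and 10 us of kernel time per tick, the real split is
990 ms user and 10 ms system. In this model simulate_jiffies(1000,
10000, 100), where jiffies advances 100 ns after kernel entry, charges
the full 1000 ms to system time, which matches the symptom reported in
this thread. Moving the jiffies update half a tick away
(simulate_jiffies(1000, 10000, 500000)) charges 999 ms to user time,
and simulate_nsec(1000, 10000) charges 1000 ms to user time. The short
kernel stint is still folded into user time in the last two cases, so
the result is approximate, but it no longer depends on the phase
between the tick and the jiffies update.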