Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754125AbdDKLnm (ORCPT ); Tue, 11 Apr 2017 07:43:42 -0400 Received: from mail-oi0-f66.google.com ([209.85.218.66]:35221 "EHLO mail-oi0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752291AbdDKLnk (ORCPT ); Tue, 11 Apr 2017 07:43:40 -0400 MIME-Version: 1.0 In-Reply-To: <20170411113645.j5gvvad3vczvyvlo@hirez.programming.kicks-ass.net> References: <20170323165512.60945ac6@redhat.com> <1490636129.8850.76.camel@redhat.com> <20170328132406.7d23579c@redhat.com> <20170329131656.1d6cb743@redhat.com> <1490818125.28917.11.camel@redhat.com> <1490848051.4167.57.camel@gmx.de> <20170330133802.GC3626@lerouge> <20170411113645.j5gvvad3vczvyvlo@hirez.programming.kicks-ass.net> From: Wanpeng Li Date: Tue, 11 Apr 2017 19:43:39 +0800 Message-ID: Subject: Re: [BUG nohz]: wrong user and system time accounting To: Peter Zijlstra Cc: Frederic Weisbecker , Mike Galbraith , Rik van Riel , Luiz Capitulino , "linux-kernel@vger.kernel.org" , Thomas Gleixner Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2288 Lines: 53 2017-04-11 19:36 GMT+08:00 Peter Zijlstra : > On Tue, Apr 11, 2017 at 07:03:17PM +0800, Wanpeng Li wrote: >> 2017-03-30 21:38 GMT+08:00 Frederic Weisbecker : >> > On Thu, Mar 30, 2017 at 02:47:11PM +0800, Wanpeng Li wrote: >> >> [...] >> >> > >> >> >> >> -------------------------------------->8----------------------------------------------------- >> >> >> >> use nanosecond granularity to check deltas but only perform an actual >> >> cputime update when that delta >= TICK_NSEC. >> >> >> >> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c >> >> index f3778e2b..f1ee393 100644 >> >> --- a/kernel/sched/cputime.c >> >> +++ b/kernel/sched/cputime.c >> >> @@ -676,18 +676,21 @@ void thread_group_cputime_adjusted(struct >> >> task_struct *p, u64 *ut, u64 *st) >> >> #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN >> >> static u64 vtime_delta(struct task_struct *tsk) >> >> { >> >> - unsigned long now = READ_ONCE(jiffies); >> >> + u64 now = local_clock(); >> > >> > I fear we need a global clock, because the reader (task_cputime()) needs >> > to compute the delta and therefore use the same clock from any CPU. >> > >> > Or we can use the local_clock() but the reader must access the same. >> > >> > So there would be vtime_delta_writer() which uses local_clock and stores >> > the current CPU to tsk->vtime_cpu (under the vtime_seqcount). And then >> > vtime_delta_reader() which calls sched_clock_cpu(tsk->vtime_cpu) which >> > is protected by vtime_seqcount as well. >> > >> > Although those sched_clock_cpu() things seem to only matter when the >> > sched_clock() is unstable. And that stability is a condition for nohz_full >> > to work anyway. So probably sched_clock() alone would be enough. >> >> I observed ~60% user time and ~40% sys time when replace local_clock() >> above by sched_clock()(two cpu hogs on the cpu in nohz_full mode). In >> addition, Luiz's testcast ./acct-bug 1 995 will show 100% idle time. >> If keep local_clock() in vtime_delta(), cpu hogs testcase will >> success. However, Luiz's testcase still show 100% idle time. > > Assuming a stable TSC, there should be no difference between > local_clock() and sched_clock(). So it is weird. I did't see any unstable tsc dump in dmesg. Regards, Wanpeng Li