From: Fredrik Markström
Date: Tue, 16 Jun 2015 16:35:21 +0200
Subject: Re: [PATCH 1/1] cputime: Make the reported utime+stime correspond to the actual runtime.
To: Peter Zijlstra
Cc: mingo@redhat.com, linux-kernel@vger.kernel.org, Rik van Riel
References: <1434099316-29749-1-git-send-email-fredrik.markstrom@gmail.com>
 <1434099316-29749-2-git-send-email-fredrik.markstrom@gmail.com>
 <1434104217.1495.74.camel@twins>
 <20150612110158.GA18673@twins.programming.kicks-ass.net>

I cleaned up the test application a bit, in case it helps someone
understand the problem and test other potential fixes. It's at:

https://gist.github.com/frma71/5af5a2a4b264d5cdd265

I basically copied the cputime_adjust() code out of the kernel and
added some stubs to be able to compile and run it as an ordinary user
mode application.

To download, compile and run it:

% wget https://goo.gl/PE6RSj -O testadjust.c
% gcc -W -Wall -o testadjust testadjust.c
% ./testadjust

My questions right now are:

1) Is this a problem worth fixing (should the reported sys+user =
sum_exec_runtime)?

2) Is there a preferred solution to the global spinlock?

/Fredrik

On Mon, Jun 15, 2015 at 5:34 PM, Fredrik Markström wrote:
> Hello Peter, your patch helps with some of the cases but not all:
>
> (the "called with.."
below means cputime_adjust() is called with the
> values specified in its struct task_cputime argument.)
>
> It helps when called with:
>
> sum_exec_runtime=1000000000 utime=0 stime=1
> ... followed by...
> sum_exec_runtime=1010000000 utime=100 stime=1
>
> It doesn't help when called with:
>
> sum_exec_runtime=1000000000 utime=1 stime=0
> ... followed by...
> sum_exec_runtime=1010000000 utime=1 stime=100
>
> Also if we get a call with:
>
> sum_exec_runtime=1000000000 utime=1 stime=1
>
> ... then get preempted after your proposed fix and before we are done
> with the calls to cputime_advance(), then get called again (from a
> different thread) with:
>
> sum_exec_runtime=1010000000 utime=100 stime=1
>
> ... it still breaks.
>
> I think there might be additional concurrency problems before, between
> and/or possibly after the calls to cputime_advance(), at least if we
> want to guarantee that sys+user should stay sane. I believe my
> proposed patch eliminates those potential problems in a pretty
> straightforward way.
>
> I tried to come up with a lock-free solution but didn't find a simple
> one. Since, from what I understand, scalability issues are unlikely
> here, I felt that simplicity was preferred. Also
> the current implementation has two cmpxchg:s, and my proposal a single
> spinlock, so on some setups I bet it's more efficient (like mine with
> a lousy interconnect and preempt-rt (but I'm on thin ice here)).
>
> Below is the output from my test application (it's too much of a hack
> to post publicly), but I'd be happy to clean it up and post it if
> necessary.
>
> /Fredrik
>
>
> #. => .
[=====> FAILED]
>
> 0.0 sum_exec=100000000000 utime=0 stime=1 => 0.0 tot=10000 user=0 sys=10000
> 0.1 sum_exec=101000000000 utime=100 stime=1 => 0.1 tot=10100 user=100 sys=10000
>
> 1.0 sum_exec=100000000000 utime=1 stime=0 => 1.0 tot=10000 user=10000 sys=0
> 1.1 sum_exec=101000000000 utime=1 stime=100 => 1.1 tot=20000 user=10000 sys=10000 =====> FAILED
>
> 2.0 sum_exec=100000000000 utime=1 stime=1 => 2.0 tot=10000 user=5000 sys=5000
> 2.1 sum_exec=101000000000 utime=100 stime=1 => 2.1 tot=10100 user=5100 sys=5000
>
> 3.0 sum_exec=100000000000 utime=1 stime=1 => <>
> 3.1 sum_exec=101000000000 utime=100 stime=1 => 3.1 tot=10100 user=10000 sys=100
> <> 3.0 tot=15000 user=10000 sys=5000 =====> FAILED
>
>
> On Fri, Jun 12, 2015 at 1:01 PM, Peter Zijlstra wrote:
>> On Fri, Jun 12, 2015 at 12:16:57PM +0200, Peter Zijlstra wrote:
>>> On Fri, 2015-06-12 at 10:55 +0200, Fredrik Markstrom wrote:
>>> > The scaling mechanism might sometimes cause top to report >100%
>>> > (sometimes > 1000%) cpu usage for a single thread. This patch makes
>>> > sure that stime+utime corresponds to the actual runtime of the thread.
>>>
>>> This Changelog is inadequate, it does not explain the actual problem.
>>>
>>> > +static DEFINE_SPINLOCK(prev_time_lock);
>>>
>>> global (spin)locks are bad.
>>
>> Since you have a proglet handy to test this; does something like the
>> below help anything?
>>
>> ---
>>  kernel/sched/cputime.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
>> index f5a64ffad176..3d3f60a555a0 100644
>> --- a/kernel/sched/cputime.c
>> +++ b/kernel/sched/cputime.c
>> @@ -613,6 +613,10 @@ static void cputime_adjust(struct task_cputime *curr,
>>
>>  		stime = scale_stime((__force u64)stime,
>>  				    (__force u64)rtime, (__force u64)total);
>> +
>> +		if (stime < prev->stime)
>> +			stime = prev->stime;
>> +
>>  		utime = rtime - stime;
>>  	}
>>
>
>
>
> --
> /Fredrik

--
/Fredrik