Date: Thu, 30 Mar 2017 17:25:46 -0400
From: Luiz Capitulino
To: Frederic Weisbecker
Cc: Wanpeng Li, Mike Galbraith, Rik van Riel, "linux-kernel@vger.kernel.org", Peter Zijlstra, Thomas Gleixner
Subject: Re: [BUG nohz]: wrong user and system time accounting
Message-ID: <20170330172546.4e8e1a6a@redhat.com>
In-Reply-To: <20170330141816.GE3626@lerouge>
Organization: Red Hat

On Thu, 30 Mar 2017 16:18:17 +0200
Frederic Weisbecker wrote:

> On Thu, Mar 30, 2017 at 09:59:54PM +0800, Wanpeng Li wrote:
> > 2017-03-30 21:38 GMT+08:00 Frederic Weisbecker:
> > > If it works, we may want to take that solution, likely less performance sensitive
> > > than using sched_clock(). In fact sched_clock() is fast, especially as we require it to
> > > be stable for nohz_full, but using it involves costly conversion back and forth to jiffies.
> >
> > So both Rik and you agree with the skew tick solution, I will try it
> > tomorrow. Btw, if we should just add random offset to the cpu in the
> > nohz_full mode or add random offset to all cpus like the codes above?
>
> Let's just keep it to all CPUs for simplicity.
> Also please add a comment that explains why we need that skew_tick on nohz_full.

I've tried all the test-cases we discussed in this thread with skew_tick=1
and it worked as expected on bare-metal and in KVM guests.

However, I found a test-case that works on bare-metal but shows problems
in KVM guests. It could be something that's KVM specific, or it could be
something that's harder to reproduce on bare-metal.

The reproducer is (not sure all the steps are necessary):

 1. Isolate 8 cores in the host with isolcpus= and nohz_full= (and skew_tick=1)
 2. Create a KVM guest with 8 vCPUs and pin each vCPU to an isolated host core
 3. Boot the guest with isolcpus=2,3,4,5,6,7 nohz_full=2,3,4,5,6,7 skew_tick=1
 4. Once the guest is booted, run:

    # for i in $(seq 2 7); do taskset -c $i hog & done
    # taskset -c 2,3,4,5,6,7 \
        cyclictest -m -n -q -p95 -D 1m -h60 -i 200 -t 6 -a 2,3,4,5,6,7

    (where hog is a program taking 100% of the CPU, and cyclictest is RT's cyclictest)

 5. Run top -d1

A few minutes into this test-case, I see one isolated CPU in the guest
reporting around 95% system time (where the expected is close to 100%
user time, which the other isolated CPUs correctly report).
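
For anyone who wants to check the user/system split without watching top, here is a rough Python sketch that takes two samples of /proc/stat and prints the per-CPU percentages over the interval. It is only an illustration, not something used in the tests above: the function names (parse_proc_stat, split_percent, sample_and_report) are made up for this example, and the default CPU list 2-7 simply mirrors the guest setup in the reproducer. It assumes the usual per-CPU /proc/stat layout (user nice system idle iowait irq softirq steal, in USER_HZ ticks).

```python
import time

# Per-CPU columns of /proc/stat that are summed to get total time.
# The guest columns are left out on purpose: guest time is already
# included in "user" and "nice".
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Return {cpu_number: {field: ticks}} for every "cpuN" line."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip empty lines, the aggregate "cpu" line and non-CPU lines.
        if not parts or not parts[0].startswith("cpu"):
            continue
        number = parts[0][3:]
        if not number.isdigit():
            continue
        values = [int(v) for v in parts[1:1 + len(FIELDS)]]
        cpus[int(number)] = dict(zip(FIELDS, values))
    return cpus


def split_percent(before, after):
    """Return {cpu: (user_pct, system_pct)} between two parsed samples."""
    result = {}
    for cpu, now in after.items():
        if cpu not in before:
            continue
        delta = {f: now.get(f, 0) - before[cpu].get(f, 0) for f in FIELDS}
        total = sum(delta.values())
        if total <= 0:
            continue  # no ticks accounted on this CPU in the interval
        user = delta["user"] + delta["nice"]
        result[cpu] = (100.0 * user / total, 100.0 * delta["system"] / total)
    return result


def sample_and_report(cpus=range(2, 8), interval=1.0):
    """Sample /proc/stat twice and print the split for the given CPUs."""
    with open("/proc/stat") as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = parse_proc_stat(f.read())
    split = split_percent(before, after)
    for cpu in cpus:
        if cpu in split:
            user_pct, system_pct = split[cpu]
            print("cpu%d: %5.1f%% user %5.1f%% system" % (cpu, user_pct, system_pct))
```

Calling sample_and_report() in the guest while the hogs are running should show close to 100% user on each isolated CPU if the accounting is right. A CPU hit by the problem described above would show most of its time under system instead.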