Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752435AbcLJBsL (ORCPT ); Fri, 9 Dec 2016 20:48:11 -0500 Received: from mail-wm0-f67.google.com ([74.125.82.67]:34607 "EHLO mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750989AbcLJBsJ (ORCPT ); Fri, 9 Dec 2016 20:48:09 -0500 Date: Sat, 10 Dec 2016 02:48:06 +0100 From: Frederic Weisbecker To: LKML Cc: Martin Schwidefsky , Tony Luck , Wanpeng Li , Peter Zijlstra , Michael Ellerman , Heiko Carstens , Benjamin Herrenschmidt , Thomas Gleixner , Paul Mackerras , Ingo Molnar , Fenghua Yu , Rik van Riel , Stanislaw Gruszka Subject: Re: [PATCH 09/10] s390/cputime: delayed accounting of system time Message-ID: <20161210014804.GA3023@lerouge> References: <1480991543-6557-1-git-send-email-fweisbec@gmail.com> <1480991543-6557-10-git-send-email-fweisbec@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1480991543-6557-10-git-send-email-fweisbec@gmail.com> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6455 Lines: 157 On Tue, Dec 06, 2016 at 03:32:22AM +0100, Frederic Weisbecker wrote: > From: Martin Schwidefsky > > The account_system_time() function is called with a cputime that > occurred while running in the kernel. The function detects which > context the CPU is currently running in and accounts the time to > the correct bucket. This forces the arch code to account the > cputime for hardirq and softirq immediately. > > Such accounting function can be costly and perform unwelcome divisions > and multiplications, among others. > > The arch code can delay the accounting for system time. For s390 > the accounting is done once per timer tick and for each task switch. > > Signed-off-by: Martin Schwidefsky > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: Michael Ellerman > Cc: Heiko Carstens > Cc: Martin Schwidefsky > Cc: Tony Luck > Cc: Fenghua Yu > Cc: Peter Zijlstra > Cc: Rik van Riel > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: Stanislaw Gruszka > Cc: Wanpeng Li > [rebase against latest cputime tree, massaged changelog accordingly] > Signed-off-by: Frederic Weisbecker Looking at this patch again, I think I need to do another pass on it. Comments below: > /* > * Update process times based on virtual cpu times stored by entry.S > * to the lowcore fields user_timer, system_timer & steal_clock. > */ > static int do_account_vtime(struct task_struct *tsk, int hardirq_offset) > { > - u64 timer, clock, user, system, steal; > - u64 user_scaled, system_scaled; > + u64 timer, clock, user, guest, system, hardirq, softirq, steal; > > timer = S390_lowcore.last_update_timer; > clock = S390_lowcore.last_update_clock; > @@ -110,36 +119,57 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset) > #endif > : "=m" (S390_lowcore.last_update_timer), > "=m" (S390_lowcore.last_update_clock)); > - S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer; > - S390_lowcore.steal_timer += S390_lowcore.last_update_clock - clock; > + clock = S390_lowcore.last_update_clock - clock; > + timer -= S390_lowcore.last_update_timer; > + > + if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0)) > + S390_lowcore.guest_timer += timer; > + else if (hardirq_count() - hardirq_offset) > + S390_lowcore.hardirq_timer += timer; We should get rid of the hardirq_offset argument, it doesn't really make sense anymore. Also it makes the accounting buggy now. It's called from the tick through account_user_time() with hardirq_offset=1, so the irq time is incorrectly accumulated as system time. Guest time may be incorrect too. In fact it may have been buggy even before this patchset because vtime_account_user() isn't only called from the tick but also from task switch, and hardirq_offset remains 1 for those two cases. Not good. > + else if (in_serving_softirq()) > + S390_lowcore.softirq_timer += timer; > + else > + S390_lowcore.system_timer += timer; > > /* Update MT utilization calculation */ > if (smp_cpu_mtid && > time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies))) > update_mt_scaling(); > > + /* Calculate cputime delta */ > user = S390_lowcore.user_timer - tsk->thread.user_timer; > - S390_lowcore.steal_timer -= user; > tsk->thread.user_timer = S390_lowcore.user_timer; > - > + guest = S390_lowcore.guest_timer - tsk->thread.guest_timer; > + tsk->thread.guest_timer = S390_lowcore.guest_timer; > system = S390_lowcore.system_timer - tsk->thread.system_timer; > - S390_lowcore.steal_timer -= system; > tsk->thread.system_timer = S390_lowcore.system_timer; > + hardirq = S390_lowcore.hardirq_timer - tsk->thread.hardirq_timer; > + tsk->thread.hardirq_timer = S390_lowcore.hardirq_timer; > + softirq = S390_lowcore.softirq_timer - tsk->thread.softirq_timer; > + tsk->thread.softirq_timer = S390_lowcore.softirq_timer; > + S390_lowcore.steal_timer += > + clock - user - guest - system - hardirq - softirq; > > - user_scaled = user; > - system_scaled = system; > - /* Do MT utilization scaling */ > - if (smp_cpu_mtid) { > - u64 mult = __this_cpu_read(mt_scaling_mult); > - u64 div = __this_cpu_read(mt_scaling_div); > + /* Push account value */ > + if (user) { > + account_user_time(tsk, user); > + tsk->utimescaled += scale_vtime(user); > + } > > - user_scaled = (user_scaled * mult) / div; > - system_scaled = (system_scaled * mult) / div; > + if (guest) { > + account_guest_time(tsk, guest); > + tsk->utimescaled += scale_vtime(guest); > } > - account_user_time(tsk, user); > - tsk->utimescaled += user_scaled; > - account_system_time(tsk, hardirq_offset, system); > - tsk->stimescaled += system_scaled; > + > + if (system) > + account_system_index_scaled(tsk, system, scale_vtime(system), > + CPUTIME_SYSTEM); > + if (hardirq) > + account_system_index_scaled(tsk, hardirq, scale_vtime(hardirq), > + CPUTIME_IRQ); > + if (softirq) > + account_system_index_scaled(tsk, softirq, scale_vtime(softirq), > + CPUTIME_SOFTIRQ); > > steal = S390_lowcore.steal_timer; > if ((s64) steal > 0) { > @@ -147,16 +177,22 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset) > account_steal_time(steal); > } > > - return virt_timer_forward(user + system); > + return virt_timer_forward(user + guest + system + hardirq + softirq); > } > > void vtime_task_switch(struct task_struct *prev) > { > do_account_vtime(prev, 0); This call should be removed, the task switch already calls vtime_account_user(). > prev->thread.user_timer = S390_lowcore.user_timer; > + prev->thread.guest_timer = S390_lowcore.guest_timer; > prev->thread.system_timer = S390_lowcore.system_timer; > + prev->thread.hardirq_timer = S390_lowcore.hardirq_timer; > + prev->thread.softirq_timer = S390_lowcore.softirq_timer; > S390_lowcore.user_timer = current->thread.user_timer; > + S390_lowcore.guest_timer = current->thread.guest_timer; > S390_lowcore.system_timer = current->thread.system_timer; > + S390_lowcore.hardirq_timer = current->thread.hardirq_timer; > + S390_lowcore.softirq_timer = current->thread.softirq_timer; > }