Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932417AbcLLK2H (ORCPT ); Mon, 12 Dec 2016 05:28:07 -0500 Received: from mx0a-001b2d01.pphosted.com ([148.163.156.1]:40857 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754111AbcLLK2F (ORCPT ); Mon, 12 Dec 2016 05:28:05 -0500 Date: Mon, 12 Dec 2016 11:27:54 +0100 From: Martin Schwidefsky To: Frederic Weisbecker Cc: LKML , Tony Luck , Wanpeng Li , Peter Zijlstra , Michael Ellerman , Heiko Carstens , Benjamin Herrenschmidt , Thomas Gleixner , Paul Mackerras , Ingo Molnar , Fenghua Yu , Rik van Riel , Stanislaw Gruszka Subject: Re: [PATCH 09/10] s390/cputime: delayed accounting of system time In-Reply-To: <20161210014804.GA3023@lerouge> References: <1480991543-6557-1-git-send-email-fweisbec@gmail.com> <1480991543-6557-10-git-send-email-fweisbec@gmail.com> <20161210014804.GA3023@lerouge> X-Mailer: Claws Mail 3.13.2 (GTK+ 2.24.30; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Content-Scanned: Fidelis XPS MAILER x-cbid: 16121210-0016-0000-0000-0000027BFB9F X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 16121210-0017-0000-0000-00002426AB52 Message-Id: <20161212112754.5ad104cf@mschwideX1> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:,, definitions=2016-12-12_05:,, signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 malwarescore=0 phishscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1609300000 definitions=main-1612120168 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7570 Lines: 180 On Sat, 10 Dec 2016 02:48:06 +0100 Frederic Weisbecker wrote: > On Tue, Dec 06, 2016 at 03:32:22AM +0100, Frederic Weisbecker wrote: > > From: Martin Schwidefsky > > > > The account_system_time() function is called with a cputime that > > occurred while running in the kernel. The function detects which > > context the CPU is currently running in and accounts the time to > > the correct bucket. This forces the arch code to account the > > cputime for hardirq and softirq immediately. > > > > Such accounting function can be costly and perform unwelcome divisions > > and multiplications, among others. > > > > The arch code can delay the accounting for system time. For s390 > > the accounting is done once per timer tick and for each task switch. > > > > Signed-off-by: Martin Schwidefsky > > Cc: Benjamin Herrenschmidt > > Cc: Paul Mackerras > > Cc: Michael Ellerman > > Cc: Heiko Carstens > > Cc: Martin Schwidefsky > > Cc: Tony Luck > > Cc: Fenghua Yu > > Cc: Peter Zijlstra > > Cc: Rik van Riel > > Cc: Thomas Gleixner > > Cc: Ingo Molnar > > Cc: Stanislaw Gruszka > > Cc: Wanpeng Li > > [rebase against latest cputime tree, massaged changelog accordingly] > > Signed-off-by: Frederic Weisbecker > > Looking at this patch again, I think I need to do another pass on it. > Comments below: > > > /* > > * Update process times based on virtual cpu times stored by entry.S > > * to the lowcore fields user_timer, system_timer & steal_clock. > > */ > > static int do_account_vtime(struct task_struct *tsk, int hardirq_offset) > > { > > - u64 timer, clock, user, system, steal; > > - u64 user_scaled, system_scaled; > > + u64 timer, clock, user, guest, system, hardirq, softirq, steal; > > > > timer = S390_lowcore.last_update_timer; > > clock = S390_lowcore.last_update_clock; > > @@ -110,36 +119,57 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset) > > #endif > > : "=m" (S390_lowcore.last_update_timer), > > "=m" (S390_lowcore.last_update_clock)); > > - S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer; > > - S390_lowcore.steal_timer += S390_lowcore.last_update_clock - clock; > > + clock = S390_lowcore.last_update_clock - clock; > > + timer -= S390_lowcore.last_update_timer; > > + > > + if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0)) > > + S390_lowcore.guest_timer += timer; > > + else if (hardirq_count() - hardirq_offset) > > + S390_lowcore.hardirq_timer += timer; > > We should get rid of the hardirq_offset argument, it doesn't really make sense > anymore. Also it makes the accounting buggy now. It's called from the tick > through account_user_time() with hardirq_offset=1, so the irq time is incorrectly > accumulated as system time. Guest time may be incorrect too. > > In fact it may have been buggy even before this patchset because vtime_account_user() > isn't only called from the tick but also from task switch, and hardirq_offset remains 1 > for those two cases. Not good. For s390 the do_account_vtime function is called from vtime_task_switch and vtime_flush. 1) vtime_task_switch is exclusively called from finish_task_switch outside of irq context. The call to do_account_vtime with hardirq_offset==0 from vtime_task_switch is correct. 2) The call to vtime_flush in vtime_common_task_switch is irrelevant for s390 as we define __ARCH_HAS_VTIME_TASK_SWITCH 3) The call to vtime_flush in account_process_tick is done in irq context from update_process_times. hardirq_offset==1 is also correct. As far as s390 is concerned that looks good. > > + else if (in_serving_softirq()) > > + S390_lowcore.softirq_timer += timer; > > + else > > + S390_lowcore.system_timer += timer; > > > > /* Update MT utilization calculation */ > > if (smp_cpu_mtid && > > time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies))) > > update_mt_scaling(); > > > > + /* Calculate cputime delta */ > > user = S390_lowcore.user_timer - tsk->thread.user_timer; > > - S390_lowcore.steal_timer -= user; > > tsk->thread.user_timer = S390_lowcore.user_timer; > > - > > + guest = S390_lowcore.guest_timer - tsk->thread.guest_timer; > > + tsk->thread.guest_timer = S390_lowcore.guest_timer; > > system = S390_lowcore.system_timer - tsk->thread.system_timer; > > - S390_lowcore.steal_timer -= system; > > tsk->thread.system_timer = S390_lowcore.system_timer; > > + hardirq = S390_lowcore.hardirq_timer - tsk->thread.hardirq_timer; > > + tsk->thread.hardirq_timer = S390_lowcore.hardirq_timer; > > + softirq = S390_lowcore.softirq_timer - tsk->thread.softirq_timer; > > + tsk->thread.softirq_timer = S390_lowcore.softirq_timer; > > + S390_lowcore.steal_timer += > > + clock - user - guest - system - hardirq - softirq; > > > > - user_scaled = user; > > - system_scaled = system; > > - /* Do MT utilization scaling */ > > - if (smp_cpu_mtid) { > > - u64 mult = __this_cpu_read(mt_scaling_mult); > > - u64 div = __this_cpu_read(mt_scaling_div); > > + /* Push account value */ > > + if (user) { > > + account_user_time(tsk, user); > > + tsk->utimescaled += scale_vtime(user); > > + } > > > > - user_scaled = (user_scaled * mult) / div; > > - system_scaled = (system_scaled * mult) / div; > > + if (guest) { > > + account_guest_time(tsk, guest); > > + tsk->utimescaled += scale_vtime(guest); > > } > > - account_user_time(tsk, user); > > - tsk->utimescaled += user_scaled; > > - account_system_time(tsk, hardirq_offset, system); > > - tsk->stimescaled += system_scaled; > > + > > + if (system) > > + account_system_index_scaled(tsk, system, scale_vtime(system), > > + CPUTIME_SYSTEM); > > + if (hardirq) > > + account_system_index_scaled(tsk, hardirq, scale_vtime(hardirq), > > + CPUTIME_IRQ); > > + if (softirq) > > + account_system_index_scaled(tsk, softirq, scale_vtime(softirq), > > + CPUTIME_SOFTIRQ); > > > > steal = S390_lowcore.steal_timer; > > if ((s64) steal > 0) { > > @@ -147,16 +177,22 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset) > > account_steal_time(steal); > > } > > > > - return virt_timer_forward(user + system); > > + return virt_timer_forward(user + guest + system + hardirq + softirq); > > } > > > > void vtime_task_switch(struct task_struct *prev) > > { > > do_account_vtime(prev, 0); > > This call should be removed, the task switch already calls vtime_account_user(). The vtime_account_user function is empty for s390.. > > prev->thread.user_timer = S390_lowcore.user_timer; > > + prev->thread.guest_timer = S390_lowcore.guest_timer; > > prev->thread.system_timer = S390_lowcore.system_timer; > > + prev->thread.hardirq_timer = S390_lowcore.hardirq_timer; > > + prev->thread.softirq_timer = S390_lowcore.softirq_timer; > > S390_lowcore.user_timer = current->thread.user_timer; > > + S390_lowcore.guest_timer = current->thread.guest_timer; > > S390_lowcore.system_timer = current->thread.system_timer; > > + S390_lowcore.hardirq_timer = current->thread.hardirq_timer; > > + S390_lowcore.softirq_timer = current->thread.softirq_timer; > > } > -- blue skies, Martin. "Reality continues to ruin my life." - Calvin.