Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp460202pxj; Wed, 16 Jun 2021 06:29:32 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyCT2HYCmmSBR4Sg7g1BHGgUAhDgGUguCeXcWm4T5yMHAGb53GCrOndD7XHma1jMZjudXQW X-Received: by 2002:aa7:c54c:: with SMTP id s12mr4223431edr.374.1623850172680; Wed, 16 Jun 2021 06:29:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1623850172; cv=none; d=google.com; s=arc-20160816; b=yIcPWDVorOcLWoT5xJ9Zjilaw0yLO2koQhwTHS+PyF07CUWKVkDnll4m2CUpiSGs0a hQcpxo3YGJmfsNtyXKf/zc1Nk8kVG4TlwL/mwEpexlwOj5SZ6yB4aniNY5qfn2sou81H rd29Xl0cY8oQ/MSXHxygo1d17fKBfspvDGJ0qMzQ2/730bbmV97qgWsI1iDjAMilqa9A bYgbaNYQd0uBmyXWmlHp87nZdzRxSg5KmAZ23qc6QGBwPEbqivh4CqC0S4QspOIVn7z4 ikamMIVyqQyG5ZsXIs77kG2SSUF3bR1+87vT6kaWxbYcB9zNEEHo56ED4Lja/ab3meBg NNCA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=MWv2/giThGBSyNx7MUlTX4ADJ123OQ0ptvZo7V2NbO0=; b=EhpKNjD4jPuTHRS7lC4XiH03l0HqT8gPI6lxQ1PJi6a5i41s2QI89fdvRHIF4xiPmP DUSR8nPHHIWvRgL2xk0EwJJmqNnZ4dGJxAao5F+q24OpXv3T/hiGSurngs+I+KGl9j24 r+9Pvo80MjDmbZcbP5STYDxyh8lmdmTluEYZEzeSeBlKA3JAnThMsStRN1fZtVim1LLD y6v1YhhsNDamJ4WrvTkpAtymCEBmS+n2oXHxbZQO0Zl1O32822HXMT0hhDt993OhQ9Ai aACLmkw1GwwXqjEKpS6xHA82t1oKIazNg/P1wSZZvn9FfzY2PwK5Yj1Bw94GZg1AKH9y MJMg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=elUDiQS9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g25si2117116eds.466.2021.06.16.06.29.09; Wed, 16 Jun 2021 06:29:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=elUDiQS9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231180AbhFPMdW (ORCPT + 99 others); Wed, 16 Jun 2021 08:33:22 -0400 Received: from mail.kernel.org ([198.145.29.99]:50246 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231666AbhFPMdU (ORCPT ); Wed, 16 Jun 2021 08:33:20 -0400 Received: by mail.kernel.org (Postfix) with ESMTPSA id 7646161359; Wed, 16 Jun 2021 12:31:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1623846674; bh=GaASMiuVf3PpMpuP0p95+Gtkk5/Lp7YuwNvCA4UDWL8=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=elUDiQS9AZODCuaRaRE7GbvNCj6iNf1GHG7mXx1OI1y8HI5ZB6YRIUwJcMg4O/JPw fWJ/jsA7r8YIttWwNF+JMaHp9VKJmQEeBxCEbxGBQjZtRCK+27GfDumewcEkBw0jm+ OQvVyshLGtyEU83dUF7ZNJdLIR0Z62CRjgnWVysf6fcOa6JXoTzFUtXBBh8jcWzNwq Pev8CArgJzA++9lZrEFknZ53hwE6vzG6CALe3ctUNDpTIKixd3o1UKmNfegGbMi7h0 58ECmU5I85pLDe1F02WWCXPDldhO3cbFhD0oMPUGVxLl5Gdjp3/9nkuBYXxSNuEJNO kH7Yre050IN8A== Date: Wed, 16 Jun 2021 14:31:11 +0200 From: Frederic Weisbecker To: Peter Zijlstra Cc: "hasegawa-hitomi@fujitsu.com" , "'mingo@kernel.org'" , "'fweisbec@gmail.com'" , "'tglx@linutronix.de'" , "'juri.lelli@redhat.com'" , "'vincent.guittot@linaro.org'" , "'dietmar.eggemann@arm.com'" , "'rostedt@goodmis.org'" , "'bsegall@google.com'" , "'mgorman@suse.de'" , "'bristot@redhat.com'" , "'linux-kernel@vger.kernel.org'" Subject: Re: Utime and stime are less when getrusage (RUSAGE_THREAD) is executed on a tickless CPU. Message-ID: <20210616123111.GD801071@lothringen> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 19, 2021 at 11:24:58AM +0200, Peter Zijlstra wrote: > On Wed, May 19, 2021 at 06:30:36AM +0000, hasegawa-hitomi@fujitsu.com wrote: > > Hi Ingo, Peter, Juri, and Vincent. > > > > > > > Your email is malformed. > > > > I'm sorry. I was sent in the wrong format. I correct it and resend. > > Thank you, Peter, for pointing this out. > > > > > > I found that when I run getrusage(RUSAGE_THREAD) on a tickless CPU, > > the utime and stime I get are less than the actual time, unlike when I run > > getrusage(RUSAGE_SELF) on a single thread. > > This problem seems to be caused by the fact that se.sum_exec_runtime is not > > updated just before getting the information from 'current'. > > In the current implementation, task_cputime_adjusted() calls task_cputime() to > > get the 'current' utime and stime, then calls cputime_adjust() to adjust the > > sum of utime and stime to be equal to cputime.sum_exec_runtime. On a tickless > > CPU, sum_exec_runtime is not updated periodically, so there seems to be a > > discrepancy with the actual time. > > Therefore, I think I should include a process to update se.sum_exec_runtime > > just before getting the information from 'current' (as in other processes > > except RUSAGE_THREAD). I'm thinking of the following improvement. > > > > @@ void getrusage(struct task_struct *p, int who, struct rusage *r) > > if (who == RUSAGE_THREAD) { > > + task_sched_runtime(current); > > task_cputime_adjusted(current, &utime, &stime); > > > > Is there any possible problem with this? > > Would be superfluous for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y > architectures at the very least. > > It also doesn't help any of the other callers, like for example procfs. > > Something like the below ought to work and fix all variants I think. But > it does make the call significantly more expensive. > > Looking at thread_group_cputime() that already does something like this, > but that's also susceptible to a variant of this very same issue; since > it doesn't call it unconditionally, nor on all tasks, so if current > isn't part of the threadgroup and/or another task is on a nohz_full cpu, > things will go wobbly again. > > There's a note about syscall performance there, so clearly someone seems > to care about that aspect of things, but it does suck for nohz_full. > > Frederic, didn't we have remote ticks that should help with this stuff? Only once per second :-s > > And mostly I think the trade-off here is that if you run on nohz_full, > you're not expected to go do syscalls anyway (because they're sodding > expensive) and hence the accuracy of these sort of things is mostly > irrelevant. I guess you can, before you enter some critical workload. > > So it might be the use-case is just fundamentally bonkers and we > shouldn't really bother fixing this. > > Anyway? > > --- > diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c > index 872e481d5098..620871c8e4f8 100644 > --- a/kernel/sched/cputime.c > +++ b/kernel/sched/cputime.c > @@ -612,7 +612,7 @@ void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev, > void task_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st) > { > struct task_cputime cputime = { > - .sum_exec_runtime = p->se.sum_exec_runtime, > + .sum_exec_runtime = task_sched_runtime(p), > }; > > task_cputime(p, &cputime.utime, &cputime.stime); Or perhaps just return task_cputime() if the task runs in a nohz_full CPU? The check for that would be racy though and there might be jumps between values if the thread goes/leave a nohz_full CPU.