Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752471AbdCHLXn (ORCPT ); Wed, 8 Mar 2017 06:23:43 -0500 Received: from mail-ot0-f194.google.com ([74.125.82.194]:34793 "EHLO mail-ot0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751527AbdCHLXk (ORCPT ); Wed, 8 Mar 2017 06:23:40 -0500 MIME-Version: 1.0 In-Reply-To: References: <78aa4c26-2a31-06e5-d862-6e777d204b2b@yandex.ru> From: Richard Genoud Date: Wed, 8 Mar 2017 12:14:16 +0100 Message-ID: Subject: Re: [REGRESSION] Two issues that prevent process accounting (taskstats) from working correctly To: Dmitry Romanov , Al Viro Cc: LKML , Peter Zijlstra , Ingo Molnar , Andrew Morton , Ingo Molnar , Nicolas Dichtel , Jay Lan , Martin Steigerwald , Gerlof Langeveld , Marc Haber , Ben Hutchings , Balbir Singh , Tejun Heo , kernel-team@fb.com Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8004 Lines: 226 Cc += Al Viro 2017-02-12 16:44 GMT+01:00 Dmitry Romanov : > On Mon, Dec 19, 2016 at 01:06:00PM +0100, Martin Steigerwald wrote: >> >> 1) Sometimes process accounting does not work at all. >> >> The acct() system call (to activate process accounting) return value 0, >> which means that process accounting is activated successfully. >> However, no process accounting records are written whatsoever. This >> situation can be reproduced with the program 'acctdemo.c' >> that you can find as attachment. When this program gives the message >> "found a process accounting record!", the situation is okay >> and process accounting works fine to the file '/tmp/mypacct'. When the >> message 'No process accounting record yet....' is repeatedly given, >> process accounting does not work and will not work at all. It might be >> that you have to start this program several times before you get >> this situation (preferably start/finish lots of processes in the mean time). >> This problem is probably caused by a new mechanism introduced in the >> kernel code (..../linux/kernel/acct.c) that is called 'slow accounting' >> and has to be solved in the kernel code. >> >> I experience this problem on Debian8 with a 4.8 kernel and on CentOS7 >> with a 4.8 kernel. > > I reported same problem on bugzilla as: > > Bug 180841 - Process accounting sometimes do not append records for terminated > processes > https://bugzilla.kernel.org/show_bug.cgi?id=180841 > > I think I found cause of this problem and can suggest patch which correct this > problem. > > Problem arise in such situation: > > Problem arise if between process accounting activation with call acct(...) and > first termination of process pass not less than one jiffy. (Note, acct() return > successfully, with zero.) In such situation records for terminated processes > do nod append to accounting file, until process accounting is restarted. > > I wrote test program test.c for illustration described problem for kernel > version 3.17-rc1 and later. This program create empty file for accounting, > call system call acct() with this file, sleep for not less than one jiffy, > create new process and exit this process. Then, program test size of accounting > file. If size of file remain zero, it seems problem and program display message > "Accounting file size = 0, process accounting did not add record to accounting > file!". > On my system problem reproduce almost always by this test.c. > > ---------- > test.c > ---------- > #include > #include > #include > #include > #include > #include > > /* Accounting file name : */ > #define ACCTFILENAME "/var/log/pacct" > > int main() > { > int fd; > int pid; > struct stat sb; > > /* Turn off accounting */ > if ( acct(NULL) < 0 ) { > perror("Process accounting off"); > return -1; > } > > /* Create empty accounting file */ > fd = creat(ACCTFILENAME, S_IRUSR | S_IWUSR); > if ( fd == -1 ) { > perror("Accounting file creation"); > return -1; > } > if ( close(fd) == -1) { > perror("Accounting file closing"); > return -1; > } > > /* Switch accounting on */ > if ( acct(ACCTFILENAME) < 0 ) { > perror("Process accounting on"); > return -1; > } > > /* Delay must be at least 1 jiffy. > For reproducing bug, some process exit must not happen during first jiffy. > If HZ >= 100, need delay minimum 10 ms. */ > usleep(10000); > > /* Create and exit child process. The record for this process must be append > by activated process accounting. */ > pid = fork(); > if (pid < 0) { > perror("Child process creating"); > return -1; > } > /* Exit child process */ > if (pid == 0) { > exit(0); > } > /* Wait for child process termination */ > wait(NULL); > > /* Get size of accounting file. */ > if ( stat(ACCTFILENAME, &sb) == -1 ) { > perror("Getting accounting file status"); > return -1; > } > > if ( sb.st_size == 0 ) { > printf("Accounting file size = 0, process accounting did not add record to accounting file!\n"); > } > else { > printf("Accounting file size > 0, expected behaviour.\n"); > } > > /* Turn off accounting and remove accounting file*/ > if ( acct(NULL) < 0 ) { > perror("Process accounting off"); > return -1; > } > if ( remove(ACCTFILENAME) == -1 ) { > perror("Removing accounting file"); > return -1; > } > > return 0; > > } > ---------- > > I suppose that this problem may be solve in kernel versions 3.17-rc1 and > later by following patch: > > Signed-off-by: Dmitry Romanov > --- > kernel/acct.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/acct.c b/kernel/acct.c > index 74963d1..37f1dc6 100644 > --- a/kernel/acct.c > +++ b/kernel/acct.c > @@ -99,7 +99,7 @@ static int check_free_space(struct bsd_acct_struct *acct) > { > struct kstatfs sbuf; > > - if (time_is_before_jiffies(acct->needcheck)) > + if (time_is_after_jiffies(acct->needcheck)) > goto out; > > /* May block */ > -- > 1.9.1 > > Line "if (time_is_before_jiffies(acct->needcheck))" use for check whether the > time for check free space. It seems function "time_is_after_jiffies" need here. > > ---------- > In kernel versions 3.3-rc1 - 3.16 activation of process accounting implemented > differently, so delay between call acct(filename) and first process termination > do not produce problem, and program test.c do not detect problem. But, it seems, > using function time_is_before_jiffies is not right similarly. > Another problem arise, if during work of process accounting happen that current > jiffies is greater than acct->needcheck (for example, if between two consecutive > process terminations happen interval greater than ACCT_TIMEOUT seconds). > Then in lines: > > if (!file || time_is_before_jiffies(acct->needcheck)) > goto out; > > always will branch with "goto" and acct->needcheck will not change. So free > space will not check more, until process accounting is restarted. > Note, that in version 3.17-rc1 and later this problem is also present. > > Therefore I suppose that this problem for kernel version 3.3-rc1 - 3.16 may be > solve by following patch: > > Signed-off-by: Dmitry Romanov > --- > kernel/acct.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/acct.c b/kernel/acct.c > index 808a86f..591bdcd 100644 > --- a/kernel/acct.c > +++ b/kernel/acct.c > @@ -107,7 +107,7 @@ static int check_free_space(struct bsd_acct_struct *acct, struct file *file) > > spin_lock(&acct_lock); > res = acct->active; > - if (!file || time_is_before_jiffies(acct->needcheck)) > + if (!file || time_is_after_jiffies(acct->needcheck)) > goto out; > spin_unlock(&acct_lock); > > -- > 1.9.1 > > > In kernel 3.2 another method used for define time to check > free space (by timer). So I did not find in this version such problem. > > Dmitry Playing with bootchart, I also had some problems with BSD process acounting V3 on v4.11-rc1. The file kernel_pacct created by bootchartd was always empty. After a git bisect, the culprit was : commit 795a2f22a8ea ("acct() should honour the limits from the very beginning") If I revert this commit, the accounting starts working again test platform: ARM at91sam9g35 test software : busybox bootchartd Richard.