From: Martin Steigerwald
To: Ingo Molnar, Andrew Morton
CC: Ingo Molnar, Peter Zijlstra, LKML, Nicolas Dichtel, Balbir Singh, Shailabh Nagar, Jay Lan, Martin Steigerwald, Gerlof Langeveld, Marc Haber, Martin
Subject: [REGRESSION] Two issues that prevent process accounting (taskstats) from working correctly
Date: Mon, 19 Dec 2016 13:06:00 +0100
Message-ID: <5967400.cFS0L5jxeH@merkaba>
Organization: teamix GmbH
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="nextPart1575151.WTPb6e5RmZ"

--nextPart1575151.WTPb6e5RmZ
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf-8"

Hello Ingo, Peter, Nicolas, Andrew, Balbir, Shailabh, Jay, Gerlof and Marc,

starting from a Debian bug report of mine, Gerlof Langeveld, developer of
the system and process monitor atop [1], found two issues with process
accounting.

[1] http://atoptool.nl/

I did some guesswork on who might be the maintainer for this, but please
feel free to add further Cc's as you see fit, or ask to be removed from Cc
if you are no longer working on this.

I also reported both issues to the kernel bug tracker.
I copy and paste the summaries that Gerlof prepared:

1) Sometimes process accounting does not work at all.

The acct() system call (to activate process accounting) returns 0, which
means that process accounting was activated successfully. However, no
process accounting records are written whatsoever. This situation can be
reproduced with the attached program 'acctdemo.c'. When this program gives
the message "found a process accounting record!", the situation is okay
and process accounting works fine to the file '/tmp/mypacct'. When the
message 'No process accounting record yet....' is given repeatedly,
process accounting does not work and will not work at all. You might have
to start this program several times before you get into this situation
(preferably start and finish lots of processes in the meantime).

This problem is probably caused by a new mechanism introduced in the
kernel code (..../linux/kernel/acct.c) that is called 'slow accounting',
and it has to be solved in the kernel code.

I experience this problem on Debian 8 with a 4.8 kernel and on CentOS 7
with a 4.8 kernel.

I reported this as:

Bug 190271 - process accounting sometimes does not work
https://bugzilla.kernel.org/show_bug.cgi?id=190271

2) When using the NETLINK interface, the command TASKSTATS_CMD_GET
consistently returns -EINVAL.

The code that is used by the atopacctd daemon is based on the demo code
'getdelays.c' that can be found in the kernel source tree
(..../linux/Documentation/accounting/getdelays.c). This 'getdelays'
program does not work any more with the newer kernels either (also -EINVAL
on the same call). I spent a lot of time on this issue trying to get the
code running (there are many places in the kernel code where -EINVAL can
be returned for this call), but I did not succeed. It really is an
incompatibility introduced by the kernel code.

It would be nice if the kernel maintainers provided a working version of
the getdelays program in the kernel source tree.

I only experience this problem on Debian 8 with a 4.8 kernel (virtual
machine with 4 cores). On CentOS 7 with a 4.8 kernel it works fine
(physical machine with 4 cores).

In any case, I will adapt atopacctd so that it detects and logs the
-EINVAL and then terminates. The current version of atopacctd keeps
running, which is not useful at all.

I reported this as:

Bug 190711 - Process accounting: Using the NETLINK inface, the command
TASKSTATS_CMD_GET returns -EINVAL
https://bugzilla.kernel.org/show_bug.cgi?id=190711

Marc Haber, maintainer of the atop package, Gerlof Langeveld, developer of
atop, and I are currently discussing workarounds in atop and/or letting
the systemd service fail until distributions ship upstream kernels with
these issues fixed. Still, it would be nice to be able to remove those
workarounds and have the kernel work correctly again at some point in the
future.

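For illustration only, here is a minimal sketch of how a taskstats client
could detect the failure described in issue 2 instead of continuing to
run. It assumes the -EINVAL arrives as an NLMSG_ERROR reply on the netlink
socket. The helper name first_netlink_error() is hypothetical and is not
taken from atopacctd or getdelays.c; opening the socket, resolving the
TASKSTATS family and receiving the reply into a buffer are left out.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: scan a buffer of received netlink messages.
 * Returns 0 if the buffer holds no NLMSG_ERROR message or only ACKs
 * (error == 0), otherwise the negative errno carried by the first
 * failing NLMSG_ERROR message (for example -EINVAL).
 * The buffer must be suitably aligned for struct nlmsghdr.
 */
int first_netlink_error(const void *buf, size_t len)
{
	const struct nlmsghdr *nlh = buf;
	int remaining = (int)len;

	for (; NLMSG_OK(nlh, remaining); nlh = NLMSG_NEXT(nlh, remaining)) {
		if (nlh->nlmsg_type != NLMSG_ERROR)
			continue;

		/* an error message must be large enough to hold nlmsgerr */
		if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(struct nlmsgerr)))
			return -EBADMSG;

		const struct nlmsgerr *err = NLMSG_DATA(nlh);
		if (err->error != 0)
			return err->error;
	}
	return 0;
}
```

A caller could pass the bytes returned by recv() to this helper and, on a
negative result, log strerror() of the negated value and exit rather than
keep running.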
Thanks,
-- 
Martin Steigerwald | Trainer

teamix GmbH
Südwestpark 43
90449 Nürnberg

Tel.: +49 911 30999 55 | Fax: +49 911 30999 99
mail: martin.steigerwald@teamix.de | web: http://www.teamix.de | blog: http://blog.teamix.de

Registered at Amtsgericht Nürnberg, HRB 18320 | Managing directors: Oliver Kügow, Richard Müller

teamix Support Hotline: +49 911 30999-112

--nextPart1575151.WTPb6e5RmZ
Content-Disposition: attachment; filename="acctdemo.c"
Content-Transfer-Encoding: 7Bit
Content-Type: text/x-csrc; charset="UTF-8"; name="acctdemo.c"

/* Header names were lost in the archived copy; these are the ones
 * the calls below require. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>

#define ACCTFILE	"/tmp/mypacct"

int main(void)
{
	int	fd;
	char	buf[1024];

	// create (or truncate) the accounting file and keep it open for reading
	if ( (fd = open(ACCTFILE, O_RDWR|O_CREAT|O_TRUNC, 0777)) == -1)
	{
		perror("Open " ACCTFILE);
		exit(1);
	}

	// switch on process accounting to that file (requires root)
	if (acct(ACCTFILE) == -1)
	{
		perror("Switch on accounting");
		exit(1);
	}

	if ( fork() == 0 )	// fork new process
		exit(0);	// child process: finish

	// parent process:
	// wait for child to finish
	wait((int *)0);

	// read the process accounting record of the finished child
	while (read(fd, buf, sizeof buf) == 0)
	{
		printf("No process accounting record yet....\n");
		sleep(1);
	}

	printf("Yeeeeah, found a process accounting record!\n");
	return 0;
}

--nextPart1575151.WTPb6e5RmZ--