From: Djalal Harouni
Date: Mon, 25 Sep 2017 11:47:35 +0100
Subject: Re: [PATCH v2 2/2] pidmap(2)
To: Alexey Dobriyan
Cc: Andrew Morton, linux-kernel, Linux API, Randy Dunlap, tglx@linutronix.de, Alexey Gladkov, Tatsiana Brouka, Aliaksandr Patseyenak, Andy Lutomirski

Hi Alexey,

On Sun, Sep 24, 2017 at 9:08 PM, Alexey Dobriyan wrote:
> From: Tatsiana Brouka
>
> Implement system call for bulk retrieving of pids in binary form.
>
> Using /proc is slower than necessary: 3 syscalls + another 3 for each thread +
> converting with atoi() + instantiating dentries and inodes.
>
> /proc may be not mounted especially in containers. Natural extension of
> hidepid=2 efforts is to not mount /proc at all.

Actually, I am not sure that software will keep working if /proc is not mounted. The last time I checked (years ago), glibc was doing extra checks during initialization using the /proc/self/* memory inodes, and it may fail without them. Also, glibc implements fexecve() using /proc/self/..., so it depends on the library and on the use case for cloud containers...
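To make the cost estimate in the patch description concrete (a few syscalls per directory, plus atoi() on every name), here is a minimal C sketch of the /proc walk: opendir + readdir + closedir on /proc, then the same for every /proc/$PID/task. It is only an illustration, not the code of ps, top or htop, and the helper names is_pid_name() and count_via_proc() are hypothetical. Note that it simply fails when /proc is not mounted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: nonzero if name is all decimal digits (a PID entry). */
static int is_pid_name(const char *name)
{
    if (*name == '\0')
        return 0;
    for (; *name; name++)
        if (!isdigit((unsigned char)*name))
            return 0;
    return 1;
}

/* Hypothetical helper: count processes and threads by walking /proc and
 * every /proc/$PID/task. Each directory costs at least an open, one or
 * more getdents64 calls and a close, and every PID name goes through
 * atoi(). Returns -1 if /proc cannot be opened (e.g. not mounted). */
static int count_via_proc(long *nprocs, long *nthreads)
{
    DIR *proc;
    struct dirent *de;

    assert(nprocs != NULL && nthreads != NULL);
    proc = opendir("/proc");
    if (!proc)
        return -1;
    *nprocs = 0;
    *nthreads = 0;
    while ((de = readdir(proc)) != NULL) {
        char path[64];
        DIR *task;
        struct dirent *te;
        int pid;

        if (!is_pid_name(de->d_name))
            continue;               /* skip "self", "meminfo", ... */
        pid = atoi(de->d_name);     /* text -> binary conversion */
        (*nprocs)++;
        snprintf(path, sizeof(path), "/proc/%d/task", pid);
        task = opendir(path);
        if (!task)
            continue;               /* process exited meanwhile */
        while ((te = readdir(task)) != NULL)
            if (is_pid_name(te->d_name))
                (*nthreads)++;
        closedir(task);
    }
    closedir(proc);
    return 0;
}
```

Every process adds its own open/getdents64/close round trip on its task directory, which is why the cost grows with the number of processes rather than staying at a fixed handful of syscalls.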
Also, for the natural extension of hidepid=2, where we only want pids inside /proc without kernel data, we already have a clean patch on top of the procfs modernization [1]; this is the result of the past months' work.

>
> It could be used by programs like ps, top or CRIU. Speed increase will
> become more drastic once combined with bulk retrieval of process statistics.

Yes, the numbers are nice. It seems that you want to move from filesystem syscalls on procfs to direct syscalls only; hmm, this does not help to fix procfs. Tools like ps, top and others can be updated, but anyone can *continue* to use open+read on procfs and access the data.

I think this will be a bit hard to fix from our side, since with your patches you are doing it from the current context, whereas from procfs it will be done from the current + procfs mount context.

What if procfs is mounted with "ptracepids=true", the new "hidepid=" but without "gid=" interaction, and then you read from /proc//pidmap/* as suggested by Andy?

/proc//pidmap/{tasks|proc|children}

I am not sure about the PIDMAP_IGNORE_KTHREADS case...

> Benchmark:
>
> N=1<<16 times
> ~130 processes (~250 task_structs) on a regular desktop system
> opendir + readdir + closedir /proc + the same for every /proc/$PID/task
> (roughly what htop(1) does) vs pidmap
>
> /proc   16.80 ± 0.73%
> pidmap   0.06 ± 0.31%

Thanks!

[1] https://github.com/legionus/linux/commit/993a2a5b9af95b0ac901ff41d32124b72ed676e3

P.S. We are planning to send the procfs modernization patches in the next few days.

--
tixxdz