From: Andy Lutomirski
Date: Wed, 27 Sep 2017 08:04:26 -0700
Subject: Re: [PATCH v2 2/2] pidmap(2)
To: Alexey Dobriyan
Cc: Andrew Morton, "linux-kernel@vger.kernel.org", Linux API, Randy Dunlap, Thomas Gleixner, Djalal Harouni, Alexey Gladkov, Tatsiana Brouka, Aliaksandr Patseyenak

On Tue, Sep 26, 2017 at 11:46 AM, Alexey Dobriyan wrote:
> On Sun, Sep 24, 2017 at 02:27:00PM -0700, Andy Lutomirski wrote:
>> On Sun, Sep 24, 2017 at 1:08 PM, Alexey Dobriyan wrote:
>> > From: Tatsiana Brouka
>> >
>> > Implement system call for bulk retrieving of pids in binary form.
>> >
>> > Using /proc is slower than necessary: 3 syscalls + another 3 for each thread +
>> > converting with atoi() + instantiating dentries and inodes.
>> >
>> > /proc may not be mounted, especially in containers. Natural extension of
>> > hidepid=2 efforts is to not mount /proc at all.
>> >
>> > It could be used by programs like ps, top or CRIU. Speed increase will
>> > become more drastic once combined with bulk retrieval of process statistics.
>> >
>> > Benchmark:
>> >
>> > N=1<<16 times
>> > ~130 processes (~250 task_structs) on a regular desktop system
>> > opendir + readdir + closedir /proc + the same for every /proc/$PID/task
>> > (roughly what htop(1) does) vs pidmap
>> >
>> > /proc   16.80 ± 0.73%
>> > pidmap   0.06 ± 0.31%
>> >
>> > PIDMAP_* flags are modelled after /proc/task_diag patchset.
>> >
>> >
>> > PIDMAP(2)            Linux Programmer's Manual            PIDMAP(2)
>> >
>> > NAME
>> >        pidmap - get allocated PIDs
>> >
>> > SYNOPSIS
>> >        long pidmap(pid_t pid, int *pids, unsigned int count, unsigned int start, int flags);
>>
>> I think we will seriously regret a syscall that does this. Djalal is
>> working on fixing the turd that is hidepid, and this syscall is
>> basically incompatible with ever fixing hidepids. I think that, to
>> make it less regrettable, it needs to take an fd to a proc mount as a
>> parameter. This makes me wonder why it's a syscall at all -- why not
>> just create a new file like /proc/pids?
>
> See reply to fdmap(2).
>
> pidmap(2) is indeed more complex case exactly because of
> pid/tgid/tid/everything else + pidnamespaces + ->hide_pid.
> However the problem remains: query task tree without all the bullshit.
> C/R people succumbed with /proc/*/children, it was a mistake IMO.

Your syscall cannot be implemented sanely. It doesn't remove bullshit --
it adds bullshit.

NAK.
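For reference, the /proc walk used as the baseline in the patch's benchmark (opendir + readdir + closedir on /proc, then the same for every /proc/$PID/task, with a string-to-integer conversion per entry) can be sketched in C roughly as follows. This is an illustrative sketch using only standard POSIX dirent calls, not code from the patch or from htop; the names count_tids() and is_numeric() are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: 1 if name is a non-empty string of decimal digits. */
static int is_numeric(const char *name)
{
    if (*name == '\0')
        return 0;
    for (; *name != '\0'; name++)
        if (!isdigit((unsigned char)*name))
            return 0;
    return 1;
}

/*
 * Hypothetical helper: walk /proc, then /proc/$PID/task for every PID,
 * and count thread IDs. The largest TID seen is stored in *max_tid.
 * Returns the number of TIDs, or -1 if /proc cannot be opened.
 * An existing but unmounted /proc directory simply yields 0.
 */
long count_tids(long *max_tid)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    long total = 0;
    *max_tid = 0;

    struct dirent *de;
    while ((de = readdir(proc)) != NULL) {
        /* Skip non-PID entries such as "self", "meminfo", "." and "..". */
        if (!is_numeric(de->d_name))
            continue;

        char path[sizeof("/proc//task") + sizeof(de->d_name)];
        snprintf(path, sizeof(path), "/proc/%s/task", de->d_name);

        /* One more open/read/close round for every process. */
        DIR *task = opendir(path);
        if (task == NULL)
            continue; /* the process exited after /proc was listed */

        struct dirent *te;
        while ((te = readdir(task)) != NULL) {
            if (!is_numeric(te->d_name))
                continue;
            /* String-to-integer conversion for every entry (the atoi() step). */
            long tid = strtol(te->d_name, NULL, 10);
            if (tid > *max_tid)
                *max_tid = tid;
            total++;
        }
        closedir(task);
    }
    closedir(proc);
    return total;
}
```

A caller would write `long max_tid; long n = count_tids(&max_tid);`. The sketch makes the costs listed in the patch description visible: a directory open/read/close sequence for /proc plus another for each process's task directory, a text-to-integer conversion per entry, and a hard dependency on /proc being mounted and visible under the current hidepid settings.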