From: Andy Lutomirski
Date: Wed, 6 Sep 2017 19:04:48 -0700
Subject: Re: [PATCH 1/2] pidmap(2)
To: Alexey Dobriyan, Djalal Harouni
Cc: Randy Dunlap, Andrew Morton, Tatsiana Brouka, "linux-kernel@vger.kernel.org", Linux API, Aliaksandr Patseyenak
References: <20170905190500.GA13746@avx2> <20170905155320.a683a4853b21a3be32d8b529@linux-foundation.org>

On Wed, Sep 6, 2017 at 2:04 AM, Alexey Dobriyan wrote:
> On 9/6/17, Randy Dunlap wrote:
>> On 09/05/17 15:53, Andrew Morton wrote:
>>> On Tue, 5 Sep 2017 22:05:00 +0300 Alexey Dobriyan wrote:
>>>
>>>> Implement system call for bulk retrieving of pids in binary form.
>>>>
>>>> Using /proc is slower than necessary: 3 syscalls + another 3 for each
>>>> thread + converting with atoi().
>>>>
>>>> /proc may be not mounted especially in containers. Natural extension of
>>>> hidepid=2 efforts is to not mount /proc at all.
>>>>
>>>> It could be used by programs like ps, top or CRIU. Speed increase will
>>>> become more drastic once combined with bulk retrieval of process
>>>> statistics.
>>>
>>> The patches are performance optimizations, but their changelogs contain
>>> no performance measurements!
>>>
>>> Demonstration of some compelling real-world performance benefits would
>>> help things along a lot.
>>>
>>
>> also, I expect that the tiny kernel people will want kconfig options for
>> these syscalls.
>
> We'll add it but the question if it is a good idea. Ideally these system calls
> should be mandatory and /proc optional.
>
> $ size kernel/pidmap.o fs/fdmap.o
>    text    data     bss     dec     hex filename
>     560       0       0     560     230 kernel/pidmap.o
>     617       0       0     617     269 fs/fdmap.o

After much discussion at LPC/KS last year, I thought the idea was to
try to speed up /proc rather than replacing it outright. The two
specific ideas I recall were:

1. Add a syscall like readfileat() that you can use to, in a single
   operation, open, read, and close a /proc file (or other file). This
   should vastly reduce locking and RCU overhead.

2. Add a /proc file that has a nice binary format for task info.
   (nl_attr?)

I don't see why pidmap() deserves to be significantly faster than
getdents().

Also, a pidmap() syscall like this inherently bypasses any security
restrictions implied by the way that /proc is mounted. It can respect
hidepid, but hidepid (as a per-namespace concept) is an enormous turd
that badly needs to be deprecated, and Djalal is working on exactly
that.