Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753821AbdIGFGE (ORCPT ); Thu, 7 Sep 2017 01:06:04 -0400
Received: from mail-qt0-f193.google.com ([209.85.216.193]:35041 "EHLO
        mail-qt0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1753468AbdIGFGC (ORCPT );
        Thu, 7 Sep 2017 01:06:02 -0400
X-Google-Smtp-Source: ADKCNb5+UeCUaBYZZSgYw5V5IuQ8cnAXhBbwWu6MGByD7OCbrxiqKrrwsnPfCWR2PNTEPdFK+QdSlVeEZa7v51BTB2A=
MIME-Version: 1.0
In-Reply-To:
References: <20170905190500.GA13746@avx2> <20170905155320.a683a4853b21a3be32d8b529@linux-foundation.org>
From: Djalal Harouni
Date: Thu, 7 Sep 2017 07:06:00 +0200
Message-ID:
Subject: Re: [PATCH 1/2] pidmap(2)
To: Andy Lutomirski, Alexey Dobriyan
Cc: Randy Dunlap, Andrew Morton, Tatsiana Brouka,
        "linux-kernel@vger.kernel.org", Linux API, Aliaksandr Patseyenak,
        Alexey Gladkov
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2753
Lines: 65

Hi Alexey,

On Thu, Sep 7, 2017 at 4:04 AM, Andy Lutomirski wrote:
> On Wed, Sep 6, 2017 at 2:04 AM, Alexey Dobriyan wrote:
>> On 9/6/17, Randy Dunlap wrote:
>>> On 09/05/17 15:53, Andrew Morton wrote:
[...]
>>>
>>> also, I expect that the tiny kernel people will want kconfig options for
>>> these syscalls.
>>
>> We'll add it but the question if it is a good idea. Ideally these system calls
>> should be mandatory and /proc optional.
>>
>> $ size kernel/pidmap.o fs/fdmap.o
>>    text    data     bss     dec     hex filename
>>     560       0       0     560     230 kernel/pidmap.o
>>     617       0       0     617     269 fs/fdmap.o
>
> After much discussion at LPC/KS last year, I thought the idea was to
> try to speed up /proc rather than replacing it outright.  The two
> specific ideas I recall were:
>
> 1. Add a syscall like readfileat() that you can use to, in a single
> operation, open, read, and close a /proc file (or other file).  This
> should vastly reduce locking and RCU overhead.
>
> 2. Add a /proc file that has a nice binary format for task info. (nl_attr?)
>
> I don't see why pidmap() deserves to be significantly faster than getdents().
>
> Also, a pidmap() syscall like this inherently bypasses any security
> restrictions implied by the way that /proc is mounted.  It can respect
> hidepid, but hidepid (as a per-namespace concept) is an enormous turd
> that badly needs to be deprecated, and Djalal is working on exactly
> that.

Yes, as Andy noted, Alexey Gladkov and I are working on modernizing
procfs [1] and on reducing or removing its ties to pid namespaces,
which cause a lot of problems today. We just picked the task up again;
it came out of a discussion with Andy some months ago about how to
improve hidepid, and also how to improve procfs in general. The goal is
to add other mechanisms to hide, or return NULL for,
/proc/_file_not_needed_by_containers_ or /proc/_specific_module_files_
(everything that is not virtualized), or to mount only a specific view
of the whole /proc API. Containers will use this as well. It should
also make things harder for attackers, since we plan to offer
backward-compatible options for how these files are treated with
respect to some namespaces.

A readfileat()-style syscall that does the whole operation in one call
is definitely a nice addition. But in general it would be better to
treat /proc as a filesystem and not add other specific interfaces that
abstract it through the pidns. That is the situation now, and from a
userspace perspective it makes /proc hard to use, especially in a
security context.

Alexey, could you please Cc us in the future? Thank you very much!

[1] https://lkml.org/lkml/2017/4/25/282

--
tixxdz
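
For readers following the readfileat() idea quoted above, here is a
minimal userspace sketch of the open/read/close sequence that such a
syscall would collapse into a single call. readfileat() is only a
proposal in this thread, so nothing below calls it; the helper name
read_file_at, its parameters, and the 64 KiB default limit are
assumptions made for illustration, built on the existing
openat()/read()/close() calls as exposed by Python's os module.

```python
import os


def read_file_at(dir_fd, path, max_bytes=65536):
    """Hypothetical stand-in for the proposed readfileat().

    Opens `path` relative to the directory descriptor `dir_fd`, reads up
    to `max_bytes`, closes the file, and returns the bytes read.
    """
    # Step 1: openat(dir_fd, path, O_RDONLY)
    fd = os.open(path, os.O_RDONLY, dir_fd=dir_fd)
    try:
        chunks = []
        remaining = max_bytes
        # Step 2: read() until EOF or until the limit is reached.
        while remaining > 0:
            chunk = os.read(fd, remaining)
            if not chunk:
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        # Step 3: close() runs even if a read fails.
        os.close(fd)


if __name__ == "__main__":
    # Assumes a Linux system with /proc mounted.
    proc_fd = os.open("/proc", os.O_RDONLY | os.O_DIRECTORY)
    try:
        print(read_file_at(proc_fd, "self/comm").decode().strip())
    finally:
        os.close(proc_fd)
```

Every file read this way costs several separate syscalls, each with its
own entry into the kernel; the suggestion in the thread is to pay that
cost once per file instead.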