Date: Mon, 27 Apr 2020 22:13:03 +0200
From: Christian Brauner
To: Arnd Bergmann
Cc: Hagen Paul Pfeifer, "Eric W. Biederman", Jann Horn, kernel list,
    Florian Weimer, Al Viro, Christian Brauner, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, "H. Peter Anvin", Brian Gerst,
    Sami Tolvanen, David Howells, Aleksa Sarai, Andy Lutomirski,
    Oleg Nesterov, Arnaldo Carvalho de Melo, Sargun Dhillon, Linux API,
    linux-arch, Linus Torvalds, Greg Kroah-Hartman
Subject: Re: [RFC v2] ptrace, pidfd: add pidfd_ptrace syscall
Message-ID: <20200427201303.tbiipopeapxofn6h@wittgenstein>
References: <20200426130100.306246-1-hagen@jauu.net>
    <20200426163430.22743-1-hagen@jauu.net>
    <20200427170826.mdklazcrn4xaeafm@wittgenstein>
    <87zhawdc6w.fsf@x220.int.ebiederm.org>
    <20200427185929.GA1768@laniakea>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 27, 2020 at 10:08:03PM +0200, Arnd Bergmann wrote:
> On Mon, Apr 27, 2020 at 8:59 PM Hagen Paul Pfeifer wrote:
> >
> > * Eric W. Biederman | 2020-04-27 13:18:47 [-0500]:
> >
> > >I am conflicted about that but I have to agree. Instead of
> > >duplicating everything it would be good enough to duplicate the ones
> > >that cause the process to be attached to use. Then there would be no
> > >more pid races to worry about.
> >
> > >How does this differ using the tracing related infrastructure we have
> > >for the kernel on a userspace process? I suspect augmenting the tracing
> > >infrastructure with the ability to set breakpoints and watchpoints (aka
> > >stopping userspace threads and processes might be a more fertile
> > >direction to go).
> > >
> > >But I agree either we want to just address the races in PTRACE_ATTACH
> > >and PTRACE_SEIZE or we want to take a good hard look at things.
> > >
> > >There is a good case for minimal changes because one of the cases that
> > >comes up is how much work will it take to change existing programs. But
> > >ultimately ptrace pretty much sucks so a very good set of test cases and
> > >documentation for what we want to implement would be a very good idea.
> >
> > Hey Eric, Jann, Christian, Arnd,
> >
> > thank you for your valuable input! IMHO I think we have exactly two choices
> > here:
> >
> > a) we go with my patchset that is 100% ptrace feature compatible - except the
> >    pidfd thing - now and in the future. If ptrace is extended pidfd_ptrace is
> >    automatically extended and vice versa. Both APIs are feature identical
> >    without any headaches.
> > b) leave ptrace completely behind us and design the ptrace that we have always
> >    dreamed of! eBPF filters, ftrace kernel architecture, k/uprobe goodness,
> >    a speedy API to copy & modify large chunks of data, io_uring/epoll support
> >    and of course: pidfd based (missed likely thousands of other dreams)
> >
> > I think a solution in between is not worth the effort! It will not be
> > compatible in any way for the userspace and the benefit will be negligible.
> > Ptrace is a horrible API - everybody knows that but developers get comfy with
> > it. You find examples everywhere, why should we make it harder for the user for
> > no or little benefit (except that stable handle with pidfd and some cleanups)?
> >
> > Any thoughts on this?
>
> The way I understood Jann was that instead of a new syscall that duplicates
> everything in ptrace(), there would only need to be a new ptrace request
> such as PTRACE_ATTACH_PIDFD that behaves like PTRACE_ATTACH
> but takes a pidfd as the second argument, perhaps setting the return value
> to the pid on success. Same for PTRACE_SEIZE.

That was my initial suggestion, yes. Any enum that identifies a target by
a pid will get a new _PIDFD version and the pidfd is passed as the pid_t
argument. That should work and is similar to what I did for waitid()
P_PIDFD. Realistically, there shouldn't be any system we care about where
pid_t is smaller than an int.

>
> In effect this is not much different from your a), just a variation on the
> calling conventions.
> The main upside is that it avoids adding another
> ugly interface, the flip side is that it makes the existing one slightly worse
> by adding complexity.

Basically, if we add a new syscall, then please make it a proper redesign
with real benefits. In the meantime we could make do with the _PIDFD
variant. And then if someone wants to do the nitty-gritty work of adding a
ptrace variant purely based on pidfds and with a better API and the
features that e.g. Jann pointed out, then by all means, please do so. I'm
sure we would all welcome this as well.

Christian