Received: by 2002:a5b:505:0:0:0:0:0 with SMTP id o5csp4667001ybp; Mon, 14 Oct 2019 08:10:58 -0700 (PDT) X-Google-Smtp-Source: APXvYqzKbtYss+qYCIr1YUhBB3XpEezImdjRKUQRHIb8CUn6qW3xfECEddARNXjWv0aKotgsOsGU X-Received: by 2002:a17:906:1f16:: with SMTP id w22mr29436342ejj.5.1571065858217; Mon, 14 Oct 2019 08:10:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1571065858; cv=none; d=google.com; s=arc-20160816; b=fI+/sGszwQkWFzQ//p6sSusH9IAszbVVQmh4Q/jsiuaZRvew/1KH7e1Ep1MTz+Vl+c OIjET8l0eOtOgEmQlkBAkXG7KJJ70wvdQzQVggu+alCMBy7gG5iu+LdEuO6wS3cSHWnO NSez9JEORbHupSiiMC+0FeKOTQ/a7HztuOkoZmoSYO0Cn0ivo5kOFe77RpXZ4r6ckaAL syqG90hMWC8aSKo6W7i1+ch1UNOoX7cDeXJuEceV6jZUeN+ETvb0fXcnCx1pmwnWKjfR K6p4QxZmfHDi/ChWne9JrXvDvvbE8cNT+xUG5mXBaWznq9QPLRT+o0uEW/fhZbwVN7pP z97A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=hu8Yhu1dG5x2/sxZOgW+fGVZWm5nOL8RLM6NxRrm9l0=; b=I7NC+ak/Z7vbuk67c37ikgTRYQS8VRo6XjtAdoLO074UW15yRWjQqc60tOen/Hi8Lq FTxV/CF81SHDhMwZYSq6UcHystMZOlhzzsOmKBIVO/KWB9TWPVdZqdy3PUYTzYjdMbm+ OBQZtsXW4xVe1F5ZcuRjiUdGr0kzG4beshUnQThIlvDavNgWaTkuT+UdQ6Q5P0hGg8uF GCfshBSoue6Ittlc/cZuYdT33nIKmEz4MahvH4UUgebCdH+R5KdO2oXkbX9UnJTe1mpn 3yNcP0mentDNdC4bvzXAYKXXfV2fQKPS+d4vPfty2rAet3LD6MgJsQzCrKBhFHuXf6qc DcNw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=irihhdtC; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p22si11526268ejf.278.2019.10.14.08.10.33; Mon, 14 Oct 2019 08:10:58 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=irihhdtC; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1733206AbfJNPK1 (ORCPT + 99 others); Mon, 14 Oct 2019 11:10:27 -0400 Received: from mail-oi1-f194.google.com ([209.85.167.194]:44445 "EHLO mail-oi1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732647AbfJNPK0 (ORCPT ); Mon, 14 Oct 2019 11:10:26 -0400 Received: by mail-oi1-f194.google.com with SMTP id w6so13978961oie.11 for ; Mon, 14 Oct 2019 08:10:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=hu8Yhu1dG5x2/sxZOgW+fGVZWm5nOL8RLM6NxRrm9l0=; b=irihhdtCoJ00M0IblFdd5syP/fq3aYa9IW6f96WKv0Y8g+0kG5OZzWwJg/e9aulJz9 +Czl87cK+4NMVqCtZPXaQYGsFhqTOCeDgrU9Bu+doIdrW2b6FDnjdI6kLkpxO9ZlU1gn OcAI3fAZql6up7es1jIfjWz6QfXGT2AGMzvZOKSEy8Ab1lezC8/1Wg2IuZMJLy8/oG2I 91UovkoFjQ9hmkIpeAvZ2hVqRY0894cSK/P4/8HKR5yXq5uZO7nt8gbOqqdWB+5md6KF uRPY0eRgQTu2jNYlMt7nayxBWml0N9RWBUDDt3vRzJgtFGSrgdtU3v64A/oPOJ3AKPms zDqA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=hu8Yhu1dG5x2/sxZOgW+fGVZWm5nOL8RLM6NxRrm9l0=; b=SWC5cGymME054pbWfjvKY5iKRvNh1wG26ylF2zDIkkaqb6hmen+ijli4LiyXHNd0+o bL+PygNChWmODVvuzFXQJ016v+psA8CQcKjGd42jwbSpQkBR1EfTlKlcv2pJ4OQHEaFQ 24JgajaICoANJZYUtKbtzakUOZUXlVhLML8IpTGyJZ++mPpFcnW0Borj62T29xKI5EbC kD5sPcGZ9AoucsmVxybUSHgkv8tTFP9iPA2yvbzTO2XbujK8pGS3N9vdZBQu7mDhpnaa aWWfH1QGv4b81CVPwfQKk4dvKb9FYJ6p++J6xXgzR1uWfhzYa7XvMD7YFvPS+YdLwXKY zXVg== X-Gm-Message-State: APjAAAWlzmEwMRofTe6fjyLt70Q4QGql5KDImJOV/SOZb+/mMaKrOOou sZSXutsUKIlml1AkDjP92g3Z/ol0MAUGUDpPIAsrZQ== X-Received: by 2002:aca:5c06:: with SMTP id q6mr25208555oib.175.1571065824954; Mon, 14 Oct 2019 08:10:24 -0700 (PDT) MIME-Version: 1.0 References: <20191012101922.24168-1-christian.brauner@ubuntu.com> In-Reply-To: <20191012101922.24168-1-christian.brauner@ubuntu.com> From: Jann Horn Date: Mon, 14 Oct 2019 17:09:58 +0200 Message-ID: Subject: Re: [PATCH] pidfd: add NSpid entries to fdinfo To: Christian Brauner Cc: Andrea Arcangeli , Andrew Morton , Christian Kellner , Christian Kellner , Aleksa Sarai , Elena Reshetova , Roman Gushchin , "Dmitry V. Levin" , Linux API , kernel list , Michal Hocko , Ingo Molnar , Peter Zijlstra , Thomas Gleixner , Al Viro Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Oct 12, 2019 at 12:19 PM Christian Brauner wrote: > Currently, the fdinfo file of contains the field Pid: nit: something missing after "of"? > It contains the pid a given pidfd refers to in the pid namespace of the > opener's procfs instance. s/opener's // here and in various places below? "the opener's" is kinda misleading since it sounds as if it has something to do with task identity. > If the pid namespace of the process is not a descendant of the pid > namespace of the procfs instance 0 will be shown as its pid. This is > similar to calling getppid() on a process who's parent is out of it's nit: s/who's/whose/ nit: s/it's/its/ > pid namespace (e.g. when moving a process into a sibling pid namespace > via setns()). You can't move a process into a PID namespace via setns(), you can only set the PID namespace for its children; and even then, you can't set that to a sibling PID namespace, you can only move into descendant PID namespaces. > Add an NSpid field for easy retrieval of the pid in all descendant pid > namespaces: > If pid namespaces are supported this field will contain the pid a given > pidfd refers to for all descendant pid namespaces starting from the > current pid namespace of the opener's procfs instance, i.e. the first s/current // - neither tasks nor procfs instances can change which pid namespace they're associated with > pid entry for Pid and NSpid will be identical. *the Pid field and the first entry in the NSpid field? > If the pid namespace of the process is not a descendant of the pid > namespace of the procfs instance 0 will be shown as its first NSpid and > no other NSpid entries will be shown. > Note that this differs from the Pid and NSpid fields in > /proc//status where Pid and NSpid are always shown relative to the > pid namespace of the opener's procfs instace. The difference becomes > obvious when sending around a pidfd between pid namespaces from > different trees, i.e. where no ancestoral relation is present between > the pid namespaces: > 1. sending around pidfd: > - create two new pid namespaces ns1 and ns2 in the initial pid namespace > (Also take care to create new mount namespaces in the new pid > namespace and mount procfs.) > - create a process with a pidfd in ns1 > - send pidfd from ns1 to ns2 > - read /proc/self/fdinfo/ and observe that Pid and NSpid entry > are 0 > - create a process with a pidfd in truncated phrase? > - open a pidfd for a process in the initial pid namespace > 2. sending around /proc//status fd: > - create two new pid namespaces ns1 and ns2 in the initial pid namespace > (Also take care to create new mount namespaces in the new pid > namespace and mount procfs.) > - create a process in ns1 > - open /proc//status in the initial pid namespace for the process > you created in ns1 > - send statusfd from initial pid namespace to ns2 > - read statusfd and observe: > - that Pid will contain the pid of the process as seen from the init > - that NSpid will contain the pids of the process for all descendant > pid namespaces starting from the initial pid namespace You don't need fd passing for case 2, you can just look at the procfs instance that's mounted in the initial mount namespace. Using fd passing in this example kinda obfuscates what's going on, in my opinion. > +/** > + * pidfd_show_fdinfo - print information about a pidfd > + * @m: proc fdinfo file > + * @f: file referencing a pidfd > + * > + * Pid: > + * This function will print the pid a given pidfd refers to in the pid > + * namespace of the opener's procfs instance. > + * If the pid namespace of the process is not a descendant of the pid > + * namespace of the procfs instance 0 will be shown as its pid. This is > + * similar to calling getppid() on a process who's parent is out of it's > + * pid namespace (e.g. when moving a process into a sibling pid namespace > + * via setns()). > + * NSpid: > + * If pid namespaces are supported then this function will also print the > + * pid a given pidfd refers to for all descendant pid namespaces starting > + * from the current pid namespace of the opener's procfs instance, i.e. the > + * first pid entry for Pid and NSpid will be identical. > + * If the pid namespace of the process is not a descendant of the pid > + * namespace of the procfs instance 0 will be shown as its first NSpid and > + * no other NSpid entries will be shown. > + * Note that this differs from the Pid and NSpid fields in > + * /proc//status where Pid and NSpid are always shown relative to the > + * pid namespace of the opener's procfs instace. The difference becomes > + * obvious when sending around a pidfd between pid namespaces from > + * different trees, i.e. where no ancestoral relation is present between > + * the pid namespaces: > + * 1. sending around pidfd: > + * - create two new pid namespaces ns1 and ns2 in the initial pid namespace > + * (Also take care to create new mount namespaces in the new pid > + * namespace and mount procfs.) > + * - create a process with a pidfd in ns1 > + * - send pidfd from ns1 to ns2 > + * - read /proc/self/fdinfo/ and observe that Pid and NSpid entry > + * are 0 > + * - create a process with a pidfd in > + * - open a pidfd for a process in the initial pid namespace > + * 2. sending around /proc//status fd: > + * - create two new pid namespaces ns1 and ns2 in the initial pid namespace > + * (Also take care to create new mount namespaces in the new pid > + * namespace and mount procfs.) > + * - create a process in ns1 > + * - open /proc//status in the initial pid namespace for the process > + * you created in ns1 > + * - send statusfd from initial pid namespace to ns2 > + * - read statusfd and observe: > + * - that Pid will contain the pid of the process as seen from the init > + * - that NSpid will contain the pids of the process for all descendant > + * pid namespaces starting from the initial pid namespace > + */ See above, same problems as in the commit message. Actually, since you have a big comment block with this text, there's no reason to repeat it in the commit message, right? (By the way, only slightly related to this patch: At the moment, if you have a pidfd to a task that has already been reaped, and the PID has been reallocated to another task, the pidfd will still show up in /proc/*/fdinfo/* as if it referred to a non-dead process, right? Might be worth considering whether you want to probe ->tasks[PIDTYPE_PID] or ->tasks[PIDTYPE_TGID] for NULL-ness when printing fdinfo, or something like that.)