Received: by 2002:ac0:bc90:0:0:0:0:0 with SMTP id a16csp3490578img; Mon, 25 Mar 2019 11:20:06 -0700 (PDT) X-Google-Smtp-Source: APXvYqwcmMN3LjKT6KqyuIX0o/W5d6grCZZewIadjQIZ99wB9q+2UxFqwQhtBe7S3DRhNsmysvhq X-Received: by 2002:a63:465b:: with SMTP id v27mr6273853pgk.165.1553538005964; Mon, 25 Mar 2019 11:20:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553538005; cv=none; d=google.com; s=arc-20160816; b=gou9fn+lsIq9tF63nijv8eWub0Y7kv0Dt2ZD3K4+agDcCZURd8AOT8iX9Ighq5q7ZL WZ9nO1vWtRrp/M+x8MuFdGQepCruauTdQ0hv1+A3GZF/P2b4+EcRuZgeUu92FnkRXvI4 I1K6njLu/NyHaO1LftUihP+mLZLfb+bYF7xdtEXlPS2+FGbBiMXNWVxFYoruTB2SlLzh hS+x2mFP+gjEvBwENiK2uSb72poEv6uqicUmSDTLcUTlCMF1vcB3jw6uUFL7dXFtWAP2 xOC04vPabojGizv7OejDSF8gVLDnNq7Hnc6Sugnp5Db+IHzlZc5njudTvdvQdQQgL20k N5Tw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=kpfZEB8seTa5rcxqxvtG1a1pQgXUVafxXpBfTveNUuA=; b=FmRf+kw6eRjrfj8dChTO/aeukvnpo5rBbEV/OY90Pdh1dqL7IQOZ2y9Ow/47wHucCL 3EdBNFGu5PhB5L/IdVcbACN+smCAZf//SgzNQtsZSHzgYGMhO1XI8M+AGq9obYMfRZnt H3/DbvQt7cmenTMSc8ZDRYyMpgSi1X2z2rViw3MMaro5HTvkJPRKVJhZK91dsE7LgZIj sgXmwbIg+/eUNWhXNLjLp97zUK9uvBFeN8uKeUBIRuMOEseutitnbu9JkkhyVMkjyKPK vMYedUvcQV+pTX03rrk6jH5IjiS5xqQB6M2ZFudfKGtx0B7P1VYG9gl6Ay7OBN+g5l1h G0rw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=V3YZDTHG; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 17si13093452pfw.6.2019.03.25.11.19.50; Mon, 25 Mar 2019 11:20:05 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=V3YZDTHG; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729914AbfCYSTK (ORCPT + 99 others); Mon, 25 Mar 2019 14:19:10 -0400 Received: from mail-oi1-f194.google.com ([209.85.167.194]:42491 "EHLO mail-oi1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729548AbfCYSTJ (ORCPT ); Mon, 25 Mar 2019 14:19:09 -0400 Received: by mail-oi1-f194.google.com with SMTP id w139so7757341oie.9 for ; Mon, 25 Mar 2019 11:19:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=kpfZEB8seTa5rcxqxvtG1a1pQgXUVafxXpBfTveNUuA=; b=V3YZDTHGdewcTOVjIzQCzIq9kUZ8v8nnV2JhmhAwF+0wvlSRSL4L/xI7fGjiMJS8Sa kPCrbA3l/HheWbwq6LIpQiWCuNNX/GHSq91JRJASAJI8sfKuiLHGGkIxplKOeZk02/8Z PeUV5hwdTcM2dcVngctKQjnp3sKKbdcitOEXnr+asnMmVnbH7a0LXBu6o15wL147+ef3 QUQ8zvEQUv1x4r9WD1rX/FuLEkLSyxUzvVmryGbyCvwy7a338B4m3z4bMQyvYknOd9A0 2uaDWR5rlnOCm1hhMmipy6aWIQmnLzjOWSyVwO/q9/Duflb/VUyaMvEt/gGM1cZeMvEH MGVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=kpfZEB8seTa5rcxqxvtG1a1pQgXUVafxXpBfTveNUuA=; b=WFy23lRFw62aAqx8k907ybzAJY2BJTRChEN3/aPcWNjoOPpJpeRkYMdCDUuyBVThHx BEDuzWSpCN6rsAPCtxaxhstg+vgpQoYsNBnJ5H3LGqv29XwDh8a0SBs+mj8UiD4xqqB1 7WR7wFw6zbL9xpZoKCKkS5vyCpWzLcvKGVpxOwl82XZ8Y2unuE7xXmuutEa9z2gMNNaY DQyLOiT35D65TJUqCLKBODCuNGrbI47f9SN32diufpT/xaHTQZBnvLh5Y4o4tQpI41sv hWhOHTzYyyCL2oIp3sP6poNChagqGNWuwJI97ySHmx7ouqvbedqSs3RMRun2+R/z4zIH OhWA== X-Gm-Message-State: APjAAAWXg3NArOd4FDG9j3nfmKYgwF15YhT6j9g2beyhVd7EDkglY1yP 8m6P0EUsa++XTMOYYj9n7A5kyNyPl3UrOXVvInpt1g== X-Received: by 2002:aca:360a:: with SMTP id d10mr12691911oia.68.1553537948348; Mon, 25 Mar 2019 11:19:08 -0700 (PDT) MIME-Version: 1.0 References: <20190325162052.28987-1-christian@brauner.io> <20190325162052.28987-3-christian@brauner.io> In-Reply-To: <20190325162052.28987-3-christian@brauner.io> From: Jann Horn Date: Mon, 25 Mar 2019 19:18:42 +0100 Message-ID: Subject: Re: [PATCH 2/4] pid: add pidctl() To: Christian Brauner , Andy Lutomirski Cc: Konstantin Khlebnikov , David Howells , "Serge E. Hallyn" , "Eric W. Biederman" , Linux API , kernel list , Arnd Bergmann , Kees Cook , Alexey Dobriyan , Thomas Gleixner , Michael Kerrisk-manpages , bl0pbl33p@gmail.com, "Dmitry V. Levin" , Andrew Morton , Oleg Nesterov , Nagarathnam Muthusamy , cyphar@cyphar.com, Al Viro , "Joel Fernandes (Google)" , Daniel Colascione Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Mar 25, 2019 at 5:21 PM Christian Brauner wrote: > The pidctl() syscalls builds on, extends, and improves translate_pid() [4]. > I quote Konstantins original patchset first that has already been acked and > picked up by Eric before and whose functionality is preserved in this > syscall: [...] > + > +static struct pid_namespace *get_pid_ns_by_fd(int fd) > +{ > + struct pid_namespace *pidns = ERR_PTR(-EINVAL); > + > + if (fd >= 0) { > +#ifdef CONFIG_PID_NS > + struct ns_common *ns; > + struct file *file = proc_ns_fget(fd); > + if (IS_ERR(file)) > + return ERR_CAST(file); > + > + ns = get_proc_ns(file_inode(file)); > + if (ns->ops->type == CLONE_NEWPID) > + pidns = get_pid_ns( > + container_of(ns, struct pid_namespace, ns)); This increments the refcount of the pidns... > + > + fput(file); > +#endif > + } else { > + pidns = task_active_pid_ns(current); ... but this doesn't. That's pretty subtle; could you please put a comment on top of this function that points this out? Or even better, change the function to always take a reference, so that the caller doesn't have to worry about figuring this out. > + } > + > + return pidns; > +} [...] > +SYSCALL_DEFINE5(pidctl, unsigned int, cmd, pid_t, pid, int, source, int, target, > + unsigned int, flags) > +{ > + struct pid_namespace *source_ns = NULL, *target_ns = NULL; > + struct pid *struct_pid; > + pid_t result; > + > + switch (cmd) { > + case PIDCMD_QUERY_PIDNS: > + if (pid != 0) > + return -EINVAL; > + pid = 1; > + /* fall through */ > + case PIDCMD_QUERY_PID: > + if (flags != 0) > + return -EINVAL; > + break; > + case PIDCMD_GET_PIDFD: > + if (flags & ~PIDCTL_CLOEXEC) > + return -EINVAL; > + break; > + default: > + return -EOPNOTSUPP; > + } > + > + source_ns = get_pid_ns_by_fd(source); > + result = PTR_ERR(source_ns); I very much dislike using PTR_ERR() on pointers before checking whether they contain an error value or not. I understand that the result of this won't actually be used, but it still seems weird to have what is essentially a cast of a potentially valid pointer to a potentially smaller integer here. Could you maybe move the PTR_ERR() into the error branch? Like so: if (IS_ERR(source_ns)) { result = PTR_ERR(source_ns); goto err_source; } > + if (IS_ERR(source_ns)) > + goto err_source; > + > + target_ns = get_pid_ns_by_fd(target); > + result = PTR_ERR(target_ns); > + if (IS_ERR(target_ns)) > + goto err_target; > + > + if (cmd == PIDCMD_QUERY_PIDNS) { > + result = pidns_related(source_ns, target_ns); > + } else { > + rcu_read_lock(); > + struct_pid = find_pid_ns(pid, source_ns); find_pid_ns() doesn't take a reference on its return value, the return value is only pinned into memory by the RCU read-side critical section... > + result = struct_pid ? pid_nr_ns(struct_pid, target_ns) : -ESRCH; > + rcu_read_unlock(); ... which ends here, making struct_pid a dangling pointer... > + > + if (cmd == PIDCMD_GET_PIDFD) { > + int cloexec = (flags & PIDCTL_CLOEXEC) ? O_CLOEXEC : 0; > + if (result > 0) > + result = pidfd_create_fd(struct_pid, cloexec); ... and then here you continue to use struct_pid. That seems bogus. > + else if (result == 0) > + result = -ENOENT; You don't need to have flags for this for new syscalls, you can just make everything O_CLOEXEC; if someone wants to preserve an fd across execve(), they can just clear that bit with fcntl(). (I thiiink it was Andy Lutomirski who said that before regarding another syscall? But my memory of that is pretty foggy, might've been someone else.) > + } > + } > + > + if (target) > + put_pid_ns(target_ns); > +err_target: > + if (source) > + put_pid_ns(source_ns); I think you probably intended to check for "if (target >= 0)" and "if (source >= 0)" instead of these conditions, matching the condition in get_pid_ns_by_fd()? The current code looks as if it will leave the refcount one too high when used with target==0 or source==0, and leave the refcount one too low when used with target==-1 or source==-1. Anyway, as stated above, I think it would be simpler to unconditionally take a reference instead. > +err_source: > + return result; > +}