Received: by 2002:ac0:a5a6:0:0:0:0:0 with SMTP id m35-v6csp17793imm; Thu, 20 Sep 2018 16:43:38 -0700 (PDT) X-Google-Smtp-Source: ANB0VdaEa7+DydzOv5qtFw3BN6cdXhG7kViFcNCkrtk7pxVKrvSDHiPvtfh2Nju9EIN4grAXBx6p X-Received: by 2002:a17:902:bc41:: with SMTP id t1-v6mr41927627plz.52.1537487018504; Thu, 20 Sep 2018 16:43:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1537487018; cv=none; d=google.com; s=arc-20160816; b=TJD1HaA0f0UC9lTKU2RI8E9HIuPi71pcy3j4jdPiDnr0StKyiNvn/vtpa2T5eU5iuI 5xylF361H76b0caA11YPOM8dS+IZuaURLLd36ha0t+3jAAMOk/Jw5NKet62OTziDm2s+ 6QeRfr0Ql0WEflwO93DwfpIO1pC371iW4CRzoh8i6BuYNbCW1O/WDy/2rqAFley/hezn ln2LUOjz/ops5sdjDJmV5u5X1JlDJrCxB4J7ucZX/RmbOeCcL+P96td8I2vd4TIiGlAL 2w/fDgmUimyOETOdrYHu3M7NuRNrY0j0CTXPHXxaXDTEui/z8+MqQzBMhfaoILMVe6gx Tgxw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-transfer-encoding:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=dpIOOV+Mtr5VfamdhOMnuHxZkJT+62YBqVkdnKLU620=; b=nmZqNO2wyCGTJBi1Nc+N0bFNRdVBB5Sw10TtJsWsE0u9+2u90Vp0kiFfk7Tgt2PjRO /Gssc7KwqAkIsYEeC7KB6BaVnnp9U6e4tk8MsbFNohSMcbv3PLl9BfkYRhuvWu14tewv O2G5oMmzbBxd1ndLx3gqkf5TQ6U5AJq0fLNQsYH1zL5eEwoxiCF7HD5fN6jtaJy+DOWN IUzgjnz13LrVxfuLdgeDMnfqW93yDbpaWrjsDjds0P1ip3ZXU44HKWn2HpOaC/4t9Xj9 x+yfvbkbJgVgtreX4QCaYBn5lTGj6u7SzLCfQ8QEJcWvRQC3kX0rfQPlTnjSLYzaI5bU 07mA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@tycho-ws.20150623.gappssmtp.com header.s=20150623 header.b=1bHtqjXK; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id u76-v6si29139014pfj.58.2018.09.20.16.43.22; Thu, 20 Sep 2018 16:43:38 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@tycho-ws.20150623.gappssmtp.com header.s=20150623 header.b=1bHtqjXK; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388567AbeIUF2s (ORCPT + 99 others); Fri, 21 Sep 2018 01:28:48 -0400 Received: from mail-wr1-f65.google.com ([209.85.221.65]:43524 "EHLO mail-wr1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725749AbeIUF2s (ORCPT ); Fri, 21 Sep 2018 01:28:48 -0400 Received: by mail-wr1-f65.google.com with SMTP id k5-v6so11033646wre.10 for ; Thu, 20 Sep 2018 16:42:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tycho-ws.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=dpIOOV+Mtr5VfamdhOMnuHxZkJT+62YBqVkdnKLU620=; b=1bHtqjXKfmwbcpf3x4YIyOn2GRcGPVYPLqGegFjFeydt+coECwYmu0n07S6TIG4GdI I5i5uRIG1B9ysNkaY4/DfLn4Z6KEP34O6nKnlgcQ2hvokm01tPi0nDsPaqCC0b73AkDA qoB+ZCxxyUf+y6RfGDJ0lAMs+ls7am531w7MtnpGT07zkU9bfIh+7yCVExVGg5VPlVp2 xMG8AItzhCAoI0RGXgraDhbB8v5n00G5uNHB48L75tgTQxLRlYgdV847H3xmD+6W9Cq7 iP/UK++QqN1l8EJGHphTavVmuiKWqOqNV2ZOQa2+JrDBt7K/UfuyV2aSsR06dOsQB1M+ YCZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=dpIOOV+Mtr5VfamdhOMnuHxZkJT+62YBqVkdnKLU620=; b=IeiPTHavHh3+BXJ0eDSlI5Ze8MryvlB2jsUptLU+OjU8FtsHKbUWB0G8kOOzIeo4j6 S6bBNSF8lSYjBqBwm7b6wh3/FNHBo9mq3hXU6lJoVSyTQuTZKS+IXA3pcLy2OrimLmdz o7fXg5Xc19xruHpNi36TnKjQOdeEqxLW71lmQkdNXw8Zfp9f1GRICvP5qq748KdX9M9t 2FWjSddfIZVG12LU+QRxXuTSzcObPt69XBgriiecUzGKRO5k+p7X/WBJ+hDN1NDIca8C KQgsuV0zpzHevcsETunkpBqKW70xPTHO69H+t1NkezCTuF9tYNxM8bWjI33KGF5YpZmG jvMA== X-Gm-Message-State: APzg51Do+Aezt73/biozfNWPy6zXaYzyFxQw7DEZ4hQeQirLclTaI9PU WmZxxcb6I9CqueMPx0J4UyQibA== X-Received: by 2002:a5d:610b:: with SMTP id v11-v6mr10799781wrt.265.1537486966458; Thu, 20 Sep 2018 16:42:46 -0700 (PDT) Received: from cisco ([173.38.220.55]) by smtp.gmail.com with ESMTPSA id c15-v6sm2131759wmb.2.2018.09.20.16.42.42 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Thu, 20 Sep 2018 16:42:45 -0700 (PDT) Date: Thu, 20 Sep 2018 17:42:40 -0600 From: Tycho Andersen To: Andy Lutomirski Cc: Kees Cook , LKML , Linux Containers , Linux API , Oleg Nesterov , "Eric W . Biederman" , "Serge E . Hallyn" , Christian Brauner , Tyler Hicks , Akihiro Suda , Jann Horn Subject: Re: [PATCH v6 4/5] seccomp: add support for passing fds via USER_NOTIF Message-ID: <20180920234240.GR4672@cisco> References: <20180906152859.7810-1-tycho@tycho.ws> <20180906152859.7810-5-tycho@tycho.ws> <20180919095536.GM4672@cisco> <20180919143842.GN4672@cisco> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: User-Agent: Mutt/1.9.4 (2018-02-28) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Sep 19, 2018 at 12:58:20PM -0700, Andy Lutomirski wrote: > On Wed, Sep 19, 2018 at 7:38 AM, Tycho Andersen wrote: > > On Wed, Sep 19, 2018 at 07:19:56AM -0700, Andy Lutomirski wrote: > >> > >> > >> > On Sep 19, 2018, at 2:55 AM, Tycho Andersen wrote: > >> > > >> >> On Wed, Sep 12, 2018 at 04:52:38PM -0700, Andy Lutomirski wrote: > >> >>> On Thu, Sep 6, 2018 at 8:28 AM, Tycho Andersen wrote: > >> >>> The idea here is that the userspace handler should be able to pass an fd > >> >>> back to the trapped task, for example so it can be returned from socket(). > >> >>> > >> >>> I've proposed one API here, but I'm open to other options. In particular, > >> >>> this only lets you return an fd from a syscall, which may not be enough in > >> >>> all cases. For example, if an fd is written to an output parameter instead > >> >>> of returned, the current API can't handle this. Another case is that > >> >>> netlink takes as input fds sometimes (IFLA_NET_NS_FD, e.g.). If netlink > >> >>> ever decides to install an fd and output it, we wouldn't be able to handle > >> >>> this either. > >> >> > >> >> An alternative could be to have an API (an ioctl on the listener, > >> >> perhaps) that just copies an fd into the tracee. There would be the > >> >> obvious set of options: do we replace an existing fd or allocate a new > >> >> one, and is it CLOEXEC. Then the tracer could add an fd and then > >> >> return it just like it's a regular number. > >> >> > >> >> I feel like this would be more flexible and conceptually simpler, but > >> >> maybe a little slower for the common cases. What do you think? > >> > > >> > I'm just implementing this now, and there's one question: when do we > >> > actually do the fd install? Should we do it when the user calls > >> > SECCOMP_NOTIF_PUT_FD, or when the actual response is sent? It feels > >> > like we should do it when the response is sent, instead of doing it > >> > right when SECCOMP_NOTIF_PUT_FD is called, since if there's a > >> > subsequent signal and the tracer decides to discard the response, > >> > we'll have to implement some delete mechanism to delete the fd, but it > >> > would have already been visible to the process, etc. So I'll go > >> > forward with this unless there are strong objections, but I thought > >> > I'd point it out just to avoid another round trip. > >> > > >> > > >> > >> Can you do that non-racily? That is, you need to commit to an fd *number* right away, but what if another thread uses the number before you actually install the fd? > > > > I was thinking we could just do an __alloc_fd() and then do the > > fd_install() when the response is sent or clean up the case that the > > listener or task dies. I haven't actually tried to run the code yet, > > so it's possible the locking won't work :) > > I would be very surprised if the locking works. How can you run a > thread in a process when another thread has allocated but not > installed an fd and is blocked for an arbitrarily long time? I think the trick is that there's no actual locking required (except for a brief locking of task->files). I've run the patch below and it seems to work. But perhaps that's abusing __alloc_fd a little too hard, I don't really know. > > > >> Do we really allow non-“kill” signals to interrupt the whole process? It might be the case that we don’t really need to clean up from signals if there’s a guarantee that the thread dies. > > > > Yes, we do, because of this: https://lkml.org/lkml/2018/3/15/1122 > > > > I'm still not sure I see the problem. Suppose I'm implementing a user > notifier for a nasty syscall like recvmsg(). If I'm the tracer, by > the time I decide to install an fd, I've committed to returning > something other than -EINTR, even if a non-fatal signal is sent before > I finish. No rollback should be necessary. I don't understand why this is true. Surely you could stop a handler on receipt of a new signal, and have it do something else entirely? > In the (unlikely?) event that some tracer needs to be able to rollback > an fd installation to return -EINTR, a SECCOMP_NOTIF_CLOSE_FD > operation should be good enough, I think. Or maybe PUT_FD can put -1 > to delete an fd. Yes, I think even with something like what I did below we'd need some sort of REMOVE_FD option, because otherwise there's no way to change your mind and send -EINTR without the fd you just PUT_FD'd. Tycho From bfca7337cb53791aca74b595eb45e9afa3babac2 Mon Sep 17 00:00:00 2001 From: Tycho Andersen Date: Thu, 20 Sep 2018 06:49:49 -0600 Subject: [PATCH] implement SECCOMP_NOTIF_PUT_FD ioctl Signed-off-by: Tycho Andersen --- include/uapi/linux/seccomp.h | 8 ++ kernel/seccomp.c | 74 ++++++++++++------- tools/testing/selftests/seccomp/seccomp_bpf.c | 24 +++++- 3 files changed, 77 insertions(+), 29 deletions(-) diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h index 8fb2c024c0a1..62e474c372d4 100644 --- a/include/uapi/linux/seccomp.h +++ b/include/uapi/linux/seccomp.h @@ -80,6 +80,12 @@ struct seccomp_notif_resp { __u32 fd_flags; }; +struct seccomp_notif_put_fd { + __u64 id; + __s32 fd; + __u32 fd_flags; +}; + #define SECCOMP_IOC_MAGIC 0xF7 /* Flags for seccomp notification fd ioctl. */ @@ -89,5 +95,7 @@ struct seccomp_notif_resp { struct seccomp_notif_resp) #define SECCOMP_NOTIF_ID_VALID _IOR(SECCOMP_IOC_MAGIC, 2, \ __u64) +#define SECCOMP_NOTIF_PUT_FD _IOR(SECCOMP_IOC_MAGIC, 3, \ + struct seccomp_notif_put_fd) #endif /* _UAPI_LINUX_SECCOMP_H */ diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 21b24cc07237..6bdf413863ca 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -41,6 +41,7 @@ #include #include #include +#include #include enum notify_state { @@ -51,6 +52,7 @@ enum notify_state { struct seccomp_knotif { /* The struct pid of the task whose filter triggered the notification */ + struct task_struct *task; struct pid *pid; /* The "cookie" for this request; this is unique for this filter. */ @@ -80,7 +82,7 @@ struct seccomp_knotif { int error; long val; struct file *file; - unsigned int flags; + int fd; /* Signals when this has entered SECCOMP_NOTIFY_REPLIED */ struct completion ready; @@ -748,9 +750,11 @@ static void seccomp_do_user_notification(int this_syscall, if (!match->notif) goto out; + n.task = current; n.pid = task_pid(current); n.state = SECCOMP_NOTIFY_INIT; n.data = sd; + n.fd = -1; n.id = seccomp_next_notify_id(match); init_completion(&n.ready); @@ -786,16 +790,8 @@ static void seccomp_do_user_notification(int this_syscall, } if (n.file) { - int fd; struct socket *sock; - fd = get_unused_fd_flags(n.flags); - if (fd < 0) { - err = fd; - ret = -1; - goto remove_list; - } - /* * Similar to what SCM_RIGHTS does, let's re-set the cgroup * data to point ot the tracee's cgroups instead of the @@ -807,21 +803,20 @@ static void seccomp_do_user_notification(int this_syscall, sock_update_classid(&sock->sk->sk_cgrp_data); } - ret = fd; - err = 0; - - fd_install(fd, n.file); + fd_install(n.fd, n.file); /* Don't fput, since fd has a reference now */ n.file = NULL; - } else { - ret = n.val; - err = n.error; + n.fd = -1; } + ret = n.val; + err = n.error; remove_list: if (n.file) fput(n.file); + if (n.fd >= 0) + put_unused_fd(n.fd); list_del(&n.list); out: @@ -1683,15 +1678,6 @@ static long seccomp_notify_send(struct seccomp_filter *filter, goto out; } - if (resp.return_fd) { - knotif->flags = resp.fd_flags; - knotif->file = fget(resp.fd); - if (!knotif->file) { - ret = -EBADF; - goto out; - } - } - ret = size; knotif->state = SECCOMP_NOTIFY_REPLIED; knotif->error = resp.error; @@ -1731,6 +1717,42 @@ static long seccomp_notify_id_valid(struct seccomp_filter *filter, return ret; } +static long seccomp_notify_put_fd(struct seccomp_filter *filter, + unsigned long arg) +{ + struct seccomp_notif_put_fd req; + void __user *buf = (void __user *)arg; + struct seccomp_knotif *knotif = NULL; + long ret; + + if (copy_from_user(&req, buf, sizeof(req))) + return -EFAULT; + + ret = mutex_lock_interruptible(&filter->notify_lock); + if (ret < 0) + return ret; + + ret = -ENOENT; + list_for_each_entry(knotif, &filter->notif->notifications, list) { + unsigned long max_files; + + if (knotif->id != req.id) + continue; + + max_files = task_rlimit(knotif->task, RLIMIT_NOFILE); + + knotif->file = fget(req.fd); + knotif->fd = __alloc_fd(knotif->task->files, 0, max_files, + req.fd_flags); + + ret = knotif->fd; + break; + } + + mutex_unlock(&filter->notify_lock); + return ret; +} + static long seccomp_notify_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -1743,6 +1765,8 @@ static long seccomp_notify_ioctl(struct file *file, unsigned int cmd, return seccomp_notify_send(filter, arg); case SECCOMP_NOTIF_ID_VALID: return seccomp_notify_id_valid(filter, arg); + case SECCOMP_NOTIF_PUT_FD: + return seccomp_notify_put_fd(filter, arg); default: return -EINVAL; } diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c index 3dec856717a7..ae8daf992231 100644 --- a/tools/testing/selftests/seccomp/seccomp_bpf.c +++ b/tools/testing/selftests/seccomp/seccomp_bpf.c @@ -169,6 +169,9 @@ struct seccomp_metadata { struct seccomp_notif_resp) #define SECCOMP_NOTIF_ID_VALID _IOR(SECCOMP_IOC_MAGIC, 2, \ __u64) +#define SECCOMP_NOTIF_PUT_FD _IOR(SECCOMP_IOC_MAGIC, 3, \ + struct seccomp_notif_put_fd) + struct seccomp_notif { __u16 len; __u64 id; @@ -186,6 +189,12 @@ struct seccomp_notif_resp { __u32 fd; __u32 fd_flags; }; + +struct seccomp_notif_put_fd { + __u64 id; + __s32 fd; + __u32 fd_flags; +}; #endif #ifndef seccomp @@ -3239,11 +3248,12 @@ TEST(get_user_notification_ptrace) TEST(user_notification_pass_fd) { pid_t pid; - int status, listener; + int status, listener, fd; int sk_pair[2]; char c; struct seccomp_notif req = {}; struct seccomp_notif_resp resp = {}; + struct seccomp_notif_put_fd putfd = {}; long ret; ASSERT_EQ(socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, sk_pair), 0); @@ -3295,9 +3305,15 @@ TEST(user_notification_pass_fd) resp.len = sizeof(resp); resp.id = req.id; - resp.return_fd = 1; - resp.fd = sk_pair[1]; - resp.fd_flags = 0; + + putfd.id = req.id; + putfd.fd = sk_pair[1]; + putfd.fd_flags = 0; + + fd = ioctl(listener, SECCOMP_NOTIF_PUT_FD, &putfd); + EXPECT_GE(fd, 0); + resp.val = fd; + EXPECT_EQ(ioctl(listener, SECCOMP_NOTIF_SEND, &resp), sizeof(resp)); close(sk_pair[1]); -- 2.17.1