Date: Thu, 15 Feb 2018 15:48:56 +0100
From: Christian Brauner
To: Andy Lutomirski
Cc: Tycho Andersen, Kees Cook, LKML, Linux Containers, Oleg Nesterov, "Eric W . Biederman", "Serge E . Hallyn", Christian Brauner, Tyler Hicks, Akihiro Suda, Tom Hromatka, Sargun Dhillon, Paul Moore
Subject: Re: [RFC 1/3] seccomp: add a return code to trap to userspace
Message-ID: <20180215144855.GA16088@gmail.com>
References: <20180204104946.25559-1-tycho@tycho.ws> <20180204104946.25559-2-tycho@tycho.ws> <20180214152958.cjgwh2k52zji2jxk@cisco>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Feb 14, 2018 at 05:19:52PM +0000, Andy Lutomirski wrote:
> On Wed, Feb 14, 2018 at 3:29 PM, Tycho Andersen wrote:
> > Hey Kees,
> >
> > Thanks for taking a look!
> >
> > On Tue, Feb 13, 2018 at 01:09:20PM -0800, Kees Cook wrote:
> >> On Sun, Feb 4, 2018 at 2:49 AM, Tycho Andersen wrote:
> >> > This patch introduces a means for syscalls matched in seccomp to notify
> >> > some other task that a particular filter has been triggered.
> >> >
> >> > The motivation for this is primarily for use with containers. For example,
> >> > if a container does an init_module(), we obviously don't want to load this
> >> > untrusted code, which may be compiled for the wrong version of the kernel
> >> > anyway. Instead, we could parse the module image, figure out which module
> >> > the container is trying to load and load it on the host.
> >> >
> >> > As another example, containers cannot mknod(), since this checks
> >> > capable(CAP_SYS_ADMIN). However, harmless devices like /dev/null or
> >> > /dev/zero should be ok for containers to mknod, but we'd like to avoid hard
> >> > coding some whitelist in the kernel. Another example is mount(), which has
> >> > many security restrictions for good reason, but configuration or runtime
> >> > knowledge could potentially be used to relax these restrictions.
> >>
> >> Related to the eBPF seccomp thread, can the logic for these things be
> >> handled entirely by eBPF? My assumption is that you still need to stop
> >> the process to do something (i.e. do a mknod, or a mount) before
> >> letting it continue. Is there some "wait for notification" system in
> >> eBPF?
> >
> > I replied in the other thread
> > (https://patchwork.ozlabs.org/cover/872938/#1856642 for those
> > following along at home), but no, at least not that I know of.
>
> eBPF can call functions. One of those functions could put the caller
> to sleep. In fact, I think I once proposed doing this for the seccomp
> logging action as well.
>
> >> I wonder if this communication should be netlink, which gives a more
> >> well-structured way to describe what's on the wire? The reason I ask
> >> is because if we ever change the seccomp_data structure, we'll now
> >> have two places where we need to deal with it (the first being within
> >> the BPF itself). My initial idea was to prefix the communication with
> >> a size field, then send the structure, and then I had nightmares, and
> >> realized this was basically netlink reinvented.
> >
> > I suggested netlink in LA, and everyone (especially Andy) groaned very
> > loudly :). I'm happy to switch it to netlink if you like, although i
> > think memcpy() of structs should be safe here, since the return value
> > from read or write can indicate the size of things.
>
> I could easily get on board with "netlink" (i.e. NLA) messages sent
> over an fd. I will object strongly to the use of netlink *sockets*.

I think sending netlink messages makes perfect sense here, although it
burdens userspace with all those nice macros needed to parse these
messages. Are there already other cases where userspace receives netlink
messages on an fd without having opened a netlink socket?

>
> >
> >> An ERRNO filter would block a USER_NOTIF because it's unconditional.
> >> TRACE could be either, USER_NOTIF could be either.
> >>
> >> This means TRACE rules would be bumped by a USER_NOTIF... hmm.
> >
> > Yes, I didn't exactly know what to do here. ERRNO, TRAP, and KILL all
> > seemed more important than USER_NOTIF, but TRACE didn't. I don't have
> > a strong opinion about what to do here, because users can adjust their
> > filters accordingly. Let me know what you prefer.
>
> If we switched to eBPF functions, this whole issue goes away.
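
To make the "NLA messages over an fd" idea a bit more concrete, here is a
rough userspace sketch of what parsing could look like without any netlink
socket involved. It only mirrors the struct nlattr layout (a u16 nla_len
that covers header plus payload, a u16 nla_type, and padding to a 4-byte
boundary). The attribute types NOTIF_ATTR_PID and NOTIF_ATTR_SYSCALL_NR and
the roundtrip() helper are made up for illustration and are not part of
this patch set; a pipe stands in for whatever fd the notification would
actually arrive on.

```python
import os
import struct

# Same layout as struct nlattr: u16 nla_len, u16 nla_type, host byte order.
NLA_HDR = struct.Struct("=HH")
NLA_ALIGNTO = 4

# Hypothetical attribute types, invented for this sketch only.
NOTIF_ATTR_PID = 1
NOTIF_ATTR_SYSCALL_NR = 2


def nla_align(n):
    """Round n up to the next multiple of NLA_ALIGNTO."""
    return (n + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1)


def nla_put(attr_type, payload):
    """Encode one attribute: header, payload, zero padding to 4 bytes."""
    length = NLA_HDR.size + len(payload)  # nla_len excludes the padding
    padding = b"\0" * (nla_align(length) - length)
    return NLA_HDR.pack(length, attr_type) + payload + padding


def nla_parse(buf):
    """Return {type: payload}; reject truncated or malformed attributes."""
    attrs = {}
    off = 0
    while off < len(buf):
        if len(buf) - off < NLA_HDR.size:
            raise ValueError("truncated attribute header")
        length, attr_type = NLA_HDR.unpack_from(buf, off)
        if length < NLA_HDR.size or off + length > len(buf):
            raise ValueError("bad attribute length")
        attrs[attr_type] = buf[off + NLA_HDR.size:off + length]
        off += nla_align(length)  # step over payload and padding
    return attrs


def roundtrip(pid, syscall_nr):
    """Send two attributes through a plain pipe and decode them again."""
    msg = (nla_put(NOTIF_ATTR_PID, struct.pack("=I", pid)) +
           nla_put(NOTIF_ATTR_SYSCALL_NR, struct.pack("=i", syscall_nr)))
    r, w = os.pipe()
    try:
        os.write(w, msg)
        data = os.read(r, 4096)
    finally:
        os.close(r)
        os.close(w)
    attrs = nla_parse(data)
    return (struct.unpack("=I", attrs[NOTIF_ATTR_PID])[0],
            struct.unpack("=i", attrs[NOTIF_ATTR_SYSCALL_NR])[0])
```

Since every attribute carries its own length and type, a reader can step
over types it does not know, which is the property that would matter if
seccomp_data ever grows. The cost is that each consumer needs a loop like
nla_parse() rather than a single memcpy() of a struct.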