Date: Wed, 28 Oct 2020 14:50:11 -0400
From: Rich Felker
To: Jann Horn
Cc: Camille Mougey, Kees Cook, lkml, Tycho Andersen, Sargun Dhillon,
    Christian Brauner, "Michael Kerrisk (man-pages)", Denis Efremov,
    Andy Lutomirski
Subject: Re: [seccomp] Request for a "enable on execve" mode for Seccomp filters
Message-ID: <20201028185011.GF534@brightrain.aerifal.cx>
References: <20201028164936.GC534@brightrain.aerifal.cx>
    <20201028175241.GD534@brightrain.aerifal.cx>
    <20201028183511.GE534@brightrain.aerifal.cx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 28, 2020 at 07:39:41PM +0100, Jann Horn wrote:
> On Wed, Oct 28, 2020 at 7:35 PM Rich Felker wrote:
> > On Wed, Oct 28, 2020 at 07:25:45PM +0100, Jann Horn wrote:
> > > On Wed, Oct 28, 2020 at 6:52 PM Rich Felker wrote:
> > > > On Wed, Oct 28, 2020 at 06:34:56PM +0100, Jann Horn wrote:
> > > > > On Wed, Oct 28, 2020 at 5:49 PM Rich Felker wrote:
> > > > > > On Wed, Oct 28, 2020 at 01:42:13PM +0100, Jann Horn wrote:
> > > > > > > On Wed, Oct 28, 2020 at 12:18 PM Camille Mougey wrote:
> > > > > > > You're just focusing on execve() - I think it's important to keep in
> > > > > > > mind what happens after execve() for normal, dynamically-linked
> > > > > > > binaries: The next step is that the dynamic linker runs, and it will
> > > > > > > poke around in the file system with access() and openat() and fstat(),
> > > > > > > it will mmap() executable libraries into memory, it will mprotect()
> > > > > > > some memory regions, it will set up thread-local storage (e.g. using
> > > > > > > arch_prctl(); even if the process is single-threaded), and so on.
> > > > > > >
> > > > > > > The earlier you install the seccomp filter, the more of these steps
> > > > > > > you have to permit in the filter. And if you want the filter to take
> > > > > > > effect directly after execve(), the syscalls you'll be forced to
> > > > > > > permit are sufficient to cobble something together in userspace that
> > > > > > > effectively does almost the same thing as execve().
> > > > > >
> > > > > > I would assume you use SECCOMP_RET_USER_NOTIF to implement policy for
> > > > > > controlling these operations and allowing only the ones that are valid
> > > > > > during dynamic linking. This also allows you to defer application of
> > > > > > the filter until after execve. So unless I'm missing some reason why
> > > > > > this doesn't work, I think the requested functionality is already
> > > > > > available.
> > > > >
> > > > > Ah, yeah, good point.
> > > > >
> > > > > > If you really just want the "activate at exec" behavior, it might be
> > > > > > possible (depending on how SECCOMP_RET_USER_NOTIF behaves when there's
> > > > > > no notify fd open; I forget)
> > > > >
> > > > > syscall returns -ENOSYS. Yeah, that'd probably do the job. (Even
> > > > > though it might be a bit nicer if userspace had control over the errno
> > > > > there, such that it could be EPERM instead... oh well.)
> > > >
> > > > EPERM is a major bug in current sandbox implementations, so ENOSYS is
> > > > at least mildly better, but indeed it should be controllable, probably
> > > > by allowing a code path for the BPF to continue with a jump to a
> > > > different logic path if the notify listener is missing.
> > >
> > > I guess we might be able to expose the listener status through a bit /
> > > a field in the struct seccomp_data, and then filters could branch on
> > > that. (And the kernel would run the filter twice if we raced with
> > > filter detachment.) I don't know whether it would look pretty, but I
> > > think it should be doable...
> >
> > I was thinking the race wouldn't be salvagable, but indeed since the
> > filter is side-effect-free you can just re-run it if the status
> > changes between start of filter processing and the attempt at
> > notification. This sounds like it should work.
> >
> > I guess it's not possible to chain two BPF filters to do this, because
> > that only works when the first one allows? Or am I misunderstanding
> > the multiple-filters case entirely? (I've never gotten that far with
> > programming it.)
>
> I'm not sure if I'm understanding the question correctly...
> At the moment you basically can't have multiple filters with notifiers.
> The rule with multiple filters is always that all the filters get run,
> and the actual action taken is the most restrictive result of all of
> them.

I probably just don't understand how multiple filters work then, which
is pretty much what I expected. But in any case it seems correct that
they're not a tool for solving the problem here.

Rich
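
For reference, the multiple-filters rule quoted above ("all the filters get run, and the actual action taken is the most restrictive result") can be modelled in a few lines. This is only a userspace sketch, not kernel code: the constants are the SECCOMP_RET_* values from the uapi header, the helper names action_only() and combine() are made up for illustration, and it assumes the action bits are compared as a signed 32-bit value with the smallest one winning.

```python
# Illustrative model only: it mirrors the documented precedence of seccomp
# return actions and never talks to the kernel.

SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_KILL_THREAD = 0x00000000
SECCOMP_RET_TRAP = 0x00030000
SECCOMP_RET_ERRNO = 0x00050000
SECCOMP_RET_USER_NOTIF = 0x7FC00000
SECCOMP_RET_TRACE = 0x7FF00000
SECCOMP_RET_LOG = 0x7FFC0000
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_ACTION_FULL = 0xFFFF0000


def action_only(ret):
    """Return the action bits of a filter result as a signed 32-bit value."""
    action = ret & SECCOMP_RET_ACTION_FULL
    # Reinterpret the top bit as a sign bit so KILL_PROCESS sorts lowest.
    return action - (1 << 32) if action & 0x80000000 else action


def combine(results):
    """Pick the most restrictive result (hypothetical helper name).

    The smallest signed action wins; min() keeps the first of equal actions.
    """
    return min(results, key=action_only)


EPERM = 1
# A notifying filter stacked with an allowing one: the notification wins.
print(hex(combine([SECCOMP_RET_ALLOW, SECCOMP_RET_USER_NOTIF])))
# A notifying filter stacked with one returning an errno: the errno wins.
print(hex(combine([SECCOMP_RET_USER_NOTIF, SECCOMP_RET_ERRNO | EPERM])))
```

In this model a SECCOMP_RET_USER_NOTIF result overrides ALLOW but is itself overridden by ERRNO, TRAP, or either KILL action from any other filter, so stacking filters gives no "fall through to the next filter" path, which seems consistent with the conclusion above.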