Date: Wed, 28 Oct 2020 14:35:11 -0400
From: Rich Felker
To: Jann Horn
Cc: Camille Mougey, Kees Cook, lkml, Tycho Andersen, Sargun Dhillon,
    Christian Brauner, "Michael Kerrisk (man-pages)", Denis Efremov,
    Andy Lutomirski
Subject: Re: [seccomp] Request for a "enable on execve" mode for Seccomp filters
Message-ID: <20201028183511.GE534@brightrain.aerifal.cx>
References: <20201028164936.GC534@brightrain.aerifal.cx>
    <20201028175241.GD534@brightrain.aerifal.cx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 28, 2020 at 07:25:45PM +0100, Jann Horn wrote:
> On Wed, Oct 28, 2020 at 6:52 PM Rich Felker wrote:
> > On Wed, Oct 28, 2020 at 06:34:56PM +0100, Jann Horn wrote:
> > > On Wed, Oct 28, 2020 at 5:49 PM Rich Felker wrote:
> > > > On Wed, Oct 28, 2020 at 01:42:13PM +0100, Jann Horn wrote:
> > > > > On Wed, Oct 28, 2020 at 12:18 PM Camille Mougey wrote:
> > > > > You're just focusing on execve() - I think it's important to keep in
> > > > > mind what happens after execve() for normal,
> > > > > dynamically-linked binaries: The next step is that the dynamic
> > > > > linker runs, and it will poke around in the file system with
> > > > > access() and openat() and fstat(), it will mmap() executable
> > > > > libraries into memory, it will mprotect() some memory regions, it
> > > > > will set up thread-local storage (e.g. using arch_prctl(); even if
> > > > > the process is single-threaded), and so on.
> > > > >
> > > > > The earlier you install the seccomp filter, the more of these steps
> > > > > you have to permit in the filter. And if you want the filter to take
> > > > > effect directly after execve(), the syscalls you'll be forced to
> > > > > permit are sufficient to cobble something together in userspace that
> > > > > effectively does almost the same thing as execve().
> > > >
> > > > I would assume you use SECCOMP_RET_USER_NOTIF to implement policy for
> > > > controlling these operations and allowing only the ones that are valid
> > > > during dynamic linking. This also allows you to defer application of
> > > > the filter until after execve. So unless I'm missing some reason why
> > > > this doesn't work, I think the requested functionality is already
> > > > available.
> > >
> > > Ah, yeah, good point.
> > >
> > > > If you really just want the "activate at exec" behavior, it might be
> > > > possible (depending on how SECCOMP_RET_USER_NOTIF behaves when there's
> > > > no notify fd open; I forget)
> > >
> > > syscall returns -ENOSYS. Yeah, that'd probably do the job. (Even
> > > though it might be a bit nicer if userspace had control over the errno
> > > there, such that it could be EPERM instead... oh well.)
> >
> > EPERM is a major bug in current sandbox implementations, so ENOSYS is
> > at least mildly better, but indeed it should be controllable, probably
> > by allowing a code path for the BPF to continue with a jump to a
> > different logic path if the notify listener is missing.
>
> I guess we might be able to expose the listener status through a bit /
> a field in the struct seccomp_data, and then filters could branch on
> that. (And the kernel would run the filter twice if we raced with
> filter detachment.) I don't know whether it would look pretty, but I
> think it should be doable...

I was thinking the race wouldn't be salvageable, but indeed, since the
filter is side-effect-free, you can just re-run it if the status
changes between the start of filter processing and the attempt at
notification. This sounds like it should work.

I guess it's not possible to chain two BPF filters to do this, because
that only works when the first one allows? Or am I misunderstanding
the multiple-filters case entirely? (I've never gotten that far with
programming it.)

Rich
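
For illustration only, the re-run idea above can be sketched as a small
userspace model in Python. This is not kernel code and not a proposed
patch. The listener_attached field is the hypothetical addition to
struct seccomp_data discussed in this thread (no such field exists
today), and toy_filter, evaluate and detaching_listener are names
invented for the sketch. The only property it tries to show is that a
side-effect-free filter can simply be evaluated again when the listener
status changes between the start of filter processing and the
notification attempt.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Toy action labels for the model; these are not the kernel's constants.
ALLOW = "ALLOW"
USER_NOTIF = "USER_NOTIF"
ERRNO_EPERM = "ERRNO(EPERM)"


@dataclass(frozen=True)
class SeccompData:
    nr: str                  # syscall name (the real struct holds a number)
    listener_attached: bool  # HYPOTHETICAL field, only proposed in the thread


def toy_filter(data: SeccompData) -> str:
    """Pure function of its input, like a BPF filter: no side effects."""
    if data.nr != "execve":
        return ALLOW
    # Branch on the listener status instead of accepting a fixed -ENOSYS.
    return USER_NOTIF if data.listener_attached else ERRNO_EPERM


def evaluate(nr: str, listener_state: Callable[[], bool],
             max_runs: int = 8) -> Tuple[str, int]:
    """Run the filter, re-running it if the listener status changed.

    Returns (action, number_of_filter_runs).
    """
    for runs in range(1, max_runs + 1):
        seen = listener_state()          # status sampled at filter start
        action = toy_filter(SeccompData(nr, seen))
        if action != USER_NOTIF:
            return action, runs
        if listener_state() == seen:     # status at the notification attempt
            return action, runs
        # The listener detached between the two samples, so the verdict is
        # stale. The filter has no side effects, so evaluating it again is safe.
    raise RuntimeError("listener status kept changing")


def detaching_listener() -> Callable[[], bool]:
    """Listener that is attached at filter start and gone afterwards."""
    states = iter([True, False])
    return lambda: next(states, False)


if __name__ == "__main__":
    print(evaluate("execve", lambda: True))          # steady listener
    print(evaluate("execve", detaching_listener()))  # raced with detachment
```

In the second call the first run yields a notification verdict, the
listener is found to be gone, and the second run takes the filter's
"listener missing" branch, so the filter chooses the errno rather than
the kernel.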