Date: Fri, 25 Sep 2020 16:49:13 -0700
From: Kees Cook
To: Andy Lutomirski
Cc: YiFei Zhu, Linux Containers, YiFei Zhu, bpf, kernel list,
    Aleksa Sarai, Andrea Arcangeli, Dimitrios Skarlatos,
    Giuseppe Scrivano, Hubertus Franke, Jack Chen, Jann Horn,
    Josep Torrellas, Tianyin Xu, Tobin Feldman-Fitzthum,
    Tycho Andersen, Valentin Rothberg, Will Drewry
Subject: Re: [PATCH v2 seccomp 3/6] seccomp/cache: Add "emulator" to check if filter is arg-dependent
Message-ID: <202009251648.4AA27D5B@keescook>
References: <202009251223.8E46C831E2@keescook>
    <2FA23A2E-16B0-4E08-96D5-6D6FE45BBCF6@amacapital.net>
    <202009251332.24CE0C58@keescook>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 25, 2020 at 02:07:46PM -0700, Andy Lutomirski wrote:
> On Fri, Sep 25, 2020 at 1:37 PM Kees Cook wrote:
> >
> > On Fri, Sep 25, 2020 at 12:51:20PM -0700, Andy Lutomirski wrote:
> > >
> > > > On Sep 25, 2020, at 12:42 PM, Kees Cook wrote:
> > > >
> > > > On Fri, Sep 25, 2020 at 11:45:05AM -0500, YiFei Zhu wrote:
> > > >> On Thu, Sep 24, 2020 at 10:04 PM YiFei Zhu wrote:
> > > >>>> Why do the prepare here instead of during attach? (And note that it
> > > >>>> should not be written to fail.)
> > > >>>
> > > >>> Right.
> > > >>
> > > >> During attach a spinlock (current->sighand->siglock) is held. Do we
> > > >> really want to put the emulator in the "atomic section"?
> > > >
> > > > It's a good point, but I had some other ideas around it that led me
> > > > to a different conclusion. Here's what I've got in my head:
> > > >
> > > > I don't view filter attach (nor the siglock) as fastpath: the lock is
> > > > rarely contested and the "long time" will only be during filter attach.
> > > >
> > > > When performing filter emulation, all the syscalls that are already
> > > > marked as "must run filter" on the previous filter can be skipped for
> > > > the new filter, since it cannot change the outcome, which makes the
> > > > emulation step faster.
> > > >
> > > > The previous filter's bitmap isn't "stable" until siglock is held.
> > > >
> > > > If we do the emulation step before siglock, we have to always do full
> > > > evaluation of all syscalls, and then merge the bitmap during attach.
> > > > That means all filters ever attached will take maximal time to perform
> > > > emulation.
> > > >
> > > > I prefer the idea of the emulation step taking advantage of the bitmap
> > > > optimization, since the kernel spends less time doing work over the life
> > > > of the process tree. It's certainly marginal, but it also lets all the
> > > > bitmap manipulation stay in one place (as opposed to being split between
> > > > "prepare" and "attach").
> > > >
> > > > What do you think?
> > > >
> > >
> > > I’m wondering if we should be much much lazier. We could potentially
> > > wait until someone actually tries to do a given syscall before we try
> > > to evaluate whether the result is fixed.
> >
> > That seems like we'd need to track yet another bitmap of "did we emulate
> > this yet?" And it means the filter isn't really "done" until you run
> > another syscall? eeh, I'm not a fan: it scratches at my desire for
> > determinism. ;) Or maybe my implementation imagination is missing
> > something?
> >
>
> We'd need at least three states per syscall: unknown, always-allow,
> and need-to-run-filter.
>
> The downsides are less determinism and a bit of an uglier
> implementation. The upside is that we don't need to loop over all
> syscalls at load -- instead the time that each operation takes is
> independent of the total number of syscalls on the system. And we can
> entirely avoid, say, evaluating the x32 case until the task tries an
> x32 syscall.
>
> I think it's at least worth considering.

Yeah, worth considering. I do still think the time spent in emulation is
SO small that it doesn't matter running all of the syscalls at attach
time. The filters are tiny and fail quickly if anything "interesting"
starts to happen. ;)

-- 
Kees Cook
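
For readers following the thread, the two approaches discussed above can be
contrasted in a small userspace sketch. This is not the patch's code: every
name here (NR_SYSCALLS, emulate_fn, filter_cache, lazy_cache and the helper
functions) is hypothetical, the syscall table is shrunk to 8 entries, and
"emulation" is reduced to a callback that reports whether a filter allows a
syscall regardless of its arguments. The lazy variant models a single filter
only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, tiny syscall table; the real count is arch-specific. */
#define NR_SYSCALLS 8

/* Hypothetical stand-in for emulating one filter on one syscall number:
 * returns true if the filter allows the syscall whatever its arguments. */
typedef bool (*emulate_fn)(int nr);

/* Eager variant: one bit per syscall, computed when a filter is attached. */
struct filter_cache {
    bool must_run[NR_SYSCALLS];   /* true: the filters have to be run */
};

/* Attach-time preparation. Syscalls the previous filter already marked
 * "must run filter" are inherited without emulation, since a new filter
 * cannot change that outcome. Returns how many emulations were needed. */
static int cache_prepare_eager(struct filter_cache *cache,
                               const struct filter_cache *prev,
                               emulate_fn emulate)
{
    int emulated = 0;

    for (int nr = 0; nr < NR_SYSCALLS; nr++) {
        if (prev != NULL && prev->must_run[nr]) {
            cache->must_run[nr] = true;   /* inherited, emulation skipped */
            continue;
        }
        cache->must_run[nr] = !emulate(nr);
        emulated++;
    }
    return emulated;
}

/* Lazy variant: three states per syscall, resolved on first use. */
enum lazy_state { STATE_UNKNOWN = 0, STATE_ALWAYS_ALLOW, STATE_NEED_FILTER };

struct lazy_cache {
    enum lazy_state state[NR_SYSCALLS];
    int emulations;               /* emulations performed so far */
};

/* Resolve a syscall's state the first time it is seen, then reuse it. */
static enum lazy_state lazy_lookup(struct lazy_cache *cache, int nr,
                                   emulate_fn emulate)
{
    assert(nr >= 0 && nr < NR_SYSCALLS);
    if (cache->state[nr] == STATE_UNKNOWN) {
        cache->state[nr] = emulate(nr) ? STATE_ALWAYS_ALLOW
                                       : STATE_NEED_FILTER;
        cache->emulations++;
    }
    return cache->state[nr];
}

/* Two toy "filters", for illustration only. */
static bool allow_below_six(int nr) { return nr < 6; }
static bool allow_even(int nr) { return nr % 2 == 0; }
```

With these toy filters, attaching allow_below_six first costs 8 emulations,
and stacking allow_even on top costs only 6, because syscalls 6 and 7 are
inherited as "must run filter". The lazy cache instead performs one emulation
per distinct syscall actually looked up, at the price of the third "unknown"
state and of results that depend on which syscalls have been issued so far.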