Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp274335pxk; Thu, 24 Sep 2020 05:33:06 -0700 (PDT) X-Google-Smtp-Source: ABdhPJznfugqrUiACOXAbIc7Enju2gtQe/Yqv7FP4IaJnpXaIk6tWoMf5zmil9Pw6P2TSF7eKNB4 X-Received: by 2002:a17:906:344e:: with SMTP id d14mr833415ejb.42.1600950786133; Thu, 24 Sep 2020 05:33:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1600950786; cv=none; d=google.com; s=arc-20160816; b=UzjnGvyrNUVbjpgAB03z06NfCAYYS99k1QTrsR8fKLFekvgZiMTFYoeqyl/d4lwjQc fnswjEzzctcEpwuKfcF1jvM3exTtB1p28Htszmd1PFra0K8HiAeJHhkGUEkuVZjuf0HI OnPGaZpsmrxzfuzWLaARCb8KGafOTv4hgGc5y+apBLEkXY8T2F/eza1m8JUTl/1ypT+I g+BXPo3K65qDg1ihiUkNun8RenmBwfp6bYpR0hK/7QiKfVyv61qOESeW/m3yHPiKd5hY bDjnMaBJi8LO6x/SA7BrbpDv0HxDvv1G1/2BTeteXLxRq5Bp3+OWkFKSkXQOJTlCTc9P 8f3g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=UcT+pZ/PTUp9wa/y0kDn97OtjvB6CdX4Q+3kJBuVUCc=; b=ICg4BRcfBOLoMykr66TF4sxSogO0ZniGNS7eSXsyt8ZGmaG2951WXwhU5EaHxxjAFo RzLmO5uwLsCu2AGHIcBfUWxPsf26t8IICPNDWEikpHhvayc7BN0BNat6wWorwhkGjQhY QPmyvdu95ly+hf+DxuK7xp0FZKqax2ree+QIWqQqTqYlZ9gTn8aGoUvnS5zuB5IDjcGh 9SJteDWr5dIv05lFWFfdt4BHFAwXlHtgJjM/wSFn7YXqdhkI9Bww7JdgWItrPcm6xTsT XpqfRPr+6P52KfzgW8LGVH9an3J9G6spOShXWuDslY21CyD1FC+eJ7gdQXVQE5jkQ4j8 MTTQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=Ti6cmqL3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id qq22si1904021ejb.281.2020.09.24.05.32.42; Thu, 24 Sep 2020 05:33:06 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=Ti6cmqL3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727704AbgIXM3B (ORCPT + 99 others); Thu, 24 Sep 2020 08:29:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39192 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727624AbgIXM3A (ORCPT ); Thu, 24 Sep 2020 08:29:00 -0400 Received: from mail-ej1-x644.google.com (mail-ej1-x644.google.com [IPv6:2a00:1450:4864:20::644]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 788A9C0613D3 for ; Thu, 24 Sep 2020 05:29:00 -0700 (PDT) Received: by mail-ej1-x644.google.com with SMTP id z22so4204153ejl.7 for ; Thu, 24 Sep 2020 05:29:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=UcT+pZ/PTUp9wa/y0kDn97OtjvB6CdX4Q+3kJBuVUCc=; b=Ti6cmqL3Ww4/dmp4GkFf//pZM1Cyddfmt3F68rkT3FctP4csEwJV1NJkB1tib9WozY oNc6naTBxuclc80J+TWCWbGJd8Fuf4AnFoG86p+ut3WbzBD2lCKKaYvNSLkPp+nC7IjM Qoh75X6YKrgeZuI2F6HmflLcxhevKrt2jbmwkdoUuKN2fFfeUW12Csre4b3NreUUbUOV OxmzFfkvFy1NN4fo100JLfgqLlNL1LDX6jPfiBgqOlQYhv0rAGiWjFOcE1Z4bg1rQR9Z FSprGO32fR+tUWM8L24aC4BCxXNI5RVcCZCQJE1wV7IG+4RhTsKhd09adZuNylejwzhE HLhg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=UcT+pZ/PTUp9wa/y0kDn97OtjvB6CdX4Q+3kJBuVUCc=; b=a4nz6sUlkuFXxg5jCEuyq4BH+GSZ2bj7/1hj0vvi/CtTNo/pX0iIJqMuPJu7qsp32I OhNEDjrxctD7vN32EDbUTMvVT1Z98eAaTpjDHBkLpd/1/L3xyBcvDT6zVi3E9rBcAW11 HTZfxZKaQ43uN4IXQTHEFW77ciQZ7kXjncJW0pk6uTs+Ldg9AIgg0SZtsY+FLu16V2e3 u0HkxpY/QpsdFo2bzlvYcr6egj3koWkV7M/xLwsJ2epay/hAtheNG7EKodVFpy1Sh+1z /YvsR57IGe0jIBfiwCcbQspQDho4kitnJlKJ4bC6bAutlrKRcZ5o3cJc0fBHJamTX1lj tgSw== X-Gm-Message-State: AOAM530sBVLyWpqy/6+QY6/ZghdO2W4JdjhWKayG9mNmqKngSBEM/rG4 5bF0st2p6F14coVNKLCsfMjB6jzx1no4jP+5BKS48A== X-Received: by 2002:a17:906:c447:: with SMTP id ck7mr767066ejb.358.1600950538813; Thu, 24 Sep 2020 05:28:58 -0700 (PDT) MIME-Version: 1.0 References: <20200923232923.3142503-1-keescook@chromium.org> <20200923232923.3142503-4-keescook@chromium.org> <202009240018.A4D8274F@keescook> In-Reply-To: <202009240018.A4D8274F@keescook> From: Jann Horn Date: Thu, 24 Sep 2020 14:28:32 +0200 Message-ID: Subject: Re: [PATCH 3/6] seccomp: Implement constant action bitmaps To: Kees Cook Cc: YiFei Zhu , Christian Brauner , Tycho Andersen , Andy Lutomirski , Will Drewry , Andrea Arcangeli , Giuseppe Scrivano , Tobin Feldman-Fitzthum , Dimitrios Skarlatos , Valentin Rothberg , Hubertus Franke , Jack Chen , Josep Torrellas , Tianyin Xu , bpf , Linux Containers , Linux API , kernel list Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Sep 24, 2020 at 9:37 AM Kees Cook wrote: > On Thu, Sep 24, 2020 at 02:25:03AM +0200, Jann Horn wrote: > > On Thu, Sep 24, 2020 at 1:29 AM Kees Cook wrote: [...] > > (However, a "which syscalls have a fixed result" bitmap might make > > sense if we want to export the list of permitted syscalls as a text > > file in procfs, as I mentioned over at > > .) > > I haven't found a data structure I'm happy with for this. It seemed like > NR_syscalls * sizeof(u32) was rather a lot (i.e. to store the BPF_RET > value). However, let me discuss that more in the "why in in thread?" > below... [...] > > > +#endif > > > +}; > > > + > > > struct seccomp_filter; > > > /** > > > * struct seccomp - the state of a seccomp'ed process > > > @@ -45,6 +56,13 @@ struct seccomp { > > > #endif > > > atomic_t filter_count; > > > struct seccomp_filter *filter; > > > + struct seccomp_bitmaps native; > > > +#ifdef CONFIG_COMPAT > > > + struct seccomp_bitmaps compat; > > > +#endif > > > +#ifdef SECCOMP_MULTIPLEXED_SYSCALL_TABLE_ARCH > > > + struct seccomp_bitmaps multiplex; > > > +#endif > > > > Why do we have one bitmap per thread (in struct seccomp) instead of > > putting the bitmap for a given filter and all its ancestors into the > > seccomp_filter? > > I explicitly didn't want to add code that was run per-filter; I wanted > O(1), not O(n) even if the n work was a small constant. There is > obviously a memory/perf tradeoff here. I wonder if the middle ground > would be to put a bitmap and "constant action" results in the filter.... > oh duh. The "top" filter is already going to be composed with its > ancestors. That's all that needs to be checked. Yeah - when adding a new filter, you can evaluate each syscall for the newly added filter. For both the "accept" bitmap and the "constant action" bitmap, you can AND the bitmap of the existing filter into the new filter's bitmap. Although actually, I think my "constant action" bitmap proposal was a stupid idea... when someone asks for an analysis of the filter via procfs (which shouldn't be a common action, so speed doesn't really matter there), we can just dynamically evaluate the entire filter tree using our filter-evaluation helper. Let's drop the "constant action" bitmap idea. > Then the tri-state can be: > > bitmap accept[NR_syscalls]: accept or check "known" bitmap > bitmap filter[NR_syscalls]: run filter or return known action > u32 known_action[NR_syscalls]; Actually, maybe we should just have an "accept" list, nothing else, to keep it straightforward and with minimal memory usage... > (times syscall numbering "architecture" counts) > > Though perhaps it would be just as fast as: > > bitmap run_filter[NR_syscalls]: run filter or return known_action > u32 known_action[NR_syscalls]; > > where accept isn't treated special... Using a bitset for accepted syscalls instead of a big array would probably have far less cache impact on the syscall entry path. If we just have an "accept" bitmask, we can store information about 512 syscalls per cache line - that's almost the entire syscall table. In contrast, a known_action list can only store information about 16 syscalls in a cache line, and we'd additionally still have to query the "filter" bitmap. I think our goal here should be that if a syscall is always allowed, seccomp should execute the smallest amount of instructions we can get away with, and touch the smallest amount of memory possible (and preferably that memory should be shared between threads). The bitmap fastpath should probably also avoid populate_seccomp_data().