Received: by 2002:a05:6a10:22f:0:0:0:0 with SMTP id 15csp115745pxk; Thu, 24 Sep 2020 00:40:21 -0700 (PDT) X-Google-Smtp-Source: ABdhPJx9NpKtpklbc+DhbElDv0fatK++nLc0am5XQGE5bmwi3J31mF5an8h8mFZ3yFTT8pTPjDF2 X-Received: by 2002:aa7:c38a:: with SMTP id k10mr3276198edq.325.1600933221383; Thu, 24 Sep 2020 00:40:21 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1600933221; cv=none; d=google.com; s=arc-20160816; b=y5wnSbrlZKX+aEnbxYI/6QXX3FOBqyCej7xninKo/qOenzGlNvmWFhGGAUxfHxI3UK Cp46C7BkIDw5IeHGeOFRXXdzN7OTSRJayMUmYMB6lnm17Iy6ZsmwZMqCrTQedr/Z6LFU xh/7xrGvM35fhARars47GSdRDsnC3h6IF5moZYHn1gwKZwODwhTDWj1orjSVotOI0SBj l4RYjsAZWUSWVOh8TbeNEUoXHyM4ZdE9+AecN8e8PxGCHpXVtOpuCtbpC1ub3kWfZ2oK vnEsMrvH5WyiuWmSBWsbJdiAM9wR5q//GQhWgLngLqt5oC9l0OE+QzHLc7G16GTX58se DJSQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=WLJqIscJmXcvk+sqZWhshYhF2CZRRuw+A8ZGwocEwRQ=; b=RYPs5emcxccNFvq3qsoe+BQ3vskSLYdvLyhNBOLkdxGamvciLU+juhZcQSEet08TPb TwNYn/Ksum7rCKthXoQPGo783dcXRQyMPFwwjbJoee30SikNsKzY6qqmtsS4EUB7FVVB eSN0YOK0DPLTWrPEGbJQb+KctQLUziKpzAKiMMYDA25sHfAzZ9lcqYC0kzGGGO4RZ1it BNhOwT75nc6lhKZDWkywp7aFJ32spMTZyaHmvXHy51W8i+57xMRkevrqB9zX3C2Lulnr Bx8xRPcF6hUF0jiUn303FZRpiAseEv+o9TMRQiI1QOQEbH1uNDhlMh86Cu2wQu/jTJWP iIOQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=G+UWjHqR; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id k6si1681904eds.235.2020.09.24.00.39.57; Thu, 24 Sep 2020 00:40:21 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=G+UWjHqR; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727164AbgIXHg7 (ORCPT + 99 others); Thu, 24 Sep 2020 03:36:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50490 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727112AbgIXHg7 (ORCPT ); Thu, 24 Sep 2020 03:36:59 -0400 Received: from mail-pg1-x544.google.com (mail-pg1-x544.google.com [IPv6:2607:f8b0:4864:20::544]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 292F7C0613D4 for ; Thu, 24 Sep 2020 00:36:59 -0700 (PDT) Received: by mail-pg1-x544.google.com with SMTP id u24so1401172pgi.1 for ; Thu, 24 Sep 2020 00:36:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=WLJqIscJmXcvk+sqZWhshYhF2CZRRuw+A8ZGwocEwRQ=; b=G+UWjHqRmZtL7aOwJRe0U89n54w3ZzxUfJM96kgA17jcRmleJkjoPM8xG/IgtKX+ka Ru6WiurvmQ7B/2qcOKO6h91E7rZHfAvvKXRQSk6IwPOMzwJAsA6GS4xx79fuZEIZsMMa O32mwaxtv3Spvme5j6mLBMKNpV4AWuTd4s9VA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=WLJqIscJmXcvk+sqZWhshYhF2CZRRuw+A8ZGwocEwRQ=; b=kH82Kd994ESDHsu2hhMDpG5Tfp0oLLZe/WTmuCT1wj5ryog7LdxTYuf+lyVWtwFqgy qd62UzYpbC3zLWuQqsZBIOKHQLIre0MhVLO4lgArNp0OaZjaeFp0nw12+0Jja9rnOJVk JQDWZN49EIUF4Ln+uAZzLd6LU7s1op8mN7/emEh8rJZGQzEUwqNqByJ/3rOVLbjbJCcm DBAZNAgC69oXNpL0NPy1bFygEn/Dofg8LDba7fHDP9/ljD3QFU6fysM+cdF+PmIFFhOl bu5IcAB88jauY5l/DCHE0P+EZCIeCWn1mAdu05XtiNWsPW5z+yxYyasxDK/Ctal5jMay X9ww== X-Gm-Message-State: AOAM5329g6crr4cRTVHHx6FyUIoBQ7RtxWBtuWiiR5jZooF1NL1GZLrJ KKe6ZDWeCdP2Lkhq+MsqNOjGrw== X-Received: by 2002:a63:5d08:: with SMTP id r8mr2852677pgb.174.1600933018492; Thu, 24 Sep 2020 00:36:58 -0700 (PDT) Received: from www.outflux.net (smtp.outflux.net. [198.145.64.163]) by smtp.gmail.com with ESMTPSA id a27sm1817356pfk.52.2020.09.24.00.36.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 24 Sep 2020 00:36:57 -0700 (PDT) Date: Thu, 24 Sep 2020 00:36:56 -0700 From: Kees Cook To: Jann Horn Cc: YiFei Zhu , Christian Brauner , Tycho Andersen , Andy Lutomirski , Will Drewry , Andrea Arcangeli , Giuseppe Scrivano , Tobin Feldman-Fitzthum , Dimitrios Skarlatos , Valentin Rothberg , Hubertus Franke , Jack Chen , Josep Torrellas , Tianyin Xu , bpf , Linux Containers , Linux API , kernel list Subject: Re: [PATCH 3/6] seccomp: Implement constant action bitmaps Message-ID: <202009240018.A4D8274F@keescook> References: <20200923232923.3142503-1-keescook@chromium.org> <20200923232923.3142503-4-keescook@chromium.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Sep 24, 2020 at 02:25:03AM +0200, Jann Horn wrote: > On Thu, Sep 24, 2020 at 1:29 AM Kees Cook wrote: > > +/* When no bits are set for a syscall, filters are run. */ > > +struct seccomp_bitmaps { > > +#ifdef SECCOMP_ARCH > > + /* "allow" are initialized to set and only ever get cleared. */ > > + DECLARE_BITMAP(allow, NR_syscalls); > > This bitmap makes sense. > > > + /* These are initialized to clear and only ever get set. */ > > + DECLARE_BITMAP(kill_thread, NR_syscalls); > > + DECLARE_BITMAP(kill_process, NR_syscalls); > > I don't think these bitmaps make sense, this is not part of any fastpath. That's a fair point. I think I arrived at this design because it ended up making filter addition faster ("don't bother processing this one, it's already 'kill'"), but it's likely not worse the memory usage trade-off. > (However, a "which syscalls have a fixed result" bitmap might make > sense if we want to export the list of permitted syscalls as a text > file in procfs, as I mentioned over at > .) I haven't found a data structure I'm happy with for this. It seemed like NR_syscalls * sizeof(u32) was rather a lot (i.e. to store the BPF_RET value). However, let me discuss that more in the "why in in thread?" below... > The "NR_syscalls" part assumes that the compat syscall tables will not > be bigger than the native syscall table, right? I guess that's usually > mostly true nowadays, thanks to the syscall table unification... > (might be worth a comment though) Hrm, I had convinced myself it was a max() of compat. But I see no evidence of that now. Which means that I can add these to the per-arch seccomp defines with something like: # define SECCOMP_NR_NATIVE NR_syscalls # define SECCOMP_NR_COMPAT X32_NR_syscalls ... > > +#endif > > +}; > > + > > struct seccomp_filter; > > /** > > * struct seccomp - the state of a seccomp'ed process > > @@ -45,6 +56,13 @@ struct seccomp { > > #endif > > atomic_t filter_count; > > struct seccomp_filter *filter; > > + struct seccomp_bitmaps native; > > +#ifdef CONFIG_COMPAT > > + struct seccomp_bitmaps compat; > > +#endif > > +#ifdef SECCOMP_MULTIPLEXED_SYSCALL_TABLE_ARCH > > + struct seccomp_bitmaps multiplex; > > +#endif > > Why do we have one bitmap per thread (in struct seccomp) instead of > putting the bitmap for a given filter and all its ancestors into the > seccomp_filter? I explicitly didn't want to add code that was run per-filter; I wanted O(1), not O(n) even if the n work was a small constant. There is obviously a memory/perf tradeoff here. I wonder if the middle ground would be to put a bitmap and "constant action" results in the filter.... oh duh. The "top" filter is already going to be composed with its ancestors. That's all that needs to be checked. Then the tri-state can be: bitmap accept[NR_syscalls]: accept or check "known" bitmap bitmap filter[NR_syscalls]: run filter or return known action u32 known_action[NR_syscalls]; (times syscall numbering "architecture" counts) Though perhaps it would be just as fast as: bitmap run_filter[NR_syscalls]: run filter or return known_action u32 known_action[NR_syscalls]; where accept isn't treated special... > > > }; > > > > #ifdef CONFIG_HAVE_ARCH_SECCOMP_FILTER > > diff --git a/kernel/seccomp.c b/kernel/seccomp.c > > index 0a3ff8eb8aea..111a238bc532 100644 > > --- a/kernel/seccomp.c > > +++ b/kernel/seccomp.c > > @@ -318,7 +318,7 @@ static inline u8 seccomp_get_arch(u32 syscall_arch, u32 syscall_nr) > > > > #ifdef SECCOMP_MULTIPLEXED_SYSCALL_TABLE_ARCH > > if (syscall_arch == SECCOMP_MULTIPLEXED_SYSCALL_TABLE_ARCH) { > > - seccomp_arch |= (sd->nr & SECCOMP_MULTIPLEXED_SYSCALL_TABLE_MASK) >> > > + seccomp_arch |= (syscall_nr & SECCOMP_MULTIPLEXED_SYSCALL_TABLE_MASK) >> > > SECCOMP_MULTIPLEXED_SYSCALL_TABLE_SHIFT; > > This belongs over into patch 1. Thanks! I was rushing to get this posted so YiFei Zhu wouldn't spend time fighting with arch and Kconfig stuff. :) I'll clean this (and the other random cruft) up. -- Kees Cook