In-Reply-To: <10f91a367ec4fcdea7fc3f086de3f5f13a4a7436.1602431034.git.yifeifz2@illinois.edu>
From: Jann Horn
Date: Mon, 12 Oct 2020 08:42:50 +0200
Subject: Re: [PATCH v5 seccomp 1/5] seccomp/cache: Lookup syscall allowlist bitmap for fast path
To: YiFei Zhu
Cc: Linux Containers, YiFei Zhu, bpf, kernel list, Aleksa Sarai,
 Andrea Arcangeli, Andy Lutomirski, David Laight, Dimitrios Skarlatos,
 Giuseppe Scrivano, Hubertus Franke, Jack Chen, Josep Torrellas,
 Kees Cook, Tianyin Xu, Tobin Feldman-Fitzthum, Tycho Andersen,
 Valentin Rothberg, Will Drewry
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Oct 11, 2020 at 5:48 PM YiFei Zhu wrote:
> The overhead of running Seccomp filters has been part of some past
> discussions [1][2][3]. Oftentimes, the filters have a large number
> of instructions that check syscall numbers one by one and jump based
> on that. Some users chain BPF filters which further enlarge the
> overhead. A recent work [6] comprehensively measures the Seccomp
> overhead and shows that the overhead is non-negligible and has a
> non-trivial impact on application performance.
>
> We observed that some common filters, such as docker's [4] or
> systemd's [5], make most decisions based only on the syscall
> numbers, and as past discussions considered, a bitmap where each bit
> represents a syscall makes the most sense for these filters.
>
> The fast (common) path for seccomp should be that the filter permits
> the syscall to pass through, and failing seccomp is expected to be
> an exceptional case; it is not expected for userspace to call a
> denylisted syscall over and over.
>
> When it can be concluded that an allow must occur for the given
> architecture and syscall pair (this determination is introduced in
> the next commit), seccomp will immediately allow the syscall,
> bypassing further BPF execution.
>
> Each architecture number has its own bitmap. The architecture
> number in seccomp_data is checked against the defined architecture
> number constant before proceeding to test the bit in that
> architecture's bitmap, using the syscall number as the bit index;
> if the bit is set, seccomp returns allow. The bitmaps are all clear
> in this patch and will be initialized in the next commit.
>
> When only one architecture exists, the check against the architecture
> number is skipped, as suggested by Kees Cook [7].
>
> [1] https://lore.kernel.org/linux-security-module/c22a6c3cefc2412cad00ae14c1371711@huawei.com/T/
> [2] https://lore.kernel.org/lkml/202005181120.971232B7B@keescook/T/
> [3] https://github.com/seccomp/libseccomp/issues/116
> [4] https://github.com/moby/moby/blob/ae0ef82b90356ac613f329a8ef5ee42ca923417d/profiles/seccomp/default.json
> [5] https://github.com/systemd/systemd/blob/6743a1caf4037f03dc51a1277855018e4ab61957/src/shared/seccomp-util.c#L270
> [6] Draco: Architectural and Operating System Support for System Call Security
>     https://tianyin.github.io/pub/draco.pdf, MICRO-53, Oct. 2020
> [7] https://lore.kernel.org/bpf/202010091614.8BB0EB64@keescook/
>
> Co-developed-by: Dimitrios Skarlatos
> Signed-off-by: Dimitrios Skarlatos
> Signed-off-by: YiFei Zhu

Reviewed-by: Jann Horn
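
For readers following the thread, here is a minimal userspace sketch of the per-architecture bitmap lookup that the quoted commit message describes. It is not the kernel implementation. Every `sketch_*`/`SKETCH_*` name is hypothetical, `SKETCH_NR_SYSCALLS` is an assumed bound, and the two architecture constants reuse the values of `AUDIT_ARCH_X86_64` and `AUDIT_ARCH_I386` purely as an example native/compat pair. A `true` result means "allow without running BPF", and `false` means "fall back to the filters".

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bound on syscall numbers, chosen only for this sketch. */
#define SKETCH_NR_SYSCALLS 512

/* Example arch numbers: same values as AUDIT_ARCH_X86_64 and
 * AUDIT_ARCH_I386 from <linux/audit.h>, used as a native/compat pair. */
#define SKETCH_ARCH_NATIVE 0xC000003EU
#define SKETCH_ARCH_COMPAT 0x40000003U

#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITMAP_WORDS \
    ((SKETCH_NR_SYSCALLS + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD)

/* Hypothetical stand-in for the two seccomp_data fields the lookup reads. */
struct sketch_data {
    int nr;        /* syscall number */
    uint32_t arch; /* AUDIT_ARCH_*-style architecture number */
};

/* One allow bitmap per architecture: bit n set => syscall n always allowed. */
struct sketch_cache {
    unsigned long allow_native[SKETCH_BITMAP_WORDS];
    unsigned long allow_compat[SKETCH_BITMAP_WORDS];
};

static bool sketch_bitmap_test(const unsigned long *bitmap, int nr)
{
    /* Out-of-range numbers are never cached: fall back to the filters. */
    if (nr < 0 || nr >= SKETCH_NR_SYSCALLS)
        return false;
    size_t n = (size_t)nr;
    return (bitmap[n / SKETCH_BITS_PER_WORD] >> (n % SKETCH_BITS_PER_WORD)) & 1UL;
}

static void sketch_bitmap_set(unsigned long *bitmap, int nr)
{
    size_t n = (size_t)nr; /* caller must pass 0 <= nr < SKETCH_NR_SYSCALLS */
    bitmap[n / SKETCH_BITS_PER_WORD] |= 1UL << (n % SKETCH_BITS_PER_WORD);
}

/* true: allow immediately, skipping BPF; false: run the filters as usual. */
static bool sketch_cache_check_allow(const struct sketch_cache *cache,
                                     const struct sketch_data *sd)
{
#ifdef SKETCH_SINGLE_ARCH
    /* Only one architecture exists, so sd->arch need not be compared. */
    return sketch_bitmap_test(cache->allow_native, sd->nr);
#else
    if (sd->arch == SKETCH_ARCH_NATIVE)
        return sketch_bitmap_test(cache->allow_native, sd->nr);
    if (sd->arch == SKETCH_ARCH_COMPAT)
        return sketch_bitmap_test(cache->allow_compat, sd->nr);
    return false; /* unknown arch: never take the fast path */
#endif
}
```

With zero-initialized bitmaps (the state the commit message describes for this patch), every lookup returns `false` and the filters run exactly as before. Populating the bits is left to the follow-up commit. Building with `SKETCH_SINGLE_ARCH` defined drops the `arch` comparison, which mirrors the single-architecture shortcut.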