Date: Tue, 16 Jun 2020 08:48:56 -0700
From: Kees Cook
To: Jann Horn
Cc: kernel list, Christian Brauner, Sargun Dhillon, Tycho Andersen, "zhujianwei (C)", Dave Hansen, Matthew Wilcox, Andy Lutomirski, Will Drewry, Shuah Khan, Matt Denton, Chris Palmer, Jeffrey Vander Stoep, Aleksa Sarai, Hehuazhen, the arch/x86 maintainers, Linux Containers, linux-security-module, Linux API
Subject: Re: [PATCH 4/8] seccomp: Implement constant action bitmaps
Message-ID: <202006160757.99FD9B785@keescook>
References: <20200616074934.1600036-1-keescook@chromium.org> <20200616074934.1600036-5-keescook@chromium.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 16, 2020 at 02:14:47PM +0200, Jann Horn wrote:
> Wouldn't it be simpler to use a function that can run a subset of
> seccomp cBPF and bails out on anything that indicates that a syscall's
> handling is complex or on instructions it doesn't understand?
> For syscalls that have a fixed policy, a typical seccomp filter doesn't
> even use any of the BPF_ALU ops, the scratch space, or the X register;
> it just uses something like the following set of operations, which is
> easy to emulate without much code:
>
> BPF_LD | BPF_W | BPF_ABS
> BPF_JMP | BPF_JEQ | BPF_K
> BPF_JMP | BPF_JGE | BPF_K
> BPF_JMP | BPF_JGT | BPF_K
> BPF_JMP | BPF_JA
> BPF_RET | BPF_K

Initially, I started down this path. It needed a bit of plumbing into BPF
to better control the lifetime of the cBPF "saved original filter"
(normally used only by CHECKPOINT_RESTORE), and then I kept having to make
exceptions (the same list you have: ALU, X register, scratch, etc.) to
avoid too much complexity in the emulator. I decided I'd rather reuse the
existing infrastructure to actually execute the filter (no saved cBPF copy,
no separate code, and full instruction coverage).

>
> Something like (completely untested):
>
> /*
>  * Try to statically determine whether @filter will always return a fixed result
>  * when run for syscall @nr under architecture @arch.
>  * Returns true if the result could be determined; if so, the result will be
>  * stored in @action.
>  */
> static bool seccomp_check_syscall(struct sock_filter *filter, unsigned int arch,
> 				  unsigned int nr, unsigned int *action)
> {
> 	int pc;
> 	unsigned int reg_value = 0;
>
> 	for (pc = 0; 1; pc++) {
> 		struct sock_filter *insn = &filter[pc];
> 		u16 code = insn->code;
> 		u32 k = insn->k;
>
> 		switch (code) {
> 		case BPF_LD | BPF_W | BPF_ABS:
> 			if (k == offsetof(struct seccomp_data, nr)) {
> 				reg_value = nr;
> 			} else if (k == offsetof(struct seccomp_data, arch)) {
> 				reg_value = arch;
> 			} else {
> 				return false; /* can't optimize (non-constant value load) */
> 			}
> 			break;
> 		case BPF_RET | BPF_K:
> 			*action = insn->k;
> 			return true; /* success: reached return with constant values only */
> 		case BPF_JMP | BPF_JA:
> 			pc += insn->k;
> 			break;
> 		case BPF_JMP | BPF_JEQ | BPF_K:
> 		case BPF_JMP | BPF_JGE | BPF_K:
> 		case BPF_JMP | BPF_JGT | BPF_K:
> 		default:
> 			if (BPF_CLASS(code) == BPF_JMP && BPF_SRC(code) == BPF_K) {
> 				u16 op = BPF_OP(code);
> 				bool op_res;
>
> 				switch (op) {
> 				case BPF_JEQ:
> 					op_res = reg_value == k;
> 					break;
> 				case BPF_JGE:
> 					op_res = reg_value >= k;
> 					break;
> 				case BPF_JGT:
> 					op_res = reg_value > k;
> 					break;
> 				default:
> 					return false; /* can't optimize (unknown insn) */
> 				}
>
> 				pc += op_res ? insn->jt : insn->jf;
> 				break;
> 			}
> 			return false; /* can't optimize (unknown insn) */
> 		}
> 	}
> }

I didn't actually finish going down the emulator path (I stopped right
around the time I verified that libseccomp does use BPF_ALU, though only
BPF_AND), so I didn't evaluate the filters produced by other filter
builders (e.g. Chrome). But if BPF_ALU | BPF_AND were added to your code
above, it would cover everything libseccomp generates (which accounts for
a lot of the seccomp filters in use, e.g. systemd's and Docker's). I just
felt funny about an "incomplete" emulator. Though now you've got me looking.
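For illustration only, here is a rough userspace sketch of that extension. It is not code from this patch series or from the kernel. It starts from the quoted emulator and adds three things: the BPF_ALU | BPF_AND | BPF_K case that libseccomp needs, BPF_JMP | BPF_JSET | BPF_K (one more comparison), and a len bound. The names check_syscall_const() and demo_filter, along with the getpid-only x86-64 policy, are hypothetical. The sketch assumes the Linux UAPI headers are installed.

```c
/*
 * Userspace sketch (hypothetical, not kernel code): the quoted constant-result
 * emulator, extended with BPF_ALU | BPF_AND | BPF_K and BPF_JMP | BPF_JSET | BPF_K.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/*
 * Returns true if @filter always returns the same action for (@arch, @nr),
 * storing it in @action. Returns false on a load of any non-constant field
 * (args, instruction_pointer), on any unsupported instruction, or if the
 * program counter runs past @len.
 */
static bool check_syscall_const(const struct sock_filter *filter, size_t len,
				uint32_t arch, uint32_t nr, uint32_t *action)
{
	uint32_t acc = 0;	/* the cBPF accumulator (A register) */
	size_t pc = 0;

	assert(filter != NULL && action != NULL);

	while (pc < len) {
		const struct sock_filter *insn = &filter[pc++];
		uint32_t k = insn->k;
		bool taken;

		switch (insn->code) {
		case BPF_LD | BPF_W | BPF_ABS:
			if (k == offsetof(struct seccomp_data, nr))
				acc = nr;
			else if (k == offsetof(struct seccomp_data, arch))
				acc = arch;
			else
				return false;	/* not constant for this syscall */
			break;
		case BPF_ALU | BPF_AND | BPF_K:
			acc &= k;		/* a masked constant is still constant */
			break;
		case BPF_RET | BPF_K:
			*action = k;
			return true;
		case BPF_JMP | BPF_JA:
			pc += k;		/* pc already points past this insn */
			break;
		case BPF_JMP | BPF_JEQ | BPF_K:
		case BPF_JMP | BPF_JGE | BPF_K:
		case BPF_JMP | BPF_JGT | BPF_K:
		case BPF_JMP | BPF_JSET | BPF_K:
			if (BPF_OP(insn->code) == BPF_JEQ)
				taken = acc == k;
			else if (BPF_OP(insn->code) == BPF_JGE)
				taken = acc >= k;
			else if (BPF_OP(insn->code) == BPF_JGT)
				taken = acc > k;
			else			/* BPF_JSET */
				taken = (acc & k) != 0;
			pc += taken ? insn->jt : insn->jf;
			break;
		default:
			return false;	/* X register, scratch, other ALU ops, ... */
		}
	}
	return false;	/* fell off the end: not a valid seccomp filter */
}

/* Hypothetical policy: on x86-64 allow getpid (39), fail everything else
 * with EPERM; kill the process on any other architecture. */
static const struct sock_filter demo_filter[] = {
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 39, 0, 1),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
};
```

Classic BPF jumps only move forward, so the loop always terminates. The len check just keeps a malformed filter from reading past the end. The demo filter gives three results:

- (AUDIT_ARCH_X86_64, 39) yields SECCOMP_RET_ALLOW.
- Any other x86-64 syscall yields SECCOMP_RET_ERRNO | EPERM.
- Any other arch yields SECCOMP_RET_KILL_PROCESS.

All three are constant results of the kind a per-arch action bitmap could cache.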
It seems this is the core of Chrome's BPF instruction generation:
https://github.com/chromium/chromium/blob/master/sandbox/linux/bpf_dsl/policy_compiler.cc
It also uses ALU|AND, but adds JMP|JSET. So that's only two more
instructions to cover what I think are likely the two largest seccomp
instruction generators.

> That way, you won't need any of this complicated architecture-specific stuff.

There are two arch-specific needs, and using a cBPF-subset emulator only
removes one of them: the local TLB flush. The other is distinguishing the
archs. Neither requirement is onerous: the TLB flush usually needs little
more than an extern, and the arch value is already provided by each
architecture's syscall_get_arch(). The awkward part I ran into for arm64
was a header include loop for compat, caused by how unistd is handled when
getting NR_syscalls for the bitmap sizing (which I'm sure is solvable, but
I wanted to get the x86 RFC posted first).

-- 
Kees Cook