Date: Thu, 24 Sep 2020 00:11:01 -0700
From: Kees Cook
To: Jann Horn
Cc: YiFei Zhu, Christian Brauner, Tycho Andersen, Andy Lutomirski, Will Drewry, Andrea Arcangeli, Giuseppe Scrivano, Tobin Feldman-Fitzthum, Dimitrios Skarlatos, Valentin Rothberg, Hubertus Franke, Jack Chen, Josep Torrellas, Tianyin Xu, bpf, Linux Containers, Linux API, kernel list
Subject: Re: [PATCH 1/6] seccomp: Introduce SECCOMP_PIN_ARCHITECTURE
Message-ID: <202009232353.FD011DAA0@keescook>
References: <20200923232923.3142503-1-keescook@chromium.org>
 <20200923232923.3142503-2-keescook@chromium.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 24, 2020 at 02:41:36AM +0200, Jann Horn wrote:
> On Thu, Sep 24, 2020 at 1:29 AM Kees Cook wrote:
> > For systems that provide multiple syscall maps based on audit
> > architectures (e.g. AUDIT_ARCH_X86_64 and AUDIT_ARCH_I386 via
> > CONFIG_COMPAT) or via syscall masks (e.g.
> > x86_x32), allow a fast way
> > to pin the process to a specific syscall table, instead of needing
> > to generate all filters with an architecture check as the first filter
> > action.
> >
> > This creates the internal representation that seccomp itself can use
> > (which is separate from the filters, which need to stay runtime
> > agnostic). Additionally paves the way for constant-action bitmaps.
>
> I don't really see the point in providing this UAPI - the syscall
> number checking will probably have much more performance cost than the
> architecture number check, and it's not like this lets us avoid the
> check, we're just moving it over into C code.

It's desirable for libseccomp and is a request from systemd (which is,
at this point, the largest seccomp user I know of), as they have no way
to force an arch without doing it in filters, which doesn't help much
with reducing filter runtime.

>
> > Signed-off-by: Kees Cook
> > ---
> >  include/linux/seccomp.h                       |  9 +++
> >  include/uapi/linux/seccomp.h                  |  1 +
> >  kernel/seccomp.c                              | 79 ++++++++++++++++++-
> >  tools/testing/selftests/seccomp/seccomp_bpf.c | 33 ++++++++
> >  4 files changed, 120 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> > index 02aef2844c38..0be20bc81ea9 100644
> > --- a/include/linux/seccomp.h
> > +++ b/include/linux/seccomp.h
> > @@ -20,12 +20,18 @@
> >  #include
> >  #include
> >
> > +#define SECCOMP_ARCH_IS_NATIVE		1
> > +#define SECCOMP_ARCH_IS_COMPAT		2
>
> FYI, mips has three different possible "arch" values (per kernel build
> config; the __AUDIT_ARCH_LE flag can also be set, but that's fixed
> based on the config):
>
>  - AUDIT_ARCH_MIPS
>  - AUDIT_ARCH_MIPS | __AUDIT_ARCH_64BIT
>  - AUDIT_ARCH_MIPS | __AUDIT_ARCH_64BIT | __AUDIT_ARCH_CONVENTION_MIPS64_N32
>
> But I guess we can deal with that once someone wants to actually add
> support for this on mips.

Yup!

>
> > +#define SECCOMP_ARCH_IS_MULTIPLEX	3
>
> Why should X32 be handled specially?
> If the seccomp filter allows

Because it's a masked lookup into a separate table: x32 syscall numbers
don't map onto x86_64's table, so for seccomp to correctly figure out
which bitmap to use, it has to do this decoding.

> specific syscalls (as it should), we don't have to care about X32.
> Only in weird cases where the seccomp filter wants to deny specific
> syscalls (a horrible idea), X32 is a concern, and in such cases, the
> userspace code can generate a single conditional jump to deal with it.

I feel like I must not understand what you mean. The x32-aware seccomp
filters are using syscall tests with 0x40000000 included in the values.
So seccomp's bitmap cannot handle these numbers directly, because it
must know how many syscalls to include in a linearly-allocated bitmap.

> And when seccomp is used properly to allow specific syscalls, the
> kernel will just waste time uselessly checking this X32 stuff.

It's not measurable in my tests -- seccomp_data::nr is rather hot in the
cache. ;) That said, if it's unwanted, then CONFIG_X86_X32=n is the way
to go.

> [...]
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> [...]
> > +static long seccomp_pin_architecture(void)
> > +{
> > +#ifdef SECCOMP_ARCH
> > +	struct task_struct *task = current;
> > +
> > +	u8 arch = seccomp_get_arch(syscall_get_arch(task),
> > +				   syscall_get_nr(task, task_pt_regs(task)));
> > +
> > +	/* How did you even get here? */
>
> Via a racing TSYNC, that's how.

Yes; thanks. This will need to take &current->sighand->siglock.

>
> > +	if (task->seccomp.arch && task->seccomp.arch != arch)
> > +		return -EBUSY;
> > +
> > +	task->seccomp.arch = arch;
> > +#endif
> > +	return 0;
> > +}
>
> Why does this return 0 if SECCOMP_ARCH is not defined? That suggests
> to userspace that we have successfully pinned the ABI, even though
> we're actually unable to do so.

Yup; thanks for the catch. This is a logical leftover from the RFC.
This should be, I think:

+	task->seccomp.arch = arch;
+	return 0;
+#else
+	return -EINVAL;
+#endif

-- 
Kees Cook
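For comparison, here is roughly the prologue that libseccomp- or systemd-style filters have to carry today without SECCOMP_PIN_ARCHITECTURE: an architecture check as the first filter action, plus (for x86_64) the single conditional jump Jann mentions for rejecting x32 syscall numbers. This is a sketch using the classic BPF macros from <linux/filter.h> and struct seccomp_data from <linux/seccomp.h>; the example_* and EXAMPLE_* names are hypothetical, the real per-syscall rules are elided, and only x86_64 and arm64 are wired up.

```c
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical helper: the audit arch this filter is pinned to. */
#if defined(__x86_64__)
#define EXAMPLE_NATIVE_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define EXAMPLE_NATIVE_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Bit 30 marks x32 syscall numbers (the kernel's __X32_SYSCALL_BIT). */
#define EXAMPLE_X32_SYSCALL_BIT 0x40000000U

/*
 * Kill anything not on the native audit arch, then kill any syscall
 * number with the x32 bit set (x32 reports AUDIT_ARCH_X86_64, so the
 * arch check alone cannot exclude it). The final ALLOW stands in for
 * the real per-syscall allow-list.
 */
static struct sock_filter example_prologue[] = {
	/* A = seccomp_data.arch */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
	/* arch == native ? skip the kill : fall through to it */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, EXAMPLE_NATIVE_ARCH, 1, 0),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
	/* A = seccomp_data.nr */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	/* nr >= x32 bit ? fall through to the kill : skip it */
	BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, EXAMPLE_X32_SYSCALL_BIT, 0, 1),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
	/* ... per-syscall checks would go here ... */
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

static const struct sock_fprog example_prog = {
	.len = sizeof(example_prologue) / sizeof(example_prologue[0]),
	.filter = example_prologue,
};
```

After prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0), such a program would be installed with prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &example_prog). The x32 jump is kept unconditional here for brevity; only x86_64 actually needs it.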