From: Jann Horn
Date: Tue, 24 Nov 2020 18:30:28 +0100
Subject: Re: [PATCH] syscalls: Document OCI seccomp filter interactions & workaround
To: Greg KH
Cc: Christoph Hellwig, Kees Cook, Andy Lutomirski, Will Drewry,
    Mark Wielaard, Florian Weimer, Christian Brauner, Linux API,
    "open list:DOCUMENTATION", kernel list, dev@opencontainers.org,
    Jonathan Corbet, "Carlos O'Donell"
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 24, 2020 at 6:15 PM Greg KH wrote:
> On Tue, Nov 24, 2020 at 06:06:38PM +0100, Jann Horn wrote:
> > +seccomp maintainers/reviewers
> > [thread context is at
> > https://lore.kernel.org/linux-api/87lfer2c0b.fsf@oldenburg2.str.redhat.com/
> > ]
> >
> > On Tue, Nov 24, 2020 at 5:49 PM Christoph Hellwig wrote:
> > > On Tue, Nov 24, 2020 at 03:08:05PM +0100, Mark Wielaard wrote:
> > > > For valgrind the issue is statx which we try to use before falling back
> > > > to stat64, fstatat or stat (depending on architecture, not all define
> > > > all of these).
> > > > The problem with these fallbacks is that under some
> > > > containers (libseccomp versions) they might return EPERM instead of
> > > > ENOSYS. This causes really obscure errors that are really hard to
> > > > diagnose.
> > >
> > > So find a way to detect these completely broken container run times
> > > and refuse to run under them at all. After all they've decided to
> > > deliberately break the syscall ABI. (and yes, we gave them the rope
> > > to do that with seccomp :().
> >
> > FWIW, if the consensus is that seccomp filters that return -EPERM by
> > default are categorically wrong, I think it should be fairly easy to
> > add a check to the seccomp core that detects whether the installed
> > filter returns EPERM for some fixed unused syscall number and, if so,
> > prints a warning to dmesg or something along those lines...
>
> Why? seccomp is saying "this syscall is not permitted", so -EPERM seems
> like the correct error to provide here. It's not -ENOSYS as the syscall
> is present.
>
> As everyone knows, there are other ways to have -EPERM be returned from
> a syscall if you don't have the correct permissions to do something.
> Why is seccomp being singled out here? It's doing the correct thing.

AFAIU from what the others have said, it's being singled out because
it means that for two semantically equivalent operations (e.g.
openat() vs open()), one can fail while the other works because the
filter doesn't know about one of the syscalls. Normally semantically
equivalent syscalls are supposed to be subject to the same checks, and
if one of them fails, trying the other one won't help.

But if you can't tell whether the more modern syscall failed because
of a seccomp filter, you may be forced to retry with an older syscall
even on systems where the new syscall works fine, and such a fallback
may reduce security or reliability if you're trying to use some flags
that only the new syscall provides for security, or something like
that. (As a contrived example, imagine being forced to retry any
tgkill() that fails with EPERM as a tkill() just in case you're
running under a seccomp filter.)
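
To make the ambiguity concrete, here is a rough userspace sketch of the
fallback pattern discussed above, using the statx case from earlier in the
thread. It is only an illustration, not what valgrind or glibc actually do:
classify_errno() and file_size() are hypothetical names, and the policy of
treating EPERM as "maybe a filter, retry with the older call" is an
assumption made for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical classification of the errno left by a failed "new" syscall. */
enum fallback_hint {
    NO_FALLBACK,        /* ordinary error; the older syscall would fail too */
    FALLBACK_EXPECTED,  /* ENOSYS: the kernel does not implement the call */
    FALLBACK_AMBIGUOUS  /* EPERM: real denial, or a filter that does not
                           know about the call; userspace cannot tell */
};

static enum fallback_hint classify_errno(int err)
{
    if (err == ENOSYS)
        return FALLBACK_EXPECTED;
    if (err == EPERM)
        return FALLBACK_AMBIGUOUS;
    return NO_FALLBACK;
}

/*
 * Hypothetical helper: report a file's size via statx(), retrying with
 * fstatat() when statx() fails with ENOSYS or EPERM. Returns 0 on
 * success and -1 on failure with errno set by the last syscall tried.
 */
static int file_size(const char *path, long long *size)
{
    assert(path != NULL && size != NULL);

#if defined(SYS_statx) && defined(STATX_SIZE)
    struct statx stx;

    if (syscall(SYS_statx, AT_FDCWD, path, 0, STATX_SIZE, &stx) == 0) {
        *size = (long long)stx.stx_size;
        return 0;
    }
    /* Errors such as ENOENT are final; only ENOSYS and EPERM fall through. */
    if (classify_errno(errno) == NO_FALLBACK)
        return -1;
    /*
     * For EPERM this retry is a guess: on a system where statx() works
     * and the denial is real, the older call is not expected to help.
     */
#endif

    struct stat st;

    if (fstatat(AT_FDCWD, path, &st, 0) != 0)
        return -1;
    *size = (long long)st.st_size;
    return 0;
}
```

A caller would use it as file_size("/etc/hostname", &sz) and check for a
zero return. For a plain size lookup the retry is harmless, but the same
pattern applied to a call whose newer variant takes security-relevant flags
is where the fallback starts to cost something.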