From: Jann Horn
Date: Tue, 24 Nov 2020 18:47:24 +0100
Subject: Re: [PATCH] syscalls: Document OCI seccomp filter interactions & workaround
To: Greg KH
Cc: Christoph Hellwig, Kees Cook, Andy Lutomirski, Will Drewry,
 Mark Wielaard, Florian Weimer, Christian Brauner, Linux API,
 "open list:DOCUMENTATION", kernel list, dev@opencontainers.org,
 Jonathan Corbet, "Carlos O'Donell"
References: <87lfer2c0b.fsf@oldenburg2.str.redhat.com>
 <20201124122639.x4zqtxwlpnvw7ycx@wittgenstein>
 <878saq3ofx.fsf@oldenburg2.str.redhat.com>
 <20201124164546.GA14094@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 24, 2020 at 6:44 PM Greg KH wrote:
> On Tue, Nov 24, 2020 at 06:30:28PM +0100, Jann Horn wrote:
> > On Tue, Nov 24, 2020 at 6:15 PM Greg KH wrote:
> > > On Tue, Nov 24, 2020 at 06:06:38PM +0100, Jann Horn wrote:
> > > > +seccomp maintainers/reviewers
> > > > [thread context is at
> > > > https://lore.kernel.org/linux-api/87lfer2c0b.fsf@oldenburg2.str.redhat.com/
> > > > ]
> > > >
> > > > On Tue, Nov 24, 2020 at 5:49 PM Christoph Hellwig wrote:
> > > > > On Tue, Nov 24, 2020 at 03:08:05PM +0100, Mark Wielaard wrote:
> > > > > > For valgrind the issue is statx which we try to use before falling back
> > > > > > to stat64, fstatat or stat (depending on architecture, not all define
> > > > > > all of these). The problem with these fallbacks is that under some
> > > > > > containers (libseccomp versions) they might return EPERM instead of
> > > > > > ENOSYS. This causes really obscure errors that are really hard to
> > > > > > diagnose.
> > > > >
> > > > > So find a way to detect these completely broken container run times
> > > > > and refuse to run under them at all. After all they've decided to
> > > > > deliberately break the syscall ABI. (and yes, we gave the the rope
> > > > > to do that with seccomp :().
> > > >
> > > > FWIW, if the consensus is that seccomp filters that return -EPERM by
> > > > default are categorically wrong, I think it should be fairly easy to
> > > > add a check to the seccomp core that detects whether the installed
> > > > filter returns EPERM for some fixed unused syscall number and, if so,
> > > > prints a warning to dmesg or something along those lines...
> > >
> > > Why? seccomp is saying "this syscall is not permitted", so -EPERM seems
> > > like the correct error to provide here. It's not -ENOSYS as the syscall
> > > is present.
> > >
> > > As everyone knows, there are other ways to have -EPERM be returned from
> > > a syscall if you don't have the correct permissions to do something.
> > > Why is seccomp being singled out here? It's doing the correct thing.
> >
> > AFAIU from what the others have said, it's being singled out because
> > it means that for two semantically equivalent operations (e.g.
> > openat() vs open()), one can fail while the other works because the
> > filter doesn't know about one of the syscalls. Normally semantically
> > equivalent syscalls are supposed to be subject to the same checks, and
> > if one of them fails, trying the other one won't help.
>
> They aren't being subject to the same checks, if the seccomp permissions
> are different for both of them, they will get different answers.
>
> Trying to use this to determine if the syscall is present or not is not
> ok, and as Christian just said, needs to be fixed in userspace. We
> can't change the kernel ABI now, odds are someone else relies on the api
> we have had in place and it can not be changed :)

I don't think anyone was proposing changes to existing kernel API.