Date: Tue, 24 Nov 2020 18:21:52 +0100
From: Christian Brauner
To: Greg KH
Cc: Jann Horn, Christoph Hellwig, Kees Cook, Andy Lutomirski,
    Will Drewry, Mark Wielaard, Florian Weimer, Linux API,
    "open list:DOCUMENTATION", kernel list, dev@opencontainers.org,
    Jonathan Corbet, Carlos O'Donell
Subject: Re: [PATCH] syscalls: Document OCI seccomp filter interactions & workaround
Message-ID: <20201124172152.q5egylertvj3zp3w@wittgenstein>
References: <87lfer2c0b.fsf@oldenburg2.str.redhat.com>
    <20201124122639.x4zqtxwlpnvw7ycx@wittgenstein>
    <878saq3ofx.fsf@oldenburg2.str.redhat.com>
    <20201124164546.GA14094@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 24, 2020 at 06:15:36PM +0100, Greg KH wrote:
> On Tue, Nov 24, 2020 at 06:06:38PM +0100, Jann Horn wrote:
> > +seccomp maintainers/reviewers
> > [thread context is at
> > https://lore.kernel.org/linux-api/87lfer2c0b.fsf@oldenburg2.str.redhat.com/
> > ]
> >
> > On Tue, Nov 24, 2020 at 5:49 PM Christoph Hellwig wrote:
> > > On Tue, Nov 24, 2020 at 03:08:05PM +0100, Mark Wielaard wrote:
> > > > For valgrind the issue is statx which we try to use before falling back
> > > > to stat64, fstatat or stat (depending on architecture, not all define
> > > > all of these). The problem with these fallbacks is that under some
> > > > containers (libseccomp versions) they might return EPERM instead of
> > > > ENOSYS. This causes really obscure errors that are really hard to
> > > > diagnose.
> > >
> > > So find a way to detect these completely broken container run times
> > > and refuse to run under them at all. After all they've decided to
> > > deliberately break the syscall ABI. (and yes, we gave them the rope
> > > to do that with seccomp :().
> >
> > FWIW, if the consensus is that seccomp filters that return -EPERM by
> > default are categorically wrong, I think it should be fairly easy to
> > add a check to the seccomp core that detects whether the installed
> > filter returns EPERM for some fixed unused syscall number and, if so,
> > prints a warning to dmesg or something along those lines...
>
> Why? seccomp is saying "this syscall is not permitted", so -EPERM seems
> like the correct error to provide here. It's not -ENOSYS as the syscall
> is present.
>
> As everyone knows, there are other ways to have -EPERM be returned from
> a syscall if you don't have the correct permissions to do something.
> Why is seccomp being singled out here? It's doing the correct thing.

The correct solution to this problem is simple: the standard and the
problematic container runtimes need to be fixed to return ENOSYS as I
said in my first mail.
Imho, the kernel should neither need to log anything nor be opinionated
about which error is correct.
Imho, this is a broken standard and that's where the story ends. We've
had that argument about ENOSYS being the correct errno in such scenarios
in userspace already and that's been ignored for _years_. Now, as could
be expected, it's suddenly the kernel that's supposed to fix this.
That's totally wrong imho.

Christian