Date: Wed, 8 Jul 2020 18:22:47 +0200
From: Christian Brauner
To: Mathieu Desnoyers
Cc: Florian Weimer, Linus Torvalds, carlos, Thomas Gleixner, linux-kernel,
    Peter Zijlstra, paulmck, Boqun Feng, "H. Peter Anvin", Paul Turner,
    linux-api, Dmitry Vyukov, Neel Natu
Subject: Re: [RFC PATCH for 5.8 3/4] rseq: Introduce RSEQ_FLAG_RELIABLE_CPU_ID
Message-ID: <20200708162247.txdleelcalxkrfjy@wittgenstein>
In-Reply-To: <1448906726.3717.1594222431276.JavaMail.zimbra@efficios.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 08, 2020 at 11:33:51AM -0400, Mathieu Desnoyers wrote:
> [ Context for Linus: I am dropping this RFC patch, but am curious to
> hear your point of view on exposing to user-space which system call
> behavior fixes are present in the kernel, either through feature
> flags or system-call versioning. The intent is to allow user-space
> to make better decisions on whether it should use a system call or
> rely on fallback behavior. ]
>
> ----- On Jul 7, 2020, at 3:55 PM, Florian Weimer fw@deneb.enyo.de wrote:
>
> > * Carlos O'Donell:
> >
> >> It's not a great fit IMO. Just let the kernel version be the arbiter of
> >> correctness.
> >
> > For manual review, sure. But checking it programmatically does not
> > yield good results due to backports. Even those who use the stable
> > kernel series sometimes pick up critical fixes beforehand, so it's not
> > reliable possible for a program to say, “I do not want to run on this
> > kernel because it has a bad version”. We had a recent episode of this
> > with the Go runtime, which tried to do exactly this.
>
> FWIW, the kernel fix backport issue would also be a concern if we exposed
> a numeric "fix level version" with specific system calls: what should
> we do if a distribution chooses to include one fix in the sequence,
> but not others ? Identifying fixes are "feature flags" allow
> cherry-picking specific fixes in a backport, but versions would not
> allow that.
>
> That being said, maybe it's not such a bad thing to _require_ the
> entire series of fixes to be picked in backports, which would be a
> fortunate side-effect of the per-syscall-fix-version approach.
>
> But I'm under the impression that such a scheme ends up versioning
> a system call, which I suspect will be a no-go from Linus' perspective.

I've been following this a little bit. The kernel version itself doesn't
really mean anything and is imho not at all interesting to userspace
applications, especially cross-distro programs. We can't go around and
ask Red Hat, SUSE, Ubuntu, Archlinux, openSUSE and god knows what other
distro what their fixed kernel version is. That's not feasible at all
and not how most programs do it. Sure, a lot of programs name a minimal
kernel version they require but realistically we can't keep bumping it
all the time. So the best strategy for userspace imho has been to
introduce a re-versioned flag or enum that indicates the fixed behavior.

So I would suggest to just introduce

	RSEQ_FLAG_REGISTER_2 = (1 << 2),

that's how these things are usually done (Netlink etc.). So not
introducing a fix bit or whatever but simply re-versioning your
flag/enum. We already deal with this today.

(Also, as a side-note. I see that you're passing struct rseq *rseq with
a length argument but you are not versioning by size. Is that
intentional? That basically somewhat locks you to the current struct
rseq layout and means users might run into problems when you extend
struct rseq in the future as they can't pass the new struct down to
older kernels.
The way we deal with this now - rseq might predate this - is
copy_struct_from_user() (for example in sched_{get,set}attr(),
openat2(), bpf(), clone3(), etc.). Maybe you want to switch to that to
keep rseq extensible? Users can detect the new rseq version by just
passing a larger struct down to the kernel with the extra bytes set to 0
and if rseq doesn't complain they know they're dealing with an rseq that
knows larger struct sizes. Might be worth it if you have any reason to
believe that struct rseq might need to grow.)

Christian
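
For readers unfamiliar with the size-versioning convention mentioned in
the side-note, here is a rough userspace model of the rule that
copy_struct_from_user() applies. It is only a sketch, not kernel code:
the struct names demo_args_v1/demo_args_v2 and the helper
model_copy_struct() are made up for illustration, and the real helper
additionally handles faulting user memory. The rule it models: a shorter
struct from an older caller is zero-extended, and a longer struct from a
newer caller is accepted only if the bytes the callee does not know
about are all zero.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical v1 and v2 layouts of an extensible argument struct. */
struct demo_args_v1 {
    uint64_t flags;
};

struct demo_args_v2 {
    uint64_t flags;
    uint64_t extra;   /* field added by a later version */
};

/*
 * Userspace model of the size-versioned copy (an illustration only).
 * `ksize` is the size the callee knows about, `usize` is the size the
 * caller passed. Returns 0 on success, or -E2BIG if the caller set
 * bytes that the callee does not understand.
 */
static int model_copy_struct(void *dst, size_t ksize,
                             const void *src, size_t usize)
{
    assert(dst != NULL && src != NULL);

    size_t size = usize < ksize ? usize : ksize;
    const unsigned char *tail = (const unsigned char *)src + size;

    if (usize < ksize) {
        /* Older caller: fields it does not know about read as zero. */
        memset((unsigned char *)dst + size, 0, ksize - size);
    } else {
        /* Newer caller: unknown trailing bytes must all be zero. */
        for (size_t i = 0; i < usize - size; i++) {
            if (tail[i] != 0)
                return -E2BIG;
        }
    }

    /* Copy the part both sides agree on. */
    memcpy(dst, src, size);
    return 0;
}
```

In this model, a caller built against demo_args_v2 can pass its larger
struct to a callee that only knows demo_args_v1, and the call succeeds
as long as `extra` is left at 0. Setting `extra` to a non-zero value
makes the older callee reject the request instead of silently ignoring
the field.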