Date: Wed, 27 Mar 2019 22:15:50 +0100 (CET)
From: Thomas Gleixner
To: Andi Kleen
cc: "Chang S. Bae", Ingo Molnar, Andy Lutomirski, "H . Peter Anvin", Ravi Shankar, LKML, Andrew Cooper, x86@kernel.org, Linus Torvalds, Greg KH, Arjan van de Ven
Subject: Re: New feature/ABI review process [was Re: [RESEND PATCH v6 04/12] x86/fsgsbase/64:..]
In-Reply-To: <20190326225638.GQ18020@tassilo.jf.intel.com>

On Tue, 26 Mar 2019, Andi Kleen wrote:
> As long as everything is cache hot it's likely only a couple
> of cycles difference (as Intel CPUs are very good executing
> crappy code too), but if it's not then you end up with a huge cache miss
> cost, causing jitter. That's a problem for real time for example.

That extra cache miss is really not the worst issue for realtime. The
inherent latencies of contemporary systems have way worse to offer than
that. Any realtime system has to cope with the worst case and an extra
cache miss is not the end of the world.

> > > Accessing user GSBASE needs a couple of SWAPGS operations. It is
> > > avoidable if the user GSBASE is saved at kernel entry, being updated as
> > > changes, and restored back at kernel exit. However, it seems to spend
> > > more cycles for savings and restorations. Little or no benefit was
> > > measured from experiments.
> >
> > So little or no benefit was measured. I don't see how that maps to your
> > 'SWAPGS will be a lot faster' claim. One of those claims is obviously
> > wrong.
>
> If everything is cache hot it won't make much difference,
> but if you have a cache miss you end up eating the cost.
>
> >
> > Aside of this needs more than numbers:
> >
> > 1) Proper documentation how the mixed bag is managed.
>
> How SWAPGS is managed?
>
> Like it always was since 20+ years when the x86_64
> port was originally born.

I know how SWAPGS works.

> The only case which has to do an two SWAPGS is the
> context switch when it switches the base. Everything else
> just does SWAPGS at the edges for kernel entries.

And exactly here is the problem. You are not even describing it
correctly now:

You cannot do SWAPGS on _all_ edges. You cannot do SWAPGS in the
paranoid entry when FSGSBASE is in use, because user space can write
arbitrary values into GSBASE. That breaks the existing differentiation
of kernel/user GS. That's why you have the FSGSBASE variant there.

Is that documented? The changelog has some convoluted description of it:

  "The FSGSBASE instructions allow fast accesses on GSBASE. Now, at the
   paranoid_entry, the per-CPU base value can be always copied to GSBASE.
   And the original GSBASE value will be restored at the exit."

So that part blurbs about fast access and comes first. Really useful.

  "So far, GSBASE modification has not been directly allowed from
   userspace. So, swapping GSBASE has been conditionally executed
   according to the kernel-enforced convention that a negative GSBASE
   indicates a kernel value. But when FSGSBASE is enabled, userspace
   can put an arbitrary value in GSBASE. The change will secure a
   correct GSBASE value with FSGSBASE."

I can decode that because I'm familiar with the inner workings of the
paranoid entry code. But that changelog is just not providing properly
structured information and the full context.

What's worse is the comment in the code itself:

+ * When FSGSBASE enabled, current GSBASE is always copied to %rbx.

Where is the documentation that FSGSBASE is required to be used here,
and why? I can bloody well see from the code that the FSGSBASE path
does this unconditionally. But that does not explain why, and it does
not explain why FSGSBASE is used only here instead of replacing SWAPGS
all over the place.

+ * Without FSGSBASE, SWAPGS is needed when entering from userspace.
+ * A positive GSBASE means it is a user value and a negative GSBASE
+ * means it is a kernel value.

So this has more explanation about the SWAPGS mode than about the
subtleties of FSGSBASE.

This stuff wants to be documented at great length for everyone's sake,
including yourself when you have to stare into that code a year from
now. I don't care about your headache, but I care about mine and that
of people who might end up debugging some subtle bug in that area.

Thanks,

	tglx