Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp3030428yba; Mon, 6 May 2019 15:58:02 -0700 (PDT) X-Google-Smtp-Source: APXvYqyMU6Hr+52JY4nFjB7zvYLeIiaw5Y4yxoZ9k7h++Rxe2qJWFONIVKNUBlwoWbG9d0UVaFJb X-Received: by 2002:a65:5c82:: with SMTP id a2mr35861582pgt.378.1557183482856; Mon, 06 May 2019 15:58:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1557183482; cv=none; d=google.com; s=arc-20160816; b=Tg50Dmc+TY69EpRwYqXNn0AJGkslHwAo2ruRNhlUj2U+J5n6i1kan45u13gU1IJSYg i4bDgcsuniE8icQZJbfUvkq3uWDYkLbNZtXTYZ6iyvhR4k5S1B2WjJNWT2zq/7ObSHOI EPh2s8yRaBRGxd4yW3MySFm6vKCBiRK4DnWsuy8qDeScyV4UnLpCgEZ7KYqm+F7xlIGF aFi1kox27mqqtjkDCQ+jr3HaU68hjnh893sPStvnxFS1Athzzv65KPJyuzvn+cSnmauC Jqeg2U+5lL3BEJ1yz1V+zw5t5NezdCwx22zU8haO2J1+9h07hu6MSwYp4+6pune+UA1H JiUQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:content-transfer-encoding :content-id:content-language:accept-language:in-reply-to:references :message-id:date:thread-index:thread-topic:subject:cc:to:from; bh=HVy+bBJmWhKlelMSVR0c1BDAPCdacJX7C0WAFTX/39c=; b=e5WoOCbv8fShdJjoYR8SNmjnjcfUP4MA71lESR1Lyn4M2WZvkoMXNex97TOP5tTbXu BplsTEegqkzeBB3O3Py4JFgNuzLsmAMsNBgvwaGf1AxEhmJkNq789otdYkj6wqOGH3vz XKFPL3GwUGugTguNbI7MiI/RNsTKPyA6fnEw+QG6l9PPfXEeHWGVyQU6gds/F8vVabRx msmqvblhyA+DO4TURzMJtO0BQNiRs2t5aEJ7YPmPKvlWuF+AHdm8UQBqq50CIy4R3tWM rY3zygjSRuAQnlLc5hdhXevwbE/go4ItIr1aV9ao6LhwJFlMHPp1alYoXswD92CFqVZh T3QA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id o32si18607367pld.190.2019.05.06.15.57.47; Mon, 06 May 2019 15:58:02 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726337AbfEFW4p convert rfc822-to-8bit (ORCPT + 99 others); Mon, 6 May 2019 18:56:45 -0400 Received: from mga04.intel.com ([192.55.52.120]:19501 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726218AbfEFW4p (ORCPT ); Mon, 6 May 2019 18:56:45 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 May 2019 15:56:44 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.60,439,1549958400"; d="scan'208";a="137572560" Received: from fmsmsx107.amr.corp.intel.com ([10.18.124.205]) by orsmga007.jf.intel.com with ESMTP; 06 May 2019 15:56:44 -0700 Received: from crsmsx103.amr.corp.intel.com (172.18.63.31) by fmsmsx107.amr.corp.intel.com (10.18.124.205) with Microsoft SMTP Server (TLS) id 14.3.408.0; Mon, 6 May 2019 15:56:44 -0700 Received: from crsmsx101.amr.corp.intel.com ([169.254.1.116]) by CRSMSX103.amr.corp.intel.com ([169.254.4.184]) with mapi id 14.03.0415.000; Mon, 6 May 2019 16:56:41 -0600 From: "Bae, Chang Seok" To: Thomas Gleixner CC: Ingo Molnar , Andy Lutomirski , "H . Peter Anvin" , Andi Kleen , "Shankar, Ravi V" , LKML , Dave Hansen Subject: Re: [RESEND PATCH v6 08/12] x86/fsgsbase/64: Use the per-CPU base as GSBASE at the paranoid_entry Thread-Topic: [RESEND PATCH v6 08/12] x86/fsgsbase/64: Use the per-CPU base as GSBASE at the paranoid_entry Thread-Index: AQHU22qy2YcgqOSDVkqDsKh+C1SMYaYcjFAAgELfX4A= Date: Mon, 6 May 2019 22:56:41 +0000 Message-ID: <4A00F484-CA9B-4A75-9E33-F38CFD8B9638@intel.com> References: <1552680405-5265-1-git-send-email-chang.seok.bae@intel.com> <1552680405-5265-9-git-send-email-chang.seok.bae@intel.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.144.153.227] Content-Type: text/plain; charset="us-ascii" Content-ID: Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On Mar 25, 2019, at 02:44, Thomas Gleixner wrote: > > On Fri, 15 Mar 2019, Chang S. Bae wrote: > >> The FSGSBASE instructions allow fast accesses on GSBASE. Now, at the >> paranoid_entry, the per-CPU base value can be always copied to GSBASE. >> And the original GSBASE value will be restored at the exit. > > Again you are describing WHAT but not the WHY. > >> So far, GSBASE modification has not been directly allowed from userspace. >> So, swapping GSBASE has been conditionally executed according to the >> kernel-enforced convention that a negative GSBASE indicates a kernel value. >> But when FSGSBASE is enabled, userspace can put an arbitrary value in >> GSBASE. The change will secure a correct GSBASE value with FSGSBASE. > > So that's some WHY, but it should be explained _BEFORE_ explaining the > change. This changelog style is as bad as top posting. Why? > > 1) FSGSBASE is fast > > 2) Copy GSBASE always on paranoid exit and restore on entry > > 3) Explain the context > > No. You want to explain context first and then explain why this needs a > change when FSGSBASE is enabled and how that change looks like at the > conceptual level. > >> Also, factor out the RDMSR-based GSBASE read into a new macro, >> READ_MSR_GSBASE. > > This new macro is related to this change in what way? None AFAICT. I'm fine > with the macro itself, but the benefit for a single usage site is dubious. > > Adding this macro and using it should be done with a separate patch before > this one, so this patch becomes simpler to review. > >> /* >> @@ -1178,9 +1185,38 @@ ENTRY(paranoid_entry) >> * This is also why CS (stashed in the "iret frame" by the >> * hardware at entry) can not be used: this may be a return >> * to kernel code, but with a user CR3 value. >> + * >> + * As long as this PTI macro doesn't depend on kernel GSBASE, >> + * we can do it early. This is because FIND_PERCPU_BASE >> + * references data in kernel space. > > It's not about 'can do it early'. The FSGSBASE handling requires that the > kernel page tables are switched in. > > And for review and bisectability sake moving the CR3 switch in front of the > GS handling should be done as a separate preparatory patch. > >> */ >> SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14 >> >> + /* >> + * Read GSBASE by RDGSBASE. Kernel GSBASE is found >> + * from the per-CPU offset table with a CPU NR. > > That CPU NR comes out of thin air, right? This code is complex enough by > itself and does not need further confusion by comments which need a crystal > ball for decoding. > >> + */ > > Sigh. I can't see how that comment explains the ALTERNATIVE jump. > >> + ALTERNATIVE "jmp .Lparanoid_entry_no_fsgsbase", "",\ >> + X86_FEATURE_FSGSBASE > > Please separate the above from the below with a new line for readability > sake. > >> + rdgsbase %rbx >> + FIND_PERCPU_BASE %rax >> + wrgsbase %rax > > So this really should be wrapped in a macro like: > > SAVE_AND_SET_GSBASE %rbx, %rax > > which makes it entirely clear what this is about. > >> + ret >> + > >> @@ -1194,12 +1230,21 @@ END(paranoid_entry) >> * be complicated. Fortunately, we there's no good reason >> * to try to handle preemption here. >> * >> - * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it) >> + * On entry, >> + * With FSGSBASE, >> + * %rbx is original GSBASE that needs to be restored on the exit >> + * Without that, >> + * %ebx is "no swapgs" flag (1: don't need swapgs, 0: need it) >> */ >> ENTRY(paranoid_exit) >> UNWIND_HINT_REGS >> DISABLE_INTERRUPTS(CLBR_ANY) >> TRACE_IRQS_OFF_DEBUG >> + ALTERNATIVE "jmp .Lparanoid_exit_no_fsgsbase", "nop",\ >> + X86_FEATURE_FSGSBASE >> + wrgsbase %rbx >> + jmp .Lparanoid_exit_no_swapgs; > > Again. A few newlines would make it more readable. > > This modifies the semantics of paranoid_entry and paranoid_exit. Looking at > the usage sites there is the following code in the nmi maze: > > /* > * Use paranoid_entry to handle SWAPGS, but no need to use paranoid_exit > * as we should not be calling schedule in NMI context. > * Even with normal interrupts enabled. An NMI should not be > * setting NEED_RESCHED or anything that normal interrupts and > * exceptions might do. > */ > call paranoid_entry > UNWIND_HINT_REGS > > /* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */ > movq %rsp, %rdi > movq $-1, %rsi > call do_nmi > > /* Always restore stashed CR3 value (see paranoid_entry) */ > RESTORE_CR3 scratch_reg=%r15 save_reg=%r14 > > testl %ebx, %ebx /* swapgs needed? */ > jnz nmi_restore > nmi_swapgs: > SWAPGS_UNSAFE_STACK > nmi_restore: > POP_REGS > Oh!, almost miss this bit. Will be terrifying if leave them like this way. > I might be missing something, but how is that supposed to work when > paranoid_entry uses FSGSBASE? I think it's broken, but if it's not then > there is a big fat comment missing explaining why. > You will see a revision shortly. Thanks Chang