Date: Fri, 5 Apr 2019 10:35:48 +0200 (CEST)
From: Thomas Gleixner
To: Chang S. Bae
Cc: Ingo Molnar, Andy Lutomirski, H. Peter Anvin, Andi Kleen, Ravi Shankar, LKML, Dave Hansen
Subject: Re: [RESEND PATCH v6 08/12] x86/fsgsbase/64: Use the per-CPU base as GSBASE at the paranoid_entry
References: <1552680405-5265-1-git-send-email-chang.seok.bae@intel.com> <1552680405-5265-9-git-send-email-chang.seok.bae@intel.com>

On Mon, 25 Mar 2019, Thomas Gleixner wrote:
> On Fri, 15 Mar 2019, Chang S. Bae wrote:
> >  ENTRY(paranoid_exit)
> >      UNWIND_HINT_REGS
> >      DISABLE_INTERRUPTS(CLBR_ANY)
> >      TRACE_IRQS_OFF_DEBUG
> > +    ALTERNATIVE "jmp .Lparanoid_exit_no_fsgsbase", "nop",\
> > +        X86_FEATURE_FSGSBASE
> > +    wrgsbase %rbx
> > +    jmp .Lparanoid_exit_no_swapgs;
>
> Again. A few newlines would make it more readable.
>
> This modifies the semantics of paranoid_entry and paranoid_exit. Looking at
> the usage sites there is the following code in the nmi maze:
>
>     /*
>      * Use paranoid_entry to handle SWAPGS, but no need to use paranoid_exit
>      * as we should not be calling schedule in NMI context.
>      * Even with normal interrupts enabled. An NMI should not be
>      * setting NEED_RESCHED or anything that normal interrupts and
>      * exceptions might do.
>      */
>     call    paranoid_entry
>     UNWIND_HINT_REGS
>
>     /* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
>     movq    %rsp, %rdi
>     movq    $-1, %rsi
>     call    do_nmi
>
>     /* Always restore stashed CR3 value (see paranoid_entry) */
>     RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
>
>     testl   %ebx, %ebx          /* swapgs needed? */
>     jnz     nmi_restore
> nmi_swapgs:
>     SWAPGS_UNSAFE_STACK
> nmi_restore:
>     POP_REGS
>
> I might be missing something, but how is that supposed to work when
> paranoid_entry uses FSGSBASE? I think it's broken, but if it's not then
> there is a big fat comment missing explaining why.

So this _is_ broken.

  On entry:

     rbx = rdgsbase()
     wrgsbase(KERNEL_GS)

  On exit:

     if (ebx == 0)
         swapgs

The resulting matrix:

  |  ENTRY GS   | RBX        | EXIT      | GS on IRET | RESULT
  |             |            |           |            |
1 |  KERNEL_GS  | KERNEL_GS  | EBX == 0  | USER_GS    | FAIL
  |             |            |           |            |
2 |  KERNEL_GS  | KERNEL_GS  | EBX != 0  | KERNEL_GS  | ok
  |             |            |           |            |
3 |  USER_GS    | USER_GS    | EBX == 0  | USER_GS    | ok
  |             |            |           |            |
4 |  USER_GS    | USER_GS    | EBX != 0  | KERNEL_GS  | FAIL

#1 Just works by chance because it's unlikely that the lower 32bits of a
   per CPU kernel GS are all 0.

   But it's just a question of probability that this turns into a
   non-debuggable once per year crash (think KASLR).

#4 This can happen when the NMI hits the kernel in some other entry code
   _BEFORE_ or _AFTER_ swapgs.

   User space using GS addressing with GS[31:0] != 0 will crash and burn.

IIRC FSGSBASE is about fast user space GS switching with (almost) no limits
on the value ...

Oh well.

        tglx
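
The entry/exit logic and the four-row matrix above can be checked mechanically. The following is only a minimal C sketch of that reasoning, not kernel code. All names (gs_on_iret, nmi_roundtrip_ok, print_matrix, the rows table) and the GS base values are made up for illustration, and swapgs is modelled as a plain toggle to USER_GS, the same simplification the matrix uses, rather than as the real exchange with the kernel GS base MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Abstract GS state as used in the matrix: which base is live, not its value. */
enum gs_state { KERNEL_GS, USER_GS };

/*
 * NMI exit path as quoted: entry has already done wrgsbase(KERNEL_GS), so
 * the live GS is KERNEL_GS and "testl %ebx, %ebx; jnz nmi_restore" decides
 * whether swapgs runs. Assumption: swapgs simply flips to USER_GS.
 */
enum gs_state gs_on_iret(uint64_t rbx)
{
	uint32_t ebx = (uint32_t)rbx;	/* testl only sees the low 32 bits */

	return ebx == 0 ? USER_GS : KERNEL_GS;
}

/* The NMI must return with the same GS state it interrupted. */
bool nmi_roundtrip_ok(enum gs_state entry_gs, uint64_t entry_gsbase)
{
	uint64_t rbx = entry_gsbase;	/* rbx = rdgsbase() on entry */

	return gs_on_iret(rbx) == entry_gs;
}

/* Hypothetical base values, chosen only to hit each row of the matrix. */
struct matrix_row {
	enum gs_state entry_gs;
	uint64_t gsbase;
};

const struct matrix_row rows[4] = {
	{ KERNEL_GS, 0xffff888100000000ULL },	/* 1: EBX == 0 */
	{ KERNEL_GS, 0xffff88813fc00000ULL },	/* 2: EBX != 0 */
	{ USER_GS,   0x0000000100000000ULL },	/* 3: EBX == 0 */
	{ USER_GS,   0x00007f0012345000ULL },	/* 4: EBX != 0 */
};

/* Print one line per matrix row: entry state, EBX test, verdict. */
void print_matrix(void)
{
	for (int i = 0; i < 4; i++) {
		bool ok = nmi_roundtrip_ok(rows[i].entry_gs, rows[i].gsbase);

		printf("%d | %-9s | EBX %s 0 | %s\n", i + 1,
		       rows[i].entry_gs == KERNEL_GS ? "KERNEL_GS" : "USER_GS",
		       (uint32_t)rows[i].gsbase == 0 ? "==" : "!=",
		       ok ? "ok" : "FAIL");
	}
}

/* Assert that the model matches the matrix: rows 1 and 4 fail, 2 and 3 pass. */
void self_check(void)
{
	assert(!nmi_roundtrip_ok(rows[0].entry_gs, rows[0].gsbase));
	assert(nmi_roundtrip_ok(rows[1].entry_gs, rows[1].gsbase));
	assert(nmi_roundtrip_ok(rows[2].entry_gs, rows[2].gsbase));
	assert(!nmi_roundtrip_ok(rows[3].entry_gs, rows[3].gsbase));
}
```

Calling self_check() and print_matrix() from a main() exercises all four rows. The point of the sketch is that the exit decision depends only on the low 32 bits of the saved base, which carry no information about whether the NMI interrupted kernel or user GS, so two of the four combinations necessarily come out wrong.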