From: Dave Hansen
Subject: Re: [PATCH 3/7] x86/enter: Use IBRS on syscall and interrupts
To: Tim Chen, Thomas Gleixner, Andy Lutomirski, Linus Torvalds, Greg KH
Cc: Andrea Arcangeli, Andi Kleen, Arjan Van De Ven, linux-kernel@vger.kernel.org
Date: Thu, 4 Jan 2018 12:45:14 -0800
In-Reply-To: <0c525c4c6c817e9c42c7ed583d86dc591a86efde.1515086770.git.tim.c.chen@linux.intel.com>

On 01/04/2018 09:56 AM, Tim Chen wrote:
> If NMI runs when exiting kernel between IBRS_DISABLE and
> SWAPGS, the NMI would have turned on IBRS bit 0 and then it would have
> left enabled when exiting the NMI. IBRS bit 0 would then be left
> enabled in userland until the next enter kernel.
>
> That is a minor inefficiency only, but we can eliminate it by saving
> the MSR when entering the NMI in save_paranoid and restoring it when
> exiting the NMI.

Can I suggest an alternate description for the NMI case? This is
long-winded, but it should keep me from having to think through it yet
again.
:)

"
The normal interrupt code uses the 'error_entry' path which uses the
Code Segment (CS) of the instruction that was interrupted to tell
whether it interrupted the kernel or userspace and thus has to switch
IBRS, or leave it alone.

The NMI code is different. It uses 'paranoid_entry' because it can
interrupt the kernel while it is running with a userspace IBRS (and %GS
and CR3) value, but has a kernel CS. If we used the same approach as
the normal interrupt code, we might do the following:

        SYSENTER_entry
<-------------- NMI HERE
        IBRS=1
                do_something()
        IBRS=0
        SYSRET

The NMI code might notice that we are running in the kernel and decide
that it is OK to skip the IBRS=1. This would leave it running
unprotected with IBRS=0, which is bad.

However, if we unconditionally set IBRS=1 in the NMI, we might get the
following case:

        SYSENTER_entry
        IBRS=1
                do_something()
        IBRS=0
<-------------- NMI HERE (set IBRS=1)
        SYSRET

and we would return to userspace with IBRS=1. Userspace would run
slowly until we entered and exited the kernel again. (This is the case
Tim is alluding to in the patch description.)

Instead of those two approaches, we chose a third one where we simply
save the IBRS value in a scratch register (%r13) and then restore that
value, verbatim. This is what PTI does with CR3 and it works
beautifully.
"
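
For anyone who wants to poke at the three options without reading entry
code, here is a toy Python model of them. It is only a sketch, not kernel
code: the function names, the "user"/"kernel" strings and the
(during, after) return convention are all made up for illustration, and
the userspace branch of the first option is an assumption about how the
normal interrupt path behaves.

```python
# Toy model only: each handler takes the CS that was interrupted and the
# IBRS bit seen at NMI entry, and returns a pair
# (ibrs_while_nmi_runs, ibrs_after_nmi_returns).

def nmi_cs_check(interrupted_cs, ibrs):
    """Option 1: decide from the interrupted CS, like the normal interrupt path."""
    if interrupted_cs == "user":
        # Assumed behaviour: set IBRS on entry from userspace, clear on return.
        return 1, 0
    # Kernel CS: assume IBRS is already correct and leave it alone.
    return ibrs, ibrs

def nmi_always_set(interrupted_cs, ibrs):
    """Option 2: set IBRS=1 unconditionally and never put the old value back."""
    return 1, 1

def nmi_save_restore(interrupted_cs, ibrs):
    """Option 3: stash the old value (%r13 in the patch), set IBRS=1, restore verbatim."""
    saved = ibrs
    return 1, saved

# Both windows in the diagrams look identical to the NMI: kernel CS, IBRS=0.
for handler in (nmi_cs_check, nmi_always_set, nmi_save_restore):
    during, after = handler("kernel", 0)
    print(f"{handler.__name__}: IBRS during NMI={during}, after NMI={after}")
```

In this model only the third handler both runs with IBRS=1 and hands back
the 0 it found, which is what the window before SYSRET needs. In the
window right after SYSENTER_entry the entry code sets IBRS=1 itself once
the NMI returns, so restoring 0 there is harmless.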