Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754299AbcCOAPJ (ORCPT ); Mon, 14 Mar 2016 20:15:09 -0400 Received: from mx2.suse.de ([195.135.220.15]:57100 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750853AbcCOAPF (ORCPT ); Mon, 14 Mar 2016 20:15:05 -0400 Date: Tue, 15 Mar 2016 01:15:01 +0100 From: "Luis R. Rodriguez" To: Toshi Kani , Juergen Gross , Boris Ostrovsky , Matt Fleming , Olof Johansson , Paul Stewart Cc: "Luis R. Rodriguez" , Borislav Petkov , Ingo Molnar , "H. Peter Anvin" , Thomas Gleixner , Paul Gortmaker , X86 ML , "linux-kernel@vger.kernel.org" , xen-devel@lists.xensource.com Subject: Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code Message-ID: <20160315001501.GF25147@wotan.suse.de> References: <1457671546-13486-1-git-send-email-toshi.kani@hpe.com> <1457671546-13486-3-git-send-email-toshi.kani@hpe.com> <20160311092400.GB4347@pd.tnic> <1457722632.6393.130.camel@hpe.com> <20160311221747.GC25147@wotan.suse.de> <1457740571.6393.236.camel@hpe.com> <1457745396.6393.257.camel@hpe.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1457745396.6393.257.camel@hpe.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 10315 Lines: 201 On Fri, Mar 11, 2016 at 06:16:36PM -0700, Toshi Kani wrote: > On Fri, 2016-03-11 at 15:34 -0800, Luis R. Rodriguez wrote: > > On Fri, Mar 11, 2016 at 3:56 PM, Toshi Kani wrote: > > > On Fri, 2016-03-11 at 23:17 +0100, Luis R. Rodriguez wrote: > > > > On Fri, Mar 11, 2016 at 11:57:12AM -0700, Toshi Kani wrote: > > > > > On Fri, 2016-03-11 at 10:24 +0100, Borislav Petkov wrote: > > > > > > On Thu, Mar 10, 2016 at 09:45:46PM -0700, Toshi Kani wrote: > > > > > > > MTRR manages PAT initialization as it implements a rendezvous > > > > > > > handler that initializes PAT as part of MTRR initialization. >  : > > > > > > > > > > No, it does not fix it. The problem in this particular case, i.e. > > > > > MTRR disabled by its MSR, is that mtrr_bp_init() calls pat_init() > > > > > (as PAT enabled) and initializes PAT on BSP. After APs are > > > > > launched, we need the MTRR's rendezvous handler to initialize PAT > > > > > on APs to be consistent with BSP. However, MTRR rendezvous handler > > > > > is no-op since MTRR is disabled. > > > > > > > > This seems like a hack on enabling PAT through MTRR code, can we have > > > > a PAT rendezvous handler on its own, or provide a generic rendezvous > > > > handler that lets you deal with whatever interfaces need setup. Then > > > > conflicts can just be negotiated early. > > > > > > The MTRR code can be enhanced so that the rendezvous handler can handle > > > MTRR and PAT state independently.  I noted this case as (*) in the > > > table of this patch description.  This is a separate item, however. > > > > > > MTRR calling PAT was not a hack (as I suppose we did not have VMs at > > > that time), although this can surely be improved.  As Intel SDM state > > > below, both MTRR and PAT require the same procedure, and the PAT > > > initialization sequence is defined in the MTRR section. > > > > > > === > > > 11.12.4 Programming the PAT > > >  : > > > The operating system is responsible for insuring that changes to a PAT > > > entry occur in a manner that maintains the consistency of the processor > > > caches and translation lookaside buffers (TLB). This is accomplished by > > > following the procedure as specified in Section 11.11.8, “MTRR > > > Considerations in MP Systems,” for changing the value of an MTRR in a > > > multiple processor system. It requires a specific sequence of > > > operations that includes flushing the processors caches and TLBs. > > > === > > > > > > > What I'm after is seeing if we can ultimately disable MTRR on kernel > > > > code but still have PAT enabled. I realize you've mentioned BIOS code > > > > may use some MTRR setup code but this is only true for some systems. > > > > I know for a fact Xen cannot use MTRR, it seems qemu32 does not > > > > enable > > > > it either. So why not have the ability to skip through its set up ? > > > > > > MTRR support has two meanings: > > >  1) The kernel keeps the MTRR setup by BIOS. > > >  2) The kernel modifies the MTRR setup. > > > > > > I am in a position that we need 1) but 2). > > > > I take it you meant "but not 2)" ? > > Yes. :) OK -- we are in agreement but we know 1) is only needed for a portion of systems: Xen and qemu32 systems fly with no MTRR set up, and as such it would be incorrect to run MTRR code on such systems. To these systems MTRR functionality code should be dead, since PAT currently depends on MTRR PAT should also be dead but as the report you're fixing shows it wasn't. That's an issue for qemu that uses the regular x86 init path but not for Xen. Its different for Xen as the hypervisor is the one that set up the MSR_IA32_CR_PAT for each CPU. The *only* thing Xen does is: void xen_start_kernel(void) { ... rdmsrl(MSR_IA32_CR_PAT, pat); pat_init_cache_modes(pat); ... } Fortunately we only have to call pat_init_cache_modes() once, its not per-CPU. Xen has shown then that you *can* live with PAT without any of the complex MTRR setup / code. Please add to your table the Xen case as well then as its important to consider. If you make it a strong requirement to have MTRR enabled to enable PAT you'd be disabling PAT on Xen guest boots. As-is then your this patch which calls pat_disable() on mtrr_bp_init() for the case where MTRR is disabled would essentially break PAT on Xen guests, so this cannot be done. It is no longer true that if MTRR is disabled you can force disable PAT. To do what you want you want to do we have to consider Xen. I don't think its a good idea to keep PAT initialization meshed together with MTRR and making it a strong requirement on enabling PAT. The MTRR code is extremely complex. I'd like instead to encourage for us to consider for this situation to let PAT become a first class citizen, if MTRR is disabled but you've enabled PAT you should be able to use it, just as Xen does. There are more reasons to enable such setup than not to. Long term I'm advocating to see if we can get an ACPI legacy MTRR flag that can tell us if the BIOS has MTRR ripped out, then we can at run time also take advantage of ignoring PAT completely as well. Note, if you insisted you didn't want to disable PAT on Xen, you could in theory check for the subarch -- but note that the subarch is unused yet on Xen, even though it was added to the x86 boot protocol years ago. I have a slew of patches to make use of it to help put paravirt_enabled() in the grave, but based on discussions with Ingo, we don't want to spread use of the subarch in random x86 code paths, we want to compartamentalize that. If you still want to follow your approach of just force-disabling PAT on MTRR code if MTRR was disabled you'd have to use semantics to figure out if the boot path came from Xen, to be more specific for Xen PV guest types only... The current agreed approach to avoid directly using subarch is to categorize differences between what some guests need and bare metal under an x86 platform quirk and legacy set of components. On the x86 init path we'd call something check for the subarch and based on that set a series of x86 legacy features / quirks that need to be disabled / enabled. We could add MTRR as one. I'm unifying some of this with a bit of what goes into the ACPI IA-PC boot architecture, see section 5.2.9.3 IA-PC Boot Architecture Flags [0]. In particular the paravirt_enabled() series I'm working on happens to also dabble into the no CMOS RTC case for Xen, generalizing this knocks a bit of birds with one stone. I think we can do the same with MTRR but also be proactive and see if we can get ACPI_FADT_NO_MTRR added as well for a future ACPI spec to enable BIOS manufacturers to rip MTRR out. [0] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf /* Masks for FADT IA-PC Boot Architecture Flags (boot_flags) [Vx]=Introduced in this FADT revision */ #define ACPI_FADT_LEGACY_DEVICES (1) /* 00: [V2] System has LPC or ISA bus devices */ #define ACPI_FADT_8042 (1<<1) /* 01: [V3] System has an 8042 controller on port 60/64 */ #define ACPI_FADT_NO_VGA (1<<2) /* 02: [V4] It is not safe to probe for VGA hardware */ #define ACPI_FADT_NO_MSI (1<<3) /* 03: [V4] Message Signaled Interrupts (MSI) must not be enabled */ #define ACPI_FADT_NO_ASPM (1<<4) /* 04: [V4] PCIe ASPM control must not be enabled */ #define ACPI_FADT_NO_CMOS_RTC (1<<5) /* 05: [V5] No CMOS real-time clock present */ > > There *are folks however who do > > more as I noted earlier. Perhaps not now, but in the future I'd > > encourage folks to rip MTRR out of their own BIOS, and enable a new > > ACPI legacy flag to say "MTRR required". That'd eventually can help > > bury MTRR for good while remaining backward compatible. > > Well, BIOS using MTRR is better than BIOS setting page tables in the SMI > handler. Can some BIOSes be developed without MTRR? For instance I suspect Google might be able to easily pull of ripping MTRR out of their BIOS if they didn't do it already for the ChromeOS devices. If possible not only should it help with removing complexity on the BIOS but not even having to think about that code *ever* running on the kernel at all should be nice. > The kernel can be ignorant of the MTRR setup as long as it does > not modify it. Sure, we're already there. The kernel no longer modifies the MTRR setup unless of course you boot without PAT enabled. I think we need to move beyond that to ACPI if we can to let regular Linux boots never have to deal with MTRR at all. The code is complex and nasty why not put let folks put a nail on the coffin for good? > > I can read the above description to also say: > > > > "Hey you need to implement PAT with the same skeleton code as MTRR" > > No, I did not say that.  MTRR's rendezvous handler can be generalized to > work with both MTRR and PAT.  We do not need two separate handlers.  In > fact, it needs to be a single handler so that both can be initialized > together. I'm not sure if that's really needed. Doesn't PAT just require setting the wrmsrl(MSR_IA32_CR_PAT, pat) for each AP? > > If we do that, we can pave the way to deprecate MTRR as legacy for > > good first on Linux. > > I do not think such change will deprecate MTRR. Not even for shiny new BIOSes? Post ACPI 5? > It just means that Linux can enable PAT on virtual CPUs with PAT & !MTRR capability. > > > > In fact, the kernel disabling MTRRs is the same as 2). > > > > > > > I'll also note Xen managed to enable PAT only without enabling MTRR, > > > > this was done through pat_init_cache_modes() -- not sure if this can > > > > be leveraged for qemu32... > > > > > > I am interested to know how Xen managed this.  Is this done by the Xen > > > hypervisor initializes guest's PAT on behalf of the guest kernel? > > > > Yup. And the cache read thingy was reading back its own setup, which > > was different than what Linux used by default IIRC. Juergen can > > elaborate more. > > Yeah, I'd like to make sure that my changes won't break it. I checked through code inspection and indeed, it seems it would break Xen's PAT setup. For the record: the issue here was code that should not run ran, that is dead code ran. I'm working towards a generic solution for this. Luis