Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752371AbbKIVIG (ORCPT ); Mon, 9 Nov 2015 16:08:06 -0500 Received: from mail-ig0-f174.google.com ([209.85.213.174]:37435 "EHLO mail-ig0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750805AbbKIVID (ORCPT ); Mon, 9 Nov 2015 16:08:03 -0500 MIME-Version: 1.0 In-Reply-To: References: <20151103111649.GA3477@gmail.com> <20151104233907.GA25925@codemonkey.org.uk> <20151105021710.GA22941@codemonkey.org.uk> <20151106065549.GA2031@gmail.com> <20151106123912.GC2651@codeblueprint.co.uk> <20151107070922.GC6235@gmail.com> Date: Mon, 9 Nov 2015 13:08:01 -0800 X-Google-Sender-Auth: _gU9vkNx1MTty6aM35bQF-CjmRI Message-ID: Subject: Re: [GIT PULL] x86/mm changes for v4.4 From: Kees Cook To: Ard Biesheuvel Cc: Ingo Molnar , Matt Fleming , Linus Torvalds , Stephen Smalley , Dave Jones , Linux Kernel Mailing List , Thomas Gleixner , "H. Peter Anvin" , Borislav Petkov , Andrew Morton , Andy Lutomirski , Denys Vlasenko , "linux-efi@vger.kernel.org" , Matthew Garrett Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6921 Lines: 152 On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvel wrote: > On 8 November 2015 at 07:58, Kees Cook wrote: >> On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel >> wrote: >>> On 7 November 2015 at 08:09, Ingo Molnar wrote: >>>> >>>> * Matt Fleming wrote: >>>> >>>>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: >>>>> > >>>>> > 3) We should fix the EFI permission problem without relying on the firmware: it >>>>> > appears we could just mark everything R-X optimistically, and if a write fault >>>>> > happens (it's pretty rare in fact, only triggers when we write to an EFI >>>>> > variable and so), we can mark the faulting page RW- on the fly, because it >>>>> > appears that writable EFI sections, while not enumerated very well in 'old' >>>>> > firmware, are still supposed to be page granular. (Even 'new' firmware I >>>>> > wouldn't automatically trust to get the enumeration right...) >>>>> >>>>> Sorry, this isn't true. I misled you with one of my earlier posts on >>>>> this topic. Let me try and clear things up... >>>>> >>>>> Writing to EFI regions has to do with every invocation of the EFI >>>>> runtime services - it's not limited to when you read/write/delete EFI >>>>> variables. In fact, EFI variables really have nothing to do with this >>>>> discussion, they're a completely opaque concept to the OS, we have no >>>>> idea how the firmware implements them. Everything is done via the EFI >>>>> boot/runtime services. >>>>> >>>>> The firmware itself will attempt to write to EFI regions when we >>>>> invoke the EFI services because that's where the PE/COFF ".data" and >>>>> ".bss" sections live along with the heap. There's even some relocation >>>>> fixups that occur as SetVirtualAddressMap() time so it'll write to >>>>> ".text" too. >>>>> >>>>> Now, the above PE/COFF sections are usually (always?) contained within >>>>> EFI regions of type EfiRuntimeServicesCode. We know this is true >>>>> because the firmware folks have told us so, and because stopping that >>>>> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI >>>>> V2.5. >>>>> >>>>> The data sections within the region are also *not* guaranteed to be >>>>> page granular because work was required in Tianocore for emitting >>>>> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE >>>>> support. >>>>> >>>>> Ultimately, what this means is that if you were to attempt to >>>>> dynamically fixup those regions that required write permission, you'd >>>>> have to modify the mappings for the majority of the EFI regions >>>>> anyway. And if you're blindly allowing write permission as a fixup, >>>>> there's not much security to be had. >>>> >>>> I think you misunderstood my suggestion: the 'fixup' would be changing it from R-X >>>> to RW-, i.e. it would add 'write' permission but remove 'execute' permission. >>>> >>>> Note that there would be no 'RWX' permission at any given moment - which is the >>>> dangerous combination. >>>> >>> >>> The problem with that is that /any/ page in the UEFI runtime region >>> may intersect with both .text and .data of any of the PE/COFF images >>> that make up the runtime firmware (since the PE/COFF sections are not >>> necessarily page aligned). Such pages require RWX permissions. The >>> UEFI memory map does not provide the information to identify those >>> pages a priori (the entire region containing several PE/COFF images >>> could be covered by a single entry) so it is hard to guess which pages >>> should be allowed these RWX permissions. >> >> I'm sad that UEFI was designed without even the most basic of memory >> protections in mind. UEFI _itself_ should be setting up protective >> page mappings. :( >> > > Well, the 4 KB alignment of sections was considered prohibitive at the > time from code size pov. But this was a long time ago, obviously. Heh, yeah, I'd expect max 4K padding to get code/data correctly aligned on a 2MB binary to not be an issue. :) > >> For a boot firmware, it seems to me that safe page table layout would >> be a top priority bug. The "reporting issues" page for TianoCore >> doesn't actually seem to link to the "Project Tracker": >> https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues >> >> Does anyone know how to get this correctly reported so future UEFI >> releases don't suffer from this? >> > > Ugh. Don't get me started on that topic. I have been working with the > UEFI forum since July to get a fundamentally broken implementation of > memory protections fixed. UEFI v2.5 defines a memory protection scheme > that is based on splitting PE/COFF images into separate memory regions > so that R-X and RW- permissions can be applied. Unfortunately, that > broke every OS in existence (including Windows 8), since the OS is > allowed to reorder memory regions when it lays out the virtual > remapping of the UEFI regions, resulting in PE/COFF .data and .text > potentially appearing out of order. > > The good news is that we fixed it for the upcoming release (v2.6). I > can't disclose any specifics, though :-( As long as there's motion to getting it fixed, that makes me happy! :) Does 2.6 get rid of the (AIUI) 2MB limit too? -Kees > > -- > Ard. > > >>>>> > If that 'supposed to be' turns out to be 'not true' (not unheard of in >>>>> > firmware land), then plan B would be to mark pages that generate write faults >>>>> > RWX as well, to not break functionality. (This 'mark it RWX' is not something >>>>> > that exploits would have easy access to, and we could also generate a warning >>>>> > [after the EFI call has finished] if it ever triggers.) >>>>> > >>>>> > Admittedly this approach might not be without its own complications, but it >>>>> > looks reasonably simple (I don't think we need per EFI call page tables, >>>>> > etc.), and does not assume much about the firmware being able to enumerate its >>>>> > permissions properly. Were we to merge EFI support today I'd have insisted on >>>>> > trying such an approach from day 1 on. >>>>> >>>>> We already have separate EFI page tables, though with the caveat that >>>>> we share some of swapper_pg_dir's PGD entries. The best solution would >>>>> be to stop sharing entries and isolate the EFI mappings from every >>>>> other page table structure, so that they're only used during the EFI >>>>> service calls. >>>> >>>> Absolutely. Can you try to fix this for v4.3? >>>> >>>> Thanks, >>>> >>>> Ingo >> >> >> >> -- >> Kees Cook >> Chrome OS Security -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/