Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751864AbbKJULW (ORCPT ); Tue, 10 Nov 2015 15:11:22 -0500 Received: from mail-ig0-f172.google.com ([209.85.213.172]:34098 "EHLO mail-ig0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751123AbbKJULU (ORCPT ); Tue, 10 Nov 2015 15:11:20 -0500 MIME-Version: 1.0 In-Reply-To: References: <20151103111649.GA3477@gmail.com> <20151104233907.GA25925@codemonkey.org.uk> <20151105021710.GA22941@codemonkey.org.uk> <20151106065549.GA2031@gmail.com> <20151106123912.GC2651@codeblueprint.co.uk> <20151107070922.GC6235@gmail.com> Date: Tue, 10 Nov 2015 12:11:18 -0800 X-Google-Sender-Auth: S3bvG3I4UbBC8-HVmzTWfsvyrzQ Message-ID: Subject: Re: [GIT PULL] x86/mm changes for v4.4 From: Kees Cook To: Ard Biesheuvel Cc: Ingo Molnar , Matt Fleming , Linus Torvalds , Stephen Smalley , Dave Jones , Linux Kernel Mailing List , Thomas Gleixner , "H. Peter Anvin" , Borislav Petkov , Andrew Morton , Andy Lutomirski , Denys Vlasenko , "linux-efi@vger.kernel.org" , Matthew Garrett Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8168 Lines: 161 On Mon, Nov 9, 2015 at 11:08 PM, Ard Biesheuvel wrote: > On 9 November 2015 at 22:08, Kees Cook wrote: >> On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvel >> wrote: >>> On 8 November 2015 at 07:58, Kees Cook wrote: >>>> On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel >>>> wrote: >>>>> On 7 November 2015 at 08:09, Ingo Molnar wrote: >>>>>> >>>>>> * Matt Fleming wrote: >>>>>> >>>>>>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote: >>>>>>> > >>>>>>> > 3) We should fix the EFI permission problem without relying on the firmware: it >>>>>>> > appears we could just mark everything R-X optimistically, and if a write fault >>>>>>> > happens (it's pretty rare in fact, only triggers when we write to an EFI >>>>>>> > variable and so), we can mark the faulting page RW- on the fly, because it >>>>>>> > appears that writable EFI sections, while not enumerated very well in 'old' >>>>>>> > firmware, are still supposed to be page granular. (Even 'new' firmware I >>>>>>> > wouldn't automatically trust to get the enumeration right...) >>>>>>> >>>>>>> Sorry, this isn't true. I misled you with one of my earlier posts on >>>>>>> this topic. Let me try and clear things up... >>>>>>> >>>>>>> Writing to EFI regions has to do with every invocation of the EFI >>>>>>> runtime services - it's not limited to when you read/write/delete EFI >>>>>>> variables. In fact, EFI variables really have nothing to do with this >>>>>>> discussion, they're a completely opaque concept to the OS, we have no >>>>>>> idea how the firmware implements them. Everything is done via the EFI >>>>>>> boot/runtime services. >>>>>>> >>>>>>> The firmware itself will attempt to write to EFI regions when we >>>>>>> invoke the EFI services because that's where the PE/COFF ".data" and >>>>>>> ".bss" sections live along with the heap. There's even some relocation >>>>>>> fixups that occur as SetVirtualAddressMap() time so it'll write to >>>>>>> ".text" too. >>>>>>> >>>>>>> Now, the above PE/COFF sections are usually (always?) contained within >>>>>>> EFI regions of type EfiRuntimeServicesCode. We know this is true >>>>>>> because the firmware folks have told us so, and because stopping that >>>>>>> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI >>>>>>> V2.5. >>>>>>> >>>>>>> The data sections within the region are also *not* guaranteed to be >>>>>>> page granular because work was required in Tianocore for emitting >>>>>>> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE >>>>>>> support. >>>>>>> >>>>>>> Ultimately, what this means is that if you were to attempt to >>>>>>> dynamically fixup those regions that required write permission, you'd >>>>>>> have to modify the mappings for the majority of the EFI regions >>>>>>> anyway. And if you're blindly allowing write permission as a fixup, >>>>>>> there's not much security to be had. >>>>>> >>>>>> I think you misunderstood my suggestion: the 'fixup' would be changing it from R-X >>>>>> to RW-, i.e. it would add 'write' permission but remove 'execute' permission. >>>>>> >>>>>> Note that there would be no 'RWX' permission at any given moment - which is the >>>>>> dangerous combination. >>>>>> >>>>> >>>>> The problem with that is that /any/ page in the UEFI runtime region >>>>> may intersect with both .text and .data of any of the PE/COFF images >>>>> that make up the runtime firmware (since the PE/COFF sections are not >>>>> necessarily page aligned). Such pages require RWX permissions. The >>>>> UEFI memory map does not provide the information to identify those >>>>> pages a priori (the entire region containing several PE/COFF images >>>>> could be covered by a single entry) so it is hard to guess which pages >>>>> should be allowed these RWX permissions. >>>> >>>> I'm sad that UEFI was designed without even the most basic of memory >>>> protections in mind. UEFI _itself_ should be setting up protective >>>> page mappings. :( >>>> >>> >>> Well, the 4 KB alignment of sections was considered prohibitive at the >>> time from code size pov. But this was a long time ago, obviously. >> >> Heh, yeah, I'd expect max 4K padding to get code/data correctly >> aligned on a 2MB binary to not be an issue. :) >> > > This is not about section sizes on ARM. The PE/COFF format does not > use segments, like ELF, so the payload (the sections) needs to be > completely disjoint from the header. This means, when using 4 KB > alignment, that every PE/COFF image wastes ~4 KB in the header and 4 > KB on average in the section padding (assuming a .text/.data/.reloc > layout, as is common with PE/COFF) > > Considering that a typical UEFI firmware image consists of numerous > (around 50 on average, I think) PE/COFF images, and some of them Oooh, that's no fun. So the linker can't produce merged .text and .data sections? > execute from NOR flash, the Tianocore tooling (which is the reference > implementation) has always been geared towards keeping the alignment > as small as possible, typically 32 bytes unless data objects need > more. Since the UEFI runtime services are typically implemented by > several of these PE/COFF images, and since the memory they occupy may > be described by a single UEFI memory map entry, there is simply no > easy way to decide which pages need R-X, RW- or RWX. Even looking for > PE/COFF headers in the memory region is not guaranteed to work, since > the PE/COFF header is part of the file format, not the memory format > (i.e., since the header is disjoint from the payload, a PE/COFF loader > is not required to copy the header to memory) > >>> >>>> For a boot firmware, it seems to me that safe page table layout would >>>> be a top priority bug. The "reporting issues" page for TianoCore >>>> doesn't actually seem to link to the "Project Tracker": >>>> https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues >>>> >>>> Does anyone know how to get this correctly reported so future UEFI >>>> releases don't suffer from this? >>>> >>> >>> Ugh. Don't get me started on that topic. I have been working with the >>> UEFI forum since July to get a fundamentally broken implementation of >>> memory protections fixed. UEFI v2.5 defines a memory protection scheme >>> that is based on splitting PE/COFF images into separate memory regions >>> so that R-X and RW- permissions can be applied. Unfortunately, that >>> broke every OS in existence (including Windows 8), since the OS is >>> allowed to reorder memory regions when it lays out the virtual >>> remapping of the UEFI regions, resulting in PE/COFF .data and .text >>> potentially appearing out of order. >>> >>> The good news is that we fixed it for the upcoming release (v2.6). I >>> can't disclose any specifics, though :-( >> >> As long as there's motion to getting it fixed, that makes me happy! :) >> Does 2.6 get rid of the (AIUI) 2MB limit too? >> > > No, there is no such limit in UEFI. If there is a limit like that, it > is an implementation detail of the UEFI support in the OS. > > For arm64 (and the upcoming ARM support), the UEFI runtime services > regions are remapped into a virtual userland range that is only active > during the time runtime services are being invoked. (x86 does > something similar, but it shares the page tables with the > suspend/resume code afaiu) These mappings could be page granularity > (since they don't require splitting PUDs or PMDs in the linear > region), with the side note that arm64 mandates 64 KB alignment (to > interoperate with 64 KB pages OSes). This requirement has been added > to the UEFI spec, i.e., a v2.5 compliant arm64 firmware should not > expose UEFI runtime regions that are not 64 KB aligned. Cool, thanks for the details! -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/