Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752765AbcD2Hs5 (ORCPT ); Fri, 29 Apr 2016 03:48:57 -0400
Received: from mail-wm0-f52.google.com ([74.125.82.52]:35264 "EHLO mail-wm0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750847AbcD2Hs4 (ORCPT ); Fri, 29 Apr 2016 03:48:56 -0400
MIME-Version: 1.0
In-Reply-To: <20160429071805.GC28320@gmail.com>
References: <1461888548-32439-1-git-send-email-keescook@chromium.org> <1461888548-32439-3-git-send-email-keescook@chromium.org> <20160429071805.GC28320@gmail.com>
Date: Fri, 29 Apr 2016 00:48:54 -0700
X-Google-Sender-Auth: FmCh5yyNGntQX4ztidyNCorYZAo
Message-ID:
Subject: Re: [PATCH 2/6] x86/boot: Move compressed kernel to end of decompression buffer
From: Kees Cook
To: Ingo Molnar
Cc: Yinghai Lu , Ingo Molnar , Baoquan He , "H. Peter Anvin" , Borislav Petkov , Vivek Goyal , Andy Lutomirski , Lasse Collin , Andrew Morton , Dave Young , LKML
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6722
Lines: 159

On Fri, Apr 29, 2016 at 12:18 AM, Ingo Molnar wrote:
>
> * Kees Cook wrote:
>
>> From: Yinghai Lu
>>
>> This change makes later calculations about where the kernel is located
>> easier to reason about. To better understand this change, we must first
>> clarify what VO and ZO are.
>> They were introduced in commits by hpa:
>>
>> 77d1a49 x86, boot: make symbols from the main vmlinux available
>> 37ba7ab x86, boot: make kernel_alignment adjustable; new bzImage fields
>>
>> Specifically:
>>
>> VO:
>> - uncompressed kernel image
>> - size: VO__end - VO__text ("VO_INIT_SIZE" define)
>>
>> ZO:
>> - bootable compressed kernel image (boot/compressed/vmlinux)
>> - head text + compressed kernel (VO and relocs table) + decompressor code
>> - size: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
>>
>> The INIT_SIZE definition is used to find the larger of the two image sizes:
>>
>> #define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
>> #define VO_INIT_SIZE (VO__end - VO__text)
>> #if ZO_INIT_SIZE > VO_INIT_SIZE
>> #define INIT_SIZE ZO_INIT_SIZE
>> #else
>> #define INIT_SIZE VO_INIT_SIZE
>> #endif
>>
>> The current code uses extract_offset to decide where to position the
>> copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
>> currently includes the extract_offset.)
>
> Yeah, so I rewrote the above to:
>
> =================>
> This change makes later calculations about where the kernel is located
> easier to reason about. To better understand this change, we must first
> clarify what 'VO' and 'ZO' are.
> These values were introduced in commits
> by hpa:
>
> 77d1a4999502 ("x86, boot: make symbols from the main vmlinux available")
> 37ba7ab5e33c ("x86, boot: make kernel_alignment adjustable; new bzImage fields")
>
> Specifically:
>
> All names prefixed with 'VO_':
>
> - relate to the uncompressed kernel image
>
> - the size of the VO image is: VO__end-VO__text ("VO_INIT_SIZE" define)
>
> All names prefixed with 'ZO_':
>
> - relate to the bootable compressed kernel image (boot/compressed/vmlinux),
>   which is composed of the following memory areas:
>     - head text
>     - compressed kernel (VO image and relocs table)
>     - decompressor code
>
> - the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)
>
> The 'INIT_SIZE' value is used to find the larger of the two image sizes:
>
> #define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
> #define VO_INIT_SIZE (VO__end - VO__text)
>
> #if ZO_INIT_SIZE > VO_INIT_SIZE
> # define INIT_SIZE ZO_INIT_SIZE
> #else
> # define INIT_SIZE VO_INIT_SIZE
> #endif
>
> The current code uses extract_offset to decide where to position the
> copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
> currently includes the extract_offset.)
> <=================
>
> Assuming the edits I made are correct, this is the point where the changelog lost
> me. It does not explain why ZO_z_extract_offset exists. Why isn't the ZO copied to
> offset 0?
>
> I had to go into arch/x86/boot/compressed/mkpiggy.c, where ZO_z_extract_offset is
> generated, to find the answer: it's needed because we are trying to minimize the
> amount of RAM used for the whole act of creating an uncompressed, executable,
> properly relocation-linked kernel image in system memory. We do this so that
> kernels can be booted on even very small systems.
>
> To achieve the goal of minimal memory consumption we have implemented an in-place
> decompression strategy: instead of cleanly separating the VO and ZO images and
> also allocating some memory for the decompression code's runtime needs, we instead
> create this elaborate layout of memory buffers where the output (decompressed)
> stream, as it progresses, overlaps with and destroys the input (compressed)
> stream. This can only be done safely if the ZO image is placed to the end of the
> VO range, plus a certain amount of safety distance to make sure that when the last
> bytes of the VO range are decompressed, the compressed stream pointer is safely
> beyond the end of the VO range. Correct?
>
> This is a very essential central concept to the whole code, but nowhere is it
> described clearly!

That would certainly be worth calling out in the description, true.

> But more importantly, especially in view of address space randomization, we should
> realize that the days of 8 MB i386-DX systems are gone, and we should get rid of
> all this crazy obfuscation that is hindering development in this area. I also
> suspect that the actual temporary allocation size reduction savings from this
> trick are relatively small, compared to the resulting total memory size.
>
> So my suggestion: let's just cleanly separate all the data areas and not try to do
> any clever overlapping: the benefit will be minimal, and any system that has main
> RAM less than twice of the VO+ZO image sizes is fundamentally unbootable and
> unusable anyway.
>
> I.e. have a really clean size calculation of:
>
> ZO + VO + decompressor-stacks-size + decompressor-data-size
>
> and decompress accordingly without tricks, without overlaps, without any chance
> for corruption - and, most importantly, without this metric ton of obfuscation
> that very few people have managed to fight their way through in the last couple of
> years, and which hinders essential features ...
>
> Agreed?

I don't agree.
We do still have embedded systems running x86 kernels, and we have
cases where we're running multiple kernels in memory (like kdump). I
think the memory savings are worth the complexity, especially since the
complexity is being reduced by this patch.

But that's not all: if we moved the compressed kernel after the
buffer, the only thing we'd do would be taking up more memory. We'd
still have the head_*.S complexity of handling the relocation and
handling the copy, we'd still have the extraction, etc, etc. The only
change would be literally replacing extract_offset with INIT_SIZE.
Everything else would be the same.

If we moved the decompressed kernel after the compressed kernel
(ignoring KASLR for a moment), then we'd end up in a confusing
situation where the kernel would be running somewhere other than where
the boot loader asked it to load. I don't even want to think about the
weird bug reports we might get from a change like that from old or
weird loaders.

This patch gets us a more reasonable layout with less complexity and
no change to the memory footprint, without changing the expectations of
the boot loader. I really think this should stand.

-Kees

--
Kees Cook
Chrome OS & Brillo Security
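
For readers following the size argument in this thread, here is a rough C sketch of the arithmetic being discussed. It is not code from the patch. The struct, the function names, and the stack/data parameters are hypothetical, and the "ZO ends where the buffer ends" placement is an assumption read off the patch subject line. Only the INIT_SIZE selection mirrors the macro quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* All sizes are in bytes. The struct and field names are hypothetical. */
struct boot_sizes {
	size_t vo_size;        /* VO__end - VO__text */
	size_t zo_size;        /* ZO__end - ZO_startup_32 */
	size_t extract_offset; /* ZO_z_extract_offset */
};

/* Same selection as the quoted INIT_SIZE macro: the larger of the two. */
static size_t init_size(const struct boot_sizes *s)
{
	size_t zo_init = s->zo_size + s->extract_offset;

	return zo_init > s->vo_size ? zo_init : s->vo_size;
}

/*
 * Assumption: "end of decompression buffer" means ZO ends exactly where
 * the INIT_SIZE buffer ends, so it starts at init_size - zo_size.
 */
static size_t zo_offset_at_buffer_end(const struct boot_sizes *s)
{
	size_t total = init_size(s);

	assert(total >= s->zo_size); /* holds by construction of init_size() */
	return total - s->zo_size;
}

/* Footprint of a fully separated layout: ZO + VO + decompressor stack and data. */
static size_t separate_layout_size(const struct boot_sizes *s,
				   size_t stack_size, size_t data_size)
{
	return s->zo_size + s->vo_size + stack_size + data_size;
}
```

With made-up sizes vo_size = 2000, zo_size = 600 and extract_offset = 1500, init_size() yields 2100, whereas the separated layout needs at least 2600 before any stack or data allowance. When VO_INIT_SIZE is the larger term (say vo_size = 3000), placing ZO at the end of the buffer gives an offset of 2400 rather than extract_offset.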