Subject: Re: [Qemu-devel] [RFC/PoC PATCH 1/3] i386: set initrd_max to 4G - 1 to allow up to 4G initrd
To: Ingo Molnar
Cc: Li Zhijian, Juergen Gross, Li Zhijian, Peter Maydell, x86@kernel.org, bp@alien8.de, mingo@redhat.com, tglx@linutronix.de, QEMU Developers, Philip Li, linux-kernel@vger.kernel.org, Linus Torvalds, Peter Zijlstra, Kees Cook
From: "H. Peter Anvin"
Date: Mon, 12 Nov 2018 08:47:14 -0800

On 11/11/18 10:19 PM, Ingo Molnar wrote:
>
>> In part as a result of this exchange I have spent some time thinking
>> about the boot protocol and its dependencies, and there is, in fact, a
>> much more serious problem that needs to be addressed: it is not
>> currently possible in a forward-compatible way to map all data areas
>> that may be occupied by bootloader-provided data. The kernel proper has
>> an advantage here, in that the kernel will by definition always be the
>> "owner of the protocol" (anything the kernel doesn't know how to map
>> won't be used by the kernel anyway), but it really isn't a good
>> situation. So I'm currently trying to think up a way to make that
>> possible.
>
> I might be a bit dense early in the morning, but could you elaborate?
> What do you mean by mapping all data areas?

Alright, awake now...

As it sits right now, the protocol contains a number of data structures with pointers to a variety of memory areas that can be set up by the bootloader.

Now, consider something like KASLR or a secondary boot loader, where we need to allocate memory in between the primary bootloader and the kernel to be run. With the kernel proper, in the absence of KASLR, we have solved this by marking out exactly how much memory the kernel may need before it has its own memory manager up and running. KASLR, however, needs to move the kernel outside this range, and a secondary boot loader shim of some sort may need to allocate additional data structures.
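To make the problem concrete, here is a minimal C sketch of what a secondary loader (or a KASLR-style relocator) can do today with the one self-describing structure it has, the setup_data list: record each node as busy and refuse a candidate allocation that overlaps one. The struct mirrors the layout of struct setup_data in arch/x86/include/uapi/asm/bootparam.h; identity mapping of physical addresses and the helper names (map_setup_data, collect_setup_data_ranges, candidate_is_free) are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as struct setup_data in arch/x86/include/uapi/asm/bootparam.h. */
struct setup_data {
	uint64_t next;  /* physical address of the next node; 0 ends the list */
	uint32_t type;
	uint32_t len;   /* size of data[] in bytes */
	uint8_t data[];
};

/* Half-open address range [start, end). */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* Assumption: identity mapping, so a physical address is usable as a pointer. */
static const struct setup_data *map_setup_data(uint64_t pa)
{
	return (const struct setup_data *)(uintptr_t)pa;
}

/*
 * Hypothetical helper: record each node (header plus payload) as busy.
 * Stops at the end of the list or when out[] is full; returns the count.
 */
static size_t collect_setup_data_ranges(uint64_t head, struct mem_range *out, size_t max)
{
	size_t n = 0;

	for (uint64_t pa = head; pa != 0 && n < max; n++) {
		const struct setup_data *sd = map_setup_data(pa);

		out[n].start = pa;
		out[n].end = pa + sizeof(*sd) + sd->len;
		pa = sd->next;
	}
	return n;
}

static bool ranges_overlap(struct mem_range a, struct mem_range b)
{
	return a.start < b.end && b.start < a.end;
}

/* Hypothetical check: would claiming [start, start + size) clobber a known node? */
static bool candidate_is_free(uint64_t start, uint64_t size,
			      const struct mem_range *busy, size_t nbusy)
{
	struct mem_range cand = { start, start + size };

	assert(size > 0);
	for (size_t i = 0; i < nbusy; i++)
		if (ranges_overlap(cand, busy[i]))
			return false;
	return true;
}
```

What the sketch cannot do is the whole point: memory reached through a pointer field the loader does not know about, including any field added in a later protocol revision, never makes it into the busy list, so candidate_is_free() can report such memory as free.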
In the particular case of a UEFI system where we do the right thing (which Grub2 doesn't, by default) and enter via the kernel UEFI stub, we are okay. For other boot scenarios we are in trouble: even if we know where all the pointers are and how to determine the sizes of the various data structures, that knowledge is no longer complete once the protocol is updated with new information, since code written against the older protocol does not know about the new pointers. The setup_data linked list solves that under certain circumstances, but in others it has turned out to be inadequate.

There are a couple of options:

a) Do not allow any new pointers to memory areas in what is considered system RAM. Such data structures *must* have a setup_data linked list header. Pointers into E820 table reserved areas are still acceptable.

b) Create a new E820 table memory type for "boot data", similar to what UEFI already has, and encourage boot loaders to mark any allocated memory structures that way. The main problem with that is that, given the poor quality of boot loaders, this may fail to happen, and because a mistake would not "fail hard", they are likely to get it wrong. The difference from the RESERVED memory type is that the kernel can reclaim that memory after the data has been recovered.

c) This might be the preferred option:

   1. Just like (a), do not allow new pointers to memory areas in system RAM in struct boot_params.
   2. Create a subrange of struct setup_data types (e.g. bit 30 = 1) explicitly containing pointers to other data structures, including sizes, in a way that can be parsed by generic code.
   3. Encourage boot loaders to make sure the setup_data list is in order of ascending address (and WARN if it is not).
   4. Add (b) as an option for responsible boot loaders ;) to provide an extra level of protection.
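Option (c) can be sketched the same way. Everything specific here is hypothetical: the flag value follows the "bit 30 = 1" example in point 2, but the descriptor layout (struct ptr_desc with addr and size), the helper names, the node-count guard, and the use of a warning flag in place of the kernel's WARN are assumptions, not part of the boot protocol. The sketch shows that a generic walker can account for every referenced area without understanding what any particular setup_data type means, and that an ascending-order list makes misordering trivial to detect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same header layout as struct setup_data in bootparam.h. */
struct setup_data {
	uint64_t next;
	uint32_t type;
	uint32_t len;
	uint8_t data[];
};

/* Hypothetical: point 2's "bit 30 = 1" range of setup_data types. */
#define SETUP_HAS_POINTERS (UINT32_C(1) << 30)

/* Hypothetical payload format for such types: an array of these. */
struct ptr_desc {
	uint64_t addr;
	uint64_t size;
};

struct mem_range {
	uint64_t start;  /* inclusive */
	uint64_t end;    /* exclusive */
};

struct walk_result {
	size_t nranges;
	bool out_of_order;  /* set where the kernel would WARN (point 3) */
};

#define MAX_SETUP_NODES 256  /* arbitrary guard against a corrupt, looping list */

static void add_range(struct mem_range *out, size_t max, size_t *n,
		      uint64_t start, uint64_t size)
{
	if (*n < max) {  /* silently drops entries once out[] is full */
		out[*n].start = start;
		out[*n].end = start + size;
		(*n)++;
	}
}

/*
 * Generic walk needing no knowledge of what any type means: every node is
 * busy, and for SETUP_HAS_POINTERS types so is every area they describe.
 * Identity mapping of physical addresses is assumed.
 */
static struct walk_result collect_boot_data(uint64_t head, struct mem_range *out, size_t max)
{
	struct walk_result r = { 0, false };
	uint64_t prev = 0;
	size_t nodes = 0;

	assert(out != NULL || max == 0);
	for (uint64_t pa = head; pa != 0 && nodes < MAX_SETUP_NODES; nodes++) {
		const struct setup_data *sd = (const struct setup_data *)(uintptr_t)pa;

		if (prev != 0 && pa <= prev) {
			fprintf(stderr, "warning: setup_data list not in ascending order\n");
			r.out_of_order = true;
		}
		prev = pa;
		add_range(out, max, &r.nranges, pa, sizeof(*sd) + sd->len);

		if (sd->type & SETUP_HAS_POINTERS) {
			for (size_t off = 0; off + sizeof(struct ptr_desc) <= sd->len;
			     off += sizeof(struct ptr_desc)) {
				struct ptr_desc d;

				memcpy(&d, sd->data + off, sizeof(d));  /* tolerate unaligned payloads */
				add_range(out, max, &r.nranges, d.addr, d.size);
			}
		}
		pa = sd->next;
	}
	return r;
}
```

Because point 1 keeps other pointers out of system RAM, a walk like this, together with boot_params itself and the E820 reserved areas, would in principle cover everything a shim has to avoid, with the "boot data" E820 type from (b) acting as a second, independent source of the same information.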