Date: Fri, 30 Dec 2022 11:13:32 -0800
From: "H. Peter Anvin"
To: "Jason A. Donenfeld"
CC: pbonzini@redhat.com, ebiggers@kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org, qemu-devel@nongnu.org, ardb@kernel.org, kraxel@redhat.com, bp@alien8.de, philmd@linaro.org
Subject: Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data
Message-ID: <6C1D0560-6D77-4733-9B8D-5184935AEC62@zytor.com>

On December 30, 2022 7:59:30 AM PST, "Jason A. Donenfeld" wrote:
>Hi,
>
>On Wed, Dec 28, 2022 at 11:31:34PM -0800, H. Peter Anvin wrote:
>> On December 28, 2022 6:31:07 PM PST, "Jason A. Donenfeld" wrote:
>> >Hi,
>> >
>> >Read this message in a fixed width text editor with a lot of columns.
>> >
>> >On Wed, Dec 28, 2022 at 03:58:12PM -0800, H. Peter Anvin wrote:
>> >> Glad you asked.
>> >>
>> >> So the kernel load addresses are parameterized in the kernel image
>> >> setup header. One of the things that are so parameterized are the size
>> >> and possible realignment of the kernel image in memory.
>> >>
>> >> I'm very confused where you are getting the 64 MB number from. There
>> >> should not be any such limitation.
>> >
>> >Currently, QEMU appends it to the kernel
>> >image, not to the initramfs as
>> >you suggest below. So, that winds up looking, currently, like:
>> >
>> >     kernel image              setup_data
>> >|--------------------------||----------------|
>> >0x100000              0x100000+l1     0x100000+l1+l2
>> >
>> >The problem is that this decompresses to 0x1000000 (one more zero). So
>> >if l1 is > (0x1000000-0x100000), then this winds up looking like:
>> >
>> >     kernel image              setup_data
>> >|--------------------------||----------------|
>> >0x100000              0x100000+l1     0x100000+l1+l2
>> >
>> >                        d e c o m p r e s s e d   k e r n e l
>> >          |---------------------------------------------------------------|
>> >      0x1000000                                                0x1000000+l3
>> >
>> >The decompressed kernel seemingly overwriting the compressed kernel
>> >image isn't a problem, because that gets relocated to a higher address
>> >early on in the boot process. setup_data, however, stays in the same
>> >place, since those links are self referential and nothing fixes them up.
>> >So the decompressed kernel clobbers it.
>> >
>> >The solution in this commit adds a bunch of padding between the kernel
>> >image and setup_data to avoid this. That looks like this:
>> >
>> >     kernel image                              padding                         setup_data
>> >|--------------------------||---------------------------------------------||----------------|
>> >0x100000              0x100000+l1                              0x1000000+l3   0x1000000+l3+l2
>> >
>> >                        d e c o m p r e s s e d   k e r n e l
>> >          |---------------------------------------------------------------|
>> >      0x1000000                                                0x1000000+l3
>> >
>> >This way, the decompressed kernel doesn't clobber setup_data.
>> >
>> >The problem is that if 0x1000000+l3-0x100000 is around 62 megabytes,
>> >then the bootloader crashes when trying to dereference setup_data's
>> >->len param at the end of initialize_identity_maps() in ident_map_64.c.
>> >I don't know why it does this. If I could remove the 62 megabyte
>> >restriction, then I could keep with this technique and all would be
>> >well.
>> >
>> >> In general,
>> >> setup_data should be able to go anywhere the initrd can
>> >> go, and so is subject to the same address cap (896 MB for old kernels,
>> >> 4 GB on newer ones; this address too is enumerated in the header.)
>> >
>> >It would be theoretically possible to attach it to the initrd image
>> >instead of to the kernel image. As a last resort, I guess I can look
>> >into doing that. However, that's going to require some serious rework
>> >and plumbing of a lot of different components. So if I can make it work
>> >as is, that'd be ideal. However, I need to figure out this weird 62 meg
>> >limitation.
>> >
>> >Any ideas on that?
>> >
>> >Jason
>>
>> As far as a crash... that sounds like a big and a pretty serious one at that.
>>
>> Could you let me know what kernel you are using and how *exactly* you are booting it?
>
>I'll attach a .config file. Apply the patch at the top of this thread to
>qemu, except make one modification:
>
>diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>index 628fd2b2e9..a61ee23e13 100644
>--- a/hw/i386/x86.c
>+++ b/hw/i386/x86.c
>@@ -1097,7 +1097,7 @@ void x86_load_linux(X86MachineState *x86ms,
> 
>         /* The early stage can't address past around 64 MB from the original
>          * mapping, so just give up in that case. */
>-        if (padded_size < 62 * 1024 * 1024)
>+        if (true || padded_size < 62 * 1024 * 1024)
>             kernel_size = padded_size;
>         else {
>             fprintf(stderr, "qemu: Kernel image too large to hold setup_data\n");
>
>Then build qemu. Run it with `-kernel bzImage`, based on the kernel
>built with the .config I attached.
>
>You'll see that the CPU triple faults when hitting this line:
>
>    sd = (struct setup_data *)boot_params->hdr.setup_data;
>    while (sd) {
>            unsigned long sd_addr = (unsigned long)sd;
>
>            kernel_add_identity_map(sd_addr, sd_addr + sizeof(*sd) + sd->len);  <----
>            sd = (struct setup_data *)sd->next;
>    }
>
>, because it dereferences *sd. This does not happen if the
>decompressed size of the kernel is < 62 megs.
>
>So that's the "big and pretty serious" bug that might be worthy of
>investigation.
>
>Jason

No kidding. Dereferencing data *before you map it* is generally frowned upon. This needs to be split into two mapping calls.

*Facepalm*
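A minimal sketch of the two-call split, written as a userspace simulation rather than the actual kernel fix. The toy memory array, the per-byte `mapped` flags, and the helpers map_range(), is_mapped(), read_mapped(), put_node() and map_setup_data_chain() are all hypothetical stand-ins (real identity mapping works on pages, not bytes); only the header layout (next, type, len) mirrors the kernel's struct setup_data. Each node's fixed-size header is mapped first so that `len` and `next` can be read safely, and only then is the full header-plus-payload range mapped. The quoted loop instead reads `sd->len` to compute the very range it is about to map, which in this model would trip the assert in read_mapped().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field order as the kernel's struct setup_data header
 * (next, type, len); the variable-length payload follows it. */
struct setup_data {
    uint64_t next; /* address of the next node, 0 terminates the list */
    uint32_t type;
    uint32_t len;  /* payload length, NOT including this header */
};

/* Hypothetical toy "physical memory" plus a per-byte flag standing in
 * for the identity-mapped page tables. */
#define MEM_SIZE 4096
static unsigned char mem[MEM_SIZE];
static unsigned char mapped[MEM_SIZE];

/* Hypothetical stand-in for kernel_add_identity_map(start, end). */
static void map_range(uint64_t start, uint64_t end)
{
    assert(start <= end && end <= MEM_SIZE);
    for (uint64_t a = start; a < end; a++)
        mapped[a] = 1;
}

/* True only if every byte in [start, end) has been mapped. */
static bool is_mapped(uint64_t start, uint64_t end)
{
    for (uint64_t a = start; a < end; a++)
        if (!mapped[a])
            return false;
    return true;
}

/* Touching unmapped memory is where the real code triple-faults;
 * in this model it trips an assert instead. */
static void read_mapped(uint64_t addr, void *out, size_t n)
{
    assert(addr + n <= MEM_SIZE && is_mapped(addr, addr + n));
    memcpy(out, mem + addr, n);
}

/* Test helper: lay out one node in the toy memory. */
static void put_node(uint64_t addr, uint64_t next, uint32_t len)
{
    struct setup_data sd = { .next = next, .type = 0, .len = len };
    assert(addr + sizeof(sd) + len <= MEM_SIZE);
    memcpy(mem + addr, &sd, sizeof(sd));
}

/* Walk the list with two mapping calls per node. */
static void map_setup_data_chain(uint64_t head)
{
    uint64_t sd_addr = head;

    while (sd_addr) {
        struct setup_data sd;

        /* Call 1: map only the fixed-size header so len/next are readable. */
        map_range(sd_addr, sd_addr + sizeof(sd));
        read_mapped(sd_addr, &sd, sizeof(sd));

        /* Call 2: len is now known; map the header plus its payload. */
        map_range(sd_addr, sd_addr + sizeof(sd) + sd.len);

        sd_addr = sd.next;
    }
}
```

For example, building two nodes with put_node(64, 512, 100) and put_node(512, 0, 200) and then calling map_setup_data_chain(64) leaves [64, 180) and [512, 728) mapped, since the header is 16 bytes on common ABIs.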
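The clobbering shown in Jason's diagrams reduces to a half-open interval overlap test, and the padding in the patch simply moves setup_data to 0x1000000+l3. The sketch below assumes the two addresses from the diagrams (0x100000 for the compressed image, 0x1000000 for the decompressed kernel) as fixed constants; the macro and function names are hypothetical. As HPA notes, the real load address and alignment are parameterized in the setup header, so this only illustrates the arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative addresses taken from the diagrams, not read from a
 * real setup header. */
#define COMPRESSED_BASE   0x100000ULL
#define DECOMPRESSED_BASE 0x1000000ULL

/* Half-open ranges [a_start, a_end) and [b_start, b_end) overlap iff
 * each one starts before the other ends. */
static bool ranges_overlap(uint64_t a_start, uint64_t a_end,
                           uint64_t b_start, uint64_t b_end)
{
    return a_start < b_end && b_start < a_end;
}

/* Unpadded layout: setup_data (l2 bytes) sits right after the
 * compressed image (l1 bytes). True if the decompressed kernel
 * (l3 bytes) would overwrite it. */
static bool setup_data_clobbered(uint64_t l1, uint64_t l2, uint64_t l3)
{
    uint64_t sd_start = COMPRESSED_BASE + l1;

    return ranges_overlap(sd_start, sd_start + l2,
                          DECOMPRESSED_BASE, DECOMPRESSED_BASE + l3);
}

/* Padded layout: grow the image so setup_data starts at
 * DECOMPRESSED_BASE + l3, i.e. just past the decompressed kernel. */
static uint64_t padded_kernel_size(uint64_t l1, uint64_t l2, uint64_t l3)
{
    uint64_t padded = DECOMPRESSED_BASE + l3 - COMPRESSED_BASE;
    uint64_t sd_start = COMPRESSED_BASE + padded;

    assert(padded >= l1); /* padding must not shrink the image */
    /* Sanity check: the padded setup_data no longer overlaps. */
    assert(!ranges_overlap(sd_start, sd_start + l2,
                           DECOMPRESSED_BASE, DECOMPRESSED_BASE + l3));
    return padded;
}
```

With l1 = 16 MiB, l2 = 4 KiB and l3 = 40 MiB, setup_data_clobbered() returns true and padded_kernel_size() returns 55 MiB, which is below the 62 * 1024 * 1024 cutoff in the QEMU diff above.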