Date: Tue, 20 Feb 2024 12:30:01 +0200
From: Mike Rapoport
To: Alexander Graf
Cc: linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-mm@kvack.org, devicetree@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kexec@lists.infradead.org,
	linux-doc@vger.kernel.org, x86@kernel.org, Eric Biederman,
	"H . Peter Anvin", Andy Lutomirski, Peter Zijlstra, Steven Rostedt,
	Andrew Morton, Mark Rutland, Tom Lendacky, Ashish Kalra,
	James Gowans, Stanislav Kinsburskii, arnd@arndb.de,
	pbonzini@redhat.com, madvenka@linux.microsoft.com, Anthony Yznaga,
	Usama Arif, David Woodhouse, Benjamin Herrenschmidt, Rob Herring,
	Krzysztof Kozlowski
Subject: Re: [PATCH v3 09/17] x86: Add KHO support
References: <20240117144704.602-1-graf@amazon.com>
	<20240117144704.602-10-graf@amazon.com>
In-Reply-To: <20240117144704.602-10-graf@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Alex,

On Wed, Jan 17, 2024 at 02:46:56PM +0000, Alexander Graf wrote:
> We now have all bits in place to support KHO kexecs. This patch adds
> awareness of KHO in the kexec file as well as boot path for x86 and
> adds the respective kconfig option to the architecture so that it can
> use KHO successfully.
>
> In addition, it enlightens it decompression code with KHO so that its
> KASLR location finder only considers memory regions that are not already
> occupied by KHO memory.
>
> Signed-off-by: Alexander Graf
>
> ---
>
> v1 -> v2:
>
>   - Change kconfig option to ARCH_SUPPORTS_KEXEC_KHO
>   - s/kho_reserve_mem/kho_reserve_previous_mem/g
>   - s/kho_reserve/kho_reserve_scratch/g
> ---
>  arch/x86/Kconfig                      |  3 ++
>  arch/x86/boot/compressed/kaslr.c      | 55 +++++++++++++++++++++++++++
>  arch/x86/include/uapi/asm/bootparam.h | 15 +++++++-
>  arch/x86/kernel/e820.c                |  9 +++++
>  arch/x86/kernel/kexec-bzimage64.c     | 39 +++++++++++++++++++
>  arch/x86/kernel/setup.c               | 46 ++++++++++++++++++++++
>  arch/x86/mm/init_32.c                 |  7 ++++
>  arch/x86/mm/init_64.c                 |  7 ++++
>  8 files changed, 180 insertions(+), 1 deletion(-)

..

> @@ -987,8 +1013,26 @@ void __init setup_arch(char **cmdline_p)
>  	cleanup_highmap();
>
>  	memblock_set_current_limit(ISA_END_ADDRESS);
> +
>  	e820__memblock_setup();
>
> +	/*
> +	 * We can resize memblocks at this point, let's dump all KHO
> +	 * reservations in and switch from scratch-only to normal allocations
> +	 */
> +	kho_reserve_previous_mem();
> +
> +	/* Allocations now skip scratch mem, return low 1M to the pool */
> +	if (is_kho_boot()) {
> +		u64 i;
> +		phys_addr_t base, end;
> +
> +		__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,
> +				     MEMBLOCK_SCRATCH, &base, &end, NULL)
> +			if (end <= ISA_END_ADDRESS)
> +				memblock_clear_scratch(base, end - base);
> +	}

You had to mark the lower 16M as MEMBLOCK_SCRATCH because at this point the
mapping of physical memory is not ready yet, and the page tables only cover
the lower 16M and the memory mapped in kexec::init_pgtable(). Hence the
call to memblock_set_current_limit(ISA_END_ADDRESS) slightly above, which
essentially makes the scratch memory reserved by KHO unusable for
allocations.

I'd suggest moving kho_reserve_previous_mem() earlier, probably even right
next to kho_populate(). kho_populate() already does memblock_add(scratch),
and at that point it's the only physical memory that memblock knows of, so
if it has to allocate, the allocations will end up there.
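
To illustrate the ordering argument, here is a toy user-space model. It is
only a sketch and not kernel code: toy_add(), toy_alloc(), the bump
allocation policy and all addresses are made up for illustration, and the
real memblock allocator behaves differently in detail. It only shows that
an allocation made while scratch is the sole registered range has nowhere
else to land.

```c
/* Toy user-space model of the ordering argument; NOT the kernel's memblock. */
#include <assert.h>	/* for callers that want to assert() on the result */
#include <stdint.h>

#define TOY_MAX_REGIONS 8

struct toy_region {
	uint64_t base;
	uint64_t size;
	uint64_t used;	/* bytes already handed out from the region start */
};

struct toy_memblock {
	struct toy_region regions[TOY_MAX_REGIONS];
	int nr_regions;
};

/* Hypothetical stand-in for memblock_add(): make a range known. */
static int toy_add(struct toy_memblock *mb, uint64_t base, uint64_t size)
{
	if (mb->nr_regions >= TOY_MAX_REGIONS)
		return -1;
	mb->regions[mb->nr_regions++] = (struct toy_region){ base, size, 0 };
	return 0;
}

/*
 * Hypothetical bump allocator: take from the first region with enough room.
 * Returns 0 when nothing fits (address 0 is never registered in this demo).
 */
static uint64_t toy_alloc(struct toy_memblock *mb, uint64_t size)
{
	for (int i = 0; i < mb->nr_regions; i++) {
		struct toy_region *r = &mb->regions[i];

		if (r->size - r->used >= size) {
			uint64_t addr = r->base + r->used;

			r->used += size;
			return addr;
		}
	}
	return 0;
}

/*
 * Returns 1 if an allocation made while scratch is the only known memory
 * lands inside scratch, 0 otherwise. All addresses are made up.
 */
static int toy_early_alloc_in_scratch(void)
{
	const uint64_t scratch_base = 0x40000000ULL;
	const uint64_t scratch_size = 0x1000000ULL;
	struct toy_memblock mb = { 0 };
	uint64_t early;

	/* kho_populate() analogue: only scratch is known so far. */
	toy_add(&mb, scratch_base, scratch_size);

	/* An early allocation, e.g. for growing the region array. */
	early = toy_alloc(&mb, 0x1000);

	/* e820__memblock_setup() analogue: the rest of RAM appears later. */
	toy_add(&mb, 0x100000ULL, 0x3ff00000ULL);

	return early >= scratch_base && early < scratch_base + scratch_size;
}
```

Calling toy_early_alloc_in_scratch() from a small main() is enough to try
it out. In this model no scratch flag is needed to steer the early
allocation, because nothing else is registered yet.
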

Also, there are no kernel allocations before e820__memblock_setup(), so the
only memory that might need to be allocated is for memblock_double_array(),
and that will be discarded later anyway.

With this, it seems that MEMBLOCK_SCRATCH is not needed, as the scratch
memory is the only usable memory up to e820__memblock_setup() anyway.

>  	/*
>  	 * Needs to run after memblock setup because it needs the physical
>  	 * memory size.
> @@ -1104,6 +1148,8 @@ void __init setup_arch(char **cmdline_p)
>  	 */
>  	arch_reserve_crashkernel();
>
> +	kho_reserve_scratch();
> +
>  	memblock_find_dma_reserve();
>
>  	if (!early_xdbc_setup_hardware())
> diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
> index b63403d7179d..6c3810afed04 100644
> --- a/arch/x86/mm/init_32.c
> +++ b/arch/x86/mm/init_32.c
> @@ -20,6 +20,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -738,6 +739,12 @@ void __init mem_init(void)
>  	after_bootmem = 1;
>  	x86_init.hyper.init_after_bootmem();
>
> +	/*
> +	 * Now that all KHO pages are marked as reserved, let's flip them back
> +	 * to normal pages with accurate refcount.
> +	 */
> +	kho_populate_refcount();

This should go into mm_core_init(); there's nothing architecture-specific
here.

> +
>  	/*
>  	 * Check boundaries twice: Some fundamental inconsistencies can
>  	 * be detected at build time already.

-- 
Sincerely yours,
Mike.