From: Aaron Thompson
To: Mike Rapoport, linux-mm@kvack.org
Cc: "H. Peter Anvin", Alexander Potapenko, Andrew Morton, Andy Shevchenko,
    Ard Biesheuvel, Borislav Petkov, Darren Hart, Dave Hansen,
    David Rientjes, Dmitry Vyukov, Ingo Molnar, Marco Elver, Thomas Gleixner,
    kasan-dev@googlegroups.com, linux-efi@vger.kernel.org,
    linux-kernel@vger.kernel.org, platform-driver-x86@vger.kernel.org,
    x86@kernel.org, Aaron Thompson
Subject: [PATCH v2 0/1] Pages not released from memblock to the buddy allocator
Date: Thu, 5 Jan 2023 04:17:21 +0000
Message-ID: <010001858025d78c-8d8b175b-8d52-4bc1-9c93-311868a527a2-000000@email.amazonses.com>
In-Reply-To: <010101857bbc3a41-173240b3-9064-42ef-93f3-482081126ec2-000000@us-west-2.amazonses.com>

Changelog:
v2:
  - Add comment in memblock_free_late() (suggested by Mike Rapoport)
  - Improve commit message, including an explanation of the x86_64 EFI boot
    issue (suggested by Mike Rapoport and David Rientjes)

Hi all,

(I've CC'ed the KMSAN
and x86 EFI maintainers as an FYI; the only code change I'm proposing is in
memblock.)

I've run into a case where pages are not released from memblock to the buddy
allocator. If deferred struct page init is enabled, and memblock_free_late() is
called before page_alloc_init_late() has run, and the pages being freed are in
the deferred init range, then the pages are never released. memblock_free_late()
calls memblock_free_pages() which only releases the pages if they are not in the
deferred range. That is correct for free pages because they will be initialized
and released by page_alloc_init_late(), but memblock_free_late() is dealing with
reserved pages. If memblock_free_late() doesn't release those pages, they will
forever be reserved. All reserved pages were initialized by memblock_free_all(),
so I believe the fix is to simply have memblock_free_late() call
__free_pages_core() directly instead of memblock_free_pages().

In addition, there was a recent change (3c20650982609 "init: kmsan: call KMSAN
initialization routines") that added a call to kmsan_memblock_free_pages() in
memblock_free_pages(). It looks to me like it would also be incorrect to make
that call in the memblock_free_late() case, because the KMSAN metadata was
already initialized for all reserved pages by kmsan_init_shadow(), which runs
before memblock_free_all(). Having memblock_free_late() call __free_pages_core()
directly also fixes this issue.

I encountered this issue when I tried to switch some x86_64 VMs I was running
from BIOS boot to EFI boot. The x86 EFI code reserves all EFI boot services
ranges via memblock_reserve() (part of setup_arch()), and it frees them later
via memblock_free_late() (part of efi_enter_virtual_mode()). The EFI
implementation of the VM I was attempting this on, an Amazon EC2 t3.micro
instance, maps north of 170 MB in boot services ranges that happen to fall in
the deferred init range. I certainly noticed when that much memory went missing
on a 1 GB VM.

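As a rough illustration of the failure mode described above, here is a toy
userspace model in C. It is not kernel code and it is not the patch: every
name in it (toy_page, toy_mem, toy_free_all, toy_free_late, toy_deferred_init,
toy_managed) is hypothetical, and the model assumes only what the text above
states, namely that deferred init releases pages that were free at boot and
never touches pages that were reserved at that point.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the scenario in this email. Every name here is hypothetical;
 * none of this is kernel code. */
struct toy_page {
	bool reserved_at_boot; /* was reserved when the bulk release ran */
	bool in_buddy;         /* has been handed to the modelled allocator */
};

struct toy_mem {
	struct toy_page *pages;
	size_t nr_pages;
	size_t first_deferred; /* indices >= this are in the deferred range */
};

/* Bulk release at boot: free pages below the deferred range are released. */
void toy_free_all(struct toy_mem *m)
{
	for (size_t i = 0; i < m->nr_pages && i < m->first_deferred; i++)
		if (!m->pages[i].reserved_at_boot)
			m->pages[i].in_buddy = true;
}

/* Late free of a previously reserved range [start, end).
 * skip_deferred = true models the behaviour described as the bug;
 * skip_deferred = false models releasing every page directly. */
void toy_free_late(struct toy_mem *m, size_t start, size_t end,
		   bool skip_deferred)
{
	for (size_t i = start; i < end && i < m->nr_pages; i++) {
		if (skip_deferred && i >= m->first_deferred)
			continue; /* wrongly left for deferred init */
		m->pages[i].in_buddy = true;
	}
}

/* Deferred init: only pages that were free at boot are released here. */
void toy_deferred_init(struct toy_mem *m)
{
	for (size_t i = m->first_deferred; i < m->nr_pages; i++)
		if (!m->pages[i].reserved_at_boot)
			m->pages[i].in_buddy = true;
}

/* Number of pages the modelled allocator ended up managing. */
size_t toy_managed(const struct toy_mem *m)
{
	size_t n = 0;

	for (size_t i = 0; i < m->nr_pages; i++)
		n += m->pages[i].in_buddy;
	return n;
}
```

Calling toy_free_all(), then toy_free_late() on a reserved range inside the
deferred range, then toy_deferred_init() leaves those pages unmanaged when
skip_deferred is true, and managed when it is false.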
I've tested the patch on EC2 instances, qemu/KVM VMs with OVMF, and some real
x86_64 EFI systems, and they all look good to me. However, the physical systems
that I have don't actually trigger this issue because they all have more than
4 GB of RAM, so their deferred init range starts above 4 GB (it's always in the
highest zone and ZONE_DMA32 ends at 4 GB) while their EFI boot services mappings
are below 4 GB.

Deferred struct page init can't be enabled on x86_32 so those systems are
unaffected. I haven't found any other code paths that would trigger this issue,
though I can't promise that there aren't any. I did run with this patch on an
arm64 VM as a sanity check, but memblock=debug didn't show any calls to
memblock_free_late() so that system was unaffected as well.

I am guessing that this change should also go to the stable kernels but it may
not apply cleanly (__free_pages_core() was __free_pages_boot_core() and
memblock_free_pages() was __free_pages_bootmem() when this issue was first
introduced). I haven't gone through that process before so please let me know if
I can help with that.

This is the end result on an EC2 t3.micro instance booting via EFI:

v6.2-rc2:
  # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
  Node 0, zone      DMA
          spanned  4095
          present  3999
          managed  3840
  Node 0, zone    DMA32
          spanned  246652
          present  245868
          managed  178867

v6.2-rc2 + patch:
  # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
  Node 0, zone      DMA
          spanned  4095
          present  3999
          managed  3840
  Node 0, zone    DMA32
          spanned  246652
          present  245868
          managed  222816


Aaron Thompson (1):
  mm: Always release pages to the buddy allocator in
    memblock_free_late().

 mm/memblock.c                     | 8 +++++++-
 tools/testing/memblock/internal.h | 4 ++++
 2 files changed, 11 insertions(+), 1 deletion(-)

-- 
2.30.2
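
As a rough cross-check of the DMA32 "managed" figures above, the difference
between the two runs can be converted from pages to bytes. The sketch below
assumes 4 KiB base pages (the x86_64 base page size); the helper names
pages_delta_bytes and bytes_to_mib are made up for this example. With the
reported counts, 222816 - 178867 = 43949 pages, or 180015104 bytes, which is
about 171 MiB and in line with the "north of 170 MB" of boot services ranges
mentioned earlier.

```c
#include <stdint.h>

/* Assumption: 4 KiB base pages, as on x86_64. */
#define TOY_PAGE_SIZE 4096ULL

/* Bytes represented by the difference between two page counts. */
uint64_t pages_delta_bytes(uint64_t managed_after, uint64_t managed_before)
{
	return (managed_after - managed_before) * TOY_PAGE_SIZE;
}

/* Convert bytes to whole MiB, rounding down. */
uint64_t bytes_to_mib(uint64_t bytes)
{
	return bytes >> 20;
}
```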