From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Baoquan He,
 David Hildenbrand, John Donnelly, Christoph Hellwig, Christoph Lameter,
 Hyeonggon Yoo <42.hyeyoo@gmail.com>, Pekka Enberg, David Rientjes,
 Joonsoo Kim, Vlastimil Babka, David Laight, Borislav Petkov,
 Marek Szyprowski, Robin Murphy, Andrew Morton, Linus Torvalds
Subject: [PATCH 5.4 029/320] mm_zone: add function to check if managed dma zone exists
Date: Mon, 24 Jan 2022 19:40:13 +0100
Message-Id: <20220124183954.747814708@linuxfoundation.org>
In-Reply-To: <20220124183953.750177707@linuxfoundation.org>
References: <20220124183953.750177707@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Baoquan He

commit 62b3107073646e0946bd97ff926832bafb846d17 upstream.

Patch series "Handle warning of allocation failure on DMA zone w/o
managed pages", v4.

**Problem observed:

On x86_64, when a crash is triggered and the system enters the kdump
kernel, a page allocation failure can always be seen:

 ---------------------------------
 DMA: preallocated 128 KiB GFP_KERNEL pool for atomic allocations
 swapper/0: page allocation failure: order:5, mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
 CPU: 0 PID: 1 Comm: swapper/0
 Call Trace:
  dump_stack+0x7f/0xa1
  warn_alloc.cold+0x72/0xd6
  ......
  __alloc_pages+0x24d/0x2c0
  ......
  dma_atomic_pool_init+0xdb/0x176
  do_one_initcall+0x67/0x320
  ? rcu_read_lock_sched_held+0x3f/0x80
  kernel_init_freeable+0x290/0x2dc
  ? rest_init+0x24f/0x24f
  kernel_init+0xa/0x111
  ret_from_fork+0x22/0x30
 Mem-Info:
 ------------------------------------

***Root cause:

The current kernel assumes that the DMA zone must have managed pages,
and tries to request pages from it, whenever CONFIG_ZONE_DMA is
enabled. However, this is not always true. E.g., in the kdump kernel on
x86_64, only the low 1M is present and locked down at a very early
stage of boot, so the low 1M is never added to the buddy allocator and
the DMA zone ends up with no managed pages at all. In that situation,
any page allocation request directed at the DMA zone will fail.

***Investigation:

This failure has been possible since the following commits were merged
into Linus's tree:

 1a6a9044b967 x86/setup: Remove CONFIG_X86_RESERVE_LOW and reservelow= options
 23721c8e92f7 x86/crash: Remove crash_reserve_low_1M()
 f1d4d47c5851 x86/setup: Always reserve the first 1M of RAM
 7c321eb2b843 x86/kdump: Remove the backup region handling
 6f599d84231f x86/kdump: Always reserve the low 1M when the crashkernel option is specified

Before these commits, the low 640K area was reused by the kdump kernel
on x86_64: its contents were copied into a backup region for dumping
before jumping into the kdump kernel, and then everything in [0, 640K]
except the firmware-reserved regions was added to the buddy allocator
and became available managed pages of the DMA zone. After the commits
above, the low 1M is reserved by memblock in the kdump kernel but never
released to the buddy allocator, so any later page allocation from the
DMA zone fails.

Initially, the low 1M had to be locked down when crashkernel is
reserved because AMD SME encrypts memory, which makes the old backup
region mechanism impossible when switching into the kdump kernel.
Later, it was also observed that some BIOSes corrupt memory under 1M.
To handle both problems, commit f1d4d47c5851 always reserves the entire
low 1M once the real-mode trampoline has been allocated. In addition,
Intel engineers have mentioned that TDX (Trust Domain Extensions),
which is under development in the kernel, also needs to lock down the
low 1M. So we cannot simply revert the commits above to fix the page
allocation failure, as some have suggested.

***Solution:

Currently, only the DMA atomic pool and dma-kmalloc initialize and
request page allocations with GFP_DMA during boot. So initialize the
DMA atomic pool only when the DMA zone has available managed pages, and
skip the initialization otherwise. For dma-kmalloc(), for the time
being, mute the allocation-failure warning when pages are requested
from a DMA zone that has no managed pages. Meanwhile, convert callers
to the dma_alloc_*/dma_map_* APIs instead of kmalloc(GFP_DMA), or drop
GFP_DMA from kmalloc() calls where it is not needed. Christoph is
posting patches to fix the callers under drivers/scsi/. Eventually we
can remove the need for dma-kmalloc() entirely, as people have
suggested.

This patch (of 3):

Some places in the current kernel assume that the DMA zone must have
managed pages if CONFIG_ZONE_DMA is enabled. However, this is not
always true. E.g., in the kdump kernel on x86_64, only the low 1M is
present and locked down at a very early stage of boot, so there are no
managed pages at all in the DMA zone. Any page allocation request from
the DMA zone will then fail.

Add the function has_managed_dma() and the relevant helpers to check
whether a DMA zone with managed pages exists. It will be used in later
patches.
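For context, the check is meant to gate early boot code that would
otherwise allocate from ZONE_DMA unconditionally. Below is a minimal
sketch of how a later patch in this series applies it to the DMA atomic
pool; __dma_atomic_pool_init(), atomic_pool_dma and atomic_pool_size
come from kernel/dma/pool.c, and the exact shape of the call site here
is an assumption for illustration, not part of this patch:

	static int __init dma_atomic_pool_init(void)
	{
		int ret = 0;

		/*
		 * Sketch: create the GFP_DMA pool only when the DMA zone
		 * actually has buddy-managed pages; otherwise the order-5
		 * allocation would fail with the splat shown above.
		 */
		if (has_managed_dma()) {
			atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size,
								 GFP_KERNEL | GFP_DMA);
			if (!atomic_pool_dma)
				ret = -ENOMEM;
		}
		return ret;
	}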
Link: https://lkml.kernel.org/r/20211223094435.248523-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20211223094435.248523-2-bhe@redhat.com
Fixes: 6f599d84231f ("x86/kdump: Always reserve the low 1M when the crashkernel option is specified")
Signed-off-by: Baoquan He
Reviewed-by: David Hildenbrand
Acked-by: John Donnelly
Cc: Christoph Hellwig
Cc: Christoph Lameter
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Vlastimil Babka
Cc: David Laight
Cc: Borislav Petkov
Cc: Marek Szyprowski
Cc: Robin Murphy
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
---
 include/linux/mmzone.h |    9 +++++++++
 mm/page_alloc.c        |   15 +++++++++++++++
 2 files changed, 24 insertions(+)

--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -929,6 +929,15 @@ static inline int is_highmem_idx(enum zo
 #endif
 }
 
+#ifdef CONFIG_ZONE_DMA
+bool has_managed_dma(void);
+#else
+static inline bool has_managed_dma(void)
+{
+	return false;
+}
+#endif
+
 /**
  * is_highmem - helper function to quickly check if a struct zone is a
  * highmem zone or not.  This is an attempt to keep references
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8694,3 +8694,18 @@ bool set_hwpoison_free_buddy_page(struct
 	return hwpoisoned;
 }
 #endif
+
+#ifdef CONFIG_ZONE_DMA
+bool has_managed_dma(void)
+{
+	struct pglist_data *pgdat;
+
+	for_each_online_pgdat(pgdat) {
+		struct zone *zone = &pgdat->node_zones[ZONE_DMA];
+
+		if (managed_zone(zone))
+			return true;
+	}
+	return false;
+}
+#endif /* CONFIG_ZONE_DMA */
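
The new helper is cheap because it leans on the existing managed_zone()
test rather than walking free lists: a zone counts as "managed" only if
the buddy allocator controls at least one of its pages. For reference,
the existing helpers this patch relies on look roughly like the
following in v5.4's include/linux/mmzone.h (paraphrased from memory of
the tree; verify against the exact source):

	/* Number of pages in the zone that the buddy allocator manages. */
	static inline unsigned long zone_managed_pages(struct zone *zone)
	{
		return (unsigned long)atomic_long_read(&zone->managed_pages);
	}

	/* True if the zone has any pages managed by the buddy allocator. */
	static inline bool managed_zone(struct zone *zone)
	{
		return zone_managed_pages(zone);
	}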