From: Ard Biesheuvel
Date: Fri, 9 Dec 2016 12:19:38 +0000
Subject: Re: [PATCH v2] arm64: mm: Fix memmap to be initialized for the entire section
To: Yisheng Xie
Cc: Robert Richter, Catalin Marinas, Will Deacon, David Daney, Mark Rutland,
    Hanjun Guo, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org

On 9 December 2016 at 12:14, Yisheng Xie wrote:
> Hi Robert,
> We have merged your patch to 4.9.0-rc8, however we still see a similar problem
> on our D05 board:
>

To be clear: does this issue also occur on D05 *without* the patch?

> -----------------
> [    5.081971] ------------[ cut here ]------------
> [    5.086668] kernel BUG at mm/page_alloc.c:1863!
> [    5.091281] Internal error: Oops - BUG: 0 [#1] SMP
> [    5.096159] Modules linked in:
> [    5.099265] CPU: 61 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc8+ #58
> [    5.105822] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon D05 UEFI 16.08 RC1 12/08/2016
> [    5.114860] task: fffffe13f23d7400 task.stack: fffffe13f66f8000
> [    5.120903] PC is at move_freepages+0x170/0x180
> [    5.125514] LR is at move_freepages_block+0xa8/0xb8
> [    5.130479] pc : [] lr : [] pstate: 200000c5
> [    5.138008] sp : fffffe13f66fb910
> [    5.141375] x29: fffffe13f66fb910 x28: 00000000ffffff80
> [    5.146784] x27: fffffdff800b0000 x26: fffffe13fbf62b90
> [    5.152193] x25: 000000000000000a x24: fffffdff800b0020
> [    5.157598] x23: 0000000000000000 x22: fffffdff800b0000
> [    5.163006] x21: fffffdff800fffc0 x20: fffffe13fbf62680
> [    5.168412] x19: fffffdff80080000 x18: 000000004c4d6d26
> [    5.173820] x17: 0000000000000000 x16: 0000000000000000
> [    5.179226] x15: 000000005c589429 x14: 0000000000000000
> [    5.184634] x13: 0000000000000000 x12: 00000000fe2ce6e0
> [    5.190039] x11: 00000000bb5b525b x10: 00000000bb48417b
> [    5.195446] x9 : 0000000000000068 x8 : 0000000000000000
> [    5.200854] x7 : fffffe13fbff2680 x6 : 0000000000000001
> [    5.206260] x5 : fffffc0008e12328 x4 : 0000000000000000
> [    5.211667] x3 : fffffe13fbf62680 x2 : 0000000000000000
> [    5.217073] x1 : fffffe13fbff2680 x0 : fffffe13fbf62680
> [...]
> [    5.768991] [] move_freepages+0x170/0x180
> [    5.774664] [] move_freepages_block+0xa8/0xb8
> [    5.780691] [] __rmqueue+0x494/0x5f0
> [    5.785921] [] get_page_from_freelist+0x5ec/0xb58
> [    5.792302] [] __alloc_pages_nodemask+0x144/0xd08
> [    5.798687] [] alloc_page_interleave+0x64/0xc0
> [    5.804801] [] alloc_pages_current+0x108/0x168
> [    5.810920] [] atomic_pool_init+0x78/0x1cc
> [    5.816680] [] arm64_dma_init+0x3c/0x44
> [    5.822180] [] do_one_initcall+0x44/0x138
> [    5.827853] [] kernel_init_freeable+0x1ec/0x28c
> [    5.834060] [] kernel_init+0x18/0x110
> [    5.839378] [] ret_from_fork+0x10/0x20
> [    5.844785] Code: 9108a021 9400afc5 d4210000 d503201f (d4210000)
> [    5.851024] ---[ end trace 3382df1ae82057db ]---
>
>
> On 2016/12/1 2:21, Robert Richter wrote:
>> On ThunderX systems with certain memory configurations we see the
>> following BUG_ON():
>>
>>   kernel BUG at mm/page_alloc.c:1848!
>>
>> This happens for some configs with 64k page size enabled. The BUG_ON()
>> checks whether the start and end pages of a memmap range belong to the
>> same zone.
>>
>> The BUG_ON() check fails if a memory zone contains NOMAP regions. In
>> this case the node information of those pages is not initialized. This
>> causes an inconsistency of the page links, with wrong zone and node
>> information for those pages. NOMAP pages from node 1 still point to the
>> mem zone from node 0 and have the wrong nid assigned.
>>
>> The reason for the misconfiguration is a change in pfn_valid() which
>> reports pages marked NOMAP as invalid:
>>
>>   68709f45385a ("arm64: only consider memblocks with NOMAP cleared for linear mapping")
>>
>> This causes pages marked as NOMAP to no longer be reassigned to the new
>> zone in memmap_init_zone() by calling __init_single_pfn().
>>
>> Fix this by restoring the old behavior of pfn_valid(), which used
>> memblock_is_memory(). Also change users of pfn_valid() in arm64 code
>> to use memblock_is_map_memory() where necessary. This only affects
>> code in ioremap.c. The code in mmu.c can still use the new version of
>> pfn_valid().
>>
>> As a consequence, pfn_valid() cannot be used to check if a physical
>> page is RAM. It just checks if there is an underlying memmap with a
>> valid struct page. Moreover, for performance reasons the whole memmap
>> (with pageblock_nr_pages number of pages) has valid pfns (SPARSEMEM
>> config). The memory range is extended to fit the alignment of the
>> memmap. Thus, pfn_valid() may return true for pfns that do not map to
>> physical memory. Those pages are simply not reported to the mm; they
>> are neither marked reserved nor added to the list of free pages. Other
>> functions such as page_is_ram() or memblock_is_map_memory() must be
>> used to check for memory and whether the page can be mapped with the
>> linear mapping.
>>
>> Since NOMAP mem ranges may need to be mapped with different mem
>> attributes (e.g. read-only or non-caching) we cannot use the linear
>> mapping here. The use of memblock_is_memory() in pfn_valid() does not
>> break this behaviour. Since commit:
>>
>>   e7cd190385d1 ("arm64: mark reserved memblock regions explicitly in iomem")
>>
>> NOMAP mem resources are no longer marked as system RAM
>> (IORESOURCE_SYSTEM_RAM). Now page_is_ram() and region_intersects() (see
>> memremap()) do not detect NOMAP mem as system RAM, and NOMAP mem is not
>> added to the linear mapping as system RAM is.
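
For anyone cross-checking against the splat above: the check that fires is the
zone-consistency assertion in move_freepages(). Roughly (paraphrased from the
v4.9-era mm/page_alloc.c, not quoted verbatim):

	/* mm/page_alloc.c, move_freepages() -- paraphrased sketch, ~v4.9 */
	#ifndef CONFIG_HOLES_IN_ZONE
		/*
		 * start_page/end_page delimit one pageblock. If the struct pages
		 * backing a NOMAP hole were never initialised via
		 * __init_single_pfn(), their zone/node links still point at
		 * node 0's zone, so the two page_zone() lookups disagree and
		 * this fires (VM_BUG_ON expands to BUG_ON with CONFIG_DEBUG_VM).
		 */
		VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
	#endif
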
>>
>> v2:
>>
>>  * Added Ack
>>  * Updated description to reflect the discussion
>>
>> Acked-by: Ard Biesheuvel
>> Signed-off-by: Robert Richter
>> ---
>>  arch/arm64/mm/init.c    | 2 +-
>>  arch/arm64/mm/ioremap.c | 5 +++--
>>  2 files changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 212c4d1e2f26..166911f4a2e6 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -147,7 +147,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
>>  #ifdef CONFIG_HAVE_ARCH_PFN_VALID
>>  int pfn_valid(unsigned long pfn)
>>  {
>> -	return memblock_is_map_memory(pfn << PAGE_SHIFT);
>> +	return memblock_is_memory(pfn << PAGE_SHIFT);
>>  }
>>  EXPORT_SYMBOL(pfn_valid);
>>  #endif
>> diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
>> index 01e88c8bcab0..c17c220b0c48 100644
>> --- a/arch/arm64/mm/ioremap.c
>> +++ b/arch/arm64/mm/ioremap.c
>> @@ -21,6 +21,7 @@
>>   */
>>
>>  #include <linux/export.h>
>> +#include <linux/memblock.h>
>>  #include <linux/mm.h>
>>  #include <linux/vmalloc.h>
>>  #include <linux/io.h>
>> @@ -55,7 +56,7 @@ static void __iomem *__ioremap_caller(phys_addr_t phys_addr, size_t size,
>>  	/*
>>  	 * Don't allow RAM to be mapped.
>>  	 */
>> -	if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr))))
>> +	if (WARN_ON(memblock_is_map_memory(phys_addr)))
>>  		return NULL;
>>
>>  	area = get_vm_area_caller(size, VM_IOREMAP, caller);
>> @@ -96,7 +97,7 @@ EXPORT_SYMBOL(__iounmap);
>>  void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
>>  {
>>  	/* For normal memory we already have a cacheable mapping. */
>> -	if (pfn_valid(__phys_to_pfn(phys_addr)))
>> +	if (memblock_is_map_memory(phys_addr))
>>  		return (void __iomem *)__phys_to_virt(phys_addr);
>>
>>  	return __ioremap_caller(phys_addr, size, __pgprot(PROT_NORMAL),
>>
>
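
To restate the resulting rule of thumb (my summary, not part of the patch):
after this change pfn_valid() only means "a struct page exists for this pfn",
and it stays true for NOMAP regions and for pfns covered merely because of the
memmap/section alignment, while memblock_is_map_memory() means "this is normal
RAM in the linear mapping". Code that really asks the latter question should
follow the ioremap_cache() hunk above, roughly:

	/* sketch only, mirroring the patched ioremap_cache(); not a new API */
	if (memblock_is_map_memory(phys_addr))
		/* normal RAM: the linear alias already exists */
		return (void __iomem *)__phys_to_virt(phys_addr);

	/*
	 * NOMAP or device memory: create an explicit mapping with suitable
	 * attributes, i.e. the __ioremap_caller() path shown in the hunk above.
	 */
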