From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Mike Rapoport, Boris Petkov, Robert Shteynfeld, Baoquan He, Vlastimil Babka, David Hildenbrand, Andrew Morton, Linus Torvalds
Subject: [PATCH 5.10 138/215] mm/page_alloc: fix memory map initialization for descending nodes
Date: Thu, 15 Jul 2021 20:38:30 +0200
Message-Id: <20210715182623.942552790@linuxfoundation.org>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20210715182558.381078833@linuxfoundation.org>
References:
 <20210715182558.381078833@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Mike Rapoport

commit 122e093c1734361dedb64f65c99b93e28e4624f4 upstream.

On systems with memory nodes sorted in descending order, for instance
the Dell Precision WorkStation T5500, the struct pages for higher PFNs,
which belong to lower-numbered nodes, could be overwritten by the
initialization of struct pages corresponding to the holes in the memory
sections.

For example, for the memory layout below

[    0.245624] Early memory node ranges
[    0.248496]   node   1: [mem 0x0000000000001000-0x0000000000090fff]
[    0.251376]   node   1: [mem 0x0000000000100000-0x00000000dbdf8fff]
[    0.254256]   node   1: [mem 0x0000000100000000-0x0000001423ffffff]
[    0.257144]   node   0: [mem 0x0000001424000000-0x0000002023ffffff]

the range 0x1424000000 - 0x1428000000 at the beginning of node 0 starts
in the middle of a section and will be considered a hole during the
initialization of the last section in node 1.

The wrong initialization of the memory map causes a panic on boot when
CONFIG_DEBUG_VM is enabled.

Reorder the loops of the memory map initialization so that the outer
loop always iterates over populated memory regions in ascending order
and the inner loop selects the zone corresponding to the PFN range.

This way, the struct pages for memory holes are always initialized only
for ranges that are actually not populated.
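As an illustration of the layout above (not part of the patch), the section arithmetic can be checked in user space. This is a minimal sketch assuming 128 MiB memory sections (SECTION_SIZE_BITS = 27, as on x86_64); the helper names section_start(), section_end() and splits_section() are hypothetical and do not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 128 MiB sections (SECTION_SIZE_BITS = 27, as on x86_64). */
#define SECTION_SIZE_BITS 27
#define SECTION_SIZE (UINT64_C(1) << SECTION_SIZE_BITS)

/* Hypothetical helper: start address of the section containing addr. */
static uint64_t section_start(uint64_t addr)
{
	return addr & ~(SECTION_SIZE - 1);
}

/* Hypothetical helper: first address past the section containing addr. */
static uint64_t section_end(uint64_t addr)
{
	return section_start(addr) + SECTION_SIZE;
}

/* True if a node boundary at addr is not aligned to a section start. */
static bool splits_section(uint64_t addr)
{
	return section_start(addr) != addr;
}
```

Under that assumption, the node 0 start address 0x1424000000 lies in the section spanning 0x1420000000 to 0x1428000000, so splits_section() reports that node 1 and node 0 share this section, which is the case the commit message describes.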

[akpm@linux-foundation.org: coding style fixes]
  Link: https://lkml.kernel.org/r/YNXlMqBbL+tBG7yq@kernel.org

Link: https://bugzilla.kernel.org/show_bug.cgi?id=213073
Link: https://lkml.kernel.org/r/20210624062305.10940-1-rppt@kernel.org
Fixes: 0740a50b9baa ("mm/page_alloc.c: refactor initialization of struct page for holes in memory layout")
Signed-off-by: Mike Rapoport
Cc: Boris Petkov
Cc: Robert Shteynfeld
Cc: Baoquan He
Cc: Vlastimil Babka
Cc: David Hildenbrand
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
---
 mm/page_alloc.c | 100 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 60 insertions(+), 40 deletions(-)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6129,7 +6129,7 @@ void __ref memmap_init_zone_device(struc
 		return;
 
 	/*
-	 * The call to memmap_init_zone should have already taken care
+	 * The call to memmap_init should have already taken care
 	 * of the pages reserved for the memmap, so we can just jump to
 	 * the end of that region and start processing the device pages.
 	 */
@@ -6194,7 +6194,7 @@ static void __meminit zone_init_free_lis
 /*
  * Only struct pages that correspond to ranges defined by memblock.memory
  * are zeroed and initialized by going through __init_single_page() during
- * memmap_init_zone().
+ * memmap_init_zone_range().
  *
  * But, there could be struct pages that correspond to holes in
  * memblock.memory. This can happen because of the following reasons:
@@ -6213,9 +6213,9 @@ static void __meminit zone_init_free_lis
  *   zone/node above the hole except for the trailing pages in the last
  *   section that will be appended to the zone/node below.
  */
-static u64 __meminit init_unavailable_range(unsigned long spfn,
-					     unsigned long epfn,
-					     int zone, int node)
+static void __init init_unavailable_range(unsigned long spfn,
+					  unsigned long epfn,
+					  int zone, int node)
 {
 	unsigned long pfn;
 	u64 pgcnt = 0;
@@ -6231,58 +6231,77 @@ static u64 __meminit init_unavailable_ra
 		pgcnt++;
 	}
 
-	return pgcnt;
+	if (pgcnt)
+		pr_info("On node %d, zone %s: %lld pages in unavailable ranges",
+			node, zone_names[zone], pgcnt);
 }
 #else
-static inline u64 init_unavailable_range(unsigned long spfn, unsigned long epfn,
-					 int zone, int node)
+static inline void init_unavailable_range(unsigned long spfn,
+					  unsigned long epfn,
+					  int zone, int node)
 {
-	return 0;
 }
 #endif
 
-void __meminit __weak memmap_init(unsigned long size, int nid,
-				  unsigned long zone,
-				  unsigned long range_start_pfn)
+static void __init memmap_init_zone_range(struct zone *zone,
+					  unsigned long start_pfn,
+					  unsigned long end_pfn,
+					  unsigned long *hole_pfn)
+{
+	unsigned long zone_start_pfn = zone->zone_start_pfn;
+	unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
+	int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
+
+	start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
+	end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
+
+	if (start_pfn >= end_pfn)
+		return;
+
+	memmap_init_zone(end_pfn - start_pfn, nid, zone_id, start_pfn,
+			  zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
+
+	if (*hole_pfn < start_pfn)
+		init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
+
+	*hole_pfn = end_pfn;
+}
+
+void __init __weak memmap_init(void)
 {
-	static unsigned long hole_pfn;
 	unsigned long start_pfn, end_pfn;
-	unsigned long range_end_pfn = range_start_pfn + size;
-	int i;
-	u64 pgcnt = 0;
+	unsigned long hole_pfn = 0;
+	int i, j, zone_id, nid;
 
-	for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
-		start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
-		end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
+	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
+		struct pglist_data *node = NODE_DATA(nid);
 
-		if (end_pfn > start_pfn) {
-			size = end_pfn - start_pfn;
-			memmap_init_zone(size, nid, zone, start_pfn, range_end_pfn,
-					 MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
-		}
+		for (j = 0; j < MAX_NR_ZONES; j++) {
+			struct zone *zone = node->node_zones + j;
 
-		if (hole_pfn < start_pfn)
-			pgcnt += init_unavailable_range(hole_pfn, start_pfn,
-							zone, nid);
-		hole_pfn = end_pfn;
+			if (!populated_zone(zone))
+				continue;
+
+			memmap_init_zone_range(zone, start_pfn, end_pfn,
+					       &hole_pfn);
+			zone_id = j;
+		}
 	}
 
 #ifdef CONFIG_SPARSEMEM
 	/*
-	 * Initialize the hole in the range [zone_end_pfn, section_end].
-	 * If zone boundary falls in the middle of a section, this hole
-	 * will be re-initialized during the call to this function for the
-	 * higher zone.
+	 * Initialize the memory map for hole in the range [memory_end,
+	 * section_end].
+	 * Append the pages in this hole to the highest zone in the last
+	 * node.
+	 * The call to init_unavailable_range() is outside the ifdef to
+	 * silence the compiler warining about zone_id set but not used;
+	 * for FLATMEM it is a nop anyway
 	 */
-	end_pfn = round_up(range_end_pfn, PAGES_PER_SECTION);
+	end_pfn = round_up(end_pfn, PAGES_PER_SECTION);
 	if (hole_pfn < end_pfn)
-		pgcnt += init_unavailable_range(hole_pfn, end_pfn,
-						zone, nid);
 #endif
-
-	if (pgcnt)
-		pr_info("  %s zone: %llu pages in unavailable ranges\n",
-			zone_names[zone], pgcnt);
+		init_unavailable_range(hole_pfn, end_pfn, zone_id, nid);
 }
 
 static int zone_batchsize(struct zone *zone)
@@ -6981,7 +7000,6 @@ static void __init free_area_init_core(s
 		set_pageblock_order();
 		setup_usemap(pgdat, zone, zone_start_pfn, size);
 		init_currently_empty_zone(zone, zone_start_pfn, size);
-		memmap_init(size, nid, j, zone_start_pfn);
 	}
 }
 
@@ -7507,6 +7525,8 @@ void __init free_area_init(unsigned long
 			node_set_state(nid, N_MEMORY);
 		check_for_memory(pgdat, nid);
 	}
+
+	memmap_init();
 }
 
 static int __init cmdline_parse_core(char *p, unsigned long *core,