Date: Fri, 22 May 2020 15:01:14 +0800
From: Baoquan He <bhe@redhat.com>
To: Mike Rapoport <rppt@linux.ibm.com>, mgorman@suse.de
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 akpm@linux-foundation.org, cai@lca.pw, mhocko@kernel.org
Subject: Re: [PATCH] mm/compaction: Fix the incorrect hole in fast_isolate_freepages()
Message-ID: <20200522070114.GE26955@MiWiFi-R3L-srv>
References: <20200521014407.29690-1-bhe@redhat.com>
 <20200521092612.GP1059226@linux.ibm.com>
 <20200521155225.GA20045@MiWiFi-R3L-srv>
 <20200521171836.GU1059226@linux.ibm.com>
In-Reply-To: <20200521171836.GU1059226@linux.ibm.com>

On 05/21/20 at 08:18pm, Mike Rapoport wrote:
> On Thu, May 21, 2020 at 11:52:25PM +0800, Baoquan He wrote:
> > On 05/21/20 at 12:26pm, Mike Rapoport wrote:
> > > > For this kind of e820 reserved range, it won't be added to the
> > > > memblock allocator. However, init_unavailable_mem() will initialize
> > > > them and add them into node 0, zone 0. Before that commit,
> > > > memmap_init() would later add e820 reserved ranges into the zone
> > > > containing them, because they pass the early_pfn_valid() and
> > > > early_pfn_in_nid() checks. In this case, the e820 reserved range
> > > > containing the faulting page 0x2a800000 is added into the DMA32
> > > > zone. After that commit, the e820 reserved regions are kept in
> > > > node 0, zone 0: since we iterate over memblock regions in
> > > > memmap_init() instead, their node and zone won't be changed.
> > >
> > > I wonder why we wouldn't add the reserved memory to memblock from the
> > > very beginning...
> > > I've tried to understand why this was not done, but I couldn't find
> > > the reasoning behind that.
> >
> > I have added some explanation in my reply to Mel. Please check that in
> > that thread.
>
> What I meant was that I've tried to find an explanation in the kernel
> logs for why the reserved areas are not added to memblock.memory on x86.
> I didn't mean that your patch description lacks this explanation :)

Ah, I misunderstood it, sorry about that.
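To make the node 0, zone 0 assignment above concrete, here is a rough
user-space simulation (not kernel code; the region layout is made up, with
the hole placed so it covers PFN 0x2a800, i.e. the reported address
0x2a800000 assuming it is a physical address and 4K pages) of how
init_unavailable_mem() walks memblock.memory and finds the uncovered
holes; those holes are exactly the pages whose node and zone are no longer
adjusted after the rework:

#include <stdio.h>

/* End-exclusive PFN ranges standing in for memblock.memory regions. */
struct region { unsigned long start, end; };

int main(void)
{
	struct region memory[] = {
		{ 0x00001, 0x000a0 },
		{ 0x00100, 0x2a800 },	/* e820 reserved hole follows */
		{ 0x2b000, 0x80000 },
	};
	unsigned long next = 0;

	for (unsigned long i = 0; i < sizeof(memory) / sizeof(*memory); i++) {
		if (next < memory[i].start)	/* gap not covered by any region */
			printf("hole: PFN %#lx-%#lx stays in node 0, zone 0\n",
			       next, memory[i].start);
		next = memory[i].end;
	}
	return 0;
}

The last hole printed (0x2a800-0x2b000) contains the faulting PFN: its
struct pages still say node 0, zone 0, so zone_spans_pfn(page_zone(page),
pfn) fails when compaction touches it.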
> > As I said, the unavailable ranges include firmware reserved ranges,
> > and holes inside one boot memory section, when that section contains
> > usable memory ranges, firmware reserved ranges, and holes. Adding them
> > all into memblock seems a little unreasonable, since they are never
> > used by the system via memblock, buddy, or any higher-level memory
> > allocator. But I can see that adding them into memblock may have the
> > same effect as the old code before your patchset was applied. Let's
> > see if Mel or other people have an opinion. I personally would not
> > suggest doing it like this though.
>
> Adding reserved regions to memblock.memory will not have the same effect
> as the old code. We anyway have to initialize struct page for these
> areas, but unlike the old code we don't need to run them by the
> early_pfn_in_nid() checks and we still get rid of the
> CONFIG_NODES_SPAN_OTHER_NODES option.

Hmm, I mean adding them to memblock gives us the same result: they are
added into the node and zone where they should be, and marked as
reserved, just as the old code did.

Rethinking about this, adding them into memblock seems doable. But we may
not want to add them from the e820 reserved ranges, since that would skip
hole ranges which share the same section with usable ranges, and may need
code changes in different arches. How about this: we add them into
memblock in init_unavailable_mem(); memmap_init() will then add them into
the right node and zone, and reserve_bootmem_region() will initialize
them and mark them as Reserved.

From d019d0f9e7c958542dfcb142f93d07fcce6c7c22 Mon Sep 17 00:00:00 2001
From: Baoquan He <bhe@redhat.com>
Date: Fri, 22 May 2020 14:36:13 +0800
Subject: [PATCH] mm/page_alloc.c: Add unavailable ranges into memblock

These unavailable ranges share the same section with usable ranges in
boot memory, e.g. firmware reserved ranges and holes. Previously, they
were added into node 0, zone 0 in init_unavailable_range() and marked as
Reserved; later, in memmap_init(), they were added to the appropriate
node and zone covering them.

However, after the patchset ("mm: rework free_area_init*() funcitons")
is applied, we iterate over memblock regions instead. These unavailable
ranges are skipped, and the node and zone adjustment is no longer done
as the old code did. This causes a crash in compaction, triggered by
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn)).

So let's add these unavailable ranges into memblock and reserve them in
init_unavailable_mem() instead. With this change, they will be added
into the appropriate node and zone in memmap_init(), and initialized in
reserve_bootmem_region() just like any other memblock reserved regions.
Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/page_alloc.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 603187800628..3973b5fdfe3f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6925,7 +6925,7 @@ static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn)
 static void __init init_unavailable_mem(void)
 {
 	phys_addr_t start, end;
-	u64 i, pgcnt;
+	u64 i, pgcnt, size;
 	phys_addr_t next = 0;
 
 	/*
@@ -6934,9 +6934,11 @@ static void __init init_unavailable_mem(void)
 	pgcnt = 0;
 	for_each_mem_range(i, &memblock.memory, NULL,
 			NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
-		if (next < start)
-			pgcnt += init_unavailable_range(PFN_DOWN(next),
-							PFN_UP(start));
+		if (next < start) {
+			size = PFN_UP(start) - PFN_DOWN(next);
+			memblock_add(PFN_PHYS(PFN_DOWN(next)), PFN_PHYS(size));
+			memblock_reserve(PFN_PHYS(PFN_DOWN(next)), PFN_PHYS(size));
+		}
 		next = end;
 	}
 
@@ -6947,8 +6949,11 @@ static void __init init_unavailable_mem(void)
 	 * considered initialized. Make sure that memmap has a well defined
 	 * state.
 	 */
-	pgcnt += init_unavailable_range(PFN_DOWN(next),
-			round_up(max_pfn, PAGES_PER_SECTION));
+	size = round_up(max_pfn, PAGES_PER_SECTION) - PFN_DOWN(next);
+	if (size) {
+		memblock_add(PFN_PHYS(PFN_DOWN(next)), PFN_PHYS(size));
+		memblock_reserve(PFN_PHYS(PFN_DOWN(next)), PFN_PHYS(size));
+	}
 
 	/*
 	 * Struct pages that do not have backing memory. This could be because
--
2.17.2
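One unit detail worth spelling out: memblock_add() and memblock_reserve()
take a physical base address and a size in bytes, while next and size
above are handled as PFNs and page counts, hence the PFN_PHYS()
conversions in the hunks. A stand-alone sanity check of that round trip
(user-space, assuming 4K pages, i.e. PAGE_SHIFT == 12):

#include <assert.h>
#include <stdio.h>

/* Local copies of the kernel's pfn.h helpers, assuming PAGE_SHIFT == 12. */
#define PAGE_SHIFT	12
#define PFN_PHYS(x)	((unsigned long long)(x) << PAGE_SHIFT)
#define PFN_DOWN(x)	((unsigned long long)(x) >> PAGE_SHIFT)

int main(void)
{
	/* The faulting page from this thread: phys 0x2a800000 <-> PFN 0x2a800. */
	assert(PFN_DOWN(0x2a800000ULL) == 0x2a800ULL);
	assert(PFN_PHYS(0x2a800ULL) == 0x2a800000ULL);
	printf("PFN <-> phys round trip OK\n");
	return 0;
}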