From: Daniel Vacek
Date: Wed, 28 Mar 2018 12:42:49 +0200
Subject: Re: [PATCH v2 1/5] mm: page_alloc: remain memblock_next_valid_pfn() when CONFIG_HAVE_ARCH_PFN_VALID is enable
To: Jia He
Cc: Andrew Morton, Michal Hocko, Catalin Marinas, Mel Gorman, Will Deacon,
    Mark Rutland, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
    Pavel Tatashin, Daniel Jordan, AKASHI Takahiro, Gioh Kim,
    Steven Sistare, Eugeniu Rosca, Vlastimil Babka, open list,
    linux-mm@kvack.org, James Morse, Ard Biesheuvel, Steve Capper,
    x86@kernel.org, Greg Kroah-Hartman, Kate Stewart, Philippe Ombredanne,
    Johannes Weiner, Kemi Wang, Petr Tesarik, YASUAKI ISHIMATSU,
    Andrey Ryabinin, Nikolay Borisov, Jia He
In-Reply-To: <113cb8cf-7c59-4fbb-5ad2-4ae6eeb1193c@gmail.com>
References: <1521894282-6454-1-git-send-email-hejianet@gmail.com>
            <1521894282-6454-2-git-send-email-hejianet@gmail.com>
            <113cb8cf-7c59-4fbb-5ad2-4ae6eeb1193c@gmail.com>

On Wed, Mar 28, 2018 at 11:26 AM, Jia He wrote:
>
>
> On 3/28/2018 12:52 AM, Daniel Vacek wrote:
>>
>> On Sat, Mar 24, 2018 at 1:24 PM, Jia He wrote:
>>>
>>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>>> where possible") optimized the loop in memmap_init_zone(). But it caused
>>> a possible panic bug. So Daniel Vacek reverted it later.
>>>
>>> But memblock_next_valid_pfn is valid when CONFIG_HAVE_ARCH_PFN_VALID is
>>> enabled. And as verified by Eugeniu Rosca, arm can benefit from this
>>> commit. So retain memblock_next_valid_pfn.
>>
>> It is not dependent on the CONFIG_HAVE_ARCH_PFN_VALID option but on the
>> arm(64) implementation of the pfn_valid() function, IIUC. So it should
>> really be moved from the generic source file to an arm-specific location.
>> I'd say somewhere close to the pfn_valid() implementation, such as
>> arch/arm{,64}/mm/init.c-ish?
>>
>> --nX
>
> Hi Daniel,
> I didn't catch the reason why "It is not dependent on the
> CONFIG_HAVE_ARCH_PFN_VALID option but on the arm(64) implementation of
> the pfn_valid() function". Can you explain more about it? Thanks.

The arm implementation of pfn_valid() is actually based on memblock. As
HAVE_MEMBLOCK is mandatory for arm, memblock is guaranteed to always be
available, IIUC. Correct me if I am wrong here. With that said, you are
fine with using memblock to skip gaps and find the next valid frame.

The generic version of pfn_valid(), though, is based on mem sections, and
memblock_next_valid_pfn() does not always return the next valid pfn but
sometimes skips further, resulting in some valid frames being skipped (as
if they were invalid). And that's why the kernel was eventually crashing
on some !arm machines.

Now, if any other architecture defines CONFIG_HAVE_ARCH_PFN_VALID and
implements its own version of pfn_valid(), there is no guarantee that it
will be based on memblock data or be somehow equivalent to the arm
implementation, right? At this moment only arm implements
CONFIG_HAVE_ARCH_PFN_VALID. Maybe it could be generalized to something
like CONFIG_MEMBLOCK_PFN_VALID and moved to generic code. And then you
could base your changes on that. But I am not sure if that is possible.
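To make the difference concrete, this is roughly what the two
implementations look like (simplified sketches from memory, not the exact
mainline code):

/* arm64, arch/arm64/mm/init.c -- pfn_valid() backed by memblock data */
int pfn_valid(unsigned long pfn)
{
        return memblock_is_map_memory(pfn << PAGE_SHIFT);
}

/* generic sparsemem version, include/linux/mmzone.h -- backed by mem
 * sections; a whole section counts as valid even if it contains holes,
 * so this can return 1 for pfns that are not covered by any memblock
 * region at all.
 */
static inline int pfn_valid(unsigned long pfn)
{
        if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
                return 0;
        return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
}

That gap between the two notions of "valid" is exactly what bit the !arm
machines.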
> What's your thought if I changed the code as follows?
>
> in include/linux/memblock.h
>
> #ifdef CONFIG_HAVE_ARCH_PFN_VALID
> extern unsigned long memblock_next_valid_pfn(unsigned long pfn);
> #else
> #define memblock_next_valid_pfn(pfn)  (pfn + 1)
> #endif

I think I'd rather do something like this:

        if (!early_pfn_valid(pfn)) {
                pfn = skip_to_last_invalid_pfn(pfn);
                continue;
        }

And then for arm define:

#if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
#define skip_to_last_invalid_pfn(pfn) (memblock_next_valid_pfn(pfn, &early_idx) - 1)
#endif

And for the generic fallback:

#ifndef skip_to_last_invalid_pfn
#define skip_to_last_invalid_pfn(pfn) (pfn)
#endif

--nX

> Cheers,
> Jia
>
>>
>>> Signed-off-by: Jia He
>>> ---
>>>  include/linux/memblock.h |  4 ++++
>>>  mm/memblock.c            | 29 +++++++++++++++++++++++++++++
>>>  mm/page_alloc.c          | 11 ++++++++++-
>>>  3 files changed, 43 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>>> index 0257aee..efbbe4b 100644
>>> --- a/include/linux/memblock.h
>>> +++ b/include/linux/memblock.h
>>> @@ -203,6 +203,10 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
>>>              i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
>>>  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>>>
>>> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>>> +unsigned long memblock_next_valid_pfn(unsigned long pfn);
>>> +#endif
>>> +
>>>  /**
>>>   * for_each_free_mem_range - iterate through free memblock areas
>>>   * @i: u64 used as loop variable
>>> diff --git a/mm/memblock.c b/mm/memblock.c
>>> index ba7c878..bea5a9c 100644
>>> --- a/mm/memblock.c
>>> +++ b/mm/memblock.c
>>> @@ -1102,6 +1102,35 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
>>>          *out_nid = r->nid;
>>>  }
>>>
>>> +#ifdef CONFIG_HAVE_ARCH_PFN_VALID
>>> +unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
>>> +{
>>> +        struct memblock_type *type = &memblock.memory;
>>> +        unsigned int right = type->cnt;
>>> +        unsigned int mid, left = 0;
>>> +        phys_addr_t addr = PFN_PHYS(++pfn);
>>> +
>>> +        do {
>>> +                mid = (right + left) / 2;
>>> +
>>> +                if (addr < type->regions[mid].base)
>>> +                        right = mid;
>>> +                else if (addr >= (type->regions[mid].base +
>>> +                                  type->regions[mid].size))
>>> +                        left = mid + 1;
>>> +                else {
>>> +                        /* addr is within the region, so pfn is valid */
>>> +                        return pfn;
>>> +                }
>>> +        } while (left < right);
>>> +
>>> +        if (right == type->cnt)
>>> +                return -1UL;
>>> +        else
>>> +                return PHYS_PFN(type->regions[right].base);
>>> +}
>>> +#endif /*CONFIG_HAVE_ARCH_PFN_VALID*/
>>> +
>>>  /**
>>>   * memblock_set_node - set node ID on memblock regions
>>>   * @base: base of area to set node ID for
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index c19f5ac..2a967f7 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -5483,8 +5483,17 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>>                  if (context != MEMMAP_EARLY)
>>>                          goto not_early;
>>>
>>> -                if (!early_pfn_valid(pfn))
>>> +                if (!early_pfn_valid(pfn)) {
>>> +#if (defined CONFIG_HAVE_MEMBLOCK) && (defined CONFIG_HAVE_ARCH_PFN_VALID)
>>> +                        /*
>>> +                         * Skip to the pfn preceding the next valid one (or
>>> +                         * end_pfn), such that we hit a valid pfn (or end_pfn)
>>> +                         * on our next iteration of the loop.
>>> +                         */
>>> +                        pfn = memblock_next_valid_pfn(pfn) - 1;
>>> +#endif
>>>                          continue;
>>> +                }
>>>                  if (!early_pfn_in_nid(pfn, nid))
>>>                          continue;
>>>                  if (!update_defer_init(pgdat, pfn, end_pfn, &nr_initialised))
>>> --
>>> 2.7.4
>>>
>
> --
> Cheers,
> Jia
>
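For anyone following the thread, the return convention of the quoted
memblock_next_valid_pfn() is easiest to see on a toy layout. The snippet
below just re-implements the same binary search over a plain array so it
can be read (or compiled) on its own; next_valid_pfn(), the regions[]
table and the pfn values are made up for the example and are not kernel
code:

/*
 * Toy model of the binary search in the quoted memblock_next_valid_pfn().
 * regions[] plays the role of memblock.memory.regions[]; base/size are in
 * "pfn units" here to keep the arithmetic obvious.
 */
#include <stdio.h>

struct region { unsigned long base, size; };

static struct region regions[] = {
        { 0x0000, 0x8000 },     /* pfns 0x0000 - 0x7fff are valid */
        { 0xc000, 0x4000 },     /* pfns 0xc000 - 0xffff are valid */
};

static unsigned long next_valid_pfn(unsigned long pfn)
{
        unsigned int mid, left = 0, right = 2;  /* 2 == number of regions */
        unsigned long addr = ++pfn;             /* start looking at pfn + 1 */

        do {
                mid = (right + left) / 2;

                if (addr < regions[mid].base)
                        right = mid;
                else if (addr >= regions[mid].base + regions[mid].size)
                        left = mid + 1;
                else
                        return pfn;     /* pfn + 1 itself is valid */
        } while (left < right);

        /* fell into a hole: either past all memory, or below regions[right] */
        return right == 2 ? -1UL : regions[right].base;
}

int main(void)
{
        /* 0x7ffe -> 0x7fff (still inside the first region)        */
        /* 0x7fff -> 0xc000 (hole; jump to base of the next region) */
        /* 0xffff -> -1UL   (no more memory)                        */
        printf("%lx %lx %lx\n", next_valid_pfn(0x7ffe),
               next_valid_pfn(0x7fff), next_valid_pfn(0xffff));
        return 0;
}

In other words, the helper either confirms that pfn + 1 is backed by a
memblock region or jumps straight to the base pfn of the next region (or
-1UL when there is none); the caller in the quoted page_alloc.c hunk then
subtracts 1 so that the loop's own pfn++ lands exactly on that pfn.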