Received: by 2002:a25:ab43:0:0:0:0:0 with SMTP id u61csp1161904ybi; Fri, 14 Jun 2019 09:37:13 -0700 (PDT) X-Google-Smtp-Source: APXvYqy6MMTdGGCeWNLmlTxQiul4hFC5leHadXJ0yPV46+iRuCnzfVu7/8sU0zHXcbZY7e0VgjuK X-Received: by 2002:a63:2a89:: with SMTP id q131mr37151389pgq.359.1560530232962; Fri, 14 Jun 2019 09:37:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1560530232; cv=none; d=google.com; s=arc-20160816; b=bahg1zScHWt31t0X3TR1WgL1/Jyh2UTwwfOnT7BC0gpgVNI9RTZSo1sEDEsnN9pGU2 0ZlOBrqRk3cvQh2Y55edKBrgoXyjadtvPpriA/Hdkdx0b+cuhGyB92+wqJnIXLOZIMG1 uYoK2RuwLFlxTn7E+Jz3UInqM/tbzfFbJFuSgXbXpi6kxJT8DdohD0hdkMdUB4tY6erg 6AiAQysl16DFpFGq75MfFb522O+ydAjAy0MSL2P/QcJ7SQkZKJfyde1DgIqn2qMhGL2v RPbkK0Ij0uH7wUnzmHoVv4r0i8i9jjaNsaggDwTheU3PY7+ayYMf3M8Dh8pCJBCD+cbZ oJRg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=EvAQexBEPL7nJ/QjTH44i3lQXyubE+6SE92rMt2TZnA=; b=c5hhVTdyykGdZLMI6rmIFbjbgOkXeCGXZtu07VyNPqBjs9oXstTddqmTPzRrXw1dAF NySBSC+61A2Ag59yibbQD9pfVuWPXD1TcUin9q8GC2Wy9+o9y2No1m+HB/wUx3DFq6Qg IZF9s7VlRH35jf2q1/oxR0D4zeayMJLkdYjHQGYdCLaFjsyCFoiFx64a6inH3guti5IG iMC4762B0EhJBQBUDgboa4JdJSL+ICyE/kJUApdJwTCCiZAPt7cjOli835kzJLEwiC/t 9RQd/TXIgiUzyckTNKLRsljW9jT7elKcg/4P9cNFLF+ia5lBxSh81rmZDplflolZoD7s unnw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel-com.20150623.gappssmtp.com header.s=20150623 header.b=xQp7xtj3; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id h12si3022717pgr.325.2019.06.14.09.36.56; Fri, 14 Jun 2019 09:37:12 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@intel-com.20150623.gappssmtp.com header.s=20150623 header.b=xQp7xtj3; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725942AbfFNQgp (ORCPT + 99 others); Fri, 14 Jun 2019 12:36:45 -0400 Received: from mail-oi1-f196.google.com ([209.85.167.196]:45463 "EHLO mail-oi1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725808AbfFNQgp (ORCPT ); Fri, 14 Jun 2019 12:36:45 -0400 Received: by mail-oi1-f196.google.com with SMTP id m206so2384209oib.12 for ; Fri, 14 Jun 2019 09:36:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=EvAQexBEPL7nJ/QjTH44i3lQXyubE+6SE92rMt2TZnA=; b=xQp7xtj36pghq2SegeVzV1Nb6Jx+8KWpSM1bivXSZruRDpGMoWRWchUkI0VOEoENxP XrHss5QzoD3JdjLKm/7fhMkySRU7ZKwukK//Icf268cwemrp8asrCuCmjX4Icr7gPuyX Q4RK0eY8EPnZdU8cWgEV/iGcuJKS6Ht5Yewp3ICmMddmKbsxCODaQZXWIFRjevkOZqeQ p2CCFO0UtUlJPR13efKardTCBubi+djaxfZjklcwiNhGsemJilUwKnqh4jff1ZLVr89Y I+5zddT/p6HMYozKqgwPkpeoH94ISqHDOGgnAKs+m9EoB/jWQAxr2nSwWuUfIq+6gQ32 e8pQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=EvAQexBEPL7nJ/QjTH44i3lQXyubE+6SE92rMt2TZnA=; b=Spc18k/00WOCnGTF0KWexism88w/ZyeBzereUopByNJkeQZgHlbaOR8/9apTYLs1qk W1vnTOrDZpqWzVbDfvg5ogawxtowImpBOshHXDAby2f98iRbH26s7pcnAo/3yCM0Ac2L E6hWk9rYG+RWx67Ak8ICPwShacv7jFaOCq0lNGUIcQgYyg0fsavMkrmN10YXfsbXvREZ nOHLEzXP6lS0QWuVpByrFRuQqT1j2gnvwkVstygWwPV9c8GEo8kMWTExe/TkklBaCnIX RCN2w5C3E/9Ntd9yyt2HTEEMaufoS8WFGdH8uDpMD9lXNyqDum2n9mC/GFLNidGrooTG HT0A== X-Gm-Message-State: APjAAAVi37v7KXmL9A2uClcn5TTOT5zpnENtpyGcLLO7gSzGMUSz8qIi OS93DPnM8MkIraEXI6K7wxhCoG31rGWRhiKnsAVkBHjs X-Received: by 2002:aca:ec82:: with SMTP id k124mr2315505oih.73.1560530204171; Fri, 14 Jun 2019 09:36:44 -0700 (PDT) MIME-Version: 1.0 References: <1560366952-10660-1-git-send-email-cai@lca.pw> <1560376072.5154.6.camel@lca.pw> <87lfy4ilvj.fsf@linux.ibm.com> <20190614153535.GA9900@linux> <24fcb721-5d50-2c34-f44b-69281c8dd760@linux.ibm.com> In-Reply-To: <24fcb721-5d50-2c34-f44b-69281c8dd760@linux.ibm.com> From: Dan Williams Date: Fri, 14 Jun 2019 09:36:33 -0700 Message-ID: Subject: Re: [PATCH -next] mm/hotplug: skip bad PFNs from pfn_to_online_page() To: "Aneesh Kumar K.V" Cc: Oscar Salvador , Qian Cai , Andrew Morton , Linux MM , Linux Kernel Mailing List , jmoyer , linux-nvdimm Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jun 14, 2019 at 9:26 AM Aneesh Kumar K.V wrote: > > On 6/14/19 9:52 PM, Dan Williams wrote: > > On Fri, Jun 14, 2019 at 9:18 AM Aneesh Kumar K.V > > wrote: > >> > >> On 6/14/19 9:05 PM, Oscar Salvador wrote: > >>> On Fri, Jun 14, 2019 at 02:28:40PM +0530, Aneesh Kumar K.V wrote: > >>>> Can you check with this change on ppc64. I haven't reviewed this series yet. > >>>> I did limited testing with change . Before merging this I need to go > >>>> through the full series again. The vmemmap poplulate on ppc64 needs to > >>>> handle two translation mode (hash and radix). With respect to vmemap > >>>> hash doesn't setup a translation in the linux page table. Hence we need > >>>> to make sure we don't try to setup a mapping for a range which is > >>>> arleady convered by an existing mapping. > >>>> > >>>> diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c > >>>> index a4e17a979e45..15c342f0a543 100644 > >>>> --- a/arch/powerpc/mm/init_64.c > >>>> +++ b/arch/powerpc/mm/init_64.c > >>>> @@ -88,16 +88,23 @@ static unsigned long __meminit vmemmap_section_start(unsigned long page) > >>>> * which overlaps this vmemmap page is initialised then this page is > >>>> * initialised already. > >>>> */ > >>>> -static int __meminit vmemmap_populated(unsigned long start, int page_size) > >>>> +static bool __meminit vmemmap_populated(unsigned long start, int page_size) > >>>> { > >>>> unsigned long end = start + page_size; > >>>> start = (unsigned long)(pfn_to_page(vmemmap_section_start(start))); > >>>> > >>>> - for (; start < end; start += (PAGES_PER_SECTION * sizeof(struct page))) > >>>> - if (pfn_valid(page_to_pfn((struct page *)start))) > >>>> - return 1; > >>>> + for (; start < end; start += (PAGES_PER_SECTION * sizeof(struct page))) { > >>>> > >>>> - return 0; > >>>> + struct mem_section *ms; > >>>> + unsigned long pfn = page_to_pfn((struct page *)start); > >>>> + > >>>> + if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) > >>>> + return 0; > >>> > >>> I might be missing something, but is this right? > >>> Having a section_nr above NR_MEM_SECTIONS is invalid, but if we return 0 here, > >>> vmemmap_populate will go on and populate it. > >> > >> I should drop that completely. We should not hit that condition at all. > >> I will send a final patch once I go through the full patch series making > >> sure we are not breaking any ppc64 details. > >> > >> Wondering why we did the below > >> > >> #if defined(ARCH_SUBSECTION_SHIFT) > >> #define SUBSECTION_SHIFT (ARCH_SUBSECTION_SHIFT) > >> #elif defined(PMD_SHIFT) > >> #define SUBSECTION_SHIFT (PMD_SHIFT) > >> #else > >> /* > >> * Memory hotplug enabled platforms avoid this default because they > >> * either define ARCH_SUBSECTION_SHIFT, or PMD_SHIFT is a constant, but > >> * this is kept as a backstop to allow compilation on > >> * !ARCH_ENABLE_MEMORY_HOTPLUG archs. > >> */ > >> #define SUBSECTION_SHIFT 21 > >> #endif > >> > >> why not > >> > >> #if defined(ARCH_SUBSECTION_SHIFT) > >> #define SUBSECTION_SHIFT (ARCH_SUBSECTION_SHIFT) > >> #else > >> #define SUBSECTION_SHIFT SECTION_SHIFT > > That should be SECTION_SIZE_SHIFT > > >> #endif > >> > >> ie, if SUBSECTION is not supported by arch we have one sub-section per > >> section? > > > > A couple comments: > > > > The only reason ARCH_SUBSECTION_SHIFT exists is because PMD_SHIFT on > > PowerPC was a non-constant value. However, I'm planning to remove the > > distinction in the next rev of the patches. Jeff rightly points out > > that having a variable subsection size per arch will lead to > > situations where persistent memory namespaces are not portable across > > archs. So I plan to just make SUBSECTION_SHIFT 21 everywhere. > > > > > persistent memory namespaces are not portable across archs because they > have PAGE_SIZE dependency. We can fix that by reserving mem_map capacity assuming the smallest PAGE_SIZE across archs. > Then we have dependencies like the page size > with which we map the vmemmap area. How does that lead to cross-arch incompatibility? Even on a single arch the vmemmap area will be mapped with 2MB pages for 128MB aligned spans of pmem address space and 4K pages for subsections. > Why not let the arch > arch decide the SUBSECTION_SHIFT and default to one subsection per > section if arch is not enabled to work with subsection. Because that keeps the implementation from ever reaching a point where a namespace might be able to be moved from one arch to another. If we can squash these arch differences then we can have a common tool to initialize namespaces outside of the kernel. The one wrinkle is device-dax that wants to enforce the mapping size, but I think we can have a module-option compatibility override for that case for the admin to say "yes, I know this namespace is defined for 2MB x86 pages, but I want to force enable it with 64K pages on PowerPC"