Subject: Re: [mm PATCH v3 4/6] mm: Move hot-plug specific memory init into separate functions and optimize
To: Michal Hocko
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, pavel.tatashin@microsoft.com,
 dave.jiang@intel.com, linux-kernel@vger.kernel.org, willy@infradead.org,
 davem@davemloft.net, yi.z.zhang@linux.intel.com, khalid.aziz@oracle.com,
 rppt@linux.vnet.ibm.com, vbabka@suse.cz,
 sparclinux@vger.kernel.org, dan.j.williams@intel.com, ldufour@linux.vnet.ibm.com,
 mgorman@techsingularity.net, mingo@kernel.org, kirill.shutemov@linux.intel.com
References: <20181015202456.2171.88406.stgit@localhost.localdomain>
 <20181015202716.2171.7284.stgit@localhost.localdomain>
 <20181017091824.GL18839@dhcp22.suse.cz>
From: Alexander Duyck
Date: Wed, 17 Oct 2018 08:26:20 -0700
In-Reply-To: <20181017091824.GL18839@dhcp22.suse.cz>

On 10/17/2018 2:18 AM, Michal Hocko wrote:
> On Mon 15-10-18 13:27:16, Alexander Duyck wrote:
>> This patch is going through and combining the bits in memmap_init_zone
>> and memmap_init_zone_device that are related to hotplug into a single
>> function called __memmap_init_hotplug.
>>
>> I also took the opportunity to integrate __init_single_page's
>> functionality into this function. In doing so I can get rid of some of
>> the redundancy such as the LRU pointers versus the pgmap.
>
> This patch depends on [1] which I've had some concerns about. It adds
> more code on top. I am still not convinced this is the right direction.
>
> [1] http://lkml.kernel.org/r/20180925202053.3576.66039.stgit@localhost.localdomain

I wouldn't say this patch depends on it. I could easily make this patch
work with the code as it was before that change. This was actually
something I had started on first and then decided to defer until later,
since I knew it was more invasive than the memmap_init_zone_device
change was.

With that said, I am also wondering if a possible solution to your
concerns would be to export the __init_pageblock function later and move
the call to memmap_init_zone_device out to the memremap or hotplug code,
once Dan gets the refactoring for HMM and memremap sorted out. Roughly,
the sketch below is what I have in mind.
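This is untested and only illustrative; devm_memremap_init_pages is a
made-up name, and where exactly the call would live in the memremap code
is hand-waved. It just mirrors the __memmap_init_hotplug loop from the
patch quoted below, with the exported initializer doing the real work:

/* include/linux/mm.h: prototype for the (hypothetically) exported helper */
void __init_pageblock(unsigned long start_pfn, unsigned long nr_pages,
                      unsigned long zone, int nid,
                      struct dev_pagemap *pgmap, bool is_reserved);

/*
 * kernel/memremap.c: invoked after the hotplug lock has been dropped,
 * walking the ZONE_DEVICE range one pageblock-bounded chunk at a time.
 */
static void devm_memremap_init_pages(struct dev_pagemap *pgmap,
                                     unsigned long start_pfn,
                                     unsigned long nr_pages,
                                     unsigned long zone, int nid)
{
    unsigned long pfn = start_pfn + nr_pages;

    while (pfn != start_pfn) {
        unsigned long stride = pfn;

        /* Peel chunks off the top so no chunk crosses a pageblock. */
        pfn = max(ALIGN_DOWN(pfn - 1, pageblock_nr_pages), start_pfn);
        stride -= pfn;

        /*
         * ZONE_DEVICE pages keep their pgmap back pointer; they stay
         * marked reserved for now since code still depends on that.
         */
        __init_pageblock(pfn, stride, zone, nid, pgmap, true);

        cond_resched();
    }
}

Anyway, just one idea for how to structure things once the memremap
refactor lands.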
>>
>> Signed-off-by: Alexander Duyck
>> ---
>>  mm/page_alloc.c | 232 ++++++++++++++++++++++++++++++++++++++-----------------
>>  1 file changed, 159 insertions(+), 73 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 20e9eb35d75d..92375e7867ba 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1192,6 +1192,94 @@ static void __meminit __init_single_page(struct page *page, unsigned long pfn,
>>  #endif
>>  }
>>
>> +static void __meminit __init_pageblock(unsigned long start_pfn,
>> +                                       unsigned long nr_pages,
>> +                                       unsigned long zone, int nid,
>> +                                       struct dev_pagemap *pgmap,
>> +                                       bool is_reserved)
>> +{
>> +    unsigned long nr_pgmask = pageblock_nr_pages - 1;
>> +    struct page *start_page = pfn_to_page(start_pfn);
>> +    unsigned long pfn = start_pfn + nr_pages - 1;
>> +#ifdef WANT_PAGE_VIRTUAL
>> +    bool is_highmem = is_highmem_idx(zone);
>> +#endif
>> +    struct page *page;
>> +
>> +    /*
>> +     * Enforce the following requirements:
>> +     * size > 0
>> +     * size < pageblock_nr_pages
>> +     * start_pfn -> pfn does not cross pageblock_nr_pages boundary
>> +     */
>> +    VM_BUG_ON(((start_pfn ^ pfn) | (nr_pages - 1)) > nr_pgmask);
>> +
>> +    /*
>> +     * Work from highest page to lowest, this way we will still be
>> +     * warm in the cache when we call set_pageblock_migratetype
>> +     * below.
>> +     *
>> +     * The loop is based around the page pointer as the main index
>> +     * instead of the pfn because pfn is not used inside the loop if
>> +     * the section number is not in page flags and WANT_PAGE_VIRTUAL
>> +     * is not defined.
>> +     */
>> +    for (page = start_page + nr_pages; page-- != start_page; pfn--) {
>> +        mm_zero_struct_page(page);
>> +
>> +        /*
>> +         * We use the start_pfn instead of pfn in the set_page_links
>> +         * call because of the fact that the pfn number is used to
>> +         * get the section_nr and this function should not be
>> +         * spanning more than a single section.
>> +         */
>> +        set_page_links(page, zone, nid, start_pfn);
>> +        init_page_count(page);
>> +        page_mapcount_reset(page);
>> +        page_cpupid_reset_last(page);
>> +
>> +        /*
>> +         * We can use the non-atomic __set_bit operation for setting
>> +         * the flag as we are still initializing the pages.
>> +         */
>> +        if (is_reserved)
>> +            __SetPageReserved(page);
>> +
>> +        /*
>> +         * ZONE_DEVICE pages union ->lru with a ->pgmap back
>> +         * pointer and hmm_data. It is a bug if a ZONE_DEVICE
>> +         * page is ever freed or placed on a driver-private list.
>> +         */
>> +        page->pgmap = pgmap;
>> +        if (!pgmap)
>> +            INIT_LIST_HEAD(&page->lru);
>> +
>> +#ifdef WANT_PAGE_VIRTUAL
>> +        /* The shift won't overflow because ZONE_NORMAL is below 4G. */
>> +        if (!is_highmem)
>> +            set_page_address(page, __va(pfn << PAGE_SHIFT));
>> +#endif
>> +    }
>> +
>> +    /*
>> +     * Mark the block movable so that blocks are reserved for
>> +     * movable at startup. This will force kernel allocations
>> +     * to reserve their blocks rather than leaking throughout
>> +     * the address space during boot when many long-lived
>> +     * kernel allocations are made.
>> +     *
>> +     * bitmap is created for zone's valid pfn range. but memmap
>> +     * can be created for invalid pages (for alignment)
>> +     * check here not to call set_pageblock_migratetype() against
>> +     * pfn out of zone.
>> +     *
>> +     * Please note that MEMMAP_HOTPLUG path doesn't clear memmap
>> +     * because this is done early in sparse_add_one_section
>> +     */
>> +    if (!(start_pfn & nr_pgmask))
>> +        set_pageblock_migratetype(start_page, MIGRATE_MOVABLE);
>> +}
>> +
>>  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>>  static void __meminit init_reserved_page(unsigned long pfn)
>>  {
>> @@ -5513,6 +5601,36 @@ void __ref build_all_zonelists(pg_data_t *pgdat)
>>      return false;
>>  }
>>
>> +static void __meminit __memmap_init_hotplug(unsigned long size, int nid,
>> +                                            unsigned long zone,
>> +                                            unsigned long start_pfn,
>> +                                            struct dev_pagemap *pgmap)
>> +{
>> +    unsigned long pfn = start_pfn + size;
>> +
>> +    while (pfn != start_pfn) {
>> +        unsigned long stride = pfn;
>> +
>> +        pfn = max(ALIGN_DOWN(pfn - 1, pageblock_nr_pages), start_pfn);
>> +        stride -= pfn;
>> +
>> +        /*
>> +         * The last argument of __init_pageblock is a boolean
>> +         * value indicating if the page will be marked as reserved.
>> +         *
>> +         * Mark page reserved as it will need to wait for onlining
>> +         * phase for it to be fully associated with a zone.
>> +         *
>> +         * Under certain circumstances ZONE_DEVICE pages may not
>> +         * need to be marked as reserved, however there is still
>> +         * code that is depending on this being set for now.
>> +         */
>> +        __init_pageblock(pfn, stride, zone, nid, pgmap, true);
>> +
>> +        cond_resched();
>> +    }
>> +}
>> +
>>  /*
>>   * Initially all pages are reserved - free ones are freed
>>   * up by memblock_free_all() once the early boot process is
>> @@ -5523,51 +5641,61 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>          struct vmem_altmap *altmap)
>>  {
>>      unsigned long pfn, end_pfn = start_pfn + size;
>> -    struct page *page;
>>
>>      if (highest_memmap_pfn < end_pfn - 1)
>>          highest_memmap_pfn = end_pfn - 1;
>>
>> +    if (context == MEMMAP_HOTPLUG) {
>>  #ifdef CONFIG_ZONE_DEVICE
>> -    /*
>> -     * Honor reservation requested by the driver for this ZONE_DEVICE
>> -     * memory. We limit the total number of pages to initialize to just
>> -     * those that might contain the memory mapping. We will defer the
>> -     * ZONE_DEVICE page initialization until after we have released
>> -     * the hotplug lock.
>> -     */
>> -    if (zone == ZONE_DEVICE) {
>> -        if (!altmap)
>> -            return;
>> +        /*
>> +         * Honor reservation requested by the driver for this
>> +         * ZONE_DEVICE memory. We limit the total number of pages to
>> +         * initialize to just those that might contain the memory
>> +         * mapping. We will defer the ZONE_DEVICE page initialization
>> +         * until after we have released the hotplug lock.
>> +         */
>> +        if (zone == ZONE_DEVICE) {
>> +            if (!altmap)
>> +                return;
>> +
>> +            if (start_pfn == altmap->base_pfn)
>> +                start_pfn += altmap->reserve;
>> +            end_pfn = altmap->base_pfn +
>> +                      vmem_altmap_offset(altmap);
>> +        }
>> +#endif
>> +        /*
>> +         * For these ZONE_DEVICE pages we don't need to record the
>> +         * pgmap as they should represent only those pages used to
>> +         * store the memory map. The actual ZONE_DEVICE pages will
>> +         * be initialized later.
>> +         */
>> +        __memmap_init_hotplug(end_pfn - start_pfn, nid, zone,
>> +                              start_pfn, NULL);
>>
>> -        if (start_pfn == altmap->base_pfn)
>> -            start_pfn += altmap->reserve;
>> -        end_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
>> +        return;
>>      }
>> -#endif
>>
>>      for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>> +        struct page *page;
>> +
>>          /*
>>           * There can be holes in boot-time mem_map[]s handed to this
>>           * function. They do not exist on hotplugged memory.
>>           */
>> -        if (context == MEMMAP_EARLY) {
>> -            if (!early_pfn_valid(pfn)) {
>> -                pfn = next_valid_pfn(pfn) - 1;
>> -                continue;
>> -            }
>> -            if (!early_pfn_in_nid(pfn, nid))
>> -                continue;
>> -            if (overlap_memmap_init(zone, &pfn))
>> -                continue;
>> -            if (defer_init(nid, pfn, end_pfn))
>> -                break;
>> +        if (!early_pfn_valid(pfn)) {
>> +            pfn = next_valid_pfn(pfn) - 1;
>> +            continue;
>>          }
>> +        if (!early_pfn_in_nid(pfn, nid))
>> +            continue;
>> +        if (overlap_memmap_init(zone, &pfn))
>> +            continue;
>> +        if (defer_init(nid, pfn, end_pfn))
>> +            break;
>>
>>          page = pfn_to_page(pfn);
>>          __init_single_page(page, pfn, zone, nid);
>> -        if (context == MEMMAP_HOTPLUG)
>> -            __SetPageReserved(page);
>>
>>          /*
>>           * Mark the block movable so that blocks are reserved for
>> @@ -5594,14 +5722,12 @@ void __ref memmap_init_zone_device(struct zone *zone,
>>                        unsigned long size,
>>                        struct dev_pagemap *pgmap)
>>  {
>> -    unsigned long pfn, end_pfn = start_pfn + size;
>>      struct pglist_data *pgdat = zone->zone_pgdat;
>>      unsigned long zone_idx = zone_idx(zone);
>>      unsigned long start = jiffies;
>>      int nid = pgdat->node_id;
>>
>> -    if (WARN_ON_ONCE(!pgmap || !is_dev_zone(zone)))
>> -        return;
>> +    VM_BUG_ON(!is_dev_zone(zone));
>>
>>      /*
>>       * The call to memmap_init_zone should have already taken care
>> @@ -5610,53 +5736,13 @@ void __ref memmap_init_zone_device(struct zone *zone,
>>       */
>>      if (pgmap->altmap_valid) {
>>          struct vmem_altmap *altmap = &pgmap->altmap;
>> +        unsigned long end_pfn = start_pfn + size;
>>
>>          start_pfn = altmap->base_pfn + vmem_altmap_offset(altmap);
>>          size = end_pfn - start_pfn;
>>      }
>>
>> -    for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>> -        struct page *page = pfn_to_page(pfn);
>> -
>> -        __init_single_page(page, pfn, zone_idx, nid);
>> -
>> -        /*
>> -         * Mark page reserved as it will need to wait for onlining
>> -         * phase for it to be fully associated with a zone.
>> -         *
>> -         * We can use the non-atomic __set_bit operation for setting
>> -         * the flag as we are still initializing the pages.
>> -         */
>> -        __SetPageReserved(page);
>> -
>> -        /*
>> -         * ZONE_DEVICE pages union ->lru with a ->pgmap back
>> -         * pointer and hmm_data. It is a bug if a ZONE_DEVICE
>> -         * page is ever freed or placed on a driver-private list.
>> -         */
>> -        page->pgmap = pgmap;
>> -        page->hmm_data = 0;
>> -
>> -        /*
>> -         * Mark the block movable so that blocks are reserved for
>> -         * movable at startup. This will force kernel allocations
>> -         * to reserve their blocks rather than leaking throughout
>> -         * the address space during boot when many long-lived
>> -         * kernel allocations are made.
>> -         *
>> -         * bitmap is created for zone's valid pfn range. but memmap
>> -         * can be created for invalid pages (for alignment)
>> -         * check here not to call set_pageblock_migratetype() against
>> -         * pfn out of zone.
>> -         *
>> -         * Please note that MEMMAP_HOTPLUG path doesn't clear memmap
>> -         * because this is done early in sparse_add_one_section
>> -         */
>> -        if (!(pfn & (pageblock_nr_pages - 1))) {
>> -            set_pageblock_migratetype(page, MIGRATE_MOVABLE);
>> -            cond_resched();
>> -        }
>> -    }
>> +    __memmap_init_hotplug(size, nid, zone_idx, start_pfn, pgmap);
>>
>>      pr_info("%s initialised, %lu pages in %ums\n", dev_name(pgmap->dev),
>>          size, jiffies_to_msecs(jiffies - start));
>>
>
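P.S. For anyone skimming the quoted patch, the subtle bit is the stride
computation in __memmap_init_hotplug: it walks the range from the top
down, peeling off chunks that never cross a pageblock boundary, which is
what lets each __init_pageblock call satisfy the VM_BUG_ON at its top.
Here is a tiny standalone demonstration of just that walk (userspace C;
the hard-coded 512 is a stand-in for pageblock_nr_pages, purely for
illustration):

#include <stdio.h>

#define PAGEBLOCK_NR_PAGES 512UL            /* stand-in, power of two */
#define ALIGN_DOWN(x, a)   ((x) & ~((a) - 1))
#define MAX(a, b)          ((a) > (b) ? (a) : (b))

int main(void)
{
    /* Deliberately unaligned range: pfns 1000 through 2034. */
    unsigned long start_pfn = 1000;
    unsigned long pfn = start_pfn + 1035;

    while (pfn != start_pfn) {
        unsigned long stride = pfn;

        /* Step down to the previous pageblock boundary (or the start). */
        pfn = MAX(ALIGN_DOWN(pfn - 1, PAGEBLOCK_NR_PAGES), start_pfn);
        stride -= pfn;

        /* Each chunk [pfn, pfn + stride) sits inside one pageblock. */
        printf("init pfn %lu..%lu (%lu pages)\n",
               pfn, pfn + stride - 1, stride);
    }
    return 0;
}

This prints 1536..2034 (499 pages), then 1024..1535 (512), then
1000..1023 (24); every chunk stays within a single pageblock, matching
what __init_pageblock asserts.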