Subject: [PATCH 00/13] mm: Asynchronous + multithreaded memmap init for ZONE_DEVICE
From: Dan Williams
To: akpm@linux-foundation.org
Cc: Tony Luck, Huaisheng Ye, Vishal Verma, Jan Kara, Dave Jiang, H. Peter Anvin, Thomas Gleixner, Rich Felker, Fenghua Yu, Yoshinori Sato, Benjamin Herrenschmidt, Michal Hocko, Paul Mackerras, Christoph Hellwig, Jérôme Glisse, Ingo Molnar, Johannes Thumshirn, Michael Ellerman, Heiko Carstens, x86@kernel.org, Logan Gunthorpe, Ross Zwisler, Jeff Moyer, Vlastimil Babka, Martin Schwidefsky, linux-nvdimm@lists.01.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Wed, 04 Jul 2018 23:49:02 -0700
Message-ID: <153077334130.40830.2714147692560185329.stgit@dwillia2-desk3.amr.corp.intel.com>

In order to keep pfn_to_page() a simple offset calculation, the 'struct page' memmap needs to be mapped and initialized in advance of any use of a page. This poses a problem for large memory systems, as it delays full availability of memory resources by 10s to 100s of seconds.

For typical 'System RAM' the problem is mitigated by the fact that large memory allocations tend to happen after the kernel has fully initialized and userspace services / applications are launched. A small amount, 2GB of memory, is initialized up front. The remainder is initialized in the background and freed to the page allocator over time.

Unfortunately, that scheme is not directly reusable for persistent memory and dax, because userspace has visibility into the entire resource pool and can access any offset directly, at any time. In other words, there is no allocator indirection through which the kernel can satisfy requests with arbitrary pages as they become initialized.
That said, we can approximate the optimization by performing the initialization in the background: the kernel can fully boot the platform, start up pmem block devices, and mount filesystems in dax mode, and only the first userspace dax fault incurs the delay. With this change, an 8-socket system was observed to initialize pmem namespaces in ~4 seconds, whereas previously it took ~4 minutes.

These patches apply on top of the HMM + devm_memremap_pages() reworks [1]. Andrew, once the reviews come back, please consider this series for -mm as well.

[1]: https://lkml.org/lkml/2018/6/19/108

---

Dan Williams (9):
      mm: Plumb dev_pagemap instead of vmem_altmap to memmap_init_zone()
      mm: Enable asynchronous __add_pages() and vmemmap_populate_hugepages()
      mm: Teach memmap_init_zone() to initialize ZONE_DEVICE pages
      mm: Multithread ZONE_DEVICE initialization
      mm: Allow an external agent to wait for memmap initialization
      filesystem-dax: Make mount time pfn validation a debug check
      libnvdimm, pmem: Initialize the memmap in the background
      device-dax: Initialize the memmap in the background
      libnvdimm, namespace: Publish page structure init state / control

Huaisheng Ye (4):
      nvdimm/pmem: check the validity of the pointer pfn
      nvdimm/pmem-dax: check the validity of the pointer pfn
      s390/block/dcssblk: check the validity of the pointer pfn
      fs/dax: Assign NULL to pfn of dax_direct_access if useless


 arch/ia64/mm/init.c             |   5 +
 arch/powerpc/mm/mem.c           |   5 +
 arch/s390/mm/init.c             |   8 +
 arch/sh/mm/init.c               |   5 +
 arch/x86/mm/init_32.c           |   8 +
 arch/x86/mm/init_64.c           |  27 +++--
 drivers/dax/Kconfig             |  10 ++
 drivers/dax/dax-private.h       |   2
 drivers/dax/device-dax.h        |   2
 drivers/dax/device.c            |  16 +++
 drivers/dax/pmem.c              |   5 +
 drivers/dax/super.c             |  64 +++++++-----
 drivers/nvdimm/nd.h             |   2
 drivers/nvdimm/pfn_devs.c       |  54 ++++++++--
 drivers/nvdimm/pmem.c           |  17 ++-
 drivers/nvdimm/pmem.h           |   1
 drivers/s390/block/dcssblk.c    |   5 +
 fs/dax.c                        |  10 +-
 include/linux/memmap_async.h    |  55 ++++++++++
 include/linux/memory_hotplug.h  |  18 ++-
 include/linux/memremap.h        |  31 ++++++
 include/linux/mm.h              |   8 +
 kernel/memremap.c               |  85 ++++++++-------
 mm/memory_hotplug.c             |  73 ++++++++++---
 mm/page_alloc.c                 | 215 +++++++++++++++++++++++++++++++++------
 mm/sparse-vmemmap.c             |  56 ++++++++--
 tools/testing/nvdimm/pmem-dax.c |  11 ++
 27 files changed, 610 insertions(+), 188 deletions(-)
 create mode 100644 include/linux/memmap_async.h