From: Dan Williams
Date: Mon, 9 Jul 2018 09:53:40 -0700
Subject: Re: [PATCH 00/13] mm: Asynchronous + multithreaded memmap init for ZONE_DEVICE
To: Jan Kara
Cc: Andrew Morton, Tony Luck, Huaisheng Ye, Vishal Verma, Dave Jiang,
    "H. Peter Anvin", Thomas Gleixner, Rich Felker, Fenghua Yu,
    Yoshinori Sato, Benjamin Herrenschmidt, Michal Hocko, Paul Mackerras,
    Christoph Hellwig, Jérôme Glisse, Ingo Molnar, Johannes Thumshirn,
    Michael Ellerman, Heiko Carstens, X86 ML, Logan Gunthorpe,
    Ross Zwisler, Jeff Moyer, Vlastimil Babka, Martin Schwidefsky,
    linux-nvdimm, Linux MM, Linux Kernel Mailing List
In-Reply-To: <20180709125641.xpoq66p4r7dzsgyj@quack2.suse.cz>
References: <153077334130.40830.2714147692560185329.stgit@dwillia2-desk3.amr.corp.intel.com>
    <20180709125641.xpoq66p4r7dzsgyj@quack2.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jul 9, 2018 at 5:56 AM, Jan Kara wrote:
> On Wed 04-07-18 23:49:02, Dan Williams wrote:
>> In order to keep pfn_to_page() a simple offset calculation the 'struct
>> page' memmap needs to be mapped and initialized in advance of any usage
>> of a page.
>> This poses a problem for large memory systems as it delays
>> full availability of memory resources for 10s to 100s of seconds.
>>
>> For typical 'System RAM' the problem is mitigated by the fact that large
>> memory allocations tend to happen after the kernel has fully initialized
>> and userspace services / applications are launched. A small amount, 2GB
>> of memory, is initialized up front. The remainder is initialized in the
>> background and freed to the page allocator over time.
>>
>> Unfortunately, that scheme is not directly reusable for persistent
>> memory and dax because userspace has visibility to the entire resource
>> pool and can choose to access any offset directly at its choosing. In
>> other words there is no allocator indirection where the kernel can
>> satisfy requests with arbitrary pages as they become initialized.
>>
>> That said, we can approximate the optimization by performing the
>> initialization in the background, allow the kernel to fully boot the
>> platform, start up pmem block devices, mount filesystems in dax mode,
>> and only incur the delay at the first userspace dax fault.
>>
>> With this change an 8 socket system was observed to initialize pmem
>> namespaces in ~4 seconds whereas it was previously taking ~4 minutes.
>>
>> These patches apply on top of the HMM + devm_memremap_pages() reworks
>> [1]. Andrew, once the reviews come back, please consider this series for
>> -mm as well.
>>
>> [1]: https://lkml.org/lkml/2018/6/19/108
>
> One question: Why not (in addition to background initialization) have
> ->direct_access() initialize a block of struct pages around the pfn it
> needs if it finds it's not initialized yet? That would make devices usable
> immediately without waiting for init to complete...

Hmm, yes, relatively immediately... it would depend on the granularity
of the tracking where we can reliably steal initialization work from
the background thread.
I'll give it a shot. I'm thinking of dividing each thread's work into
64 sub-units and tracking those units with a bitmap. The worst-case
init time for a fault then becomes the time to initialize the pages
for a range of namespace-size / (NR_MEMMAP_THREADS * 64).
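
To make the bitmap idea concrete, here is a rough userspace sketch of how the tracking for one thread's range might look. It is only an illustration under assumptions, not code from the series: struct memmap_work, claim_unit(), init_unit(), background_init() and fault_init() are all hypothetical names, C11 atomics stand in for the kernel's bitops, and init_unit() is a stub where the real 'struct page' initialization would go. Two 64-bit masks are kept per thread: one records which sub-units have an owner, the other which are finished.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SUBUNITS 64 /* sub-units per thread, as proposed in the mail */

/* Hypothetical tracking state for one background thread's pfn range. */
struct memmap_work {
	uint64_t start_pfn;       /* first pfn covered by this thread */
	uint64_t nr_pages;        /* pages covered; assumed non-zero */
	_Atomic uint64_t claimed; /* bit n set: sub-unit n has an owner */
	_Atomic uint64_t done;    /* bit n set: sub-unit n is initialized */
};

/* Pages per sub-unit, rounded up so 64 units cover the whole range. */
static uint64_t unit_pages(const struct memmap_work *w)
{
	return (w->nr_pages + SUBUNITS - 1) / SUBUNITS;
}

/* Try to take ownership of a sub-unit; exactly one caller wins. */
static bool claim_unit(struct memmap_work *w, unsigned int unit)
{
	uint64_t bit = UINT64_C(1) << unit;

	assert(unit < SUBUNITS);
	return !(atomic_fetch_or(&w->claimed, bit) & bit);
}

/* Stub for the real page init; returns how many pages the unit holds. */
static uint64_t init_unit(struct memmap_work *w, unsigned int unit)
{
	uint64_t per = unit_pages(w);
	uint64_t first = (uint64_t)unit * per;
	uint64_t n = 0;

	if (first < w->nr_pages)
		n = (w->nr_pages - first < per) ? w->nr_pages - first : per;
	/* Real code would initialize pages [start_pfn + first, +n) here. */
	atomic_fetch_or(&w->done, UINT64_C(1) << unit);
	return n;
}

/* Background thread: walk every unit, skipping any the fault path stole. */
static uint64_t background_init(struct memmap_work *w)
{
	uint64_t pages = 0;

	for (unsigned int u = 0; u < SUBUNITS; u++)
		if (claim_unit(w, u))
			pages += init_unit(w, u);
	return pages;
}

/* Fault path: make sure the unit containing pfn is initialized. */
static uint64_t fault_init(struct memmap_work *w, uint64_t pfn)
{
	unsigned int unit;
	uint64_t bit, pages = 0;

	assert(pfn >= w->start_pfn && pfn - w->start_pfn < w->nr_pages);
	unit = (unsigned int)((pfn - w->start_pfn) / unit_pages(w));
	bit = UINT64_C(1) << unit;

	if (claim_unit(w, unit))
		pages = init_unit(w, unit);
	else
		while (!(atomic_load(&w->done) & bit))
			; /* owner is mid-init; a kernel would sleep, not spin */
	return pages;
}
```

With this shape a fault either initializes one sub-unit itself or waits for whoever already owns that sub-unit, so the wait is bounded by one unit's worth of pages. Assuming, purely for illustration, NR_MEMMAP_THREADS were 8, that is 1/(8 * 64) = 1/512 of the namespace.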