MIME-Version: 1.0
References: <153176041838.12695.3365448145295112857.stgit@dwillia2-desk3.amr.corp.intel.com> <20180717155006.GL7193@dhcp22.suse.cz>
In-Reply-To: <20180717155006.GL7193@dhcp22.suse.cz>
From: Dan Williams
Date: Tue, 17 Jul 2018 10:32:32 -0700
Message-ID:
Subject: Re: [PATCH v2 00/14] mm: Asynchronous + multithreaded memmap init for ZONE_DEVICE
To: Michal Hocko
Cc: pasha.tatashin@oracle.com, dalias@libc.org, Jan Kara,
 Benjamin Herrenschmidt, Heiko Carstens, linux-mm, Paul Mackerras,
 "H. Peter Anvin", Yoshinori Sato, "linux-nvdimm@lists.01.org",
 "the arch/x86 maintainers", Matthew Wilcox, daniel.m.jordan@oracle.com,
 Ingo Molnar, fenghua.yu@intel.com, Jerome Glisse, Thomas Gleixner,
 "Luck, Tony", Linux Kernel Mailing List, Michael Ellerman,
 Martin Schwidefsky, Andrew Morton, Christoph Hellwig
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 17, 2018 at 8:50 AM Michal Hocko wrote:
>
> On Tue 17-07-18 10:46:39, Pavel Tatashin wrote:
> > > > Hi Dan,
> > > >
> > > > I am worried that this work adds another way to multi-thread struct
> > > > page initialization without re-use of already existing method. The
> > > > code is already a mess, and leads to bugs [1] because of the number of
> > > > different memory layouts, architecture specific quirks, and different
> > > > struct page initialization methods.
> > >
> > > Yes, the lamentations about the complexity of the memory hotplug code
> > > are known. I didn't think this set made it irretrievably worse, but
> > > I'm biased and otherwise certainly want to build consensus with other
> > > mem-hotplug folks.
> > >
> > > >
> > > > So, when DEFERRED_STRUCT_PAGE_INIT is used we initialize struct pages
> > > > on demand until page_alloc_init_late() is called, and at that time we
> > > > initialize all the rest of struct pages by calling:
> > > >
> > > > page_alloc_init_late()
> > > >   deferred_init_memmap() (a thread per node)
> > > >     deferred_init_pages()
> > > >       __init_single_page()
> > > >
> > > > This is because memmap_init_zone() is not multi-threaded. However,
> > > > this work makes memmap_init_zone() multi-threaded. So, I think we
> > > > should really be either be using deferred_init_memmap() here, or teach
> > > > DEFERRED_STRUCT_PAGE_INIT to use new multi-threaded memmap_init_zone()
> > > > but not both.
> > >
> > > I agree it would be good to look at unifying the 2 async
> > > initialization approaches, however they have distinct constraints. All
> > > of the ZONE_DEVICE memmap initialization work happens as a hotplug
> > > event where the deferred_init_memmap() threads have already been torn
> > > down. For the memory capacities where it takes minutes to initialize
> > > the memmap it is painful to incur a global flush of all initialization
> > > work. So, I think that a move to rework deferred_init_memmap() in
> > > terms of memmap_init_async() is warranted because memmap_init_async()
> > > avoids a global sync and supports the hotplug case.
> > >
> > > Unfortunately, the work to unite these 2 mechanisms is going to be
> > > 4.20 material, at least for me, since I'm taking an extended leave,
> > > and there is little time for me to get this in shape for 4.19. I
> > > wouldn't be opposed to someone judiciously stealing from this set and
> > > taking a shot at the integration, I likely will not get back to this
> > > until September.
> >
> > Hi Dan,
> >
> > I do not want to hold your work, so if Michal or Andrew are OK with
> > the general approach of teaching memmap_init_zone() to be async
> > without re-using deferred_init_memmap() or without changing
> > deferred_init_memmap() to use the new memmap_init_async() I will
> > review your patches.
>
> Well, I would rather have a sane code base than rush anything in. I do
> agree with Pavel that we the number of async methods we have right now
> is really disturbing. Applying yet another one will put additional
> maintenance burden on whoever comes next.

I thought we only had the one async implementation presently; this
makes it sound like we have more than one? Did I miss the other(s)?

> Is there any reason that this work has to target the next merge window?
> The changelog is not really specific about that.

Same reason as any other change in this space: hardware availability
continues to increase. These patches are a direct response to end user
reports of unacceptable init latency with current kernels.

> There no numbers or
> anything that would make this sound as a high priority stuff.

From the end of the cover letter:

"With this change an 8 socket system was observed to initialize pmem
namespaces in ~4 seconds whereas it was previously taking ~4 minutes."

My plan if this is merged would be to come back and refactor it with
the deferred_init_memmap() implementation; my plan if this is not
merged would be to come back and refactor it with the
deferred_init_memmap() implementation.

In practical terms, 0day has noticed a couple of minor build fixes are
needed:

https://lists.01.org/pipermail/kbuild-all/2018-July/050229.html
https://lists.01.org/pipermail/kbuild-all/2018-July/050231.html

...and I'm going to be offline until September. I thought it best to
post this before I go, and I'm open to someone else picking up this
work to get it in shape for merging per community feedback.
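
For readers following the "global flush" versus "per-range wait"
distinction discussed above, here is a rough userspace model in Python.
It is only a sketch under stated assumptions: init_single_page(),
init_range(), init_global_sync() and AsyncMemmapInit are hypothetical
names invented for illustration, the memmap is modelled as a plain list
of booleans, and nothing here reflects how deferred_init_memmap() or
memmap_init_async() are actually implemented in the kernel or in this
patch set.

```python
from concurrent.futures import ThreadPoolExecutor

# Userspace model only. Every name below is hypothetical and is NOT a
# kernel API; the "memmap" is just a list with one flag per page frame.


def init_single_page(memmap, pfn):
    """Model initializing one struct page by marking its entry ready."""
    memmap[pfn] = True


def init_range(memmap, start_pfn, end_pfn):
    """Initialize every page frame in [start_pfn, end_pfn)."""
    for pfn in range(start_pfn, end_pfn):
        init_single_page(memmap, pfn)
    return end_pfn - start_pfn


def init_global_sync(memmap, node_ranges):
    """Thread-per-node model: return only once every node is finished."""
    with ThreadPoolExecutor(max_workers=len(node_ranges)) as pool:
        futures = [pool.submit(init_range, memmap, start, end)
                   for start, end in node_ranges]
        # Collecting every result is the "global sync": the caller
        # cannot make progress until all ranges are initialized.
        return sum(f.result() for f in futures)


class AsyncMemmapInit:
    """Per-range model: start work, return at once, wait per range."""

    def __init__(self, memmap, workers=4):
        self.memmap = memmap
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.pending = {}

    def start(self, start_pfn, end_pfn):
        """Queue a range for initialization without blocking the caller."""
        self.pending[(start_pfn, end_pfn)] = self.pool.submit(
            init_range, self.memmap, start_pfn, end_pfn)

    def wait_for(self, start_pfn, end_pfn):
        """Block only until this one range is ready, not all of them."""
        return self.pending.pop((start_pfn, end_pfn)).result()

    def shutdown(self):
        """Wait for any outstanding work and release the workers."""
        self.pool.shutdown(wait=True)
```

In this model a caller that needs only the first range would call
start() for each range and then wait_for() on the one it cares about,
leaving the others to finish in the background; init_global_sync()
instead blocks until every range is done.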