From: Dan Williams
Date: Wed, 20 Mar 2019 20:12:34 -0700
Subject: Re: [PATCH 2/2] mm/dax: Don't enable huge dax mapping by default
To: Oliver
Cc: "Aneesh Kumar K.V", Jan Kara, linux-nvdimm, Michael Ellerman,
    Linux Kernel Mailing List, Linux MM, Ross Zwisler, Andrew Morton,
    linuxppc-dev, "Kirill A . Shutemov"
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 20, 2019 at 8:09 PM Oliver wrote:
>
> On Thu, Mar 21, 2019 at 7:57 AM Dan Williams wrote:
> >
> > On Wed, Mar 20, 2019 at 8:34 AM Dan Williams wrote:
> > >
> > > On Wed, Mar 20, 2019 at 1:09 AM Aneesh Kumar K.V
> > > wrote:
> > > >
> > > > Aneesh Kumar K.V writes:
> > > >
> > > > > Dan Williams writes:
> > > > >
> > > > >>
> > > > >>> Now what will be page size used for mapping vmemmap?
> > > > >>
> > > > >> That's up to the architecture's vmemmap_populate() implementation.
> > > > >>
> > > > >>> Architectures
> > > > >>> possibly will use PMD_SIZE mapping if supported for vmemmap.
> > > > >>> Now a device-dax with struct page in the device will have pfn
> > > > >>> reserve area aligned to PAGE_SIZE with the above example? We
> > > > >>> can't map that using PMD_SIZE page size?
> > > > >>
> > > > >> IIUC, that's a different alignment. Currently that's handled by
> > > > >> padding the reservation area up to a section (128MB on x86) boundary,
> > > > >> but I'm working on patches to allow sub-section sized ranges to be
> > > > >> mapped.
> > > > >
> > > > > I am missing something w.r.t. the code. The code below aligns that
> > > > > using nd_pfn->align:
> > > > >
> > > > > if (nd_pfn->mode == PFN_MODE_PMEM) {
> > > > >         unsigned long memmap_size;
> > > > >
> > > > >         /*
> > > > >          * vmemmap_populate_hugepages() allocates the memmap array in
> > > > >          * HPAGE_SIZE chunks.
> > > > >          */
> > > > >         memmap_size = ALIGN(64 * npfns, HPAGE_SIZE);
> > > > >         offset = ALIGN(start + SZ_8K + memmap_size + dax_label_reserve,
> > > > >                         nd_pfn->align) - start;
> > > > > }
> > > > >
> > > > > IIUC that is finding the offset where to put vmemmap start. And that has
> > > > > to be aligned to the page size with which we may end up mapping vmemmap
> > > > > area, right?
> > >
> > > Right, that's the physical offset of where the vmemmap ends, and the
> > > memory to be mapped begins.
> > >
> > > > > Yes we find the npfns by aligning up using PAGES_PER_SECTION. But that
> > > > > is to compute how many pfns we should map for this pfn dev, right?
> > > > >
> > > >
> > > > Also I guess those 4K assumptions there are wrong?
> > >
> > > Yes, I think to support non-4K-PAGE_SIZE systems the 'pfn' metadata
> > > needs to be revved and the PAGE_SIZE needs to be recorded in the
> > > info-block.
> >
> > How often does a system change page-size? Is it fixed or do
> > environments change it from one boot to the next? I'm thinking through
> > the behavior of what to do when the recorded PAGE_SIZE in the info-block
> > does not match the current system page size.
> > The simplest option is to just fail the device and require it to be
> > reconfigured. Is that acceptable?
>
> The kernel page size is set at build time and as far as I know every
> distro configures their ppc64(le) kernel for 64K. I've used 4K kernels
> a few times in the past to debug PAGE_SIZE dependent problems, but I'd
> be surprised if anyone is using 4K in production.

Ah, ok.

> Anyway, my view is that using 4K here isn't really a problem since
> it's just the accounting unit of the pfn superblock format. The kernel
> reading from it should understand that and scale it to whatever
> accounting unit it wants to use internally. Currently we don't, so that
> should probably be fixed, but that doesn't seem to cause any real
> issues. As far as I can tell the only user of npfns is
> __nvdimm_setup_pfn(), which prints the "number of pfns truncated"
> message.
>
> Am I missing something?

No, I don't think so. The only time it would break is if a system with
a 64K page size laid down an info-block without enough reserved
capacity for when the page size is 4K (npfns too small). However, that
sounds like an exceptional case, which is why no problems have been
reported to date.
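
To put rough numbers on that corner case, here is a minimal standalone
sketch of the arithmetic from the snippet quoted above. It is not the
kernel code: align_up(), npfns_for(), memmap_reserve() and data_offset()
are hypothetical helper names, label_reserve stands in for
dax_label_reserve, and the 2M chunk size is only an assumed value for
HPAGE_SIZE.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K   0x1000ULL
#define SZ_8K   0x2000ULL
#define SZ_64K  0x10000ULL
#define SZ_2M   0x200000ULL   /* assumed huge page size, for illustration */

/* Round x up to a multiple of a; a must be a non-zero power of two. */
static inline uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Number of page frames covering `bytes` at the given page size. */
static inline uint64_t npfns_for(uint64_t bytes, uint64_t page_size)
{
	return bytes / page_size;
}

/*
 * Bytes reserved for the memmap: 64 bytes per pfn (as in the quoted
 * snippet), rounded up to the huge page size.
 */
static inline uint64_t memmap_reserve(uint64_t npfns, uint64_t hpage_size)
{
	return align_up(64 * npfns, hpage_size);
}

/*
 * Offset from `start` at which the mapped data would begin, following
 * the shape of the quoted calculation. `label_reserve` is a hypothetical
 * stand-in for dax_label_reserve.
 */
static inline uint64_t data_offset(uint64_t start, uint64_t memmap_size,
				   uint64_t label_reserve, uint64_t align)
{
	return align_up(start + SZ_8K + memmap_size + label_reserve, align)
		- start;
}
```

For a 1 GiB range, npfns_for() gives 262144 pfns at 4K but only 16384 at
64K, so memmap_reserve() with the assumed 2M chunk comes to 16M in the
first case and 2M in the second. A reservation sized from the 64K count
would fall short of what a 4K kernel needs for the same range, which is
the "npfns too small" case described above.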