From: Dan Williams
Date: Wed, 13 Mar 2019 21:02:11 -0700
Subject: Re: [PATCH 2/2] mm/dax: Don't enable huge dax mapping by default
To: "Aneesh Kumar K.V"
Cc: Jan Kara, linux-nvdimm, Michael Ellerman, Linux Kernel Mailing List,
    Linux MM, Ross Zwisler, Andrew Morton, linuxppc-dev,
    "Kirill A . Shutemov"
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 13, 2019 at 8:45 PM Aneesh Kumar K.V wrote:
[..]
> >> Now w.r.t. failures, can device-dax do an opportunistic huge page
> >> usage?
> >
> > device-dax explicitly disclaims the ability to do opportunistic mappings.
> >
> >> I haven't looked at the device-dax details fully yet. Do we make the
> >> assumption of the mapping page size as a format w.r.t. device-dax? Is
> >> that derived from nd_pfn->align value?
> >
> > Correct.
> >
> >>
> >> Here is what I am working on:
> >> 1) If the platform doesn't support huge page and if the device superblock
> >> indicated that it was created with huge page support, we fail the device
> >> init.
> >
> > Ok.
> >
> >> 2) Now if we are creating a new namespace without huge page support in
> >> the platform, then we force the align details to PAGE_SIZE. In such a
> >> configuration, when handling a dax fault even with THP enabled during
> >> the build, we should not try to use hugepage. This I think we can
> >> achieve by using TRANSPARENT_HUGEPAGE_DAX_FLAG.
> >
> > How is this dynamic property communicated to the guest?
>
> Via device tree on powerpc. We have a device tree node indicating
> supported page sizes.

Ah, ok, yeah let's plumb that straight to the device-dax driver and
leave out the interaction / interpretation of the thp-enabled flags.

> >
> >>
> >> Also even if the user decided to not use THP, by
> >> echo "never" > transparent_hugepage/enabled, we should continue to map
> >> dax fault using huge page on platforms that can support huge pages.
> >>
> >> This still doesn't cover the details of a device-dax created with
> >> PAGE_SIZE align later booted with a kernel that can do hugepage dax. How
> >> should we handle that? That makes me think, this should be a VMA flag
> >> which got derived from device config? Maybe use VM_HUGEPAGE to indicate
> >> if device should use a hugepage mapping or not?
> >
> > device-dax configured with PAGE_SIZE always gets PAGE_SIZE mappings.
>
> Now what page size will be used for mapping vmemmap?

That's up to the architecture's vmemmap_populate() implementation.

> Architectures possibly will use PMD_SIZE mapping if supported for
> vmemmap. Now a device-dax with struct page in the device will have pfn
> reserve area aligned to PAGE_SIZE with the above example? We can't map
> that using PMD_SIZE page size?

IIUC, that's a different alignment. Currently that's handled by
padding the reservation area up to a section (128MB on x86) boundary,
but I'm working on patches to allow sub-section sized ranges to be
mapped.
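For a sense of scale, the cost of that padding can be sketched with plain arithmetic. This is an illustration only, not the libnvdimm code: it assumes the 128MB x86 section size mentioned above and a hypothetical 2MB sub-section granularity, and align_up()/pad_to() are made-up helper names.

```c
#include <assert.h>

/*
 * SECTION_SIZE: the 128MB x86 section size mentioned above.
 * SUBSECTION_SIZE: a hypothetical 2MB granularity, used only for comparison.
 */
#define SECTION_SIZE	(128UL << 20)
#define SUBSECTION_SIZE	(2UL << 20)

/* Round addr up to the next multiple of align (align must be a power of two). */
static unsigned long align_up(unsigned long addr, unsigned long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/* Padding needed to move addr up to the next align boundary. */
static unsigned long pad_to(unsigned long addr, unsigned long align)
{
	return align_up(addr, align) - addr;
}
```

With a reservation ending at 129MB, for instance, pad_to() gives 127MB against the section size but only 1MB against the assumed 2MB sub-section size, which is roughly the kind of waste that sub-section sized mappings would avoid.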
Now, that said, I expect there may be bugs lurking in the
implementation if PAGE_SIZE changes from one boot to the next, simply
because I've never tested that.

I think this also indicates that the section padding logic can't be
removed until all arch vmemmap_populate() implementations understand
the sub-section case.
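Why vmemmap_populate() ties into this can be sketched with the same kind of arithmetic. The values below are assumptions (x86_64-style 4KB base pages, a 64-byte struct page, 2MB PMD mappings), not numbers from the thread, and the helper names are hypothetical. Under them, the memmap for one 128MB section is exactly one 2MB PMD, while a 2MB sub-section needs only 32KB of memmap.

```c
#include <assert.h>

/*
 * Assumed x86_64-style values, not taken from the thread:
 * 4KB base pages, 64-byte struct page, 2MB PMD mappings.
 */
#define ASSUMED_PAGE_SIZE	(4UL << 10)
#define ASSUMED_STRUCT_PAGE	64UL
#define ASSUMED_PMD_SIZE	(2UL << 20)

/* Bytes of struct page array (memmap) needed to describe `bytes` of memory. */
static unsigned long memmap_bytes(unsigned long bytes)
{
	/* This sketch only handles ranges made of whole base pages. */
	assert(bytes % ASSUMED_PAGE_SIZE == 0);
	return (bytes / ASSUMED_PAGE_SIZE) * ASSUMED_STRUCT_PAGE;
}

/* Nonzero if that memmap can be backed entirely by whole PMD mappings. */
static int memmap_fills_whole_pmds(unsigned long bytes)
{
	return memmap_bytes(bytes) % ASSUMED_PMD_SIZE == 0;
}
```

One way to read the point above: an implementation that only populates vmemmap in whole-PMD units lines up naturally with section-sized ranges, so a sub-section range leaves it covering only a fraction of a PMD.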