From: Dan Williams
Date: Thu, 12 Dec 2019 09:37:22 -0800
Subject: Re: [PATCH v4 2/2] kvm: Use huge pages for DAX-backed files
To: Sean Christopherson
Cc: Barret Rhoden, Paolo Bonzini, David Hildenbrand, Dave Jiang,
    Alexander Duyck, linux-nvdimm, X86 ML, KVM list,
    Linux Kernel Mailing List, "Zeng, Jason"
References: <20191211213207.215936-1-brho@google.com>
    <20191211213207.215936-3-brho@google.com>
    <20191212173413.GC3163@linux.intel.com>
In-Reply-To: <20191212173413.GC3163@linux.intel.com>

On Thu, Dec 12, 2019 at 9:34 AM Sean Christopherson wrote:
>
> On Wed, Dec 11, 2019 at 04:32:07PM -0500, Barret Rhoden wrote:
> > This change allows KVM to map DAX-backed files made of huge pages with
> > huge mappings in the EPT/TDP.
> >
> > DAX pages are not PageTransCompound.  The existing check is trying to
> > determine if the mapping for the pfn is a huge mapping or not.  For
> > non-DAX maps, e.g. hugetlbfs, that means checking PageTransCompound.
> > For DAX, we can check the page table itself.
> >
> > Note that KVM already faulted in the page (or huge page) in the host's
> > page table, and we hold the KVM mmu spinlock.  We grabbed that lock in
> > kvm_mmu_notifier_invalidate_range_end, before checking the mmu seq.
> >
> > Signed-off-by: Barret Rhoden
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 36 ++++++++++++++++++++++++++++++++----
> >  1 file changed, 32 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 6f92b40d798c..cd07bc4e595f 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3384,6 +3384,35 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
> >  	return -EFAULT;
> >  }
> >
> > +static bool pfn_is_huge_mapped(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
> > +{
> > +	struct page *page = pfn_to_page(pfn);
> > +	unsigned long hva;
> > +
> > +	if (!is_zone_device_page(page))
> > +		return PageTransCompoundMap(page);
> > +
> > +	/*
> > +	 * DAX pages do not use compound pages.  The page should have already
> > +	 * been mapped into the host-side page table during try_async_pf(), so
> > +	 * we can check the page tables directly.
> > +	 */
> > +	hva = gfn_to_hva(kvm, gfn);
> > +	if (kvm_is_error_hva(hva))
> > +		return false;
> > +
> > +	/*
> > +	 * Our caller grabbed the KVM mmu_lock with a successful
> > +	 * mmu_notifier_retry, so we're safe to walk the page table.
> > +	 */
> > +	switch (dev_pagemap_mapping_shift(hva, current->mm)) {
> > +	case PMD_SHIFT:
> > +	case PUD_SIZE:
>
> I assume this means DAX can have 1GB pages?

Correct, it can. Not in the filesystem-dax case, but device-dax
supports 1GB pages.

> I ask because KVM's THP logic
> has historically relied on THP only supporting 2MB.  I cleaned this up in
> a recent series[*], which is in kvm/queue, but I obviously didn't actually
> test whether or not KVM would correctly handle 1GB non-hugetlbfs pages.

Yeah, since device-dax is the only path to support longterm page
pinning for vfio device assignment, testing with device-dax + 1GB
pages would be a useful sanity check.
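
To make the 2MB vs. 1GB distinction above concrete, here is a minimal
userspace sketch (not kernel code) of how a mapping shift relates to a
mapping size. It assumes x86-64 with 4KB base pages, where the PMD and PUD
levels correspond to shifts of 21 and 30. The SKETCH_* constants and
sketch_* helpers are hypothetical names made up for illustration; they are
not kernel macros or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed x86-64 values with 4KB base pages. These are illustrative
 * stand-ins, not the kernel's PAGE_SHIFT/PMD_SHIFT/PUD_SHIFT macros.
 */
enum {
	SKETCH_PAGE_SHIFT = 12,	/* 4KB base page */
	SKETCH_PMD_SHIFT  = 21,	/* 2MB mapping */
	SKETCH_PUD_SHIFT  = 30,	/* 1GB mapping */
};

/* Number of bytes covered by one mapping with the given shift. */
static uint64_t sketch_mapping_size(unsigned int shift)
{
	assert(shift < 64);
	return (uint64_t)1 << shift;
}

/*
 * True if the shift describes a huge (2MB or 1GB) mapping. The comparison
 * is made against shift values, because a shift (30) and the size it
 * describes (1 << 30) are different numbers.
 */
static bool sketch_shift_is_huge(unsigned int shift)
{
	switch (shift) {
	case SKETCH_PMD_SHIFT:
	case SKETCH_PUD_SHIFT:
		return true;
	default:
		return false;
	}
}
```

With these assumed values, sketch_mapping_size(SKETCH_PUD_SHIFT) is
1073741824 bytes (1GB), which is the device-dax case that would need
testing, and sketch_shift_is_huge(SKETCH_PAGE_SHIFT) is false.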