Subject: Re: [PATCH 2/2] kvm: Use huge pages for DAX-backed files
To: Barret Rhoden, Dan Williams, Dave Jiang, Ross Zwisler, Vishal Verma,
 Paolo Bonzini, Radim Krčmář, Thomas Gleixner, Ingo Molnar, Borislav Petkov
Cc: linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org,
 "H. Peter Anvin", x86@kernel.org, kvm@vger.kernel.org,
 yu.c.zhang@intel.com, yi.z.zhang@intel.com
References: <20181109203921.178363-1-brho@google.com>
 <20181109203921.178363-3-brho@google.com>
From: David Hildenbrand
Organization: Red Hat GmbH
Message-ID: <043a592d-6592-3053-15a0-68cc54a26deb@redhat.com>
Date: Tue, 13 Nov 2018 10:36:13 +0100
In-Reply-To: <20181109203921.178363-3-brho@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 09.11.18 21:39, Barret Rhoden wrote:
> This change allows KVM to map DAX-backed files made of huge pages with
> huge mappings in the EPT/TDP.
>
> DAX pages are not PageTransCompound. The existing check is trying to
> determine if the mapping for the pfn is a huge mapping or not. For
> non-DAX maps, e.g. hugetlbfs, that means checking PageTransCompound.
> For DAX, we can check the page table itself.
>
> Note that KVM already faulted in the page (or huge page) in the host's
> page table, and we hold the KVM mmu spinlock (grabbed before checking
> the mmu seq).

I wonder if the KVM mmu spinlock is enough for walking (not KVM
exclusive) host page tables. Can you elaborate?

>
> Signed-off-by: Barret Rhoden
> ---
>  arch/x86/kvm/mmu.c | 34 ++++++++++++++++++++++++++++++++--
>  1 file changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index cf5f572f2305..2df8c459dc6a 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3152,6 +3152,36 @@ static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
>  	return -EFAULT;
>  }
>
> +static bool pfn_is_huge_mapped(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn)
> +{
> +	struct page *page = pfn_to_page(pfn);
> +	unsigned long hva, map_shift;
> +
> +	if (!is_zone_device_page(page))
> +		return PageTransCompoundMap(page);
> +
> +	/*
> +	 * DAX pages do not use compound pages. The page should have already
> +	 * been mapped into the host-side page table during try_async_pf(), so
> +	 * we can check the page tables directly.
> +	 */
> +	hva = gfn_to_hva(kvm, gfn);
> +	if (kvm_is_error_hva(hva))
> +		return false;
> +
> +	/*
> +	 * Our caller grabbed the KVM mmu_lock with a successful
> +	 * mmu_notifier_retry, so we're safe to walk the page table.
> +	 */
> +	map_shift = dev_pagemap_mapping_shift(hva, current->mm);

You could get rid of that local variable map_shift.

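To illustrate the remark about map_shift, here is an untested userspace
sketch of switching on the helper's return value directly. It is not kernel
code: mapping_shift_stub() is a hypothetical stand-in for
dev_pagemap_mapping_shift(), hva_is_huge_mapped() is an illustrative name,
and the shift constants are the usual x86-64 values, defined locally as an
assumption so the example builds on its own. The sketch compares against
PUD_SHIFT where the quoted hunk has PUD_SIZE, on the assumption that the
helper returns a shift rather than a size.

```c
#include <stdbool.h>

/* Assumed x86-64 values, defined locally so this builds in userspace. */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define PUD_SHIFT  30

/*
 * Hypothetical stand-in for dev_pagemap_mapping_shift(): pretend the
 * mapping size is encoded in the address alignment, so the example is
 * deterministic. The real helper walks the host page tables instead.
 */
static unsigned long mapping_shift_stub(unsigned long hva)
{
	if (hva % (1UL << PUD_SHIFT) == 0)
		return PUD_SHIFT;
	if (hva % (1UL << PMD_SHIFT) == 0)
		return PMD_SHIFT;
	return PAGE_SHIFT;
}

/* Same decision as the quoted hunk, without the map_shift local. */
static bool hva_is_huge_mapped(unsigned long hva)
{
	switch (mapping_shift_stub(hva)) {
	case PMD_SHIFT:
	case PUD_SHIFT:
		return true;
	}
	return false;
}
```

In the patch itself, the same shape would simply pass
dev_pagemap_mapping_shift(hva, current->mm) as the switch expression.
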
> +	switch (map_shift) {
> +	case PMD_SHIFT:
> +	case PUD_SIZE:
> +		return true;
> +	}
> +	return false;
> +}
> +
>  static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
>  					gfn_t *gfnp, kvm_pfn_t *pfnp,
>  					int *levelp)
> @@ -3168,7 +3198,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
>  	 */
>  	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
>  	    level == PT_PAGE_TABLE_LEVEL &&
> -	    PageTransCompoundMap(pfn_to_page(pfn)) &&
> +	    pfn_is_huge_mapped(vcpu->kvm, gfn, pfn) &&
>  	    !mmu_gfn_lpage_is_disallowed(vcpu, gfn, PT_DIRECTORY_LEVEL)) {
>  		unsigned long mask;
>  		/*
> @@ -5678,7 +5708,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
>  		 */
>  		if (sp->role.direct &&
>  			!kvm_is_reserved_pfn(pfn) &&
> -			PageTransCompoundMap(pfn_to_page(pfn))) {
> +			pfn_is_huge_mapped(kvm, sp->gfn, pfn)) {
>  			pte_list_remove(rmap_head, sptep);
>  			need_tlb_flush = 1;
>  			goto restart;
>

This looks surprisingly simple to me :)

-- 

Thanks,

David / dhildenb