Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp1244068ybl; Thu, 12 Dec 2019 11:57:07 -0800 (PST) X-Google-Smtp-Source: APXvYqysNNDAjityX3IefnI9eWnHinITCpiIq2DtGkv/24y6CFmB7YWc4Cl/1by8h3AetH57ZECH X-Received: by 2002:a9d:7357:: with SMTP id l23mr9605964otk.10.1576180627907; Thu, 12 Dec 2019 11:57:07 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1576180627; cv=none; d=google.com; s=arc-20160816; b=MNw8Nj2fT++6292fJ1qmJWhWYNMNql4Km8AXDk5WatorBEManaIXaXJaKLdZ13Zizk GuUAzxrWZ8rLoFW0NvUo6LLYPjljoGIu4VsTzd2EvoWUtKpQGWg+DcZl4/kzVuN8aa4B hCoZGpIdz02HMSxuMkntM+Qu3kAVZfRe8RrgcHp93yT25NVxwhaNBgUUZ37Ske8KHN5Z KGAEZx/Y+zrk3vKp2cge7XN/li3a2hyjO0zF5eJB45zwOa0hKOmYIDtby8zk+8o/hZ5A 2zcBSU2XXfe44PA/fvVC4OImqGC9zwh6rs4Csp7Oh5iIuHkRPaVme5k5hlDsTNqxDDSJ 2n0g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature; bh=TLCOxEuE47kolkwbHmkJmj7IG2EMMJZxExvD4wbLras=; b=tEWCEcaOmEOO6OdGXjuU21p9VfWtUHF42l9wN6b4MVvXIqeNhbCi6zWD9OX/hTRk5Z U1NlLDOueVe9/v5APqF8OvQWPI88syYZvpJ4VB4vcUJU5YkO9tWLV6EDAK+16w3sRha2 qffHprSfLCsjfks95rqLNiHhh6u3Tograg3aqySb0AI4yk3BW5E3Tym31CfNospqGhh8 hvqVW9A8ua0+QOJDGYJlwxfXmDa5A6b6caRA4oS7LVKa5IusrcCQFYsT5Ce8yl09xKXF 5TMn3njBfnb4+GifHWtwabOwsUQAuvIg6hXytVyh5QKLrNCYBNgNI0RtihNO/CF8315f h04A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=N5Jbii9l; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id o23si3761885oie.146.2019.12.12.11.56.55; Thu, 12 Dec 2019 11:57:07 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=N5Jbii9l; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730755AbfLLTzI (ORCPT + 99 others); Thu, 12 Dec 2019 14:55:08 -0500 Received: from mail-pf1-f195.google.com ([209.85.210.195]:36713 "EHLO mail-pf1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730284AbfLLTzI (ORCPT ); Thu, 12 Dec 2019 14:55:08 -0500 Received: by mail-pf1-f195.google.com with SMTP id x184so1404804pfb.3 for ; Thu, 12 Dec 2019 11:55:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=TLCOxEuE47kolkwbHmkJmj7IG2EMMJZxExvD4wbLras=; b=N5Jbii9lOcX8HpdB4OvQNSGLnmOeT6+/+rlmV3f2zodsX95/IUChnGHQM5ZR9+7otx xCv/jmEbc6D93pFCnCpXU2SDw93JebTnMOgIuJ8ohn6QjYcL69Xjt3FqJzI3CVS4nRQH dADcWC3VtCEZek6JWFoIvthRNIVETi0gLpefyHByeARkuqPnRxYLMzsYiTzOzVg31tfq 96TlL/xxXNaX2fQL+xu6IO8Y8p/8MMwUJ/7l+NfyTuhTMzsDds+ZEGokpkgCOXhaXCFH DVWOptf/VRWwif3t/hocRA2xwwTLgxuIoHH49eObl34a++JppwPR8IgcvfeN0I67KWsr t1/g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=TLCOxEuE47kolkwbHmkJmj7IG2EMMJZxExvD4wbLras=; b=XVlTIqQCju6kfGYyDiGIUcsN369eDzc9K7JFYPCcN5KjxwS1Hrv0SFLMp2n5KRAI68 K+kqVW1nzFTbNBRnUlWfjYI75eyAHxi3ueAyavnbZTsmRtpGCeYrNX7SnQYDONXCTMIU fLtDuF6vK5Aq9IiQvbWjEONWihKcWFAd++jsnQizAe6hHkdf1M7dtuUpVMCobz4ur31J Ej9heQNBb6IwGpzixprHfa5FVH3nU2PZhYph7K1E6Su08696bk5i7zBy/IhahDNOr6Hz Nn/oanc45xZMUHr3q618rFJbkHU+KXLXQIee30bWCG062FaDuoX3/fy5MBmSN4rA2RlR P0OQ== X-Gm-Message-State: APjAAAXiAGNsbaMLdf+nPMqq5ioiX1xcbds1JTOUxx0vFX11i0Mgwzb+ tWBs02eDAb7YyLfDK6aGIImA6w== X-Received: by 2002:a65:640e:: with SMTP id a14mr11505680pgv.402.1576180507093; Thu, 12 Dec 2019 11:55:07 -0800 (PST) Received: from gnomeregan.cam.corp.google.com ([2620:15c:6:14:ad22:1cbb:d8fa:7d55]) by smtp.googlemail.com with ESMTPSA id g10sm7549833pgh.35.2019.12.12.11.55.05 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 12 Dec 2019 11:55:06 -0800 (PST) Subject: Re: [PATCH v5 2/2] kvm: Use huge pages for DAX-backed files To: Liran Alon Cc: Paolo Bonzini , Dan Williams , David Hildenbrand , Dave Jiang , Alexander Duyck , Sean Christopherson , linux-nvdimm@lists.01.org, x86@kernel.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, jason.zeng@intel.com References: <20191212182238.46535-1-brho@google.com> <20191212182238.46535-3-brho@google.com> <06108004-1720-41EB-BCAB-BFA8FEBF4772@oracle.com> From: Barret Rhoden Message-ID: Date: Thu, 12 Dec 2019 14:55:04 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi - On 12/12/19 1:49 PM, Liran Alon wrote: > > >> On 12 Dec 2019, at 20:47, Liran Alon wrote: >> >> >> >>> On 12 Dec 2019, at 20:22, Barret Rhoden wrote: >>> >>> This change allows KVM to map DAX-backed files made of huge pages with >>> huge mappings in the EPT/TDP. >> >> This change isn’t only relevant for TDP. It also affects when KVM use shadow-paging. >> See how FNAME(page_fault)() calls transparent_hugepage_adjust(). Cool, I'll drop references to the EPT/TDP from the commit message. >>> DAX pages are not PageTransCompound. The existing check is trying to >>> determine if the mapping for the pfn is a huge mapping or not. >> >> I would rephrase “The existing check is trying to determine if the pfn >> is mapped as part of a transparent huge-page”. Can do. >> >>> For >>> non-DAX maps, e.g. hugetlbfs, that means checking PageTransCompound. >> >> This is not related to hugetlbfs but rather THP. I thought that PageTransCompound also returned true for hugetlbfs (based off of comments in page-flags.h). Though I do see the comment about the 'level == PT_PAGE_TABLE_LEVEL' check excluding hugetlbfs pages. Anyway, I'll remove the "e.g. hugetlbfs" from the commit message. >> >>> For DAX, we can check the page table itself. >>> >>> Note that KVM already faulted in the page (or huge page) in the host's >>> page table, and we hold the KVM mmu spinlock. We grabbed that lock in >>> kvm_mmu_notifier_invalidate_range_end, before checking the mmu seq. >>> >>> Signed-off-by: Barret Rhoden >> >> I don’t think the right place to change for this functionality is transparent_hugepage_adjust() >> which is meant to handle PFNs that are mapped as part of a transparent huge-page. >> >> For example, this would prevent mapping DAX-backed file page as 1GB. >> As transparent_hugepage_adjust() only handles the case (level == PT_PAGE_TABLE_LEVEL). >> >> As you are parsing the page-tables to discover the page-size the PFN is mapped in, >> I think you should instead modify kvm_host_page_size() to parse page-tables instead >> of rely on vma_kernel_pagesize() (Which relies on vma->vm_ops->pagesize()) in case >> of is_zone_device_page(). >> The main complication though of doing this is that at this point you don’t yet have the PFN >> that is retrieved by try_async_pf(). So maybe you should consider modifying the order of calls >> in tdp_page_fault() & FNAME(page_fault)(). >> >> -Liran > > Or alternatively when thinking about it more, maybe just rename transparent_hugepage_adjust() > to not be specific to THP and better handle the case of parsing page-tables changing mapping-level to 1GB. > That is probably easier and more elegant. I can rename it to hugepage_adjust(), since it's not just THP anymore. I was a little hesitant to change the this to handle 1 GB pages with this patchset at first. I didn't want to break the non-DAX case stuff by doing so. Specifically, can a THP page be 1 GB, and if so, how can you tell? If you can't tell easily, I could walk the page table for all cases, instead of just zone_device(). I'd also have to drop the "level == PT_PAGE_TABLE_LEVEL" check, I think, which would open this up to hugetlbfs pages (based on the comments). Is there any reason why that would be a bad idea? Thanks, Barret