Received: by 2002:a25:c593:0:0:0:0:0 with SMTP id v141csp1192438ybe; Fri, 13 Sep 2019 12:30:26 -0700 (PDT) X-Google-Smtp-Source: APXvYqzCmFela2LJEQ6UzmiljpFnKFYxAQ4Ys0Z0lt9CerByF3T8u+cP6UoNLJEyI14IOge1jpQ8 X-Received: by 2002:a05:6402:350:: with SMTP id r16mr21601060edw.296.1568403026212; Fri, 13 Sep 2019 12:30:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1568403026; cv=none; d=google.com; s=arc-20160816; b=cZN5Jnj/IjmPgzZR/Mx/0OpZIQOC2S2JcQCGNGq5iv9mLIx0hO5ePrqEN/84vEmtM2 5z4C/Jzve+JWzCV1at106pOSMWUITghSsF/uJr289/5IQskCj3Tl477lHlF1r5Qa9pUt RBG3wMGjU1azdhrYn22exYqiHjA3gtH2uMF2obm9506FW9uPO9IK1WCkPY+sPn8nOh5s mLKGK7JE1ik+4ik9UNnNTpQGQSxq74z2SOu7fdNV/4lPktV2HYmhkyyNy2HZt3JNBnhx qW1V5n6uIEqj4UV4aXSBhQ9LEKpEqwH3Yq59VmdaQbB3pDLVhEui2utaidQ2eJQloZZV YicA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :organization:references:in-reply-to:message-id:subject:cc:to:from :date; bh=dZd0MHszRH+2zZyYemFVcxG/Tt8XlRYxAgzmDNyt3kY=; b=zdh1LOch0P6cGT5AlkJEA5TBcFOWiGjJG0bDuwP1Za+Uu5fcZjakj3yPs8fC0u2bEq N9BE8XeVeADrQFu6PsKJN6IANT4WYl6y5KTakYjBDyhlfzam2k0h5E/KGJD0QZdC0H+W zCr5j262VqjHNy0/NU1920xKpiVqQK3PP4CkJkRA9gCDqHuOlSQJR6UYx742FbuPaAj9 0dTrlykThZaQtvz/Bcj5x0p0QQya6I6rlzJECF1K4Z690d4/jmOfOqQnB0Tipm8J6uqj EhBStNCNzUP3gxwctTcvY+WWArNqQdZ2gX//BaSOqYRfxJzZ65Ha+7e4AFFHTgycsjwV bI6w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id j6si8426186ejm.157.2019.09.13.12.30.02; Fri, 13 Sep 2019 12:30:26 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388736AbfIMSF2 convert rfc822-to-8bit (ORCPT + 99 others); Fri, 13 Sep 2019 14:05:28 -0400 Received: from mx1.redhat.com ([209.132.183.28]:35694 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387471AbfIMSF2 (ORCPT ); Fri, 13 Sep 2019 14:05:28 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 9FEB5307CDFC; Fri, 13 Sep 2019 18:05:27 +0000 (UTC) Received: from x1.home (ovpn-118-102.phx2.redhat.com [10.3.118.102]) by smtp.corp.redhat.com (Postfix) with ESMTP id 4855F60600; Fri, 13 Sep 2019 18:05:27 +0000 (UTC) Date: Fri, 13 Sep 2019 12:05:26 -0600 From: Alex Williamson To: Ben Luo Cc: cohuck@redhat.com, linux-kernel@vger.kernel.org, Andrea Arcangeli Subject: Re: [PATCH v2] vfio/type1: avoid redundant PageReserved checking Message-ID: <20190913120526.363e7592@x1.home> In-Reply-To: <5c9c57ba-ca5f-f080-3bb0-417a08788235@linux.alibaba.com> References: <20190827124041.4f986005@x1.home> <3517844d6371794cff59b13bf9c2baf1dcbe571c.1566966365.git.luoben@linux.alibaba.com> <20190828095501.12e71bd3@x1.home> <6c234632-b7e9-45c7-3d70-51a4c83161f6@linux.alibaba.com> <20190829110601.6dd74052@x1.home> <5c9c57ba-ca5f-f080-3bb0-417a08788235@linux.alibaba.com> Organization: Red Hat MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.49]); Fri, 13 Sep 2019 18:05:27 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 2 Sep 2019 15:32:42 +0800 Ben Luo wrote: > 在 2019/8/30 上午1:06, Alex Williamson 写道: > > On Fri, 30 Aug 2019 00:58:22 +0800 > > Ben Luo wrote: > > > >> 在 2019/8/28 下午11:55, Alex Williamson 写道: > >>> On Wed, 28 Aug 2019 12:28:04 +0800 > >>> Ben Luo wrote: > >>> > >>>> currently, if the page is not a tail of compound page, it will be > >>>> checked twice for the same thing. > >>>> > >>>> Signed-off-by: Ben Luo > >>>> --- > >>>> drivers/vfio/vfio_iommu_type1.c | 3 +-- > >>>> 1 file changed, 1 insertion(+), 2 deletions(-) > >>>> > >>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > >>>> index 054391f..d0f7346 100644 > >>>> --- a/drivers/vfio/vfio_iommu_type1.c > >>>> +++ b/drivers/vfio/vfio_iommu_type1.c > >>>> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async) > >>>> static bool is_invalid_reserved_pfn(unsigned long pfn) > >>>> { > >>>> if (pfn_valid(pfn)) { > >>>> - bool reserved; > >>>> struct page *tail = pfn_to_page(pfn); > >>>> struct page *head = compound_head(tail); > >>>> - reserved = !!(PageReserved(head)); > >>>> if (head != tail) { > >>>> + bool reserved = PageReserved(head); > >>>> /* > >>>> * "head" is not a dangling pointer > >>>> * (compound_head takes care of that) > >>> Thinking more about this, the code here was originally just a copy of > >>> kvm_is_mmio_pfn() which was simplified in v3.12 with the commit below. > >>> Should we instead do the same thing here? Thanks, > >>> > >>> Alex > >> ok, and kvm_is_mmio_pfn() has also been updated since then, I will take > >> a look at that and compose a new patch > > I'm not sure if the further updates are quite as relevant for vfio, but > > appreciate your review of them. Thanks, > > > > Alex > > After studying the related code, my personal understandings are: > > kvm_is_mmio_pfn() is used to find out whether a memory range is MMIO > mapped so that to set > the proper MTRR TYPE to spte. > > is_invalid_reserved_pfn() is used in two scenarios: >     1. to tell whether a page should be counted against user's mlock > limits, as the function's name > implies, all 'invalid' PFNs who are not backed by struct page and those > reserved pages (including > zero page and those from NVDIMM DAX) should be excluded. > 2. to check if we have got a valid and pinned pfn for the vma > with VM_PFNMAP flag. > > So, for the zero page and 'RAM' backed PFNs without 'struct page', > kvm_is_mmio_pfn() should > return false because they are not MMIO and are cacheable, but > is_invalid_reserved_pfn() should > return true since they are truely reserved or invalid and should not be > counted against user's > mlock limits. > > For fsdax-page, current get_user_pages() returns -EOPNOTSUPP, and VFIO > also returns this > error code to user, seems not support fsdax for now, so there is no > chance to call into > is_invalid_reserved_pfn() currently, if fsdax is to be supported, not > only this function needs to be > updated, vaddr_get_pfn() also needs some changes. > > Now, with the assumption that PFNs of compound pages with reserved bit > set in head will not be > passed to is_invalid_reserved_pfn(), we can simplify this function to: > > static bool is_invalid_reserved_pfn(unsigned long pfn) > { >         if (pfn_valid(pfn)) >                 return PageReserved(pfn_to_page(pfn)); > >         return true; > } > > But, I am not sure if the assumption above is true, if not, we still > need to check the reserved bit of > head for a tail page as this PATCH v2 does. I believe what you've described is correct. Andrea, have we missed anything? Thanks, Alex > > > >>> commit 11feeb498086a3a5907b8148bdf1786a9b18fc55 > >>> Author: Andrea Arcangeli > >>> Date: Thu Jul 25 03:04:38 2013 +0200 > >>> > >>> kvm: optimize away THP checks in kvm_is_mmio_pfn() > >>> > >>> The checks on PG_reserved in the page structure on head and tail pages > >>> aren't necessary because split_huge_page wouldn't transfer the > >>> PG_reserved bit from head to tail anyway. > >>> > >>> This was a forward-thinking check done in the case PageReserved was > >>> set by a driver-owned page mapped in userland with something like > >>> remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not > >>> possible right now). It was meant to be very safe, but it's overkill > >>> as it's unlikely split_huge_page could ever run without the driver > >>> noticing and tearing down the hugepage itself. > >>> > >>> And if a driver in the future will really want to map a reserved > >>> hugepage in userland using an huge pmd it should simply take care of > >>> marking all subpages reserved too to keep KVM safe. This of course > >>> would require such a hypothetical driver to tear down the huge pmd > >>> itself and splitting the hugepage itself, instead of relaying on > >>> split_huge_page, but that sounds very reasonable, especially > >>> considering split_huge_page wouldn't currently transfer the reserved > >>> bit anyway. > >>> > >>> Signed-off-by: Andrea Arcangeli > >>> Signed-off-by: Gleb Natapov > >>> > >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > >>> index d2836788561e..0fc25aed79a8 100644 > >>> --- a/virt/kvm/kvm_main.c > >>> +++ b/virt/kvm/kvm_main.c > >>> @@ -102,28 +102,8 @@ static bool largepages_enabled = true; > >>> > >>> bool kvm_is_mmio_pfn(pfn_t pfn) > >>> { > >>> - if (pfn_valid(pfn)) { > >>> - int reserved; > >>> - struct page *tail = pfn_to_page(pfn); > >>> - struct page *head = compound_trans_head(tail); > >>> - reserved = PageReserved(head); > >>> - if (head != tail) { > >>> - /* > >>> - * "head" is not a dangling pointer > >>> - * (compound_trans_head takes care of that) > >>> - * but the hugepage may have been splitted > >>> - * from under us (and we may not hold a > >>> - * reference count on the head page so it can > >>> - * be reused before we run PageReferenced), so > >>> - * we've to check PageTail before returning > >>> - * what we just read. > >>> - */ > >>> - smp_rmb(); > >>> - if (PageTail(tail)) > >>> - return reserved; > >>> - } > >>> - return PageReserved(tail); > >>> - } > >>> + if (pfn_valid(pfn)) > >>> + return PageReserved(pfn_to_page(pfn)); > >>> > >>> return true; > >>> }