Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18;
Subject: Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access
 on disabled memory
To:     Jason Gunthorpe <jgg@ziepe.ca>
CC:     Peter Xu <peterx@redhat.com>,
        Alex Williamson <alex.williamson@redhat.com>,
        <kvm@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
        <cohuck@redhat.com>, <cai@lca.pw>,
        Andrea Arcangeli <aarcange@redhat.com>
References: <159017506369.18853.17306023099999811263.stgit@gimli.home>
 <20200523193417.GI766834@xz-x1> <20200523170602.5eb09a66@x1.home>
 <20200523235257.GC939059@xz-x1> <20200525122607.GC744@ziepe.ca>
 <20200525142806.GC1058657@xz-x1> <20200525144651.GE744@ziepe.ca>
 <20200525151142.GE1058657@xz-x1> <20200525165637.GG744@ziepe.ca>
 <3d9c1c8b-5278-1c4d-0e9c-e6f8fdb75853@nvidia.com>
 <20200526003705.GK744@ziepe.ca>
From:   John Hubbard <jhubbard@nvidia.com>
Message-ID: <b30fdc87-31e7-cb45-dab3-c37322ce05b1@nvidia.com>
Date:   Mon, 25 May 2020 17:46:34 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
 Thunderbird/68.8.0
MIME-Version: 1.0
In-Reply-To: <20200526003705.GK744@ziepe.ca>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk

On 2020-05-25 17:37, Jason Gunthorpe wrote:
...
>> commit 318b275fbca1ab9ec0862de71420e0e92c3d1aa7
>> Author: Gleb Natapov <gleb@redhat.com>
>> Date:   Tue Mar 22 16:30:51 2011 -0700
>>
>>      mm: allow GUP to fail instead of waiting on a page
>>
>>      GUP user may want to try to acquire a reference to a page if it is already
>>      in memory, but not if IO, to bring it in, is needed.  For example KVM may
>>      tell vcpu to schedule another guest process if current one is trying to
>>      access swapped out page.  Meanwhile, the page will be swapped in and the
>>      guest process, that depends on it, will be able to run again.
>>
>>      This patch adds FAULT_FLAG_RETRY_NOWAIT (suggested by Linus) and
>>      FOLL_NOWAIT follow_page flags.  FAULT_FLAG_RETRY_NOWAIT, when used in
>>      conjunction with VM_FAULT_ALLOW_RETRY, indicates to handle_mm_fault that
>>      it shouldn't drop mmap_sem and wait on a page, but return VM_FAULT_RETRY
>>      instead.
> 
> So, from kvm's perspective it was to avoid excessively long blocking in
> common paths when it could rejoin the completed IO by somehow waiting
> on a page itself?


Or perhaps some variation on that, such as just retrying with an intervening
schedule() call. It's not clear just from that commit.

> 
> It all seems like it should not be used unless the page is going to go
> to IO?


That's my conclusion so far, yes.


> 
> Certainly there is no reason to optimize the fringe case of vfio
> sleeping if there is and incorrect concurrnent attempt to disable the
> a BAR.


Definitely agree with that position.

> 
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 8429d5aa31e44..e32e8e52a57ac 100644
>> +++ b/include/linux/mm.h
>> @@ -430,6 +430,15 @@ extern pgprot_t protection_map[16];
>>    * continuous faults with flags (b).  We should always try to detect pending
>>    * signals before a retry to make sure the continuous page faults can still be
>>    * interrupted if necessary.
>> + *
>> + * About @FAULT_FLAG_RETRY_NOWAIT: this is intended for callers who would like
>> + * to acquire a page, but only if the page is already in memory. If, on the
>> + * other hand, the page requires IO in order to bring it into memory, then fault
>> + * handlers will immediately return VM_FAULT_RETRY ("don't wait"), while leaving
>> + * mmap_lock held ("don't drop mmap_lock"). For example, this is useful for
>> + * virtual machines that have multiple guests running: if guest A attempts
>> + * get_user_pages() on a swapped out page, another guest can be scheduled while
>> + * waiting for IO to swap in guest A's page.
>>    */
>>   #define FAULT_FLAG_WRITE                       0x01
>>   #define FAULT_FLAG_MKWRITE                     0x02
> 
> It seems reasonable but people might complain about the kvm
> specifics of the explanation.
> 
> It might be better to explain how the caller is supposed to know when
> it is OK to try GUP again and expect success, as it seems to me this
> is really about externalizing the sleep for page wait?
> 

OK, good point. The example was helpful in the commit log, but not quite
appropriate in mm.h, yes.

thanks,
-- 
John Hubbard
NVIDIA