Subject: Re: xen_exit_mmap() questions
To: Andy Lutomirski <luto@kernel.org>
References: <CALCETrUPa9xvugPNcTmShJFfgSesa31dD-wy0hY3XnH1Knjn6g@mail.gmail.com>
 <f1c1b2e0-e377-2998-51cd-96d93995e868@oracle.com>
 <CALCETrWY7F0kFQKpKQDeAtwuYeZznJbQbA12QjNpkQ5faFLoWA@mail.gmail.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
        Juergen Gross <jgross@suse.com>, X86 ML <x86@kernel.org>,
        Borislav Petkov <bp@alien8.de>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Message-ID: <d3be0f11-5699-bc90-c1f0-b770bc7da596@oracle.com>
Date: Wed, 26 Apr 2017 20:55:55 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <CALCETrWY7F0kFQKpKQDeAtwuYeZznJbQbA12QjNpkQ5faFLoWA@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2380
Lines: 65


On 04/26/2017 06:49 PM, Andy Lutomirski wrote:
> On Wed, Apr 26, 2017 at 3:45 PM, Boris Ostrovsky
> <boris.ostrovsky@oracle.com> wrote:
>> On 04/26/2017 04:52 PM, Andy Lutomirski wrote:
>>> I was trying to understand xen_drop_mm_ref() to update it for some
>>> changes I'm working on, and I'm wondering whether we need
>>> xen_exit_mmap() at all.
>>>
>>> AFAICS the intent is to force all CPUs to drop their lazy uses of the
>>> mm being destroyed so it can be unpinned before tearing down the page
>>> tables, thus making it faster to tear down the page tables.  This
>>> seems like it'll speed up xen_set_pud() and xen_set_pmd(), but this
>>> seems like it may be of rather limited value.
>>
>> Why do you think it's of limited value? Without it we will end up with a
>> hypercall for each update.
>>
>> Or is your point that the number of those updates is relatively small
>> when we are tearing down?
>
> The latter.  Also, unless I'm missing something, xen_set_pte() doesn't
> have the optimization.  I haven't looked at exactly how page table
> teardown works, but if it clears each PTE individually, then that's
> the bulk of the work.
>
>>
>>
>>>  Could we get away with
>>> deleting it?
>>>
>>> Also, this code in drop_other_mm_ref() looks dubious to me:
>>>
>>>     /* If this cpu still has a stale cr3 reference, then make sure
>>>        it has been flushed. */
>>>     if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd))
>>>         load_cr3(swapper_pg_dir);
>>>
>>> If cr3 hasn't been flushed to the hypervisor because we're in a lazy
>>> mode, why would load_cr3() help?  Shouldn't this be xen_mc_flush()
>>> instead?
>>
>> load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3()
>> -> xen_mc_issue().
>
> xen_mc_issue() does:
>
>         if ((paravirt_get_lazy_mode() & mode) == 0)
>                 xen_mc_flush();
>
> I assume the load_cr3() is intended to deal with the case where we're
> in lazy mode, but we'll still be in lazy mode, right?  Or does it
> serve some other purpose?

Of course. I can't read (I ignored the "== 0" part).
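
To spell out what I misread, here is a rough user-space model of the
check, not the kernel code. All model_* names, the counters and the
enum values are made up for illustration, and I am assuming that the
cr3 write path issues its batch with the "CPU" lazy mode as the mode
argument.

```c
#include <assert.h>

/* Hypothetical stand-ins for the paravirt lazy modes. */
enum model_lazy { MODEL_LAZY_NONE = 0, MODEL_LAZY_MMU = 1, MODEL_LAZY_CPU = 2 };

static unsigned int model_lazy_mode = MODEL_LAZY_NONE;
static int model_pending;   /* operations queued in the multicall batch */
static int model_flushed;   /* operations actually sent to the hypervisor */

/* Send everything queued so far. */
static void model_mc_flush(void)
{
	model_flushed += model_pending;
	model_pending = 0;
}

/* Same shape as the check quoted above: flush only when NOT lazy. */
static void model_mc_issue(unsigned int mode)
{
	if ((model_lazy_mode & mode) == 0)
		model_mc_flush();
}

/* Queue a cr3 switch, then issue it (assumed to use the CPU lazy mode). */
static void model_write_cr3(void)
{
	model_pending++;
	model_mc_issue(MODEL_LAZY_CPU);
}
```

In this model, calling model_write_cr3() with model_lazy_mode set to
MODEL_LAZY_CPU leaves the operation queued, and only a later call made
outside lazy mode (or an explicit model_mc_flush()) sends it. That is
the case you are asking about.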

Apparently an early version had an explicit flush, but it was later
removed (commit 9f79991d4186089e228274196413572cc000143b).

The point of loading CR3 here, I believe, is to make sure the hypervisor
knows that the (v)CPU is no longer using the mm's cr3 (we are
loading swapper_pg_dir here).

-boris