MIME-Version: 1.0
In-Reply-To: <d3be0f11-5699-bc90-c1f0-b770bc7da596@oracle.com>
References: <CALCETrUPa9xvugPNcTmShJFfgSesa31dD-wy0hY3XnH1Knjn6g@mail.gmail.com>
 <f1c1b2e0-e377-2998-51cd-96d93995e868@oracle.com> <CALCETrWY7F0kFQKpKQDeAtwuYeZznJbQbA12QjNpkQ5faFLoWA@mail.gmail.com>
 <d3be0f11-5699-bc90-c1f0-b770bc7da596@oracle.com>
From: Andy Lutomirski <luto@kernel.org>
Date: Wed, 26 Apr 2017 18:08:55 -0700
Message-ID: <CALCETrWZKERLWXf+wSghzyLEa=Xev2d_tpTbgj6uNhHAntJ2-A@mail.gmail.com>
Subject: Re: xen_exit_mmap() questions
To: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Andy Lutomirski <luto@kernel.org>,
        "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
        Juergen Gross <jgross@suse.com>, X86 ML <x86@kernel.org>,
        Borislav Petkov <bp@alien8.de>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2869
Lines: 81

On Wed, Apr 26, 2017 at 5:55 PM, Boris Ostrovsky
<boris.ostrovsky@oracle.com> wrote:
>
>
> On 04/26/2017 06:49 PM, Andy Lutomirski wrote:
>>
>> On Wed, Apr 26, 2017 at 3:45 PM, Boris Ostrovsky
>> <boris.ostrovsky@oracle.com> wrote:
>>>
>>> On 04/26/2017 04:52 PM, Andy Lutomirski wrote:
>>>>
>>>> I was trying to understand xen_drop_mm_ref() to update it for some
>>>> changes I'm working on, and I'm wondering whether we need
>>>> xen_exit_mmap() at all.
>>>>
>>>> AFAICS the intent is to force all CPUs to drop their lazy uses of the
>>>> mm being destroyed so it can be unpinned before tearing down the page
>>>> tables, thus making it faster to tear down the page tables.  This
>>>> seems like it'll speed up xen_set_pud() and xen_set_pmd(), but this
>>>> seems like it may be of rather limited value.
>>>
>>>
>>> Why do you think it's of limited value? Without it we will end up with a
>>> hypercall for each update.
>>>
>>> Or is your point that the number of those update is relatively small
>>> when we are tearing down?
>>
>>
>> The latter.  Also, unless I'm missing something, xen_set_pte() doesn't
>> have the optimization.  I haven't looked at exactly how page table
>> teardown works, but if it clears each PTE individually, then that's
>> the bulk of the work.
>>
>>>
>>>
>>>>  Could we get away with
>>>> deleting it?
>>>>
>>>> Also, this code in drop_other_mm_ref() looks dubious to me:
>>>>
>>>>     /* If this cpu still has a stale cr3 reference, then make sure
>>>>        it has been flushed. */
>>>>     if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd))
>>>>         load_cr3(swapper_pg_dir);
>>>>
>>>> If cr3 hasn't been flushed to the hypervisor because we're in a lazy
>>>> mode, why would load_cr3() help?  Shouldn't this be xen_mc_flush()
>>>> instead?
>>>
>>>
>>> load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3()
>>> -> xen_mc_issue().
>>
>>
>> xen_mc_issue() does:
>>
>>         if ((paravirt_get_lazy_mode() & mode) == 0)
>>                 xen_mc_flush();
>>
>> I assume the load_cr3() is intended to deal with the case where we're
>> in lazy mode, but we'll still be in lazy mode, right?  Or does it
>> serve some other purpose?
>
>
> Of course. I can't read (I ignored the "== 0" part).
>
> Apparently the early version had an explicit flush but then it disappeared
> (commit 9f79991d4186089e228274196413572cc000143b).
>
> The point of CR3 loading here, I believe, is to make sure the hypervisor
> knows that the (v)CPU is no longer using the the mm's cr3 (we are loading
> swapper_pgdir here).

But that's what leave_mm() does.  To be fair, the x86 lazy TLB
management is a big mess, and this came up because I'm trying to clean
it up without removing it.

I suppose I can try to keep xen_exit_mmap() working.  Is there a
simple way to try to unpin but to not treat it as an error if the
hypervisor rejects it?