LinuxLists.cc - xen_exit_mmap() questions

2017-04-26 20:53:03

Subject: xen_exit_mmap() questions

I was trying to understand xen_drop_mm_ref() to update it for some
changes I'm working on, and I'm wondering whether we need
xen_exit_mmap() at all.

AFAICS the intent is to force all CPUs to drop their lazy uses of the
mm being destroyed so it can be unpinned before tearing down the page
tables, thus making it faster to tear down the page tables. This
seems like it'll speed up xen_set_pud() and xen_set_pmd(), but this
seems like it may be of rather limited value. Could we get away with
deleting it?

Also, this code in drop_other_mm_ref() looks dubious to me:

/* If this cpu still has a stale cr3 reference, then make sure
it has been flushed. */
if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd))
load_cr3(swapper_pg_dir);

If cr3 hasn't been flushed to the hypervisor because we're in a lazy
mode, why would load_cr3() help? Shouldn't this be xen_mc_flush()
instead?

Anyway, the whitespace-damaged patch below seems to result in a
fully-functional kernel:

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 37cb5aad71de..e4e073844cbf 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -998,101 +998,6 @@ static void xen_dup_mmap(struct mm_struct
*oldmm, struct mm_struct *mm)
spin_unlock(&mm->page_table_lock);
}

-
-#ifdef CONFIG_SMP
-/* Another cpu may still have their %cr3 pointing at the pagetable, so
- we need to repoint it somewhere else before we can unpin it. */
-static void drop_other_mm_ref(void *info)
-{
- struct mm_struct *mm = info;
- struct mm_struct *active_mm;
-
- active_mm = this_cpu_read(cpu_tlbstate.active_mm);
-
- if (active_mm == mm && this_cpu_read(cpu_tlbstate.state) != TLBSTATE_OK)
- leave_mm(smp_processor_id());
-
- /* If this cpu still has a stale cr3 reference, then make sure
- it has been flushed. */
- if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd))
- load_cr3(swapper_pg_dir);
-}
-
-static void xen_drop_mm_ref(struct mm_struct *mm)
-{
- cpumask_var_t mask;
- unsigned cpu;
-
- if (current->active_mm == mm) {
- if (current->mm == mm)
- load_cr3(swapper_pg_dir);
- else
- leave_mm(smp_processor_id());
- }
-
- /* Get the "official" set of cpus referring to our pagetable. */
- if (!alloc_cpumask_var(&mask, GFP_ATOMIC)) {
- for_each_online_cpu(cpu) {
- if (!cpumask_test_cpu(cpu, mm_cpumask(mm))
- && per_cpu(xen_current_cr3, cpu) != __pa(mm->pgd))
- continue;
- smp_call_function_single(cpu, drop_other_mm_ref, mm, 1);
- }
- return;
- }
- cpumask_copy(mask, mm_cpumask(mm));
-
- /* It's possible that a vcpu may have a stale reference to our
- cr3, because its in lazy mode, and it hasn't yet flushed
- its set of pending hypercalls yet. In this case, we can
- look at its actual current cr3 value, and force it to flush
- if needed. */
- for_each_online_cpu(cpu) {
- if (per_cpu(xen_current_cr3, cpu) == __pa(mm->pgd))
- cpumask_set_cpu(cpu, mask);
- }
-
- if (!cpumask_empty(mask))
- smp_call_function_many(mask, drop_other_mm_ref, mm, 1);
- free_cpumask_var(mask);
-}
-#else
-static void xen_drop_mm_ref(struct mm_struct *mm)
-{
- if (current->active_mm == mm)
- load_cr3(swapper_pg_dir);
-}
-#endif
-
-/*
- * While a process runs, Xen pins its pagetables, which means that the
- * hypervisor forces it to be read-only, and it controls all updates
- * to it. This means that all pagetable updates have to go via the
- * hypervisor, which is moderately expensive.
- *
- * Since we're pulling the pagetable down, we switch to use init_mm,
- * unpin old process pagetable and mark it all read-write, which
- * allows further operations on it to be simple memory accesses.
- *
- * The only subtle point is that another CPU may be still using the
- * pagetable because of lazy tlb flushing. This means we need need to
- * switch all CPUs off this pagetable before we can unpin it.
- */
-static void xen_exit_mmap(struct mm_struct *mm)
-{
- get_cpu(); /* make sure we don't move around */
- xen_drop_mm_ref(mm);
- put_cpu();
-
- spin_lock(&mm->page_table_lock);
-
- /* pgd may not be pinned in the error exit path of execve */
- if (xen_page_pinned(mm->pgd))
- xen_pgd_unpin(mm);
-
- spin_unlock(&mm->page_table_lock);
-}
-
static void xen_post_allocator_init(void);

static void __init pin_pagetable_pfn(unsigned cmd, unsigned long pfn)
@@ -1544,6 +1449,8 @@ static int xen_pgd_alloc(struct mm_struct *mm)

static void xen_pgd_free(struct mm_struct *mm, pgd_t *pgd)
{
+ xen_pgd_unpin(mm);
+
#ifdef CONFIG_X86_64
pgd_t *user_pgd = xen_get_user_pgd(pgd);

@@ -2465,7 +2372,6 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = {

.activate_mm = xen_activate_mm,
.dup_mmap = xen_dup_mmap,
- .exit_mmap = xen_exit_mmap,

.lazy_mode = {
.enter = paravirt_enter_lazy_mmu,

2017-04-26 22:45:43

by Boris Ostrovsky

[permalink] [raw]

Subject: Re: xen_exit_mmap() questions

On 04/26/2017 04:52 PM, Andy Lutomirski wrote:
> I was trying to understand xen_drop_mm_ref() to update it for some
> changes I'm working on, and I'm wondering whether we need
> xen_exit_mmap() at all.
>
> AFAICS the intent is to force all CPUs to drop their lazy uses of the
> mm being destroyed so it can be unpinned before tearing down the page
> tables, thus making it faster to tear down the page tables. This
> seems like it'll speed up xen_set_pud() and xen_set_pmd(), but this
> seems like it may be of rather limited value.

Why do you think it's of limited value? Without it we will end up with a
hypercall for each update.

Or is your point that the number of those update is relatively small
when we are tearing down?

> Could we get away with
> deleting it?
>
> Also, this code in drop_other_mm_ref() looks dubious to me:
>
> /* If this cpu still has a stale cr3 reference, then make sure
> it has been flushed. */
> if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd))
> load_cr3(swapper_pg_dir);
>
> If cr3 hasn't been flushed to the hypervisor because we're in a lazy
> mode, why would load_cr3() help? Shouldn't this be xen_mc_flush()
> instead?

load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3()
-> xen_mc_issue().

-boris

2017-04-26 22:50:22

by Andy Lutomirski

[permalink] [raw]

Subject: Re: xen_exit_mmap() questions

On Wed, Apr 26, 2017 at 3:45 PM, Boris Ostrovsky
<[email protected]> wrote:
> On 04/26/2017 04:52 PM, Andy Lutomirski wrote:
>> I was trying to understand xen_drop_mm_ref() to update it for some
>> changes I'm working on, and I'm wondering whether we need
>> xen_exit_mmap() at all.
>>
>> AFAICS the intent is to force all CPUs to drop their lazy uses of the
>> mm being destroyed so it can be unpinned before tearing down the page
>> tables, thus making it faster to tear down the page tables. This
>> seems like it'll speed up xen_set_pud() and xen_set_pmd(), but this
>> seems like it may be of rather limited value.
>
> Why do you think it's of limited value? Without it we will end up with a
> hypercall for each update.
>
> Or is your point that the number of those update is relatively small
> when we are tearing down?

The latter. Also, unless I'm missing something, xen_set_pte() doesn't
have the optimization. I haven't looked at exactly how page table
teardown works, but if it clears each PTE individually, then that's
the bulk of the work.

>
>
>> Could we get away with
>> deleting it?
>>
>> Also, this code in drop_other_mm_ref() looks dubious to me:
>>
>> /* If this cpu still has a stale cr3 reference, then make sure
>> it has been flushed. */
>> if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd))
>> load_cr3(swapper_pg_dir);
>>
>> If cr3 hasn't been flushed to the hypervisor because we're in a lazy
>> mode, why would load_cr3() help? Shouldn't this be xen_mc_flush()
>> instead?
>
> load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3()
> -> xen_mc_issue().

xen_mc_issue() does:

if ((paravirt_get_lazy_mode() & mode) == 0)
xen_mc_flush();

I assume the load_cr3() is intended to deal with the case where we're
in lazy mode, but we'll still be in lazy mode, right? Or does it
serve some other purpose?

--Andy

2017-04-27 00:57:31

by Boris Ostrovsky

[permalink] [raw]

Subject: Re: xen_exit_mmap() questions

On 04/26/2017 06:49 PM, Andy Lutomirski wrote:
> On Wed, Apr 26, 2017 at 3:45 PM, Boris Ostrovsky
> <[email protected]> wrote:
>> On 04/26/2017 04:52 PM, Andy Lutomirski wrote:
>>> I was trying to understand xen_drop_mm_ref() to update it for some
>>> changes I'm working on, and I'm wondering whether we need
>>> xen_exit_mmap() at all.
>>>
>>> AFAICS the intent is to force all CPUs to drop their lazy uses of the
>>> mm being destroyed so it can be unpinned before tearing down the page
>>> tables, thus making it faster to tear down the page tables. This
>>> seems like it'll speed up xen_set_pud() and xen_set_pmd(), but this
>>> seems like it may be of rather limited value.
>>
>> Why do you think it's of limited value? Without it we will end up with a
>> hypercall for each update.
>>
>> Or is your point that the number of those update is relatively small
>> when we are tearing down?
>
> The latter. Also, unless I'm missing something, xen_set_pte() doesn't
> have the optimization. I haven't looked at exactly how page table
> teardown works, but if it clears each PTE individually, then that's
> the bulk of the work.
>
>>
>>
>>> Could we get away with
>>> deleting it?
>>>
>>> Also, this code in drop_other_mm_ref() looks dubious to me:
>>>
>>> /* If this cpu still has a stale cr3 reference, then make sure
>>> it has been flushed. */
>>> if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd))
>>> load_cr3(swapper_pg_dir);
>>>
>>> If cr3 hasn't been flushed to the hypervisor because we're in a lazy
>>> mode, why would load_cr3() help? Shouldn't this be xen_mc_flush()
>>> instead?
>>
>> load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3()
>> -> xen_mc_issue().
>
> xen_mc_issue() does:
>
> if ((paravirt_get_lazy_mode() & mode) == 0)
> xen_mc_flush();
>
> I assume the load_cr3() is intended to deal with the case where we're
> in lazy mode, but we'll still be in lazy mode, right? Or does it
> serve some other purpose?

Of course. I can't read (I ignored the "== 0" part).

Apparently the early version had an explicit flush but then it
disappeared (commit 9f79991d4186089e228274196413572cc000143b).

The point of CR3 loading here, I believe, is to make sure the hypervisor
knows that the (v)CPU is no longer using the the mm's cr3 (we are
loading swapper_pgdir here).

-boris

2017-04-27 01:09:28

by Andy Lutomirski

[permalink] [raw]

Subject: Re: xen_exit_mmap() questions

On Wed, Apr 26, 2017 at 5:55 PM, Boris Ostrovsky
<[email protected]> wrote:
>
>
> On 04/26/2017 06:49 PM, Andy Lutomirski wrote:
>>
>> On Wed, Apr 26, 2017 at 3:45 PM, Boris Ostrovsky
>> <[email protected]> wrote:
>>>
>>> On 04/26/2017 04:52 PM, Andy Lutomirski wrote:
>>>>
>>>> I was trying to understand xen_drop_mm_ref() to update it for some
>>>> changes I'm working on, and I'm wondering whether we need
>>>> xen_exit_mmap() at all.
>>>>
>>>> AFAICS the intent is to force all CPUs to drop their lazy uses of the
>>>> mm being destroyed so it can be unpinned before tearing down the page
>>>> tables, thus making it faster to tear down the page tables. This
>>>> seems like it'll speed up xen_set_pud() and xen_set_pmd(), but this
>>>> seems like it may be of rather limited value.
>>>
>>>
>>> Why do you think it's of limited value? Without it we will end up with a
>>> hypercall for each update.
>>>
>>> Or is your point that the number of those update is relatively small
>>> when we are tearing down?
>>
>>
>> The latter. Also, unless I'm missing something, xen_set_pte() doesn't
>> have the optimization. I haven't looked at exactly how page table
>> teardown works, but if it clears each PTE individually, then that's
>> the bulk of the work.
>>
>>>
>>>
>>>> Could we get away with
>>>> deleting it?
>>>>
>>>> Also, this code in drop_other_mm_ref() looks dubious to me:
>>>>
>>>> /* If this cpu still has a stale cr3 reference, then make sure
>>>> it has been flushed. */
>>>> if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd))
>>>> load_cr3(swapper_pg_dir);
>>>>
>>>> If cr3 hasn't been flushed to the hypervisor because we're in a lazy
>>>> mode, why would load_cr3() help? Shouldn't this be xen_mc_flush()
>>>> instead?
>>>
>>>
>>> load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3()
>>> -> xen_mc_issue().
>>
>>
>> xen_mc_issue() does:
>>
>> if ((paravirt_get_lazy_mode() & mode) == 0)
>> xen_mc_flush();
>>
>> I assume the load_cr3() is intended to deal with the case where we're
>> in lazy mode, but we'll still be in lazy mode, right? Or does it
>> serve some other purpose?
>
>
> Of course. I can't read (I ignored the "== 0" part).
>
> Apparently the early version had an explicit flush but then it disappeared
> (commit 9f79991d4186089e228274196413572cc000143b).
>
> The point of CR3 loading here, I believe, is to make sure the hypervisor
> knows that the (v)CPU is no longer using the the mm's cr3 (we are loading
> swapper_pgdir here).

But that's what leave_mm() does. To be fair, the x86 lazy TLB
management is a big mess, and this came up because I'm trying to clean
it up without removing it.

I suppose I can try to keep xen_exit_mmap() working. Is there a
simple way to try to unpin but to not treat it as an error if the
hypervisor rejects it?

2017-04-27 06:27:52

by Ingo Molnar

[permalink] [raw]

Subject: Re: xen_exit_mmap() questions

* Boris Ostrovsky <[email protected]> wrote:

> > xen_mc_issue() does:
> >
> > if ((paravirt_get_lazy_mode() & mode) == 0)
> > xen_mc_flush();
> >
> > I assume the load_cr3() is intended to deal with the case where we're
> > in lazy mode, but we'll still be in lazy mode, right? Or does it
> > serve some other purpose?
>
> Of course. I can't read (I ignored the "== 0" part).

Ha, ob'sidenote: the preferred form to write this is:

if (!(paravirt_get_lazy_mode() & mode))
xen_mc_flush();

... exactly due to the readability problem you ran into: a 'pre' negation is much
easier to read, plus '==' tends to trigger 'equal to' attributes in the brain,
which is the opposite of negation. So it's very easy to mis-read such syntactic
constructs even if they are technically correct.

I think '== 0' should be forbidden in all cases where the purpose is a logic
operation and should be used strictly only in cases where we do explicit integer
arithmetics.

(Bools and '== false' are suboptimal for similar reasons.)

... but I digress!

Ingo

2017-04-27 07:40:33

by Jan Beulich

[permalink] [raw]

Subject: Re: [Xen-devel] xen_exit_mmap() questions

>>> On 27.04.17 at 02:55, <[email protected]> wrote:
> The point of CR3 loading here, I believe, is to make sure the hypervisor
> knows that the (v)CPU is no longer using the the mm's cr3 (we are
> loading swapper_pgdir here).

Correct, or else there would still be a non-zero refcount for the
page tables hanging off of that CR3, disallowing those pages to
become writable, and in turn disallowing the use of direct writes
instead of hypercalls to clear the individual entries (or to be
precise, direct writes would still be allowed, but their use would
be even slower than using hypercalls because they would trap
into the hypervisor for emulation).

Jan

2017-04-27 13:22:07

by Boris Ostrovsky

[permalink] [raw]

Subject: Re: xen_exit_mmap() questions

>>>>
>>>>> Also, this code in drop_other_mm_ref() looks dubious to me:
>>>>>
>>>>> /* If this cpu still has a stale cr3 reference, then make sure
>>>>> it has been flushed. */
>>>>> if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd))
>>>>> load_cr3(swapper_pg_dir);
>>>>>
>>>>> If cr3 hasn't been flushed to the hypervisor because we're in a lazy
>>>>> mode, why would load_cr3() help? Shouldn't this be xen_mc_flush()
>>>>> instead?
>>>>
>>>> load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3()
>>>> -> xen_mc_issue().
>>>
>>> xen_mc_issue() does:
>>>
>>> if ((paravirt_get_lazy_mode() & mode) == 0)
>>> xen_mc_flush();
>>>
>>> I assume the load_cr3() is intended to deal with the case where we're
>>> in lazy mode, but we'll still be in lazy mode, right? Or does it
>>> serve some other purpose?
>>
>> Of course. I can't read (I ignored the "== 0" part).
>>
>> Apparently the early version had an explicit flush but then it disappeared
>> (commit 9f79991d4186089e228274196413572cc000143b).
>>
>> The point of CR3 loading here, I believe, is to make sure the hypervisor
>> knows that the (v)CPU is no longer using the the mm's cr3 (we are loading
>> swapper_pgdir here).
> But that's what leave_mm() does. To be fair, the x86 lazy TLB
> management is a big mess, and this came up because I'm trying to clean
> it up without removing it.

True. I don't know though if you can guarantee that leave_mm() (or
load_cr3() inside it) is actually called if we are in lazy mode.

>
> I suppose I can try to keep xen_exit_mmap() working. Is there a
> simple way to try to unpin but to not treat it as an error if the
> hypervisor rejects it?

Even if we managed to craft a call in Linux to do this (current
xen_pgd_unpin() will result in a WARNing in xen_mc_flush()) this will
still cause a bunch of warnings in the hypervisor (if it is built as
DEBUG, but bad nevertheless).

But even without that, it is an error for a reason so how are you
planning to continue if you ignore it?

-boris

2017-04-27 16:46:46

by Andy Lutomirski

[permalink] [raw]

Subject: Re: xen_exit_mmap() questions

On Thu, Apr 27, 2017 at 6:21 AM, Boris Ostrovsky
<[email protected]> wrote:
>
>>>>>
>>>>>> Also, this code in drop_other_mm_ref() looks dubious to me:
>>>>>>
>>>>>> /* If this cpu still has a stale cr3 reference, then make sure
>>>>>> it has been flushed. */
>>>>>> if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd))
>>>>>> load_cr3(swapper_pg_dir);
>>>>>>
>>>>>> If cr3 hasn't been flushed to the hypervisor because we're in a lazy
>>>>>> mode, why would load_cr3() help? Shouldn't this be xen_mc_flush()
>>>>>> instead?
>>>>>
>>>>> load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3()
>>>>> -> xen_mc_issue().
>>>>
>>>> xen_mc_issue() does:
>>>>
>>>> if ((paravirt_get_lazy_mode() & mode) == 0)
>>>> xen_mc_flush();
>>>>
>>>> I assume the load_cr3() is intended to deal with the case where we're
>>>> in lazy mode, but we'll still be in lazy mode, right? Or does it
>>>> serve some other purpose?
>>>
>>> Of course. I can't read (I ignored the "== 0" part).
>>>
>>> Apparently the early version had an explicit flush but then it disappeared
>>> (commit 9f79991d4186089e228274196413572cc000143b).
>>>
>>> The point of CR3 loading here, I believe, is to make sure the hypervisor
>>> knows that the (v)CPU is no longer using the the mm's cr3 (we are loading
>>> swapper_pgdir here).
>> But that's what leave_mm() does. To be fair, the x86 lazy TLB
>> management is a big mess, and this came up because I'm trying to clean
>> it up without removing it.
>
> True. I don't know though if you can guarantee that leave_mm() (or
> load_cr3() inside it) is actually called if we are in lazy mode.

The code just before that makes these calls.

Anyway, I propose to rewrite the whole thing like this:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/tlbflush_cleanup&id=ff143a54bb3bafaaad6e32145a9cfbc112e8584f

>
>>
>> I suppose I can try to keep xen_exit_mmap() working. Is there a
>> simple way to try to unpin but to not treat it as an error if the
>> hypervisor rejects it?
>
> Even if we managed to craft a call in Linux to do this (current
> xen_pgd_unpin() will result in a WARNing in xen_mc_flush()) this will
> still cause a bunch of warnings in the hypervisor (if it is built as
> DEBUG, but bad nevertheless).
>
> But even without that, it is an error for a reason so how are you
> planning to continue if you ignore it?
>

I was imagining that we'd just try to unpin and carry on if it fails.
We can always unpin later in xen_pgd_free().

--Andy

2017-04-27 20:00:55

by Boris Ostrovsky

[permalink] [raw]

Subject: Re: xen_exit_mmap() questions

On 04/27/2017 12:46 PM, Andy Lutomirski wrote:
> On Thu, Apr 27, 2017 at 6:21 AM, Boris Ostrovsky
> <[email protected]> wrote:
>>>>>>> Also, this code in drop_other_mm_ref() looks dubious to me:
>>>>>>>
>>>>>>> /* If this cpu still has a stale cr3 reference, then make sure
>>>>>>> it has been flushed. */
>>>>>>> if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd))
>>>>>>> load_cr3(swapper_pg_dir);
>>>>>>>
>>>>>>> If cr3 hasn't been flushed to the hypervisor because we're in a lazy
>>>>>>> mode, why would load_cr3() help? Shouldn't this be xen_mc_flush()
>>>>>>> instead?
>>>>>> load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3()
>>>>>> -> xen_mc_issue().
>>>>> xen_mc_issue() does:
>>>>>
>>>>> if ((paravirt_get_lazy_mode() & mode) == 0)
>>>>> xen_mc_flush();
>>>>>
>>>>> I assume the load_cr3() is intended to deal with the case where we're
>>>>> in lazy mode, but we'll still be in lazy mode, right? Or does it
>>>>> serve some other purpose?
>>>> Of course. I can't read (I ignored the "== 0" part).
>>>>
>>>> Apparently the early version had an explicit flush but then it disappeared
>>>> (commit 9f79991d4186089e228274196413572cc000143b).
>>>>
>>>> The point of CR3 loading here, I believe, is to make sure the hypervisor
>>>> knows that the (v)CPU is no longer using the the mm's cr3 (we are loading
>>>> swapper_pgdir here).
>>> But that's what leave_mm() does. To be fair, the x86 lazy TLB
>>> management is a big mess, and this came up because I'm trying to clean
>>> it up without removing it.
>> True. I don't know though if you can guarantee that leave_mm() (or
>> load_cr3() inside it) is actually called if we are in lazy mode.
> The code just before that makes these calls.

Yes, and I was unsure whether we always get to make these calls, based
on mm and cpu_tlbstate. I think we do and with your changes it is made
even more clear.

>
> Anyway, I propose to rewrite the whole thing like this:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/tlbflush_cleanup&id=ff143a54bb3bafaaad6e32145a9cfbc112e8584f

Can you explain xen_pgd_free() change? When do you expect
xen_exit_mmap() to fail unpinning (compared to what we have now)?

-boris

2017-04-27 21:22:35

by Andy Lutomirski

[permalink] [raw]

Subject: Re: xen_exit_mmap() questions

On Thu, Apr 27, 2017 at 1:00 PM, Boris Ostrovsky
<[email protected]> wrote:
> On 04/27/2017 12:46 PM, Andy Lutomirski wrote:
>> On Thu, Apr 27, 2017 at 6:21 AM, Boris Ostrovsky
>> <[email protected]> wrote:
>>>>>>>> Also, this code in drop_other_mm_ref() looks dubious to me:
>>>>>>>>
>>>>>>>> /* If this cpu still has a stale cr3 reference, then make sure
>>>>>>>> it has been flushed. */
>>>>>>>> if (this_cpu_read(xen_current_cr3) == __pa(mm->pgd))
>>>>>>>> load_cr3(swapper_pg_dir);
>>>>>>>>
>>>>>>>> If cr3 hasn't been flushed to the hypervisor because we're in a lazy
>>>>>>>> mode, why would load_cr3() help? Shouldn't this be xen_mc_flush()
>>>>>>>> instead?
>>>>>>> load_cr3() actually ends with xen_mc_flush() by way of xen_write_cr3()
>>>>>>> -> xen_mc_issue().
>>>>>> xen_mc_issue() does:
>>>>>>
>>>>>> if ((paravirt_get_lazy_mode() & mode) == 0)
>>>>>> xen_mc_flush();
>>>>>>
>>>>>> I assume the load_cr3() is intended to deal with the case where we're
>>>>>> in lazy mode, but we'll still be in lazy mode, right? Or does it
>>>>>> serve some other purpose?
>>>>> Of course. I can't read (I ignored the "== 0" part).
>>>>>
>>>>> Apparently the early version had an explicit flush but then it disappeared
>>>>> (commit 9f79991d4186089e228274196413572cc000143b).
>>>>>
>>>>> The point of CR3 loading here, I believe, is to make sure the hypervisor
>>>>> knows that the (v)CPU is no longer using the the mm's cr3 (we are loading
>>>>> swapper_pgdir here).
>>>> But that's what leave_mm() does. To be fair, the x86 lazy TLB
>>>> management is a big mess, and this came up because I'm trying to clean
>>>> it up without removing it.
>>> True. I don't know though if you can guarantee that leave_mm() (or
>>> load_cr3() inside it) is actually called if we are in lazy mode.
>> The code just before that makes these calls.
>
> Yes, and I was unsure whether we always get to make these calls, based
> on mm and cpu_tlbstate. I think we do and with your changes it is made
> even more clear.

:)

>
>>
>> Anyway, I propose to rewrite the whole thing like this:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/tlbflush_cleanup&id=ff143a54bb3bafaaad6e32145a9cfbc112e8584f
>
> Can you explain xen_pgd_free() change? When do you expect
> xen_exit_mmap() to fail unpinning (compared to what we have now)?

I don't expect it to fail, but I made fairly extensive changes.

--Andy