2014-04-09 14:52:29

by Marcelo Tosatti

Subject: Re: [PATCH 5/5] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes

On Mon, Mar 10, 2014 at 10:01:49PM +0800, Xiao Guangrong wrote:
> Now we can flush all the TLBs out of the mmu lock without TLB corruption when
> write-protecting the sptes, because:
> - we have marked large sptes readonly instead of dropping them, which means we
> just change the spte from writable to readonly, so we only need to care
> about the case of changing a spte from present to present (changing the spte
> from present to nonpresent flushes all the TLBs immediately); in other words,
> the only case we need to care about is mmu_spte_update()
>
> - in mmu_spte_update(), we have checked
> SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE instead of PT_WRITABLE_MASK, which
> means it does not depend on PT_WRITABLE_MASK anymore
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/kvm/mmu.c | 25 +++++++++++++++++++++----
> arch/x86/kvm/mmu.h | 14 ++++++++++++++
> arch/x86/kvm/x86.c | 12 ++++++++++--
> 3 files changed, 45 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 17bb136..01a8c35 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -4281,15 +4281,32 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
> 			if (*rmapp)
> 				__rmap_write_protect(kvm, rmapp, false);
>
> -			if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
> -				kvm_flush_remote_tlbs(kvm);
> +			if (need_resched() || spin_needbreak(&kvm->mmu_lock))
> 				cond_resched_lock(&kvm->mmu_lock);
> -			}
> 		}
> 	}
>
> -	kvm_flush_remote_tlbs(kvm);
> 	spin_unlock(&kvm->mmu_lock);
> +
> +	/*
> +	 * kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log(),
> +	 * which do the tlb flush out of mmu-lock, should be serialized by
> +	 * kvm->slots_lock; otherwise a tlb flush could be missed.
> +	 */
> +	lockdep_assert_held(&kvm->slots_lock);
> +
> +	/*
> +	 * We can flush all the TLBs out of the mmu lock without TLB
> +	 * corruption since we just change the spte from writable to
> +	 * readonly, so we only need to care about the case of changing
> +	 * a spte from present to present (changing the spte from present
> +	 * to nonpresent flushes all the TLBs immediately); in other
> +	 * words, the only case we care about is mmu_spte_update(), where
> +	 * we have checked SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE
> +	 * instead of PT_WRITABLE_MASK, which means it does not depend
> +	 * on PT_WRITABLE_MASK anymore.
> +	 */
> +	kvm_flush_remote_tlbs(kvm);
> }
>
> #define BATCH_ZAP_PAGES 10
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 2926152..585d6b1 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -96,6 +96,20 @@ static inline int is_present_gpte(unsigned long pte)
> 	return pte & PT_PRESENT_MASK;
> }
>
> +/*
> + * Please note PT_WRITABLE_MASK is not stable, since
> + * 1) fast_page_fault() sets the spte from readonly to writable out of
> + * mmu-lock, or
> + * 2) kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log()
> + * can write-protect sptes but flush the tlb out of mmu-lock, which means we
> + * may still be using stale tlb entries that depend on this bit.
> + *
> + * Neither case modifies the status of spte_is_locklessly_modifiable(), so
> + * if you want to check whether the spte is writable on the MMU you can
> + * check SPTE_MMU_WRITEABLE instead. If you want to update the spte without
> + * losing A/D bits or missing a tlb flush, you can check
> + * spte_is_locklessly_modifiable() instead. See the comments in
> + * spte_has_volatile_bits() and mmu_spte_update().
> + */
> static inline int is_writable_pte(unsigned long pte)
> {

Xiao,

Can't get the SPTE_MMU_WRITEABLE part.

So assume you are writing code to perform some action after guest memory
has been write protected. You would

spin_lock(mmu_lock);

if (writeable spte bit is set)
        remove writeable spte bit from spte
        remote TLB flush (*)

action

spin_unlock(mmu_lock);

(*) is necessary because reading the writeable spte bit as zero
does not guarantee remote TLBs have been flushed.

Now what does SPTE_MMU_WRITEABLE have to do with it?

Perhaps a recipe like that (or just the rules) would be useful.

The remaining patches look good.


2014-04-15 11:10:49

by Xiao Guangrong

Subject: Re: [PATCH 5/5] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes


Hi Marcelo,

Thanks for taking the time to review it.

On 04/09/2014 10:51 PM, Marcelo Tosatti wrote:

>> +/*
>> + * Please note PT_WRITABLE_MASK is not stable, since
>> + * 1) fast_page_fault() sets the spte from readonly to writable out of
>> + * mmu-lock, or
>> + * 2) kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log()
>> + * can write-protect sptes but flush the tlb out of mmu-lock, which means we
>> + * may still be using stale tlb entries that depend on this bit.
>> + *
>> + * Neither case modifies the status of spte_is_locklessly_modifiable(), so
>> + * if you want to check whether the spte is writable on the MMU you can
>> + * check SPTE_MMU_WRITEABLE instead. If you want to update the spte without
>> + * losing A/D bits or missing a tlb flush, you can check
>> + * spte_is_locklessly_modifiable() instead. See the comments in
>> + * spte_has_volatile_bits() and mmu_spte_update().
>> + */
>> static inline int is_writable_pte(unsigned long pte)
>> {
>
> Xiao,
>
> Can't get the SPTE_MMU_WRITEABLE part.
>
> So assume you are writing code to perform some action after guest memory
> has been write protected. You would
>
> spin_lock(mmu_lock);
>
> if (writeable spte bit is set)
>         remove writeable spte bit from spte
>         remote TLB flush (*)
>
> action
>
> spin_unlock(mmu_lock);
>
> (*) is necessary because reading the writeable spte bit as zero
> does not guarantee remote TLBs have been flushed.
>
> Now what does SPTE_MMU_WRITEABLE have to do with it?

It is contained in "remove writeable spte bit from spte", which
is done by mmu_spte_update():

	if (spte_is_locklessly_modifiable(old_spte) &&
	    !is_writable_pte(new_spte))
		ret = true;
>
> Perhaps a recipe like that (or just the rules) would be useful.

Okay, I am considering improving this comment. How about something
like this:

Currently, we have two sorts of write-protection: a) the first
write-protects guest pages to sync guest modifications, b) the
other is used to sync the dirty bitmap when we do
KVM_GET_DIRTY_LOG. The differences between the two are:
1) the first case clears the SPTE_MMU_WRITEABLE bit.
2) the first case requires flushing the tlb immediately to avoid
corrupting the shadow page table seen by the vcpus, so it must
be done under the protection of mmu-lock. The other case does
not need to flush the tlb until the dirty bitmap is returned to
userspace, since it only write-protects the pages logged in the
bitmap; no page in the dirty bitmap is missed, so it can flush
the tlb out of mmu-lock.

So there is a problem: the first case can meet stale tlb entries
left by the other case, which write-protects pages but does not
flush the tlb immediately. To make the first case aware of this
problem, we let it flush the tlb whenever it write-protects a
spte whose SPTE_MMU_WRITEABLE bit is set; this works because the
other case never touches the SPTE_MMU_WRITEABLE bit.

Anyway, whenever a spte is updated (only permission and status bits
are changed) we need to check whether a spte with SPTE_MMU_WRITEABLE
becomes readonly; if that happens, we need to flush the tlb.
Fortunately, mmu_spte_update() already handles this.

The rules for using SPTE_MMU_WRITEABLE and PT_WRITABLE_MASK:
- if we want to know whether a writable tlb entry may exist, or whether
the spte can become writable on the mmu mapping, check SPTE_MMU_WRITEABLE;
this is the common case. Otherwise,
- if we fix a page fault on the spte, or do write-protection for dirty
logging, check PT_WRITABLE_MASK.

TODO: introduce APIs to split these two cases.

>
> The remaining patches look good.

Thank you.