2011-06-07 12:56:09

by Xiao Guangrong

Subject: [PATCH 0/15] KVM: optimize MMIO handling

The idea of this patchset is from Avi:
| We could cache the result of a miss in an spte by using a reserved bit, and
| checking the page fault error code (or seeing if we get an ept violation or
| ept misconfiguration), so if we get repeated mmio on a page, we don't need to
| search the slot list/tree.
| (https://lkml.org/lkml/2011/2/22/221)

The aim of this patchset is to support fast mmio emulation. It avoids searching
for the mmio gfn in the memslots, which is expensive since we have to walk all
slots, and it also lets us skip the guest page table walk for soft mmu.

A lockless shadow page table walk is introduced in this patchset; it is a
lightweight way to check whether a page fault is a real mmio page fault or
whether something unexpected is going on.

If shadow_notrap_nonpresent_pte is enabled (bypass_guest_pf=1), mmio page faults
and normal page faults are mixed (the reserved bits are set for every page
fault), which causes a small regression. A box that generates lots of mmio
accesses, for example a network server, can disable shadow_notrap_nonpresent_pte
and enable mmio pf instead; mmio pf can be enabled/disabled at runtime.
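
The basic encoding idea, as a rough sketch (the mask value and helper names
below are placeholders for illustration only; the real mask and functions are
introduced in patch 14): an spte for an mmio gfn is stored as an entry with the
high reserved bits set and the gfn plus access bits packed into the remaining
bits, so a refault that reports reserved bits (or an ept misconfiguration) can
be decoded without searching the memslots:

    /* placeholder names -- see patch 14 for the real implementation */
    #define MMIO_SPTE_MASK  (0xffull << 49 | 1ull)  /* reserved bits + low tag */

    static u64 make_mmio_spte(gfn_t gfn, unsigned access)
    {
            /* not a valid translation, but carries gfn + access for us */
            return MMIO_SPTE_MASK | access | ((u64)gfn << PAGE_SHIFT);
    }

    static bool spte_is_mmio(u64 spte)
    {
            return (spte & MMIO_SPTE_MASK) == MMIO_SPTE_MASK;
    }

    static gfn_t mmio_spte_gfn(u64 spte)
    {
            return (spte & ~MMIO_SPTE_MASK) >> PAGE_SHIFT;
    }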

The performance test result:

Netperf (TCP_RR):
===========================
ept is enabled:

         Before      After
1st      709.58      734.60
2nd      715.40      723.75
3rd      713.45      724.22

ept=0 bypass_guest_pf=0:

         Before      After
1st      706.10      709.63
2nd      709.38      715.80
3rd      695.90      710.70

Kernbench (output not redirected to /dev/null)
==========================
ept is enabled:

         Before         After
1st      2m34.749s      2m33.482s
2nd      2m34.651s      2m33.161s
3rd      2m34.543s      2m34.271s

ept=0 bypass_guest_pf=0:

         Before         After
1st      4m43.467s      4m41.873s
2nd      4m45.225s      4m41.668s
3rd      4m47.029s      4m40.128s


2011-06-07 12:56:40

by Xiao Guangrong

Subject: [PATCH 01/15] KVM: MMU: fix walking shadow page table

Properly check for the last mapping, and do not walk to the next level if the
last spte has been reached

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 9 +++++----
1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2d14434..cda666a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1517,10 +1517,6 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
if (iterator->level < PT_PAGE_TABLE_LEVEL)
return false;

- if (iterator->level == PT_PAGE_TABLE_LEVEL)
- if (is_large_pte(*iterator->sptep))
- return false;
-
iterator->index = SHADOW_PT_INDEX(iterator->addr, iterator->level);
iterator->sptep = ((u64 *)__va(iterator->shadow_addr)) + iterator->index;
return true;
@@ -1528,6 +1524,11 @@ static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)

static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
{
+ if (is_last_spte(*iterator->sptep, iterator->level)) {
+ iterator->level = 0;
+ return;
+ }
+
iterator->shadow_addr = *iterator->sptep & PT64_BASE_ADDR_MASK;
--iterator->level;
}
--
1.7.4.4

2011-06-07 12:57:26

by Xiao Guangrong

Subject: [PATCH 02/15] KVM: MMU: do not update slot bitmap if spte is nonpresent

Set slot bitmap only if the spte is present

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 15 +++++++--------
1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cda666a..125f78d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -743,9 +743,6 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
struct kvm_mmu_page *sp;
unsigned long *rmapp;

- if (!is_rmap_spte(*spte))
- return 0;
-
sp = page_header(__pa(spte));
kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
@@ -2078,11 +2075,13 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
if (!was_rmapped && is_large_pte(*sptep))
++vcpu->kvm->stat.lpages;

- page_header_update_slot(vcpu->kvm, sptep, gfn);
- if (!was_rmapped) {
- rmap_count = rmap_add(vcpu, sptep, gfn);
- if (rmap_count > RMAP_RECYCLE_THRESHOLD)
- rmap_recycle(vcpu, sptep, gfn);
+ if (is_shadow_present_pte(*sptep)) {
+ page_header_update_slot(vcpu->kvm, sptep, gfn);
+ if (!was_rmapped) {
+ rmap_count = rmap_add(vcpu, sptep, gfn);
+ if (rmap_count > RMAP_RECYCLE_THRESHOLD)
+ rmap_recycle(vcpu, sptep, gfn);
+ }
}
kvm_release_pfn_clean(pfn);
if (speculative) {
--
1.7.4.4

2011-06-07 12:57:56

by Xiao Guangrong

Subject: [PATCH 03/15] KVM: x86: avoid unnecessary guest page table walking

We already have the guest physical address, so use it to read the guest data
directly and avoid walking the guest page table again

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/x86.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 694538a..8be9ff6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3930,8 +3930,7 @@ static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt,
if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
goto mmio;

- if (kvm_read_guest_virt(ctxt, addr, val, bytes, exception)
- == X86EMUL_CONTINUE)
+ if (!kvm_read_guest(vcpu->kvm, gpa, val, bytes))
return X86EMUL_CONTINUE;

mmio:
--
1.7.4.4

2011-06-07 12:58:27

by Xiao Guangrong

Subject: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path

If the page fault is caused by mmio, we can cache the mmio info; later, while
emulating the mmio instruction, we do not need to walk the guest page table
again and can quickly recognize that it is an mmio fault
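
Roughly, the fast path wired up here boils down to the sketch below (condensed
from the vcpu_gva_to_gpa() and x86.h hunks in this patch; error handling and
the APIC special case are omitted):

    /* on an mmio page fault, remember where it happened */
    vcpu->arch.mmio_gva = gva & PAGE_MASK;
    vcpu->arch.mmio_gfn = gfn;
    vcpu->arch.access   = access;

    /* later, in the emulator's gva->gpa translation */
    if (vcpu_match_mmio_gva(vcpu, gva) &&
        check_write_user_access(vcpu, write, user_fault, vcpu->arch.access)) {
            /* cache hit: rebuild the gpa without walking the guest page table */
            *gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT | (gva & (PAGE_SIZE - 1));
            return 1;       /* handle as mmio */
    }
    /* otherwise fall back to walk_mmu->gva_to_gpa() */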

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 5 +++
arch/x86/kvm/mmu.c | 21 +++++----------
arch/x86/kvm/mmu.h | 23 +++++++++++++++++
arch/x86/kvm/paging_tmpl.h | 21 ++++++++++-----
arch/x86/kvm/x86.c | 52 ++++++++++++++++++++++++++++++--------
arch/x86/kvm/x86.h | 36 +++++++++++++++++++++++++++
6 files changed, 126 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d167039..326af42 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -414,6 +414,11 @@ struct kvm_vcpu_arch {
u64 mcg_ctl;
u64 *mce_banks;

+ /* Cache MMIO info */
+ u64 mmio_gva;
+ unsigned access;
+ gfn_t mmio_gfn;
+
/* used for guest single stepping over the given code position */
unsigned long singlestep_rip;

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 125f78d..415030e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -217,11 +217,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
}
EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);

-static bool is_write_protection(struct kvm_vcpu *vcpu)
-{
- return kvm_read_cr0_bits(vcpu, X86_CR0_WP);
-}
-
static int is_cpuid_PSE36(void)
{
return 1;
@@ -243,11 +238,6 @@ static int is_large_pte(u64 pte)
return pte & PT_PAGE_SIZE_MASK;
}

-static int is_writable_pte(unsigned long pte)
-{
- return pte & PT_WRITABLE_MASK;
-}
-
static int is_dirty_gpte(unsigned long pte)
{
return pte & PT_DIRTY_MASK;
@@ -2238,15 +2228,17 @@ static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *
send_sig_info(SIGBUS, &info, tsk);
}

-static int kvm_handle_bad_page(struct kvm *kvm, gfn_t gfn, pfn_t pfn)
+static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gva_t gva,
+ unsigned access, gfn_t gfn, pfn_t pfn)
{
kvm_release_pfn_clean(pfn);
if (is_hwpoison_pfn(pfn)) {
- kvm_send_hwpoison_signal(gfn_to_hva(kvm, gfn), current);
+ kvm_send_hwpoison_signal(gfn_to_hva(vcpu->kvm, gfn), current);
return 0;
} else if (is_fault_pfn(pfn))
return -EFAULT;

+ vcpu_cache_mmio_info(vcpu, gva, gfn, access);
return 1;
}

@@ -2328,7 +2320,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn,

/* mmio */
if (is_error_pfn(pfn))
- return kvm_handle_bad_page(vcpu->kvm, gfn, pfn);
+ return kvm_handle_bad_page(vcpu, v, ACC_ALL, gfn, pfn);

spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu, mmu_seq))
@@ -2555,6 +2547,7 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu)
if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
return;

+ vcpu_clear_mmio_info(vcpu, ~0ull);
trace_kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);
if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
hpa_t root = vcpu->arch.mmu.root_hpa;
@@ -2701,7 +2694,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,

/* mmio */
if (is_error_pfn(pfn))
- return kvm_handle_bad_page(vcpu->kvm, gfn, pfn);
+ return kvm_handle_bad_page(vcpu, 0, 0, gfn, pfn);
spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 7086ca8..05310b1 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -76,4 +76,27 @@ static inline int is_present_gpte(unsigned long pte)
return pte & PT_PRESENT_MASK;
}

+static inline int is_writable_pte(unsigned long pte)
+{
+ return pte & PT_WRITABLE_MASK;
+}
+
+static inline bool is_write_protection(struct kvm_vcpu *vcpu)
+{
+ return kvm_read_cr0_bits(vcpu, X86_CR0_WP);
+}
+
+static inline bool check_write_user_access(struct kvm_vcpu *vcpu,
+ bool write_fault, bool user_fault,
+ unsigned long pte)
+{
+ if (unlikely(write_fault && !is_writable_pte(pte)
+ && (user_fault || is_write_protection(vcpu))))
+ return false;
+
+ if (unlikely(user_fault && !(pte & PT_USER_MASK)))
+ return false;
+
+ return true;
+}
#endif
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 6c4dc01..b0c8184 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -201,11 +201,8 @@ walk:
break;
}

- if (unlikely(write_fault && !is_writable_pte(pte)
- && (user_fault || is_write_protection(vcpu))))
- eperm = true;
-
- if (unlikely(user_fault && !(pte & PT_USER_MASK)))
+ if (!check_write_user_access(vcpu, write_fault, user_fault,
+ pte))
eperm = true;

#if PTTYPE == 64
@@ -624,8 +621,16 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
return 0;

/* mmio */
- if (is_error_pfn(pfn))
- return kvm_handle_bad_page(vcpu->kvm, walker.gfn, pfn);
+ if (is_error_pfn(pfn)) {
+ unsigned access = walker.pte_access;
+ bool dirty = is_dirty_gpte(walker.ptes[walker.level - 1]);
+
+ if (dirty)
+ access &= ~ACC_WRITE_MASK;
+
+ return kvm_handle_bad_page(vcpu, mmu_is_nested(vcpu) ? 0 :
+ addr, access, walker.gfn, pfn);
+ }

spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu, mmu_seq))
@@ -665,6 +670,8 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
u64 *sptep;
int need_flush = 0;

+ vcpu_clear_mmio_info(vcpu, gva);
+
spin_lock(&vcpu->kvm->mmu_lock);

for_each_shadow_entry(vcpu, gva, iterator) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8be9ff6..a136181 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3903,6 +3903,38 @@ out:
}
EXPORT_SYMBOL_GPL(kvm_write_guest_virt_system);

+static int vcpu_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
+ gpa_t *gpa, struct x86_exception *exception,
+ bool write)
+{
+ u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
+
+ if (vcpu_match_mmio_gva(vcpu, gva) &&
+ check_write_user_access(vcpu, write, access,
+ vcpu->arch.access)) {
+ *gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
+ (gva & (PAGE_SIZE - 1));
+ return 1;
+ }
+
+ if (write)
+ access |= PFERR_WRITE_MASK;
+
+ *gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception);
+
+ if (*gpa == UNMAPPED_GVA)
+ return -1;
+
+ /* For APIC access vmexit */
+ if ((*gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
+ return 1;
+
+ if (vcpu_match_mmio_gpa(vcpu, *gpa))
+ return 1;
+
+ return 0;
+}
+
static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt,
unsigned long addr,
void *val,
@@ -3911,7 +3943,7 @@ static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt,
{
struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
gpa_t gpa;
- int handled;
+ int handled, ret;

if (vcpu->mmio_read_completed) {
memcpy(val, vcpu->mmio_data, bytes);
@@ -3921,13 +3953,12 @@ static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt,
return X86EMUL_CONTINUE;
}

- gpa = kvm_mmu_gva_to_gpa_read(vcpu, addr, exception);
+ ret = vcpu_gva_to_gpa(vcpu, addr, &gpa, exception, false);

- if (gpa == UNMAPPED_GVA)
+ if (ret < 0)
return X86EMUL_PROPAGATE_FAULT;

- /* For APIC access vmexit */
- if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
+ if (ret)
goto mmio;

if (!kvm_read_guest(vcpu->kvm, gpa, val, bytes))
@@ -3977,16 +4008,15 @@ static int emulator_write_emulated_onepage(unsigned long addr,
struct x86_exception *exception,
struct kvm_vcpu *vcpu)
{
- gpa_t gpa;
- int handled;
+ gpa_t gpa;
+ int handled, ret;

- gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);
+ ret = vcpu_gva_to_gpa(vcpu, addr, &gpa, exception, true);

- if (gpa == UNMAPPED_GVA)
+ if (ret < 0)
return X86EMUL_PROPAGATE_FAULT;

- /* For APIC access vmexit */
- if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
+ if (ret)
goto mmio;

if (emulator_write_phys(vcpu, gpa, val, bytes))
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 256da82..d36fe23 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -75,6 +75,42 @@ static inline u32 bit(int bitno)
return 1 << (bitno & 31);
}

+static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
+ gva_t gva, gfn_t gfn, unsigned access)
+{
+ vcpu->arch.mmio_gva = gva & PAGE_MASK;
+ vcpu->arch.access = access;
+ vcpu->arch.mmio_gfn = gfn;
+}
+
+/*
+ * Clear the mmio cache info for the given gva,
+ * specially, if gva is ~0ul, we clear all mmio cache info.
+ */
+static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)
+{
+ if (gva != (~0ul) && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
+ return;
+
+ vcpu->arch.mmio_gva = 0;
+}
+
+static inline bool vcpu_match_mmio_gva(struct kvm_vcpu *vcpu, unsigned long gva)
+{
+ if (vcpu->arch.mmio_gva && vcpu->arch.mmio_gva == (gva & PAGE_MASK))
+ return true;
+
+ return false;
+}
+
+static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
+{
+ if (vcpu->arch.mmio_gfn && vcpu->arch.mmio_gfn == gpa >> PAGE_SHIFT)
+ return true;
+
+ return false;
+}
+
void kvm_before_handle_nmi(struct kvm_vcpu *vcpu);
void kvm_after_handle_nmi(struct kvm_vcpu *vcpu);
int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
--
1.7.4.4

2011-06-07 12:58:58

by Xiao Guangrong

Subject: [PATCH 05/15] KVM: MMU: optimize to handle dirty bit

If the dirty bit is not set, we can make the pte access read-only to avoid
handling the dirty bit everywhere

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 13 ++++++-------
arch/x86/kvm/paging_tmpl.h | 30 ++++++++++--------------------
2 files changed, 16 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 415030e..a10afd4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1923,7 +1923,7 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,

static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
unsigned pte_access, int user_fault,
- int write_fault, int dirty, int level,
+ int write_fault, int level,
gfn_t gfn, pfn_t pfn, bool speculative,
bool can_unsync, bool host_writable)
{
@@ -1938,8 +1938,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
spte = PT_PRESENT_MASK;
if (!speculative)
spte |= shadow_accessed_mask;
- if (!dirty)
- pte_access &= ~ACC_WRITE_MASK;
+
if (pte_access & ACC_EXEC_MASK)
spte |= shadow_x_mask;
else
@@ -2014,7 +2013,7 @@ done:

static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
unsigned pt_access, unsigned pte_access,
- int user_fault, int write_fault, int dirty,
+ int user_fault, int write_fault,
int *ptwrite, int level, gfn_t gfn,
pfn_t pfn, bool speculative,
bool host_writable)
@@ -2050,7 +2049,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
}

if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault,
- dirty, level, gfn, pfn, speculative, true,
+ level, gfn, pfn, speculative, true,
host_writable)) {
if (write_fault)
*ptwrite = 1;
@@ -2120,7 +2119,7 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,

for (i = 0; i < ret; i++, gfn++, start++)
mmu_set_spte(vcpu, start, ACC_ALL,
- access, 0, 0, 1, NULL,
+ access, 0, 0, NULL,
sp->role.level, gfn,
page_to_pfn(pages[i]), true, true);

@@ -2184,7 +2183,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
unsigned pte_access = ACC_ALL;

mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, pte_access,
- 0, write, 1, &pt_write,
+ 0, write, &pt_write,
level, gfn, pfn, prefault, map_writable);
direct_pte_prefetch(vcpu, iterator.sptep);
++vcpu->stat.pf_fixed;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index b0c8184..67971da 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -106,6 +106,9 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
unsigned access;

access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
+ if (!is_dirty_gpte(gpte))
+ access &= ~ACC_WRITE_MASK;
+
#if PTTYPE == 64
if (vcpu->arch.mmu.nx)
access &= ~(gpte >> PT64_NX_SHIFT);
@@ -378,7 +381,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
* vcpu->arch.update_pte.pfn was fetched from get_user_pages(write = 1).
*/
mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0,
- is_dirty_gpte(gpte), NULL, PT_PAGE_TABLE_LEVEL,
+ NULL, PT_PAGE_TABLE_LEVEL,
gpte_to_gfn(gpte), pfn, true, true);
}

@@ -429,7 +432,6 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
unsigned pte_access;
gfn_t gfn;
pfn_t pfn;
- bool dirty;

if (spte == sptep)
continue;
@@ -444,16 +446,15 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,

pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
gfn = gpte_to_gfn(gpte);
- dirty = is_dirty_gpte(gpte);
pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
- (pte_access & ACC_WRITE_MASK) && dirty);
+ pte_access & ACC_WRITE_MASK);
if (is_error_pfn(pfn)) {
kvm_release_pfn_clean(pfn);
break;
}

mmu_set_spte(vcpu, spte, sp->role.access, pte_access, 0, 0,
- dirty, NULL, PT_PAGE_TABLE_LEVEL, gfn,
+ NULL, PT_PAGE_TABLE_LEVEL, gfn,
pfn, true, true);
}
}
@@ -469,7 +470,6 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
{
unsigned access = gw->pt_access;
struct kvm_mmu_page *sp = NULL;
- bool dirty = is_dirty_gpte(gw->ptes[gw->level - 1]);
int top_level;
unsigned direct_access;
struct kvm_shadow_walk_iterator it;
@@ -478,8 +478,6 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
return NULL;

direct_access = gw->pt_access & gw->pte_access;
- if (!dirty)
- direct_access &= ~ACC_WRITE_MASK;

top_level = vcpu->arch.mmu.root_level;
if (top_level == PT32E_ROOT_LEVEL)
@@ -538,7 +536,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
}

mmu_set_spte(vcpu, it.sptep, access, gw->pte_access & access,
- user_fault, write_fault, dirty, ptwrite, it.level,
+ user_fault, write_fault, ptwrite, it.level,
gw->gfn, pfn, prefault, map_writable);
FNAME(pte_prefetch)(vcpu, gw, it.sptep);

@@ -621,17 +619,9 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
return 0;

/* mmio */
- if (is_error_pfn(pfn)) {
- unsigned access = walker.pte_access;
- bool dirty = is_dirty_gpte(walker.ptes[walker.level - 1]);
-
- if (dirty)
- access &= ~ACC_WRITE_MASK;
-
+ if (is_error_pfn(pfn))
return kvm_handle_bad_page(vcpu, mmu_is_nested(vcpu) ? 0 :
- addr, access, walker.gfn, pfn);
- }
-
+ addr, walker.pte_access, walker.gfn, pfn);
spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
@@ -852,7 +842,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE;

set_spte(vcpu, &sp->spt[i], pte_access, 0, 0,
- is_dirty_gpte(gpte), PT_PAGE_TABLE_LEVEL, gfn,
+ PT_PAGE_TABLE_LEVEL, gfn,
spte_to_pfn(sp->spt[i]), true, false,
host_writable);
}
--
1.7.4.4

2011-06-07 12:59:30

by Xiao Guangrong

Subject: [PATCH 06/15] KVM: MMU: cleanup for FNAME(fetch)

gw->pte_access is the final access permission, since it is already combined
with gw->pt_access when we walk the guest page table:

FNAME(walk_addr_generic):
pte_access = pt_access & FNAME(gpte_access)(vcpu, pte);

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/paging_tmpl.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 67971da..95da29e 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -477,7 +477,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
if (!is_present_gpte(gw->ptes[gw->level - 1]))
return NULL;

- direct_access = gw->pt_access & gw->pte_access;
+ direct_access = gw->pte_access;

top_level = vcpu->arch.mmu.root_level;
if (top_level == PT32E_ROOT_LEVEL)
@@ -535,7 +535,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
link_shadow_page(it.sptep, sp);
}

- mmu_set_spte(vcpu, it.sptep, access, gw->pte_access & access,
+ mmu_set_spte(vcpu, it.sptep, access, gw->pte_access,
user_fault, write_fault, ptwrite, it.level,
gw->gfn, pfn, prefault, map_writable);
FNAME(pte_prefetch)(vcpu, gw, it.sptep);
--
1.7.4.4

2011-06-07 13:00:09

by Xiao Guangrong

Subject: [PATCH 07/15] KVM: MMU: rename 'pt_write' to 'emulate'

If 'pt_write' is true, we need to emulate the fault. In a later patch we will
need to emulate the fault even when it is not a pt_write event, so rename the
variable to better fit its meaning

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 10 +++++-----
arch/x86/kvm/paging_tmpl.h | 16 ++++++++--------
2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a10afd4..05e604d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2014,7 +2014,7 @@ done:
static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
unsigned pt_access, unsigned pte_access,
int user_fault, int write_fault,
- int *ptwrite, int level, gfn_t gfn,
+ int *emulate, int level, gfn_t gfn,
pfn_t pfn, bool speculative,
bool host_writable)
{
@@ -2052,7 +2052,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
level, gfn, pfn, speculative, true,
host_writable)) {
if (write_fault)
- *ptwrite = 1;
+ *emulate = 1;
kvm_mmu_flush_tlb(vcpu);
}

@@ -2175,7 +2175,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
{
struct kvm_shadow_walk_iterator iterator;
struct kvm_mmu_page *sp;
- int pt_write = 0;
+ int emulate = 0;
gfn_t pseudo_gfn;

for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
@@ -2183,7 +2183,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
unsigned pte_access = ACC_ALL;

mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, pte_access,
- 0, write, &pt_write,
+ 0, write, &emulate,
level, gfn, pfn, prefault, map_writable);
direct_pte_prefetch(vcpu, iterator.sptep);
++vcpu->stat.pf_fixed;
@@ -2211,7 +2211,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
| shadow_accessed_mask);
}
}
- return pt_write;
+ return emulate;
}

static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *tsk)
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 95da29e..8353b69 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -465,7 +465,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
struct guest_walker *gw,
int user_fault, int write_fault, int hlevel,
- int *ptwrite, pfn_t pfn, bool map_writable,
+ int *emulate, pfn_t pfn, bool map_writable,
bool prefault)
{
unsigned access = gw->pt_access;
@@ -536,7 +536,7 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
}

mmu_set_spte(vcpu, it.sptep, access, gw->pte_access,
- user_fault, write_fault, ptwrite, it.level,
+ user_fault, write_fault, emulate, it.level,
gw->gfn, pfn, prefault, map_writable);
FNAME(pte_prefetch)(vcpu, gw, it.sptep);

@@ -570,7 +570,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
int user_fault = error_code & PFERR_USER_MASK;
struct guest_walker walker;
u64 *sptep;
- int write_pt = 0;
+ int emulate = 0;
int r;
pfn_t pfn;
int level = PT_PAGE_TABLE_LEVEL;
@@ -631,19 +631,19 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
if (!force_pt_level)
transparent_hugepage_adjust(vcpu, &walker.gfn, &pfn, &level);
sptep = FNAME(fetch)(vcpu, addr, &walker, user_fault, write_fault,
- level, &write_pt, pfn, map_writable, prefault);
+ level, &emulate, pfn, map_writable, prefault);
(void)sptep;
- pgprintk("%s: shadow pte %p %llx ptwrite %d\n", __func__,
- sptep, *sptep, write_pt);
+ pgprintk("%s: shadow pte %p %llx emulate %d\n", __func__,
+ sptep, *sptep, emulate);

- if (!write_pt)
+ if (!emulate)
vcpu->arch.last_pt_write_count = 0; /* reset fork detector */

++vcpu->stat.pf_fixed;
trace_kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
spin_unlock(&vcpu->kvm->mmu_lock);

- return write_pt;
+ return emulate;

out_unlock:
spin_unlock(&vcpu->kvm->mmu_lock);
--
1.7.4.4

2011-06-07 13:00:58

by Xiao Guangrong

Subject: [PATCH 08/15] KVM: MMU: count used shadow pages on preparing path

Move the accounting of used shadow pages from the commit path to the prepare
path in order to reduce tlb flushes on some paths

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 05e604d..43e7ca1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1039,7 +1039,7 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
percpu_counter_add(&kvm_total_used_mmu_pages, nr);
}

-static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
{
ASSERT(is_empty_shadow_page(sp->spt));
hlist_del(&sp->hash_link);
@@ -1048,7 +1048,6 @@ static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
if (!sp->role.direct)
free_page((unsigned long)sp->gfns);
kmem_cache_free(mmu_page_header_cache, sp);
- kvm_mod_used_mmu_pages(kvm, -1);
}

static unsigned kvm_page_table_hashfn(gfn_t gfn)
@@ -1655,6 +1654,7 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
/* Count self */
ret++;
list_move(&sp->link, invalid_list);
+ kvm_mod_used_mmu_pages(kvm, -1);
} else {
list_move(&sp->link, &kvm->arch.active_mmu_pages);
kvm_reload_remote_mmus(kvm);
@@ -1678,7 +1678,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
do {
sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
WARN_ON(!sp->role.invalid || sp->root_count);
- kvm_mmu_free_page(kvm, sp);
+ kvm_mmu_free_page(sp);
} while (!list_empty(invalid_list));

}
@@ -1704,8 +1704,8 @@ void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int goal_nr_mmu_pages)
page = container_of(kvm->arch.active_mmu_pages.prev,
struct kvm_mmu_page, link);
kvm_mmu_prepare_zap_page(kvm, page, &invalid_list);
- kvm_mmu_commit_zap_page(kvm, &invalid_list);
}
+ kvm_mmu_commit_zap_page(kvm, &invalid_list);
goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages;
}

@@ -3290,9 +3290,9 @@ void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
sp = container_of(vcpu->kvm->arch.active_mmu_pages.prev,
struct kvm_mmu_page, link);
kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
- kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
++vcpu->kvm->stat.mmu_recycled;
}
+ kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
}

int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
--
1.7.4.4

2011-06-07 13:01:38

by Xiao Guangrong

Subject: [PATCH 09/15] KVM: MMU: split kvm_mmu_free_page

Split kvm_mmu_free_page into kvm_mmu_free_lock_parts and
kvm_mmu_free_unlock_parts

The former frees the parts that must be freed under the mmu lock, the latter
frees the parts that can be freed outside of the mmu lock

This is used by a later patch

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 16 +++++++++++++---
1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 43e7ca1..9f3a746 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1039,17 +1039,27 @@ static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
percpu_counter_add(&kvm_total_used_mmu_pages, nr);
}

-static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
+static void kvm_mmu_free_lock_parts(struct kvm_mmu_page *sp)
{
ASSERT(is_empty_shadow_page(sp->spt));
hlist_del(&sp->hash_link);
- list_del(&sp->link);
- free_page((unsigned long)sp->spt);
if (!sp->role.direct)
free_page((unsigned long)sp->gfns);
+}
+
+static void kvm_mmu_free_unlock_parts(struct kvm_mmu_page *sp)
+{
+ list_del(&sp->link);
+ free_page((unsigned long)sp->spt);
kmem_cache_free(mmu_page_header_cache, sp);
}

+static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
+{
+ kvm_mmu_free_lock_parts(sp);
+ kvm_mmu_free_unlock_parts(sp);
+}
+
static unsigned kvm_page_table_hashfn(gfn_t gfn)
{
return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1);
--
1.7.4.4

2011-06-07 13:02:30

by Xiao Guangrong

Subject: [PATCH 10/15] KVM: MMU: lockless walking shadow page table

Use rcu to protect shadow pages that are about to be freed, so that we can walk
the shadow page table safely without holding the mmu lock. The walk should run
fast, and it is needed by the mmio page fault path
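
Roughly, the synchronization between the lockless walker and the zap path looks
like this (condensed from the hunks below):

    /* reader side, condensed from kvm_mmu_walk_shadow_page_lockless() */
    rcu_read_lock();
    atomic_inc(&vcpu->kvm->arch.reader_counter);
    smp_mb__after_atomic_inc();     /* count ourselves before touching sptes */

    for_each_shadow_entry(vcpu, addr, iterator) {
            /* read sptes; shadow pages cannot be fully freed under us */
            sptes[iterator.level - 1] = *iterator.sptep;
    }

    smp_mb__before_atomic_dec();
    atomic_dec(&vcpu->kvm->arch.reader_counter);
    rcu_read_unlock();

    /* zap side, condensed from kvm_mmu_commit_zap_page() */
    if (atomic_read(&kvm->arch.reader_counter)) {
            /*
             * a lockless walker may still be looking at these pages: free
             * only the hash/gfns parts now and defer the spt page and the
             * page header to an rcu callback instead of freeing them here
             */
            free_mmu_pages_unlock_parts(invalid_list);
            call_rcu(&sp->rcu, free_invalid_pages_rcu);
    }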

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 4 ++
arch/x86/kvm/mmu.c | 79 ++++++++++++++++++++++++++++++---------
arch/x86/kvm/mmu.h | 4 +-
arch/x86/kvm/vmx.c | 2 +-
4 files changed, 69 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 326af42..260582b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -232,6 +232,8 @@ struct kvm_mmu_page {
unsigned int unsync_children;
unsigned long parent_ptes; /* Reverse mapping for parent_pte */
DECLARE_BITMAP(unsync_child_bitmap, 512);
+
+ struct rcu_head rcu;
};

struct kvm_pv_mmu_op_buffer {
@@ -478,6 +480,8 @@ struct kvm_arch {
u64 hv_guest_os_id;
u64 hv_hypercall;

+ atomic_t reader_counter;
+
#ifdef CONFIG_KVM_MMU_AUDIT
int audit_point;
#endif
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9f3a746..52d4682 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1675,6 +1675,30 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
return ret;
}

+static void free_mmu_pages_unlock_parts(struct list_head *invalid_list)
+{
+ struct kvm_mmu_page *sp;
+
+ list_for_each_entry(sp, invalid_list, link)
+ kvm_mmu_free_lock_parts(sp);
+}
+
+static void free_invalid_pages_rcu(struct rcu_head *head)
+{
+ struct kvm_mmu_page *next, *sp;
+
+ sp = container_of(head, struct kvm_mmu_page, rcu);
+ while (sp) {
+ if (!list_empty(&sp->link))
+ next = list_first_entry(&sp->link,
+ struct kvm_mmu_page, link);
+ else
+ next = NULL;
+ kvm_mmu_free_unlock_parts(sp);
+ sp = next;
+ }
+}
+
static void kvm_mmu_commit_zap_page(struct kvm *kvm,
struct list_head *invalid_list)
{
@@ -1685,6 +1709,14 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,

kvm_flush_remote_tlbs(kvm);

+ if (atomic_read(&kvm->arch.reader_counter)) {
+ free_mmu_pages_unlock_parts(invalid_list);
+ sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
+ list_del_init(invalid_list);
+ call_rcu(&sp->rcu, free_invalid_pages_rcu);
+ return;
+ }
+
do {
sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
WARN_ON(!sp->role.invalid || sp->root_count);
@@ -2601,6 +2633,35 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access);
}

+int kvm_mmu_walk_shadow_page_lockless(struct kvm_vcpu *vcpu, u64 addr,
+ u64 sptes[4])
+{
+ struct kvm_shadow_walk_iterator iterator;
+ int nr_sptes = 0;
+
+ rcu_read_lock();
+
+ atomic_inc(&vcpu->kvm->arch.reader_counter);
+ /* Increase the counter before walking shadow page table */
+ smp_mb__after_atomic_inc();
+
+ for_each_shadow_entry(vcpu, addr, iterator) {
+ sptes[iterator.level-1] = *iterator.sptep;
+ nr_sptes++;
+ if (!is_shadow_present_pte(*iterator.sptep))
+ break;
+ }
+
+ /* Decrease the counter after walking shadow page table finished */
+ smp_mb__before_atomic_dec();
+ atomic_dec(&vcpu->kvm->arch.reader_counter);
+
+ rcu_read_unlock();
+
+ return nr_sptes;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_walk_shadow_page_lockless);
+
static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
u32 error_code, bool prefault)
{
@@ -3684,24 +3745,6 @@ out:
return r;
}

-int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4])
-{
- struct kvm_shadow_walk_iterator iterator;
- int nr_sptes = 0;
-
- spin_lock(&vcpu->kvm->mmu_lock);
- for_each_shadow_entry(vcpu, addr, iterator) {
- sptes[iterator.level-1] = *iterator.sptep;
- nr_sptes++;
- if (!is_shadow_present_pte(*iterator.sptep))
- break;
- }
- spin_unlock(&vcpu->kvm->mmu_lock);
-
- return nr_sptes;
-}
-EXPORT_SYMBOL_GPL(kvm_mmu_get_spte_hierarchy);
-
void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
{
ASSERT(vcpu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 05310b1..e7725c4 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -48,7 +48,9 @@
#define PFERR_RSVD_MASK (1U << 3)
#define PFERR_FETCH_MASK (1U << 4)

-int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
+int kvm_mmu_walk_shadow_page_lockless(struct kvm_vcpu *vcpu, u64 addr,
+ u64 sptes[4]);
+
int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);

static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b54c186..20dbf7f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4681,7 +4681,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
printk(KERN_ERR "EPT: Misconfiguration.\n");
printk(KERN_ERR "EPT: GPA: 0x%llx\n", gpa);

- nr_sptes = kvm_mmu_get_spte_hierarchy(vcpu, gpa, sptes);
+ nr_sptes = kvm_mmu_walk_shadow_page_lockless(vcpu, gpa, sptes);

for (i = PT64_ROOT_LEVEL; i > PT64_ROOT_LEVEL - nr_sptes; --i)
ept_misconfig_inspect_spte(vcpu, sptes[i-1], i);
--
1.7.4.4

2011-06-07 13:03:17

by Xiao Guangrong

Subject: [PATCH 11/15] KVM: MMU: filter out the mmio pfn from the fault pfn

If the page fault is caused by mmio, the gfn cannot be found in the memslots
and 'bad_pfn' is returned on the gfn_to_hva path, so we can use 'bad_pfn' to
identify an mmio page fault.

And, to clarify the meaning of the mmio pfn, return the fault page instead of
the bad page when the gfn is not allowed to be prefetched

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 4 ++--
include/linux/kvm_host.h | 5 +++++
virt/kvm/kvm_main.c | 16 ++++++++++++++--
3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 52d4682..7286d2a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2133,8 +2133,8 @@ static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,

slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, no_dirty_log);
if (!slot) {
- get_page(bad_page);
- return page_to_pfn(bad_page);
+ get_page(fault_page);
+ return page_to_pfn(fault_page);
}

hva = gfn_to_hva_memslot(slot, gfn);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b9c3299..16d6d3f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -326,12 +326,17 @@ static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
static inline int is_error_hpa(hpa_t hpa) { return hpa >> HPA_MSB; }

extern struct page *bad_page;
+extern struct page *fault_page;
+
extern pfn_t bad_pfn;
+extern pfn_t fault_pfn;

int is_error_page(struct page *page);
int is_error_pfn(pfn_t pfn);
int is_hwpoison_pfn(pfn_t pfn);
int is_fault_pfn(pfn_t pfn);
+int is_mmio_pfn(pfn_t pfn);
+int is_invalid_pfn(pfn_t pfn);
int kvm_is_error_hva(unsigned long addr);
int kvm_set_memory_region(struct kvm *kvm,
struct kvm_userspace_memory_region *mem,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f78ddb8..93a1ce1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -97,8 +97,8 @@ static bool largepages_enabled = true;
static struct page *hwpoison_page;
static pfn_t hwpoison_pfn;

-static struct page *fault_page;
-static pfn_t fault_pfn;
+struct page *fault_page;
+pfn_t fault_pfn;

inline int kvm_is_mmio_pfn(pfn_t pfn)
{
@@ -926,6 +926,18 @@ int is_fault_pfn(pfn_t pfn)
}
EXPORT_SYMBOL_GPL(is_fault_pfn);

+int is_mmio_pfn(pfn_t pfn)
+{
+ return pfn == bad_pfn;
+}
+EXPORT_SYMBOL_GPL(is_mmio_pfn);
+
+int is_invalid_pfn(pfn_t pfn)
+{
+ return pfn == hwpoison_pfn || pfn == fault_pfn;
+}
+EXPORT_SYMBOL_GPL(is_invalid_pfn);
+
static inline unsigned long bad_hva(void)
{
return PAGE_OFFSET;
--
1.7.4.4

2011-06-07 13:03:51

by Xiao Guangrong

Subject: [PATCH 12/15] KVM: MMU: abstract some functions to handle fault pfn

Introduce handle_abnormal_pfn to handle a fault pfn on the page fault path, and
mmu_invalid_pfn to handle a fault pfn on the prefetch path

This is preparatory work for mmio page fault support

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 47 ++++++++++++++++++++++++++++++++-----------
arch/x86/kvm/paging_tmpl.h | 12 +++++-----
2 files changed, 41 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7286d2a..4f475ab 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2269,18 +2269,15 @@ static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *
send_sig_info(SIGBUS, &info, tsk);
}

-static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gva_t gva,
- unsigned access, gfn_t gfn, pfn_t pfn)
+static int kvm_handle_bad_page(struct kvm_vcpu *vcpu, gfn_t gfn, pfn_t pfn)
{
kvm_release_pfn_clean(pfn);
if (is_hwpoison_pfn(pfn)) {
kvm_send_hwpoison_signal(gfn_to_hva(vcpu->kvm, gfn), current);
return 0;
- } else if (is_fault_pfn(pfn))
- return -EFAULT;
+ }

- vcpu_cache_mmio_info(vcpu, gva, gfn, access);
- return 1;
+ return -EFAULT;
}

static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
@@ -2325,6 +2322,33 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
}
}

+static bool mmu_invalid_pfn(pfn_t pfn)
+{
+ return unlikely(is_invalid_pfn(pfn) || is_mmio_pfn(pfn));
+}
+
+static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
+ pfn_t pfn, unsigned access, int *ret_val)
+{
+ bool ret = true;
+
+ /* The pfn is invalid, report the error! */
+ if (unlikely(is_invalid_pfn(pfn))) {
+ *ret_val = kvm_handle_bad_page(vcpu, gfn, pfn);
+ goto exit;
+ }
+
+ if (unlikely(is_mmio_pfn(pfn))) {
+ vcpu_cache_mmio_info(vcpu, gva, gfn, ACC_ALL);
+ *ret_val = 1;
+ goto exit;
+ }
+
+ ret = false;
+exit:
+ return ret;
+}
+
static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
gva_t gva, pfn_t *pfn, bool write, bool *writable);

@@ -2359,9 +2383,8 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn,
if (try_async_pf(vcpu, prefault, gfn, v, &pfn, write, &map_writable))
return 0;

- /* mmio */
- if (is_error_pfn(pfn))
- return kvm_handle_bad_page(vcpu, v, ACC_ALL, gfn, pfn);
+ if (handle_abnormal_pfn(vcpu, v, gfn, pfn, ACC_ALL, &r))
+ return r;

spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu, mmu_seq))
@@ -2762,9 +2785,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable))
return 0;

- /* mmio */
- if (is_error_pfn(pfn))
- return kvm_handle_bad_page(vcpu, 0, 0, gfn, pfn);
+ if (handle_abnormal_pfn(vcpu, 0, gfn, pfn, ACC_ALL, &r))
+ return r;
+
spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 8353b69..4f960b2 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -371,7 +371,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
pfn = gfn_to_pfn_atomic(vcpu->kvm, gpte_to_gfn(gpte));
- if (is_error_pfn(pfn)) {
+ if (mmu_invalid_pfn(pfn)) {
kvm_release_pfn_clean(pfn);
return;
}
@@ -448,7 +448,7 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
gfn = gpte_to_gfn(gpte);
pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
pte_access & ACC_WRITE_MASK);
- if (is_error_pfn(pfn)) {
+ if (mmu_invalid_pfn(pfn)) {
kvm_release_pfn_clean(pfn);
break;
}
@@ -618,10 +618,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
&map_writable))
return 0;

- /* mmio */
- if (is_error_pfn(pfn))
- return kvm_handle_bad_page(vcpu, mmu_is_nested(vcpu) ? 0 :
- addr, walker.pte_access, walker.gfn, pfn);
+ if (handle_abnormal_pfn(vcpu, mmu_is_nested(vcpu) ? 0 : addr,
+ walker.gfn, pfn, walker.pte_access, &r))
+ return r;
+
spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu, mmu_seq))
goto out_unlock;
--
1.7.4.4

2011-06-07 13:04:35

by Xiao Guangrong

Subject: [PATCH 13/15] KVM: VMX: modify the default value of nontrap shadow pte

Modify the default value so that the nontrap shadow pte can be distinguished
from the mmio shadow pte, which will be introduced in a later patch
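
To spell out why the old value would collide with the mmio sptes of the next
patch, a minimal sketch (MMIO_MASK here stands for the shadow_mmio_mask that
patch 14 introduces; the test is the same mask check used by is_mmio_spte()):

    #define MMIO_MASK       (0xffull << 49 | 1ull)

    static bool looks_like_mmio(u64 spte)
    {
            return (spte & MMIO_MASK) == MMIO_MASK;
    }

    /* looks_like_mmio(~0xffeull)            -> true:  old default collides    */
    /* looks_like_mmio(0xfull << 49 | 1ull)  -> false: new default is distinct */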

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/vmx.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 20dbf7f..8c3d343 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7110,7 +7110,7 @@ static int __init vmx_init(void)
kvm_disable_tdp();

if (bypass_guest_pf)
- kvm_mmu_set_nonpresent_ptes(~0xffeull, 0ull);
+ kvm_mmu_set_nonpresent_ptes(0xfull << 49 | 1ull, 0ull);

return 0;

--
1.7.4.4

2011-06-07 13:05:10

by Xiao Guangrong

Subject: [PATCH 14/15] KVM: MMU: mmio page fault support

The idea is from Avi:

| We could cache the result of a miss in an spte by using a reserved bit, and
| checking the page fault error code (or seeing if we get an ept violation or
| ept misconfiguration), so if we get repeated mmio on a page, we don't need to
| search the slot list/tree.
| (https://lkml.org/lkml/2011/2/22/221)

When a page fault is caused by mmio, we cache the info in the shadow page table
and also set the reserved bits in the spte, so if the mmio access happens again
we can quickly identify it and emulate it directly.

Searching for the mmio gfn in the memslots is heavy since we need to walk all
memslots; this feature reduces that cost, and it also avoids walking the guest
page table for soft mmu.

This feature can be enabled/disabled at runtime. If shadow_notrap_nonpresent_pte
is enabled, PFERR.RSVD is set for every page fault and we would have to walk the
shadow page table for all page faults, so this feature is disabled when
shadow_notrap_nonpresent_pte is enabled.
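
To make the encoding concrete, a small worked example (the mask matches the
shadow_mmio_mask added below; the gfn value is arbitrary). Bits 49..56 sit above
the physical address width of the hardware, so the cpu treats such an spte as
having reserved bits: with soft mmu the refault arrives with PFERR.RSVD set, and
under ept the same pattern is reported as an ept misconfiguration, which is what
the new handle_mmio_page_fault()/handle_ept_misconfig() paths key on.

    u64 mask = 0xffull << 49 | 1ull;                /* shadow_mmio_mask */
    u64 spte = mask
               | (ACC_WRITE_MASK | ACC_USER_MASK)   /* cached access bits */
               | ((u64)0xc0000 << PAGE_SHIFT);      /* example mmio gfn */

    /* decoding on the next reserved-bit fault / ept misconfig: */
    gfn_t    gfn    = (spte & ~mask) >> PAGE_SHIFT; /* -> 0xc0000 */
    unsigned access = (spte & ~mask) & ~PAGE_MASK;  /* -> write | user */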

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 149 ++++++++++++++++++++++++++++++++++++++++---
arch/x86/kvm/mmu.h | 4 +-
arch/x86/kvm/paging_tmpl.h | 32 +++++++++-
arch/x86/kvm/vmx.c | 12 +++-
4 files changed, 180 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4f475ab..227cf10 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -91,6 +91,9 @@ module_param(dbg, bool, 0644);
static int oos_shadow = 1;
module_param(oos_shadow, bool, 0644);

+static int __read_mostly mmio_pf = 1;
+module_param(mmio_pf, bool, 0644);
+
#ifndef MMU_DEBUG
#define ASSERT(x) do { } while (0)
#else
@@ -193,6 +196,44 @@ static u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
static u64 __read_mostly shadow_user_mask;
static u64 __read_mostly shadow_accessed_mask;
static u64 __read_mostly shadow_dirty_mask;
+static u64 __read_mostly shadow_mmio_mask = (0xffull << 49 | 1ULL);
+
+static void __set_spte(u64 *sptep, u64 spte)
+{
+ set_64bit(sptep, spte);
+}
+
+static void mark_mmio_spte(u64 *sptep, u64 gfn, unsigned access)
+{
+ access &= ACC_WRITE_MASK | ACC_USER_MASK;
+
+ __set_spte(sptep, shadow_mmio_mask | access | gfn << PAGE_SHIFT);
+}
+
+static bool is_mmio_spte(u64 spte)
+{
+ return (spte & shadow_mmio_mask) == shadow_mmio_mask;
+}
+
+static gfn_t get_mmio_spte_gfn(u64 spte)
+{
+ return (spte & ~shadow_mmio_mask) >> PAGE_SHIFT;
+}
+
+static unsigned get_mmio_spte_access(u64 spte)
+{
+ return (spte & ~shadow_mmio_mask) & ~PAGE_MASK;
+}
+
+static bool set_mmio_spte(u64 *sptep, gfn_t gfn, pfn_t pfn, unsigned access)
+{
+ if (unlikely(is_mmio_pfn(pfn))) {
+ mark_mmio_spte(sptep, gfn, access);
+ return true;
+ }
+
+ return false;
+}

static inline u64 rsvd_bits(int s, int e)
{
@@ -203,6 +244,8 @@ void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte)
{
shadow_trap_nonpresent_pte = trap_pte;
shadow_notrap_nonpresent_pte = notrap_pte;
+ if (trap_pte != notrap_pte)
+ mmio_pf = 0;
}
EXPORT_SYMBOL_GPL(kvm_mmu_set_nonpresent_ptes);

@@ -230,7 +273,8 @@ static int is_nx(struct kvm_vcpu *vcpu)
static int is_shadow_present_pte(u64 pte)
{
return pte != shadow_trap_nonpresent_pte
- && pte != shadow_notrap_nonpresent_pte;
+ && pte != shadow_notrap_nonpresent_pte
+ && !is_mmio_spte(pte);
}

static int is_large_pte(u64 pte)
@@ -269,11 +313,6 @@ static gfn_t pse36_gfn_delta(u32 gpte)
return (gpte & PT32_DIR_PSE36_MASK) << shift;
}

-static void __set_spte(u64 *sptep, u64 spte)
-{
- set_64bit(sptep, spte);
-}
-
static u64 __xchg_spte(u64 *sptep, u64 new_spte)
{
#ifdef CONFIG_X86_64
@@ -1972,6 +2011,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
u64 spte, entry = *sptep;
int ret = 0;

+ if (set_mmio_spte(sptep, gfn, pfn, pte_access))
+ return 0;
+
/*
* We don't set the accessed bit, since we sometimes want to see
* whether the guest actually used the pte (in order to detect
@@ -2098,6 +2140,9 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
kvm_mmu_flush_tlb(vcpu);
}

+ if (unlikely(is_mmio_spte(*sptep) && emulate))
+ *emulate = 1;
+
pgprintk("%s: setting spte %llx\n", __func__, *sptep);
pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n",
is_large_pte(*sptep)? "2MB" : "4kB",
@@ -2324,7 +2369,10 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,

static bool mmu_invalid_pfn(pfn_t pfn)
{
- return unlikely(is_invalid_pfn(pfn) || is_mmio_pfn(pfn));
+ if (unlikely(!mmio_pf && is_mmio_pfn(pfn)))
+ return true;
+
+ return unlikely(is_invalid_pfn(pfn));
}

static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
@@ -2340,8 +2388,10 @@ static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,

if (unlikely(is_mmio_pfn(pfn))) {
vcpu_cache_mmio_info(vcpu, gva, gfn, ACC_ALL);
- *ret_val = 1;
- goto exit;
+ if (!mmio_pf) {
+ *ret_val = 1;
+ goto exit;
+ }
}

ret = false;
@@ -2656,7 +2706,7 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access);
}

-int kvm_mmu_walk_shadow_page_lockless(struct kvm_vcpu *vcpu, u64 addr,
+static int kvm_mmu_walk_shadow_page_lockless(struct kvm_vcpu *vcpu, u64 addr,
u64 sptes[4])
{
struct kvm_shadow_walk_iterator iterator;
@@ -2683,7 +2733,75 @@ int kvm_mmu_walk_shadow_page_lockless(struct kvm_vcpu *vcpu, u64 addr,

return nr_sptes;
}
-EXPORT_SYMBOL_GPL(kvm_mmu_walk_shadow_page_lockless);
+
+static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct)
+{
+ if (direct && vcpu_match_mmio_gpa(vcpu, addr))
+ return true;
+
+ if (vcpu_match_mmio_gva(vcpu, addr))
+ return true;
+
+ return false;
+}
+
+/*
+ * If it is a real mmio page fault, return 1 and emulat the instruction
+ * directly, return 0 if it needs page fault path to fix it, -1 is
+ * returned if bug is detected.
+ */
+int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr,
+ u64 sptes[4], int *nr_sptes, bool direct)
+{
+ if (quickly_check_mmio_pf(vcpu, addr, direct))
+ return 1;
+
+ sptes[0] = shadow_trap_nonpresent_pte;
+ *nr_sptes = kvm_mmu_walk_shadow_page_lockless(vcpu, addr, sptes);
+
+ if (is_mmio_spte(sptes[0])) {
+ gfn_t gfn = get_mmio_spte_gfn(sptes[0]);
+ unsigned access = get_mmio_spte_access(sptes[0]);
+
+ if (direct)
+ addr = 0;
+ vcpu_cache_mmio_info(vcpu, addr, gfn, access);
+ return 1;
+ }
+
+ /*
+ * It's ok if the gva is remapped by other cpus on shadow guest,
+ * it's a BUG if the gfn is not a mmio page.
+ */
+ if (direct && is_shadow_present_pte(sptes[0]))
+ return -1;
+
+ /*
+ * It's ok if the page table is zapped by other cpus or the page
+ * fault is caused by shadow_trap_nonpresent_pte, let the page
+ * fault path to fix it.
+ */
+ return 0;
+}
+EXPORT_SYMBOL_GPL(handle_mmio_page_fault_common);
+
+static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr,
+ u32 error_code, bool direct)
+{
+ u64 sptes[4];
+ int nr_sptes, ret;
+
+ if (!mmio_pf)
+ return 0;
+
+ if (!(error_code & PFERR_RSVD_MASK))
+ return 0;
+
+ ret = handle_mmio_page_fault_common(vcpu, addr, sptes, &nr_sptes,
+ direct);
+ WARN_ON(ret < 0);
+ return ret;
+}

static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
u32 error_code, bool prefault)
@@ -2692,6 +2810,11 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
int r;

pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);
+
+ r = handle_mmio_page_fault(vcpu, gva, error_code, true);
+ if (r)
+ return r;
+
r = mmu_topup_memory_caches(vcpu);
if (r)
return r;
@@ -2768,6 +2891,10 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
ASSERT(vcpu);
ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));

+ r = handle_mmio_page_fault(vcpu, gpa, error_code, true);
+ if (r)
+ return r;
+
r = mmu_topup_memory_caches(vcpu);
if (r)
return r;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index e7725c4..1da5ca7 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -48,8 +48,8 @@
#define PFERR_RSVD_MASK (1U << 3)
#define PFERR_FETCH_MASK (1U << 4)

-int kvm_mmu_walk_shadow_page_lockless(struct kvm_vcpu *vcpu, u64 addr,
- u64 sptes[4]);
+int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr,
+ u64 sptes[4], int *nr_sptes, bool direct);

int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 4f960b2..4287dc8 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -580,6 +580,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,

pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);

+ r = handle_mmio_page_fault(vcpu, addr, error_code, mmu_is_nested(vcpu));
+ if (r)
+ return r;
+
r = mmu_topup_memory_caches(vcpu);
if (r)
return r;
@@ -779,6 +783,28 @@ static void FNAME(prefetch_page)(struct kvm_vcpu *vcpu,
}
}

+static bool FNAME(sync_mmio_spte)(struct kvm_vcpu *vcpu,
+ struct kvm_mmu_page *sp, u64 *sptep,
+ pt_element_t gpte, int *nr_present)
+{
+ if (unlikely(is_mmio_spte(*sptep))) {
+ gfn_t gfn = gpte_to_gfn(gpte);
+ unsigned access = sp->role.access & FNAME(gpte_access)(vcpu,
+ gpte);
+
+ if (gfn != get_mmio_spte_gfn(*sptep)) {
+ __set_spte(sptep, shadow_trap_nonpresent_pte);
+ return true;
+ }
+
+ (*nr_present)++;
+ mark_mmio_spte(sptep, gfn, access);
+ return true;
+ }
+
+ return false;
+}
+
/*
* Using the cached information from sp->gfns is safe because:
* - The spte has a reference to the struct page, so the pfn for a given gfn
@@ -814,7 +840,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
gpa_t pte_gpa;
gfn_t gfn;

- if (!is_shadow_present_pte(sp->spt[i]))
+ if (sp->spt[i] == shadow_trap_nonpresent_pte)
continue;

pte_gpa = first_pte_gpa + i * sizeof(pt_element_t);
@@ -830,6 +856,10 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
continue;
}

+ if (FNAME(sync_mmio_spte)(vcpu, sp, &sp->spt[i], gpte,
+ &nr_present))
+ continue;
+
if (gfn != sp->gfns[i]) {
drop_spte(vcpu->kvm, &sp->spt[i],
shadow_trap_nonpresent_pte);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8c3d343..2478e0b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4673,16 +4673,22 @@ static void ept_misconfig_inspect_spte(struct kvm_vcpu *vcpu, u64 spte,
static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
{
u64 sptes[4];
- int nr_sptes, i;
+ int nr_sptes, i, ret;
gpa_t gpa;

gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);

+ ret = handle_mmio_page_fault_common(vcpu, gpa, sptes, &nr_sptes, true);
+ if (likely(ret == 1))
+ return x86_emulate_instruction(vcpu, gpa, 0, NULL, 0) ==
+ EMULATE_DONE;
+ if (unlikely(!ret))
+ return kvm_mmu_page_fault(vcpu, gpa, 0, NULL, 0);
+
+ /* It is the real ept misconfig */
printk(KERN_ERR "EPT: Misconfiguration.\n");
printk(KERN_ERR "EPT: GPA: 0x%llx\n", gpa);

- nr_sptes = kvm_mmu_walk_shadow_page_lockless(vcpu, gpa, sptes);
-
for (i = PT64_ROOT_LEVEL; i > PT64_ROOT_LEVEL - nr_sptes; --i)
ept_misconfig_inspect_spte(vcpu, sptes[i-1], i);

--
1.7.4.4

2011-06-07 13:05:39

by Xiao Guangrong

Subject: [PATCH 15/15] KVM: MMU: trace mmio page fault

Add tracepoints to trace mmio page faults

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 4 +++
arch/x86/kvm/mmutrace.h | 48 ++++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 5 +++-
include/trace/events/kvm.h | 24 ++++++++++++++++++++++
4 files changed, 80 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 227cf10..aff8f52 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -207,6 +207,7 @@ static void mark_mmio_spte(u64 *sptep, u64 gfn, unsigned access)
{
access &= ACC_WRITE_MASK | ACC_USER_MASK;

+ trace_mark_mmio_spte(sptep, gfn, access);
__set_spte(sptep, shadow_mmio_mask | access | gfn << PAGE_SHIFT);
}

@@ -1752,6 +1753,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
free_mmu_pages_unlock_parts(invalid_list);
sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
list_del_init(invalid_list);
+ trace_kvm_mmu_delay_free_pages(sp);
call_rcu(&sp->rcu, free_invalid_pages_rcu);
return;
}
@@ -2765,6 +2767,8 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr,

if (direct)
addr = 0;
+
+ trace_handle_mmio_page_fault(addr, gfn, access);
vcpu_cache_mmio_info(vcpu, addr, gfn, access);
return 1;
}
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index b60b4fd..eed67f3 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -196,6 +196,54 @@ DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_prepare_zap_page,
TP_ARGS(sp)
);

+DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_delay_free_pages,
+ TP_PROTO(struct kvm_mmu_page *sp),
+
+ TP_ARGS(sp)
+);
+
+TRACE_EVENT(
+ mark_mmio_spte,
+ TP_PROTO(u64 *sptep, gfn_t gfn, unsigned access),
+ TP_ARGS(sptep, gfn, access),
+
+ TP_STRUCT__entry(
+ __field(void *, sptep)
+ __field(gfn_t, gfn)
+ __field(unsigned, access)
+ ),
+
+ TP_fast_assign(
+ __entry->sptep = sptep;
+ __entry->gfn = gfn;
+ __entry->access = access;
+ ),
+
+ TP_printk("sptep:%p gfn %llx access %x", __entry->sptep, __entry->gfn,
+ __entry->access)
+);
+
+TRACE_EVENT(
+ handle_mmio_page_fault,
+ TP_PROTO(u64 addr, gfn_t gfn, unsigned access),
+ TP_ARGS(addr, gfn, access),
+
+ TP_STRUCT__entry(
+ __field(u64, addr)
+ __field(gfn_t, gfn)
+ __field(unsigned, access)
+ ),
+
+ TP_fast_assign(
+ __entry->addr = addr;
+ __entry->gfn = gfn;
+ __entry->access = access;
+ ),
+
+ TP_printk("addr:%llx gfn %llx access %x", __entry->addr, __entry->gfn,
+ __entry->access)
+);
+
TRACE_EVENT(
kvm_mmu_audit,
TP_PROTO(struct kvm_vcpu *vcpu, int audit_point),
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a136181..c75f845 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3914,6 +3914,7 @@ static int vcpu_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
vcpu->arch.access)) {
*gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
(gva & (PAGE_SIZE - 1));
+ trace_vcpu_match_mmio(gva, *gpa, write, false);
return 1;
}

@@ -3929,8 +3930,10 @@ static int vcpu_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
if ((*gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
return 1;

- if (vcpu_match_mmio_gpa(vcpu, *gpa))
+ if (vcpu_match_mmio_gpa(vcpu, *gpa)) {
+ trace_vcpu_match_mmio(gva, *gpa, write, true);
return 1;
+ }

return 0;
}
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 46e3cd8..571e972 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -306,6 +306,30 @@ TRACE_EVENT(

#endif

+TRACE_EVENT(
+ vcpu_match_mmio,
+ TP_PROTO(gva_t gva, gpa_t gpa, bool write, bool gpa_match),
+ TP_ARGS(gva, gpa, write, gpa_match),
+
+ TP_STRUCT__entry(
+ __field(gva_t, gva)
+ __field(gpa_t, gpa)
+ __field(bool, write)
+ __field(bool, gpa_match)
+ ),
+
+ TP_fast_assign(
+ __entry->gva = gva;
+ __entry->gpa = gpa;
+ __entry->write = write;
+ __entry->gpa_match = gpa_match
+ ),
+
+ TP_printk("gva %#lx gpa %#llx %s %s", __entry->gva, __entry->gpa,
+ __entry->write ? "Write" : "Read",
+ __entry->gpa_match ? "GPA" : "GVA")
+);
+
#endif /* _TRACE_KVM_MAIN_H */

/* This part must be outside protection */
--
1.7.4.4

2011-06-08 03:07:53

by Takuya Yoshikawa

[permalink] [raw]
Subject: Re: [PATCH 0/15] KVM: optimize for MMIO handled

On Tue, 07 Jun 2011 20:58:06 +0800
Xiao Guangrong <[email protected]> wrote:

> The performance test result:
>
> Netperf (TCP_RR):
> ===========================
> ept is enabled:
>
> Before After
> 1st 709.58 734.60
> 2nd 715.40 723.75
> 3rd 713.45 724.22
>
> ept=0 bypass_guest_pf=0:
>
> Before After
> 1st 706.10 709.63
> 2nd 709.38 715.80
> 3rd 695.90 710.70
>

In what condition, does TCP_RR perform so bad?

On 1Gbps network, directly connecting two Intel servers,
I got 20 times better result before.

Even when I used a KVM guest as the netperf client,
I got more than 10 times better result.

Could you tell me a bit more details of your test?


> Kernbech (do not redirect output to /dev/null)
> ==========================
> ept is enabled:
>
> Before After
> 1st 2m34.749s 2m33.482s
> 2nd 2m34.651s 2m33.161s
> 3rd 2m34.543s 2m34.271s
>
> ept=0 bypass_guest_pf=0:
>
> Before After
> 1st 4m43.467s 4m41.873s
> 2nd 4m45.225s 4m41.668s
> 3rd 4m47.029s 4m40.128s
>

2011-06-08 03:14:30

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 05/15] KVM: MMU: optimize to handle dirty bit

On 06/07/2011 09:01 PM, Xiao Guangrong wrote:
> If dirty bit is not set, we can make the pte access read-only to avoid handing
> dirty bit everywhere

> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index b0c8184..67971da 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -106,6 +106,9 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
> unsigned access;
>
> access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
> + if (!is_dirty_gpte(gpte))
> + access &= ~ACC_WRITE_MASK;
> +

Sorry, it can break something: if the gpte is not at the last level and the dirty bit
is set later. The patch below should fix it; I'll merge it into the next version.

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 4287dc8..6ceb5fd 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -101,12 +101,13 @@ static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
return (ret != orig_pte);
}

-static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
+static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte,
+ bool last)
{
unsigned access;

access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
- if (!is_dirty_gpte(gpte))
+ if (last && !is_dirty_gpte(gpte))
access &= ~ACC_WRITE_MASK;

#if PTTYPE == 64
@@ -230,8 +231,6 @@ walk:
pte |= PT_ACCESSED_MASK;
}

- pte_access = pt_access & FNAME(gpte_access)(vcpu, pte);
-
walker->ptes[walker->level - 1] = pte;

if ((walker->level == PT_PAGE_TABLE_LEVEL) ||
@@ -266,7 +265,7 @@ walk:
break;
}

- pt_access = pte_access;
+ pt_access &= FNAME(gpte_access)(vcpu, pte, false);
--walker->level;
}

@@ -290,6 +289,7 @@ walk:
walker->ptes[walker->level - 1] = pte;
}

+ pte_access = pt_access & FNAME(gpte_access)(vcpu, pte, true);
walker->pt_access = pt_access;
walker->pte_access = pte_access;
pgprintk("%s: pte %llx pte_access %x pt_access %x\n",
@@ -369,7 +369,7 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
return;

pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
- pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
+ pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte, true);
pfn = gfn_to_pfn_atomic(vcpu->kvm, gpte_to_gfn(gpte));
if (mmu_invalid_pfn(pfn)) {
kvm_release_pfn_clean(pfn);
@@ -444,7 +444,8 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
if (FNAME(prefetch_invalid_gpte)(vcpu, sp, spte, gpte))
continue;

- pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
+ pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte,
+ true);
gfn = gpte_to_gfn(gpte);
pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
pte_access & ACC_WRITE_MASK);
@@ -790,7 +791,7 @@ static bool FNAME(sync_mmio_spte)(struct kvm_vcpu *vcpu,
if (unlikely(is_mmio_spte(*sptep))) {
gfn_t gfn = gpte_to_gfn(gpte);
unsigned access = sp->role.access & FNAME(gpte_access)(vcpu,
- gpte);
+ gpte, true);

if (gfn != get_mmio_spte_gfn(*sptep)) {
__set_spte(sptep, shadow_trap_nonpresent_pte);
@@ -868,7 +869,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
}

nr_present++;
- pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
+ pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte,
+ true);
host_writable = sp->spt[i] & SPTE_HOST_WRITEABLE;

set_spte(vcpu, &sp->spt[i], pte_access, 0, 0,

2011-06-08 03:23:40

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 0/15] KVM: optimize for MMIO handled

On 06/08/2011 11:11 AM, Takuya Yoshikawa wrote:
> On Tue, 07 Jun 2011 20:58:06 +0800
> Xiao Guangrong <[email protected]> wrote:
>
>> The performance test result:
>>
>> Netperf (TCP_RR):
>> ===========================
>> ept is enabled:
>>
>> Before After
>> 1st 709.58 734.60
>> 2nd 715.40 723.75
>> 3rd 713.45 724.22
>>
>> ept=0 bypass_guest_pf=0:
>>
>> Before After
>> 1st 706.10 709.63
>> 2nd 709.38 715.80
>> 3rd 695.90 710.70
>>
>
> In what condition, does TCP_RR perform so bad?
>
> On 1Gbps network, directly connecting two Intel servers,
> I got 20 times better result before.
>
> Even when I used a KVM guest as the netperf client,
> I got more than 10 times better result.
>

Um, which case did you test? ept = 1 or ept=0 bypass_guest_pf=0 or both?

> Could you tell me a bit more details of your test?
>

Sure. The KVM guest is the client; it uses an e1000 NIC and connects to the
netperf server through a NAT network. The bandwidth of our network is 100M.

2011-06-08 03:30:11

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 0/15] KVM: optimize for MMIO handled

On 06/08/2011 11:25 AM, Xiao Guangrong wrote:
> On 06/08/2011 11:11 AM, Takuya Yoshikawa wrote:
>> On Tue, 07 Jun 2011 20:58:06 +0800
>> Xiao Guangrong <[email protected]> wrote:
>>
>>> The performance test result:
>>>
>>> Netperf (TCP_RR):
>>> ===========================
>>> ept is enabled:
>>>
>>> Before After
>>> 1st 709.58 734.60
>>> 2nd 715.40 723.75
>>> 3rd 713.45 724.22
>>>
>>> ept=0 bypass_guest_pf=0:
>>>
>>> Before After
>>> 1st 706.10 709.63
>>> 2nd 709.38 715.80
>>> 3rd 695.90 710.70
>>>
>>
>> In what condition, does TCP_RR perform so bad?
>>
>> On 1Gbps network, directly connecting two Intel servers,
>> I got 20 times better result before.
>>
>> Even when I used a KVM guest as the netperf client,
>> I got more than 10 times better result.
>>
>
> Um, which case did you test? ept = 1 or ept=0 bypass_guest_pf=0 or both?
>
>> Could you tell me a bit more details of your test?
>>
>
> Sure, KVM guest is the client, and it uses e1000 NIC, and uses NAT
> network connect to the netperf server, the bandwidth of our network
> is 100M.
>

And this is my test script:

#!/bin/sh

echo 3 > /proc/sys/vm/drop_caches
./netperf -H $HOST_NAME -p $PORT -t TCP_RR -l 60

2011-06-08 03:44:56

by Takuya Yoshikawa

[permalink] [raw]
Subject: Re: [PATCH 0/15] KVM: optimize for MMIO handled

On Wed, 08 Jun 2011 11:32:12 +0800
Xiao Guangrong <[email protected]> wrote:

> On 06/08/2011 11:25 AM, Xiao Guangrong wrote:
> > On 06/08/2011 11:11 AM, Takuya Yoshikawa wrote:
> >> On Tue, 07 Jun 2011 20:58:06 +0800
> >> Xiao Guangrong <[email protected]> wrote:
> >>
> >>> The performance test result:
> >>>
> >>> Netperf (TCP_RR):
> >>> ===========================
> >>> ept is enabled:
> >>>
> >>> Before After
> >>> 1st 709.58 734.60
> >>> 2nd 715.40 723.75
> >>> 3rd 713.45 724.22
> >>>
> >>> ept=0 bypass_guest_pf=0:
> >>>
> >>> Before After
> >>> 1st 706.10 709.63
> >>> 2nd 709.38 715.80
> >>> 3rd 695.90 710.70
> >>>
> >>
> >> In what condition, does TCP_RR perform so bad?
> >>
> >> On 1Gbps network, directly connecting two Intel servers,
> >> I got 20 times better result before.
> >>
> >> Even when I used a KVM guest as the netperf client,
> >> I got more than 10 times better result.
> >>
> >
> > Um, which case did you test? ept = 1 or ept=0 bypass_guest_pf=0 or both?
> >

ept = 1 only.

> >> Could you tell me a bit more details of your test?
> >>
> >
> > Sure, KVM guest is the client, and it uses e1000 NIC, and uses NAT
> > network connect to the netperf server, the bandwidth of our network
> > is 100M.
> >

I see the reason, thank you!

I used virtio-net and you used e1000.
You are using e1000 to see the MMIO performance change, right?

Takuya

>
> And this is my test script:
>
> #!/bin/sh
>
> echo 3 > /proc/sys/vm/drop_caches
> ./netperf -H $HOST_NAME -p $PORT -t TCP_RR -l 60
>

2011-06-08 05:14:08

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 0/15] KVM: optimize for MMIO handled

On 06/08/2011 11:47 AM, Takuya Yoshikawa wrote:

>>> Sure, KVM guest is the client, and it uses e1000 NIC, and uses NAT
>>> network connect to the netperf server, the bandwidth of our network
>>> is 100M.
>>>
>
> I see the reason, thank you!
>
> I used virtio-net and you used e1000.
> You are using e1000 to see the MMIO performance change, right?
>

Hi Takuya,

Please apply my fix patch when you test it again, thanks! :-)
(http://www.spinics.net/lists/kvm/msg56017.html)

Just now, in order to confirm the performance result, I tested it again.
This time I did not use our office network (since there are so many boxes on
that network); instead I booted two guests, one running the netperf server
and one running the netperf client, both using e1000 and NAT networking.

I'll test the performance of virtio-net!

This is the result:

ept = 1:
============================
Before patch:
--------------
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 1182.27
16384 87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 1185.84
16384 87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 1181.58
16384 87380

After patch:
--------------
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 1205.65
16384 87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 1216.06
16384 87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 1215.70
16384 87380


ept = 0, bypass_guest_pf=0:
============================
Before patch:
--------------
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 1169.70
16384 87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 1160.82
16384 87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 1168.01
16384 87380

After patch:
--------------
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 1266.28
16384 87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 1268.16

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 1267.18
16384 87380


To my surprise, after the patch the ept=0, bypass_guest_pf=0 case performs better
than the ept=1 case; maybe it is because network guests generate so much MMIO :-)

2011-06-08 06:20:38

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 0/15] KVM: optimize for MMIO handled

On 06/08/2011 11:47 AM, Takuya Yoshikawa wrote:

>>> Sure, KVM guest is the client, and it uses e1000 NIC, and uses NAT
>>> network connect to the netperf server, the bandwidth of our network
>>> is 100M.
>>>
>
> I see the reason, thank you!
>
> I used virtio-net and you used e1000.
> You are using e1000 to see the MMIO performance change, right?
>

Hi Takuya,

Now I have done the performance test for virtio-net: the performance improves
only a little, but there is no *regression* ;-)

The reason is that virtio-net generates very little MMIO.

ept = 1:
============================
Before patch:
--------------
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 972.21
16384 87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 971.01
16384 87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 974.44
16384 87380

After patch:
--------------
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 973.45
16384 87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 973.63
16384 87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 976.25
16384 87380

ept = 0, bypass_guest_pf=0:
============================
Before patch:
--------------
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 975.16
16384 87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 979.95
16384 87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 984.03
16384 87380

After patch:
--------------
TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 974.30
16384 87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 976.33
16384 87380

TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.122.247 (192.168.122.247) port 0 AF_INET
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

16384 87380 1 1 60.00 981.45
16384 87380

2011-06-08 08:22:54

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path


On 07.06.2011, at 15:00, Xiao Guangrong wrote:

> If the page fault is caused by mmio, we can cache the mmio info, later, we do
> not need to walk guest page table and quickly know it is a mmio fault while we
> emulate the mmio instruction
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 5 +++
> arch/x86/kvm/mmu.c | 21 +++++----------
> arch/x86/kvm/mmu.h | 23 +++++++++++++++++
> arch/x86/kvm/paging_tmpl.h | 21 ++++++++++-----
> arch/x86/kvm/x86.c | 52 ++++++++++++++++++++++++++++++--------
> arch/x86/kvm/x86.h | 36 +++++++++++++++++++++++++++
> 6 files changed, 126 insertions(+), 32 deletions(-)
>
>

[...]

> +static int vcpu_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
> + gpa_t *gpa, struct x86_exception *exception,
> + bool write)
> +{
> + u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
> +
> + if (vcpu_match_mmio_gva(vcpu, gva) &&
> + check_write_user_access(vcpu, write, access,
> + vcpu->arch.access)) {
> + *gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
> + (gva & (PAGE_SIZE - 1));
> + return 1;

Hrm. Let me try to understand what you're doing.

Whenever a guest issues an MMIO, it triggers an #NPF or #PF and then we walk either the NPT or the guest PT to resolve the GPA to the fault and send off an MMIO.
Within that path, you remember the GVA->GPA mapping for the last MMIO request. If the next MMIO request is on the same GVA and kernel/user permissions still apply, you simply bypass the resolution. So far so good.

Now, what happens when the GVA is not identical to the GVA it was before? It's probably a purely theoretic case, but imagine the following:

1) guest issues MMIO on GVA 0x1000 (GPA 0x1000)
2) guest remaps page 0x1000 to GPA 0x2000
3) guest issues MMIO on GVA 0x1000

That would break with your current implementation, right? It sounds pretty theoretic, but imagine the following:

1) guest user space 1 maps MMIO region A to 0x1000
2) guest user space 2 maps MMIO region B to 0x1000
3) guest user space 1 issues MMIO on 0x1000
4) context switch; going to user space 2
5) user space 2 issues MMIO on 0x1000

That case could at least be identified by also comparing the guest's cr3 value during this hack. And considering things like UIO or microkernels, it's not too unlikely :).


Alex

2011-06-08 08:29:29

by Takuya Yoshikawa

[permalink] [raw]
Subject: Re: [PATCH 0/15] KVM: optimize for MMIO handled

On Wed, 08 Jun 2011 14:22:36 +0800
Xiao Guangrong <[email protected]> wrote:

> On 06/08/2011 11:47 AM, Takuya Yoshikawa wrote:
>
> >>> Sure, KVM guest is the client, and it uses e1000 NIC, and uses NAT
> >>> network connect to the netperf server, the bandwidth of our network
> >>> is 100M.
> >>>
> >
> > I see the reason, thank you!
> >
> > I used virtio-net and you used e1000.
> > You are using e1000 to see the MMIO performance change, right?
> >
>
> Hi Takuya,
>
> Now, i have done the performance test for virtio-net, the performance is
> improved very little, and it is not *regression* ;-)
>
> The reason is, MMIO generated by virtio-net is very very little.
>

Yes, so I thought you had chosen e1000 for this test :)

Thanks,
Takuya

2011-06-08 08:56:53

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path

On 06/08/2011 04:22 PM, Alexander Graf wrote:

>> +static int vcpu_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
>> + gpa_t *gpa, struct x86_exception *exception,
>> + bool write)
>> +{
>> + u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
>> +
>> + if (vcpu_match_mmio_gva(vcpu, gva) &&
>> + check_write_user_access(vcpu, write, access,
>> + vcpu->arch.access)) {
>> + *gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
>> + (gva & (PAGE_SIZE - 1));
>> + return 1;
>

Hi Alexander,

Thanks for your review!

> Hrm. Let me try to understand what you're doing.
>
> Whenever a guest issues an MMIO, it triggers an #NPF or #PF and then we walk either the NPT or the guest PT to resolve the GPA to the fault and send off an MMIO.
> Within that path, you remember the GVA->GPA mapping for the last MMIO request. If the next MMIO request is on the same GVA and kernel/user permissions still apply, you simply bypass the resolution. So far so good.
>

In this patch, we also introduced vcpu_clear_mmio_info(), which clears the mmio cache info on the vcpu;
it is called when the guest flushes the TLB (CR3 reload or INVLPG).

> Now, what happens when the GVA is not identical to the GVA it was before? It's probably a purely theoretic case, but imagine the following:
>
> 1) guest issues MMIO on GVA 0x1000 (GPA 0x1000)
> 2) guest remaps page 0x1000 to GPA 0x2000
> 3) guest issues MMIO on GVA 0x1000
>

If the guest modifies the page structures, then by the x86 TLB rules it must flush the TLB to ensure
the CPU uses the new mapping.

So when you remap GVA 0x1000 to GPA 0x2000, you must flush the TLB; that clears the mmio cache info,
so the later access is handled correctly.

> That would break with your current implementation, right? It sounds pretty theoretic, but imagine the following:
>
> 1) guest user space 1 maps MMIO region A to 0x1000
> 2) guest user space 2 maps MMIO region B to 0x1000
> 3) guest user space 1 issues MMIO on 0x1000
> 4) context switch; going to user space 2
> 5) user space 2 issues MMIO on 0x1000
>

Also, on a context switch CR3 is reloaded, so the mmio cache info gets cleared there too, right? :-)
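
To make that clearing rule concrete, here is a minimal sketch (not the actual
patch code; the helper and field names follow the discussion above and are
assumptions) of clearing the cached mmio info when a gva is invalidated or
CR3 is reloaded:

#define MMIO_GVA_ANY    (~(gva_t)0)    /* illustrative: "clear unconditionally" */

static void vcpu_clear_mmio_info_sketch(struct kvm_vcpu *vcpu, gva_t gva)
{
        /* INVLPG passes the flushed gva; CR3 reload passes MMIO_GVA_ANY */
        if (gva != MMIO_GVA_ANY && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
                return;

        vcpu->arch.mmio_gva = 0;
        vcpu->arch.mmio_gfn = 0;
}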

2011-06-08 09:19:04

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path


On 08.06.2011, at 10:58, Xiao Guangrong wrote:

> On 06/08/2011 04:22 PM, Alexander Graf wrote:
>
>>> +static int vcpu_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
>>> + gpa_t *gpa, struct x86_exception *exception,
>>> + bool write)
>>> +{
>>> + u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
>>> +
>>> + if (vcpu_match_mmio_gva(vcpu, gva) &&
>>> + check_write_user_access(vcpu, write, access,
>>> + vcpu->arch.access)) {
>>> + *gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT |
>>> + (gva & (PAGE_SIZE - 1));
>>> + return 1;
>>
>
> Hi Alexander,
>
> Thanks for your review!
>
>> Hrm. Let me try to understand what you're doing.
>>
>> Whenever a guest issues an MMIO, it triggers an #NPF or #PF and then we walk either the NPT or the guest PT to resolve the GPA to the fault and send off an MMIO.
>> Within that path, you remember the GVA->GPA mapping for the last MMIO request. If the next MMIO request is on the same GVA and kernel/user permissions still apply, you simply bypass the resolution. So far so good.
>>
>
> In this patch, we also introduced vcpu_clear_mmio_info() that clears mmio cache info on the vcpu,
> and it is called when guest flush tlb (reload CR3 or INVLPG).

Ah, that one solved the SPT case then of course.

>
>> Now, what happens when the GVA is not identical to the GVA it was before? It's probably a purely theoretic case, but imagine the following:
>>
>> 1) guest issues MMIO on GVA 0x1000 (GPA 0x1000)
>> 2) guest remaps page 0x1000 to GPA 0x2000
>> 3) guest issues MMIO on GVA 0x1000
>>
>
> If guest modify the page structure, base on x86 tlb rules, we should flush tlb to ensure the cpu use
> the new mapping.
>
> When you remap GVA 0x1000 to 0x2000, you should flush tlb, then mmio cache info is cleared, so the later
> access is right.
>
>> That would break with your current implementation, right? It sounds pretty theoretic, but imagine the following:
>>
>> 1) guest user space 1 maps MMIO region A to 0x1000
>> 2) guest user space 2 maps MMIO region B to 0x1000
>> 3) guest user space 1 issues MMIO on 0x1000
>> 4) context switch; going to user space 2
>> 5) user space 2 issues MMIO on 0x1000
>>
>
> Also, when context switched, CR3 is reloaded, mmio cache info can be cleared too. right? :-)

Only when using SPT. In the NPT case, you will never see cr3 getting reloaded or INVLPG :).


Alex

2011-06-08 09:31:46

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path

On 06/08/2011 05:18 PM, Alexander Graf wrote:

>>
>> Also, when context switched, CR3 is reloaded, mmio cache info can be cleared too. right? :-)
>
> Only when using SPT. In the NPT case, you will never see cr3 getting reloaded or INVLPG :).
>

In the NPT case, we only cache the GPA, GVA is not cached (vcpu.arch.mmio_gva is always 0) ;-)
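
In other words, roughly like the following sketch (assumed field names, mirroring
the handle_mmio_page_fault_common hunk quoted earlier in the thread, where direct
faults pass addr = 0):

static void vcpu_cache_mmio_info_sketch(struct kvm_vcpu *vcpu,
                                        gva_t gva, gfn_t gfn, unsigned access)
{
        /*
         * For direct (tdp/NPT) faults the caller passes gva == 0, so only
         * the gfn comparison is ever meaningful; the gva match is used by
         * the soft mmu, where it is invalidated on CR3 reload / INVLPG.
         */
        vcpu->arch.mmio_gva = gva & PAGE_MASK;
        vcpu->arch.access = access;
        vcpu->arch.mmio_gfn = gfn;
}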

2011-06-08 09:39:22

by Alexander Graf

[permalink] [raw]
Subject: Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path


On 08.06.2011, at 11:33, Xiao Guangrong wrote:

> On 06/08/2011 05:18 PM, Alexander Graf wrote:
>
>>>
>>> Also, when context switched, CR3 is reloaded, mmio cache info can be cleared too. right? :-)
>>
>> Only when using SPT. In the NPT case, you will never see cr3 getting reloaded or INVLPG :).
>>
>
> In the NPT case, we only cache the GPA, GVA is not cached (vcpu.arch.mmio_gva is always 0) ;-)

Ah, very nice! :)


Alex

2011-06-09 06:59:52

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 03/15] KVM: x86: avoid unnecessarily guest page table walking

On 06/07/2011 03:59 PM, Xiao Guangrong wrote:
> We already get the guest physical address, so use it to read guest data
> directly to avoid walking guest page table again
>
> Signed-off-by: Xiao Guangrong<[email protected]>
> ---
> arch/x86/kvm/x86.c | 3 +--
> 1 files changed, 1 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 694538a..8be9ff6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3930,8 +3930,7 @@ static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt,
> if ((gpa& PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
> goto mmio;
>
> - if (kvm_read_guest_virt(ctxt, addr, val, bytes, exception)
> - == X86EMUL_CONTINUE)
> + if (!kvm_read_guest(vcpu->kvm, gpa, val, bytes))
> return X86EMUL_CONTINUE;

This breaks is addr/bytes spans a page boundary.

(the current code is also broken, but only for mmio; the new code is
broken for ram as well).

We need a gva_to_gpa() that returns a range of pages.

--
error compiling committee.c: too many arguments to function

2011-06-09 07:07:33

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 09/15] KVM: MMU: split kvm_mmu_free_page

On 06/07/2011 04:03 PM, Xiao Guangrong wrote:
> Split kvm_mmu_free_page to kvm_mmu_free_lock_parts and
> kvm_mmu_free_unlock_parts
>
> One is used to free the parts which is under mmu lock and the other is
> used to free the parts which can allow be freed out of mmu lock
>
> It is used by later patch

Suggest kvm_mmu_isolate_page() and kvm_mmu_free_page(). Plus a comment
explaining things, unless someone can come up with a better name.

> -static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
> +static void kvm_mmu_free_lock_parts(struct kvm_mmu_page *sp)
> {
> ASSERT(is_empty_shadow_page(sp->spt));
> hlist_del(&sp->hash_link);
> - list_del(&sp->link);
> - free_page((unsigned long)sp->spt);
> if (!sp->role.direct)
> free_page((unsigned long)sp->gfns);
> +}
> +
> +static void kvm_mmu_free_unlock_parts(struct kvm_mmu_page *sp)
> +{
> + list_del(&sp->link);
> + free_page((unsigned long)sp->spt);
> kmem_cache_free(mmu_page_header_cache, sp);
> }

The list_del() must be run under a lock, no? it can access
kvm->arch.active_mmu_pages.

--
error compiling committee.c: too many arguments to function

2011-06-09 07:14:45

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 13/15] KVM: VMX: modify the default value of nontrap shadow pte

On 06/07/2011 04:06 PM, Xiao Guangrong wrote:
> Modify the default value to identify nontrap shadow pte and mmio shadow pte
> whill will be introduced in later patch
>
> Signed-off-by: Xiao Guangrong<[email protected]>
> ---
> arch/x86/kvm/vmx.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 20dbf7f..8c3d343 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7110,7 +7110,7 @@ static int __init vmx_init(void)
> kvm_disable_tdp();
>
> if (bypass_guest_pf)
> - kvm_mmu_set_nonpresent_ptes(~0xffeull, 0ull);
> + kvm_mmu_set_nonpresent_ptes(0xfull<< 49 | 1ull, 0ull);
>

This can break on newer processors (well, so can the original, but the
new one will break earlier).

--
error compiling committee.c: too many arguments to function

2011-06-09 07:28:09

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 14/15] KVM: MMU: mmio page fault support

On 06/07/2011 04:07 PM, Xiao Guangrong wrote:
> The idea is from Avi:
>
> | We could cache the result of a miss in an spte by using a reserved bit, and
> | checking the page fault error code (or seeing if we get an ept violation or
> | ept misconfiguration), so if we get repeated mmio on a page, we don't need to
> | search the slot list/tree.
> | (https://lkml.org/lkml/2011/2/22/221)
>
> When the page fault is caused by mmio, we cache the info in the shadow page
> table, and also set the reserved bits in the shadow page table, so if the mmio
> is caused again, we can quickly identify it and emulate it directly
>
> Searching mmio gfn in memslots is heavy since we need to walk all memeslots, it
> can be reduced by this feature, and also avoid walking guest page table for
> soft mmu.
>
> This feature can be disabled/enabled at the runtime, if
> shadow_notrap_nonpresent_pte is enabled, the PFER.RSVD is always set, we need
> to walk shadow page table for all page fault, so disable this feature if
> shadow_notrap_nonpresent is enabled.
>

Maybe it's time to kill off bypass_guest_pf=1. It's not as effective as
it used to be, since unsync pages always use shadow_trap_nonpresent_pte,
and since we convert between the two nonpresent_ptes during sync and unsync.

> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 4f475ab..227cf10 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -91,6 +91,9 @@ module_param(dbg, bool, 0644);
> static int oos_shadow = 1;
> module_param(oos_shadow, bool, 0644);
>
> +static int __read_mostly mmio_pf = 1;
> +module_param(mmio_pf, bool, 0644);

Why make it a module parameter?

> +static void mark_mmio_spte(u64 *sptep, u64 gfn, unsigned access)
> +{
> + access&= ACC_WRITE_MASK | ACC_USER_MASK;
> +
> + __set_spte(sptep, shadow_mmio_mask | access | gfn<< PAGE_SHIFT);
> +}

This can only work for shadow. Is it worth the complexity?

Also, shadow walking is not significantly faster than guest page table
walking. And if we miss, we have to walk the guest page tables in any case.

> +
> +static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct)
> +{
> + if (direct&& vcpu_match_mmio_gpa(vcpu, addr))
> + return true;
> +
> + if (vcpu_match_mmio_gva(vcpu, addr))
> + return true;
> +
> + return false;
> +}

There is also the case of nesting - it's not direct and it's not a gva.

--
error compiling committee.c: too many arguments to function

2011-06-09 07:39:16

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 0/15] KVM: optimize for MMIO handled

On 06/07/2011 03:58 PM, Xiao Guangrong wrote:
> The idea of this patchset is from Avi:
> | We could cache the result of a miss in an spte by using a reserved bit, and
> | checking the page fault error code (or seeing if we get an ept violation or
> | ept misconfiguration), so if we get repeated mmio on a page, we don't need to
> | search the slot list/tree.
> | (https://lkml.org/lkml/2011/2/22/221)
>
> The aim of this patchset is to support fast mmio emulate, it reduce searching
> mmio gfn from memslots which is very expensive since we need to walk all slots
> for mmio gfn, and the other advantage is: we can reduce guest page table walking
> for soft mmu.
>
> Lockless walk shadow page table is introduced in this patchset, it is the light
> way to check the page fault is the real mmio page fault or something is running
> out of our mind.
>
> And, if shadow_notrap_nonpresent_pte is enabled(bypass_guest_pf=1), mmio page
> fault and normal page fault is mixed(the reserved is set for all page fault),
> it has little regression, if the box can generate lots of mmio access, for
> example, the network server, it can disable shadow_notrap_nonpresent_pte and
> enable mmio pf, after all, we can enable/disable mmio pf at the runtime.
>

Okay, this is pretty complicated. And things are already complicated.

First, I think we should consider dropping bypass_guest_pf completely,
just so we have less things to think about.

Second, I don't like two paths for accessing shadow page tables, it
makes the code much larger. I'm also not sure RCU is enough protection
- we can unlink a page in the middle of a hierarchy, and on i386 this
causes an invalid pointer to appear when we fetch the two halves. But I
guess, if the cpu can do it, so can we.

Maybe we can do something like

again:
fetch pointer to last level spte using RCU
if failed:
take lock
build spte hierarchy
drop lock
goto again
if sync:
if mmio:
do mmio
return
return
walk guest table
install spte
if mmio:
do mmio

(sync is always false for tdp)

--
error compiling committee.c: too many arguments to function

2011-06-09 20:09:49

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 10/15] KVM: MMU: lockless walking shadow page table

On Tue, Jun 07, 2011 at 09:04:34PM +0800, Xiao Guangrong wrote:
> Using rcu to protect shadow pages table to be freed, so we can safely walk it,
> it should run fast and is needed by mmio page fault

A couple of question below.

Thanx, Paul

> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 4 ++
> arch/x86/kvm/mmu.c | 79 ++++++++++++++++++++++++++++++---------
> arch/x86/kvm/mmu.h | 4 +-
> arch/x86/kvm/vmx.c | 2 +-
> 4 files changed, 69 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 326af42..260582b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -232,6 +232,8 @@ struct kvm_mmu_page {
> unsigned int unsync_children;
> unsigned long parent_ptes; /* Reverse mapping for parent_pte */
> DECLARE_BITMAP(unsync_child_bitmap, 512);
> +
> + struct rcu_head rcu;
> };
>
> struct kvm_pv_mmu_op_buffer {
> @@ -478,6 +480,8 @@ struct kvm_arch {
> u64 hv_guest_os_id;
> u64 hv_hypercall;
>
> + atomic_t reader_counter;
> +
> #ifdef CONFIG_KVM_MMU_AUDIT
> int audit_point;
> #endif
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9f3a746..52d4682 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1675,6 +1675,30 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> return ret;
> }
>
> +static void free_mmu_pages_unlock_parts(struct list_head *invalid_list)
> +{
> + struct kvm_mmu_page *sp;
> +
> + list_for_each_entry(sp, invalid_list, link)
> + kvm_mmu_free_lock_parts(sp);
> +}
> +
> +static void free_invalid_pages_rcu(struct rcu_head *head)
> +{
> + struct kvm_mmu_page *next, *sp;
> +
> + sp = container_of(head, struct kvm_mmu_page, rcu);
> + while (sp) {
> + if (!list_empty(&sp->link))
> + next = list_first_entry(&sp->link,
> + struct kvm_mmu_page, link);
> + else
> + next = NULL;
> + kvm_mmu_free_unlock_parts(sp);
> + sp = next;
> + }
> +}
> +
> static void kvm_mmu_commit_zap_page(struct kvm *kvm,
> struct list_head *invalid_list)
> {
> @@ -1685,6 +1709,14 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
>
> kvm_flush_remote_tlbs(kvm);
>
> + if (atomic_read(&kvm->arch.reader_counter)) {

This is the slowpath to be executed if there are currently readers
in kvm->arch.reader_counter(), correct?

> + free_mmu_pages_unlock_parts(invalid_list);
> + sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
> + list_del_init(invalid_list);
> + call_rcu(&sp->rcu, free_invalid_pages_rcu);
> + return;
> + }

OK, so it also looks like kvm->arch.reader_counter could transition from
zero to non-zero at this point due to a concurrent call from a reader in
the kvm_mmu_walk_shadow_page_lockless() function. Does the following code
avoid messing up the reader? If so, why bother with the slowpath above?

> +
> do {
> sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
> WARN_ON(!sp->role.invalid || sp->root_count);
> @@ -2601,6 +2633,35 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
> return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access);
> }
>
> +int kvm_mmu_walk_shadow_page_lockless(struct kvm_vcpu *vcpu, u64 addr,
> + u64 sptes[4])
> +{
> + struct kvm_shadow_walk_iterator iterator;
> + int nr_sptes = 0;
> +
> + rcu_read_lock();
> +
> + atomic_inc(&vcpu->kvm->arch.reader_counter);
> + /* Increase the counter before walking shadow page table */
> + smp_mb__after_atomic_inc();
> +
> + for_each_shadow_entry(vcpu, addr, iterator) {
> + sptes[iterator.level-1] = *iterator.sptep;
> + nr_sptes++;
> + if (!is_shadow_present_pte(*iterator.sptep))
> + break;
> + }
> +
> + /* Decrease the counter after walking shadow page table finished */
> + smp_mb__before_atomic_dec();
> + atomic_dec(&vcpu->kvm->arch.reader_counter);
> +
> + rcu_read_unlock();
> +
> + return nr_sptes;
> +}
> +EXPORT_SYMBOL_GPL(kvm_mmu_walk_shadow_page_lockless);
> +
> static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
> u32 error_code, bool prefault)
> {
> @@ -3684,24 +3745,6 @@ out:
> return r;
> }
>
> -int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4])
> -{
> - struct kvm_shadow_walk_iterator iterator;
> - int nr_sptes = 0;
> -
> - spin_lock(&vcpu->kvm->mmu_lock);
> - for_each_shadow_entry(vcpu, addr, iterator) {
> - sptes[iterator.level-1] = *iterator.sptep;
> - nr_sptes++;
> - if (!is_shadow_present_pte(*iterator.sptep))
> - break;
> - }
> - spin_unlock(&vcpu->kvm->mmu_lock);
> -
> - return nr_sptes;
> -}
> -EXPORT_SYMBOL_GPL(kvm_mmu_get_spte_hierarchy);
> -
> void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
> {
> ASSERT(vcpu);
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 05310b1..e7725c4 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -48,7 +48,9 @@
> #define PFERR_RSVD_MASK (1U << 3)
> #define PFERR_FETCH_MASK (1U << 4)
>
> -int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
> +int kvm_mmu_walk_shadow_page_lockless(struct kvm_vcpu *vcpu, u64 addr,
> + u64 sptes[4]);
> +
> int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
>
> static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index b54c186..20dbf7f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4681,7 +4681,7 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
> printk(KERN_ERR "EPT: Misconfiguration.\n");
> printk(KERN_ERR "EPT: GPA: 0x%llx\n", gpa);
>
> - nr_sptes = kvm_mmu_get_spte_hierarchy(vcpu, gpa, sptes);
> + nr_sptes = kvm_mmu_walk_shadow_page_lockless(vcpu, gpa, sptes);
>
> for (i = PT64_ROOT_LEVEL; i > PT64_ROOT_LEVEL - nr_sptes; --i)
> ept_misconfig_inspect_spte(vcpu, sptes[i-1], i);
> --
> 1.7.4.4
>

2011-06-10 03:45:13

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 14/15] KVM: MMU: mmio page fault support

On 06/09/2011 03:28 PM, Avi Kivity wrote:

>
> Maybe it's time to kill off bypass_guest_pf=1. It's not as effective as it used to be, since unsync pages always use shadow_trap_nonpresent_pte, and since we convert between the two nonpresent_ptes during sync and unsync.
>

Reasonable!

>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 4f475ab..227cf10 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -91,6 +91,9 @@ module_param(dbg, bool, 0644);
>> static int oos_shadow = 1;
>> module_param(oos_shadow, bool, 0644);
>>
>> +static int __read_mostly mmio_pf = 1;
>> +module_param(mmio_pf, bool, 0644);
>
> Why make it a module parameter?

Will remove.

>
>> +static void mark_mmio_spte(u64 *sptep, u64 gfn, unsigned access)
>> +{
>> + access&= ACC_WRITE_MASK | ACC_USER_MASK;
>> +
>> + __set_spte(sptep, shadow_mmio_mask | access | gfn<< PAGE_SHIFT);
>> +}
>
> This can only work for shadow. Is it worth the complexity?
>

I think it is not bad, since it is really simple, and for tdp we also need to
set the shadow_mmio_mask bits, which cause the misconfig/rsvd fault.

> Also, shadow walking is not significantly faster than guest page table walking. And if we miss, we have to walk the guest page tables in any case.
>

Um, I think walking the guest page table is slower: it needs to walk the memslots many
times, and it triggers a page fault if the host page is swapped out.

And the mmio spte is hardly ever missed: for tdp, shadow pages are zapped infrequently; for the
soft mmu, the mmio spte is always unsync, and in the guest the mmio region is always mapped by
the kernel, so it is updated infrequently and lazily flushed.

>> +
>> +static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct)
>> +{
>> + if (direct&& vcpu_match_mmio_gpa(vcpu, addr))
>> + return true;
>> +
>> + if (vcpu_match_mmio_gva(vcpu, addr))
>> + return true;
>> +
>> + return false;
>> +}
>
> There is also the case of nesting - it's not direct and it's not a gva.
>

If it is direct, we only need to compare the gpa; if direct=0, we only need to
compare the gva. I'll fix the code to make this clear.
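
Spelled out, the clarified check might look like this sketch (the match helpers
are the ones from the patch; the restructuring is what is being promised above,
not the final code):

static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct)
{
        /* for direct (tdp / nested npt) faults, addr is a gpa (or ngpa);
         * for the soft mmu, addr is a gva */
        if (direct)
                return vcpu_match_mmio_gpa(vcpu, addr);

        return vcpu_match_mmio_gva(vcpu, addr);
}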

2011-06-10 03:48:34

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 09/15] KVM: MMU: split kvm_mmu_free_page

On 06/09/2011 03:07 PM, Avi Kivity wrote:
> On 06/07/2011 04:03 PM, Xiao Guangrong wrote:
>> Split kvm_mmu_free_page to kvm_mmu_free_lock_parts and
>> kvm_mmu_free_unlock_parts
>>
>> One is used to free the parts which is under mmu lock and the other is
>> used to free the parts which can allow be freed out of mmu lock
>>
>> It is used by later patch
>
> Suggest kvm_mmu_isolate_page() and kvm_mmu_free_page(). Plus a comment explaining things, unless someone can come up with a better name.
>

Good names, will fix.

>> -static void kvm_mmu_free_page(struct kvm_mmu_page *sp)
>> +static void kvm_mmu_free_lock_parts(struct kvm_mmu_page *sp)
>> {
>> ASSERT(is_empty_shadow_page(sp->spt));
>> hlist_del(&sp->hash_link);
>> - list_del(&sp->link);
>> - free_page((unsigned long)sp->spt);
>> if (!sp->role.direct)
>> free_page((unsigned long)sp->gfns);
>> +}
>> +
>> +static void kvm_mmu_free_unlock_parts(struct kvm_mmu_page *sp)
>> +{
>> + list_del(&sp->link);
>> + free_page((unsigned long)sp->spt);
>> kmem_cache_free(mmu_page_header_cache, sp);
>> }
>
> The list_del() must be run under a lock, no? it can access kvm->arch.active_mmu_pages.
>

In the prepare path, we have already moved the sp from active_mmu_pages to the invalid_list.

2011-06-10 03:49:22

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 03/15] KVM: x86: avoid unnecessarily guest page table walking

On 06/09/2011 02:59 PM, Avi Kivity wrote:

> This breaks is addr/bytes spans a page boundary.
>
> (the current code is also broken, but only for mmio; the new code is broken for ram as well).
>
> We need a gva_to_gpa() that returns a range of pages.
>

Thanks for pointing it out; I will fix it in the next version.

2011-06-10 04:03:26

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 0/15] KVM: optimize for MMIO handled

On 06/09/2011 03:39 PM, Avi Kivity wrote:

> First, I think we should consider dropping bypass_guest_pf completely, just so we have less things to think about.
>

I agree.

> I'm also not sure RCU is enough protection - we can unlink a page in the middle of a hierarchy,

I think it is OK; it is just like the paging-structure cache of a real CPU: we can use
either the old mapping or the new mapping here, and if we miss, the page fault path is
called and fixes the problem for us.

> and on i386 this causes an invalid pointer to appear when we fetch the two halves. But I guess, if the cpu can do it, so can we.
>

Ah, maybe the CPU cannot do it; we need a lightweight way to read the spte on an i386 host...

> Maybe we can do something like
>
> again:
> fetch pointer to last level spte using RCU
> if failed:
> take lock
> build spte hierarchy
> drop lock
> goto again
> if sync:
> if mmio:
> do mmio
> return
> return
> walk guest table
> install spte
> if mmio:
> do mmio
>
> (sync is always false for tdp)
>

It seems more complex; the original way is:

fetch last level spte
if failed or it is not a mmio spte:
call page fault
do mmio

and it is only a little heavy, since we need to walk the guest page table
and build the spte under mmu-lock.

Maybe i missed your meaning, could you please tell me the advantage? :-(

2011-06-10 04:21:03

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 10/15] KVM: MMU: lockless walking shadow page table

On 06/10/2011 04:09 AM, Paul E. McKenney wrote:
> On Tue, Jun 07, 2011 at 09:04:34PM +0800, Xiao Guangrong wrote:
>> Using rcu to protect shadow pages table to be freed, so we can safely walk it,
>> it should run fast and is needed by mmio page fault
>
> A couple of question below.

Thanks for your review!

>> + if (atomic_read(&kvm->arch.reader_counter)) {
>
> This is the slowpath to be executed if there are currently readers
> in kvm->arch.reader_counter(), correct?
>

Yes, we free the pages in RCU context if kvm->arch.reader_counter shows there are readers.

>> + free_mmu_pages_unlock_parts(invalid_list);
>> + sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
>> + list_del_init(invalid_list);
>> + call_rcu(&sp->rcu, free_invalid_pages_rcu);
>> + return;
>> + }
>
> OK, so it also looks like kvm->arch.reader_counter could transition from
> zero to non-zero at this point due to a concurrent call from a reader in
> the kvm_mmu_walk_shadow_page_lockless() function. Does the following code
> avoid messing up the reader? If so, why bother with the slowpath above?
>

Actually, we have split the free operation into two steps. The first step is
kvm_mmu_prepare_zap_page(), which isolates the page from the shadow page table, so
after it is called the page can no longer be reached through the shadow page table;
the second step is kvm_mmu_commit_zap_page(), which frees the page.

kvm_mmu_walk_shadow_page_lockless() reaches pages only through the shadow page table,
so even if kvm->arch.reader_counter transitions from zero to non-zero in the
following code, we can be sure the page is not in use by
kvm_mmu_walk_shadow_page_lockless(), and we can free the page directly.
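
Condensed into a sketch (illustrative names, not the patch itself), the commit side
described above looks like this; the comments spell out why the zero-to-non-zero
race on reader_counter is harmless:

static void commit_zap_sketch(struct kvm *kvm, struct list_head *invalid_list)
{
        struct kvm_mmu_page *sp;

        /*
         * The prepare step already unlinked these pages from the shadow
         * page table, so a walker that starts *after* this point can
         * never reach them; it does not matter if reader_counter becomes
         * non-zero from here on.
         */
        kvm_flush_remote_tlbs(kvm);

        if (atomic_read(&kvm->arch.reader_counter)) {
                /*
                 * A walker that started *before* the prepare step may
                 * still hold a pointer, so defer the free to an RCU
                 * grace period.
                 */
                sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
                list_del_init(invalid_list);
                call_rcu(&sp->rcu, free_invalid_pages_rcu);
                return;
        }

        /* no current readers: the pages are unreachable, free them now */
        free_pages_now(invalid_list);   /* illustrative helper */
}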

2011-06-12 08:33:19

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 09/15] KVM: MMU: split kvm_mmu_free_page

On 06/10/2011 06:50 AM, Xiao Guangrong wrote:
> >> +static void kvm_mmu_free_unlock_parts(struct kvm_mmu_page *sp)
> >> +{
> >> + list_del(&sp->link);
> >> + free_page((unsigned long)sp->spt);
> >> kmem_cache_free(mmu_page_header_cache, sp);
> >> }
> >
> > The list_del() must be run under a lock, no? it can access kvm->arch.active_mmu_pages.
> >
>
> In prepare path, we have moved the sp from active_mmu_pages to invlaid_list.

It still needs to be accessed under a lock, no matter which list is used.

--
error compiling committee.c: too many arguments to function

2011-06-12 08:38:10

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 14/15] KVM: MMU: mmio page fault support

On 06/10/2011 06:47 AM, Xiao Guangrong wrote:
> > Also, shadow walking is not significantly faster than guest page table walking. And if we miss, we have to walk the guest page tables in any case.
> >
>
> Um. i think walking guest page table is slower, it needs to walk memslots for many times
> and it triggers page fault if the host page is swapped.

Well, if the page is swapped, we can't store anything in the spte.

And if we only store the mmio/ram condition in the spte (via the two
types of page faults) we don't need to walk the spte. We know
immediately if we need to search the slots or not.

> And it is hardly missed, since for tdp, it infrequency zaps shadow pages, for soft mmu,
> the mmio spte is always unsync, and in guest, the mmio region is always mapped by kernel,
> so it is infrequency to be update and lazily flushed.

We still get frequent mmio misses.

> >> +
> >> +static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct)
> >> +{
> >> + if (direct&& vcpu_match_mmio_gpa(vcpu, addr))
> >> + return true;
> >> +
> >> + if (vcpu_match_mmio_gva(vcpu, addr))
> >> + return true;
> >> +
> >> + return false;
> >> +}
> >
> > There is also the case of nesting - it's not direct and it's not a gva.
> >
>
> If it is direct, we only need to compare the pga, and direct=0, we only need to
> compare gva, i'll fix the code to make it clear.

But for nested npt, we get the ngpa, not a gva.

--
error compiling committee.c: too many arguments to function

2011-06-12 08:47:14

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 0/15] KVM: optimize for MMIO handled

On 06/10/2011 07:05 AM, Xiao Guangrong wrote:
> On 06/09/2011 03:39 PM, Avi Kivity wrote:
>
> > First, I think we should consider dropping bypass_guest_pf completely, just so we have less things to think about.
> >
>
> I agree.

Great, please post a patch.

> > I'm also not sure RCU is enough protection - we can unlink a page in the middle of a hierarchy,
>
> I think it is ok, it just likes the page structure cache of real CPU, we can use
> the old mapping or new mapping here, if we missed, page fault path is called, it can
> fix the problem for us.
>
> > and on i386 this causes an invalid pointer to appear when we fetch the two halves. But I guess, if the cpu can do it, so can we.
> >
>
> Ah, maybe the cpu can not do it, we need a light way to get spte for i386 host...

Look at the comments in arch/x86/mm/gup.c - it does the same thing.

> > Maybe we can do something like
> >
> > again:
> > fetch pointer to last level spte using RCU
> > if failed:
> > take lock
> > build spte hierarchy
> > drop lock
> > goto again
> > if sync:
> > if mmio:
> > do mmio
> > return
> > return
> > walk guest table
> > install spte
> > if mmio:
> > do mmio
> >
> > (sync is always false for tdp)
> >
>
> It seams it is more complex,

It also doesn't work - we have to set up rmap under lock.

> the origin way is:
>
> fetch last level spte
> if failed or it is not a mmio spte:
> call page fault
> do mmio
>
> and it has little heavy sine we need to walk guest page table,
> and build spte under mmu-lock.

For shadow, yes, this is a good optimization. But with nested paging it
slow things down. We already have the gpa, so all we need to do is
follow the mmio path. There's no need to walk the spte hierarchy.

> Maybe i missed your meaning, could you please tell me the advantage? :-(

I wanted to also service RAM faults without the lock, if the only thing
missing was the spte (and the rest of the hierarchy was fine). But it
can't be made to work without an overhaul of all of the locking.

--
error compiling committee.c: too many arguments to function

2011-06-13 03:13:22

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 09/15] KVM: MMU: split kvm_mmu_free_page

On 06/12/2011 04:33 PM, Avi Kivity wrote:
> On 06/10/2011 06:50 AM, Xiao Guangrong wrote:
>> >> +static void kvm_mmu_free_unlock_parts(struct kvm_mmu_page *sp)
>> >> +{
>> >> + list_del(&sp->link);
>> >> + free_page((unsigned long)sp->spt);
>> >> kmem_cache_free(mmu_page_header_cache, sp);
>> >> }
>> >
>> > The list_del() must be run under a lock, no? it can access kvm->arch.active_mmu_pages.
>> >
>>
>> In prepare path, we have moved the sp from active_mmu_pages to invlaid_list.
>
> It still needs to be accessed under a lock, no matter which list is used.
>

Actually, if we need to free the pages in RCU context, we first unlink them from the invalid_list:

if (atomic_read(&kvm->arch.reader_counter)) {
......
list_del_init(invalid_list);
trace_kvm_mmu_delay_free_pages(sp);
call_rcu(&sp->rcu, free_invalid_pages_rcu);
return;
}

After that, the global list is not used anymore.

2011-06-13 03:36:13

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 14/15] KVM: MMU: mmio page fault support

On 06/12/2011 04:38 PM, Avi Kivity wrote:
> On 06/10/2011 06:47 AM, Xiao Guangrong wrote:
>> > Also, shadow walking is not significantly faster than guest page table walking. And if we miss, we have to walk the guest page tables in any case.
>> >
>>
>> Um. i think walking guest page table is slower, it needs to walk memslots for many times
>> and it triggers page fault if the host page is swapped.
>
> Well, if the page is swapped, we can't store anything in the spte.
>

If we walk the guest page table, we need to access guest pages, and a guest page can
be swapped out at any time; the shadow page table lives in kernel pages, which are never
swapped. That is why I think walking the shadow page table is faster than walking the
guest page table.

> And if we only store the mmio/ram condition in the spte (via the two types of page faults) we don't need to walk the spte. We know immediately if we need to search the slots or not.
>
>> And it is hardly missed, since for tdp, it infrequency zaps shadow pages, for soft mmu,
>> the mmio spte is always unsync, and in guest, the mmio region is always mapped by kernel,
>> so it is infrequency to be update and lazily flushed.
>
> We still get frequent mmio misses.
>

I did a test: I ran three guests (4 vcpus + 512M each) on my box (4 cores + 2G) and compiled
the kernel in the guests for 1 hour; no mmio was missed (hard mmu and soft mmu), which means
we can usually catch almost all mmio by walking the shadow page table.

>> >> +
>> >> +static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct)
>> >> +{
>> >> + if (direct&& vcpu_match_mmio_gpa(vcpu, addr))
>> >> + return true;
>> >> +
>> >> + if (vcpu_match_mmio_gva(vcpu, addr))
>> >> + return true;
>> >> +
>> >> + return false;
>> >> +}
>> >
>> > There is also the case of nesting - it's not direct and it's not a gva.
>> >
>>
>> If it is direct, we only need to compare the pga, and direct=0, we only need to
>> compare gva, i'll fix the code to make it clear.
>
> But for nested npt, we get the ngpa, not a gva.
>

We treat nested npt as the 'direct' mmio:
r = handle_mmio_page_fault(vcpu, addr, error_code, mmu_is_nested(vcpu));

also do not cache gva for nested npt:
if (handle_abnormal_pfn(vcpu, mmu_is_nested(vcpu) ? 0 : addr,
walker.gfn, pfn, walker.pte_access, &r))

2011-06-13 04:44:18

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 0/15] KVM: optimize for MMIO handled

On 06/12/2011 04:47 PM, Avi Kivity wrote:
> On 06/10/2011 07:05 AM, Xiao Guangrong wrote:
>> On 06/09/2011 03:39 PM, Avi Kivity wrote:
>>
>> > First, I think we should consider dropping bypass_guest_pf completely, just so we have less things to think about.
>> >
>>
>> I agree.
>
> Great, please post a patch.

OK.

>> Ah, maybe the cpu can not do it, we need a light way to get spte for i386 host...
>
> Look at the comments in arch/x86/mm/gup.c - it does the same thing.
>

Yeah, it is a good example for me to study.
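
For the i386 (PAE) case, that boils down to reading the two 32-bit halves of the spte with
ordering barriers and retrying if the low half changed underneath us, which is the same trick
gup.c uses for PAE ptes. A rough sketch (the helper name below is made up, not from the patchset):

/*
 * Read a 64-bit spte on a 32-bit host without taking mmu_lock: read the
 * two halves with ordering barriers and retry if the low half changed
 * while we read the high half.
 */
static u64 read_spte_lockless(u64 *sptep)
{
	u32 *half = (u32 *)sptep;
	u32 low, high;

	do {
		low = ACCESS_ONCE(half[0]);
		smp_rmb();
		high = ACCESS_ONCE(half[1]);
		smp_rmb();
	} while (ACCESS_ONCE(half[0]) != low);

	return ((u64)high << 32) | low;
}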

>> the original way is:
>>
>> fetch last level spte
>> if failed or it is not a mmio spte:
>> call page fault
>> do mmio
>>
>> and it is a little heavy since we need to walk the guest page table
>> and build the spte under mmu-lock.
>
> For shadow, yes, this is a good optimization. But with nested paging it slows things down. We already have the gpa, so all we need to do is follow the mmio path. There's no need to walk the spte hierarchy.
>

Yes, it is. I just want to detect KVM bugs: it helps us to know whether an "ept misconfig" is a
real MMIO access or a bug. I noticed some "ept misconfig" bugs were reported before, so I think doing
this is necessary, and I think it is not too bad, since walking the spte hierarchy is lockless,
it is really fast.

>> Maybe I missed your meaning; could you please tell me the advantage? :-(
>
> I wanted to also service RAM faults without the lock, if the only thing missing was the spte (and the rest of the hierarchy was fine). But it can't be made to work without an overhaul of all of the locking.
>

Great, I had the same thought; anyway, it is a good start :-)

2011-06-13 08:06:46

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 0/15] KVM: optimize for MMIO handled

On 06/13/2011 07:46 AM, Xiao Guangrong wrote:
> >>
> >> and it is a little heavy since we need to walk the guest page table
> >> and build the spte under mmu-lock.
> >
> > For shadow, yes, this is a good optimization. But with nested paging it slows things down. We already have the gpa, so all we need to do is follow the mmio path. There's no need to walk the spte hierarchy.
> >
>
> Yes, it is. I just want to detect KVM bugs: it helps us to know whether an "ept misconfig" is a
> real MMIO access or a bug. I noticed some "ept misconfig" bugs were reported before, so I think doing
> this is necessary, and I think it is not too bad, since walking the spte hierarchy is lockless,
> it is really fast.

Okay. We can later see if it shows up in profiles.


--
error compiling committee.c: too many arguments to function

2011-06-13 08:10:16

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 14/15] KVM: MMU: mmio page fault support

On 06/13/2011 06:38 AM, Xiao Guangrong wrote:
> On 06/12/2011 04:38 PM, Avi Kivity wrote:
> > On 06/10/2011 06:47 AM, Xiao Guangrong wrote:
> >> > Also, shadow walking is not significantly faster than guest page table walking. And if we miss, we have to walk the guest page tables in any case.
> >> >
> >>
> >> Um, I think walking the guest page table is slower; it needs to walk the memslots many times,
> >> and it triggers a page fault if the host page is swapped out.
> >
> > Well, if the page is swapped, we can't store anything in the spte.
> >
>
> If we walk the guest page table, we need to access guest pages, and a guest page can
> be swapped out at any time, while the shadow page table lives in kernel pages that are never swapped;
> that is why I think walking the shadow page table is faster than walking the guest page table.

It's unlikely that the guest page table is swapped out since the
hardware just walked it.

> > And if we only store the mmio/ram condition in the spte (via the two types of page faults) we don't need to walk the spte. We know immediately if we need to search the slots or not.
> >
> >> And it is hardly ever missed: for tdp, shadow pages are zapped infrequently; for soft mmu,
> >> the mmio spte is always unsync, and in the guest the mmio region is always mapped by the kernel,
> >> so it is updated infrequently and lazily flushed.
> >
> > We still get frequent mmio misses.
> >
>
> I did a test: I ran three guests (4 vcpus + 512M each) on my box (4 cores + 2G) and compiled the kernel
> in the guests for 1 hour, and no mmio was missed (hard mmu and soft mmu). It means that we can
> usually catch almost all mmio by walking the shadow page table.

Yes, but if you rely on EPT misconfig then you don't need that walk at
all (conversely, if we do walk unconditionally, we can use EPT
violations instead).
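
Roughly, relying on EPT misconfig would look like this on the vmx side (handle_mmio_page_fault()
is from your patch; the two helpers at the bottom are placeholders, not existing functions, and
the return-code convention is only illustrative):

static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
{
	gpa_t gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);

	/*
	 * With reserved-bit mmio sptes installed, an ept misconfiguration on
	 * this gpa already identifies an mmio access: no slot search and no
	 * guest page table walk is needed before emulating it.
	 */
	if (handle_mmio_page_fault(vcpu, gpa, 0, true) == 1)
		return emulate_mmio_access(vcpu, gpa);		/* placeholder */

	/* Not an mmio spte after all: report it as a misconfiguration bug. */
	return report_ept_misconfig(vcpu, gpa);			/* placeholder */
}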

> >> If it is direct, we only need to compare the gpa, and if direct=0, we only need to
> >> compare the gva; I'll fix the code to make it clear.
> >
> > But for nested npt, we get the ngpa, not a gva.
> >
>
> We treat nested npt as the 'direct' mmio:
> r = handle_mmio_page_fault(vcpu, addr, error_code, mmu_is_nested(vcpu));
>
> also do not cache gva for nested npt:
> if (handle_abnormal_pfn(vcpu, mmu_is_nested(vcpu) ? 0 : addr,
> walker.gfn, pfn, walker.pte_access, &r))

Okay.

--
error compiling committee.c: too many arguments to function

2011-06-20 16:14:51

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path

On Tue, Jun 07, 2011 at 09:00:30PM +0800, Xiao Guangrong wrote:
> If the page fault is caused by mmio, we can cache the mmio info; later, we do
> not need to walk the guest page table and can quickly know it is an mmio fault while we
> emulate the mmio instruction
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 5 +++
> arch/x86/kvm/mmu.c | 21 +++++----------
> arch/x86/kvm/mmu.h | 23 +++++++++++++++++
> arch/x86/kvm/paging_tmpl.h | 21 ++++++++++-----
> arch/x86/kvm/x86.c | 52 ++++++++++++++++++++++++++++++--------
> arch/x86/kvm/x86.h | 36 +++++++++++++++++++++++++++
> 6 files changed, 126 insertions(+), 32 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d167039..326af42 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -414,6 +414,11 @@ struct kvm_vcpu_arch {
> u64 mcg_ctl;
> u64 *mce_banks;
>
> + /* Cache MMIO info */
> + u64 mmio_gva;
> + unsigned access;
> + gfn_t mmio_gfn;
> +
> /* used for guest single stepping over the given code position */
> unsigned long singlestep_rip;
>

Why are you not implementing the original idea of caching the MMIO
attribute of an address in the spte?

That solution is wider reaching than a one-entry cache, and was proposed
to overcome a large number of memslots.

If the access pattern switches between different addresses, this one-entry
solution is doomed.

2011-06-20 16:16:32

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH 04/15] KVM: MMU: cache mmio info on page fault path

On Mon, Jun 20, 2011 at 01:14:32PM -0300, Marcelo Tosatti wrote:
> On Tue, Jun 07, 2011 at 09:00:30PM +0800, Xiao Guangrong wrote:
> > If the page fault is caused by mmio, we can cache the mmio info; later, we do
> > not need to walk the guest page table and can quickly know it is an mmio fault while we
> > emulate the mmio instruction
> >
> > Signed-off-by: Xiao Guangrong <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 5 +++
> > arch/x86/kvm/mmu.c | 21 +++++----------
> > arch/x86/kvm/mmu.h | 23 +++++++++++++++++
> > arch/x86/kvm/paging_tmpl.h | 21 ++++++++++-----
> > arch/x86/kvm/x86.c | 52 ++++++++++++++++++++++++++++++--------
> > arch/x86/kvm/x86.h | 36 +++++++++++++++++++++++++++
> > 6 files changed, 126 insertions(+), 32 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index d167039..326af42 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -414,6 +414,11 @@ struct kvm_vcpu_arch {
> > u64 mcg_ctl;
> > u64 *mce_banks;
> >
> > + /* Cache MMIO info */
> > + u64 mmio_gva;
> > + unsigned access;
> > + gfn_t mmio_gfn;
> > +
> > /* used for guest single stepping over the given code position */
> > unsigned long singlestep_rip;
> >
>
> Why are you not implementing the original idea of caching the MMIO
> attribute of an address in the spte?
>
> That solution is wider reaching than a one-entry cache, and was proposed
> to overcome a large number of memslots.
>
> If the access pattern switches between different addresses, this one-entry
> solution is doomed.

Never mind, it's later in the series.

2011-06-20 17:26:56

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH 02/15] KVM: MMU: do not update slot bitmap if spte is nonpresent

On Tue, Jun 07, 2011 at 08:59:25PM +0800, Xiao Guangrong wrote:
> Set slot bitmap only if the spte is present
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/kvm/mmu.c | 15 +++++++--------
> 1 files changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index cda666a..125f78d 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -743,9 +743,6 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
> struct kvm_mmu_page *sp;
> unsigned long *rmapp;
>
> - if (!is_rmap_spte(*spte))
> - return 0;
> -

Not sure if this is safe: what if the spte is set as nonpresent but the
rmap is not removed?

BTW, I don't see what patch 1 and this one have to do with the goal
of the series.

2011-06-20 17:27:09

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH 10/15] KVM: MMU: lockless walking shadow page table

On Tue, Jun 07, 2011 at 09:04:34PM +0800, Xiao Guangrong wrote:
> Using rcu to protect the shadow page tables being freed, so we can safely walk them;
> it should run fast and is needed by the mmio page fault
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 4 ++
> arch/x86/kvm/mmu.c | 79 ++++++++++++++++++++++++++++++---------
> arch/x86/kvm/mmu.h | 4 +-
> arch/x86/kvm/vmx.c | 2 +-
> 4 files changed, 69 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 326af42..260582b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -232,6 +232,8 @@ struct kvm_mmu_page {
> unsigned int unsync_children;
> unsigned long parent_ptes; /* Reverse mapping for parent_pte */
> DECLARE_BITMAP(unsync_child_bitmap, 512);
> +
> + struct rcu_head rcu;
> };
>
> struct kvm_pv_mmu_op_buffer {
> @@ -478,6 +480,8 @@ struct kvm_arch {
> u64 hv_guest_os_id;
> u64 hv_hypercall;
>
> + atomic_t reader_counter;
> +
> #ifdef CONFIG_KVM_MMU_AUDIT
> int audit_point;
> #endif
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9f3a746..52d4682 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1675,6 +1675,30 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> return ret;
> }
>
> +static void free_mmu_pages_unlock_parts(struct list_head *invalid_list)
> +{
> + struct kvm_mmu_page *sp;
> +
> + list_for_each_entry(sp, invalid_list, link)
> + kvm_mmu_free_lock_parts(sp);
> +}
> +
> +static void free_invalid_pages_rcu(struct rcu_head *head)
> +{
> + struct kvm_mmu_page *next, *sp;
> +
> + sp = container_of(head, struct kvm_mmu_page, rcu);
> + while (sp) {
> + if (!list_empty(&sp->link))
> + next = list_first_entry(&sp->link,
> + struct kvm_mmu_page, link);
> + else
> + next = NULL;
> + kvm_mmu_free_unlock_parts(sp);
> + sp = next;
> + }
> +}
> +
> static void kvm_mmu_commit_zap_page(struct kvm *kvm,
> struct list_head *invalid_list)
> {
> @@ -1685,6 +1709,14 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
>
> kvm_flush_remote_tlbs(kvm);
>
> + if (atomic_read(&kvm->arch.reader_counter)) {
> + free_mmu_pages_unlock_parts(invalid_list);
> + sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
> + list_del_init(invalid_list);
> + call_rcu(&sp->rcu, free_invalid_pages_rcu);
> + return;
> + }

This is probably wrong, the caller wants the page to be zapped by the
time the function returns, not scheduled sometime in the future.

> +
> do {
> sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
> WARN_ON(!sp->role.invalid || sp->root_count);
> @@ -2601,6 +2633,35 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
> return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access);
> }
>
> +int kvm_mmu_walk_shadow_page_lockless(struct kvm_vcpu *vcpu, u64 addr,
> + u64 sptes[4])
> +{
> + struct kvm_shadow_walk_iterator iterator;
> + int nr_sptes = 0;
> +
> + rcu_read_lock();
> +
> + atomic_inc(&vcpu->kvm->arch.reader_counter);
> + /* Increase the counter before walking shadow page table */
> + smp_mb__after_atomic_inc();
> +
> + for_each_shadow_entry(vcpu, addr, iterator) {
> + sptes[iterator.level-1] = *iterator.sptep;
> + nr_sptes++;
> + if (!is_shadow_present_pte(*iterator.sptep))
> + break;
> + }

Why is lockless access needed for the MMIO optimization? Note that the spte
contents copied to the array here are used for debugging purposes
only; they are potentially stale.

2011-06-20 18:30:54

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 02/15] KVM: MMU: do not update slot bitmap if spte is nonpresent

On 06/21/2011 12:28 AM, Marcelo Tosatti wrote:
> On Tue, Jun 07, 2011 at 08:59:25PM +0800, Xiao Guangrong wrote:
>> Set slot bitmap only if the spte is present
>>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> arch/x86/kvm/mmu.c | 15 +++++++--------
>> 1 files changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index cda666a..125f78d 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -743,9 +743,6 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
>> struct kvm_mmu_page *sp;
>> unsigned long *rmapp;
>>
>> - if (!is_rmap_spte(*spte))
>> - return 0;
>> -
>
> Not sure if this is safe: what if the spte is set as nonpresent but the
> rmap is not removed?

That cannot happen: whenever we set an spte as nonpresent, we always use
drop_spte() to remove the rmap, and we also do it in set_spte().
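
For reference, drop_spte() is roughly the following (quoted from memory, so treat the exact
body as approximate):

static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte)
{
	/* Remove the reverse mapping before the spte stops being present. */
	if (is_rmap_spte(*sptep))
		rmap_remove(kvm, sptep);
	__set_spte(sptep, new_spte);
}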

>
> BTW, I don't see what patch 1 and this one have to do with the goal
> of the series.
>
>

They are the preparatory work for the mmio page fault:
- Patch 1 fixes the bug in walking the shadow page table, so we can safely use it to
 walk the shadow page table locklessly
- Patch 2 avoids adding an rmap for the mmio spte (see the sketch below) :-)
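
A minimal sketch of what patch 2 amounts to on the caller side (simplified, with a made-up
wrapper name; the real mmu_set_spte()/rmap_add() carry more logic than shown here):

static void set_spte_track(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn)
{
	/*
	 * With the is_rmap_spte() check moved out of rmap_add(), the caller
	 * only adds the reverse mapping (and, per the patch, only updates
	 * the slot bitmap) when the new spte is actually present, so
	 * nonpresent mmio sptes never reach rmap_add().
	 */
	if (is_shadow_present_pte(*sptep))
		rmap_add(vcpu, sptep, gfn);
}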

2011-06-20 18:52:31

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 10/15] KVM: MMU: lockless walking shadow page table

On 06/21/2011 12:37 AM, Marcelo Tosatti wrote:

>> + if (atomic_read(&kvm->arch.reader_counter)) {
>> + free_mmu_pages_unlock_parts(invalid_list);
>> + sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
>> + list_del_init(invalid_list);
>> + call_rcu(&sp->rcu, free_invalid_pages_rcu);
>> + return;
>> + }
>
> This is probably wrong, the caller wants the page to be zapped by the
> time the function returns, not scheduled sometime in the future.
>

They will be freed soon, and KVM does not reuse these pages anymore...
it is not too bad, no?

>> +
>> do {
>> sp = list_first_entry(invalid_list, struct kvm_mmu_page, link);
>> WARN_ON(!sp->role.invalid || sp->root_count);
>> @@ -2601,6 +2633,35 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gva_t vaddr,
>> return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access);
>> }
>>
>> +int kvm_mmu_walk_shadow_page_lockless(struct kvm_vcpu *vcpu, u64 addr,
>> + u64 sptes[4])
>> +{
>> + struct kvm_shadow_walk_iterator iterator;
>> + int nr_sptes = 0;
>> +
>> + rcu_read_lock();
>> +
>> + atomic_inc(&vcpu->kvm->arch.reader_counter);
>> + /* Increase the counter before walking shadow page table */
>> + smp_mb__after_atomic_inc();
>> +
>> + for_each_shadow_entry(vcpu, addr, iterator) {
>> + sptes[iterator.level-1] = *iterator.sptep;
>> + nr_sptes++;
>> + if (!is_shadow_present_pte(*iterator.sptep))
>> + break;
>> + }
>
> Why is lockless access needed for the MMIO optimization? Note that the spte
> contents copied to the array here are used for debugging purposes
> only; they are potentially stale.
>

Um, we can use it to check whether the mmio page fault is a real mmio access or a
KVM bug; I discussed it with Avi:

===============================================
>
> Yes, it is. I just want to detect KVM bugs: it helps us to know whether an "ept misconfig" is a
> real MMIO access or a bug. I noticed some "ept misconfig" bugs were reported before, so I think doing
> this is necessary, and I think it is not too bad, since walking the spte hierarchy is lockless,
> it is really fast.

Okay. We can later see if it shows up in profiles.
===============================================

And it is really fast; I will attach the 'perf result' when v2 is posted.

Yes, their contents are potentially stale, but we only use them to check for mmio; after all, if we read a
stale spte, we will go through the page fault path to fix it.
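
To spell out how a stale read is tolerated, here is a sketch of the classification step
(quickly_check_mmio_pf() and kvm_mmu_walk_shadow_page_lockless() are from the posted patches;
is_mmio_spte() stands in for the reserved-bit test, and the return-code convention is only
illustrative):

static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr,
				  u32 error_code, bool direct)
{
	u64 sptes[4];
	int nr;

	/* Fast path: the last mmio access already cached this gva/gpa. */
	if (quickly_check_mmio_pf(vcpu, addr, direct))
		return 1;			/* emulate as mmio */

	/* Copy the spte chain without mmu_lock; the copy may be stale. */
	nr = kvm_mmu_walk_shadow_page_lockless(vcpu, addr, sptes);
	if (nr && is_mmio_spte(sptes[nr - 1]))
		return 1;			/* emulate as mmio */

	/*
	 * Anything else (missing, stale, or an ordinary mapping) falls back
	 * to the regular page fault path, which runs under mmu_lock and
	 * repairs the mapping if needed.
	 */
	return 0;
}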