Subject: Re: [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from
 kvm_mmu_get_page() to link_shadow_page()
To: Paolo Bonzini <pbonzini@redhat.com>
References: <20151112204849.ba920599a8426d7196a0df73@lab.ntt.co.jp>
 <20151112205656.33f23effece7e6914c3e2646@lab.ntt.co.jp>
 <5644A1CE.1040906@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
From: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Message-ID: <564547AA.20002@lab.ntt.co.jp>
Date: Fri, 13 Nov 2015 11:15:06 +0900
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101
 Thunderbird/38.3.0
MIME-Version: 1.0
In-Reply-To: <5644A1CE.1040906@redhat.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4349
Lines: 133

On 2015/11/12 23:27, Paolo Bonzini wrote:

> On 12/11/2015 12:56, Takuya Yoshikawa wrote:
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index 9d21b44..f414ca6 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>>   			goto out_gpte_changed;
>>
>>   		if (sp)
>> -			link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK);
>> +			link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK);
>>   	}
>>
>
> Here I think you can remove completely the
>
> 	if (sp)
> 		kvm_mmu_put_page(sp, it.sptep);
>
> later in FNAME(fetch).  Apart from this nit, it's okay.

Yes, that's what this patch does below:

> @@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
>  	return emulate;
>
>  out_gpte_changed:
> -	if (sp)
> -		kvm_mmu_put_page(sp, it.sptep);
>  	kvm_release_pfn_clean(pfn);
>  	return 0;
>  }

Since this is the only user of kvm_mmu_put_page(), it also removes
the definition:

> @@ -2268,11 +2268,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
>  		mmu_page_zap_pte(kvm, sp, sp->spt + i);
>  }
>
> -static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
> -{
> -	mmu_page_remove_parent_pte(sp, parent_pte);
> -}
> -
>  static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
>  {
>  	u64 *sptep;

Actually, I don't understand why this is named kvm_mmu_put_page() for
just removing parent_pte pointer from the sp->parent_ptes pointer chain.


> On to kvm_mmu_get_page...
>
>          if (!direct) {
>                  if (rmap_write_protect(vcpu, gfn))
>                          kvm_flush_remote_tlbs(vcpu->kvm);
>                  if (level > PT_PAGE_TABLE_LEVEL && need_sync)
>                          kvm_sync_pages(vcpu, gfn);
>
> This seems fishy.
>
> need_sync is set if sp->unsync, but then the parents have not been
> unsynced yet.

Reaching here means that kvm_mmu_get_page() could not return sp
from inside the for_each_gfn_sp() loop above, so even without
this patch, mark_unsync() has not been called.

Here, sp holds the new page allocated by kvm_mmu_alloc_page().
One confusing thing is that hlist_add_head() right before this
"if (!direct)" line has already added the new sp to the hash
list, so it will be found by for_each_gfn_indirect_valid_sp()
in kvm_sync_pages().

Because this sp is new and sp->unsync is not set,  kvm_sync_pages()
will just skip it and look for other sp's whose ->unsync were found
to be set in the for_each_gfn_sp() loop.

I'm not 100% sure if the existence of the parent_pte pointer in the
newly created sp->parent_ptes chain alone makes any difference:
> @@ -2127,7 +2122,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
>  	sp = kvm_mmu_alloc_page(vcpu, direct);
>
>  	sp->parent_ptes.val = 0;
> -	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
>
>  	sp->gfn = gfn;
>  	sp->role = role;


> On the other hand, all calls to kvm_mmu_get_page except for the
> roots are followed by link_shadow_page...  Perhaps if parent_pte != NULL
> you can call link_shadow_page directly from kvm_mmu_get_page.  The call
> would go before the "if (!direct)" and it would subsume all the existing
> calls.
>
> We could probably also warn if
>
> 	(parent_pte == NULL)
> 		!= (level == vcpu->arch.mmu.root_level)
>
> in kvm_mmu_get_page.

I think we should set the spte after init_shadow_page_table(), and
to make this subsume all the existing calls, we need to change the
"return sp;" in the for_each_gfn_sp() loop to a goto statement so
that the end of this function will become something like this:

     init_shadow_page(sp);
out:
     if (parent_pte) {
         mmu_page_add_parent_pte(vcpu, sp, parent_pte);
         link_shadow_page(parent_pte, sp, accessed);
     }
     trace_kvm_mmu_get_page(sp, created);
     return sp;

So, "bool accessed" needs to be passed to kvm_mmu_get_page().
But any way, we need to understand if mmu_page_add_parent_pte()
really needs to be placed before the "if (!direct)" block.

   Takuya


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/