The following kvm_flush_remote_tlbs() call already issues an smp_mb()
internally, so remove the smp_mb() here.
Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a54ecd9..6315416 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2383,12 +2383,6 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
return;
/*
- * wmb: make sure everyone sees our modifications to the page tables
- * rmb: make sure we see changes to vcpu->mode
- */
- smp_mb();
-
- /*
* Wait for all vcpus to exit guest mode and/or lockless shadow
* page table walks.
*/
--
1.8.4.rc0.1.g8f6a3e5.dirty
On Fri, 4 Mar 2016, Lan Tianyu wrote:
> The following kvm_flush_remote_tlbs() will call smp_mb() inside and so
> remove smp_mb() here.
>
> Signed-off-by: Lan Tianyu <[email protected]>
> ---
> arch/x86/kvm/mmu.c | 6 ------
> 1 file changed, 6 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index a54ecd9..6315416 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2383,12 +2383,6 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
> return;
>
> /*
> - * wmb: make sure everyone sees our modifications to the page tables
> - * rmb: make sure we see changes to vcpu->mode
You want to leave the comment explaining the memory barriers and say that
kvm_flush_remote_tlbs() contains the smp_mb().
> - */
> - smp_mb();
> -
> - /*
> * Wait for all vcpus to exit guest mode and/or lockless shadow
> * page table walks.
> */
> --
> 1.8.4.rc0.1.g8f6a3e5.dirty
>
>
On 2016-03-04 15:21, Thomas Gleixner wrote:
> On Fri, 4 Mar 2016, Lan Tianyu wrote:
>
>> The following kvm_flush_remote_tlbs() will call smp_mb() inside and so
>> remove smp_mb() here.
>>
>> Signed-off-by: Lan Tianyu <[email protected]>
>> ---
>> arch/x86/kvm/mmu.c | 6 ------
>> 1 file changed, 6 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index a54ecd9..6315416 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2383,12 +2383,6 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
>> return;
>>
>> /*
>> - * wmb: make sure everyone sees our modifications to the page tables
>> - * rmb: make sure we see changes to vcpu->mode
>
> You want to leave the comment explaining the memory barriers and tell that
> kvm_flush_remote_tlbs() contains the smp_mb().
That sounds more reasonable. Will update. Thanks.
>
>> - */
>> - smp_mb();
>> -
>> - /*
>> * Wait for all vcpus to exit guest mode and/or lockless shadow
>> * page table walks.
>> */
>> --
>> 1.8.4.rc0.1.g8f6a3e5.dirty
>>
>>
--
Best regards
Tianyu Lan
On 04/03/2016 02:35, Lan Tianyu wrote:
> The following kvm_flush_remote_tlbs() will call smp_mb() inside and so
> remove smp_mb() here.
>
> Signed-off-by: Lan Tianyu <[email protected]>
> ---
> arch/x86/kvm/mmu.c | 6 ------
> 1 file changed, 6 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index a54ecd9..6315416 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2383,12 +2383,6 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
> return;
>
> /*
> - * wmb: make sure everyone sees our modifications to the page tables
> - * rmb: make sure we see changes to vcpu->mode
> - */
> - smp_mb();
> -
> - /*
> * Wait for all vcpus to exit guest mode and/or lockless shadow
> * page table walks.
> */
>
kvm_flush_remote_tlbs loads kvm->tlbs_dirty before issuing the memory
barrier. I think it's okay if the load is done earlier, but I'd like
Guangrong to take a look.
In any case, this patch is not going to be included in 4.6. I'll queue
it directly for 4.7.
Paolo
Looks good, but I'd like Xiao to take a look.
On 04/03/2016 08:12, Lan Tianyu wrote:
> > > /*
> > > - * wmb: make sure everyone sees our modifications to the page tables
> > > - * rmb: make sure we see changes to vcpu->mode
> >
> > You want to leave the comment explaining the memory barriers and tell that
> > kvm_flush_remote_tlbs() contains the smp_mb().
>
> That sounds more reasonable. Will update. Thanks.
In fact, the reason for kvm_flush_remote_tlbs()'s barrier is exactly
what was in this comment. So you can:
1) add a comment to kvm_flush_remote_tlbs like:
/*
* We want to publish modifications to the page tables before reading
* mode. Pairs with a memory barrier in arch-specific code.
* - x86: smp_mb__after_srcu_read_unlock in vcpu_enter_guest.
* - powerpc: smp_mb in kvmppc_prepare_to_enter.
*/
2) add a comment to vcpu_enter_guest and kvmppc_prepare_to_enter, saying
that the memory barrier also orders the write to mode from any reads
to the page tables done while the VCPU is running. In other words, on
entry a single memory barrier achieves two purposes (write ->mode before
reading requests, write ->mode before reading page tables).
The same should be true in kvm_flush_remote_tlbs(). So you may investigate
removing the barrier from kvm_flush_remote_tlbs, because
kvm_make_all_cpus_request already includes a memory barrier. Like
Thomas suggested, leave a comment in kvm_flush_remote_tlbs(),
saying which memory barrier you are relying on and for what.
And finally, the memory barrier in kvm_make_all_cpus_request can become
smp_mb__after_atomic, which is free on x86.
Of course, all this should be done in at least three separate patches.
Thanks!
Paolo
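To make the shape of this suggestion concrete, here is a rough user-space sketch; the struct and helper names are invented for illustration, and C11 atomics stand in for the kernel barrier primitives (this is not the real KVM code):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define REQ_TLB_FLUSH 0

struct mock_kvm {
    _Atomic long tlbs_dirty;
    _Atomic unsigned long requests;   /* one bit per pending request */
    unsigned long remote_tlb_flush;   /* stat counter */
};

/* Stand-in for kvm_make_all_cpus_request(): the seq_cst fetch_or
 * mirrors the smp_mb() the real helper already contains, so callers
 * get a full barrier for free.  Returns true if the bit was newly set. */
bool make_all_cpus_request(struct mock_kvm *kvm, int req)
{
    unsigned long old = atomic_fetch_or(&kvm->requests, 1UL << req);
    return !(old & (1UL << req));
}

void flush_remote_tlbs(struct mock_kvm *kvm)
{
    /* Read tlbs_dirty before making the flush request; the barrier
     * inside make_all_cpus_request() orders this load before the
     * request bit becomes visible, so no extra smp_mb() is needed. */
    long dirty_count = atomic_load_explicit(&kvm->tlbs_dirty,
                                            memory_order_relaxed);

    if (make_all_cpus_request(kvm, REQ_TLB_FLUSH))
        ++kvm->remote_tlb_flush;

    /* Clear tlbs_dirty only if nobody bumped it in the meantime. */
    atomic_compare_exchange_strong(&kvm->tlbs_dirty, &dirty_count, 0);
}
```

The point is only that the full barrier inside the request helper already covers the ordering, so kvm_flush_remote_tlbs() would not need a standalone smp_mb().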
On 03/04/2016 04:04 PM, Paolo Bonzini wrote:
> On 04/03/2016 02:35, Lan Tianyu wrote:
>> The following kvm_flush_remote_tlbs() will call smp_mb() inside and so
>> remove smp_mb() here.
>>
>> Signed-off-by: Lan Tianyu <[email protected]>
>> ---
>> arch/x86/kvm/mmu.c | 6 ------
>> 1 file changed, 6 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index a54ecd9..6315416 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2383,12 +2383,6 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
>> return;
>>
>> /*
>> - * wmb: make sure everyone sees our modifications to the page tables
>> - * rmb: make sure we see changes to vcpu->mode
>> - */
>> - smp_mb();
>> -
>> - /*
>> * Wait for all vcpus to exit guest mode and/or lockless shadow
>> * page table walks.
>> */
>>
>
> kvm_flush_remote_tlbs loads kvm->tlbs_dirty before issuing the memory
> barrier. I think it's okay if the load is done earlier, but I'd like
> Guangrong to take a look.
Yes, it looks good to me, but please keep the comment.
Thanks!
On 2016-03-04 16:49, Paolo Bonzini wrote:
> On 04/03/2016 08:12, Lan Tianyu wrote:
>>>> /*
>>>> - * wmb: make sure everyone sees our modifications to the page tables
>>>> - * rmb: make sure we see changes to vcpu->mode
>>>
>>> You want to leave the comment explaining the memory barriers and tell that
>>> kvm_flush_remote_tlbs() contains the smp_mb().
>>
>> That sounds more reasonable. Will update. Thanks.
>
> In fact, the reason for kvm_flush_remote_tlbs()'s barrier is exactly
> what was in this comment.
Hi Paolo:
Thanks for your comments.
Here is a summary of the smp_mb()s we have discussed in this thread. If I
have misunderstood anything, please correct me. Thanks.
The smp_mb() in kvm_flush_remote_tlbs() was introduced by commit
a4ee1ca4 and seems to order the read and the cmpxchg of
kvm->tlbs_dirty.
commit a4ee1ca4a36e7857d90ae8c2b85f1bde9a042c10
Author: Xiao Guangrong <[email protected]>
Date: Tue Nov 23 11:13:00 2010 +0800
KVM: MMU: delay flush all tlbs on sync_page path
Quote from Avi:
| I don't think we need to flush immediately; set a "tlb dirty" bit somewhere
| that is cleareded when we flush the tlb. kvm_mmu_notifier_invalidate_page()
| can consult the bit and force a flush if set.
Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
The smp_mb() in the kvm_mmu_commit_zap_page() was introduced by commit
c142786c6 which was merged later than commit a4ee1ca4. It pairs with
smp_mb() in the walk_shadow_page_lockless_begin/end() to keep order
between modifications of the page tables and reading mode.
commit c142786c6291189b5c85f53d91743e1eefbd8fe0
Author: Avi Kivity <[email protected]>
Date: Mon May 14 15:44:06 2012 +0300
KVM: MMU: Don't use RCU for lockless shadow walking
Using RCU for lockless shadow walking can increase the amount of memory
in use by the system, since RCU grace periods are unpredictable. We also
have an unconditional write to a shared variable (reader_counter), which
isn't good for scaling.
Replace that with a scheme similar to x86's get_user_pages_fast(): disable
interrupts during lockless shadow walk to force the freer
(kvm_mmu_commit_zap_page()) to wait for the TLB flush IPI to find the
processor with interrupts enabled.
We also add a new vcpu->mode, READING_SHADOW_PAGE_TABLES, to prevent
kvm_flush_remote_tlbs() from avoiding the IPI.
Signed-off-by: Avi Kivity <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
The smp_mb() in the kvm_make_all_cpus_request() was introduced by commit
6b7e2d09. It keeps order between setting request bit and reading mode.
commit 6b7e2d0991489559a1df4500d77f7b76c4607ed0
Author: Xiao Guangrong <[email protected]>
Date: Wed Jan 12 15:40:31 2011 +0800
KVM: Add "exiting guest mode" state
Currently we keep track of only two states: guest mode and host
mode. This patch adds an "exiting guest mode" state that tells
us that an IPI will happen soon, so unless we need to wait for the
IPI, we can avoid it completely.
Also
1: No need atomically to read/write ->mode in vcpu's thread
2: reorganize struct kvm_vcpu to make ->mode and ->requests
in the same cache line explicitly
Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
> So you can:
>
> 1) add a comment to kvm_flush_remote_tlbs like:
>
> /*
> * We want to publish modifications to the page tables before reading
> * mode. Pairs with a memory barrier in arch-specific code.
> * - x86: smp_mb__after_srcu_read_unlock in vcpu_enter_guest.
> * - powerpc: smp_mb in kvmppc_prepare_to_enter.
> */
>
> 2) add a comment to vcpu_enter_guest and kvmppc_prepare_to_enter, saying
> that the memory barrier also orders the write to mode from any reads
> to the page tables done while the VCPU is running. In other words, on
> entry a single memory barrier achieves two purposes (write ->mode before
> reading requests, write ->mode before reading page tables).
These sound good.
>
> The same should be true in kvm_flush_remote_tlbs(). So you may investigate
> removing the barrier from kvm_flush_remote_tlbs, because
> kvm_make_all_cpus_request already includes a memory barrier. Like
> Thomas suggested, leave a comment in kvm_flush_remote_tlbs(),
> saying which memory barrier you are relying on and for what.
If we remove the smp_mb() in the kvm_flush_remote_tlbs(), we need to
leave comments both in the kvm_flush_remote_tlbs() and
kvm_mmu_commit_zap_page(), right?
>
> And finally, the memory barrier in kvm_make_all_cpus_request can become
> smp_mb__after_atomic, which is free on x86.
I found you have done this in your tree :)
commit 5b06b60b72b73cbe3805d2a64d223e0264bd0479
Author: Paolo Bonzini <[email protected]>
Date: Wed Feb 24 12:44:17 2016 +0100
KVM: mark memory barrier with smp_mb__after_atomic
Signed-off-by: Paolo Bonzini <[email protected]>
>
> Of course, all this should be done in at least three separate patches.
>
> Thanks!
>
> Paolo
>
--
Best regards
Tianyu Lan
On 08/03/2016 09:36, Lan Tianyu wrote:
> Summary about smp_mb()s we met in this thread. If misunderstood, please
> correct me. Thanks.
>
> The smp_mb() in the kvm_flush_remote_tlbs() was introduced by the commit
> a4ee1ca4 and it seems to keep the order of reading and cmpxchg
> kvm->tlbs_dirty.
>
> Quote from Avi:
> | I don't think we need to flush immediately; set a "tlb dirty" bit somewhere
> | that is cleareded when we flush the tlb. kvm_mmu_notifier_invalidate_page()
> | can consult the bit and force a flush if set.
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> Signed-off-by: Marcelo Tosatti <[email protected]>
Unfortunately that patch added a bad memory barrier: 1) it lacks a
comment; 2) it lacks obvious pairing; 3) it is an smp_mb() after a read,
so it's not even obvious that this memory barrier has to do with the
immediately preceding read of kvm->tlbs_dirty. It also is not
documented in Documentation/virtual/kvm/mmu.txt (Guangrong documented
there most of his other work, back in 2013, but not this one :)).
The cmpxchg is ordered anyway against the read, because 1) x86 has
implicit ordering between earlier loads and later stores; 2) even
store-load barriers are unnecessary for accesses to the same variable
(in this case kvm->tlbs_dirty).
So offhand, I cannot say what it orders. There are four possibilities:
1) it orders the read of tlbs_dirty with the read of mode. In this
case, a smp_rmb() would have been enough, and it's not clear where is
the matching smp_wmb().
2) it orders the read of tlbs_dirty with the KVM_REQ_TLB_FLUSH request.
In this case a smp_load_acquire would be better.
3) it does the same as kvm_mmu_commit_zap_page's smp_mb() but for other
callers of kvm_flush_remote_tlbs(). In this case, we know what's the
matching memory barrier (walk_shadow_page_lockless_*).
4) it is completely unnecessary.
My guess is either (2) or (3), but I am not sure. We know that
anticipating kvm->tlbs_dirty should be safe; worst case, it causes the
cmpxchg to fail and an extra TLB flush on the next call to the MMU
notifier. But I'm not sure of what happens if the processor moves the
read later.
> The smp_mb() in the kvm_mmu_commit_zap_page() was introduced by commit
> c142786c6 which was merged later than commit a4ee1ca4. It pairs with
> smp_mb() in the walk_shadow_page_lockless_begin/end() to keep order
> between modifications of the page tables and reading mode.
Yes; it also pairs with the smp_mb__after_srcu_unlock() in vcpu_enter_guest.
> The smp_mb() in the kvm_make_all_cpus_request() was introduced by commit
> 6b7e2d09. It keeps order between setting request bit and reading mode.
Yes.
>> So you can:
>>
>> 1) add a comment to kvm_flush_remote_tlbs like:
>>
>> /*
>> * We want to publish modifications to the page tables before reading
>> * mode. Pairs with a memory barrier in arch-specific code.
>> * - x86: smp_mb__after_srcu_read_unlock in vcpu_enter_guest.
>> * - powerpc: smp_mb in kvmppc_prepare_to_enter.
>> */
>>
>> 2) add a comment to vcpu_enter_guest and kvmppc_prepare_to_enter, saying
>> that the memory barrier also orders the write to mode from any reads
>> to the page tables done while the VCPU is running. In other words, on
>> entry a single memory barrier achieves two purposes (write ->mode before
>> reading requests, write ->mode before reading page tables).
>
> These sounds good.
>
>> The same should be true in kvm_flush_remote_tlbs(). So you may investigate
>> removing the barrier from kvm_flush_remote_tlbs, because
>> kvm_make_all_cpus_request already includes a memory barrier. Like
>> Thomas suggested, leave a comment in kvm_flush_remote_tlbs(),
>> saying which memory barrier you are relying on and for what.
>
> If we remove the smp_mb() in the kvm_flush_remote_tlbs(), we need to
> leave comments both in the kvm_flush_remote_tlbs() and
> kvm_mmu_commit_zap_page(), right?
Yes. In fact, instead of removing it, I would change it to
smp_mb__before_atomic();
with a comment that points to the addition of the barrier in commit
a4ee1ca4. Unless Guangrong can enlighten us. :)
>>
>> And finally, the memory barrier in kvm_make_all_cpus_request can become
>> smp_mb__after_atomic, which is free on x86.
>
> I found you have done this in your tree :)
>
> commit 5b06b60b72b73cbe3805d2a64d223e0264bd0479
> Author: Paolo Bonzini <[email protected]>
> Date: Wed Feb 24 12:44:17 2016 +0100
>
> KVM: mark memory barrier with smp_mb__after_atomic
>
> Signed-off-by: Paolo Bonzini <[email protected]>
Yeah, but I reverted it because it makes sense to do it together with
your patch.
Paolo
On 2016-03-08 23:27, Paolo Bonzini wrote:
> Unfortunately that patch added a bad memory barrier: 1) it lacks a
> comment; 2) it lacks obvious pairing; 3) it is an smp_mb() after a read,
> so it's not even obvious that this memory barrier has to do with the
> immediately preceding read of kvm->tlbs_dirty. It also is not
> documented in Documentation/virtual/kvm/mmu.txt (Guangrong documented
> there most of his other work, back in 2013, but not this one :)).
>
> The cmpxchg is ordered anyway against the read, because 1) x86 has
> implicit ordering between earlier loads and later stores; 2) even
> store-load barriers are unnecessary for accesses to the same variable
> (in this case kvm->tlbs_dirty).
>
> So offhand, I cannot say what it orders. There are four possibilities:
>
> 1) it orders the read of tlbs_dirty with the read of mode. In this
> case, a smp_rmb() would have been enough, and it's not clear where is
> the matching smp_wmb().
>
> 2) it orders the read of tlbs_dirty with the KVM_REQ_TLB_FLUSH request.
> In this case a smp_load_acquire would be better.
>
> 3) it does the same as kvm_mmu_commit_zap_page's smp_mb() but for other
> callers of kvm_flush_remote_tlbs(). In this case, we know what's the
> matching memory barrier (walk_shadow_page_lockless_*).
>
> 4) it is completely unnecessary.
>
> My guess is either (2) or (3), but I am not sure. We know that
> anticipating kvm->tlbs_dirty should be safe; worst case, it causes the
> cmpxchg to fail and an extra TLB flush on the next call to the MMU
> notifier. But I'm not sure of what happens if the processor moves the
> read later.
I found that the smp_mb() in kvm_mmu_commit_zap_page() was removed by
commit 5befdc38 and that the removal was reverted by commit a086f6a1e.
The reason for both the removal and the revert was whether
kvm_flush_remote_tlbs() is called under mmu_lock or not.
Accesses to mode and requests are not always under mmu_lock, so I think
the smp_mb() was not related to mode and requests when it was introduced.
>
>> > The smp_mb() in the kvm_mmu_commit_zap_page() was introduced by commit
>> > c142786c6 which was merged later than commit a4ee1ca4. It pairs with
>> > smp_mb() in the walk_shadow_page_lockless_begin/end() to keep order
>> > between modifications of the page tables and reading mode.
> Yes; it also pairs with the smp_mb__after_srcu_unlock() in vcpu_enter_guest.
>
>> > The smp_mb() in the kvm_make_all_cpus_request() was introduced by commit
>> > 6b7e2d09. It keeps order between setting request bit and reading mode.
> Yes.
>
>>> >> So you can:
>>> >>
>>> >> 1) add a comment to kvm_flush_remote_tlbs like:
>>> >>
>>> >> /*
>>> >> * We want to publish modifications to the page tables before reading
>>> >> * mode. Pairs with a memory barrier in arch-specific code.
>>> >> * - x86: smp_mb__after_srcu_read_unlock in vcpu_enter_guest.
>>> >> * - powerpc: smp_mb in kvmppc_prepare_to_enter.
>>> >> */
>>> >>
>>> >> 2) add a comment to vcpu_enter_guest and kvmppc_prepare_to_enter, saying
>>> >> that the memory barrier also orders the write to mode from any reads
>>> >> to the page tables done while the VCPU is running. In other words, on
>>> >> entry a single memory barrier achieves two purposes (write ->mode before
>>> >> reading requests, write ->mode before reading page tables).
>> >
>> > These sounds good.
>> >
>>> >> The same should be true in kvm_flush_remote_tlbs(). So you may investigate
>>> >> removing the barrier from kvm_flush_remote_tlbs, because
>>> >> kvm_make_all_cpus_request already includes a memory barrier. Like
>>> >> Thomas suggested, leave a comment in kvm_flush_remote_tlbs(),
>>> >> saying which memory barrier you are relying on and for what.
>> >
>> > If we remove the smp_mb() in the kvm_flush_remote_tlbs(), we need to
>> > leave comments both in the kvm_flush_remote_tlbs() and
>> > kvm_mmu_commit_zap_page(), right?
> Yes. In fact, instead of removing it, I would change it to
>
> smp_mb__before_atomic();
>
> with a comment that points to the addition of the barrier in commit
> a4ee1ca4. Unless Guangrong can enlighten us. :)
>
How about the following comments?
Log for kvm_mmu_commit_zap_page()
/*
* We need to make sure everyone sees our modifications to
* the page tables and see changes to vcpu->mode here. The
* barrier in the kvm_flush_remote_tlbs() helps us to achieve
* these. Otherwise, wait for all vcpus to exit guest mode
* and/or lockless shadow page table walks.
*/
kvm_flush_remote_tlbs(kvm);
Log for kvm_flush_remote_tlbs()
/*
* We want to publish modifications to the page tables before
* reading mode. Pairs with a memory barrier in arch-specific
* code.
* - x86: smp_mb__after_srcu_read_unlock in vcpu_enter_guest.
* - powerpc: smp_mb in kvmppc_prepare_to_enter.
*/
smp_mb__before_atomic();
>>> >>
>>> >> And finally, the memory barrier in kvm_make_all_cpus_request can become
>>> >> smp_mb__after_atomic, which is free on x86.
>> >
>> > I found you have done this in your tree :)
>> >
>> > commit 5b06b60b72b73cbe3805d2a64d223e0264bd0479
>> > Author: Paolo Bonzini <[email protected]>
>> > Date: Wed Feb 24 12:44:17 2016 +0100
>> >
>> > KVM: mark memory barrier with smp_mb__after_atomic
>> >
>> > Signed-off-by: Paolo Bonzini <[email protected]>
> Yeah, but I reverted it because it makes sense to do it together with
> your patch.
>
> Paolo
--
Best regards
Tianyu Lan
On 03/08/2016 11:27 PM, Paolo Bonzini wrote:
>
>
> On 08/03/2016 09:36, Lan Tianyu wrote:
>> Summary about smp_mb()s we met in this thread. If misunderstood, please
>> correct me. Thanks.
>>
>> The smp_mb() in the kvm_flush_remote_tlbs() was introduced by the commit
>> a4ee1ca4 and it seems to keep the order of reading and cmpxchg
>> kvm->tlbs_dirty.
>>
>> Quote from Avi:
>> | I don't think we need to flush immediately; set a "tlb dirty" bit somewhere
>> | that is cleareded when we flush the tlb. kvm_mmu_notifier_invalidate_page()
>> | can consult the bit and force a flush if set.
>>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> Signed-off-by: Marcelo Tosatti <[email protected]>
>
> Unfortunately that patch added a bad memory barrier: 1) it lacks a
> comment; 2) it lacks obvious pairing; 3) it is an smp_mb() after a read,
> so it's not even obvious that this memory barrier has to do with the
> immediately preceding read of kvm->tlbs_dirty. It also is not
> documented in Documentation/virtual/kvm/mmu.txt (Guangrong documented
> there most of his other work, back in 2013, but not this one :)).
>
> The cmpxchg is ordered anyway against the read, because 1) x86 has
> implicit ordering between earlier loads and later stores; 2) even
> store-load barriers are unnecessary for accesses to the same variable
> (in this case kvm->tlbs_dirty).
>
> So offhand, I cannot say what it orders. There are four possibilities:
>
> 1) it orders the read of tlbs_dirty with the read of mode. In this
> case, a smp_rmb() would have been enough, and it's not clear where is
> the matching smp_wmb().
>
> 2) it orders the read of tlbs_dirty with the KVM_REQ_TLB_FLUSH request.
> In this case a smp_load_acquire would be better.
>
> 3) it does the same as kvm_mmu_commit_zap_page's smp_mb() but for other
> callers of kvm_flush_remote_tlbs(). In this case, we know what's the
> matching memory barrier (walk_shadow_page_lockless_*).
>
> 4) it is completely unnecessary.
Sorry, memory barriers were missing in sync_page(); this diff should fix it:
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 91e939b..4cad57f 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -948,6 +948,12 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
return -EINVAL;
if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
+ /*
+ * update spte before increasing tlbs_dirty to make sure no tlb
+ * flush is lost after spte is zapped, see the comments in
+ * kvm_flush_remote_tlbs().
+ */
+ smp_wmb();
vcpu->kvm->tlbs_dirty++;
continue;
}
@@ -963,6 +969,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
if (gfn != sp->gfns[i]) {
drop_spte(vcpu->kvm, &sp->spt[i]);
+ /* the same as above where we are doing prefetch_invalid_gpte(). */
+ smp_wmb();
vcpu->kvm->tlbs_dirty++;
continue;
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 314c777..82c86ea 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -193,7 +193,12 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
{
long dirty_count = kvm->tlbs_dirty;
+ /*
+ * read tlbs_dirty before doing tlb flush to make sure no tlb request is
+ * lost.
+ */
smp_mb();
+
if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
++kvm->stat.remote_tlb_flush;
cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
Any comment?
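For what it's worth, the pairing can be sketched in user space like this (names invented; a C11 release fence stands in for smp_wmb() and an acquire load for the reader side -- a sketch of the idea, not the kernel code):

```c
#include <assert.h>
#include <stdatomic.h>

struct mock_kvm {
    _Atomic long spte;          /* stand-in for one shadow PTE */
    _Atomic long tlbs_dirty;
};

/* Writer side, as in FNAME(sync_page): zap the spte first, then
 * publish tlbs_dirty.  The release fence plays the role of smp_wmb():
 * a flusher that sees the incremented tlbs_dirty also sees the zapped
 * spte, so the deferred TLB flush cannot be lost. */
void zap_and_mark_dirty(struct mock_kvm *kvm)
{
    atomic_store_explicit(&kvm->spte, 0, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* smp_wmb() */
    atomic_fetch_add_explicit(&kvm->tlbs_dirty, 1, memory_order_relaxed);
}

/* Flusher side, as in kvm_flush_remote_tlbs(): read tlbs_dirty before
 * doing the flush, and clear it only if it did not change meanwhile. */
long flush(struct mock_kvm *kvm)
{
    long dirty = atomic_load_explicit(&kvm->tlbs_dirty,
                                      memory_order_acquire);
    /* ... the remote TLB flush would happen here ... */
    atomic_compare_exchange_strong(&kvm->tlbs_dirty, &dirty, 0);
    return dirty;
}
```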
On 10/03/2016 15:40, Xiao Guangrong wrote:
> long dirty_count = kvm->tlbs_dirty;
>
> + /*
> + * read tlbs_dirty before doing tlb flush to make sure no tlb request is
> + * lost.
> + */
> smp_mb();
> +
> if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
> ++kvm->stat.remote_tlb_flush;
> cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
>
>
> Any comment?
Compared to smp_load_acquire(), smp_mb() adds an ordering between stores
and loads.
The load of kvm->tlbs_dirty should then be
/*
* Read tlbs_dirty before setting KVM_REQ_TLB_FLUSH in
* kvm_make_all_cpus_request.
*/
long dirty_count = smp_load_acquire(&kvm->tlbs_dirty);
Tianyu, I think Xiao provided the information that I was missing. Would
you like to prepare the patch?
Thanks,
Paolo
On 10/03/2016 16:26, Paolo Bonzini wrote:
> Compared to smp_load_acquire(), smp_mb() adds an ordering between stores
> and loads.
Here, the ordering is load-store, hence...
> The load of kvm->tlbs_dirty should then be
>
> /*
> * Read tlbs_dirty before setting KVM_REQ_TLB_FLUSH in
> * kvm_make_all_cpus_request.
> */
> long dirty_count = smp_load_acquire(&kvm->tlbs_dirty);
>
> Tianyu, I think Xiao provided the information that I was missing. Would
> you like to prepare the patch?
Thanks,
Paolo
On 03/10/2016 11:31 PM, Paolo Bonzini wrote:
>
>
> On 10/03/2016 16:26, Paolo Bonzini wrote:
>> Compared to smp_load_acquire(), smp_mb() adds an ordering between stores
>> and loads.
>
> Here, the ordering is load-store, hence...
Yes, this is why I put smp_mb() in the code. :)
On 10/03/2016 16:45, Xiao Guangrong wrote:
>>
>>> Compared to smp_load_acquire(), smp_mb() adds an ordering between stores
>>> and loads.
>>
>> Here, the ordering is load-store, hence...
>
> Yes, this is why i put smp_mb() in the code. :)
Here is a table of barriers:
          '. after |                   |
    before    '.   | load              | store
    _______________|___________________|________________________
                   |                   |
                   | smp_rmb           | smp_load_acquire
    load           | smp_load_acquire  | smp_store_release    XX
                   | smp_mb            | smp_mb
    _______________|___________________|________________________
                   |                   |
                   |                   | smp_wmb
    store          | smp_mb            | smp_store_release
                   |                   | smp_mb
                   |                   |
Your case is the one marked with XX, so a smp_load_acquire() is
enough---and it's preferable, because it's cheaper than smp_mb() and
more self-documenting.
Paolo
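In user-space C11 terms, the XX cell corresponds roughly to an acquire load (the function below is hypothetical, an illustration of the table rather than KVM code):

```c
#include <assert.h>
#include <stdatomic.h>

/* Load-store ordering: an acquire load may not be reordered with any
 * later load or store, which is exactly what the flush path needs
 * (read tlbs_dirty before the request bit is set).  A full smp_mb()
 * would additionally order earlier stores against later loads, which
 * nothing here requires. */
long read_then_request(_Atomic long *tlbs_dirty,
                       _Atomic unsigned long *requests, int req)
{
    long dirty = atomic_load_explicit(tlbs_dirty, memory_order_acquire);
    atomic_fetch_or_explicit(requests, 1UL << req, memory_order_relaxed);
    return dirty;
}
```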
On 09/03/2016 08:18, Lan Tianyu wrote:
> How about the following comments.
>
> Log for kvm_mmu_commit_zap_page()
> /*
> * We need to make sure everyone sees our modifications to
> * the page tables and see changes to vcpu->mode here.
Please mention that this pairs with vcpu_enter_guest and
walk_shadow_page_lockless_begin/end.
> The
> * barrier in the kvm_flush_remote_tlbs() helps us to achieve
> * these. Otherwise, wait for all vcpus to exit guest mode
> * and/or lockless shadow page table walks.
> */
> kvm_flush_remote_tlbs(kvm);
The rest of the comment is okay, but please replace "Otherwise" with "In
addition, we need to".
> Log for kvm_flush_remote_tlbs()
> /*
> * We want to publish modifications to the page tables before
> * reading mode. Pairs with a memory barrier in arch-specific
> * code.
> * - x86: smp_mb__after_srcu_read_unlock in vcpu_enter_guest.
... and smp_mb in walk_shadow_page_lockless_begin/end.
> * - powerpc: smp_mb in kvmppc_prepare_to_enter.
> */
> smp_mb__before_atomic();
The comment looks good, but the smp_mb__before_atomic() is not needed.
As mentioned in the reply to Guangrong, only a smp_load_acquire is required.
So the comment should say something like "There is already an smp_mb()
before kvm_make_all_cpus_request reads vcpu->mode. We reuse that
barrier here.".
On top of this there is:
- the change to paging_tmpl.h that Guangrong posted, adding smp_wmb()
before each increment of vcpu->kvm->tlbs_dirty
- the change to smp_mb__after_atomic() in kvm_make_all_cpus_request
- if you want :) you can also replace the store+mb in
walk_shadow_page_lockless_begin with smp_store_mb, and the mb+store in
walk_shadow_page_lockless_end with smp_store_release.
Thanks,
Paolo
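A hedged sketch of the last point, with invented names and C11 stand-ins (a seq_cst store plus fence approximates smp_store_mb(), a release store approximates smp_store_release()):

```c
#include <assert.h>
#include <stdatomic.h>

enum vcpu_mode { OUTSIDE_GUEST_MODE, READING_SHADOW_PAGE_TABLES };

/* begin: publish the mode change and fully order it before the
 * lockless page-table reads that follow (smp_store_mb() analog
 * replaces the store + smp_mb() pair). */
void lockless_begin(_Atomic int *mode)
{
    atomic_store_explicit(mode, READING_SHADOW_PAGE_TABLES,
                          memory_order_seq_cst);
    atomic_thread_fence(memory_order_seq_cst);
}

/* end: ensure all page-table reads complete before the mode store
 * becomes visible (smp_store_release() analog replaces mb + store). */
void lockless_end(_Atomic int *mode)
{
    atomic_store_explicit(mode, OUTSIDE_GUEST_MODE, memory_order_release);
}
```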
On 03/11/2016 12:04 AM, Paolo Bonzini wrote:
>
>
> On 10/03/2016 16:45, Xiao Guangrong wrote:
>>>
>>>> Compared to smp_load_acquire(), smp_mb() adds an ordering between stores
>>>> and loads.
>>>
>>> Here, the ordering is load-store, hence...
>>
>> Yes, this is why i put smp_mb() in the code. :)
>
> Here is a table of barriers:
>
>
>           '. after |                   |
>     before    '.   | load              | store
>     _______________|___________________|________________________
>                    |                   |
>                    | smp_rmb           | smp_load_acquire
>     load           | smp_load_acquire  | smp_store_release    XX
>                    | smp_mb            | smp_mb
>     _______________|___________________|________________________
>                    |                   |
>                    |                   | smp_wmb
>     store          | smp_mb            | smp_store_release
>                    |                   | smp_mb
>                    |                   |
>
> Your case is the one marked with XX, so a smp_load_acquire() is
> enough---and it's preferable, because it's cheaper than smp_mb() and
> more self-documenting.
Yes, you are right and thank you for pointing it out.
On 03/11/2016 01:07 AM, Paolo Bonzini wrote:
> On 09/03/2016 08:18, Lan Tianyu wrote:
>> How about the following comments.
>>
>> Log for kvm_mmu_commit_zap_page()
>> /*
>> * We need to make sure everyone sees our modifications to
>> * the page tables and see changes to vcpu->mode here.
>
> Please mention that this pairs with vcpu_enter_guest and
> walk_shadow_page_lockless_begin/end.
>
>> The
>> * barrier in the kvm_flush_remote_tlbs() helps us to achieve
>> * these. Otherwise, wait for all vcpus to exit guest mode
>> * and/or lockless shadow page table walks.
>> */
>> kvm_flush_remote_tlbs(kvm);
>
> The rest of the comment is okay, but please replace "Otherwise" with "In
> addition, we need to".
>
>> Log for kvm_flush_remote_tlbs()
>> /*
>> * We want to publish modifications to the page tables before
>> * reading mode. Pairs with a memory barrier in arch-specific
>> * code.
>> * - x86: smp_mb__after_srcu_read_unlock in vcpu_enter_guest.
>
> ... and smp_mb in walk_shadow_page_lockless_begin/end.
>
>> * - powerpc: smp_mb in kvmppc_prepare_to_enter.
>> */
>> smp_mb__before_atomic();
>
> The comment looks good, but the smp_mb__before_atomic() is not needed.
> As mentioned in the reply to Guangrong, only a smp_load_acquire is required.
>
> So the comment should say something like "There is already an smp_mb()
> before kvm_make_all_cpus_request reads vcpu->mode. We reuse that
> barrier here.".
>
> On top of this there is:
>
> - the change to paging_tmpl.h that Guangrong posted, adding smp_wmb()
> before each increment of vcpu->kvm->tlbs_dirty
Yes, please make it a separate patch.
>
> - the change to smp_mb__after_atomic() in kvm_make_all_cpus_request
>
> - if you want :) you can also replace the store+mb in
> walk_shadow_page_lockless_begin with smp_store_mb, and the mb+store in
> walk_shadow_page_lockless_end with smp_store_release.
These changes look good to me.
TianYu, please CC me when you post the new version out.
On 2016-03-10 23:26, Paolo Bonzini wrote:
>
>
> On 10/03/2016 15:40, Xiao Guangrong wrote:
>> long dirty_count = kvm->tlbs_dirty;
>>
>> + /*
>> + * read tlbs_dirty before doing tlb flush to make sure no tlb request is
>> + * lost.
>> + */
>> smp_mb();
>> +
>> if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
>> ++kvm->stat.remote_tlb_flush;
>> cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
>>
>>
>> Any comment?
>
> Compared to smp_load_acquire(), smp_mb() adds an ordering between stores
> and loads.
>
> The load of kvm->tlbs_dirty should then be
>
> /*
> * Read tlbs_dirty before setting KVM_REQ_TLB_FLUSH in
>> * kvm_make_all_cpus_request.
>> */
>> long dirty_count = smp_load_acquire(&kvm->tlbs_dirty);
>
> Tianyu, I think Xiao provided the information that I was missing. Would
> you like to prepare the patch?
Paolo:
Sure. I will do that.
Guangrong:
Thanks a lot for your input.
--
Best regards
Tianyu Lan