This series cleans up the KVM MMU memory barriers.
1) Remove redundant barrier (PATCH 1)
2) Replace original barrier functions with preferable ones (PATCH 2, 3, 5)
3) Fix unpaired barriers (PATCH 4)
4) Update or add barrier related comments (PATCH 6, 7)
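All of these pairings boil down to the usual store-buffering pattern.
A simplified sketch (zap_sptes()/kick_vcpu()/walk_sptes() are only
placeholders for the real call chains):

    /* flusher side (kvm_mmu_commit_zap_page -> kvm_flush_remote_tlbs) */
    zap_sptes();                        /* write the page tables            */
    smp_mb();                           /* or an equivalent implied barrier */
    if (vcpu->mode == IN_GUEST_MODE)    /* read vcpu->mode                  */
            kick_vcpu();                /* send the flush IPI               */

    /* vcpu side (vcpu_enter_guest / walk_shadow_page_lockless_begin) */
    vcpu->mode = IN_GUEST_MODE;         /* write vcpu->mode                 */
    smp_mb();                           /* or an equivalent implied barrier */
    walk_sptes();                       /* read the page tables             */

Since each side orders its store before its load, at least one side
observes the other's write: either the flusher sees IN_GUEST_MODE and
sends the IPI, or the vcpu sees the updated page tables.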
Lan Tianyu (7):
KVM: Remove redundant smp_mb() in the kvm_mmu_commit_zap_page()
KVM/x86: Replace smp_mb() with smp_store_mb/release() in the
walk_shadow_page_lockless_begin/end()
KVM: Replace smp_mb() with smp_mb_after_atomic() in the
kvm_make_all_cpus_request()
KVM/x86: Call smp_wmb() before increasing tlbs_dirty
KVM: Replace smp_mb() with smp_load_acquire() in the
kvm_flush_remote_tlbs()
KVM/x86: update the comment of memory barrier in the
vcpu_enter_guest()
KVM/PPC: update the comment of memory barrier in the
kvmppc_prepare_to_enter()
arch/powerpc/kvm/powerpc.c | 3 +++
arch/x86/kvm/mmu.c | 23 ++++++++++-------------
arch/x86/kvm/paging_tmpl.h | 11 +++++++++++
arch/x86/kvm/x86.c | 8 ++++++--
virt/kvm/kvm_main.c | 22 ++++++++++++++++++----
5 files changed, 48 insertions(+), 19 deletions(-)
--
1.8.4.rc0.1.g8f6a3e5.dirty
Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5e795af..d1ee68c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -632,12 +632,12 @@ static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu)
* kvm_flush_remote_tlbs() IPI to all active vcpus.
*/
local_irq_disable();
- vcpu->mode = READING_SHADOW_PAGE_TABLES;
+
/*
* Make sure a following spte read is not reordered ahead of the write
* to vcpu->mode.
*/
- smp_mb();
+ smp_store_mb(vcpu->mode, READING_SHADOW_PAGE_TABLES);
}
static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
@@ -647,8 +647,7 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
* reads to sptes. If it does, kvm_commit_zap_page() can see us
* OUTSIDE_GUEST_MODE and proceed to free the shadow page table.
*/
- smp_mb();
- vcpu->mode = OUTSIDE_GUEST_MODE;
+ smp_store_release(&vcpu->mode, OUTSIDE_GUEST_MODE);
local_irq_enable();
}
--
1.8.4.rc0.1.g8f6a3e5.dirty
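For reference, the intended semantics of the two helpers used in the
hunk above (simplified; see Documentation/memory-barriers.txt):

    /* _begin(): smp_store_mb(var, value) is a store plus a full barrier */
    WRITE_ONCE(vcpu->mode, READING_SHADOW_PAGE_TABLES);
    smp_mb();          /* later spte reads cannot pass the ->mode write  */

    /* _end(): smp_store_release() orders all earlier loads and stores
     * (here, the lockless spte reads) before the store to ->mode.       */
    smp_store_release(&vcpu->mode, OUTSIDE_GUEST_MODE);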
There is already a barrier inside kvm_flush_remote_tlbs() which makes
sure everyone sees our modifications to the page tables and that we see
changes to vcpu->mode here. So remove the smp_mb() in
kvm_mmu_commit_zap_page() and update the comment.
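A simplified sketch of the resulting pairing (the lockless-walk side is
the one reworked in PATCH 2):

    /* zap side: kvm_mmu_commit_zap_page(), simplified             */
    /* ... unlink/invalidate shadow pages: page-table writes ...   */
    kvm_flush_remote_tlbs(kvm);     /* full barrier inside, then reads
                                     * vcpu->mode and IPIs vcpus still in
                                     * guest mode or a lockless walk     */
    /* ... only then free the shadow pages ...                     */

    /* vcpu side: walk_shadow_page_lockless_begin/end(), simplified */
    smp_store_mb(vcpu->mode, READING_SHADOW_PAGE_TABLES);
    /* ... lockless spte reads ... */
    smp_store_release(&vcpu->mode, OUTSIDE_GUEST_MODE);

If the flush side reads OUTSIDE_GUEST_MODE, the release guarantees the
vcpu's spte reads have already finished, so freeing the pages is safe;
otherwise an IPI is sent, and because the walk runs with interrupts
disabled the flush cannot complete before the walk has ended.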
Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/mmu.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2463de0..5e795af 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2390,14 +2390,12 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm,
return;
/*
- * wmb: make sure everyone sees our modifications to the page tables
- * rmb: make sure we see changes to vcpu->mode
- */
- smp_mb();
-
- /*
- * Wait for all vcpus to exit guest mode and/or lockless shadow
- * page table walks.
+ * We need to make sure everyone sees our modifications to
+ * the page tables and that we see changes to vcpu->mode here. The
+ * barrier in kvm_flush_remote_tlbs() achieves both. This pairs
+ * with vcpu_enter_guest and walk_shadow_page_lockless_begin/end.
+ * In addition, wait for all vcpus to exit guest mode and/or lockless
+ * shadow page table walks.
*/
kvm_flush_remote_tlbs(kvm);
--
1.8.4.rc0.1.g8f6a3e5.dirty
The barrier also orders the write to mode against any reads of the page
tables done while the VCPU is running, so update the comment.
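For context, the sequence in vcpu_enter_guest() that the comment now
describes, in simplified form:

    vcpu->mode = IN_GUEST_MODE;

    srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);

    smp_mb__after_srcu_read_unlock();   /* the ->mode write is ordered
                                         * before the ->requests check below
                                         * and before any page-table reads
                                         * done while the guest runs        */

    if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests || ...)
            /* back out to OUTSIDE_GUEST_MODE and retry */;

    /* enter the guest: the hardware now reads the page tables */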
Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/x86.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bcbce0f..4bdb4e6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6589,8 +6589,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
- /* We should set ->mode before check ->requests,
- * see the comment in make_all_cpus_request.
+ /*
+	 * We should set ->mode before checking ->requests.
+ * Please see the comment in kvm_make_all_cpus_request.
+ * This also orders the write to mode from any reads
+ * to the page tables done while the VCPU is running.
+ * Please see the comment in kvm_flush_remote_tlbs.
*/
smp_mb__after_srcu_read_unlock();
--
1.8.4.rc0.1.g8f6a3e5.dirty
smp_load_acquire() is sufficient here and cheaper than smp_mb().
Also add a comment about reusing the memory barrier in
kvm_make_all_cpus_request() to keep the order between modifications to
the page tables and reading mode.
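A simplified view of what each barrier now orders in
kvm_flush_remote_tlbs():

    long dirty_count = smp_load_acquire(&kvm->tlbs_dirty);
            /* acquire: the tlbs_dirty read completes before anything below */

    kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH);
            /* sets the request bit, then smp_mb__after_atomic(), then reads
             * vcpu->mode; that full barrier is the one being reused to
             * publish the caller's page-table changes before reading mode  */

    cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
            /* only clears tlbs_dirty if it was not bumped again after we
             * sampled it, so a racing producer still gets a later flush    */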
Signed-off-by: Lan Tianyu <[email protected]>
---
virt/kvm/kvm_main.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ec5aa8d..39ebee9a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -191,9 +191,23 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
#ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
void kvm_flush_remote_tlbs(struct kvm *kvm)
{
- long dirty_count = kvm->tlbs_dirty;
+ /*
+ * Read tlbs_dirty before setting KVM_REQ_TLB_FLUSH in
+ * kvm_make_all_cpus_request.
+ */
+ long dirty_count = smp_load_acquire(&kvm->tlbs_dirty);
- smp_mb();
+ /*
+ * We want to publish modifications to the page tables before reading
+ * mode. Pairs with a memory barrier in arch-specific code.
+ * - x86: smp_mb__after_srcu_read_unlock in vcpu_enter_guest
+ * and smp_mb in walk_shadow_page_lockless_begin/end.
+ * - powerpc: smp_mb in kvmppc_prepare_to_enter.
+ *
+ * There is already an smp_mb__after_atomic() before
+ * kvm_make_all_cpus_request() reads vcpu->mode. We reuse that
+ * barrier here.
+ */
if (kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
++kvm->stat.remote_tlb_flush;
cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
--
1.8.4.rc0.1.g8f6a3e5.dirty
The barrier also orders the write to mode against any reads of the page
tables done while the VCPU is running, so update the comment.
Signed-off-by: Lan Tianyu <[email protected]>
---
arch/powerpc/kvm/powerpc.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 19aa59b..6a68730 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -96,6 +96,9 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
* so we don't miss a request because the requester sees
* OUTSIDE_GUEST_MODE and assumes we'll be checking requests
* before next entering the guest (and thus doesn't IPI).
+ * This also orders the write to mode from any reads
+ * to the page tables done while the VCPU is running.
+ * Please see the comment in kvm_flush_remote_tlbs.
*/
smp_mb();
--
1.8.4.rc0.1.g8f6a3e5.dirty
Update the spte before increasing tlbs_dirty to make sure no tlb flush
is lost after the spte is zapped. This pairs with the barrier in
kvm_flush_remote_tlbs().
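Roughly, the producer/consumer pairing this creates (the consumer side
is the kvm_flush_remote_tlbs() path touched by PATCH 5):

    /* producer: FNAME(sync_page), simplified */
    drop_spte(vcpu->kvm, &sp->spt[i]);   /* zap the spte                 */
    smp_wmb();                           /* make it visible before ...   */
    vcpu->kvm->tlbs_dirty++;             /* ... announcing it as dirty   */

    /* consumer: kvm_flush_remote_tlbs(), simplified */
    dirty_count = smp_load_acquire(&kvm->tlbs_dirty);
    kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH);
    cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);

If the increment lands after the consumer sampled tlbs_dirty, the
cmpxchg() fails and tlbs_dirty stays non-zero, so a later flush picks it
up; if it landed before, the smp_wmb()/smp_load_acquire() pairing makes
the zapped spte visible before the flush is requested. Either way no TLB
flush is lost.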
Signed-off-by: Lan Tianyu <[email protected]>
---
arch/x86/kvm/paging_tmpl.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index e159a81..d34475e 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -949,6 +949,12 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
return 0;
if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
+ /*
+ * Update spte before increasing tlbs_dirty to make sure
+	 * no tlb flush is lost after spte is zapped, see the
+ * comments in kvm_flush_remote_tlbs().
+ */
+ smp_wmb();
vcpu->kvm->tlbs_dirty++;
continue;
}
@@ -964,6 +970,11 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
if (gfn != sp->gfns[i]) {
drop_spte(vcpu->kvm, &sp->spt[i]);
+ /*
+ * The same as above where we are doing
+ * prefetch_invalid_gpte().
+ */
+ smp_wmb();
vcpu->kvm->tlbs_dirty++;
continue;
}
--
1.8.4.rc0.1.g8f6a3e5.dirty
Signed-off-by: Lan Tianyu <[email protected]>
---
virt/kvm/kvm_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1eae052..ec5aa8d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -170,8 +170,8 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
kvm_make_request(req, vcpu);
cpu = vcpu->cpu;
- /* Set ->requests bit before we read ->mode */
- smp_mb();
+ /* Set ->requests bit before we read ->mode. */
+ smp_mb__after_atomic();
if (cpus != NULL && cpu != -1 && cpu != me &&
kvm_vcpu_exiting_guest_mode(vcpu) != OUTSIDE_GUEST_MODE)
--
1.8.4.rc0.1.g8f6a3e5.dirty
On 13/03/2016 04:10, Lan Tianyu wrote:
> This series cleans up the KVM MMU memory barriers.
> 1) Remove redundant barrier (PATCH 1)
> 2) Replace original barrier functions with preferable ones (PATCH 2, 3, 5)
> 3) Fix unpaired barriers (PATCH 4)
> 4) Update or add barrier related comments (PATCH 6, 7)
Thanks, this looks pretty good! I will apply it for 4.6 if I have to
send two pull requests during this merge window; otherwise, it will have
to wait for the next merge window.
Paolo