From: Lai Jiangshan <[email protected]>
[ Upstream commit e45e9e3998f0001079b09555db5bb3b4257f6746 ]
The KVM doesn't know whether any TLB for a specific pcid is cached in
the CPU when tdp is enabled. So it is better to flush all the guest
TLB when invalidating any single PCID context.
The case is very rare or even impossible since KVM generally doesn't
intercept CR3 write or INVPCID instructions when tdp is enabled, so the
fix is mostly for the sake of overall robustness.
Signed-off-by: Lai Jiangshan <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/x86/kvm/x86.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c48e2b5729c5d..0644f429f848c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1091,6 +1091,18 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
unsigned long roots_to_free = 0;
int i;
+ /*
+ * MOV CR3 and INVPCID are usually not intercepted when using TDP, but
+ * this is reachable when running EPT=1 and unrestricted_guest=0, and
+ * also via the emulator. KVM's TDP page tables are not in the scope of
+ * the invalidation, but the guest's TLB entries need to be flushed as
+ * the CPU may have cached entries in its TLB for the target PCID.
+ */
+ if (unlikely(tdp_enabled)) {
+ kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+ return;
+ }
+
/*
* If neither the current CR3 nor any of the prev_roots use the given
* PCID, then nothing needs to be done here because a resync will
--
2.33.0
From: Lai Jiangshan <[email protected]>
[ Upstream commit 552617382c197949ff965a3559da8952bf3c1fa5 ]
X86_CR4_PCIDE doesn't participate in kvm_mmu_role, so the mmu context
doesn't need to be reset. It is only required to flush all the guest
tlb.
Signed-off-by: Lai Jiangshan <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/x86/kvm/x86.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0644f429f848c..98a0f3c28caec 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1042,9 +1042,10 @@ EXPORT_SYMBOL_GPL(kvm_is_valid_cr4);
void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4)
{
- if (((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS) ||
- (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
+ if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
kvm_mmu_reset_context(vcpu);
+ else if (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE))
+ kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
}
EXPORT_SYMBOL_GPL(kvm_post_set_cr4);
--
2.33.0
From: Lai Jiangshan <[email protected]>
[ Upstream commit 8b8f9d753b84c243bf0b1004b515c53b7ec7e138 ]
If the original spte is writable, the target gfn should not be the
gfn of synchronized shadowpage and can continue to be writable.
When !can_unsync, speculative must be false. So when the check of
"!can_unsync" is removed, we need to move the label of "out" up.
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/x86/kvm/mmu/spte.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 3e97cdb13eb7e..86a21eb85d25f 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -150,7 +150,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
* is responsibility of kvm_mmu_get_page / kvm_mmu_sync_roots.
* Same reasoning can be applied to dirty page accounting.
*/
- if (!can_unsync && is_writable_pte(old_spte))
+ if (is_writable_pte(old_spte))
goto out;
/*
@@ -171,10 +171,10 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
if (pte_access & ACC_WRITE_MASK)
spte |= spte_shadow_dirty_mask(spte);
+out:
if (speculative)
spte = mark_spte_for_access_track(spte);
-out:
WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level),
"spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
--
2.33.0
From: Thomas Huth <[email protected]>
[ Upstream commit 22d7108ce47290d47e1ea83a28fbfc85e0ecf97e ]
The kvm_vm_free() statement here is currently dead code, since the loop
in front of it can only be left with the "goto done" that jumps right
after the kvm_vm_free(). Fix it by swapping the locations of the "done"
label and the kvm_vm_free().
Signed-off-by: Thomas Huth <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c | 3 +--
tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c | 2 +-
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c b/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c
index f40fd097cb359..6f6fd189dda3f 100644
--- a/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c
+++ b/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c
@@ -109,8 +109,7 @@ int main(int argc, char *argv[])
}
}
- kvm_vm_free(vm);
-
done:
+ kvm_vm_free(vm);
return 0;
}
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c b/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c
index 7e33a350b053a..e683d0ac3e45e 100644
--- a/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c
+++ b/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c
@@ -161,7 +161,7 @@ int main(int argc, char *argv[])
}
}
- kvm_vm_free(vm);
done:
+ kvm_vm_free(vm);
return 0;
}
--
2.33.0
From: Vitaly Kuznetsov <[email protected]>
[ Upstream commit 82cc27eff4486f8e79ef8faac1af1f5573039aa4 ]
KVM_CAP_NR_VCPUS is a legacy advisory value which on other architectures
return num_online_cpus() caped by KVM_CAP_NR_VCPUS or something else
(ppc and arm64 are special cases). On s390, KVM_CAP_NR_VCPUS returns
the same as KVM_CAP_MAX_VCPUS and this may turn out to be a bad
'advice'. Switch s390 to returning caped num_online_cpus() too.
Acked-by: Christian Borntraeger <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Reviewed-by: Christian Borntraeger <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/s390/kvm/kvm-s390.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 1c97493d21e10..31bf4bc5a23d7 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -585,6 +585,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = KVM_MAX_VCPUS;
else if (sclp.has_esca && sclp.has_64bscao)
r = KVM_S390_ESCA_CPU_SLOTS;
+ if (ext == KVM_CAP_NR_VCPUS)
+ r = min_t(unsigned int, num_online_cpus(), r);
break;
case KVM_CAP_S390_COW:
r = MACHINE_HAS_ESOP;
--
2.33.0
From: Vitaly Kuznetsov <[email protected]>
[ Upstream commit b7915d55b1ac0e68a7586697fa2d06c018135c49 ]
It doesn't make sense to return the recommended maximum number of
vCPUs which exceeds the maximum possible number of vCPUs.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/powerpc/kvm/powerpc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index b4e6f70b97b94..8334563a46236 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -641,9 +641,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
* implementations just count online CPUs.
*/
if (hv_enabled)
- r = num_present_cpus();
+ r = min_t(unsigned int, num_present_cpus(), KVM_MAX_VCPUS);
else
- r = num_online_cpus();
+ r = min_t(unsigned int, num_online_cpus(), KVM_MAX_VCPUS);
break;
case KVM_CAP_MAX_VCPUS:
r = KVM_MAX_VCPUS;
--
2.33.0
From: Vitaly Kuznetsov <[email protected]>
[ Upstream commit 57a2e13ebdda8b65602b44ec8b80e385603eb84c ]
It doesn't make sense to return the recommended maximum number of
vCPUs which exceeds the maximum possible number of vCPUs.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/mips/kvm/mips.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 75c6f264c626c..713ac87fbeb59 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -1067,7 +1067,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = 1;
break;
case KVM_CAP_NR_VCPUS:
- r = num_online_cpus();
+ r = min_t(unsigned int, num_online_cpus(), KVM_MAX_VCPUS);
break;
case KVM_CAP_MAX_VCPUS:
r = KVM_MAX_VCPUS;
--
2.33.0
From: Vitaly Kuznetsov <[email protected]>
[ Upstream commit f60a00d7295057cb4baea5a321501efc72794453 ]
Generally, it doesn't make sense to return the recommended maximum number
of vCPUs which exceeds the maximum possible number of vCPUs.
Note: ARM64 is special as the value returned by KVM_CAP_MAX_VCPUS differs
depending on whether it is a system-wide ioctl or a per-VM one. Previously,
KVM_CAP_NR_VCPUS didn't have this difference and it seems preferable to
keep the status quo. Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus()
which is what gets returned by system-wide KVM_CAP_MAX_VCPUS.
Signed-off-by: Vitaly Kuznetsov <[email protected]>
Message-Id: <[email protected]>
Acked-by: Marc Zyngier <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
arch/arm64/kvm/arm.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 9b328bb05596a..1ed82b6d5e8db 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -223,7 +223,14 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = 1;
break;
case KVM_CAP_NR_VCPUS:
- r = num_online_cpus();
+ /*
+ * ARM64 treats KVM_CAP_NR_CPUS differently from all other
+ * architectures, as it does not always bound it to
+ * KVM_CAP_MAX_VCPUS. It should not matter much because
+ * this is just an advisory value.
+ */
+ r = min_t(unsigned int, num_online_cpus(),
+ kvm_arm_default_max_vcpus());
break;
case KVM_CAP_MAX_VCPUS:
case KVM_CAP_MAX_VCPU_ID:
--
2.33.0
Hello,
[PATCH MANUALSEL 5.15 2/8] KVM: X86: Don't reset mmu context when X86_CR4_PCIDE 1->0
[PATCH MANUALSEL 5.15 3/8] KVM: X86: Don't check unsync if the original spte is writible
are pure cleanups, they don't need to backport to stable.
Thanks
Lai
On 2021/11/24 00:36, Sasha Levin wrote:
> From: Lai Jiangshan <[email protected]>
>
> [ Upstream commit 552617382c197949ff965a3559da8952bf3c1fa5 ]
>
> X86_CR4_PCIDE doesn't participate in kvm_mmu_role, so the mmu context
> doesn't need to be reset. It is only required to flush all the guest
> tlb.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> Reviewed-by: Sean Christopherson <[email protected]>
> Message-Id: <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> Signed-off-by: Sasha Levin <[email protected]>
> ---
> arch/x86/kvm/x86.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0644f429f848c..98a0f3c28caec 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1042,9 +1042,10 @@ EXPORT_SYMBOL_GPL(kvm_is_valid_cr4);
>
> void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4)
> {
> - if (((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS) ||
> - (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
> + if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
> kvm_mmu_reset_context(vcpu);
> + else if (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE))
> + kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> }
> EXPORT_SYMBOL_GPL(kvm_post_set_cr4);
>
>
On 11/23/21 17:36, Sasha Levin wrote:
> From: Lai Jiangshan <[email protected]>
>
> [ Upstream commit e45e9e3998f0001079b09555db5bb3b4257f6746 ]
>
> The KVM doesn't know whether any TLB for a specific pcid is cached in
> the CPU when tdp is enabled. So it is better to flush all the guest
> TLB when invalidating any single PCID context.
>
> The case is very rare or even impossible since KVM generally doesn't
> intercept CR3 write or INVPCID instructions when tdp is enabled, so the
> fix is mostly for the sake of overall robustness.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> Message-Id: <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> Signed-off-by: Sasha Levin <[email protected]>
Acked-by: Paolo Bonzini <[email protected]>
for this patch, but not to all the others.
Paolo