2021-12-16 02:19:26

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH 0/3] KVM: x86: Fixes for kvm/queue

From: Lai Jiangshan <[email protected]>

Patch 1 and patch 2 are updated version of the original patches with
the same title. The original patches need to be dequeued. (Paolo has
sent the reverting patches to the mail list and done the work, but I
haven't seen the original patches dequeued or reverted in the public
kvm tree. I need to learn a bit more how patches are managed in kvm
tree.)

Patch 3 fixes for commit c62c7bd4f95b ("KVM: VMX: Update vmcs.GUEST_CR3
only when the guest CR3 is dirty"). Patch 3 is better to be reordered
to before the commit since the commit has not yet into Linus' tree.


Lai Jiangshan (3):
KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()
KVM: X86: Ensure pae_root to be reconstructed for shadow paging if the
guest PDPTEs is changed
KVM: VMX: Mark VCPU_EXREG_CR3 dirty when !CR0_PG -> CR0_PG if EPT +
!URG

arch/x86/kvm/vmx/nested.c | 11 +++--------
arch/x86/kvm/vmx/vmx.c | 28 ++++++++++++++++++----------
arch/x86/kvm/vmx/vmx.h | 5 +++--
arch/x86/kvm/x86.c | 7 +++++++
4 files changed, 31 insertions(+), 20 deletions(-)

--
2.19.1.6.gb485710b



2021-12-16 02:19:33

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH V2 1/3] KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()

From: Lai Jiangshan <[email protected]>

The host CR3 in the vcpu thread can only be changed when scheduling.

So HOST_CR3 can be saved only in vmx_prepare_switch_to_guest() and be
synced in vmx_sync_vmcs_host_state() when switching VMCS.

vmx_set_host_fs_gs() is called in both places, so it is renamed to
vmx_set_vmcs_host_state() and it also updates the HOST_CR3.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 11 +++--------
arch/x86/kvm/vmx/vmx.c | 21 +++++++++++----------
arch/x86/kvm/vmx/vmx.h | 5 +++--
3 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 26b236187850..d07a7fa75783 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -245,7 +245,8 @@ static void vmx_sync_vmcs_host_state(struct vcpu_vmx *vmx,
src = &prev->host_state;
dest = &vmx->loaded_vmcs->host_state;

- vmx_set_host_fs_gs(dest, src->fs_sel, src->gs_sel, src->fs_base, src->gs_base);
+ vmx_set_vmcs_host_state(dest, src->cr3, src->fs_sel, src->gs_sel,
+ src->fs_base, src->gs_base);
dest->ldt_sel = src->ldt_sel;
#ifdef CONFIG_X86_64
dest->ds_sel = src->ds_sel;
@@ -3054,7 +3055,7 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
static int nested_vmx_check_vmentry_hw(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
- unsigned long cr3, cr4;
+ unsigned long cr4;
bool vm_fail;

if (!nested_early_check)
@@ -3077,12 +3078,6 @@ static int nested_vmx_check_vmentry_hw(struct kvm_vcpu *vcpu)
*/
vmcs_writel(GUEST_RFLAGS, 0);

- cr3 = __get_current_cr3_fast();
- if (unlikely(cr3 != vmx->loaded_vmcs->host_state.cr3)) {
- vmcs_writel(HOST_CR3, cr3);
- vmx->loaded_vmcs->host_state.cr3 = cr3;
- }
-
cr4 = cr4_read_shadow();
if (unlikely(cr4 != vmx->loaded_vmcs->host_state.cr4)) {
vmcs_writel(HOST_CR4, cr4);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7826556b2a47..5f281f5ee961 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1069,9 +1069,14 @@ static void pt_guest_exit(struct vcpu_vmx *vmx)
wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl);
}

-void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
- unsigned long fs_base, unsigned long gs_base)
+void vmx_set_vmcs_host_state(struct vmcs_host_state *host, unsigned long cr3,
+ u16 fs_sel, u16 gs_sel,
+ unsigned long fs_base, unsigned long gs_base)
{
+ if (unlikely(cr3 != host->cr3)) {
+ vmcs_writel(HOST_CR3, cr3);
+ host->cr3 = cr3;
+ }
if (unlikely(fs_sel != host->fs_sel)) {
if (!(fs_sel & 7))
vmcs_write16(HOST_FS_SELECTOR, fs_sel);
@@ -1166,7 +1171,9 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu)
gs_base = segment_base(gs_sel);
#endif

- vmx_set_host_fs_gs(host_state, fs_sel, gs_sel, fs_base, gs_base);
+ vmx_set_vmcs_host_state(host_state, __get_current_cr3_fast(),
+ fs_sel, gs_sel, fs_base, gs_base);
+
vmx->guest_state_loaded = true;
}

@@ -6629,7 +6636,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
- unsigned long cr3, cr4;
+ unsigned long cr4;

/* Record the guest's net vcpu time for enforced NMI injections. */
if (unlikely(!enable_vnmi &&
@@ -6674,12 +6681,6 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);
vcpu->arch.regs_dirty = 0;

- cr3 = __get_current_cr3_fast();
- if (unlikely(cr3 != vmx->loaded_vmcs->host_state.cr3)) {
- vmcs_writel(HOST_CR3, cr3);
- vmx->loaded_vmcs->host_state.cr3 = cr3;
- }
-
cr4 = cr4_read_shadow();
if (unlikely(cr4 != vmx->loaded_vmcs->host_state.cr4)) {
vmcs_writel(HOST_CR4, cr4);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 99588aa8474b..acb874db02da 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -374,8 +374,9 @@ int allocate_vpid(void);
void free_vpid(int vpid);
void vmx_set_constant_host_state(struct vcpu_vmx *vmx);
void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
-void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
- unsigned long fs_base, unsigned long gs_base);
+void vmx_set_vmcs_host_state(struct vmcs_host_state *host, unsigned long cr3,
+ u16 fs_sel, u16 gs_sel,
+ unsigned long fs_base, unsigned long gs_base);
int vmx_get_cpl(struct kvm_vcpu *vcpu);
bool vmx_emulation_required(struct kvm_vcpu *vcpu);
unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu);
--
2.19.1.6.gb485710b


2021-12-16 02:19:38

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH V2 2/3] KVM: x86/mmu: Ensure pae_root to be reconstructed for shadow paging if the guest PDPTEs is changed

From: Lai Jiangshan <[email protected]>

For shadow paging, the pae_root needs to be reconstructed before the
coming VMENTER if the guest PDPTEs is changed.

But not all paths that call load_pdptrs() will cause the pae_root to be
reconstructed. Normally, kvm_mmu_reset_context() and kvm_mmu_free_roots()
are used to launch later reconstruction.

The commit d81135a57aa6("KVM: x86: do not reset mmu if CR0.CD and
CR0.NW are changed") skips kvm_mmu_reset_context() after load_pdptrs()
when changing CR0.CD and CR0.NW.

The commit 21823fbda552("KVM: x86: Invalidate all PGDs for the current
PCID on MOV CR3 w/ flush") skips kvm_mmu_free_roots() after
load_pdptrs() when rewriting the CR3 with the same value.

The commit a91a7c709600("KVM: X86: Don't reset mmu context when
toggling X86_CR4_PGE") skips kvm_mmu_reset_context() after
load_pdptrs() when changing CR4.PGE.

Guests like linux would keep the PDPTEs unchanged for every instance of
pagetable, so this missing reconstruction has no problem for linux
guests.

Fixes: d81135a57aa6("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")
Fixes: 21823fbda552("KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush")
Fixes: a91a7c709600("KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE")
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/x86.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index af22ad79e081..71c389a47801 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -841,6 +841,13 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
}
}

+ /*
+ * Marking VCPU_EXREG_PDPTR dirty doesn't work for !tdp_enabled.
+ * Shadow page roots need to be reconstructed instead.
+ */
+ if (!tdp_enabled && memcmp(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)))
+ kvm_mmu_free_roots(vcpu, mmu, KVM_MMU_ROOT_CURRENT);
+
memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs));
kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR);
kvm_make_request(KVM_REQ_LOAD_MMU_PGD, vcpu);
--
2.19.1.6.gb485710b


2021-12-16 02:19:43

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH 3/3] KVM: VMX: Mark VCPU_EXREG_CR3 dirty when !CR0_PG -> CR0_PG if EPT + !URG

From: Lai Jiangshan <[email protected]>

When !CR0_PG -> CR0_PG, vcpu->arch.cr3 becomes active, but GUEST_CR3 is
still vmx->ept_identity_map_addr if EPT + !URG. So VCPU_EXREG_CR3 is
considered to be dirty and GUEST_CR3 needs to be updated in this case.

Reported-by: Maxim Levitsky <[email protected]>
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5f281f5ee961..e2d535fe0387 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3066,6 +3066,13 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
/* Note, vmx_set_cr4() consumes the new vcpu->arch.cr0. */
if ((old_cr0_pg ^ cr0) & X86_CR0_PG)
vmx_set_cr4(vcpu, kvm_read_cr4(vcpu));
+
+ /*
+ * When !CR0_PG -> CR0_PG, vcpu->arch.cr3 becomes active, but
+ * GUEST_CR3 is still vmx->ept_identity_map_addr if EPT + !URG.
+ */
+ if (!(old_cr0_pg & X86_CR0_PG) && (cr0 & X86_CR0_PG))
+ kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
}

/* depends on vcpu->arch.cr0 to be set to a new value */
--
2.19.1.6.gb485710b


2021-12-20 16:49:42

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 0/3] KVM: x86: Fixes for kvm/queue

On 12/16/21 03:19, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> Patch 1 and patch 2 are updated version of the original patches with
> the same title. The original patches need to be dequeued. (Paolo has
> sent the reverting patches to the mail list and done the work, but I
> haven't seen the original patches dequeued or reverted in the public
> kvm tree. I need to learn a bit more how patches are managed in kvm
> tree.)

This cycle has been a bit more disorganized than usual, due to me taking
some time off and a very unusual amount of patches sent for -rc.
Usually kvm/queue is updated about once a week and kvm/next once every
1-2 weeks.

> Patch 3 fixes for commit c62c7bd4f95b ("KVM: VMX: Update vmcs.GUEST_CR3
> only when the guest CR3 is dirty"). Patch 3 is better to be reordered
> to before the commit since the commit has not yet into Linus' tree.
>
>
> Lai Jiangshan (3):
> KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()
> KVM: X86: Ensure pae_root to be reconstructed for shadow paging if the
> guest PDPTEs is changed
> KVM: VMX: Mark VCPU_EXREG_CR3 dirty when !CR0_PG -> CR0_PG if EPT +
> !URG
>
> arch/x86/kvm/vmx/nested.c | 11 +++--------
> arch/x86/kvm/vmx/vmx.c | 28 ++++++++++++++++++----------
> arch/x86/kvm/vmx/vmx.h | 5 +++--
> arch/x86/kvm/x86.c | 7 +++++++
> 4 files changed, 31 insertions(+), 20 deletions(-)
>

Queued, thanks.

Paolo

2022-02-13 06:17:27

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 0/3] KVM: x86: Fixes for kvm/queue

On 2/11/22 17:31, Sean Christopherson wrote:
>> Maybe the patch "Revert "KVM: VMX: Save HOST_CR3 in
>> vmx_prepare_switch_to_guest()"" is still missing in the latest
>> kvm/queue, I saw the same warning.
>
> It hasn't made it way to Linus either.

This was supposed to fix the buggy patch, too:

commit a9f2705ec84449e3b8d70c804766f8e97e23080d
Author: Lai Jiangshan <[email protected]>
Date: Thu Dec 16 10:19:36 2021 +0800

KVM: VMX: Save HOST_CR3 in vmx_set_host_fs_gs()

The host CR3 in the vcpu thread can only be changed when scheduling,
so commit 15ad9762d69f ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()")
changed vmx.c to only save it in vmx_prepare_switch_to_guest().

However, it also has to be synced in vmx_sync_vmcs_host_state() when switching VMCS.
vmx_set_host_fs_gs() is called in both places, so rename it to
vmx_set_vmcs_host_state() and make it update HOST_CR3.

Fixes: 15ad9762d69f ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()")
Signed-off-by: Lai Jiangshan <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>

Paolo

2022-02-13 06:17:27

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 0/3] KVM: x86: Fixes for kvm/queue

On Fri, Feb 11, 2022, Wanpeng Li wrote:
> On Tue, 21 Dec 2021 at 04:13, Paolo Bonzini <[email protected]> wrote:
> >
> > On 12/16/21 03:19, Lai Jiangshan wrote:
> > > From: Lai Jiangshan <[email protected]>
> > >
> > > Patch 1 and patch 2 are updated version of the original patches with
> > > the same title. The original patches need to be dequeued. (Paolo has
> > > sent the reverting patches to the mail list and done the work, but I
> > > haven't seen the original patches dequeued or reverted in the public
> > > kvm tree. I need to learn a bit more how patches are managed in kvm
> > > tree.)
> >
> > This cycle has been a bit more disorganized than usual, due to me taking
> > some time off and a very unusual amount of patches sent for -rc.
> > Usually kvm/queue is updated about once a week and kvm/next once every
> > 1-2 weeks.
>
> Maybe the patch "Revert "KVM: VMX: Save HOST_CR3 in
> vmx_prepare_switch_to_guest()"" is still missing in the latest
> kvm/queue, I saw the same warning.

It hasn't made it way to Linus either.

2022-02-14 10:02:44

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH 0/3] KVM: x86: Fixes for kvm/queue

On Tue, 21 Dec 2021 at 04:13, Paolo Bonzini <[email protected]> wrote:
>
> On 12/16/21 03:19, Lai Jiangshan wrote:
> > From: Lai Jiangshan <[email protected]>
> >
> > Patch 1 and patch 2 are updated version of the original patches with
> > the same title. The original patches need to be dequeued. (Paolo has
> > sent the reverting patches to the mail list and done the work, but I
> > haven't seen the original patches dequeued or reverted in the public
> > kvm tree. I need to learn a bit more how patches are managed in kvm
> > tree.)
>
> This cycle has been a bit more disorganized than usual, due to me taking
> some time off and a very unusual amount of patches sent for -rc.
> Usually kvm/queue is updated about once a week and kvm/next once every
> 1-2 weeks.

Maybe the patch "Revert "KVM: VMX: Save HOST_CR3 in
vmx_prepare_switch_to_guest()"" is still missing in the latest
kvm/queue, I saw the same warning.

Wanpeng

2022-02-24 19:19:49

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 0/3] KVM: x86: Fixes for kvm/queue

On Fri, Feb 11, 2022, Paolo Bonzini wrote:
> On 2/11/22 17:31, Sean Christopherson wrote:
> > > Maybe the patch "Revert "KVM: VMX: Save HOST_CR3 in
> > > vmx_prepare_switch_to_guest()"" is still missing in the latest
> > > kvm/queue, I saw the same warning.
> >
> > It hasn't made it way to Linus either.
>
> This was supposed to fix the buggy patch, too:
>
> commit a9f2705ec84449e3b8d70c804766f8e97e23080d
> Author: Lai Jiangshan <[email protected]>
> Date: Thu Dec 16 10:19:36 2021 +0800
>
> KVM: VMX: Save HOST_CR3 in vmx_set_host_fs_gs()
> The host CR3 in the vcpu thread can only be changed when scheduling,
> so commit 15ad9762d69f ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()")
> changed vmx.c to only save it in vmx_prepare_switch_to_guest().
> However, it also has to be synced in vmx_sync_vmcs_host_state() when switching VMCS.
> vmx_set_host_fs_gs() is called in both places, so rename it to
> vmx_set_vmcs_host_state() and make it update HOST_CR3.
> Fixes: 15ad9762d69f ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()")
> Signed-off-by: Lai Jiangshan <[email protected]>
> Message-Id: <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>

The underlying premise that CR3 can change only when scheduling is wrong, reverts
incoming...