Subject: [PATCH v4 0/7] KVM: nSVM: avoid TOC/TOU race when checking vmcb12

Currently there is a TOC/TOU race between the check of vmcb12's
efer, cr0 and cr4 registers and the later save of their values in
svm_set_*, because the guest could modify the values in the meanwhile.

To solve this issue, this series introduces and uses svm->nested.save
structure in enter_svm_guest_mode to save the current value of efer,
cr0 and cr4 and later use these to set the vcpu->arch.* state.

Similarly, svm->nested.ctl contains fields that are not used, so having
a full vmcb_control_area means passing uninitialized fields.

Patches 1,3 and 8 take care of renaming and refactoring code.
Patches 2 and 6 introduce respectively vmcb_ctrl_area_cached and
vmcb_save_area_cached.
Patches 4 and 5 use vmcb_save_area_cached to avoid TOC/TOU, and patch
7 uses vmcb_ctrl_area_cached.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>

---
v4:
* introduce _* helpers (_nested_vmcb_check_save,
_nested_copy_vmcb_control_to_cache, _nested_copy_vmcb_save_to_cache)
that take care of additional parameters.
* svm_set_nested_state: introduce {save, ctl}_cached variables
to not pollute svm->nested.{save,ctl} state, especially if the
check fails. remove also unnecessary memset added in previous versions.
* svm_get_nested_state: change stack variable ctl introduced in this series
into a pointer that will be zeroed and freed after it has been copied to user

v3:
* merge this series with "KVM: nSVM: use vmcb_ctrl_area_cached instead
of vmcb_control_area in nested state"
* rename "nested_load_save_from_vmcb12" in
"nested_copy_vmcb_save_to_cache"
* rename "nested_load_control_from_vmcb12" in
"nested_copy_vmcb_control_to_cache"
* change check functions (nested_vmcb_valid_sregs and nested_vmcb_valid_sregs)
to accept only the vcpu parameter, since we only check
nested state now
* rename "vmcb_is_intercept_cached" in "vmcb12_is_intercept"
and duplicate the implementation instead of calling vmcb_is_intercept

v2:
* svm->nested.save is a separate struct vmcb_save_area_cached,
and not vmcb_save_area.
* update also vmcb02->cr3 with svm->nested.save.cr3

RFC:
* use svm->nested.save instead of local variables.
* not dependent anymore from "KVM: nSVM: remove useless kvm_clear_*_queue"
* simplified patches, we just use the struct and not move the check
nearer to the TOU.

Emanuele Giuseppe Esposito (7):
KVM: nSVM: move nested_vmcb_check_cr3_cr4 logic in
nested_vmcb_valid_sregs
nSVM: introduce smv->nested.save to cache save area fields
nSVM: rename nested_load_control_from_vmcb12 in
nested_copy_vmcb_control_to_cache
nSVM: use vmcb_save_area_cached in nested_vmcb_valid_sregs()
nSVM: use svm->nested.save to load vmcb12 registers and avoid TOC/TOU
races
nSVM: introduce struct vmcb_ctrl_area_cached
nSVM: use vmcb_ctrl_area_cached instead of vmcb_control_area in struct
svm_nested_state

arch/x86/kvm/svm/nested.c | 250 ++++++++++++++++++++++++--------------
arch/x86/kvm/svm/svm.c | 7 +-
arch/x86/kvm/svm/svm.h | 57 ++++++++-
3 files changed, 212 insertions(+), 102 deletions(-)

--
2.27.0


Subject: [PATCH v4 3/7] nSVM: rename nested_load_control_from_vmcb12 in nested_copy_vmcb_control_to_cache

Following the same naming convention of the previous patch,
rename nested_load_control_from_vmcb12.
In addition, inline copy_vmcb_control_area as it is only called
by this function.

_nested_copy_vmcb_control_to_cache() works with vmcb_control_area
parameters and it will be useful in next patches, when we use
local variables instead of svm cached state.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
---
arch/x86/kvm/svm/nested.c | 80 +++++++++++++++++++--------------------
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/svm/svm.h | 2 +-
3 files changed, 42 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b974b0edd9b5..c04f8750e1f7 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -163,37 +163,6 @@ void recalc_intercepts(struct vcpu_svm *svm)
vmcb_set_intercept(c, INTERCEPT_VMSAVE);
}

-static void copy_vmcb_control_area(struct vmcb_control_area *dst,
- struct vmcb_control_area *from)
-{
- unsigned int i;
-
- for (i = 0; i < MAX_INTERCEPT; i++)
- dst->intercepts[i] = from->intercepts[i];
-
- dst->iopm_base_pa = from->iopm_base_pa;
- dst->msrpm_base_pa = from->msrpm_base_pa;
- dst->tsc_offset = from->tsc_offset;
- /* asid not copied, it is handled manually for svm->vmcb. */
- dst->tlb_ctl = from->tlb_ctl;
- dst->int_ctl = from->int_ctl;
- dst->int_vector = from->int_vector;
- dst->int_state = from->int_state;
- dst->exit_code = from->exit_code;
- dst->exit_code_hi = from->exit_code_hi;
- dst->exit_info_1 = from->exit_info_1;
- dst->exit_info_2 = from->exit_info_2;
- dst->exit_int_info = from->exit_int_info;
- dst->exit_int_info_err = from->exit_int_info_err;
- dst->nested_ctl = from->nested_ctl;
- dst->event_inj = from->event_inj;
- dst->event_inj_err = from->event_inj_err;
- dst->nested_cr3 = from->nested_cr3;
- dst->virt_ext = from->virt_ext;
- dst->pause_filter_count = from->pause_filter_count;
- dst->pause_filter_thresh = from->pause_filter_thresh;
-}
-
static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
{
/*
@@ -302,15 +271,46 @@ static bool nested_vmcb_valid_sregs(struct kvm_vcpu *vcpu,
return true;
}

-void nested_load_control_from_vmcb12(struct vcpu_svm *svm,
- struct vmcb_control_area *control)
+static
+void _nested_copy_vmcb_control_to_cache(struct vmcb_control_area *to,
+ struct vmcb_control_area *from)
{
- copy_vmcb_control_area(&svm->nested.ctl, control);
+ unsigned int i;
+
+ for (i = 0; i < MAX_INTERCEPT; i++)
+ to->intercepts[i] = from->intercepts[i];
+
+ to->iopm_base_pa = from->iopm_base_pa;
+ to->msrpm_base_pa = from->msrpm_base_pa;
+ to->tsc_offset = from->tsc_offset;
+ to->tlb_ctl = from->tlb_ctl;
+ to->int_ctl = from->int_ctl;
+ to->int_vector = from->int_vector;
+ to->int_state = from->int_state;
+ to->exit_code = from->exit_code;
+ to->exit_code_hi = from->exit_code_hi;
+ to->exit_info_1 = from->exit_info_1;
+ to->exit_info_2 = from->exit_info_2;
+ to->exit_int_info = from->exit_int_info;
+ to->exit_int_info_err = from->exit_int_info_err;
+ to->nested_ctl = from->nested_ctl;
+ to->event_inj = from->event_inj;
+ to->event_inj_err = from->event_inj_err;
+ to->nested_cr3 = from->nested_cr3;
+ to->virt_ext = from->virt_ext;
+ to->pause_filter_count = from->pause_filter_count;
+ to->pause_filter_thresh = from->pause_filter_thresh;
+
+ /* Copy asid here because nested_vmcb_check_controls will check it. */
+ to->asid = from->asid;
+ to->msrpm_base_pa &= ~0x0fffULL;
+ to->iopm_base_pa &= ~0x0fffULL;
+}

- /* Copy it here because nested_svm_check_controls will check it. */
- svm->nested.ctl.asid = control->asid;
- svm->nested.ctl.msrpm_base_pa &= ~0x0fffULL;
- svm->nested.ctl.iopm_base_pa &= ~0x0fffULL;
+void nested_copy_vmcb_control_to_cache(struct vcpu_svm *svm,
+ struct vmcb_control_area *control)
+{
+ _nested_copy_vmcb_control_to_cache(&svm->nested.ctl, control);
}

static void _nested_copy_vmcb_save_to_cache(struct vmcb_save_area_cached *to,
@@ -670,7 +670,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
if (WARN_ON_ONCE(!svm->nested.initialized))
return -EINVAL;

- nested_load_control_from_vmcb12(svm, &vmcb12->control);
+ nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);

if (!nested_vmcb_valid_sregs(vcpu, &vmcb12->save) ||
@@ -1406,7 +1406,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
svm->nested.vmcb12_gpa = kvm_state->hdr.svm.vmcb_pa;

svm_copy_vmrun_state(&svm->vmcb01.ptr->save, save);
- nested_load_control_from_vmcb12(svm, ctl);
+ nested_copy_vmcb_control_to_cache(svm, ctl);

svm_switch_vmcb(svm, &svm->nested.vmcb02);
nested_vmcb02_prepare_control(svm);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6565a3efabd1..4e586ce77591 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4377,7 +4377,7 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
*/

vmcb12 = map.hva;
- nested_load_control_from_vmcb12(svm, &vmcb12->control);
+ nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
ret = enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, false);

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 09621f4891f8..4346f6053432 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -497,7 +497,7 @@ int nested_svm_check_permissions(struct kvm_vcpu *vcpu);
int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
bool has_error_code, u32 error_code);
int nested_svm_exit_special(struct vcpu_svm *svm);
-void nested_load_control_from_vmcb12(struct vcpu_svm *svm,
+void nested_copy_vmcb_control_to_cache(struct vcpu_svm *svm,
struct vmcb_control_area *control);
void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
struct vmcb_save_area *save);
--
2.27.0

Subject: [PATCH v4 5/7] nSVM: use svm->nested.save to load vmcb12 registers and avoid TOC/TOU races

Use the already checked svm->nested.save cached fields
(EFER, CR0, CR4, ...) instead of vmcb12's in
nested_vmcb02_prepare_save().
This prevents from creating TOC/TOU races, since the
guest could modify the vmcb12 fields.

This also avoids the need of force-setting EFER_SVME in
nested_vmcb02_prepare_save.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/nested.c | 24 ++++++------------------
1 file changed, 6 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 692bd38025a9..1b60e6818836 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -233,13 +233,6 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
static bool _nested_vmcb_check_save(struct kvm_vcpu *vcpu,
struct vmcb_save_area_cached *save)
{
- /*
- * FIXME: these should be done after copying the fields,
- * to avoid TOC/TOU races. For these save area checks
- * the possible damage is limited since kvm_set_cr0 and
- * kvm_set_cr4 handle failure; EFER_SVME is an exception
- * so it is force-set later in nested_prepare_vmcb_save.
- */
if (CC(!(save->efer & EFER_SVME)))
return false;

@@ -496,15 +489,10 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12

kvm_set_rflags(&svm->vcpu, vmcb12->save.rflags | X86_EFLAGS_FIXED);

- /*
- * Force-set EFER_SVME even though it is checked earlier on the
- * VMCB12, because the guest can flip the bit between the check
- * and now. Clearing EFER_SVME would call svm_free_nested.
- */
- svm_set_efer(&svm->vcpu, vmcb12->save.efer | EFER_SVME);
+ svm_set_efer(&svm->vcpu, svm->nested.save.efer);

- svm_set_cr0(&svm->vcpu, vmcb12->save.cr0);
- svm_set_cr4(&svm->vcpu, vmcb12->save.cr4);
+ svm_set_cr0(&svm->vcpu, svm->nested.save.cr0);
+ svm_set_cr4(&svm->vcpu, svm->nested.save.cr4);

svm->vcpu.arch.cr2 = vmcb12->save.cr2;

@@ -519,8 +507,8 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12

/* These bits will be set properly on the first execution when new_vmc12 is true */
if (unlikely(new_vmcb12 || vmcb_is_dirty(vmcb12, VMCB_DR))) {
- svm->vmcb->save.dr7 = vmcb12->save.dr7 | DR7_FIXED_1;
- svm->vcpu.arch.dr6 = vmcb12->save.dr6 | DR6_ACTIVE_LOW;
+ svm->vmcb->save.dr7 = svm->nested.save.dr7 | DR7_FIXED_1;
+ svm->vcpu.arch.dr6 = svm->nested.save.dr6 | DR6_ACTIVE_LOW;
vmcb_mark_dirty(svm->vmcb, VMCB_DR);
}
}
@@ -628,7 +616,7 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa,
nested_vmcb02_prepare_control(svm);
nested_vmcb02_prepare_save(svm, vmcb12);

- ret = nested_svm_load_cr3(&svm->vcpu, vmcb12->save.cr3,
+ ret = nested_svm_load_cr3(&svm->vcpu, svm->nested.save.cr3,
nested_npt_enabled(svm), from_vmrun);
if (ret)
return ret;
--
2.27.0

Subject: [PATCH v4 4/7] nSVM: use vmcb_save_area_cached in nested_vmcb_valid_sregs()

Now that struct vmcb_save_area_cached contains the required
vmcb fields values (done in nested_load_save_from_vmcb12()),
check them to see if they are correct in nested_vmcb_valid_sregs().

While at it, rename nested_vmcb_valid_sregs in nested_vmcb_check_save.
_nested_vmcb_check_save takes the additional @save parameter, so it
is helpful when we want to check a non-svm save state, like in
svm_set_nested_state. The reason for that is that save is the L1
state, not L2, so we just check it.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
---
arch/x86/kvm/svm/nested.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index c04f8750e1f7..692bd38025a9 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -230,8 +230,8 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
}

/* Common checks that apply to both L1 and L2 state. */
-static bool nested_vmcb_valid_sregs(struct kvm_vcpu *vcpu,
- struct vmcb_save_area *save)
+static bool _nested_vmcb_check_save(struct kvm_vcpu *vcpu,
+ struct vmcb_save_area_cached *save)
{
/*
* FIXME: these should be done after copying the fields,
@@ -271,6 +271,14 @@ static bool nested_vmcb_valid_sregs(struct kvm_vcpu *vcpu,
return true;
}

+static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+ struct vmcb_save_area_cached *save = &svm->nested.save;
+
+ return _nested_vmcb_check_save(vcpu, save);
+}
+
static
void _nested_copy_vmcb_control_to_cache(struct vmcb_control_area *to,
struct vmcb_control_area *from)
@@ -673,7 +681,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);

- if (!nested_vmcb_valid_sregs(vcpu, &vmcb12->save) ||
+ if (!nested_vmcb_check_save(vcpu) ||
!nested_vmcb_check_controls(vcpu, &svm->nested.ctl)) {
vmcb12->control.exit_code = SVM_EXIT_ERR;
vmcb12->control.exit_code_hi = 0;
@@ -1298,6 +1306,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
&user_kvm_nested_state->data.svm[0];
struct vmcb_control_area *ctl;
struct vmcb_save_area *save;
+ struct vmcb_save_area_cached save_cached;
unsigned long cr0;
int ret;

@@ -1365,10 +1374,11 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
* Validate host state saved from before VMRUN (see
* nested_svm_check_permissions).
*/
+ _nested_copy_vmcb_save_to_cache(&save_cached, save);
if (!(save->cr0 & X86_CR0_PG) ||
!(save->cr0 & X86_CR0_PE) ||
(save->rflags & X86_EFLAGS_VM) ||
- !nested_vmcb_valid_sregs(vcpu, save))
+ !_nested_vmcb_check_save(vcpu, &save_cached))
goto out_free;

/*
--
2.27.0

Subject: [PATCH v4 1/7] KVM: nSVM: move nested_vmcb_check_cr3_cr4 logic in nested_vmcb_valid_sregs

Inline nested_vmcb_check_cr3_cr4 as it is not called by anyone else.
Doing so simplifies next patches.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/nested.c | 35 +++++++++++++----------------------
1 file changed, 13 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 510b833cbd39..9470933c77cd 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -260,27 +260,6 @@ static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
return true;
}

-static bool nested_vmcb_check_cr3_cr4(struct kvm_vcpu *vcpu,
- struct vmcb_save_area *save)
-{
- /*
- * These checks are also performed by KVM_SET_SREGS,
- * except that EFER.LMA is not checked by SVM against
- * CR0.PG && EFER.LME.
- */
- if ((save->efer & EFER_LME) && (save->cr0 & X86_CR0_PG)) {
- if (CC(!(save->cr4 & X86_CR4_PAE)) ||
- CC(!(save->cr0 & X86_CR0_PE)) ||
- CC(kvm_vcpu_is_illegal_gpa(vcpu, save->cr3)))
- return false;
- }
-
- if (CC(!kvm_is_valid_cr4(vcpu, save->cr4)))
- return false;
-
- return true;
-}
-
/* Common checks that apply to both L1 and L2 state. */
static bool nested_vmcb_valid_sregs(struct kvm_vcpu *vcpu,
struct vmcb_save_area *save)
@@ -302,7 +281,19 @@ static bool nested_vmcb_valid_sregs(struct kvm_vcpu *vcpu,
if (CC(!kvm_dr6_valid(save->dr6)) || CC(!kvm_dr7_valid(save->dr7)))
return false;

- if (!nested_vmcb_check_cr3_cr4(vcpu, save))
+ /*
+ * These checks are also performed by KVM_SET_SREGS,
+ * except that EFER.LMA is not checked by SVM against
+ * CR0.PG && EFER.LME.
+ */
+ if ((save->efer & EFER_LME) && (save->cr0 & X86_CR0_PG)) {
+ if (CC(!(save->cr4 & X86_CR4_PAE)) ||
+ CC(!(save->cr0 & X86_CR0_PE)) ||
+ CC(kvm_vcpu_is_illegal_gpa(vcpu, save->cr3)))
+ return false;
+ }
+
+ if (CC(!kvm_is_valid_cr4(vcpu, save->cr4)))
return false;

if (CC(!kvm_valid_efer(vcpu, save->efer)))
--
2.27.0

Subject: [PATCH v4 2/7] nSVM: introduce smv->nested.save to cache save area fields

This is useful in next patch, to avoid having temporary
copies of vmcb12 registers and passing them manually.

Right now, instead of blindly copying everything,
we just copy EFER, CR0, CR3, CR4, DR6 and DR7. If more fields
will need to be added, it will be more obvious to see
that they must be added in struct vmcb_save_area_cached and
in nested_copy_vmcb_save_to_cache().

_nested_copy_vmcb_save_to_cache() takes a vmcb_save_area_cached
parameter, useful when we want to save the state to a local
variable instead of svm internals.

Note that in svm_set_nested_state() we want to cache the L2
save state only if we are in normal non guest mode, because
otherwise it is not touched.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
---
arch/x86/kvm/svm/nested.c | 27 ++++++++++++++++++++++++++-
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 16 ++++++++++++++++
3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 9470933c77cd..b974b0edd9b5 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -313,6 +313,28 @@ void nested_load_control_from_vmcb12(struct vcpu_svm *svm,
svm->nested.ctl.iopm_base_pa &= ~0x0fffULL;
}

+static void _nested_copy_vmcb_save_to_cache(struct vmcb_save_area_cached *to,
+ struct vmcb_save_area *from)
+{
+ /*
+ * Copy only fields that are validated, as we need them
+ * to avoid TOC/TOU races.
+ */
+ to->efer = from->efer;
+ to->cr0 = from->cr0;
+ to->cr3 = from->cr3;
+ to->cr4 = from->cr4;
+
+ to->dr6 = from->dr6;
+ to->dr7 = from->dr7;
+}
+
+void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
+ struct vmcb_save_area *save)
+{
+ _nested_copy_vmcb_save_to_cache(&svm->nested.save, save);
+}
+
/*
* Synchronize fields that are written by the processor, so that
* they can be copied back into the vmcb12.
@@ -649,6 +671,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu)
return -EINVAL;

nested_load_control_from_vmcb12(svm, &vmcb12->control);
+ nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);

if (!nested_vmcb_valid_sregs(vcpu, &vmcb12->save) ||
!nested_vmcb_check_controls(vcpu, &svm->nested.ctl)) {
@@ -1370,8 +1393,10 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,

if (is_guest_mode(vcpu))
svm_leave_nested(svm);
- else
+ else {
svm->nested.vmcb02.ptr->save = svm->vmcb01.ptr->save;
+ nested_copy_vmcb_save_to_cache(svm, &svm->nested.vmcb02.ptr->save);
+ }

svm_set_gif(svm, !!(kvm_state->flags & KVM_STATE_NESTED_GIF_SET));

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 989685098b3e..6565a3efabd1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4378,6 +4378,7 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)

vmcb12 = map.hva;
nested_load_control_from_vmcb12(svm, &vmcb12->control);
+ nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
ret = enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, false);

unmap_save:
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 5d30db599e10..09621f4891f8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -103,6 +103,19 @@ struct kvm_vmcb_info {
uint64_t asid_generation;
};

+/*
+ * This struct is not kept up-to-date, and it is only valid within
+ * svm_set_nested_state and nested_svm_vmrun.
+ */
+struct vmcb_save_area_cached {
+ u64 efer;
+ u64 cr4;
+ u64 cr3;
+ u64 cr0;
+ u64 dr7;
+ u64 dr6;
+};
+
struct svm_nested_state {
struct kvm_vmcb_info vmcb02;
u64 hsave_msr;
@@ -119,6 +132,7 @@ struct svm_nested_state {

/* cache for control fields of the guest */
struct vmcb_control_area ctl;
+ struct vmcb_save_area_cached save;

bool initialized;
};
@@ -485,6 +499,8 @@ int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
int nested_svm_exit_special(struct vcpu_svm *svm);
void nested_load_control_from_vmcb12(struct vcpu_svm *svm,
struct vmcb_control_area *control);
+void nested_copy_vmcb_save_to_cache(struct vcpu_svm *svm,
+ struct vmcb_save_area *save);
void nested_sync_control_from_vmcb02(struct vcpu_svm *svm);
void nested_vmcb02_compute_g_pat(struct vcpu_svm *svm);
void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);
--
2.27.0

Subject: [PATCH v4 7/7] nSVM: use vmcb_ctrl_area_cached instead of vmcb_control_area in struct svm_nested_state

This requires changing all vmcb_is_intercept(&svm->nested.ctl, ...)
calls with vmcb12_is_intercept().

In addition, in svm_get_nested_state() user space expects a
vmcb_control_area struct, so we need to copy back all fields
in a temporary structure to provide to the user space.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
---
arch/x86/kvm/svm/nested.c | 48 +++++++++++++++++++++++++--------------
arch/x86/kvm/svm/svm.c | 4 ++--
arch/x86/kvm/svm/svm.h | 8 +++----
3 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 7895ddf176ed..27d0871de854 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -58,8 +58,9 @@ static void svm_inject_page_fault_nested(struct kvm_vcpu *vcpu, struct x86_excep
struct vcpu_svm *svm = to_svm(vcpu);
WARN_ON(!is_guest_mode(vcpu));

- if (vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_EXCEPTION_OFFSET + PF_VECTOR) &&
- !svm->nested.nested_run_pending) {
+ if (vmcb12_is_intercept(&svm->nested.ctl,
+ INTERCEPT_EXCEPTION_OFFSET + PF_VECTOR) &&
+ !svm->nested.nested_run_pending) {
svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + PF_VECTOR;
svm->vmcb->control.exit_code_hi = 0;
svm->vmcb->control.exit_info_1 = fault->error_code;
@@ -121,7 +122,8 @@ static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu)

void recalc_intercepts(struct vcpu_svm *svm)
{
- struct vmcb_control_area *c, *h, *g;
+ struct vmcb_control_area *c, *h;
+ struct vmcb_ctrl_area_cached *g;
unsigned int i;

vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
@@ -172,7 +174,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
*/
int i;

- if (!(vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
+ if (!(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
return true;

for (i = 0; i < MSRPM_OFFSETS; i++) {
@@ -208,9 +210,9 @@ static bool nested_svm_check_bitmap_pa(struct kvm_vcpu *vcpu, u64 pa, u32 size)
}

static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
- struct vmcb_control_area *control)
+ struct vmcb_ctrl_area_cached *control)
{
- if (CC(!vmcb_is_intercept(control, INTERCEPT_VMRUN)))
+ if (CC(!vmcb12_is_intercept(control, INTERCEPT_VMRUN)))
return false;

if (CC(control->asid == 0))
@@ -273,7 +275,7 @@ static bool nested_vmcb_check_save(struct kvm_vcpu *vcpu)
}

static
-void _nested_copy_vmcb_control_to_cache(struct vmcb_control_area *to,
+void _nested_copy_vmcb_control_to_cache(struct vmcb_ctrl_area_cached *to,
struct vmcb_control_area *from)
{
unsigned int i;
@@ -976,7 +978,7 @@ static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
u32 offset, msr, value;
int write, mask;

- if (!(vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
+ if (!(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT)))
return NESTED_EXIT_HOST;

msr = svm->vcpu.arch.regs[VCPU_REGS_RCX];
@@ -1003,7 +1005,7 @@ static int nested_svm_intercept_ioio(struct vcpu_svm *svm)
u8 start_bit;
u64 gpa;

- if (!(vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_IOIO_PROT)))
+ if (!(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_IOIO_PROT)))
return NESTED_EXIT_HOST;

port = svm->vmcb->control.exit_info_1 >> 16;
@@ -1034,12 +1036,12 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
vmexit = nested_svm_intercept_ioio(svm);
break;
case SVM_EXIT_READ_CR0 ... SVM_EXIT_WRITE_CR8: {
- if (vmcb_is_intercept(&svm->nested.ctl, exit_code))
+ if (vmcb12_is_intercept(&svm->nested.ctl, exit_code))
vmexit = NESTED_EXIT_DONE;
break;
}
case SVM_EXIT_READ_DR0 ... SVM_EXIT_WRITE_DR7: {
- if (vmcb_is_intercept(&svm->nested.ctl, exit_code))
+ if (vmcb12_is_intercept(&svm->nested.ctl, exit_code))
vmexit = NESTED_EXIT_DONE;
break;
}
@@ -1057,7 +1059,7 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
break;
}
default: {
- if (vmcb_is_intercept(&svm->nested.ctl, exit_code))
+ if (vmcb12_is_intercept(&svm->nested.ctl, exit_code))
vmexit = NESTED_EXIT_DONE;
}
}
@@ -1135,7 +1137,7 @@ static void nested_svm_inject_exception_vmexit(struct vcpu_svm *svm)

static inline bool nested_exit_on_init(struct vcpu_svm *svm)
{
- return vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_INIT);
+ return vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_INIT);
}

static int svm_check_nested_events(struct kvm_vcpu *vcpu)
@@ -1268,6 +1270,8 @@ static int svm_get_nested_state(struct kvm_vcpu *vcpu,
u32 user_data_size)
{
struct vcpu_svm *svm;
+ struct vmcb_control_area *ctl;
+ unsigned long r;
struct kvm_nested_state kvm_state = {
.flags = 0,
.format = KVM_STATE_NESTED_FORMAT_SVM,
@@ -1309,9 +1313,18 @@ static int svm_get_nested_state(struct kvm_vcpu *vcpu,
*/
if (clear_user(user_vmcb, KVM_STATE_NESTED_SVM_VMCB_SIZE))
return -EFAULT;
- if (copy_to_user(&user_vmcb->control, &svm->nested.ctl,
- sizeof(user_vmcb->control)))
+
+ ctl = kzalloc(sizeof(*ctl), GFP_KERNEL);
+ if (!ctl)
+ return -ENOMEM;
+
+ nested_copy_vmcb_cache_to_control(ctl, &svm->nested.ctl);
+ r = copy_to_user(&user_vmcb->control, ctl,
+ sizeof(user_vmcb->control));
+ kfree(ctl);
+ if (r)
return -EFAULT;
+
if (copy_to_user(&user_vmcb->save, &svm->vmcb01.ptr->save,
sizeof(user_vmcb->save)))
return -EFAULT;
@@ -1329,6 +1342,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
struct vmcb_control_area *ctl;
struct vmcb_save_area *save;
struct vmcb_save_area_cached save_cached;
+ struct vmcb_ctrl_area_cached ctl_cached;
unsigned long cr0;
int ret;

@@ -1381,7 +1395,8 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
goto out_free;

ret = -EINVAL;
- if (!nested_vmcb_check_controls(vcpu, ctl))
+ _nested_copy_vmcb_control_to_cache(&ctl_cached, ctl);
+ if (!nested_vmcb_check_controls(vcpu, &ctl_cached))
goto out_free;

/*
@@ -1438,7 +1453,6 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
svm->nested.vmcb12_gpa = kvm_state->hdr.svm.vmcb_pa;

svm_copy_vmrun_state(&svm->vmcb01.ptr->save, save);
- nested_copy_vmcb_control_to_cache(svm, ctl);

svm_switch_vmcb(svm, &svm->nested.vmcb02);
nested_vmcb02_prepare_control(svm);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4e586ce77591..86d966802bbc 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2440,7 +2440,7 @@ static bool check_selective_cr0_intercepted(struct kvm_vcpu *vcpu,
bool ret = false;

if (!is_guest_mode(vcpu) ||
- (!(vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_SELECTIVE_CR0))))
+ (!(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_SELECTIVE_CR0))))
return false;

cr0 &= ~SVM_CR0_SELECTIVE_MASK;
@@ -4158,7 +4158,7 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
info->intercept == x86_intercept_clts)
break;

- if (!(vmcb_is_intercept(&svm->nested.ctl,
+ if (!(vmcb12_is_intercept(&svm->nested.ctl,
INTERCEPT_SELECTIVE_CR0)))
break;

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 49cc502986a9..5d1cdf90aa55 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -156,7 +156,7 @@ struct svm_nested_state {
bool nested_run_pending;

/* cache for control fields of the guest */
- struct vmcb_control_area ctl;
+ struct vmcb_ctrl_area_cached ctl;
struct vmcb_save_area_cached save;

bool initialized;
@@ -491,17 +491,17 @@ static inline bool nested_svm_virtualize_tpr(struct kvm_vcpu *vcpu)

static inline bool nested_exit_on_smi(struct vcpu_svm *svm)
{
- return vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_SMI);
+ return vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_SMI);
}

static inline bool nested_exit_on_intr(struct vcpu_svm *svm)
{
- return vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_INTR);
+ return vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_INTR);
}

static inline bool nested_exit_on_nmi(struct vcpu_svm *svm)
{
- return vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_NMI);
+ return vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_NMI);
}

int enter_svm_guest_mode(struct kvm_vcpu *vcpu,
--
2.27.0

Subject: [PATCH v4 6/7] nSVM: introduce struct vmcb_ctrl_area_cached

This structure will replace vmcb_control_area in
svm_nested_state, providing only the fields that are actually
used by the nested state. This avoids having and copying around
uninitialized fields. The cost of this, however, is that all
functions (in this case vmcb_is_intercept) expect the old
structure, so they need to be duplicated.

Introduce also nested_copy_vmcb_cache_to_control(), useful to copy
vmcb_ctrl_area_cached fields in vmcb_control_area. This will
be used in the next patch.

Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/nested.c | 34 ++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 31 +++++++++++++++++++++++++++++++
2 files changed, 65 insertions(+)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 1b60e6818836..7895ddf176ed 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1229,6 +1229,40 @@ int nested_svm_exit_special(struct vcpu_svm *svm)
return NESTED_EXIT_CONTINUE;
}

+/* Inverse operation of nested_copy_vmcb_control_to_cache(). asid is copied too. */
+static void nested_copy_vmcb_cache_to_control(struct vmcb_control_area *dst,
+ struct vmcb_ctrl_area_cached *from)
+{
+ unsigned int i;
+
+ memset(dst, 0, sizeof(struct vmcb_control_area));
+
+ for (i = 0; i < MAX_INTERCEPT; i++)
+ dst->intercepts[i] = from->intercepts[i];
+
+ dst->iopm_base_pa = from->iopm_base_pa;
+ dst->msrpm_base_pa = from->msrpm_base_pa;
+ dst->tsc_offset = from->tsc_offset;
+ dst->asid = from->asid;
+ dst->tlb_ctl = from->tlb_ctl;
+ dst->int_ctl = from->int_ctl;
+ dst->int_vector = from->int_vector;
+ dst->int_state = from->int_state;
+ dst->exit_code = from->exit_code;
+ dst->exit_code_hi = from->exit_code_hi;
+ dst->exit_info_1 = from->exit_info_1;
+ dst->exit_info_2 = from->exit_info_2;
+ dst->exit_int_info = from->exit_int_info;
+ dst->exit_int_info_err = from->exit_int_info_err;
+ dst->nested_ctl = from->nested_ctl;
+ dst->event_inj = from->event_inj;
+ dst->event_inj_err = from->event_inj_err;
+ dst->nested_cr3 = from->nested_cr3;
+ dst->virt_ext = from->virt_ext;
+ dst->pause_filter_count = from->pause_filter_count;
+ dst->pause_filter_thresh = from->pause_filter_thresh;
+}
+
static int svm_get_nested_state(struct kvm_vcpu *vcpu,
struct kvm_nested_state __user *user_kvm_nested_state,
u32 user_data_size)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4346f6053432..49cc502986a9 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -116,6 +116,31 @@ struct vmcb_save_area_cached {
u64 dr6;
};

+struct vmcb_ctrl_area_cached {
+ u32 intercepts[MAX_INTERCEPT];
+ u16 pause_filter_thresh;
+ u16 pause_filter_count;
+ u64 iopm_base_pa;
+ u64 msrpm_base_pa;
+ u64 tsc_offset;
+ u32 asid;
+ u8 tlb_ctl;
+ u32 int_ctl;
+ u32 int_vector;
+ u32 int_state;
+ u32 exit_code;
+ u32 exit_code_hi;
+ u64 exit_info_1;
+ u64 exit_info_2;
+ u32 exit_int_info;
+ u32 exit_int_info_err;
+ u64 nested_ctl;
+ u32 event_inj;
+ u32 event_inj_err;
+ u64 nested_cr3;
+ u64 virt_ext;
+};
+
struct svm_nested_state {
struct kvm_vmcb_info vmcb02;
u64 hsave_msr;
@@ -308,6 +333,12 @@ static inline bool vmcb_is_intercept(struct vmcb_control_area *control, u32 bit)
return test_bit(bit, (unsigned long *)&control->intercepts);
}

+static inline bool vmcb12_is_intercept(struct vmcb_ctrl_area_cached *control, u32 bit)
+{
+ WARN_ON_ONCE(bit >= 32 * MAX_INTERCEPT);
+ return test_bit(bit, (unsigned long *)&control->intercepts);
+}
+
static inline void set_dr_intercepts(struct vcpu_svm *svm)
{
struct vmcb *vmcb = svm->vmcb01.ptr;
--
2.27.0

Subject: Re: [PATCH v4 0/7] KVM: nSVM: avoid TOC/TOU race when checking vmcb12

Apologies, I did not rebase this series on kvm/queue branch, but on
linux/master. I will rebase and resend once more.

Emanuele

On 03/11/2021 12:52, Emanuele Giuseppe Esposito wrote:
> Currently there is a TOC/TOU race between the check of vmcb12's
> efer, cr0 and cr4 registers and the later save of their values in
> svm_set_*, because the guest could modify the values in the meanwhile.
>
> To solve this issue, this series introduces and uses svm->nested.save
> structure in enter_svm_guest_mode to save the current value of efer,
> cr0 and cr4 and later use these to set the vcpu->arch.* state.
>
> Similarly, svm->nested.ctl contains fields that are not used, so having
> a full vmcb_control_area means passing uninitialized fields.
>
> Patches 1,3 and 8 take care of renaming and refactoring code.
> Patches 2 and 6 introduce respectively vmcb_ctrl_area_cached and
> vmcb_save_area_cached.
> Patches 4 and 5 use vmcb_save_area_cached to avoid TOC/TOU, and patch
> 7 uses vmcb_ctrl_area_cached.
>
> Signed-off-by: Emanuele Giuseppe Esposito <[email protected]>
>
> ---
> v4:
> * introduce _* helpers (_nested_vmcb_check_save,
> _nested_copy_vmcb_control_to_cache, _nested_copy_vmcb_save_to_cache)
> that take care of additional parameters.
> * svm_set_nested_state: introduce {save, ctl}_cached variables
> to not pollute svm->nested.{save,ctl} state, especially if the
> check fails. remove also unnecessary memset added in previous versions.
> * svm_get_nested_state: change stack variable ctl introduced in this series
> into a pointer that will be zeroed and freed after it has been copied to user
>
> v3:
> * merge this series with "KVM: nSVM: use vmcb_ctrl_area_cached instead
> of vmcb_control_area in nested state"
> * rename "nested_load_save_from_vmcb12" in
> "nested_copy_vmcb_save_to_cache"
> * rename "nested_load_control_from_vmcb12" in
> "nested_copy_vmcb_control_to_cache"
> * change check functions (nested_vmcb_valid_sregs and nested_vmcb_valid_sregs)
> to accept only the vcpu parameter, since we only check
> nested state now
> * rename "vmcb_is_intercept_cached" in "vmcb12_is_intercept"
> and duplicate the implementation instead of calling vmcb_is_intercept
>
> v2:
> * svm->nested.save is a separate struct vmcb_save_area_cached,
> and not vmcb_save_area.
> * update also vmcb02->cr3 with svm->nested.save.cr3
>
> RFC:
> * use svm->nested.save instead of local variables.
> * not dependent anymore from "KVM: nSVM: remove useless kvm_clear_*_queue"
> * simplified patches, we just use the struct and not move the check
> nearer to the TOU.
>
> Emanuele Giuseppe Esposito (7):
> KVM: nSVM: move nested_vmcb_check_cr3_cr4 logic in
> nested_vmcb_valid_sregs
> nSVM: introduce smv->nested.save to cache save area fields
> nSVM: rename nested_load_control_from_vmcb12 in
> nested_copy_vmcb_control_to_cache
> nSVM: use vmcb_save_area_cached in nested_vmcb_valid_sregs()
> nSVM: use svm->nested.save to load vmcb12 registers and avoid TOC/TOU
> races
> nSVM: introduce struct vmcb_ctrl_area_cached
> nSVM: use vmcb_ctrl_area_cached instead of vmcb_control_area in struct
> svm_nested_state
>
> arch/x86/kvm/svm/nested.c | 250 ++++++++++++++++++++++++--------------
> arch/x86/kvm/svm/svm.c | 7 +-
> arch/x86/kvm/svm/svm.h | 57 ++++++++-
> 3 files changed, 212 insertions(+), 102 deletions(-)
>