2024-06-09 15:50:42

by Nicolas Saenz Julienne

Subject: [PATCH 00/18] Introducing Core Building Blocks for Hyper-V VSM Emulation

This series introduces core KVM functionality necessary to emulate Hyper-V's
Virtual Secure Mode in a Virtual Machine Monitor (VMM).

Hyper-V's Virtual Secure Mode (VSM) is a virtualization security feature that
leverages the hypervisor to create secure execution environments within a
guest. VSM is documented as part of Microsoft's Hypervisor Top Level Functional
Specification [1]. Security features that build upon VSM, like Windows
Credential Guard, are enabled by default on Windows 11 and are becoming a
prerequisite in some industries.

VSM introduces the concept of Virtual Trust Levels (VTLs). These are
independent execution contexts, each with its own CPU architectural state,
local APIC state, and a different view of memory. They are hierarchical, with
more privileged VTLs having priority over the execution of lower VTLs and
control over lower VTLs' state. Windows leverages these low-level
paravirtualized primitives, as well as the hypervisor's higher trust base, to
prevent guest data exfiltration even when the operating system itself has been
compromised.

As discussed at LPC2023 and in our previous RFC [2], we decided to model each
VTL as a distinct KVM VM. With this approach, and the RWX memory attributes
introduced in this series, we have been able to implement VTL memory
protections in a non-intrusive way, using generic KVM APIs. Additionally, each
CPU's VTL is modeled as a distinct KVM vCPU, owned by the KVM VM tracking that
VTL's state. VTL awareness is fully removed from KVM, and the responsibility
for VTL-aware hypercalls, VTL scheduling, and state transfer is delegated to
userspace.
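
To make the model above more concrete, here's a rough sketch of the userspace
side (hypothetical VMM code, not part of this series; NUM_VTLS, MAX_CPUS and
all error handling are placeholders):

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    #define NUM_VTLS 2              /* VTL0 + VTL1, placeholder */
    #define MAX_CPUS 64             /* placeholder */

    struct vtl_vm {
            int vm_fd;                      /* one KVM VM per VTL */
            int vcpu_fd[MAX_CPUS];          /* one vCPU per (CPU, VTL) pair */
    };

    static struct vtl_vm vtls[NUM_VTLS];

    static void create_vtls(int kvm_fd, int num_cpus)
    {
            int vtl, cpu;

            for (vtl = 0; vtl < NUM_VTLS; vtl++) {
                    vtls[vtl].vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);
                    for (cpu = 0; cpu < num_cpus; cpu++)
                            vtls[vtl].vcpu_fd[cpu] =
                                    ioctl(vtls[vtl].vm_fd, KVM_CREATE_VCPU, cpu);
            }
    }

In this sketch all VTL VMs map the same guest RAM (KVM_SET_USER_MEMORY_REGION
on each vm_fd), per-VTL protections are layered on top with the RWX memory
attributes introduced later in the series, and VTL switches are handled
entirely in userspace by transferring architectural state between the
corresponding vCPU fds.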

Series overview:
- 1-8: Introduce a number of Hyper-V hypercalls, all of which are VTL-aware and
expected to be handled in userspace. Additionally, a new VTL-specific MP
state is introduced.
- 9-10: Pass the instruction length as part of the userspace fault exit data
in order to simplify VSM's secure intercept generation.
- 11-17: Introduce RWX memory attributes and extend userspace memory fault
exits.
- 18: Introduce the main VSM CPUID bit, which gates all VTL configuration and
runtime hypercalls.

The series is accompanied by two repositories:
- A PoC QEMU implementation of VSM [3]: it is capable of booting Windows
Server 2016 and 2019 with Credential Guard (CG) enabled, on VMs of any
memory size or number of vCPUs. It's generally stable, but still sees its
share of crashes. The PoC implements the VSM interfaces needed to
accommodate CG, and is by no means comprehensive. All in all, don't expect
anything usable in production.

- VSM kvm-unit-tests [4]: they cover all VSM hypercalls, as well as the KVM
APIs introduced by this series. Unfortunately, they depend on the QEMU
implementation.

We mostly tested on an Intel machine, both with and without TDP. Basic tests
were also run on AMD (build and kvm-unit-tests). Please note that v2 will
include KVM selftests to close the testing gap and allow merging this series
while we work on the userspace bits.

The series is based on 'kvm/master', that is, commit db574f2f96d0, and is also
available on GitHub [5].

This series also serves as a call-out to anyone interested in collaborating. We
have a proven design, a working PoC, and hopefully a path forward to merge
these KVM APIs. There is still plenty to do in both QEMU and KVM; I'll post a
list of ideas in the future. Feel free to get in touch!

Thanks,
Nicolas

[1] https://raw.githubusercontent.com/Microsoft/Virtualization-Documentation/master/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v6.0b.pdf
[2] https://lore.kernel.org/lkml/[email protected]/
[3] https://github.com/vianpl/qemu/tree/vsm-v1
[4] https://github.com/vianpl/kvm-unit-tests/tree/vsm-v1
[5] https://github.com/vianpl/linux/tree/vsm-v1

---

Anish Moorthy (1):
KVM: Define and communicate KVM_EXIT_MEMORY_FAULT RWX flags to
userspace

Nicolas Saenz Julienne (17):
KVM: x86: hyper-v: Introduce XMM output support
KVM: x86: hyper-v: Introduce helpers to check if VSM is exposed to
guest
hyperv-tlfs: Update struct hv_send_ipi{_ex}'s declarations
KVM: x86: hyper-v: Introduce VTL awareness to Hyper-V's PV-IPIs
KVM: x86: hyper-v: Introduce MP_STATE_HV_INACTIVE_VTL
KVM: x86: hyper-v: Exit on Get/SetVpRegisters hcall
KVM: x86: hyper-v: Exit on TranslateVirtualAddress hcall
KVM: x86: hyper-v: Exit on StartVirtualProcessor and
GetVpIndexFromApicId hcalls
KVM: x86: Keep track of instruction length during faults
KVM: x86: Pass the instruction length on memory fault user-space exits
KVM: x86/mmu: Introduce infrastructure to handle non-executable
mappings
KVM: x86/mmu: Avoid warning when installing non-private memory
attributes
KVM: x86/mmu: Init memslot if memory attributes available
KVM: Introduce RWX memory attributes
KVM: x86: Take mem attributes into account when faulting memory
KVM: Introduce traces to track memory attributes modification.
KVM: x86: hyper-v: Handle VSM hcalls in user-space

Documentation/virt/kvm/api.rst | 107 +++++++++++++++++++++++-
arch/x86/hyperv/hv_apic.c | 3 +-
arch/x86/include/asm/hyperv-tlfs.h | 2 +-
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/hyperv.c | 127 +++++++++++++++++++++++++++--
arch/x86/kvm/hyperv.h | 18 ++++
arch/x86/kvm/mmu/mmu.c | 91 +++++++++++++++++----
arch/x86/kvm/mmu/mmu_internal.h | 9 +-
arch/x86/kvm/mmu/mmutrace.h | 29 +++++++
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
arch/x86/kvm/mmu/tdp_mmu.c | 8 +-
arch/x86/kvm/svm/svm.c | 7 +-
arch/x86/kvm/vmx/vmx.c | 23 +++++-
arch/x86/kvm/x86.c | 17 +++-
include/asm-generic/hyperv-tlfs.h | 16 +++-
include/linux/kvm_host.h | 45 +++++++++-
include/trace/events/kvm.h | 20 +++++
include/uapi/linux/kvm.h | 15 ++++
virt/kvm/kvm_main.c | 35 +++++++-
19 files changed, 527 insertions(+), 48 deletions(-)

--
2.40.1



2024-06-09 15:51:18

by Nicolas Saenz Julienne

Subject: [PATCH 01/18] KVM: x86: hyper-v: Introduce XMM output support

Prepare infrastructure to be able to return data through the XMM
registers when Hyper-V hypercalls are issued in fast mode. The XMM
registers are exposed to user-space through KVM_EXIT_HYPERV_HCALL and
restored on successful hypercall completion.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>

---

There was some discussion in the RFC about whether growing 'struct
kvm_hyperv_exit' is ABI breakage. IMO it isn't:
- There is padding in 'struct kvm_run' that ensures that a bigger
'struct kvm_hyperv_exit' doesn't alter the offsets within that struct.
- Adding a new field at the bottom of the 'hcall' field within the
'struct kvm_hyperv_exit' should be fine as well, as it doesn't alter
the offsets within that struct either.
- Ultimately, previous updates to 'struct kvm_hyperv_exit' hint that
its size isn't part of the uABI. It already grew when syndbg was
introduced.
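
For context, a minimal sketch of the intended userspace side (hypothetical
VMM code, not part of this patch; hcall_returns_xmm_output(), out_lo/out_hi,
vcpu_run and vcpu_fd are placeholders):

    struct kvm_run *run = vcpu_run;         /* mmap of the vCPU fd */

    if (run->exit_reason == KVM_EXIT_HYPERV &&
        run->hyperv.type == KVM_EXIT_HYPERV_HCALL) {
            __u64 input = run->hyperv.u.hcall.input;
            __u16 code = input & 0xffff;            /* hypercall code */
            bool fast = input & (1ULL << 16);       /* HV_HYPERCALL_FAST_BIT */

            /* ... emulate the hypercall in the VMM ... */
            run->hyperv.u.hcall.result = 0;         /* HV_STATUS_SUCCESS */

            if (fast && hcall_returns_xmm_output(code)) {
                    run->hyperv.u.hcall.xmm[0].low = out_lo;
                    run->hyperv.u.hcall.xmm[0].high = out_hi;
            }

            ioctl(vcpu_fd, KVM_RUN, 0);     /* KVM restores XMM on completion */
    }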

Documentation/virt/kvm/api.rst | 19 ++++++++++
arch/x86/include/asm/hyperv-tlfs.h | 2 +-
arch/x86/kvm/hyperv.c | 56 +++++++++++++++++++++++++++++-
include/uapi/linux/kvm.h | 6 ++++
4 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a71d91978d9ef..17893b330b76f 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8893,3 +8893,22 @@ Ordering of KVM_GET_*/KVM_SET_* ioctls
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

TBD
+
+10. Hyper-V CPUIDs
+==================
+
+This section only applies to x86.
+
+New Hyper-V feature support is no longer being tracked through KVM
+capabilities. Userspace can check if a particular version of KVM supports a
+feature using KVM_GET_SUPPORTED_HV_CPUID. This section documents how Hyper-V
+CPUIDs map to KVM functionality.
+
+10.1 HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE
+------------------------------------------
+
+:Location: CPUID.40000003H:EDX[bit 15]
+
+This CPUID indicates that KVM supports returning data to the guest in response
+to a hypercall using the XMM registers. It also extends ``struct
+kvm_hyperv_exit`` to allow passing the XMM data from userspace.
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 3787d26810c1c..6a18c9f77d5fe 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -49,7 +49,7 @@
/* Support for physical CPU dynamic partitioning events is available*/
#define HV_X64_CPU_DYNAMIC_PARTITIONING_AVAILABLE BIT(3)
/*
- * Support for passing hypercall input parameter block via XMM
+ * Support for passing hypercall input and output parameter block via XMM
* registers is available
*/
#define HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE BIT(4)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 8a47f8541eab7..42f44546fe79c 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -1865,6 +1865,7 @@ struct kvm_hv_hcall {
u16 rep_idx;
bool fast;
bool rep;
+ bool xmm_dirty;
sse128_t xmm[HV_HYPERCALL_MAX_XMM_REGISTERS];

/*
@@ -2396,9 +2397,49 @@ static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u64 result)
return ret;
}

+static void kvm_hv_write_xmm(struct kvm_hyperv_xmm_reg *xmm)
+{
+ int reg;
+
+ kvm_fpu_get();
+ for (reg = 0; reg < HV_HYPERCALL_MAX_XMM_REGISTERS; reg++) {
+ const sse128_t data = sse128(xmm[reg].low, xmm[reg].high);
+ _kvm_write_sse_reg(reg, &data);
+ }
+ kvm_fpu_put();
+}
+
+static bool kvm_hv_is_xmm_output_hcall(u16 code)
+{
+ return false;
+}
+
+static bool kvm_hv_xmm_output_allowed(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+
+ return !hv_vcpu->enforce_cpuid ||
+ hv_vcpu->cpuid_cache.features_edx &
+ HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
+}
+
static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
{
- return kvm_hv_hypercall_complete(vcpu, vcpu->run->hyperv.u.hcall.result);
+ bool fast = !!(vcpu->run->hyperv.u.hcall.input & HV_HYPERCALL_FAST_BIT);
+ u16 code = vcpu->run->hyperv.u.hcall.input & 0xffff;
+ u64 result = vcpu->run->hyperv.u.hcall.result;
+
+ if (hv_result_success(result) && fast &&
+ kvm_hv_is_xmm_output_hcall(code)) {
+ if (unlikely(!kvm_hv_xmm_output_allowed(vcpu))) {
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
+ }
+
+ kvm_hv_write_xmm(vcpu->run->hyperv.u.hcall.xmm);
+ }
+
+ return kvm_hv_hypercall_complete(vcpu, result);
}

static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
@@ -2553,6 +2594,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
hc.rep_cnt = (hc.param >> HV_HYPERCALL_REP_COMP_OFFSET) & 0xfff;
hc.rep_idx = (hc.param >> HV_HYPERCALL_REP_START_OFFSET) & 0xfff;
hc.rep = !!(hc.rep_cnt || hc.rep_idx);
+ hc.xmm_dirty = false;

trace_kvm_hv_hypercall(hc.code, hc.fast, hc.var_cnt, hc.rep_cnt,
hc.rep_idx, hc.ingpa, hc.outgpa);
@@ -2673,6 +2715,15 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
break;
}

+ if (hv_result_success(ret) && hc.xmm_dirty) {
+ if (unlikely(!kvm_hv_xmm_output_allowed(vcpu))) {
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
+ }
+
+ kvm_hv_write_xmm((struct kvm_hyperv_xmm_reg *)hc.xmm);
+ }
+
hypercall_complete:
return kvm_hv_hypercall_complete(vcpu, ret);

@@ -2682,6 +2733,8 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
vcpu->run->hyperv.u.hcall.input = hc.param;
vcpu->run->hyperv.u.hcall.params[0] = hc.ingpa;
vcpu->run->hyperv.u.hcall.params[1] = hc.outgpa;
+ if (hc.fast)
+ memcpy(vcpu->run->hyperv.u.hcall.xmm, hc.xmm, sizeof(hc.xmm));
vcpu->arch.complete_userspace_io = kvm_hv_hypercall_complete_userspace;
return 0;
}
@@ -2830,6 +2883,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;

ent->edx |= HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE;
+ ent->edx |= HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
ent->edx |= HV_FEATURE_FREQUENCY_MSRS_AVAILABLE;
ent->edx |= HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d03842abae578..fbdee8d754595 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -90,6 +90,11 @@ struct kvm_pit_config {

#define KVM_PIT_SPEAKER_DUMMY 1

+struct kvm_hyperv_xmm_reg {
+ __u64 low;
+ __u64 high;
+};
+
struct kvm_hyperv_exit {
#define KVM_EXIT_HYPERV_SYNIC 1
#define KVM_EXIT_HYPERV_HCALL 2
@@ -108,6 +113,7 @@ struct kvm_hyperv_exit {
__u64 input;
__u64 result;
__u64 params[2];
+ struct kvm_hyperv_xmm_reg xmm[6];
} hcall;
struct {
__u32 msr;
--
2.40.1


2024-06-09 15:51:59

by Nicolas Saenz Julienne

Subject: [PATCH 02/18] KVM: x86: hyper-v: Introduce helpers to check if VSM is exposed to guest

Introduce a helper function to check whether the VSM CPUID bit is
exposed to the guest.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
arch/x86/kvm/hyperv.h | 10 ++++++++++
include/asm-generic/hyperv-tlfs.h | 1 +
2 files changed, 11 insertions(+)

diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 923e64903da9a..d007d2203e0e4 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -265,6 +265,12 @@ static inline void kvm_hv_nested_transtion_tlb_flush(struct kvm_vcpu *vcpu,
}

int kvm_hv_vcpu_flush_tlb(struct kvm_vcpu *vcpu);
+static inline bool kvm_hv_cpuid_vsm_enabled(struct kvm_vcpu *vcpu)
+{
+ struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
+
+ return hv_vcpu && (hv_vcpu->cpuid_cache.features_ebx & HV_ACCESS_VSM);
+}
#else /* CONFIG_KVM_HYPERV */
static inline void kvm_hv_setup_tsc_page(struct kvm *kvm,
struct pvclock_vcpu_time_info *hv_clock) {}
@@ -322,6 +328,10 @@ static inline u32 kvm_hv_get_vpindex(struct kvm_vcpu *vcpu)
return vcpu->vcpu_idx;
}
static inline void kvm_hv_nested_transtion_tlb_flush(struct kvm_vcpu *vcpu, bool tdp_enabled) {}
+static inline bool kvm_hv_cpuid_vsm_enabled(struct kvm_vcpu *vcpu)
+{
+ return false;
+}
#endif /* CONFIG_KVM_HYPERV */

#endif /* __ARCH_X86_KVM_HYPERV_H__ */
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 814207e7c37fc..ffac04bbd0c19 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -89,6 +89,7 @@
#define HV_ACCESS_STATS BIT(8)
#define HV_DEBUGGING BIT(11)
#define HV_CPU_MANAGEMENT BIT(12)
+#define HV_ACCESS_VSM BIT(16)
#define HV_ENABLE_EXTENDED_HYPERCALLS BIT(20)
#define HV_ISOLATION BIT(22)

--
2.40.1


2024-06-09 15:52:24

by Nicolas Saenz Julienne

Subject: [PATCH 03/18] hyperv-tlfs: Update struct hv_send_ipi{_ex}'s declarations

Both 'struct hv_send_ipi' and 'struct hv_send_ipi_ex' have a 'union
hv_input_vtl' field which has been ignored until now. Expose it, as KVM
will soon provide a way of dealing with VTL-aware IPIs. While doing so,
also fix up __send_ipi_mask_ex() to zero-initialize the whole input.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
arch/x86/hyperv/hv_apic.c | 3 +--
include/asm-generic/hyperv-tlfs.h | 6 ++++--
2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index 0569f579338b5..97907371d51ef 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -121,9 +121,8 @@ static bool __send_ipi_mask_ex(const struct cpumask *mask, int vector,
if (unlikely(!ipi_arg))
goto ipi_mask_ex_done;

+ memset(ipi_arg, 0, sizeof(*ipi_arg));
ipi_arg->vector = vector;
- ipi_arg->reserved = 0;
- ipi_arg->vp_set.valid_bank_mask = 0;

/*
* Use HV_GENERIC_SET_ALL and avoid converting cpumask to VP_SET
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index ffac04bbd0c19..28cde641b5474 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -425,14 +425,16 @@ struct hv_vpset {
/* HvCallSendSyntheticClusterIpi hypercall */
struct hv_send_ipi {
u32 vector;
- u32 reserved;
+ union hv_input_vtl in_vtl;
+ u8 reserved[3];
u64 cpu_mask;
} __packed;

/* HvCallSendSyntheticClusterIpiEx hypercall */
struct hv_send_ipi_ex {
u32 vector;
- u32 reserved;
+ union hv_input_vtl in_vtl;
+ u8 reserved[3];
struct hv_vpset vp_set;
} __packed;

--
2.40.1


2024-06-09 15:52:57

by Nicolas Saenz Julienne

Subject: [PATCH 04/18] KVM: x86: hyper-v: Introduce VTL awareness to Hyper-V's PV-IPIs

HvCallSendSyntheticClusterIpi and HvCallSendSyntheticClusterIpiEx allow
sending VTL-aware IPIs. Honour the hcall by exiting to user-space upon
receiving a request with a valid VTL target. This behaviour is only
enabled when the VSM CPUID flag is exposed to the guest; it doesn't
introduce any behaviour change otherwise.

User-space is responsible for correctly processing the PV-IPI before
resuming execution.
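
For illustration only (hypothetical userspace handling, not part of this
patch; inject_ipi(), num_cpus and the vtls[] table are placeholders), the fast
variant of HvCallSendSyntheticClusterIpi could be serviced roughly as follows
on the resulting KVM_EXIT_HYPERV_HCALL exit:

    /* Fast hypercall: the vector and 'union hv_input_vtl' are packed into
     * params[0], the CPU mask is in params[1]. Bit layout per the TLFS. */
    __u64 param0 = run->hyperv.u.hcall.params[0];
    __u32 vector = (__u32)param0;
    __u8 in_vtl = param0 >> 32;             /* union hv_input_vtl */
    __u8 target_vtl = in_vtl & 0xf;         /* assumed target_vtl field layout */
    __u64 cpu_mask = run->hyperv.u.hcall.params[1];
    int cpu;

    for (cpu = 0; cpu < num_cpus; cpu++)
            if (cpu_mask & (1ULL << cpu))
                    inject_ipi(vtls[target_vtl].vcpu_fd[cpu], vector);

    run->hyperv.u.hcall.result = 0;         /* HV_STATUS_SUCCESS */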

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
arch/x86/kvm/hyperv.c | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 42f44546fe79c..d00baf3ffb165 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2217,16 +2217,20 @@ static void kvm_hv_send_ipi_to_many(struct kvm *kvm, u32 vector,

static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
{
+ bool vsm_enabled = kvm_hv_cpuid_vsm_enabled(vcpu);
struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
u64 *sparse_banks = hv_vcpu->sparse_banks;
struct kvm *kvm = vcpu->kvm;
struct hv_send_ipi_ex send_ipi_ex;
struct hv_send_ipi send_ipi;
+ union hv_input_vtl *in_vtl;
u64 valid_bank_mask;
+ int rsvd_shift;
u32 vector;
bool all_cpus;

if (hc->code == HVCALL_SEND_IPI) {
+ in_vtl = &send_ipi.in_vtl;
if (!hc->fast) {
if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi,
sizeof(send_ipi))))
@@ -2235,16 +2239,22 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
vector = send_ipi.vector;
} else {
/* 'reserved' part of hv_send_ipi should be 0 */
- if (unlikely(hc->ingpa >> 32 != 0))
+ rsvd_shift = vsm_enabled ? 40 : 32;
+ if (unlikely(hc->ingpa >> rsvd_shift != 0))
return HV_STATUS_INVALID_HYPERCALL_INPUT;
+ in_vtl->as_uint8 = (u8)(hc->ingpa >> 32);
sparse_banks[0] = hc->outgpa;
vector = (u32)hc->ingpa;
}
all_cpus = false;
valid_bank_mask = BIT_ULL(0);

+ if (vsm_enabled && in_vtl->use_target_vtl)
+ return -ENODEV;
+
trace_kvm_hv_send_ipi(vector, sparse_banks[0]);
} else {
+ in_vtl = &send_ipi_ex.in_vtl;
if (!hc->fast) {
if (unlikely(kvm_read_guest(kvm, hc->ingpa, &send_ipi_ex,
sizeof(send_ipi_ex))))
@@ -2253,8 +2263,12 @@ static u64 kvm_hv_send_ipi(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
send_ipi_ex.vector = (u32)hc->ingpa;
send_ipi_ex.vp_set.format = hc->outgpa;
send_ipi_ex.vp_set.valid_bank_mask = sse128_lo(hc->xmm[0]);
+ in_vtl->as_uint8 = (u8)(hc->ingpa >> 32);
}

+ if (vsm_enabled && in_vtl->use_target_vtl)
+ return -ENODEV;
+
trace_kvm_hv_send_ipi_ex(send_ipi_ex.vector,
send_ipi_ex.vp_set.format,
send_ipi_ex.vp_set.valid_bank_mask);
@@ -2682,6 +2696,9 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
break;
}
ret = kvm_hv_send_ipi(vcpu, &hc);
+ /* VTL-enabled ipi, let user-space handle it */
+ if (ret == -ENODEV)
+ goto hypercall_userspace_exit;
break;
case HVCALL_POST_DEBUG_DATA:
case HVCALL_RETRIEVE_DEBUG_DATA:
--
2.40.1


2024-06-09 15:54:24

by Nicolas Saenz Julienne

Subject: [PATCH 05/18] KVM: x86: hyper-v: Introduce MP_STATE_HV_INACTIVE_VTL

Model inactive VTL vCPUs' behaviour with a new MP state.

Inactive VTLs are in an artificial halt state. They enter this state in
response to invoking HvCallVtlCall or HvCallVtlReturn. User-space, which
is VTL-aware, processes the hypercall and sets the vCPU to
MP_STATE_HV_INACTIVE_VTL. When a vCPU is run in this state, it blocks
until a wakeup event is received. The rules for what constitutes a wakeup
event are analogous to halt's, except that inactive VTLs ignore RFLAGS.IF.

When a wakeup event is registered, KVM will exit to user-space with a
KVM_SYSTEM_EVENT exit, and KVM_SYSTEM_EVENT_WAKEUP event type.
User-space is responsible for deciding whether the event has precedence
over the active VTL and will switch the vCPU to KVM_MP_STATE_RUNNABLE
before resuming execution on it.

Running a KVM_MP_STATE_HV_INACTIVE_VTL vCPU with pending events will
return immediately to user-space.

Note that by re-using the readily available halt infrastructure in
KVM_RUN, MP_STATE_HV_INACTIVE_VTL correctly handles (or disables)
virtualisation features like the VMX preemption timer or APICv before
blocking.

Suggested-by: Maxim Levitsky <[email protected]>
Signed-off-by: Nicolas Saenz Julienne <[email protected]>

---

I do recall Sean mentioning that using MP states for this might have
unexpected side-effects. But it was in the context of introducing a
broader `HALTED_USERSPACE` style state. I believe that by narrowing down
the MP state's semantics to the specifics of inactive VTLs --
alternatively, we could change RFLAGS.IF in user-space before updating
the mp state -- we cement this as a VSM-only API as well as limit the
ambiguity on the guest/vCPU's state upon entering into this execution
mode.
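
For completeness, the intended userspace flow looks roughly like this
(hypothetical VMM code, not part of this patch; vtl0_vcpu_fd and the
scheduling decision are placeholders):

    /* VTL0 issued HvCallVtlCall: park VTL0 and run VTL1 instead */
    struct kvm_mp_state mp = { .mp_state = KVM_MP_STATE_HV_INACTIVE_VTL };

    ioctl(vtl0_vcpu_fd, KVM_SET_MP_STATE, &mp);

    /* ... later, KVM_RUN on the parked vCPU returns with: */
    if (run->exit_reason == KVM_EXIT_SYSTEM_EVENT &&
        run->system_event.type == KVM_SYSTEM_EVENT_WAKEUP) {
            /* decide whether the wakeup preempts the active VTL ... */
            mp.mp_state = KVM_MP_STATE_RUNNABLE;
            ioctl(vtl0_vcpu_fd, KVM_SET_MP_STATE, &mp);
    }

The only point of the sketch is the MP-state transitions around the blocking
KVM_RUN; everything else is policy left to the VMM.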

Documentation/virt/kvm/api.rst | 19 +++++++++++++++++++
arch/x86/kvm/hyperv.h | 8 ++++++++
arch/x86/kvm/svm/svm.c | 7 ++++++-
arch/x86/kvm/vmx/vmx.c | 7 ++++++-
arch/x86/kvm/x86.c | 16 +++++++++++++++-
include/uapi/linux/kvm.h | 1 +
6 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 17893b330b76f..e664c54a13b04 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1517,6 +1517,8 @@ Possible values are:
[s390]
KVM_MP_STATE_SUSPENDED the vcpu is in a suspend state and is waiting
for a wakeup event [arm64]
+ KVM_MP_STATE_HV_INACTIVE_VTL the vcpu is an inactive VTL and is waiting for
+ a wakeup event [x86]
========================== ===============================================

On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
@@ -1559,6 +1561,23 @@ KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
On LoongArch, only the KVM_MP_STATE_RUNNABLE state is used to reflect
whether the vcpu is runnable.

+For x86:
+^^^^^^^^
+
+KVM_MP_STATE_HV_INACTIVE_VTL is only available to a VM if Hyper-V's
+HV_ACCESS_VSM CPUID is exposed to the guest. This processor state models the
+behavior of an inactive VTL and should only be used for this purpose. A
+userspace process should only switch a vCPU into this MP state in response to a
+HvCallVtlCall or HvCallVtlReturn hypercall.
+
+If a vCPU is in KVM_MP_STATE_HV_INACTIVE_VTL, KVM will emulate the
+architectural execution of a HLT instruction with the caveat that RFLAGS.IF is
+ignored when deciding whether to wake up (TLFS 12.12.2.1). If a wakeup is
+recognized, KVM will exit to userspace with a KVM_SYSTEM_EVENT exit, where the
+event type is KVM_SYSTEM_EVENT_WAKEUP. Userspace has the responsibility to
+switch the vCPU back into KVM_MP_STATE_RUNNABLE state. Calling KVM_RUN on a
+KVM_MP_STATE_HV_INACTIVE_VTL vCPU with pending events will exit immediately.
+
4.39 KVM_SET_MP_STATE
---------------------

diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index d007d2203e0e4..d42fe3f85b002 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -271,6 +271,10 @@ static inline bool kvm_hv_cpuid_vsm_enabled(struct kvm_vcpu *vcpu)

return hv_vcpu && (hv_vcpu->cpuid_cache.features_ebx & HV_ACCESS_VSM);
}
+static inline bool kvm_hv_vcpu_is_idle_vtl(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.mp_state == KVM_MP_STATE_HV_INACTIVE_VTL;
+}
#else /* CONFIG_KVM_HYPERV */
static inline void kvm_hv_setup_tsc_page(struct kvm *kvm,
struct pvclock_vcpu_time_info *hv_clock) {}
@@ -332,6 +336,10 @@ static inline bool kvm_hv_cpuid_vsm_enabled(struct kvm_vcpu *vcpu)
{
return false;
}
+static inline bool kvm_hv_vcpu_is_idle_vtl(struct kvm_vcpu *vcpu)
+{
+ return false;
+}
#endif /* CONFIG_KVM_HYPERV */

#endif /* __ARCH_X86_KVM_HYPERV_H__ */
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 296c524988f95..9671191fef4ea 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -49,6 +49,7 @@
#include "svm.h"
#include "svm_ops.h"

+#include "hyperv.h"
#include "kvm_onhyperv.h"
#include "svm_onhyperv.h"

@@ -3797,6 +3798,10 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
if (!gif_set(svm))
return true;

+ /*
+ * The Hyper-V TLFS states that RFLAGS.IF is ignored when deciding
+ * whether to block interrupts targeted at inactive VTLs.
+ */
if (is_guest_mode(vcpu)) {
/* As long as interrupts are being delivered... */
if ((svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
@@ -3808,7 +3813,7 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
if (nested_exit_on_intr(svm))
return false;
} else {
- if (!svm_get_if_flag(vcpu))
+ if (!svm_get_if_flag(vcpu) && !kvm_hv_vcpu_is_idle_vtl(vcpu))
return true;
}

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b3c83c06f8265..ac0682fece604 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5057,7 +5057,12 @@ bool vmx_interrupt_blocked(struct kvm_vcpu *vcpu)
if (is_guest_mode(vcpu) && nested_exit_on_intr(vcpu))
return false;

- return !(vmx_get_rflags(vcpu) & X86_EFLAGS_IF) ||
+ /*
+ * The Hyper-V TLFS states that RFLAGS.IF is ignored when deciding
+ * whether to block interrupts targeted at inactive VTLs.
+ */
+ return (!(vmx_get_rflags(vcpu) & X86_EFLAGS_IF) &&
+ !kvm_hv_vcpu_is_idle_vtl(vcpu)) ||
(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8c9e4281d978d..a6e2312ccb68f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -134,6 +134,7 @@ static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu);

static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
+static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu);

static DEFINE_MUTEX(vendor_module_lock);
struct kvm_x86_ops kvm_x86_ops __read_mostly;
@@ -11176,7 +11177,8 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
kvm_lapic_switch_to_sw_timer(vcpu);

kvm_vcpu_srcu_read_unlock(vcpu);
- if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
+ if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED ||
+ kvm_hv_vcpu_is_idle_vtl(vcpu))
kvm_vcpu_halt(vcpu);
else
kvm_vcpu_block(vcpu);
@@ -11218,6 +11220,7 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
vcpu->arch.apf.halted = false;
break;
case KVM_MP_STATE_INIT_RECEIVED:
+ case KVM_MP_STATE_HV_INACTIVE_VTL:
break;
default:
WARN_ON_ONCE(1);
@@ -11264,6 +11267,13 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
if (kvm_cpu_has_pending_timer(vcpu))
kvm_inject_pending_timer_irqs(vcpu);

+ if (kvm_hv_vcpu_is_idle_vtl(vcpu) && kvm_vcpu_has_events(vcpu)) {
+ r = 0;
+ vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+ vcpu->run->system_event.type = KVM_SYSTEM_EVENT_WAKEUP;
+ break;
+ }
+
if (dm_request_for_irq_injection(vcpu) &&
kvm_vcpu_ready_for_interrupt_injection(vcpu)) {
r = 0;
@@ -11703,6 +11713,10 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
goto out;
break;

+ case KVM_MP_STATE_HV_INACTIVE_VTL:
+ if (is_guest_mode(vcpu) || !kvm_hv_cpuid_vsm_enabled(vcpu))
+ goto out;
+ break;
case KVM_MP_STATE_RUNNABLE:
break;

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index fbdee8d754595..f4864e6907e0b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -564,6 +564,7 @@ struct kvm_vapic_addr {
#define KVM_MP_STATE_LOAD 8
#define KVM_MP_STATE_AP_RESET_HOLD 9
#define KVM_MP_STATE_SUSPENDED 10
+#define KVM_MP_STATE_HV_INACTIVE_VTL 11

struct kvm_mp_state {
__u32 mp_state;
--
2.40.1


2024-06-09 15:54:54

by Nicolas Saenz Julienne

Subject: [PATCH 06/18] KVM: x86: hyper-v: Exit on Get/SetVpRegisters hcall

Let user-space handle HvGetVpRegisters and HvSetVpRegisters, as they are
VTL-aware hypercalls used solely in the context of VSM. Additionally,
expose the corresponding CPUID bit.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
Documentation/virt/kvm/api.rst | 10 ++++++++++
arch/x86/kvm/hyperv.c | 15 +++++++++++++++
include/asm-generic/hyperv-tlfs.h | 1 +
3 files changed, 26 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index e664c54a13b04..05b01b00a395c 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8931,3 +8931,13 @@ CPUIDs map to KVM functionality.
This CPUID indicates that KVM supports returning data to the guest in response
to a hypercall using the XMM registers. It also extends ``struct
kvm_hyperv_exit`` to allow passing the XMM data from userspace.
+
+10.2 HV_ACCESS_VP_REGISTERS
+---------------------------
+
+:Location: CPUID.40000003H:EBX[bit 17]
+
+This CPUID indicates that KVM supports HvGetVpRegisters and HvSetVpRegisters.
+These hypercalls are currently only used in conjunction with HV_ACCESS_VSM, and
+they immediately exit to userspace with KVM_EXIT_HYPERV_HCALL as the reason.
+Userspace is expected to complete the hypercall before resuming execution.
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index d00baf3ffb165..d0edc2bec5a4f 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2425,6 +2425,11 @@ static void kvm_hv_write_xmm(struct kvm_hyperv_xmm_reg *xmm)

static bool kvm_hv_is_xmm_output_hcall(u16 code)
{
+ switch (code) {
+ case HVCALL_GET_VP_REGISTERS:
+ return true;
+ }
+
return false;
}

@@ -2505,6 +2510,8 @@ static bool is_xmm_fast_hypercall(struct kvm_hv_hcall *hc)
case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX:
case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX:
case HVCALL_SEND_IPI_EX:
+ case HVCALL_GET_VP_REGISTERS:
+ case HVCALL_SET_VP_REGISTERS:
return true;
}

@@ -2543,6 +2550,10 @@ static bool hv_check_hypercall_access(struct kvm_vcpu_hv *hv_vcpu, u16 code)
*/
return !kvm_hv_is_syndbg_enabled(hv_vcpu->vcpu) ||
hv_vcpu->cpuid_cache.features_ebx & HV_DEBUGGING;
+ case HVCALL_GET_VP_REGISTERS:
+ case HVCALL_SET_VP_REGISTERS:
+ return hv_vcpu->cpuid_cache.features_ebx &
+ HV_ACCESS_VP_REGISTERS;
case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX:
case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX:
if (!(hv_vcpu->cpuid_cache.enlightenments_eax &
@@ -2727,6 +2738,9 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
break;
}
goto hypercall_userspace_exit;
+ case HVCALL_GET_VP_REGISTERS:
+ case HVCALL_SET_VP_REGISTERS:
+ goto hypercall_userspace_exit;
default:
ret = HV_STATUS_INVALID_HYPERCALL_CODE;
break;
@@ -2898,6 +2912,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->ebx |= HV_POST_MESSAGES;
ent->ebx |= HV_SIGNAL_EVENTS;
ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
+ ent->ebx |= HV_ACCESS_VP_REGISTERS;

ent->edx |= HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE;
ent->edx |= HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 28cde641b5474..9e909f0834598 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -90,6 +90,7 @@
#define HV_DEBUGGING BIT(11)
#define HV_CPU_MANAGEMENT BIT(12)
#define HV_ACCESS_VSM BIT(16)
+#define HV_ACCESS_VP_REGISTERS BIT(17)
#define HV_ENABLE_EXTENDED_HYPERCALLS BIT(20)
#define HV_ISOLATION BIT(22)

--
2.40.1


2024-06-09 15:55:14

by Nicolas Saenz Julienne

Subject: [PATCH 07/18] KVM: x86: hyper-v: Exit on TranslateVirtualAddress hcall

Handle HvTranslateVirtualAddress in user-space. The hypercall is
VTL-aware and only used in the context of VSM. Additionally, the TLFS
doesn't define a dedicated CPUID bit for it, so the hypercall's
availability is tracked as part of the HV_ACCESS_VSM CPUID bit. This will
be documented with the main VSM commit.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
arch/x86/kvm/hyperv.c | 3 +++
include/asm-generic/hyperv-tlfs.h | 1 +
2 files changed, 4 insertions(+)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index d0edc2bec5a4f..cbe2aca52514b 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2427,6 +2427,7 @@ static bool kvm_hv_is_xmm_output_hcall(u16 code)
{
switch (code) {
case HVCALL_GET_VP_REGISTERS:
+ case HVCALL_TRANSLATE_VIRTUAL_ADDRESS:
return true;
}

@@ -2512,6 +2513,7 @@ static bool is_xmm_fast_hypercall(struct kvm_hv_hcall *hc)
case HVCALL_SEND_IPI_EX:
case HVCALL_GET_VP_REGISTERS:
case HVCALL_SET_VP_REGISTERS:
+ case HVCALL_TRANSLATE_VIRTUAL_ADDRESS:
return true;
}

@@ -2740,6 +2742,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
goto hypercall_userspace_exit;
case HVCALL_GET_VP_REGISTERS:
case HVCALL_SET_VP_REGISTERS:
+ case HVCALL_TRANSLATE_VIRTUAL_ADDRESS:
goto hypercall_userspace_exit;
default:
ret = HV_STATUS_INVALID_HYPERCALL_CODE;
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 9e909f0834598..57c791c555861 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -159,6 +159,7 @@ union hv_reference_tsc_msr {
#define HVCALL_CREATE_VP 0x004e
#define HVCALL_GET_VP_REGISTERS 0x0050
#define HVCALL_SET_VP_REGISTERS 0x0051
+#define HVCALL_TRANSLATE_VIRTUAL_ADDRESS 0x0052
#define HVCALL_POST_MESSAGE 0x005c
#define HVCALL_SIGNAL_EVENT 0x005d
#define HVCALL_POST_DEBUG_DATA 0x0069
--
2.40.1


2024-06-09 15:55:43

by Nicolas Saenz Julienne

Subject: [PATCH 08/18] KVM: x86: hyper-v: Exit on StartVirtualProcessor and GetVpIndexFromApicId hcalls

Both HvCallStartVirtualProcessor and HvCallGetVpIndexFromApicId are used
as part of the Hyper-V VSM CPU bootstrap process and require VTL
awareness, so handle these hypercalls in user-space. Also expose the
corresponding CPUID bit.

Note that these hypercalls aren't necessary on Hyper-V guests that don't
enable VSM.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
Documentation/virt/kvm/api.rst | 11 +++++++++++
arch/x86/kvm/hyperv.c | 7 +++++++
include/asm-generic/hyperv-tlfs.h | 1 +
3 files changed, 19 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 05b01b00a395c..161a772c23c6a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8941,3 +8941,14 @@ This CPUID indicates that KVM supports HvGetVpRegisters and HvSetVpRegisters.
Currently, it is only used in conjunction with HV_ACCESS_VSM, and immediately
exits to userspace with KVM_EXIT_HYPERV_HCALL as the reason. Userspace is
expected to complete the hypercall before resuming execution.
+
+10.3 HV_START_VIRTUAL_PROCESSOR
+-------------------------------
+
+:Location: CPUID.40000003H:EBX[bit 21]
+
+This CPUID indicates that KVM supports HvCallStartVirtualProcessor and
+HvCallGetVpIndexFromApicId. These hypercalls are currently only used in
+conjunction with HV_ACCESS_VSM, and they immediately exit to userspace with
+KVM_EXIT_HYPERV_HCALL as the reason. Userspace is expected to complete the
+hypercall before resuming execution.
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index cbe2aca52514b..dd64f41dc835d 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2556,6 +2556,10 @@ static bool hv_check_hypercall_access(struct kvm_vcpu_hv *hv_vcpu, u16 code)
case HVCALL_SET_VP_REGISTERS:
return hv_vcpu->cpuid_cache.features_ebx &
HV_ACCESS_VP_REGISTERS;
+ case HVCALL_START_VP:
+ case HVCALL_GET_VP_ID_FROM_APIC_ID:
+ return hv_vcpu->cpuid_cache.features_ebx &
+ HV_START_VIRTUAL_PROCESSOR;
case HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX:
case HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX:
if (!(hv_vcpu->cpuid_cache.enlightenments_eax &
@@ -2743,6 +2747,8 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
case HVCALL_GET_VP_REGISTERS:
case HVCALL_SET_VP_REGISTERS:
case HVCALL_TRANSLATE_VIRTUAL_ADDRESS:
+ case HVCALL_START_VP:
+ case HVCALL_GET_VP_ID_FROM_APIC_ID:
goto hypercall_userspace_exit;
default:
ret = HV_STATUS_INVALID_HYPERCALL_CODE;
@@ -2916,6 +2922,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->ebx |= HV_SIGNAL_EVENTS;
ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
ent->ebx |= HV_ACCESS_VP_REGISTERS;
+ ent->ebx |= HV_START_VIRTUAL_PROCESSOR;

ent->edx |= HV_X64_HYPERCALL_XMM_INPUT_AVAILABLE;
ent->edx |= HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE;
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index 57c791c555861..e24b88ec4ec00 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -92,6 +92,7 @@
#define HV_ACCESS_VSM BIT(16)
#define HV_ACCESS_VP_REGISTERS BIT(17)
#define HV_ENABLE_EXTENDED_HYPERCALLS BIT(20)
+#define HV_START_VIRTUAL_PROCESSOR BIT(21)
#define HV_ISOLATION BIT(22)

/*
--
2.40.1


2024-06-09 15:56:04

by Nicolas Saenz Julienne

Subject: [PATCH 09/18] KVM: Define and communicate KVM_EXIT_MEMORY_FAULT RWX flags to userspace

From: Anish Moorthy <[email protected]>

kvm_prepare_memory_fault_exit() already takes parameters describing the
RWX-ness of the relevant access but doesn't actually do anything with
them. Define and use the flags necessary to pass this information on to
userspace.
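
As a usage sketch (hypothetical userspace code, not part of this patch;
handle_guest_access() is a placeholder), a VMM can now recover the access
type directly from the exit:

    if (run->exit_reason == KVM_EXIT_MEMORY_FAULT) {
            __u64 flags = run->memory_fault.flags;
            bool is_write = flags & KVM_MEMORY_EXIT_FLAG_WRITE;
            bool is_exec = flags & KVM_MEMORY_EXIT_FLAG_EXEC;

            /* exactly one of READ/WRITE/EXEC is set by KVM */
            handle_guest_access(run->memory_fault.gpa,
                                run->memory_fault.size,
                                is_write, is_exec);
    }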

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Anish Moorthy <[email protected]>
Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
Documentation/virt/kvm/api.rst | 5 +++++
include/linux/kvm_host.h | 9 ++++++++-
include/uapi/linux/kvm.h | 3 +++
3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 161a772c23c6a..761b99987cf1a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7014,6 +7014,9 @@ spec refer, https://github.com/riscv/riscv-sbi-doc.

/* KVM_EXIT_MEMORY_FAULT */
struct {
+ #define KVM_MEMORY_EXIT_FLAG_READ (1ULL << 0)
+ #define KVM_MEMORY_EXIT_FLAG_WRITE (1ULL << 1)
+ #define KVM_MEMORY_EXIT_FLAG_EXEC (1ULL << 2)
#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3)
__u64 flags;
__u64 gpa;
@@ -7025,6 +7028,8 @@ could not be resolved by KVM. The 'gpa' and 'size' (in bytes) describe the
guest physical address range [gpa, gpa + size) of the fault. The 'flags' field
describes properties of the faulting access that are likely pertinent:

+ - KVM_MEMORY_EXIT_FLAG_READ/WRITE/EXEC - When set, indicates that the memory
+ fault occurred on a read/write/exec access respectively.
- KVM_MEMORY_EXIT_FLAG_PRIVATE - When set, indicates the memory fault occurred
on a private memory access. When clear, indicates the fault occurred on a
shared access.
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 692c01e41a18e..59f687985ba24 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2397,8 +2397,15 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
vcpu->run->memory_fault.gpa = gpa;
vcpu->run->memory_fault.size = size;

- /* RWX flags are not (yet) defined or communicated to userspace. */
vcpu->run->memory_fault.flags = 0;
+
+ if (is_write)
+ vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_WRITE;
+ else if (is_exec)
+ vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_EXEC;
+ else
+ vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_READ;
+
if (is_private)
vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f4864e6907e0b..d6d8b17bfa9a7 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -434,6 +434,9 @@ struct kvm_run {
} notify;
/* KVM_EXIT_MEMORY_FAULT */
struct {
+#define KVM_MEMORY_EXIT_FLAG_READ (1ULL << 0)
+#define KVM_MEMORY_EXIT_FLAG_WRITE (1ULL << 1)
+#define KVM_MEMORY_EXIT_FLAG_EXEC (1ULL << 2)
#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3)
__u64 flags;
__u64 gpa;
--
2.40.1


2024-06-09 15:56:36

by Nicolas Saenz Julienne

Subject: [PATCH 10/18] KVM: x86: Keep track of instruction length during faults

Both VMX and SVM provide the length of the instruction
being run at the time of the page fault. Save it within 'struct
kvm_page_fault', as it'll become useful in the future.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 11 ++++++++---
arch/x86/kvm/mmu/mmu_internal.h | 5 ++++-
arch/x86/kvm/vmx/vmx.c | 16 ++++++++++++++--
3 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8d74bdef68c1d..39b113afefdfc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4271,7 +4271,8 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
work->arch.cr3 != kvm_mmu_get_guest_pgd(vcpu, vcpu->arch.mmu))
return;

- kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, work->arch.error_code, true, NULL);
+ kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, work->arch.error_code,
+ true, NULL, 0);
}

static inline u8 kvm_max_level_for_order(int order)
@@ -5887,7 +5888,7 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err

if (r == RET_PF_INVALID) {
r = kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, error_code, false,
- &emulation_type);
+ &emulation_type, insn_len);
if (KVM_BUG_ON(r == RET_PF_INVALID, vcpu->kvm))
return -EIO;
}
@@ -5924,8 +5925,12 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
if (!mmio_info_in_cache(vcpu, cr2_or_gpa, direct) && !is_guest_mode(vcpu))
emulation_type |= EMULTYPE_ALLOW_RETRY_PF;
emulate:
+ /*
+ * x86_emulate_instruction() expects insn to contain data if
+ * insn_len > 0.
+ */
return x86_emulate_instruction(vcpu, cr2_or_gpa, emulation_type, insn,
- insn_len);
+ insn ? insn_len : 0);
}
EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index ce2fcd19ba6be..a0cde1a0e39b0 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -192,6 +192,7 @@ struct kvm_page_fault {
const gpa_t addr;
const u64 error_code;
const bool prefetch;
+ const u8 insn_len;

/* Derived from error_code. */
const bool exec;
@@ -288,11 +289,13 @@ static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
}

static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
- u64 err, bool prefetch, int *emulation_type)
+ u64 err, bool prefetch,
+ int *emulation_type, u8 insn_len)
{
struct kvm_page_fault fault = {
.addr = cr2_or_gpa,
.error_code = err,
+ .insn_len = insn_len,
.exec = err & PFERR_FETCH_MASK,
.write = err & PFERR_WRITE_MASK,
.present = err & PFERR_PRESENT_MASK,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ac0682fece604..9ba38e0b0c7a8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5807,11 +5807,13 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
if (unlikely(allow_smaller_maxphyaddr && !kvm_vcpu_is_legal_gpa(vcpu, gpa)))
return kvm_emulate_instruction(vcpu, 0);

- return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
+ return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL,
+ vmcs_read32(VM_EXIT_INSTRUCTION_LEN));
}

static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
{
+ u8 insn_len = 0;
gpa_t gpa;

if (vmx_check_emulate_instruction(vcpu, EMULTYPE_PF, NULL, 0))
@@ -5828,7 +5830,17 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
return kvm_skip_emulated_instruction(vcpu);
}

- return kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);
+ /*
+ * Using VMCS.VM_EXIT_INSTRUCTION_LEN on EPT misconfig depends on
+ * undefined behavior: Intel's SDM doesn't mandate the VMCS field be
+ * set when EPT misconfig occurs. In practice, real hardware updates
+ * VM_EXIT_INSTRUCTION_LEN on EPT misconfig, but other hypervisors
+ * (namely Hyper-V) don't set it due to it being undefined behavior.
+ */
+ if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
+ insn_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+
+ return kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, insn_len);
}

static int handle_nmi_window(struct kvm_vcpu *vcpu)
--
2.40.1


2024-06-09 15:56:55

by Nicolas Saenz Julienne

Subject: [PATCH 11/18] KVM: x86: Pass the instruction length on memory fault user-space exits

In order to simplify Hyper-V VSM secure memory intercept generation in
user-space (it avoids the need to implement an x86 instruction decoder
and do the actual decoding), pass the length of the instruction being run
at the time of the guest exit as part of the memory fault exit
information.

The presence of this additional information is indicated by a new
capability, KVM_CAP_FAULT_EXIT_INSN_LEN.
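
For context (hypothetical userspace usage, not part of this patch;
read_guest_virt() is a placeholder for whatever guest virtual-address read
primitive the VMM implements): when building a secure intercept message, the
VMM can simply copy 'insn_len' bytes starting at the faulting RIP instead of
decoding the instruction:

    struct kvm_regs regs;
    __u8 insn[16] = {};
    __u8 len = run->memory_fault.insn_len;

    ioctl(vcpu_fd, KVM_GET_REGS, &regs);
    if (len)
            read_guest_virt(vcpu_fd, regs.rip, insn, len);

    /* insn[0..len) can then be handed to the higher VTL as part of the
     * secure intercept, no x86 decoder needed. */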

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
Documentation/virt/kvm/api.rst | 6 +++++-
arch/x86/kvm/mmu/mmu_internal.h | 2 +-
arch/x86/kvm/x86.c | 1 +
include/linux/kvm_host.h | 3 ++-
include/uapi/linux/kvm.h | 2 ++
5 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 761b99987cf1a..18ddea9c4c58a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7021,11 +7021,15 @@ spec refer, https://github.com/riscv/riscv-sbi-doc.
__u64 flags;
__u64 gpa;
__u64 size;
+ __u8 insn_len;
} memory_fault;

KVM_EXIT_MEMORY_FAULT indicates the vCPU has encountered a memory fault that
could not be resolved by KVM. The 'gpa' and 'size' (in bytes) describe the
-guest physical address range [gpa, gpa + size) of the fault. The 'flags' field
+guest physical address range [gpa, gpa + size) of the fault. The
+'insn_len' field describes the size (in bytes) of the instruction
+that caused the fault. It is only available if the underlying HW exposes that
+information on guest exit, otherwise it's set to 0. The 'flags' field
describes properties of the faulting access that are likely pertinent:

- KVM_MEMORY_EXIT_FLAG_READ/WRITE/EXEC - When set, indicates that the memory
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index a0cde1a0e39b0..4f5c4c8af9941 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -285,7 +285,7 @@ static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
{
kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
PAGE_SIZE, fault->write, fault->exec,
- fault->is_private);
+ fault->is_private, fault->insn_len);
}

static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a6e2312ccb68f..d2b8b74cb48bf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4704,6 +4704,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
case KVM_CAP_IRQFD_RESAMPLE:
case KVM_CAP_MEMORY_FAULT_INFO:
+ case KVM_CAP_FAULT_EXIT_INSN_LEN:
r = 1;
break;
case KVM_CAP_EXIT_HYPERCALL:
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 59f687985ba24..4fa16c4772269 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2391,11 +2391,12 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
gpa_t gpa, gpa_t size,
bool is_write, bool is_exec,
- bool is_private)
+ bool is_private, u8 insn_len)
{
vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
vcpu->run->memory_fault.gpa = gpa;
vcpu->run->memory_fault.size = size;
+ vcpu->run->memory_fault.insn_len = insn_len;

vcpu->run->memory_fault.flags = 0;

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d6d8b17bfa9a7..516d39910f9ab 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -441,6 +441,7 @@ struct kvm_run {
__u64 flags;
__u64 gpa;
__u64 size;
+ __u8 insn_len;
} memory_fault;
/* Fix the size of the union. */
char padding[256];
@@ -927,6 +928,7 @@ struct kvm_enable_cap {
#define KVM_CAP_MEMORY_ATTRIBUTES 233
#define KVM_CAP_GUEST_MEMFD 234
#define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_FAULT_EXIT_INSN_LEN 236

struct kvm_irq_routing_irqchip {
__u32 irqchip;
--
2.40.1


2024-06-09 15:58:06

by Nicolas Saenz Julienne

Subject: [PATCH 14/18] KVM: x86/mmu: Init memslot if memory attributes available

Systems that lack private memory support are about to start using memory
attributes, so checking for private memory support is no longer enough.
Instead, query whether the memory attributes xarray is non-empty to decide
whether the hugepage information needs to be initialized when installing a
new memslot.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 2 +-
include/linux/kvm_host.h | 9 +++++++++
2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d56c04fbdc66b..91edd873dcdbc 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7487,7 +7487,7 @@ void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
{
int level;

- if (!kvm_arch_has_private_mem(kvm))
+ if (!kvm_memory_attributes_in_use(kvm))
return;

for (level = PG_LEVEL_2M; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4fa16c4772269..9250bf1c4db15 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2424,12 +2424,21 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
struct kvm_gfn_range *range);

+static inline bool kvm_memory_attributes_in_use(struct kvm *kvm)
+{
+ return !xa_empty(&kvm->mem_attr_array);
+}
+
static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
{
return IS_ENABLED(CONFIG_KVM_PRIVATE_MEM) &&
kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
}
#else
+static inline bool kvm_memory_attributes_in_use(struct kvm *kvm)
+{
+ return false;
+}
static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
{
return false;
--
2.40.1


2024-06-09 15:58:41

by Nicolas Saenz Julienne

Subject: [PATCH 15/18] KVM: Introduce RWX memory attributes

Declare memory attributes to map memory regions as non-readable,
non-writable, and/or non-executable.

The attributes are negated for the following reasons:
- Setting a 0 memory attribute (attr->attributes == 0) shouldn't
introduce any access restrictions. For example, when moving from
private to shared mappings in the context of confidential computing.
- In practice, with negated attributes, a non-private RWX memory
attribute (i.e. 0) is analogous to a delete operation. It's a nice
outcome, as it forces remapping the region with huge-pages, doing the
right thing for use-cases that have short-lived access-restricted
regions like Hyper-V's VSM.
- A non-negated version of the flags has no way of expressing
non-access mapping (NR/NW/NX) without having to introduce an extra
flag (since 0 isn't available).
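
As a usage sketch (hypothetical, not part of this patch; gpa and vtl0_vm_fd
are placeholders), restricting a page to read-only/non-executable (e.g. a
VTL1-protected page as seen from the VTL0 VM) and later lifting the
restriction looks like this:

    struct kvm_memory_attributes attr = {
            .address = gpa,
            .size = 0x1000,
            .attributes = KVM_MEMORY_ATTRIBUTE_NW | KVM_MEMORY_ATTRIBUTE_NX,
    };

    ioctl(vtl0_vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attr);

    /* ... later, attributes == 0 restores full RWX access and, as noted
     * above, behaves like a delete, allowing huge mappings again. */
    attr.attributes = 0;
    ioctl(vtl0_vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attr);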

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
Documentation/virt/kvm/api.rst | 14 +++++++++++---
include/linux/kvm_host.h | 22 +++++++++++++++++++++-
include/uapi/linux/kvm.h | 3 +++
virt/kvm/kvm_main.c | 32 +++++++++++++++++++++++++++++---
4 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 18ddea9c4c58a..6d3bc5092ea63 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6313,15 +6313,23 @@ of guest physical memory.
__u64 flags;
};

+ #define KVM_MEMORY_ATTRIBUTE_NR (1ULL << 0)
+ #define KVM_MEMORY_ATTRIBUTE_NW (1ULL << 1)
+ #define KVM_MEMORY_ATTRIBUTE_NX (1ULL << 2)
#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)

The address and size must be page aligned. The supported attributes can be
retrieved via ioctl(KVM_CHECK_EXTENSION) on KVM_CAP_MEMORY_ATTRIBUTES. If
executed on a VM, KVM_CAP_MEMORY_ATTRIBUTES precisely returns the attributes
supported by that VM. If executed at system scope, KVM_CAP_MEMORY_ATTRIBUTES
-returns all attributes supported by KVM. The only attribute defined at this
-time is KVM_MEMORY_ATTRIBUTE_PRIVATE, which marks the associated gfn as being
-guest private memory.
+returns all attributes supported by KVM. The attributes defined at this
+time are:
+
+ - KVM_MEMORY_ATTRIBUTE_NR/NW/NX - Marks the memory region as non-readable,
+ non-writable and/or non-executable, respectively. Note that write-only,
+ exec-only and write-exec mappings are not supported.
+ - KVM_MEMORY_ATTRIBUTE_PRIVATE - Marks the associated gfn as being guest
+ private memory.

Note, there is no "get" API. Userspace is responsible for explicitly tracking
the state of a gfn/page as needed.
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9250bf1c4db15..85378345e8e77 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2411,6 +2411,21 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
}

+static inline bool kvm_mem_attributes_may_read(u64 attrs)
+{
+ return !(attrs & KVM_MEMORY_ATTRIBUTE_NR);
+}
+
+static inline bool kvm_mem_attributes_may_write(u64 attrs)
+{
+ return !(attrs & KVM_MEMORY_ATTRIBUTE_NW);
+}
+
+static inline bool kvm_mem_attributes_may_exec(u64 attrs)
+{
+ return !(attrs & KVM_MEMORY_ATTRIBUTE_NX);
+}
+
#ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
{
@@ -2423,7 +2438,7 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
struct kvm_gfn_range *range);
bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
struct kvm_gfn_range *range);
-
+bool kvm_mem_attributes_valid(struct kvm *kvm, unsigned long attrs);
static inline bool kvm_memory_attributes_in_use(struct kvm *kvm)
{
return !xa_empty(&kvm->mem_attr_array);
@@ -2435,6 +2450,11 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
}
#else
+static inline bool kvm_mem_attributes_valid(struct kvm *kvm,
+ unsigned long attrs)
+{
+ return false;
+}
static inline bool kvm_memory_attributes_in_use(struct kvm *kvm)
{
return false;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 516d39910f9ab..26d4477dae8c6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1550,6 +1550,9 @@ struct kvm_memory_attributes {
__u64 flags;
};

+#define KVM_MEMORY_ATTRIBUTE_NR (1ULL << 0)
+#define KVM_MEMORY_ATTRIBUTE_NW (1ULL << 1)
+#define KVM_MEMORY_ATTRIBUTE_NX (1ULL << 2)
#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)

#define KVM_CREATE_GUEST_MEMFD _IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 63c4b6739edee..bd27fc01e9715 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2430,10 +2430,14 @@ bool kvm_range_has_memory_attributes(struct kvm *kvm, gfn_t start, gfn_t end,

static u64 kvm_supported_mem_attributes(struct kvm *kvm)
{
+ u64 supported_attrs = KVM_MEMORY_ATTRIBUTE_NR |
+ KVM_MEMORY_ATTRIBUTE_NW |
+ KVM_MEMORY_ATTRIBUTE_NX;
+
if (!kvm || kvm_arch_has_private_mem(kvm))
- return KVM_MEMORY_ATTRIBUTE_PRIVATE;
+ supported_attrs |= KVM_MEMORY_ATTRIBUTE_PRIVATE;

- return 0;
+ return supported_attrs;
}

static __always_inline void kvm_handle_gfn_range(struct kvm *kvm,
@@ -2557,6 +2561,28 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,

return r;
}
+
+bool kvm_mem_attributes_valid(struct kvm *kvm, unsigned long attrs)
+{
+ bool may_read = kvm_mem_attributes_may_read(attrs);
+ bool may_write = kvm_mem_attributes_may_write(attrs);
+ bool may_exec = kvm_mem_attributes_may_exec(attrs);
+ bool priv = attrs & KVM_MEMORY_ATTRIBUTE_PRIVATE;
+
+ if (attrs & ~kvm_supported_mem_attributes(kvm))
+ return false;
+
+ /* Private memory and access permissions are incompatible */
+ if (priv && (!may_read || !may_write || !may_exec))
+ return false;
+
+ /* Write and exec mappings require read access */
+ if ((may_write || may_exec) && !may_read)
+ return false;
+
+ return true;
+}
+
static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
struct kvm_memory_attributes *attrs)
{
@@ -2565,7 +2591,7 @@ static int kvm_vm_ioctl_set_mem_attributes(struct kvm *kvm,
/* flags is currently not used. */
if (attrs->flags)
return -EINVAL;
- if (attrs->attributes & ~kvm_supported_mem_attributes(kvm))
+ if (!kvm_mem_attributes_valid(kvm, attrs->attributes))
return -EINVAL;
if (attrs->size == 0 || attrs->address + attrs->size < attrs->address)
return -EINVAL;
--
2.40.1


2024-06-09 16:00:36

by Nicolas Saenz Julienne

Subject: [PATCH 17/18] KVM: Introduce traces to track memory attributes modification.

Introduce a tracepoint that tracks memory attribute modifications.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
include/trace/events/kvm.h | 20 ++++++++++++++++++++
virt/kvm/kvm_main.c | 2 ++
2 files changed, 22 insertions(+)

diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 74e40d5d4af42..aa6caeb16f12a 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -489,6 +489,26 @@ TRACE_EVENT(kvm_test_age_hva,
TP_printk("mmu notifier test age hva: %#016lx", __entry->hva)
);

+TRACE_EVENT(kvm_vm_set_mem_attributes,
+ TP_PROTO(u64 start, u64 cnt, u64 attributes),
+ TP_ARGS(start, cnt, attributes),
+
+ TP_STRUCT__entry(
+ __field( u64, start )
+ __field( u64, cnt )
+ __field( u64, attributes )
+ ),
+
+ TP_fast_assign(
+ __entry->start = start;
+ __entry->cnt = cnt;
+ __entry->attributes = attributes;
+ ),
+
+ TP_printk("gfn 0x%llx, cnt 0x%llx, attributes 0x%llx",
+ __entry->start, __entry->cnt, __entry->attributes)
+);
+
#endif /* _TRACE_KVM_MAIN_H */

/* This part must be outside protection */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bd27fc01e9715..1c493ece3deb1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2556,6 +2556,8 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,

kvm_handle_gfn_range(kvm, &post_set_range);

+ trace_kvm_vm_set_mem_attributes(start, end - start, attributes);
+
out_unlock:
mutex_unlock(&kvm->slots_lock);

--
2.40.1


2024-06-09 16:02:55

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: [PATCH 12/18] KVM: x86/mmu: Introduce infrastructure to handle non-executable mappings

The upcoming access restriction KVM memory attributes open the door to
installing non-executable mappings. Introduce a new field in struct
kvm_page_fault, 'map_executable', to control whether the gfn range should be
mapped as executable, and make sure it is taken into account when generating
new SPTEs.

No functional change intended.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 6 +++++-
arch/x86/kvm/mmu/mmu_internal.h | 2 ++
arch/x86/kvm/mmu/tdp_mmu.c | 8 ++++++--
3 files changed, 13 insertions(+), 3 deletions(-)
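
Spelled out as a stand-alone snippet (using the ACC_* masks from
arch/x86/kvm/mmu.h), the intent is simply the following; the diff below
open-codes it in both the shadow and TDP MMU fault paths:

/*
 * A fault starts out with full permissions and only drops ACC_EXEC_MASK
 * when the fault handler decided the mapping must not be executable.
 */
static unsigned int fault_access(const struct kvm_page_fault *fault)
{
        unsigned int access = ACC_ALL;

        if (!fault->map_executable)
                access &= ~ACC_EXEC_MASK;

        return access;
}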

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 39b113afefdfc..b0c210b96419f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3197,6 +3197,7 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
struct kvm_shadow_walk_iterator it;
+ unsigned int access = ACC_ALL;
struct kvm_mmu_page *sp;
int ret;
gfn_t base_gfn = fault->gfn;
@@ -3229,7 +3230,10 @@ static int direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
if (WARN_ON_ONCE(it.level != fault->goal_level))
return -EFAULT;

- ret = mmu_set_spte(vcpu, fault->slot, it.sptep, ACC_ALL,
+ if (!fault->map_executable)
+ access &= ~ACC_EXEC_MASK;
+
+ ret = mmu_set_spte(vcpu, fault->slot, it.sptep, access,
base_gfn, fault->pfn, fault);
if (ret == RET_PF_SPURIOUS)
return ret;
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 4f5c4c8af9941..af0c3a154ed89 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -241,6 +241,7 @@ struct kvm_page_fault {
kvm_pfn_t pfn;
hva_t hva;
bool map_writable;
+ bool map_executable;

/*
* Indicates the guest is trying to write a gfn that contains one or
@@ -313,6 +314,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,

.pfn = KVM_PFN_ERR_FAULT,
.hva = KVM_HVA_ERR_BAD,
+ .map_executable = true,
};
int r;

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 36539c1b36cd6..344781981999a 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1018,6 +1018,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
struct tdp_iter *iter)
{
struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
+ unsigned int access = ACC_ALL;
u64 new_spte;
int ret = RET_PF_FIXED;
bool wrprot = false;
@@ -1025,10 +1026,13 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
if (WARN_ON_ONCE(sp->role.level != fault->goal_level))
return RET_PF_RETRY;

+ if (!fault->map_executable)
+ access &= ~ACC_EXEC_MASK;
+
if (unlikely(!fault->slot))
- new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
+ new_spte = make_mmio_spte(vcpu, iter->gfn, access);
else
- wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
+ wrprot = make_spte(vcpu, sp, fault->slot, access, iter->gfn,
fault->pfn, iter->old_spte, fault->prefetch, true,
fault->map_writable, &new_spte);

--
2.40.1


2024-06-09 16:03:33

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: [PATCH 18/18] KVM: x86: hyper-v: Handle VSM hcalls in user-space

Let user-space handle all hypercalls that fall under the AccessVsm
partition privilege flag. That is:
- HvCallModifyVtlProtectionMask
- HvCallEnablePartitionVtl
- HvCallEnableVpVtl
- HvCallVtlCall
- HvCallVtlReturn

All of these are VTL-aware and, as such, need to be handled in user-space.
Additionally, select KVM_GENERIC_MEMORY_ATTRIBUTES when CONFIG_KVM_HYPERV is
enabled, as it is needed to implement VTL memory protections.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
Documentation/virt/kvm/api.rst | 23 +++++++++++++++++++++++
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/hyperv.c | 29 +++++++++++++++++++++++++----
include/asm-generic/hyperv-tlfs.h | 6 +++++-
4 files changed, 54 insertions(+), 5 deletions(-)
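
For context, a rough sketch of the expected userspace side, following the
existing KVM_EXIT_HYPERV_HCALL exit layout. This is not the PoC QEMU code;
the HVCALL_* values mirror the TLFS and are assumed to be defined by the VMM
itself:

#include <linux/kvm.h>
#include <stdint.h>

#define HVCALL_VTL_CALL         0x0011
#define HVCALL_VTL_RETURN       0x0012

static void handle_hyperv_hcall(struct kvm_run *run)
{
        struct kvm_hyperv_exit *hv = &run->hyperv;
        uint16_t code = hv->u.hcall.input & 0xffff;     /* TLFS: call code */

        switch (code) {
        case HVCALL_VTL_CALL:
                /* Switch to the next-higher VTL's vCPU; no result is set. */
                break;
        case HVCALL_VTL_RETURN:
                /* Resume the lower VTL; no result is set. */
                break;
        default:
                /* Other VSM calls complete by setting the hcall result. */
                hv->u.hcall.result = 0; /* HV_STATUS_SUCCESS */
                break;
        }
}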

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 6d3bc5092ea63..77af2ccf49a30 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8969,3 +8969,26 @@ HvCallGetVpIndexFromApicId. Currently, it is only used in conjunction with
HV_ACCESS_VSM, and immediately exits to userspace with KVM_EXIT_HYPERV_HCALL as
the reason. Userspace is expected to complete the hypercall before resuming
execution.
+
+10.4 HV_ACCESS_VSM
+------------------
+
+:Location: CPUID.40000003H:EBX[bit 16]
+
+This CPUID indicates that KVM supports HvCallModifyVtlProtectionMask,
+HvCallEnablePartitionVtl, HvCallEnableVpVtl, HvCallVtlCall, and
+HvCallVtlReturn. Additionally, as a prerequisite to being able to implement
+Hyper-V VSM, it also identifies the availability of HvTranslateVirtualAddress,
+as well as the VTL-aware aspects of HvCallSendSyntheticClusterIpi and
+HvCallSendSyntheticClusterIpiEx.
+
+All these hypercalls immediately exit with KVM_EXIT_HYPERV_HCALL as the reason.
+Userspace is expected to complete the hypercall before resuming execution.
+Note that both IPI hypercalls will only exit to userspace if the request is
+VTL-aware, which will only happen if HV_ACCESS_VSM is exposed to the guest.
+
+Access restriction memory attributes (4.141) are available to simplify
+HvCallModifyVtlProtectionMask's implementation.
+
+Finally, this CPUID also indicates that KVM_MP_STATE_HV_INACTIVE_VTL is
+available.
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index fec95a7702703..8d851fe3b8c25 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -157,6 +157,7 @@ config KVM_SMM
config KVM_HYPERV
bool "Support for Microsoft Hyper-V emulation"
depends on KVM
+ select KVM_GENERIC_MEMORY_ATTRIBUTES
default y
help
Provides KVM support for emulating Microsoft Hyper-V. This allows KVM
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index dd64f41dc835d..1158c59a92790 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -2388,7 +2388,12 @@ static void kvm_hv_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result)
}
}

-static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u64 result)
+static inline bool kvm_hv_is_vtl_call_return(u16 code)
+{
+ return code == HVCALL_VTL_CALL || code == HVCALL_VTL_RETURN;
+}
+
+static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u16 code, u64 result)
{
u32 tlb_lock_count = 0;
int ret;
@@ -2400,9 +2405,12 @@ static int kvm_hv_hypercall_complete(struct kvm_vcpu *vcpu, u64 result)
result = HV_STATUS_INVALID_HYPERCALL_INPUT;

trace_kvm_hv_hypercall_done(result);
- kvm_hv_hypercall_set_result(vcpu, result);
++vcpu->stat.hypercalls;

+ /* VTL call and return don't set a hcall result */
+ if (!kvm_hv_is_vtl_call_return(code))
+ kvm_hv_hypercall_set_result(vcpu, result);
+
ret = kvm_skip_emulated_instruction(vcpu);

if (tlb_lock_count)
@@ -2459,7 +2467,7 @@ static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
kvm_hv_write_xmm(vcpu->run->hyperv.u.hcall.xmm);
}

- return kvm_hv_hypercall_complete(vcpu, result);
+ return kvm_hv_hypercall_complete(vcpu, code, result);
}

static u16 kvm_hvcall_signal_event(struct kvm_vcpu *vcpu, struct kvm_hv_hcall *hc)
@@ -2513,6 +2521,7 @@ static bool is_xmm_fast_hypercall(struct kvm_hv_hcall *hc)
case HVCALL_SEND_IPI_EX:
case HVCALL_GET_VP_REGISTERS:
case HVCALL_SET_VP_REGISTERS:
+ case HVCALL_MODIFY_VTL_PROTECTION_MASK:
case HVCALL_TRANSLATE_VIRTUAL_ADDRESS:
return true;
}
@@ -2552,6 +2561,12 @@ static bool hv_check_hypercall_access(struct kvm_vcpu_hv *hv_vcpu, u16 code)
*/
return !kvm_hv_is_syndbg_enabled(hv_vcpu->vcpu) ||
hv_vcpu->cpuid_cache.features_ebx & HV_DEBUGGING;
+ case HVCALL_MODIFY_VTL_PROTECTION_MASK:
+ case HVCALL_ENABLE_PARTITION_VTL:
+ case HVCALL_ENABLE_VP_VTL:
+ case HVCALL_VTL_CALL:
+ case HVCALL_VTL_RETURN:
+ return hv_vcpu->cpuid_cache.features_ebx & HV_ACCESS_VSM;
case HVCALL_GET_VP_REGISTERS:
case HVCALL_SET_VP_REGISTERS:
return hv_vcpu->cpuid_cache.features_ebx &
@@ -2744,6 +2759,11 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
break;
}
goto hypercall_userspace_exit;
+ case HVCALL_MODIFY_VTL_PROTECTION_MASK:
+ case HVCALL_ENABLE_PARTITION_VTL:
+ case HVCALL_ENABLE_VP_VTL:
+ case HVCALL_VTL_CALL:
+ case HVCALL_VTL_RETURN:
case HVCALL_GET_VP_REGISTERS:
case HVCALL_SET_VP_REGISTERS:
case HVCALL_TRANSLATE_VIRTUAL_ADDRESS:
@@ -2765,7 +2785,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
}

hypercall_complete:
- return kvm_hv_hypercall_complete(vcpu, ret);
+ return kvm_hv_hypercall_complete(vcpu, hc.code, ret);

hypercall_userspace_exit:
vcpu->run->exit_reason = KVM_EXIT_HYPERV;
@@ -2921,6 +2941,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->ebx |= HV_POST_MESSAGES;
ent->ebx |= HV_SIGNAL_EVENTS;
ent->ebx |= HV_ENABLE_EXTENDED_HYPERCALLS;
+ ent->ebx |= HV_ACCESS_VSM;
ent->ebx |= HV_ACCESS_VP_REGISTERS;
ent->ebx |= HV_START_VIRTUAL_PROCESSOR;

diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index e24b88ec4ec00..6b12e5818292c 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -149,9 +149,13 @@ union hv_reference_tsc_msr {
/* Declare the various hypercall operations. */
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE 0x0002
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST 0x0003
-#define HVCALL_ENABLE_VP_VTL 0x000f
#define HVCALL_NOTIFY_LONG_SPIN_WAIT 0x0008
#define HVCALL_SEND_IPI 0x000b
+#define HVCALL_MODIFY_VTL_PROTECTION_MASK 0x000c
+#define HVCALL_ENABLE_PARTITION_VTL 0x000d
+#define HVCALL_ENABLE_VP_VTL 0x000f
+#define HVCALL_VTL_CALL 0x0011
+#define HVCALL_VTL_RETURN 0x0012
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX 0x0013
#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
#define HVCALL_SEND_IPI_EX 0x0015
--
2.40.1


2024-06-09 16:07:28

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: [PATCH 13/18] KVM: x86/mmu: Avoid warning when installing non-private memory attributes

In preparation for introducing RWX memory attributes, only warn about missing
private memory support when user-space actually attempts to set
KVM_MEMORY_ATTRIBUTE_PRIVATE.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 8 ++++++--
virt/kvm/kvm_main.c | 1 +
2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b0c210b96419f..d56c04fbdc66b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -7359,6 +7359,9 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
struct kvm_gfn_range *range)
{
+ unsigned long attrs = range->arg.attributes;
+ bool priv_attr = attrs & KVM_MEMORY_ATTRIBUTE_PRIVATE;
+
/*
* Zap SPTEs even if the slot can't be mapped PRIVATE. KVM x86 only
* supports KVM_MEMORY_ATTRIBUTE_PRIVATE, and so it *seems* like KVM
@@ -7370,7 +7373,7 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
* Zapping SPTEs in this case ensures KVM will reassess whether or not
* a hugepage can be used for affected ranges.
*/
- if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
+ if (WARN_ON_ONCE(priv_attr && !kvm_arch_has_private_mem(kvm)))
return false;

return kvm_unmap_gfn_range(kvm, range);
@@ -7415,6 +7418,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
struct kvm_gfn_range *range)
{
unsigned long attrs = range->arg.attributes;
+ bool priv_attr = attrs & KVM_MEMORY_ATTRIBUTE_PRIVATE;
struct kvm_memory_slot *slot = range->slot;
int level;

@@ -7427,7 +7431,7 @@ bool kvm_arch_post_set_memory_attributes(struct kvm *kvm,
* a range that has PRIVATE GFNs, and conversely converting a range to
* SHARED may now allow hugepages.
*/
- if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
+ if (WARN_ON_ONCE(priv_attr && !kvm_arch_has_private_mem(kvm)))
return false;

/*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 14841acb8b959..63c4b6739edee 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2506,6 +2506,7 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
struct kvm_mmu_notifier_range pre_set_range = {
.start = start,
.end = end,
+ .arg.attributes = attributes,
.handler = kvm_pre_set_memory_attributes,
.on_lock = kvm_mmu_invalidate_begin,
.flush_on_ret = true,
--
2.40.1


2024-06-09 16:08:52

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: [PATCH 16/18] KVM: x86: Take mem attributes into account when faulting memory

Take access restriction memory attributes into account when faulting guest
memory. Prohibited memory accesses will cause a user-space fault exit.

Additionally, bypass a warning in the !tdp case: access restrictions in the
guest page tables might not match the host PTEs when memory attributes are in
use.

Signed-off-by: Nicolas Saenz Julienne <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 64 ++++++++++++++++++++++++++++------
arch/x86/kvm/mmu/mmutrace.h | 29 +++++++++++++++
arch/x86/kvm/mmu/paging_tmpl.h | 2 +-
include/linux/kvm_host.h | 4 +++
4 files changed, 87 insertions(+), 12 deletions(-)
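
For illustration, a rough sketch of how a VMM might react to the resulting
memory-fault exit. vtl1_requests_intercept() and inject_secure_intercept()
are hypothetical helpers standing in for the VMM's VTL policy; the rest
follows the existing KVM_EXIT_MEMORY_FAULT and KVM_SET_MEMORY_ATTRIBUTES
interfaces:

#include <linux/kvm.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical VMM helpers, shown for illustration only. */
extern bool vtl1_requests_intercept(uint64_t gpa);
extern void inject_secure_intercept(uint64_t gpa);

static int handle_memory_fault(struct kvm_run *run, int vm_fd)
{
        uint64_t gpa = run->memory_fault.gpa;

        if (vtl1_requests_intercept(gpa)) {
                /* Forward the violation to VTL1 as a secure intercept. */
                inject_secure_intercept(gpa);
                return 0;
        }

        /* Otherwise lift the restriction and let the guest retry. */
        struct kvm_memory_attributes attrs = {
                .address    = gpa & ~0xfffULL,  /* page-align */
                .size       = 0x1000,
                .attributes = 0,                /* clear NR/NW/NX */
        };

        return ioctl(vm_fd, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
}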

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 91edd873dcdbc..dfe50c9c31f7b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -754,7 +754,8 @@ static u32 kvm_mmu_page_get_access(struct kvm_mmu_page *sp, int index)
return sp->role.access;
}

-static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
+static void kvm_mmu_page_set_translation(struct kvm *kvm,
+ struct kvm_mmu_page *sp, int index,
gfn_t gfn, unsigned int access)
{
if (sp_has_gptes(sp)) {
@@ -762,10 +763,17 @@ static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
return;
}

- WARN_ONCE(access != kvm_mmu_page_get_access(sp, index),
- "access mismatch under %s page %llx (expected %u, got %u)\n",
- sp->role.passthrough ? "passthrough" : "direct",
- sp->gfn, kvm_mmu_page_get_access(sp, index), access);
+ /*
+ * Userspace might have introduced memory attributes for this gfn,
+ * breaking the assumption that the spte's access restrictions match
+ * the guest's. Userspace is also responsible for taking care of
+ * faults caused by these 'artificial' access restrictions.
+ */
+ WARN_ONCE(access != kvm_mmu_page_get_access(sp, index) &&
+ !kvm_get_memory_attributes(kvm, gfn),
+ "access mismatch under %s page %llx (expected %u, got %u)\n",
+ sp->role.passthrough ? "passthrough" : "direct", sp->gfn,
+ kvm_mmu_page_get_access(sp, index), access);

WARN_ONCE(gfn != kvm_mmu_page_get_gfn(sp, index),
"gfn mismatch under %s page %llx (expected %llx, got %llx)\n",
@@ -773,12 +781,12 @@ static void kvm_mmu_page_set_translation(struct kvm_mmu_page *sp, int index,
sp->gfn, kvm_mmu_page_get_gfn(sp, index), gfn);
}

-static void kvm_mmu_page_set_access(struct kvm_mmu_page *sp, int index,
- unsigned int access)
+static void kvm_mmu_page_set_access(struct kvm *kvm, struct kvm_mmu_page *sp,
+ int index, unsigned int access)
{
gfn_t gfn = kvm_mmu_page_get_gfn(sp, index);

- kvm_mmu_page_set_translation(sp, index, gfn, access);
+ kvm_mmu_page_set_translation(kvm, sp, index, gfn, access);
}

/*
@@ -1607,7 +1615,7 @@ static void __rmap_add(struct kvm *kvm,
int rmap_count;

sp = sptep_to_sp(spte);
- kvm_mmu_page_set_translation(sp, spte_index(spte), gfn, access);
+ kvm_mmu_page_set_translation(kvm, sp, spte_index(spte), gfn, access);
kvm_update_page_stats(kvm, sp->role.level, 1);

rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
@@ -2928,7 +2936,8 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
rmap_add(vcpu, slot, sptep, gfn, pte_access);
} else {
/* Already rmapped but the pte_access bits may have changed. */
- kvm_mmu_page_set_access(sp, spte_index(sptep), pte_access);
+ kvm_mmu_page_set_access(vcpu->kvm, sp, spte_index(sptep),
+ pte_access);
}

return ret;
@@ -4320,6 +4329,38 @@ static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
return RET_PF_CONTINUE;
}

+static int kvm_mem_attributes_faultin_access_prots(struct kvm_vcpu *vcpu,
+ struct kvm_page_fault *fault)
+{
+ bool may_read, may_write, may_exec;
+ unsigned long attrs;
+
+ attrs = kvm_get_memory_attributes(vcpu->kvm, fault->gfn);
+ if (!attrs)
+ return RET_PF_CONTINUE;
+
+ if (!kvm_mem_attributes_valid(vcpu->kvm, attrs)) {
+ kvm_err("Invalid mem attributes 0x%lx found for address 0x%016llx\n",
+ attrs, fault->addr);
+ return -EFAULT;
+ }
+
+ trace_kvm_mem_attributes_faultin_access_prots(vcpu, fault, attrs);
+
+ may_read = kvm_mem_attributes_may_read(attrs);
+ may_write = kvm_mem_attributes_may_write(attrs);
+ may_exec = kvm_mem_attributes_may_exec(attrs);
+
+ if ((fault->user && !may_read) || (fault->write && !may_write) ||
+ (fault->exec && !may_exec))
+ return -EFAULT;
+
+ fault->map_writable = may_write;
+ fault->map_executable = may_exec;
+
+ return RET_PF_CONTINUE;
+}
+
static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
bool async;
@@ -4375,7 +4416,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
* Now that we have a snapshot of mmu_invalidate_seq we can check for a
* private vs. shared mismatch.
*/
- if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+ if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn) ||
+ kvm_mem_attributes_faultin_access_prots(vcpu, fault)) {
kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
return -EFAULT;
}
diff --git a/arch/x86/kvm/mmu/mmutrace.h b/arch/x86/kvm/mmu/mmutrace.h
index 195d98bc8de85..ddbdd7396e9fa 100644
--- a/arch/x86/kvm/mmu/mmutrace.h
+++ b/arch/x86/kvm/mmu/mmutrace.h
@@ -440,6 +440,35 @@ TRACE_EVENT(
__entry->gfn, __entry->spte, __entry->level, __entry->errno)
);

+TRACE_EVENT(kvm_mem_attributes_faultin_access_prots,
+ TP_PROTO(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+ u64 mem_attrs),
+ TP_ARGS(vcpu, fault, mem_attrs),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, vcpu_id)
+ __field(unsigned long, guest_rip)
+ __field(u64, fault_address)
+ __field(bool, write)
+ __field(bool, exec)
+ __field(u64, mem_attrs)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu_id = vcpu->vcpu_id;
+ __entry->guest_rip = kvm_rip_read(vcpu);
+ __entry->fault_address = fault->addr;
+ __entry->write = fault->write;
+ __entry->exec = fault->exec;
+ __entry->mem_attrs = mem_attrs;
+ ),
+
+ TP_printk("vcpu %d rip 0x%lx gfn 0x%016llx access %s mem_attrs 0x%llx",
+ __entry->vcpu_id, __entry->guest_rip, __entry->fault_address,
+ __entry->exec ? "X" : (__entry->write ? "W" : "R"),
+ __entry->mem_attrs)
+);
+
#endif /* _TRACE_KVMMMU_H */

#undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index d3dbcf382ed2d..166f5f0e885e0 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -954,7 +954,7 @@ static int FNAME(sync_spte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, int
return 0;

/* Update the shadowed access bits in case they changed. */
- kvm_mmu_page_set_access(sp, i, pte_access);
+ kvm_mmu_page_set_access(vcpu->kvm, sp, i, pte_access);

sptep = &sp->spt[i];
spte = *sptep;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 85378345e8e77..9c26161d13dea 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2463,6 +2463,10 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
{
return false;
}
+static inline unsigned long kvm_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
+{
+ return 0;
+}
#endif /* CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES */

#ifdef CONFIG_KVM_PRIVATE_MEM
--
2.40.1