2019-09-17 09:56:13

by Yang Weijiang

Subject: [PATCH v5 0/9] Enable Sub-page Write Protection Support

EPT-Based Sub-Page Write Protection (SPP) is a HW capability which allows
the Virtual Machine Monitor (VMM) to specify write permissions for guest
physical memory at a sub-page (128-byte) granularity. When this
capability is enabled, the CPU enforces write-access checks for sub-pages
within a 4KB page.

The feature is targeted to provide fine-grained memory protection for
usages such as device virtualization, memory check-pointing and VM
introspection.

SPP is active when the "sub-page write protection" bit (bit 23) is 1 in
the Secondary VM-Execution Controls. The feature is backed by a Sub-Page
Permission Table (SPPT); the SPPT is referenced via a 64-bit control field
called the Sub-Page Permission Table Pointer (SPPTP), which contains a
4KB-aligned physical address.

To enable SPP for a certain physical page, the gfn should first be mapped
to a 4KB entry, then bit 61 of the corresponding EPT leaf entry is set.
While HW walks the EPT, if bit 61 is set, it traverses the SPPT with the
guest physical address to find the sub-page permissions at the leaf entry.
If the corresponding bit is set, the write to the sub-page is permitted;
otherwise, an SPP-induced EPT violation is generated.
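
As a rough illustration of the granularity (a hypothetical helper, not part
of this series): each 4KB page is divided into 4096 / 128 = 32 sub-pages,
and bit i of the 32-bit access vector grants write access to the i-th
128-byte region.

  /*
   * Illustrative sketch only: build an access vector that leaves the
   * first 512 bytes of a 4KB page writable and write-protects the rest.
   */
  static u32 example_access_map(void)
  {
          u32 access_map = 0;
          u32 off;

          for (off = 0; off < 512; off += 128)
                  access_map |= 1u << (off / 128);

          return access_map;      /* 0x0000000f: sub-pages 0-3 writable */
  }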

This patch series passed the SPP function test and selftest on an Ice Lake platform.

Please refer to the SPP introduction document in this patch set and
Intel SDM for details:

Intel SDM:
https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf

SPP selftest patch:
https://lkml.org/lkml/2019/6/18/1197

Previous patch:
https://lkml.org/lkml/2019/8/14/97

Patch 1: Introduction to SPP.
Patch 2: Add SPP related flags and control bits.
Patch 3: Functions for SPPT setup.
Patch 4: Add SPP access bitmaps for memslots.
Patch 5: Introduce SPP {init,set,get} functions
Patch 6: Implement User space access IOCTLs.
Patch 7: Set up SPP paging table at vm-entry/exit.
Patch 8: Enable lazy mode SPPT setup.
Patch 9: Handle SPP protected pages when VM memory changes


Change logs:

V4 -> V5:
1. Enabled SPP support for hugepages (1GB/2MB) to extend its application.
2. Made the SPP-miss vm-exit handler the unified place to set up the SPPT.
3. If SPP-protected pages are access-tracked or dirty-page-tracked,
store the SPP flag in a reserved address bit and restore it in the
fast_page_fault() handler.
4. Moved SPP-specific functions to vmx/spp.c and vmx/spp.h.
5. Rebased code to kernel v5.3.
6. Other changes suggested by the KVM community.

V3 -> V4:
1. Modified documentation to make it consistent with patches.
2. Allocated SPPT root page in init_spp() instead of vmx_set_cr3() to
avoid SPPT miss error.
3. Added back co-developers and sign-offs.

V2 -> V3:
1. Rebased patches to kernel 5.1 release
2. Deferred SPPT setup to EPT fault handler if the page is not
available while set_subpage() is being called.
3. Added init IOCTL to reduce extra cost if SPP is not used.
4. Refactored patch structure, cleaned up cross referenced functions.
5. Added code to deal with memory swapping/migration/shrinker cases.

V1 -> V2:
1. Rebased to 4.20-rc1
2. Moved VMCS changes to a separate patch.
3. Code refinements and bug fixes.


Yang Weijiang (9):
Documentation: Introduce EPT based Subpage Protection
vmx: spp: Add control flags for Sub-Page Protection(SPP)
mmu: spp: Add SPP Table setup functions
mmu: spp: Add functions to create/destroy SPP bitmap block
mmu: spp: Introduce SPP {init,set,get} functions
x86: spp: Introduce user-space SPP IOCTLs
vmx: spp: Set up SPP paging table at vm-entry/exit
mmu: spp: Enable Lazy mode SPP protection
mmu: spp: Handle SPP protected pages when VM memory changes

Documentation/virtual/kvm/spp_kvm.txt | 178 +++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/kvm_host.h | 10 +-
arch/x86/include/asm/vmx.h | 10 +
arch/x86/include/uapi/asm/vmx.h | 2 +
arch/x86/kernel/cpu/intel.c | 4 +
arch/x86/kvm/mmu.c | 78 ++-
arch/x86/kvm/mmu.h | 2 +
arch/x86/kvm/vmx/capabilities.h | 5 +
arch/x86/kvm/vmx/spp.c | 651 ++++++++++++++++++++++++++
arch/x86/kvm/vmx/spp.h | 27 ++
arch/x86/kvm/vmx/vmx.c | 99 ++++
arch/x86/kvm/x86.c | 51 ++
include/uapi/linux/kvm.h | 17 +
14 files changed, 1133 insertions(+), 2 deletions(-)
create mode 100644 Documentation/virtual/kvm/spp_kvm.txt
create mode 100644 arch/x86/kvm/vmx/spp.c
create mode 100644 arch/x86/kvm/vmx/spp.h

--
2.17.2


2019-09-17 09:56:59

by Yang Weijiang

Subject: [PATCH v5 9/9] mmu: spp: Handle SPP protected pages when VM memory changes

Host page swapping/migration may change the translation in
an EPT leaf entry; if the target page is SPP-protected,
re-enable SPP protection in the MMU notifier. If an SPPT shadow
page is reclaimed, the level-1 pages don't have rmaps to clear.

Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/mmu.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c9c430d2c7e3..c1c744ab05c9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1828,6 +1828,24 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
new_spte &= ~PT_WRITABLE_MASK;
new_spte &= ~SPTE_HOST_WRITEABLE;

+ /*
+ * If it's an EPT leaf entry and the physical page is
+ * SPP-protected, then re-enable SPP protection for
+ * the page.
+ */
+ if (kvm->arch.spp_active &&
+ level == PT_PAGE_TABLE_LEVEL) {
+ struct kvm_subpage spp_info = {0};
+ int i;
+
+ spp_info.base_gfn = gfn;
+ spp_info.npages = 1;
+ i = kvm_spp_get_permission(kvm, &spp_info);
+ if (i == 1 &&
+ spp_info.access_map[0] != FULL_SPP_ACCESS)
+ new_spte |= PT_SPP_MASK;
+ }
+
new_spte = mark_spte_for_access_track(new_spte);

mmu_spte_clear_track_bits(sptep);
@@ -2677,6 +2695,10 @@ static bool mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
pte = *spte;
if (is_shadow_present_pte(pte)) {
if (is_last_spte(pte, sp->role.level)) {
+ /* SPPT leaf entries don't have rmaps */
+ if (sp->role.level == PT_PAGE_TABLE_LEVEL &&
+ is_spp_spte(sp))
+ return true;
drop_spte(kvm, spte);
if (is_large_pte(pte))
--kvm->stat.lpages;
--
2.17.2

2019-09-17 09:57:14

by Yang Weijiang

Subject: [PATCH v5 8/9] mmu: spp: Enable Lazy mode SPP protection

To deal with SPP-protected 4KB pages within a hugepage (2MB, 1GB, etc.),
the hugepage entry is first zapped when the subpage permission is set; then
in tdp_page_fault() it is checked whether the gfn should be mapped at
PT_PAGE_TABLE_LEVEL or PT_DIRECTORY_LEVEL, depending on whether the gfn
falls within an SPP-protected page range.

Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/kvm/mmu.c | 14 ++++++++++++
arch/x86/kvm/vmx/spp.c | 48 ++++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/spp.h | 4 ++++
3 files changed, 66 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a632c6b3c326..c9c430d2c7e3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3240,6 +3240,17 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
map_writable);
direct_pte_prefetch(vcpu, it.sptep);
++vcpu->stat.pf_fixed;
+ if (level == PT_PAGE_TABLE_LEVEL) {
+ struct kvm_subpage sbp = {0};
+ int pages;
+
+ sbp.base_gfn = gfn;
+ sbp.npages = 1;
+ pages = kvm_spp_get_permission(vcpu->kvm, &sbp);
+ if (pages == 1 && sbp.access_map[0] != FULL_SPP_ACCESS)
+ kvm_spp_mark_protection(vcpu->kvm, &sbp);
+ }
+
return ret;
}

@@ -4183,6 +4194,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
if (level > PT_DIRECTORY_LEVEL &&
!check_hugepage_cache_consistency(vcpu, gfn, level))
level = PT_DIRECTORY_LEVEL;
+
+ check_spp_protection(vcpu, gfn, &force_pt_level, &level);
+
gfn &= ~(KVM_PAGES_PER_HPAGE(level) - 1);
}

diff --git a/arch/x86/kvm/vmx/spp.c b/arch/x86/kvm/vmx/spp.c
index b6fc2e313b59..7f7a3749c35b 100644
--- a/arch/x86/kvm/vmx/spp.c
+++ b/arch/x86/kvm/vmx/spp.c
@@ -547,6 +547,54 @@ inline u64 construct_spptp(unsigned long root_hpa)
}
EXPORT_SYMBOL_GPL(construct_spptp);

+bool is_spp_protected(struct kvm_memory_slot *slot, gfn_t gfn, int level)
+{
+ int page_num = KVM_PAGES_PER_HPAGE(level);
+ int i;
+ gfn &= ~(page_num - 1);
+
+ for (i = 0; i < page_num; ++i) {
+ if (*gfn_to_subpage_wp_info(slot, gfn + i) != FULL_SPP_ACCESS)
+ return true;
+ }
+ return false;
+}
+
+bool check_spp_protection(struct kvm_vcpu *vcpu, gfn_t gfn,
+ bool *force_pt_level, int *level)
+{
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_memory_slot *slot;
+ u32 access;
+
+ if (!kvm->arch.spp_active)
+ return false;
+
+ slot = gfn_to_memslot(kvm, gfn);
+
+ if (!slot)
+ return false;
+
+ if (*level == PT_PAGE_TABLE_LEVEL) {
+ access = *gfn_to_subpage_wp_info(slot, gfn);
+
+ if (access != FULL_SPP_ACCESS) {
+ *force_pt_level = true;
+ return true;
+ }
+ } else {
+ if (is_spp_protected(slot, gfn, PT_PDPE_LEVEL)) {
+ bool protected = is_spp_protected(slot, gfn,
+ PT_DIRECTORY_LEVEL);
+ *level = protected ? PT_PAGE_TABLE_LEVEL :
+ PT_DIRECTORY_LEVEL;
+ *force_pt_level = protected;
+ return true;
+ }
+ }
+ return false;
+}
+
int kvm_vm_ioctl_get_subpages(struct kvm *kvm,
struct kvm_subpage *spp_info)
{
diff --git a/arch/x86/kvm/vmx/spp.h b/arch/x86/kvm/vmx/spp.h
index 8925a6ca4d3b..ed7852bb6b33 100644
--- a/arch/x86/kvm/vmx/spp.h
+++ b/arch/x86/kvm/vmx/spp.h
@@ -4,9 +4,13 @@

#define FULL_SPP_ACCESS ((u32)((1ULL << 32) - 1))

+int kvm_spp_get_permission(struct kvm *kvm, struct kvm_subpage *spp_info);
+int kvm_spp_mark_protection(struct kvm *kvm, struct kvm_subpage *spp_info);
bool is_spp_spte(struct kvm_mmu_page *sp);
void restore_spp_bit(u64 *spte);
bool was_spp_armed(u64 spte);
+bool check_spp_protection(struct kvm_vcpu *vcpu, gfn_t gfn,
+ bool *force_pt_level, int *level);
inline u64 construct_spptp(unsigned long root_hpa);
int kvm_vm_ioctl_get_subpages(struct kvm *kvm,
struct kvm_subpage *spp_info);
--
2.17.2

2019-09-17 09:57:40

by Yang Weijiang

Subject: [PATCH v5 6/9] x86: spp: Introduce user-space SPP IOCTLs

A user application, e.g., QEMU or a VMI tool, must initialize SPP
before it gets/sets SPP subpages; the dynamic initialization is to
reduce the extra storage cost if the SPP feature is not used.
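
A minimal user-space call sequence could look like the sketch below
(vm_fd and the gfn/permission values are placeholders; struct kvm_subpage,
SUBPAGE_MAX_BITMAP and the ioctl numbers come from this series):

  /* Illustrative sketch, assuming vm_fd was obtained via KVM_CREATE_VM. */
  struct kvm_subpage spp = { 0 };
  int ret;

  ret = ioctl(vm_fd, KVM_INIT_SPP, 0);    /* one-time SPP init for the VM */
  if (ret)
          return ret;

  spp.base_gfn = 0x1000;                  /* placeholder gfn */
  spp.npages = 1;
  spp.access_map[0] = 0x0000000f;         /* sub-pages 0-3 writable, rest RO */
  ret = ioctl(vm_fd, KVM_SUBPAGES_SET_ACCESS, &spp);
  /* ret < 0 on error, otherwise the number of pages processed */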

Co-developed-by: He Chen <[email protected]>
Signed-off-by: He Chen <[email protected]>
Co-developed-by: Zhang Yi <[email protected]>
Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 ++
arch/x86/kvm/vmx/spp.c | 54 +++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/spp.h | 5 +++
arch/x86/kvm/vmx/vmx.c | 8 +++++
arch/x86/kvm/x86.c | 49 ++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 3 ++
6 files changed, 122 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cc38670a0c45..3863eb3c0e6a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1216,6 +1216,9 @@ struct kvm_x86_ops {
uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu);

bool (*need_emulation_on_page_fault)(struct kvm_vcpu *vcpu);
+
+ int (*init_spp)(struct kvm *kvm);
+ int (*flush_subpages)(struct kvm *kvm, struct kvm_subpage *spp_info);
};

struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/vmx/spp.c b/arch/x86/kvm/vmx/spp.c
index ffc4ebcb64a6..edc6a39340d9 100644
--- a/arch/x86/kvm/vmx/spp.c
+++ b/arch/x86/kvm/vmx/spp.c
@@ -535,3 +535,57 @@ inline u64 construct_spptp(unsigned long root_hpa)
}
EXPORT_SYMBOL_GPL(construct_spptp);

+int kvm_vm_ioctl_get_subpages(struct kvm *kvm,
+ struct kvm_subpage *spp_info)
+{
+ int ret;
+
+ mutex_lock(&kvm->slots_lock);
+ ret = kvm_spp_get_permission(kvm, spp_info);
+ mutex_unlock(&kvm->slots_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_vm_ioctl_get_subpages);
+
+int kvm_vm_ioctl_set_subpages(struct kvm *kvm,
+ struct kvm_subpage *spp_info)
+{
+ int ret;
+
+ if (!kvm_x86_ops->flush_subpages)
+ return -EINVAL;
+
+ spin_lock(&kvm->mmu_lock);
+ ret = kvm_x86_ops->flush_subpages(kvm, spp_info);
+ spin_unlock(&kvm->mmu_lock);
+
+ if (ret < 0)
+ return ret;
+
+ mutex_lock(&kvm->slots_lock);
+ spin_lock(&kvm->mmu_lock);
+
+ ret = kvm_spp_set_permission(kvm, spp_info);
+
+ spin_unlock(&kvm->mmu_lock);
+ mutex_unlock(&kvm->slots_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_vm_ioctl_set_subpages);
+
+int kvm_vm_ioctl_init_spp(struct kvm *kvm)
+{
+ int ret;
+
+ if (!kvm_x86_ops->init_spp)
+ return -ENODEV;
+
+ mutex_lock(&kvm->slots_lock);
+ ret = kvm_x86_ops->init_spp(kvm);
+ mutex_unlock(&kvm->slots_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_vm_ioctl_init_spp);
diff --git a/arch/x86/kvm/vmx/spp.h b/arch/x86/kvm/vmx/spp.h
index 9c3a51feddda..52cf87de1330 100644
--- a/arch/x86/kvm/vmx/spp.h
+++ b/arch/x86/kvm/vmx/spp.h
@@ -6,6 +6,11 @@

bool is_spp_spte(struct kvm_mmu_page *sp);
inline u64 construct_spptp(unsigned long root_hpa);
+int kvm_vm_ioctl_get_subpages(struct kvm *kvm,
+ struct kvm_subpage *spp_info);
+int kvm_vm_ioctl_set_subpages(struct kvm *kvm,
+ struct kvm_subpage *spp_info);
+int kvm_vm_ioctl_init_spp(struct kvm *kvm);
int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
u32 access_map, gfn_t gfn);
int vmx_spp_flush_sppt(struct kvm *kvm, struct kvm_subpage *spp_info);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8ecf9cb24879..7655c62decf4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7597,6 +7597,11 @@ static __init int hardware_setup(void)
kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
}

+ if (!spp_supported) {
+ kvm_x86_ops->flush_subpages = NULL;
+ kvm_x86_ops->init_spp = NULL;
+ }
+
if (!cpu_has_vmx_preemption_timer())
enable_preemption_timer = false;

@@ -7809,6 +7814,9 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
.nested_enable_evmcs = NULL,
.nested_get_evmcs_version = NULL,
.need_emulation_on_page_fault = vmx_need_emulation_on_page_fault,
+
+ .flush_subpages = vmx_spp_flush_sppt,
+ .init_spp = vmx_spp_init,
};

static void vmx_cleanup_l1d_flush(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 91602d310a3f..3561949577b9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -26,6 +26,7 @@
#include "cpuid.h"
#include "pmu.h"
#include "hyperv.h"
+#include "vmx/spp.h"

#include <linux/clocksource.h>
#include <linux/interrupt.h>
@@ -4977,6 +4978,54 @@ long kvm_arch_vm_ioctl(struct file *filp,
case KVM_SET_PMU_EVENT_FILTER:
r = kvm_vm_ioctl_set_pmu_event_filter(kvm, argp);
break;
+ case KVM_SUBPAGES_GET_ACCESS: {
+ struct kvm_subpage spp_info;
+
+ if (!kvm->arch.spp_active) {
+ r = -ENODEV;
+ goto out;
+ }
+
+ r = -EFAULT;
+ if (copy_from_user(&spp_info, argp, sizeof(spp_info)))
+ goto out;
+
+ r = -EINVAL;
+ if (spp_info.npages == 0 ||
+ spp_info.npages > SUBPAGE_MAX_BITMAP)
+ goto out;
+
+ r = kvm_vm_ioctl_get_subpages(kvm, &spp_info);
+ if (copy_to_user(argp, &spp_info, sizeof(spp_info))) {
+ r = -EFAULT;
+ goto out;
+ }
+ break;
+ }
+ case KVM_SUBPAGES_SET_ACCESS: {
+ struct kvm_subpage spp_info;
+
+ if (!kvm->arch.spp_active) {
+ r = -ENODEV;
+ goto out;
+ }
+
+ r = -EFAULT;
+ if (copy_from_user(&spp_info, argp, sizeof(spp_info)))
+ goto out;
+
+ r = -EINVAL;
+ if (spp_info.npages == 0 ||
+ spp_info.npages > SUBPAGE_MAX_BITMAP)
+ goto out;
+
+ r = kvm_vm_ioctl_set_subpages(kvm, &spp_info);
+ break;
+ }
+ case KVM_INIT_SPP: {
+ r = kvm_vm_ioctl_init_spp(kvm);
+ break;
+ }
default:
r = -ENOTTY;
}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 9460830de536..700f0825336d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1257,6 +1257,9 @@ struct kvm_vfio_spapr_tce {
struct kvm_userspace_memory_region)
#define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47)
#define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64)
+#define KVM_SUBPAGES_GET_ACCESS _IOR(KVMIO, 0x49, __u64)
+#define KVM_SUBPAGES_SET_ACCESS _IOW(KVMIO, 0x4a, __u64)
+#define KVM_INIT_SPP _IOW(KVMIO, 0x4b, __u64)

/* enable ucontrol for s390 */
struct kvm_s390_ucas_mapping {
--
2.17.2

2019-09-17 09:57:42

by Yang Weijiang

Subject: [PATCH v5 5/9] mmu: spp: Introduce SPP {init,set,get} functions

spp_init() must be called before any {get, set}_subpage
functions; it creates subpage access bitmaps for the VM memory
space and then sets up the SPPT root pages.

kvm_spp_set_permission() enables the SPP bit in the EPT leaf entry.
If the gfn range covers a hugepage, it zaps the hugepage entries
in the EPT, which causes following memory accesses to take EPT page faults.
The mmu_lock must be held before the above operation.

kvm_spp_get_permission() is used to query the access bitmap for a
protected page; it's also used in the EPT fault handler to check
whether the faulting page is SPP-protected as well.

Co-developed-by: He Chen <[email protected]>
Signed-off-by: He Chen <[email protected]>
Co-developed-by: Zhang Yi <[email protected]>
Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/vmx/spp.c | 242 ++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/spp.h | 5 +
include/uapi/linux/kvm.h | 9 ++
4 files changed, 258 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fe6417756983..cc38670a0c45 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -940,6 +940,8 @@ struct kvm_arch {
bool exception_payload_enabled;

struct kvm_pmu_event_filter *pmu_event_filter;
+ bool spp_active;
+
};

struct kvm_vm_stat {
diff --git a/arch/x86/kvm/vmx/spp.c b/arch/x86/kvm/vmx/spp.c
index 7e66d87186a2..ffc4ebcb64a6 100644
--- a/arch/x86/kvm/vmx/spp.c
+++ b/arch/x86/kvm/vmx/spp.c
@@ -186,6 +186,24 @@ bool is_spp_spte(struct kvm_mmu_page *sp)
return sp->role.spp;
}

+/*
+ * all vcpus share the same SPPT, vcpu->arch.mmu->sppt_root points to same
+ * SPPT root page, so any vcpu will do.
+ */
+static struct kvm_vcpu *kvm_spp_get_vcpu(struct kvm *kvm)
+{
+ struct kvm_vcpu *vcpu = NULL;
+ int idx;
+
+ for (idx = 0; idx < atomic_read(&kvm->online_vcpus); idx++) {
+ vcpu = kvm_get_vcpu(kvm, idx);
+ if (vcpu)
+ break;
+ }
+
+ return vcpu;
+}
+
#define SPPT_ENTRY_PHA_MASK (0xFFFFFFFFFF << 12)

int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
@@ -236,6 +254,40 @@ int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
}
EXPORT_SYMBOL_GPL(kvm_spp_setup_structure);

+int vmx_spp_flush_sppt(struct kvm *kvm, struct kvm_subpage *spp_info)
+{
+ struct kvm_shadow_walk_iterator iter;
+ struct kvm_vcpu *vcpu;
+ gfn_t gfn = spp_info->base_gfn;
+ int npages = spp_info->npages;
+ u64 spde;
+ int i;
+
+ vcpu = kvm_spp_get_vcpu(kvm);
+ /* direct_map spp start */
+ if (!VALID_PAGE(vcpu->arch.mmu->sppt_root))
+ return -EFAULT;
+
+ for (i = 0; i < npages; ++i) {
+ for_each_shadow_spp_entry(vcpu, (u64)gfn << PAGE_SHIFT, iter) {
+ if (!is_spp_shadow_present(*iter.sptep))
+ break;
+
+ if (iter.level == PT_DIRECTORY_LEVEL) {
+ spde = *iter.sptep;
+ spde &= ~PT_PRESENT_MASK;
+ spp_spte_set(iter.sptep, spde);
+ break;
+ }
+ }
+ gfn++;
+ }
+ kvm_flush_remote_tlbs(kvm);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vmx_spp_flush_sppt);
+
static int kvm_spp_create_bitmaps(struct kvm *kvm)
{
struct kvm_memslots *slots;
@@ -276,6 +328,196 @@ static int kvm_spp_create_bitmaps(struct kvm *kvm)
return ret;
}

+int vmx_spp_init(struct kvm *kvm)
+{
+ int i, ret;
+ struct kvm_vcpu *vcpu;
+ int root_level;
+ struct kvm_mmu_page *ssp_sp;
+
+ /* SPP feature is exclusive with nested VM. */
+ if (kvm_x86_ops->get_nested_state)
+ return -EPERM;
+
+ if (kvm->arch.spp_active)
+ return 0;
+
+ ret = kvm_spp_create_bitmaps(kvm);
+
+ if (ret)
+ return ret;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ /* prepare caches for SPP setup.*/
+ mmu_topup_memory_caches(vcpu);
+ root_level = vcpu->arch.mmu->shadow_root_level;
+ ssp_sp = kvm_spp_get_page(vcpu, 0, root_level);
+ ++ssp_sp->root_count;
+ vcpu->arch.mmu->sppt_root = __pa(ssp_sp->spt);
+ kvm_make_request(KVM_REQ_LOAD_CR3, vcpu);
+ }
+
+ kvm->arch.spp_active = true;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(vmx_spp_init);
+
+int kvm_spp_get_permission(struct kvm *kvm, struct kvm_subpage *spp_info)
+{
+ u32 *access = spp_info->access_map;
+ gfn_t gfn = spp_info->base_gfn;
+ int npages = spp_info->npages;
+ struct kvm_memory_slot *slot;
+ int i;
+
+ if (!kvm->arch.spp_active)
+ return -ENODEV;
+
+ for (i = 0; i < npages; i++, gfn++) {
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!slot)
+ return -EFAULT;
+ access[i] = *gfn_to_subpage_wp_info(slot, gfn);
+ }
+
+ return i;
+}
+EXPORT_SYMBOL_GPL(kvm_spp_get_permission);
+
+static void kvm_spp_zap_pte(struct kvm *kvm, u64 *spte, int level)
+{
+ u64 pte;
+
+ pte = *spte;
+ if (is_shadow_present_pte(pte) && is_last_spte(pte, level)) {
+ drop_spte(kvm, spte);
+ if (is_large_pte(pte))
+ --kvm->stat.lpages;
+ }
+}
+
+int kvm_spp_zap_entry(struct kvm *kvm, gfn_t gfn_lower, gfn_t gfn_upper,
+ u64 *sptep, int level)
+{
+ int page_num = KVM_PAGES_PER_HPAGE(level);
+ gfn_t gfn_max = (gfn_lower & ~(page_num - 1)) + page_num - 1;
+ int ret;
+
+ if (gfn_upper <= gfn_max)
+ ret = gfn_upper - gfn_lower + 1;
+ else
+ ret = gfn_max - gfn_lower + 1;
+
+ kvm_spp_zap_pte(kvm, sptep, level);
+ kvm_flush_remote_tlbs(kvm);
+
+ return ret;
+}
+
+int kvm_spp_set_permission(struct kvm *kvm, struct kvm_subpage *spp_info)
+{
+ u32 *access = spp_info->access_map;
+ gfn_t gfn = spp_info->base_gfn;
+ int npages = spp_info->npages;
+ struct kvm_memory_slot *slot;
+ struct kvm_subpage sbp = {0};
+ struct kvm_shadow_walk_iterator iterator;
+ struct kvm_vcpu *vcpu;
+ gfn_t max_gfn;
+ gfn_t old_gfn = gfn;
+ u32 *wp_map;
+ int i, count;
+
+ if (!kvm->arch.spp_active)
+ return -ENODEV;
+
+ if (npages > SUBPAGE_MAX_BITMAP)
+ return -EFAULT;
+
+ for (i = 0; i < npages; i++, gfn++) {
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!slot)
+ return -EFAULT;
+
+ wp_map = gfn_to_subpage_wp_info(slot, gfn);
+ *wp_map = access[i];
+ }
+
+ gfn = old_gfn;
+ max_gfn = gfn + npages - 1;
+ vcpu = kvm_spp_get_vcpu(kvm);
+
+ for (i = 0; gfn <= max_gfn; i++, gfn++) {
+ for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
+ if (!is_shadow_present_pte(*iterator.sptep))
+ break;
+
+ if (iterator.level == PT_PAGE_TABLE_LEVEL) {
+ sbp.base_gfn = gfn;
+ sbp.access_map[0] = access[i];
+ sbp.npages = 1;
+ if (kvm_spp_mark_protection(kvm, &sbp) < 0)
+ return -EFAULT;
+ break;
+ }
+
+ if (is_large_pte(*iterator.sptep)) {
+ count = kvm_spp_zap_entry(kvm, gfn, max_gfn,
+ iterator.sptep,
+ iterator.level);
+ if (count >= npages)
+ goto out;
+ gfn += count - 1;
+ }
+ }
+ }
+out:
+ return npages;
+}
+
+int kvm_spp_mark_protection(struct kvm *kvm, struct kvm_subpage *spp_info)
+{
+ u32 *access = spp_info->access_map;
+ gfn_t gfn = spp_info->base_gfn;
+ struct kvm_memory_slot *slot;
+ struct kvm_rmap_head *rmap_head;
+ int ret;
+
+ if (!kvm->arch.spp_active)
+ return -ENODEV;
+
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!slot)
+ return -EFAULT;
+
+ /*
+ * Check whether the target 4KB page exists in an EPT leaf
+ * entry. If it's there, just flag the SPP bit of the entry and
+ * defer the setup to the SPPT-miss-induced vm-exit handler.
+ */
+ rmap_head = __gfn_to_rmap(gfn, PT_PAGE_TABLE_LEVEL, slot);
+
+ if (rmap_head->val) {
+ /*
+ * if not all subpages are writable, set the SPP bit in the
+ * EPT leaf entry to enable SPP protection for the
+ * corresponding page.
+ */
+ if (access[0] != FULL_SPP_ACCESS) {
+ ret = kvm_spp_open_write_protect(kvm,
+ slot, gfn);
+ if (ret)
+ return ret;
+ } else {
+ ret = kvm_spp_clear_write_protect(kvm,
+ slot, gfn);
+ if (ret)
+ return ret;
+ }
+ }
+
+ return 0;
+}

void kvm_spp_free_memslot(struct kvm_memory_slot *free,
struct kvm_memory_slot *dont)
diff --git a/arch/x86/kvm/vmx/spp.h b/arch/x86/kvm/vmx/spp.h
index 94f6e39b30ed..9c3a51feddda 100644
--- a/arch/x86/kvm/vmx/spp.h
+++ b/arch/x86/kvm/vmx/spp.h
@@ -3,9 +3,14 @@
#define __KVM_X86_VMX_SPP_H

#define FULL_SPP_ACCESS ((u32)((1ULL << 32) - 1))
+
bool is_spp_spte(struct kvm_mmu_page *sp);
inline u64 construct_spptp(unsigned long root_hpa);
int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
u32 access_map, gfn_t gfn);
+int vmx_spp_flush_sppt(struct kvm *kvm, struct kvm_subpage *spp_info);
+void kvm_spp_free_memslot(struct kvm_memory_slot *free,
+ struct kvm_memory_slot *dont);
+int vmx_spp_init(struct kvm *kvm);

#endif /* __KVM_X86_VMX_SPP_H */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5e3f12d5359e..9460830de536 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -102,6 +102,15 @@ struct kvm_userspace_memory_region {
__u64 userspace_addr; /* start of the userspace allocated memory */
};

+/* for KVM_SUBPAGES_GET_ACCESS and KVM_SUBPAGES_SET_ACCESS */
+#define SUBPAGE_MAX_BITMAP 64
+struct kvm_subpage {
+ __u64 base_gfn;
+ __u64 npages;
+ /* sub-page write-access bitmap array */
+ __u32 access_map[SUBPAGE_MAX_BITMAP];
+};
+
/*
* The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace,
* other bits are reserved for kvm internal use which are defined in
--
2.17.2

2019-09-17 09:58:02

by Yang Weijiang

Subject: [PATCH v5 2/9] vmx: spp: Add control flags for Sub-Page Protection(SPP)

Check the SPP capability in MSR_IA32_VMX_PROCBASED_CTLS2; its bit 23
indicates the SPP capability. Enable the SPP feature bit in the CPU
capabilities bitmap if it's supported.

Co-developed-by: He Chen <[email protected]>
Signed-off-by: He Chen <[email protected]>
Co-developed-by: Zhang Yi <[email protected]>
Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/vmx.h | 1 +
arch/x86/kernel/cpu/intel.c | 4 ++++
arch/x86/kvm/mmu.h | 2 ++
arch/x86/kvm/vmx/capabilities.h | 5 +++++
arch/x86/kvm/vmx/vmx.c | 10 ++++++++++
6 files changed, 23 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index e880f2408e29..ee2c76fdadf6 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -228,6 +228,7 @@
#define X86_FEATURE_FLEXPRIORITY ( 8*32+ 2) /* Intel FlexPriority */
#define X86_FEATURE_EPT ( 8*32+ 3) /* Intel Extended Page Table */
#define X86_FEATURE_VPID ( 8*32+ 4) /* Intel Virtual Processor ID */
+#define X86_FEATURE_SPP ( 8*32+ 5) /* Intel EPT-based Sub-Page Write Protection */

#define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer VMMCALL to VMCALL */
#define X86_FEATURE_XENPV ( 8*32+16) /* "" Xen paravirtual guest */
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index a39136b0d509..e1137807affc 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -68,6 +68,7 @@
#define SECONDARY_EXEC_XSAVES 0x00100000
#define SECONDARY_EXEC_PT_USE_GPA 0x01000000
#define SECONDARY_EXEC_MODE_BASED_EPT_EXEC 0x00400000
+#define SECONDARY_EXEC_ENABLE_SPP 0x00800000
#define SECONDARY_EXEC_TSC_SCALING 0x02000000

#define PIN_BASED_EXT_INTR_MASK 0x00000001
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 8d6d92ebeb54..27617e522f01 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -503,6 +503,7 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
#define X86_VMX_FEATURE_PROC_CTLS2_EPT 0x00000002
#define X86_VMX_FEATURE_PROC_CTLS2_VPID 0x00000020
#define x86_VMX_FEATURE_EPT_CAP_AD 0x00200000
+#define X86_VMX_FEATURE_PROC_CTLS2_SPP 0x00800000

u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2;
u32 msr_vpid_cap, msr_ept_cap;
@@ -513,6 +514,7 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_EPT);
clear_cpu_cap(c, X86_FEATURE_VPID);
clear_cpu_cap(c, X86_FEATURE_EPT_AD);
+ clear_cpu_cap(c, X86_FEATURE_SPP);

rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
msr_ctl = vmx_msr_high | vmx_msr_low;
@@ -536,6 +538,8 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
}
if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID)
set_cpu_cap(c, X86_FEATURE_VPID);
+ if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_SPP)
+ set_cpu_cap(c, X86_FEATURE_SPP);
}
}

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 54c2a377795b..3c1423526a98 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -26,6 +26,8 @@
#define PT_PAGE_SIZE_MASK (1ULL << PT_PAGE_SIZE_SHIFT)
#define PT_PAT_MASK (1ULL << 7)
#define PT_GLOBAL_MASK (1ULL << 8)
+#define PT_SPP_SHIFT 61
+#define PT_SPP_MASK (1ULL << PT_SPP_SHIFT)
#define PT64_NX_SHIFT 63
#define PT64_NX_MASK (1ULL << PT64_NX_SHIFT)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index d6664ee3d127..e3bde7a32123 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -241,6 +241,11 @@ static inline bool cpu_has_vmx_pml(void)
return vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_ENABLE_PML;
}

+static inline bool cpu_has_vmx_ept_spp(void)
+{
+ return vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_ENABLE_SPP;
+}
+
static inline bool vmx_xsaves_supported(void)
{
return vmcs_config.cpu_based_2nd_exec_ctrl &
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c030c96fc81a..8ecf9cb24879 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -60,6 +60,7 @@
#include "vmcs12.h"
#include "vmx.h"
#include "x86.h"
+#include "spp.h"

MODULE_AUTHOR("Qumranet");
MODULE_LICENSE("GPL");
@@ -113,6 +114,7 @@ module_param_named(pml, enable_pml, bool, S_IRUGO);

static bool __read_mostly dump_invalid_vmcs = 0;
module_param(dump_invalid_vmcs, bool, 0644);
+static bool __read_mostly spp_supported = 0;

#define MSR_BITMAP_MODE_X2APIC 1
#define MSR_BITMAP_MODE_X2APIC_APICV 2
@@ -2279,6 +2281,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
SECONDARY_EXEC_RDSEED_EXITING |
SECONDARY_EXEC_RDRAND_EXITING |
SECONDARY_EXEC_ENABLE_PML |
+ SECONDARY_EXEC_ENABLE_SPP |
SECONDARY_EXEC_TSC_SCALING |
SECONDARY_EXEC_PT_USE_GPA |
SECONDARY_EXEC_PT_CONCEAL_VMX |
@@ -3931,6 +3934,9 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
if (!enable_pml)
exec_control &= ~SECONDARY_EXEC_ENABLE_PML;

+ if (!spp_supported)
+ exec_control &= ~SECONDARY_EXEC_ENABLE_SPP;
+
if (vmx_xsaves_supported()) {
/* Exposing XSAVES only when XSAVE is exposed */
bool xsaves_enabled =
@@ -7521,6 +7527,10 @@ static __init int hardware_setup(void)
if (!cpu_has_vmx_flexpriority())
flexpriority_enabled = 0;

+ if (cpu_has_vmx_ept_spp() && enable_ept &&
+ boot_cpu_has(X86_FEATURE_SPP))
+ spp_supported = 1;
+
if (!cpu_has_virtual_nmis())
enable_vnmi = 0;

--
2.17.2

2019-09-17 15:57:30

by Yang Weijiang

Subject: [PATCH v5 3/9] mmu: spp: Add SPP Table setup functions

The SPPT is a 4-level paging structure similar to the EPT. When SPP is
armed for a target physical page, bit 61 of the corresponding
EPT leaf entry is flagged, and the SPPT is then traversed with the gfn;
the leaf entry of the SPPT contains the access bitmap of the subpages
inside the target 4KB physical page, one bit per 128-byte subpage.
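
For reference (this matches format_spp_spte() below and the L1E format in
the documentation patch): a u32 access vector of 0x0000000f (sub-pages 0-3
writable) expands into an SPPT L1E value of 0x55, since the write-permission
bit for sub-page i sits at bit position 2*i and the odd bit positions are
reserved (0).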

Co-developed-by: He Chen <[email protected]>
Signed-off-by: He Chen <[email protected]>
Co-developed-by: Zhang Yi <[email protected]>
Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: Yang Weijiang <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 4 +-
arch/x86/kvm/vmx/spp.c | 236 ++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/spp.h | 10 ++
3 files changed, 249 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/kvm/vmx/spp.c
create mode 100644 arch/x86/kvm/vmx/spp.h

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bdc16b0aa7c6..eb18f4dd993d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -270,7 +270,8 @@ union kvm_mmu_page_role {
unsigned smap_andnot_wp:1;
unsigned ad_disabled:1;
unsigned guest_mode:1;
- unsigned :6;
+ unsigned spp:1;
+ unsigned reserved:5;

/*
* This is left at the top of the word so that
@@ -399,6 +400,7 @@ struct kvm_mmu {
u64 *spte, const void *pte);
hpa_t root_hpa;
gpa_t root_cr3;
+ hpa_t sppt_root;
union kvm_mmu_role mmu_role;
u8 root_level;
u8 shadow_root_level;
diff --git a/arch/x86/kvm/vmx/spp.c b/arch/x86/kvm/vmx/spp.c
new file mode 100644
index 000000000000..1b33cd39108b
--- /dev/null
+++ b/arch/x86/kvm/vmx/spp.c
@@ -0,0 +1,236 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "spp.h"
+
+#define for_each_shadow_spp_entry(_vcpu, _addr, _walker) \
+ for (shadow_spp_walk_init(&(_walker), _vcpu, _addr); \
+ shadow_walk_okay(&(_walker)); \
+ shadow_walk_next(&(_walker)))
+
+static void shadow_spp_walk_init(struct kvm_shadow_walk_iterator *iterator,
+ struct kvm_vcpu *vcpu, u64 addr)
+{
+ iterator->addr = addr;
+ iterator->shadow_addr = vcpu->arch.mmu->sppt_root;
+
+ /* SPP Table is a 4-level paging structure */
+ iterator->level = PT64_ROOT_4LEVEL;
+}
+
+static int is_spp_shadow_present(u64 pte)
+{
+ return pte & PT_PRESENT_MASK;
+}
+
+static bool __rmap_open_subpage_bit(struct kvm *kvm,
+ struct kvm_rmap_head *rmap_head)
+{
+ struct rmap_iterator iter;
+ bool flush = false;
+ u64 *sptep;
+ u64 spte;
+
+ for_each_rmap_spte(rmap_head, &iter, sptep) {
+ /*
+ * SPP works only when the page is write-protected
+ * and SPP bit is set in EPT leaf entry.
+ */
+ flush |= spte_write_protect(sptep, false);
+ spte = *sptep | PT_SPP_MASK;
+ flush |= mmu_spte_update(sptep, spte);
+ }
+
+ return flush;
+}
+
+static int kvm_spp_open_write_protect(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
+ gfn_t gfn)
+{
+ struct kvm_rmap_head *rmap_head;
+ bool flush = false;
+
+ /*
+ * SPP is only supported with 4KB level1 memory page, check
+ * if the page is mapped in EPT leaf entry.
+ */
+ rmap_head = __gfn_to_rmap(gfn, PT_PAGE_TABLE_LEVEL, slot);
+
+ if (!rmap_head->val)
+ return -EFAULT;
+
+ flush |= __rmap_open_subpage_bit(kvm, rmap_head);
+
+ if (flush)
+ kvm_flush_remote_tlbs(kvm);
+
+ return 0;
+}
+
+static bool __rmap_clear_subpage_bit(struct kvm *kvm,
+ struct kvm_rmap_head *rmap_head)
+{
+ struct rmap_iterator iter;
+ bool flush = false;
+ u64 *sptep;
+ u64 spte;
+
+ for_each_rmap_spte(rmap_head, &iter, sptep) {
+ spte = (*sptep & ~PT_SPP_MASK);
+ flush |= mmu_spte_update(sptep, spte);
+ }
+
+ return flush;
+}
+
+static int kvm_spp_clear_write_protect(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
+ gfn_t gfn)
+{
+ struct kvm_rmap_head *rmap_head;
+ bool flush = false;
+
+ rmap_head = __gfn_to_rmap(gfn, PT_PAGE_TABLE_LEVEL, slot);
+
+ if (!rmap_head->val)
+ return -EFAULT;
+
+ flush |= __rmap_clear_subpage_bit(kvm, rmap_head);
+
+ if (flush)
+ kvm_flush_remote_tlbs(kvm);
+
+ return 0;
+}
+
+struct kvm_mmu_page *kvm_spp_get_page(struct kvm_vcpu *vcpu,
+ gfn_t gfn,
+ unsigned int level)
+{
+ struct kvm_mmu_page *sp;
+ union kvm_mmu_page_role role;
+
+ role = vcpu->arch.mmu->mmu_role.base;
+ role.level = level;
+ role.direct = true;
+ role.spp = true;
+
+ for_each_valid_sp(vcpu->kvm, sp, gfn) {
+ if (sp->gfn != gfn)
+ continue;
+ if (sp->role.word != role.word)
+ continue;
+ if (sp->role.spp && role.level == level)
+ goto out;
+ }
+
+ sp = kvm_mmu_alloc_page(vcpu, true);
+ sp->gfn = gfn;
+ sp->role = role;
+ hlist_add_head(&sp->hash_link,
+ &vcpu->kvm->arch.mmu_page_hash
+ [kvm_page_table_hashfn(gfn)]);
+ clear_page(sp->spt);
+out:
+ return sp;
+}
+EXPORT_SYMBOL_GPL(kvm_spp_get_page);
+
+static void link_spp_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+ struct kvm_mmu_page *sp)
+{
+ u64 spte;
+
+ spte = __pa(sp->spt) | PT_PRESENT_MASK;
+
+ mmu_spte_set(sptep, spte);
+
+ mmu_page_add_parent_pte(vcpu, sp, sptep);
+}
+
+static u64 format_spp_spte(u32 spp_wp_bitmap)
+{
+ u64 new_spte = 0;
+ int i = 0;
+
+ /*
+ * One 4K page contains 32 sub-pages; in the SPPT L1E, odd bits
+ * are reserved, so we need to transfer the u32 subpage write
+ * protect bitmap to the u64 SPPT L1E format.
+ */
+ while (i < 32) {
+ if (spp_wp_bitmap & (1ULL << i))
+ new_spte |= 1ULL << (i * 2);
+
+ i++;
+ }
+
+ return new_spte;
+}
+
+static void spp_spte_set(u64 *sptep, u64 new_spte)
+{
+ __set_spte(sptep, new_spte);
+}
+
+bool is_spp_spte(struct kvm_mmu_page *sp)
+{
+ return sp->role.spp;
+}
+
+#define SPPT_ENTRY_PHA_MASK (0xFFFFFFFFFF << 12)
+
+int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
+ u32 access_map, gfn_t gfn)
+{
+ struct kvm_shadow_walk_iterator iter;
+ struct kvm_mmu_page *sp;
+ gfn_t pseudo_gfn;
+ u64 old_spte, spp_spte;
+ int ret = -EFAULT;
+
+ /* direct_map spp start */
+ if (!VALID_PAGE(vcpu->arch.mmu->sppt_root))
+ return -EFAULT;
+
+ for_each_shadow_spp_entry(vcpu, (u64)gfn << PAGE_SHIFT, iter) {
+ if (iter.level == PT_PAGE_TABLE_LEVEL) {
+ spp_spte = format_spp_spte(access_map);
+ old_spte = mmu_spte_get_lockless(iter.sptep);
+ if (old_spte != spp_spte) {
+ spp_spte_set(iter.sptep, spp_spte);
+ kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+ }
+ ret = 0;
+ break;
+ }
+
+ if (!is_spp_shadow_present(*iter.sptep)) {
+ u64 base_addr = iter.addr;
+
+ base_addr &= PT64_LVL_ADDR_MASK(iter.level);
+ pseudo_gfn = base_addr >> PAGE_SHIFT;
+
+ spp_spte = *iter.sptep;
+ if ((iter.level == PT_DIRECTORY_LEVEL) &&
+ (spp_spte & SPPT_ENTRY_PHA_MASK)) {
+ spp_spte |= PT_PRESENT_MASK;
+ spp_spte_set(iter.sptep, spp_spte);
+ continue;
+ }
+ sp = kvm_spp_get_page(vcpu, pseudo_gfn,
+ iter.level - 1);
+ link_spp_shadow_page(vcpu, iter.sptep, sp);
+ }
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_spp_setup_structure);
+
+inline u64 construct_spptp(unsigned long root_hpa)
+{
+ return root_hpa & PAGE_MASK;
+}
+EXPORT_SYMBOL_GPL(construct_spptp);
+
diff --git a/arch/x86/kvm/vmx/spp.h b/arch/x86/kvm/vmx/spp.h
new file mode 100644
index 000000000000..2b8f4e8d2267
--- /dev/null
+++ b/arch/x86/kvm/vmx/spp.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_X86_VMX_SPP_H
+#define __KVM_X86_VMX_SPP_H
+
+bool is_spp_spte(struct kvm_mmu_page *sp);
+inline u64 construct_spptp(unsigned long root_hpa);
+int kvm_spp_setup_structure(struct kvm_vcpu *vcpu,
+ u32 access_map, gfn_t gfn);
+
+#endif /* __KVM_X86_VMX_SPP_H */
--
2.17.2

2019-09-17 15:57:31

by Yang Weijiang

Subject: [PATCH v5 1/9] Documentation: Introduce EPT based Subpage Protection

Co-developed-by: [email protected]
Signed-off-by: [email protected]
Signed-off-by: Yang Weijiang <[email protected]>
---
Documentation/virtual/kvm/spp_kvm.txt | 178 ++++++++++++++++++++++++++
1 file changed, 178 insertions(+)
create mode 100644 Documentation/virtual/kvm/spp_kvm.txt

diff --git a/Documentation/virtual/kvm/spp_kvm.txt b/Documentation/virtual/kvm/spp_kvm.txt
new file mode 100644
index 000000000000..1bd1c11d0a99
--- /dev/null
+++ b/Documentation/virtual/kvm/spp_kvm.txt
@@ -0,0 +1,178 @@
+EPT-Based Sub-Page Protection (SPP) for KVM
+====================================================
+
+1.Overview
+ EPT-based Sub-Page Protection (SPP) allows the VMM to specify
+ fine-grained (128 bytes per sub-page) write-protection for guest physical
+ memory. When it's enabled, the CPU enforces write-access permission
+ for the sub-pages within a 4KB page: if the corresponding bit is set in
+ the permission vector, a write to the sub-page region is allowed;
+ otherwise, it's prevented with an EPT violation.
+
+ *Note*: In the current implementation, SPP is mutually exclusive with the
+ nested flag; if nesting is on, the SPP feature won't work.
+
+2.SPP Operation
+ Sub-Page Protection Table (SPPT) is introduced to manage sub-page
+ write-access permission.
+
+ It is active when:
+ a) nested flag is turned off.
+ b) "sub-page write protection" VM-execution control is 1.
+ c) SPP is initialized with KVM_INIT_SPP ioctl.
+ d) Sub-page permissions are set with KVM_SUBPAGES_SET_ACCESS ioctl.
+ see below sections for details.
+
+ __________________________________________________________________________
+
+ How SPP hardware works:
+ __________________________________________________________________________
+
+ Guest write access --> GPA --> Walk EPT --> EPT leaf entry -----|
+ |---------------------------------------------------------------|
+ |-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61)
+ |
+ |-> <false> --> EPT legacy behavior
+ |
+ |
+ |-> <true> --> if ept_leaf_entry.writable
+ |
+ |-> <true> --> Ignore SPP
+ |
+ |-> <false> --> GPA --> Walk SPP 4-level table--|
+ |
+ |------------<----------get-the-SPPT-point-from-VMCS-filed-----<------|
+ |
+ Walk SPP L4E table
+ |
+ |---> if-entry-misconfiguration ------------>-------|-------<---------|
+ | | |
+ else | |
+ | | |
+ | |------------------SPP VMexit<-----------------| |
+ | | |
+ | |-> exit_qualification & sppt_misconfig --> sppt misconfig |
+ | | |
+ | |-> exit_qualification & sppt_miss --> sppt miss |
+ |---| |
+ | |
+ walk SPPT L3E--|--> if-entry-misconfiguration------------>------------|
+ | |
+ else |
+ | |
+ | |
+ walk SPPT L2E --|--> if-entry-misconfiguration-------->-------|
+ | |
+ else |
+ | |
+ | |
+ walk SPPT L1E --|-> if-entry-misconfiguration--->----|
+ |
+ else
+ |
+ |-> if sub-page writable
+ |-> <true> allow, write access
+ |-> <false> disallow, EPT violation
+ ______________________________________________________________________________
+
+3.IOCTL Interfaces
+
+ KVM_INIT_SPP:
+ Allocate storage for sub-page permission vectors and SPPT root page.
+
+ KVM_SUBPAGES_GET_ACCESS:
+ Get sub-page write permission vectors for given continuous guest pages.
+
+ KVM_SUBPAGES_SET_ACCESS
+ Set SPP bit in EPT leaf entries for given continuous guest pages. The
+ actual SPPT setup is triggered when SPP miss vm-exit is handled.
+
+ /* for KVM_SUBPAGES_GET_ACCESS and KVM_SUBPAGES_SET_ACCESS */
+ #define SUBPAGE_MAX_BITMAP 64
+ struct kvm_subpage {
+ __u64 base_gfn; /* the first page gfn of the continuous pages */
+ __u64 npages; /* number of 4K pages */
+ __u32 access_map[SUBPAGE_MAX_BITMAP]; /* sub-page write-access bitmap array */
+ };
+
+ #define KVM_SUBPAGES_GET_ACCESS _IOR(KVMIO, 0x49, __u64)
+ #define KVM_SUBPAGES_SET_ACCESS _IOW(KVMIO, 0x4a, __u64)
+ #define KVM_INIT_SPP _IOW(KVMIO, 0x4b, __u64)
+
+4.Set Sub-Page Permission
+
+ * To enable SPP protection, the system admin sets sub-page permissions via the
+ KVM_SUBPAGES_SET_ACCESS ioctl:
+ (1) It first stores the access permissions in the bitmap array.
+
+ (2) Then, if the target 4KB page is mapped as a PT_PAGE_TABLE_LEVEL entry in the EPT,
+ it sets the SPP bit of the corresponding entry to mark sub-page protection.
+ If the 4KB page is mapped at PT_DIRECTORY_LEVEL or PT_PDPE_LEVEL, it
+ zaps the hugepage entry and lets the following memory access trigger an EPT
+ page fault; there the gfn is checked against the SPP permission bitmap and
+ the proper level is selected to set up the EPT entry.
+
+
+ The SPPT paging structure format is as below:
+
+ Format of the SPPT L4E, L3E, L2E:
+ | Bit | Contents |
+ | :----- | :------------------------------------------------------------------------|
+ | 0 | Valid entry when set; indicates whether the entry is present |
+ | 11:1 | Reserved (0) |
+ | N-1:12 | Physical address of 4KB aligned SPPT LX-1 Table referenced by this entry |
+ | 51:N | Reserved (0) |
+ | 63:52 | Reserved (0) |
+ Note: N is the physical address width supported by the processor. X is the page level
+
+ Format of the SPPT L1E:
+ | Bit | Contents |
+ | :---- | :---------------------------------------------------------------- |
+ | 0+2i | Write permission for i-th 128 byte sub-page region. |
+ | 1+2i | Reserved (0). |
+ Note: 0<=i<=31
+
+5.SPPT-induced VM exit
+
+ * SPPT miss and misconfiguration induced VM exit
+
+ An SPPT miss VM exit occurs when, while walking the SPPT, there is no SPPT
+ misconfiguration but a paging-structure entry is not
+ present in any of the L4E/L3E/L2E entries.
+
+ An SPPT misconfiguration VM exit occurs when reserved bits or unsupported values
+ are set in an SPPT entry.
+
+ *NOTE* SPPT miss and SPPT misconfigurations can occur only due to an
+ attempt to write memory with a guest physical address.
+
+ * SPP permission induced VM exit
+ An SPP sub-page permission induced violation is reported as an EPT violation
+ and therefore causes a VM exit.
+
+6.SPPT-induced VM exit handling
+
+ #define EXIT_REASON_SPP 66
+
+ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
+ ...
+ [EXIT_REASON_SPP] = handle_spp,
+ ...
+ };
+
+ New exit qualification for SPPT-induced vmexits.
+
+ | Bit | Contents |
+ | :---- | :---------------------------------------------------------------- |
+ | 10:0 | Reserved (0). |
+ | 11 | SPPT VM exit type. Set for SPPT Miss, cleared for SPPT Misconfig. |
+ | 12 | NMI unblocking due to IRET |
+ | 63:13 | Reserved (0) |
+
+ In addition to the exit qualification, guest linear address and guest
+ physical address fields will be reported.
+
+ * SPPT miss and misconfiguration induced VM exit
+ Set up SPPT entries correctly.
+
+ * SPP permission induced VM exit
+ This kind of VM exit is left to the VMI tool to handle.
--
2.17.2

2019-09-17 17:25:33

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH v5 0/9] Enable Sub-page Write Protection Support

On Tue, Sep 17, 2019 at 04:52:55PM +0800, Yang Weijiang wrote:
> EPT-Based Sub-Page write Protection(SPP)is a HW capability which allows
> Virtual Machine Monitor(VMM) to specify write-permission for guest
> physical memory at a sub-page(128 byte) granularity. When this
> capability is enabled, the CPU enforces write-access check for sub-pages
> within a 4KB page.
>
> The feature is targeted to provide fine-grained memory protection for
> usages such as device virtualization, memory check-point and VM
> introspection etc.
>
> SPP is active when the "sub-page write protection" (bit 23) is 1 in
> Secondary VM-Execution Controls. The feature is backed with a Sub-Page
> Permission Table(SPPT), SPPT is referenced via a 64-bit control field
> called Sub-Page Permission Table Pointer (SPPTP) which contains a
> 4K-aligned physical address.
>
> To enable SPP for certain physical page, the gfn should be first mapped
> to a 4KB entry, then set bit 61 of the corresponding EPT leaf entry.
> While HW walks EPT, if bit 61 is set, it traverses SPPT with the guest
> physical address to find out the sub-page permissions at the leaf entry.
> If the corresponding bit is set, write to sub-page is permitted,
> otherwise, SPP induced EPT violation is generated.
>
> This patch serial passed SPP function test and selftest on Ice-Lake platform.
>
> Please refer to the SPP introduction document in this patch set and
> Intel SDM for details:
>
> Intel SDM:
> https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf
>
> SPP selftest patch:
> https://lkml.org/lkml/2019/6/18/1197
>
> Previous patch:
> https://lkml.org/lkml/2019/8/14/97

I saw the patches as part of the introspection patch-set.
Are you all working together on this?

Would it be possible for some of the Bitdefender folks who depend on this
to provide Tested-by, and could they also take the time to review this patch set?

Thanks.

2019-09-17 18:41:03

by Adalbert Lazăr

Subject: Re: [PATCH v5 0/9] Enable Sub-page Write Protection Support

On Tue, 17 Sep 2019 08:59:04 -0400, Konrad Rzeszutek Wilk <[email protected]> wrote:
> On Tue, Sep 17, 2019 at 04:52:55PM +0800, Yang Weijiang wrote:
> > EPT-Based Sub-Page write Protection(SPP)is a HW capability which allows
> > Virtual Machine Monitor(VMM) to specify write-permission for guest
> > physical memory at a sub-page(128 byte) granularity. When this
> > capability is enabled, the CPU enforces write-access check for sub-pages
> > within a 4KB page.
> >
> > The feature is targeted to provide fine-grained memory protection for
> > usages such as device virtualization, memory check-point and VM
> > introspection etc.
> >
> > SPP is active when the "sub-page write protection" (bit 23) is 1 in
> > Secondary VM-Execution Controls. The feature is backed with a Sub-Page
> > Permission Table(SPPT), SPPT is referenced via a 64-bit control field
> > called Sub-Page Permission Table Pointer (SPPTP) which contains a
> > 4K-aligned physical address.
> >
> > To enable SPP for certain physical page, the gfn should be first mapped
> > to a 4KB entry, then set bit 61 of the corresponding EPT leaf entry.
> > While HW walks EPT, if bit 61 is set, it traverses SPPT with the guest
> > physical address to find out the sub-page permissions at the leaf entry.
> > If the corresponding bit is set, write to sub-page is permitted,
> > otherwise, SPP induced EPT violation is generated.
> >
> > This patch serial passed SPP function test and selftest on Ice-Lake platform.
> >
> > Please refer to the SPP introduction document in this patch set and
> > Intel SDM for details:
> >
> > Intel SDM:
> > https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf
> >
> > SPP selftest patch:
> > https://lkml.org/lkml/2019/6/18/1197
> >
> > Previous patch:
> > https://lkml.org/lkml/2019/8/14/97
>
> I saw the patches as part of the introspection patch-set.
> Are you all working together on this?

Weijiang helped us to start using the SPP feature with the introspection
API and tested the integration when we didn't have the hardware
available. I've included the SPP patches in the introspection patch
series in order to "show the full picture".

> Would it be possible for some of the bitdefender folks who depend on this
> to provide Tested-by adn could they also take the time to review this patch-set?

Sure. Once we rebase the introspection patches on 5.3, we'll replace
the previous version with this new one in our tree and test it.

2019-10-04 20:51:43

by Jim Mattson

Subject: Re: [PATCH v5 2/9] vmx: spp: Add control flags for Sub-Page Protection(SPP)

On Tue, Sep 17, 2019 at 1:52 AM Yang Weijiang <[email protected]> wrote:
>
> Check SPP capability in MSR_IA32_VMX_PROCBASED_CTLS2, its 23-bit
> indicates SPP capability. Enable SPP feature bit in CPU capabilities
> bitmap if it's supported.
>
> Co-developed-by: He Chen <[email protected]>
> Signed-off-by: He Chen <[email protected]>
> Co-developed-by: Zhang Yi <[email protected]>
> Signed-off-by: Zhang Yi <[email protected]>
> Signed-off-by: Yang Weijiang <[email protected]>
> ---
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/vmx.h | 1 +
> arch/x86/kernel/cpu/intel.c | 4 ++++
> arch/x86/kvm/mmu.h | 2 ++
> arch/x86/kvm/vmx/capabilities.h | 5 +++++
> arch/x86/kvm/vmx/vmx.c | 10 ++++++++++
> 6 files changed, 23 insertions(+)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index e880f2408e29..ee2c76fdadf6 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -228,6 +228,7 @@
> #define X86_FEATURE_FLEXPRIORITY ( 8*32+ 2) /* Intel FlexPriority */
> #define X86_FEATURE_EPT ( 8*32+ 3) /* Intel Extended Page Table */
> #define X86_FEATURE_VPID ( 8*32+ 4) /* Intel Virtual Processor ID */
> +#define X86_FEATURE_SPP ( 8*32+ 5) /* Intel EPT-based Sub-Page Write Protection */
>
> #define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer VMMCALL to VMCALL */
> #define X86_FEATURE_XENPV ( 8*32+16) /* "" Xen paravirtual guest */
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index a39136b0d509..e1137807affc 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -68,6 +68,7 @@
> #define SECONDARY_EXEC_XSAVES 0x00100000
> #define SECONDARY_EXEC_PT_USE_GPA 0x01000000
> #define SECONDARY_EXEC_MODE_BASED_EPT_EXEC 0x00400000
> +#define SECONDARY_EXEC_ENABLE_SPP 0x00800000
> #define SECONDARY_EXEC_TSC_SCALING 0x02000000
>
> #define PIN_BASED_EXT_INTR_MASK 0x00000001
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 8d6d92ebeb54..27617e522f01 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -503,6 +503,7 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
> #define X86_VMX_FEATURE_PROC_CTLS2_EPT 0x00000002
> #define X86_VMX_FEATURE_PROC_CTLS2_VPID 0x00000020
> #define x86_VMX_FEATURE_EPT_CAP_AD 0x00200000
> +#define X86_VMX_FEATURE_PROC_CTLS2_SPP 0x00800000
>
> u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2;
> u32 msr_vpid_cap, msr_ept_cap;
> @@ -513,6 +514,7 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
> clear_cpu_cap(c, X86_FEATURE_EPT);
> clear_cpu_cap(c, X86_FEATURE_VPID);
> clear_cpu_cap(c, X86_FEATURE_EPT_AD);
> + clear_cpu_cap(c, X86_FEATURE_SPP);
>
> rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
> msr_ctl = vmx_msr_high | vmx_msr_low;
> @@ -536,6 +538,8 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
> }
> if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID)
> set_cpu_cap(c, X86_FEATURE_VPID);
> + if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_SPP)
> + set_cpu_cap(c, X86_FEATURE_SPP);

SPP requires EPT, so this could be moved up next to the EPT_AD check.
In fact, I would suggest changing 'SPP' to 'EPT_SPP' to make it clear
that this feature is *EPT* sub-page permissions.

> }
> }
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 54c2a377795b..3c1423526a98 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -26,6 +26,8 @@
> #define PT_PAGE_SIZE_MASK (1ULL << PT_PAGE_SIZE_SHIFT)
> #define PT_PAT_MASK (1ULL << 7)
> #define PT_GLOBAL_MASK (1ULL << 8)
> +#define PT_SPP_SHIFT 61
> +#define PT_SPP_MASK (1ULL << PT_SPP_SHIFT)

Since these constants are only applicable to EPT, would it be more
appropriate to define them in paging_tmpl.h, under '#elif PTTYPE ==
PTTYPE_EPT'? If not, it seems that they should at least be renamed to
PT64_SPP_* for consistency with the other macros here.

> #define PT64_NX_SHIFT 63
> #define PT64_NX_MASK (1ULL << PT64_NX_SHIFT)
>
> diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
> index d6664ee3d127..e3bde7a32123 100644
> --- a/arch/x86/kvm/vmx/capabilities.h
> +++ b/arch/x86/kvm/vmx/capabilities.h
> @@ -241,6 +241,11 @@ static inline bool cpu_has_vmx_pml(void)
> return vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_ENABLE_PML;
> }
>
> +static inline bool cpu_has_vmx_ept_spp(void)
> +{
> + return vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_ENABLE_SPP;
> +}
> +
> static inline bool vmx_xsaves_supported(void)
> {
> return vmcs_config.cpu_based_2nd_exec_ctrl &
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index c030c96fc81a..8ecf9cb24879 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -60,6 +60,7 @@
> #include "vmcs12.h"
> #include "vmx.h"
> #include "x86.h"
> +#include "spp.h"
>
> MODULE_AUTHOR("Qumranet");
> MODULE_LICENSE("GPL");
> @@ -113,6 +114,7 @@ module_param_named(pml, enable_pml, bool, S_IRUGO);
>
> static bool __read_mostly dump_invalid_vmcs = 0;
> module_param(dump_invalid_vmcs, bool, 0644);
> +static bool __read_mostly spp_supported = 0;
>
> #define MSR_BITMAP_MODE_X2APIC 1
> #define MSR_BITMAP_MODE_X2APIC_APICV 2
> @@ -2279,6 +2281,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> SECONDARY_EXEC_RDSEED_EXITING |
> SECONDARY_EXEC_RDRAND_EXITING |
> SECONDARY_EXEC_ENABLE_PML |
> + SECONDARY_EXEC_ENABLE_SPP |
> SECONDARY_EXEC_TSC_SCALING |
> SECONDARY_EXEC_PT_USE_GPA |
> SECONDARY_EXEC_PT_CONCEAL_VMX |
> @@ -3931,6 +3934,9 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
> if (!enable_pml)
> exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
>
> + if (!spp_supported)
> + exec_control &= ~SECONDARY_EXEC_ENABLE_SPP;
> +
> if (vmx_xsaves_supported()) {
> /* Exposing XSAVES only when XSAVE is exposed */
> bool xsaves_enabled =
> @@ -7521,6 +7527,10 @@ static __init int hardware_setup(void)
> if (!cpu_has_vmx_flexpriority())
> flexpriority_enabled = 0;
>
> + if (cpu_has_vmx_ept_spp() && enable_ept &&
> + boot_cpu_has(X86_FEATURE_SPP))
> + spp_supported = 1;

Don't cpu_has_vmx_ept_spp() and boot_cpu_has(X86_FEATURE_SPP) test
exactly the same thing?

> if (!cpu_has_virtual_nmis())
> enable_vnmi = 0;
>
> --
> 2.17.2
>

2019-10-04 21:05:35

by Sean Christopherson

Subject: Re: [PATCH v5 2/9] vmx: spp: Add control flags for Sub-Page Protection(SPP)

On Fri, Oct 04, 2019 at 01:48:34PM -0700, Jim Mattson wrote:
> On Tue, Sep 17, 2019 at 1:52 AM Yang Weijiang <[email protected]> wrote:
> > @@ -7521,6 +7527,10 @@ static __init int hardware_setup(void)
> > if (!cpu_has_vmx_flexpriority())
> > flexpriority_enabled = 0;
> >
> > + if (cpu_has_vmx_ept_spp() && enable_ept &&
> > + boot_cpu_has(X86_FEATURE_SPP))
> > + spp_supported = 1;
>
> Don't cpu_has_vmx_ept_spp() and boot_cpu_has(X86_FEATURE_SPP) test
> exactly the same thing?

More or less. I'm about to hit 'send' on a series to eliminate the
synthetic VMX features flags. If that goes through, the X86_FEATURE_SPP
flag can also go away.

2019-10-09 02:16:29

by Yang Weijiang

Subject: Re: [PATCH v5 0/9] Enable Sub-page Write Protection Support

On Tue, Sep 17, 2019 at 04:52:55PM +0800, Yang, Weijiang wrote:
Hi, Paolo,
Could you review this v5 patch set at your convenience?
Thanks a lot!

> EPT-Based Sub-Page write Protection(SPP)is a HW capability which allows
> Virtual Machine Monitor(VMM) to specify write-permission for guest
> physical memory at a sub-page(128 byte) granularity. When this
> capability is enabled, the CPU enforces write-access check for sub-pages
> within a 4KB page.
>
> The feature is targeted to provide fine-grained memory protection for
> usages such as device virtualization, memory check-point and VM
> introspection etc.
>
> SPP is active when the "sub-page write protection" (bit 23) is 1 in
> Secondary VM-Execution Controls. The feature is backed with a Sub-Page
> Permission Table(SPPT), SPPT is referenced via a 64-bit control field
> called Sub-Page Permission Table Pointer (SPPTP) which contains a
> 4K-aligned physical address.
>
> To enable SPP for certain physical page, the gfn should be first mapped
> to a 4KB entry, then set bit 61 of the corresponding EPT leaf entry.
> While HW walks EPT, if bit 61 is set, it traverses SPPT with the guset
> physical address to find out the sub-page permissions at the leaf entry.
> If the corresponding bit is set, write to sub-page is permitted,
> otherwise, SPP induced EPT violation is generated.
>
> This patch serial passed SPP function test and selftest on Ice-Lake platform.
>
> Please refer to the SPP introduction document in this patch set and
> Intel SDM for details:
>
> Intel SDM:
> https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf
>
> SPP selftest patch:
> https://lkml.org/lkml/2019/6/18/1197
>
> Previous patch:
> https://lkml.org/lkml/2019/8/14/97
>
> Patch 1: Introduction to SPP.
> Patch 2: Add SPP related flags and control bits.
> Patch 3: Functions for SPPT setup.
> Patch 4: Add SPP access bitmaps for memslots.
> Patch 5: Introduce SPP {init,set,get} functions
> Patch 6: Implement User space access IOCTLs.
> Patch 7: Set up SPP paging table at vm-entry/exit.
> Patch 8: Enable lazy mode SPPT setup.
> Patch 9: Handle SPP protected pages when VM memory changes
>
>
> Change logs:
>
> V5 -> V4:
> 1. Enable SPP support for Hugepage(1GB/2MB) to extend application.
> 2. Make SPP miss vm-exit handler as the unified place to set up SPPT.
> 3. If SPP protected pages are access-tracked or dirty-page-tracked,
> store SPP flag in reserved address bit, restore it in
> fast_page_fault() handler.
> 4. Move SPP specific functions to vmx/spp.c and vmx/spp.h
> 5. Rebased code to kernel v5.3
> 6. Other change suggested by KVM community.
>
> V3 -> V4:
> 1. Modified documentation to make it consistent with patches.
> 2. Allocated SPPT root page in init_spp() instead of vmx_set_cr3() to
> avoid SPPT miss error.
> 3. Added back co-developers and sign-offs.
>
> V2 -> V3:
> 1. Rebased patches to kernel 5.1 release
> 2. Deferred SPPT setup to EPT fault handler if the page is not
> available while set_subpage() is being called.
> 3. Added init IOCTL to reduce extra cost if SPP is not used.
> 4. Refactored patch structure, cleaned up cross referenced functions.
> 5. Added code to deal with memory swapping/migration/shrinker cases.
>
> V2 -> V1:
> 1. Rebased to 4.20-rc1
> 2. Move VMCS change to a separated patch.
> 3. Code refine and Bug fix
>
>
> Yang Weijiang (9):
> Documentation: Introduce EPT based Subpage Protection
> vmx: spp: Add control flags for Sub-Page Protection(SPP)
> mmu: spp: Add SPP Table setup functions
> mmu: spp: Add functions to create/destroy SPP bitmap block
> mmu: spp: Introduce SPP {init,set,get} functions
> x86: spp: Introduce user-space SPP IOCTLs
> vmx: spp: Set up SPP paging table at vm-entry/exit
> mmu: spp: Enable Lazy mode SPP protection
> mmu: spp: Handle SPP protected pages when VM memory changes
>
> Documentation/virtual/kvm/spp_kvm.txt | 178 +++++++
> arch/x86/include/asm/cpufeatures.h | 1 +
> arch/x86/include/asm/kvm_host.h | 10 +-
> arch/x86/include/asm/vmx.h | 10 +
> arch/x86/include/uapi/asm/vmx.h | 2 +
> arch/x86/kernel/cpu/intel.c | 4 +
> arch/x86/kvm/mmu.c | 78 ++-
> arch/x86/kvm/mmu.h | 2 +
> arch/x86/kvm/vmx/capabilities.h | 5 +
> arch/x86/kvm/vmx/spp.c | 651 ++++++++++++++++++++++++++
> arch/x86/kvm/vmx/spp.h | 27 ++
> arch/x86/kvm/vmx/vmx.c | 99 ++++
> arch/x86/kvm/x86.c | 51 ++
> include/uapi/linux/kvm.h | 17 +
> 14 files changed, 1133 insertions(+), 2 deletions(-)
> create mode 100644 Documentation/virtual/kvm/spp_kvm.txt
> create mode 100644 arch/x86/kvm/vmx/spp.c
> create mode 100644 arch/x86/kvm/vmx/spp.h
>
> --
> 2.17.2

2019-10-10 21:44:37

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v5 0/9] Enable Sub-page Write Protection Support

On Tue, Sep 17, 2019 at 1:52 AM Yang Weijiang <[email protected]> wrote:
>
> EPT-Based Sub-Page write Protection(SPP)is a HW capability which allows
> Virtual Machine Monitor(VMM) to specify write-permission for guest
> physical memory at a sub-page(128 byte) granularity. When this
> capability is enabled, the CPU enforces write-access check for sub-pages
> within a 4KB page.
>
> The feature is targeted to provide fine-grained memory protection for
> usages such as device virtualization, memory check-point and VM
> introspection etc.
>
> SPP is active when the "sub-page write protection" (bit 23) is 1 in
> Secondary VM-Execution Controls. The feature is backed with a Sub-Page
> Permission Table(SPPT), SPPT is referenced via a 64-bit control field
> called Sub-Page Permission Table Pointer (SPPTP) which contains a
> 4K-aligned physical address.
>
> To enable SPP for certain physical page, the gfn should be first mapped
> to a 4KB entry, then set bit 61 of the corresponding EPT leaf entry.
> While HW walks EPT, if bit 61 is set, it traverses SPPT with the guset
> physical address to find out the sub-page permissions at the leaf entry.
> If the corresponding bit is set, write to sub-page is permitted,
> otherwise, SPP induced EPT violation is generated.

How do you handle sub-page permissions for instructions emulated by kvm?

2019-10-11 07:49:27

by Yang Weijiang

[permalink] [raw]
Subject: Re: [PATCH v5 0/9] Enable Sub-page Write Protection Support

On Thu, Oct 10, 2019 at 02:42:51PM -0700, Jim Mattson wrote:
> On Tue, Sep 17, 2019 at 1:52 AM Yang Weijiang <[email protected]> wrote:
> >
> > EPT-Based Sub-Page write Protection(SPP)is a HW capability which allows
> > Virtual Machine Monitor(VMM) to specify write-permission for guest
> > physical memory at a sub-page(128 byte) granularity. When this
> > capability is enabled, the CPU enforces write-access check for sub-pages
> > within a 4KB page.
> >
> > The feature is targeted to provide fine-grained memory protection for
> > usages such as device virtualization, memory check-point and VM
> > introspection etc.
> >
> > SPP is active when the "sub-page write protection" (bit 23) is 1 in
> > Secondary VM-Execution Controls. The feature is backed with a Sub-Page
> > Permission Table(SPPT), SPPT is referenced via a 64-bit control field
> > called Sub-Page Permission Table Pointer (SPPTP) which contains a
> > 4K-aligned physical address.
> >
> > To enable SPP for certain physical page, the gfn should be first mapped
> > to a 4KB entry, then set bit 61 of the corresponding EPT leaf entry.
> > While HW walks EPT, if bit 61 is set, it traverses SPPT with the guset
> > physical address to find out the sub-page permissions at the leaf entry.
> > If the corresponding bit is set, write to sub-page is permitted,
> > otherwise, SPP induced EPT violation is generated.
>
> How do you handle sub-page permissions for instructions emulated by kvm?
How about checking whether the gpa is SPP-protected and, if it is, injecting
an exception into the guest?

2019-10-11 16:15:37

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v5 0/9] Enable Sub-page Write Protection Support

On Fri, Oct 11, 2019 at 12:48 AM Yang Weijiang <[email protected]> wrote:
>
> On Thu, Oct 10, 2019 at 02:42:51PM -0700, Jim Mattson wrote:
> > On Tue, Sep 17, 2019 at 1:52 AM Yang Weijiang <[email protected]> wrote:
> > >
> > > EPT-Based Sub-Page write Protection(SPP)is a HW capability which allows
> > > Virtual Machine Monitor(VMM) to specify write-permission for guest
> > > physical memory at a sub-page(128 byte) granularity. When this
> > > capability is enabled, the CPU enforces write-access check for sub-pages
> > > within a 4KB page.
> > >
> > > The feature is targeted to provide fine-grained memory protection for
> > > usages such as device virtualization, memory check-point and VM
> > > introspection etc.
> > >
> > > SPP is active when the "sub-page write protection" (bit 23) is 1 in
> > > Secondary VM-Execution Controls. The feature is backed with a Sub-Page
> > > Permission Table(SPPT), SPPT is referenced via a 64-bit control field
> > > called Sub-Page Permission Table Pointer (SPPTP) which contains a
> > > 4K-aligned physical address.
> > >
> > > To enable SPP for certain physical page, the gfn should be first mapped
> > > to a 4KB entry, then set bit 61 of the corresponding EPT leaf entry.
> > > While HW walks EPT, if bit 61 is set, it traverses SPPT with the guset
> > > physical address to find out the sub-page permissions at the leaf entry.
> > > If the corresponding bit is set, write to sub-page is permitted,
> > > otherwise, SPP induced EPT violation is generated.
> >
> > How do you handle sub-page permissions for instructions emulated by kvm?
> How about checking whether the gpa is SPP-protected and, if it is, injecting
> an exception into the guest?
The SPP semantics are well-defined. If a kvm-emulated instruction
tries to write to a sub-page that is write-protected, then an
SPP-induced EPT violation should be synthesized.
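
For illustration only, one possible shape of that check in the emulator's
write path; kvm_spp_write_allowed() and KVM_EXIT_SPP are placeholder names
here, not interfaces defined by this series:

    /*
     * Sketch: before committing an emulated write, test the target gpa
     * against the SPP write-permission state and, on a violation, bail
     * out the same way a hardware-induced SPP EPT violation would,
     * instead of injecting an exception into the guest.
     */
    static int emulator_spp_check_write(struct kvm_vcpu *vcpu, gpa_t gpa)
    {
            if (kvm_spp_write_allowed(vcpu->kvm, gpa))
                    return X86EMUL_CONTINUE;

            vcpu->run->exit_reason = KVM_EXIT_SPP;  /* placeholder exit code */
            return X86EMUL_UNHANDLEABLE;            /* exit to userspace */
    }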

2019-10-11 20:58:51

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v5 1/9] Documentation: Introduce EPT based Subpage Protection

On Tue, Sep 17, 2019 at 1:52 AM Yang Weijiang <[email protected]> wrote:
>
> Co-developed-by: [email protected]
> Signed-off-by: [email protected]
> Signed-off-by: Yang Weijiang <[email protected]>
> ---
> Documentation/virtual/kvm/spp_kvm.txt | 178 ++++++++++++++++++++++++++
> 1 file changed, 178 insertions(+)
> create mode 100644 Documentation/virtual/kvm/spp_kvm.txt
>
> diff --git a/Documentation/virtual/kvm/spp_kvm.txt b/Documentation/virtual/kvm/spp_kvm.txt
> new file mode 100644
> index 000000000000..1bd1c11d0a99
> --- /dev/null
> +++ b/Documentation/virtual/kvm/spp_kvm.txt
> @@ -0,0 +1,178 @@
> +EPT-Based Sub-Page Protection (SPP) for KVM
> +====================================================
> +
> +1.Overview
> + EPT-based Sub-Page Protection(SPP) allows VMM to specify
> + fine-grained(128byte per sub-page) write-protection for guest physical
> + memory. When it's enabled, the CPU enforces write-access permission
> + for the sub-pages within a 4KB page, if corresponding bit is set in
> + permission vector, write to sub-page region is allowed, otherwise,
> + it's prevented with a EPT violation.
> +
> + *Note*: In current implementation, SPP is exclusive with nested flag,
> + if it's on, SPP feature won't work.
> +
> +2.SPP Operation
> + Sub-Page Protection Table (SPPT) is introduced to manage sub-page
> + write-access permission.
> +
> + It is active when:
> + a) nested flag is turned off.
> + b) "sub-page write protection" VM-execution control is 1.
> + c) SPP is initialized with KVM_INIT_SPP ioctl.
> + d) Sub-page permissions are set with KVM_SUBPAGES_SET_ACCESS ioctl.
> + see below sections for details.
> +
> + __________________________________________________________________________
> +
> + How SPP hardware works:
> + __________________________________________________________________________
> +
> + Guest write access --> GPA --> Walk EPT --> EPT leaf entry -----|
> + |---------------------------------------------------------------|
> + |-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61)
> + |
> + |-> <false> --> EPT legacy behavior
> + |
> + |
> + |-> <true> --> if ept_leaf_entry.writable
> + |
> + |-> <true> --> Ignore SPP
> + |
> + |-> <false> --> GPA --> Walk SPP 4-level table--|
> + |
> + |------------<----------get-the-SPPT-point-from-VMCS-filed-----<------|
/filed/field/
> + |
> + Walk SPP L4E table
> + |
> + |---> if-entry-misconfiguration ------------>-------|-------<---------|
> + | | |
> + else | |
> + | | |
> + | |------------------SPP VMexit<-----------------| |
> + | | |
> + | |-> exit_qualification & sppt_misconfig --> sppt misconfig |
> + | | |
> + | |-> exit_qualification & sppt_miss --> sppt miss |
> + |---| |
> + | |
> + walk SPPT L3E--|--> if-entry-misconfiguration------------>------------|
> + | |
> + else |
> + | |
> + | |
> + walk SPPT L2E --|--> if-entry-misconfiguration-------->-------|
> + | |
> + else |
> + | |
> + | |
> + walk SPPT L1E --|-> if-entry-misconfiguration--->----|
> + |
> + else
> + |
> + |-> if sub-page writable
> + |-> <true> allow, write access
> + |-> <false> disallow, EPT violation
> + ______________________________________________________________________________
> +
> +3.IOCTL Interfaces
> +
> + KVM_INIT_SPP:
> + Allocate storage for sub-page permission vectors and SPPT root page.
> +
> + KVM_SUBPAGES_GET_ACCESS:
> + Get sub-page write permission vectors for given continuous guest pages.
/continuous/contiguous/
> +
> + KVM_SUBPAGES_SET_ACCESS
> + Set SPP bit in EPT leaf entries for given continuous guest pages. The
/continuous/contiguous/
> + actual SPPT setup is triggered when SPP miss vm-exit is handled.
> +
> + /* for KVM_SUBPAGES_GET_ACCESS and KVM_SUBPAGES_SET_ACCESS */
> + struct kvm_subpage_info {
> + __u64 gfn; /* the first page gfn of the continuous pages */
/continuous/contiguous/
> + __u64 npages; /* number of 4K pages */
> + __u64 *access_map; /* sub-page write-access bitmap array */
> + };
> +
> + #define KVM_SUBPAGES_GET_ACCESS _IOR(KVMIO, 0x49, __u64)
> + #define KVM_SUBPAGES_SET_ACCESS _IOW(KVMIO, 0x4a, __u64)
> + #define KVM_INIT_SPP _IOW(KVMIO, 0x4b, __u64)

The ioctls should be documented in api.txt.
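
For context, roughly how user space would be expected to drive these ioctls;
the access_map bit layout (bit i => sub-page i writable), the vm_fd/gfn
variables and the error handling below are assumptions for illustration,
not taken from the series:

    /* Hypothetical caller, assuming the struct and ioctl numbers quoted
     * above are exported via <linux/kvm.h>. */
    __u64 access[1] = { 0x3 };              /* sub-pages 0 and 1 writable */
    struct kvm_subpage_info info = {
            .gfn = gfn,                     /* first guest frame to protect */
            .npages = 1,
            .access_map = access,
    };

    if (ioctl(vm_fd, KVM_INIT_SPP, 0) < 0)
            perror("KVM_INIT_SPP");
    if (ioctl(vm_fd, KVM_SUBPAGES_SET_ACCESS, &info) < 0)
            perror("KVM_SUBPAGES_SET_ACCESS");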

> +4.Set Sub-Page Permission
> +
> + * To enable SPP protection, system admin sets sub-page permission via
Why system admin? Can't any kvm user do this?
> + KVM_SUBPAGES_SET_ACCESS ioctl:
> + (1) It first stores the access permissions in bitmap array.
> +
> + (2) Then, if the target 4KB page is mapped as PT_PAGE_TABLE_LEVEL entry in EPT,
/page is/pages are/
> + it sets SPP bit of the corresponding entry to mark sub-page protection.
> + If the 4KB page is mapped as PT_DIRECTORY_LEVEL or PT_PDPE_LEVEL, it
/page is/pages are/
> + zapps the hugepage entry and let following memroy access to trigger EPT
/zapps/zaps/, /entry/entries/, /memroy/memory/
> + page fault, there the gfn is check against SPP permission bitmap and
/page fault/violation/
> + proper level is selected to set up EPT entry.
> +
> +
> + The SPPT paging structure format is as below:
> +
> + Format of the SPPT L4E, L3E, L2E:
> + | Bit | Contents |
> + | :----- | :------------------------------------------------------------------------|
> + | 0 | Valid entry when set; indicates whether the entry is present |
> + | 11:1 | Reserved (0) |
> + | N-1:12 | Physical address of 4KB aligned SPPT LX-1 Table referenced by this entry |
> + | 51:N | Reserved (0) |
> + | 63:52 | Reserved (0) |
> + Note: N is the physical address width supported by the processor. X is the page level
> +
> + Format of the SPPT L1E:
> + | Bit | Contents |
> + | :---- | :---------------------------------------------------------------- |
> + | 0+2i | Write permission for i-th 128 byte sub-page region. |
> + | 1+2i | Reserved (0). |
> + Note: 0<=i<=31
> +
> +5.SPPT-induced VM exit
> +
> + * SPPT miss and misconfiguration induced VM exit
> +
> + A SPPT missing VM exit occurs when walk the SPPT, there is no SPPT
> + misconfiguration but a paging-structure entry is not
> + present in any of L4E/L3E/L2E entries.
> +
> + A SPPT misconfiguration VM exit occurs when reserved bits or unsupported values
> + are set in SPPT entry.
> +
> + *NOTE* SPPT miss and SPPT misconfigurations can occur only due to an
> + attempt to write memory with a guest physical address.

Can you clarify what this means? For instance, setting an A or D bit
in a PTE is an attempt to "write memory with a guest physical
address," but per the SDM, it is not an operation that is eligible for
sub-page write permissions.

> + * SPP permission induced VM exit
> + SPP sub-page permission induced violation is reported as EPT violation
> + thesefore causes VM exit.
/thesefore/therefore/

> +
> +6.SPPT-induced VM exit handling
> +
> + #define EXIT_REASON_SPP 66
> +
> + static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
> + ...
> + [EXIT_REASON_SPP] = handle_spp,
> + ...
> + };
> +
> + New exit qualification for SPPT-induced vmexits.
> +
> + | Bit | Contents |
> + | :---- | :---------------------------------------------------------------- |
> + | 10:0 | Reserved (0). |
> + | 11 | SPPT VM exit type. Set for SPPT Miss, cleared for SPPT Misconfig. |
> + | 12 | NMI unblocking due to IRET |
> + | 63:13 | Reserved (0) |
> +
> + In addition to the exit qualification, guest linear address and guest
> + physical address fields will be reported.
> +
> + * SPPT miss and misconfiguration induced VM exit
> + Set up SPPT entries correctly.
> +
> + * SPP permission induced VM exit
> + This kind of VM exit is left to VMI tool to handle.
> --
> 2.17.2
>
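
As a side note on the SPPT L1E format quoted above (write permission for the
i-th 128-byte region in bit 0+2i), a small sketch of how that bit could be
located for a given guest physical address; these helpers are illustrative
only, not code from the series:

    static inline u32 sppt_l1e_write_bit(u64 gpa)
    {
            u32 i = (gpa & 0xFFF) >> 7;     /* 128-byte sub-page index, 0..31 */

            return 2 * i;                   /* bit 0+2i holds the write permission */
    }

    static inline bool sppt_l1e_write_allowed(u64 spp_l1e, u64 gpa)
    {
            return spp_l1e & (1ULL << sppt_l1e_write_bit(gpa));
    }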

2019-10-15 05:35:03

by Yang Weijiang

[permalink] [raw]
Subject: Re: [PATCH v5 2/9] vmx: spp: Add control flags for Sub-Page Protection(SPP)

On Fri, Oct 04, 2019 at 02:02:22PM -0700, Sean Christopherson wrote:
> On Fri, Oct 04, 2019 at 01:48:34PM -0700, Jim Mattson wrote:
> > On Tue, Sep 17, 2019 at 1:52 AM Yang Weijiang <[email protected]> wrote:
> > > @@ -7521,6 +7527,10 @@ static __init int hardware_setup(void)
> > > if (!cpu_has_vmx_flexpriority())
> > > flexpriority_enabled = 0;
> > >
> > > + if (cpu_has_vmx_ept_spp() && enable_ept &&
> > > + boot_cpu_has(X86_FEATURE_SPP))
> > > + spp_supported = 1;
> >
> > Don't cpu_has_vmx_ept_spp() and boot_cpu_has(X86_FEATURE_SPP) test
> > exactly the same thing?
>
> More or less. I'm about to hit 'send' on a series to eliminate the
> synthetic VMX feature flags. If that goes through, the X86_FEATURE_SPP
> flag can also go away.

Thank you, these two are synonyms. I'll remove one next time.

2019-10-15 11:15:05

by Yang Weijiang

[permalink] [raw]
Subject: Re: [PATCH v5 1/9] Documentation: Introduce EPT based Subpage Protection

On Fri, Oct 11, 2019 at 01:31:08PM -0700, Jim Mattson wrote:
> On Tue, Sep 17, 2019 at 1:52 AM Yang Weijiang <[email protected]> wrote:
> >
> > Co-developed-by: [email protected]
> > Signed-off-by: [email protected]
> > Signed-off-by: Yang Weijiang <[email protected]>
> > ---
> > Documentation/virtual/kvm/spp_kvm.txt | 178 ++++++++++++++++++++++++++
> > 1 file changed, 178 insertions(+)
> > create mode 100644 Documentation/virtual/kvm/spp_kvm.txt
> >
> > diff --git a/Documentation/virtual/kvm/spp_kvm.txt b/Documentation/virtual/kvm/spp_kvm.txt
> > new file mode 100644
> > index 000000000000..1bd1c11d0a99
> > --- /dev/null
> > +++ b/Documentation/virtual/kvm/spp_kvm.txt
> > @@ -0,0 +1,178 @@
> > +EPT-Based Sub-Page Protection (SPP) for KVM
> > +====================================================
> > +
> > +1.Overview
> > + EPT-based Sub-Page Protection(SPP) allows VMM to specify
> > + fine-grained(128byte per sub-page) write-protection for guest physical
> > + memory. When it's enabled, the CPU enforces write-access permission
> > + for the sub-pages within a 4KB page, if corresponding bit is set in
> > + permission vector, write to sub-page region is allowed, otherwise,
> > + it's prevented with a EPT violation.
> > +
> > + *Note*: In current implementation, SPP is exclusive with nested flag,
> > + if it's on, SPP feature won't work.
> > +
> > +2.SPP Operation
> > + Sub-Page Protection Table (SPPT) is introduced to manage sub-page
> > + write-access permission.
> > +
> > + It is active when:
> > + a) nested flag is turned off.
> > + b) "sub-page write protection" VM-execution control is 1.
> > + c) SPP is initialized with KVM_INIT_SPP ioctl.
> > + d) Sub-page permissions are set with KVM_SUBPAGES_SET_ACCESS ioctl.
> > + see below sections for details.
> > +
> > + __________________________________________________________________________
> > +
> > + How SPP hardware works:
> > + __________________________________________________________________________
> > +
> > + Guest write access --> GPA --> Walk EPT --> EPT leaf entry -----|
> > + |---------------------------------------------------------------|
> > + |-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61)
> > + |
> > + |-> <false> --> EPT legacy behavior
> > + |
> > + |
> > + |-> <true> --> if ept_leaf_entry.writable
> > + |
> > + |-> <true> --> Ignore SPP
> > + |
> > + |-> <false> --> GPA --> Walk SPP 4-level table--|
> > + |
> > + |------------<----------get-the-SPPT-point-from-VMCS-filed-----<------|
> /filed/field/
> > + |
> > + Walk SPP L4E table
> > + |
> > + |---> if-entry-misconfiguration ------------>-------|-------<---------|
> > + | | |
> > + else | |
> > + | | |
> > + | |------------------SPP VMexit<-----------------| |
> > + | | |
> > + | |-> exit_qualification & sppt_misconfig --> sppt misconfig |
> > + | | |
> > + | |-> exit_qualification & sppt_miss --> sppt miss |
> > + |---| |
> > + | |
> > + walk SPPT L3E--|--> if-entry-misconfiguration------------>------------|
> > + | |
> > + else |
> > + | |
> > + | |
> > + walk SPPT L2E --|--> if-entry-misconfiguration-------->-------|
> > + | |
> > + else |
> > + | |
> > + | |
> > + walk SPPT L1E --|-> if-entry-misconfiguration--->----|
> > + |
> > + else
> > + |
> > + |-> if sub-page writable
> > + |-> <true> allow, write access
> > + |-> <false> disallow, EPT violation
> > + ______________________________________________________________________________
> > +
> > +3.IOCTL Interfaces
> > +
> > + KVM_INIT_SPP:
> > + Allocate storage for sub-page permission vectors and SPPT root page.
> > +
> > + KVM_SUBPAGES_GET_ACCESS:
> > + Get sub-page write permission vectors for given continuous guest pages.
> /continuous/contiguous/
Thanks for all the corrections.

> > +
> > + KVM_SUBPAGES_SET_ACCESS
> > + Set SPP bit in EPT leaf entries for given continuous guest pages. The
> /continuous/contiguous/
> > + actual SPPT setup is triggered when SPP miss vm-exit is handled.
> > +
> > + /* for KVM_SUBPAGES_GET_ACCESS and KVM_SUBPAGES_SET_ACCESS */
> > + struct kvm_subpage_info {
> > + __u64 gfn; /* the first page gfn of the continuous pages */
> /continuous/contiguous/
> > + __u64 npages; /* number of 4K pages */
> > + __u64 *access_map; /* sub-page write-access bitmap array */
> > + };
> > +
> > + #define KVM_SUBPAGES_GET_ACCESS _IOR(KVMIO, 0x49, __u64)
> > + #define KVM_SUBPAGES_SET_ACCESS _IOW(KVMIO, 0x4a, __u64)
> > + #define KVM_INIT_SPP _IOW(KVMIO, 0x4b, __u64)
>
> The ioctls should be documented in api.txt.
>
Sure, will do it.

> > +4.Set Sub-Page Permission
> > +
> > + * To enable SPP protection, system admin sets sub-page permission via
> Why system admin? Can't any kvm user do this?
Oops, will change it.

> > + KVM_SUBPAGES_SET_ACCESS ioctl:
> > + (1) It first stores the access permissions in bitmap array.
> > +
> > + (2) Then, if the target 4KB page is mapped as PT_PAGE_TABLE_LEVEL entry in EPT,
> /page is/pages are/
> > + it sets SPP bit of the corresponding entry to mark sub-page protection.
> > + If the 4KB page is mapped as PT_DIRECTORY_LEVEL or PT_PDPE_LEVEL, it
> /page is/pages are/
> > + zapps the hugepage entry and let following memroy access to trigger EPT
> /zapps/zaps/, /entry/entries/, /memroy/memory/
> > + page fault, there the gfn is check against SPP permission bitmap and
> /page fault/violation/
> > + proper level is selected to set up EPT entry.
> > +
> > +
> > + The SPPT paging structure format is as below:
> > +
> > + Format of the SPPT L4E, L3E, L2E:
> > + | Bit | Contents |
> > + | :----- | :------------------------------------------------------------------------|
> > + | 0 | Valid entry when set; indicates whether the entry is present |
> > + | 11:1 | Reserved (0) |
> > + | N-1:12 | Physical address of 4KB aligned SPPT LX-1 Table referenced by this entry |
> > + | 51:N | Reserved (0) |
> > + | 63:52 | Reserved (0) |
> > + Note: N is the physical address width supported by the processor. X is the page level
> > +
> > + Format of the SPPT L1E:
> > + | Bit | Contents |
> > + | :---- | :---------------------------------------------------------------- |
> > + | 0+2i | Write permission for i-th 128 byte sub-page region. |
> > + | 1+2i | Reserved (0). |
> > + Note: 0<=i<=31
> > +
> > +5.SPPT-induced VM exit
> > +
> > + * SPPT miss and misconfiguration induced VM exit
> > +
> > + A SPPT missing VM exit occurs when walk the SPPT, there is no SPPT
> > + misconfiguration but a paging-structure entry is not
> > + present in any of L4E/L3E/L2E entries.
> > +
> > + A SPPT misconfiguration VM exit occurs when reserved bits or unsupported values
> > + are set in SPPT entry.
> > +
> > + *NOTE* SPPT miss and SPPT misconfigurations can occur only due to an
> > + attempt to write memory with a guest physical address.
>
> Can you clarify what this means? For instance, setting an A or D bit
> in a PTE is an attempt to "write memory with a guest physical
> address," but per the SDM, it is not an operation that is eligible for
> sub-page write permissions.

Yep, it should be "memory write mapped by EPT leaf entry and guarded by SPP". Thanks!
>
> > + * SPP permission induced VM exit
> > + SPP sub-page permission induced violation is reported as EPT violation
> > + thesefore causes VM exit.
> /thesefore/therefore/
>
> > +
> > +6.SPPT-induced VM exit handling
> > +
> > + #define EXIT_REASON_SPP 66
> > +
> > + static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
> > + ...
> > + [EXIT_REASON_SPP] = handle_spp,
> > + ...
> > + };
> > +
> > + New exit qualification for SPPT-induced vmexits.
> > +
> > + | Bit | Contents |
> > + | :---- | :---------------------------------------------------------------- |
> > + | 10:0 | Reserved (0). |
> > + | 11 | SPPT VM exit type. Set for SPPT Miss, cleared for SPPT Misconfig. |
> > + | 12 | NMI unblocking due to IRET |
> > + | 63:13 | Reserved (0) |
> > +
> > + In addition to the exit qualification, guest linear address and guest
> > + physical address fields will be reported.
> > +
> > + * SPPT miss and misconfiguration induced VM exit
> > + Set up SPPT entries correctly.
> > +
> > + * SPP permission induced VM exit
> > + This kind of VM exit is left to VMI tool to handle.
> > --
> > 2.17.2
> >

2019-10-22 06:18:43

by Yang Weijiang

[permalink] [raw]
Subject: Re: [PATCH v5 0/9] Enable Sub-page Write Protection Support

On Fri, Oct 11, 2019 at 09:11:54AM -0700, Jim Mattson wrote:
> On Fri, Oct 11, 2019 at 12:48 AM Yang Weijiang <[email protected]> wrote:
> >
> > On Thu, Oct 10, 2019 at 02:42:51PM -0700, Jim Mattson wrote:
> > > On Tue, Sep 17, 2019 at 1:52 AM Yang Weijiang <[email protected]> wrote:
> > > >
> > > > EPT-Based Sub-Page write Protection(SPP)is a HW capability which allows
> > > > Virtual Machine Monitor(VMM) to specify write-permission for guest
> > > > physical memory at a sub-page(128 byte) granularity. When this
> > > > capability is enabled, the CPU enforces write-access check for sub-pages
> > > > within a 4KB page.
> > > >
> > > > The feature is targeted to provide fine-grained memory protection for
> > > > usages such as device virtualization, memory check-point and VM
> > > > introspection etc.
> > > >
> > > > SPP is active when the "sub-page write protection" (bit 23) is 1 in
> > > > Secondary VM-Execution Controls. The feature is backed with a Sub-Page
> > > > Permission Table(SPPT), SPPT is referenced via a 64-bit control field
> > > > called Sub-Page Permission Table Pointer (SPPTP) which contains a
> > > > 4K-aligned physical address.
> > > >
> > > > To enable SPP for certain physical page, the gfn should be first mapped
> > > > to a 4KB entry, then set bit 61 of the corresponding EPT leaf entry.
> > > > While HW walks EPT, if bit 61 is set, it traverses SPPT with the guset
> > > > physical address to find out the sub-page permissions at the leaf entry.
> > > > If the corresponding bit is set, write to sub-page is permitted,
> > > > otherwise, SPP induced EPT violation is generated.
> > >
> > > How do you handle sub-page permissions for instructions emulated by kvm?
> > How about checking whether the gpa is SPP-protected and, if it is, injecting
> > an exception into the guest?
> The SPP semantics are well-defined. If a kvm-emulated instruction
> tries to write to a sub-page that is write-protected, then an
> SPP-induced EPT violation should be synthesized.
Hi, Jim,

Regarding the instructions emulated by KVM, quite a few of them can write
guest memory, such as MOVS, XCHG, INS etc. Checking each destination against
the SPP-protected areas would be tedious if they are handled individually,
and a PIO/MMIO-induced vmexit/page fault can also touch an SPP-protected
page, e.g., when a string instruction's destination is SPP-protected memory.
Is there a good way to intercept these writes?
emulate_ops.write_emulated() is called in most of the emulation cases to
check and write guest memory, but I'm not sure it's suitable.
Do you have any suggestion?

Thanks!
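
For reference, a minimal sketch of the per-sub-page range check a
write_emulated()-style hook would need for multi-byte or string writes;
spp_get_access_bitmap() and the one-bit-per-sub-page encoding are assumed
here for illustration and are not what the series implements:

    /*
     * Sketch: walk an emulated write [gpa, gpa + len) in 128-byte steps
     * and test each sub-page against a per-gfn write-permission bitmap.
     */
    static bool spp_range_writable(struct kvm *kvm, gpa_t gpa, unsigned int len)
    {
            gpa_t cur = gpa & ~127ull;
            gpa_t end = gpa + len - 1;

            for (; cur <= end; cur += 128) {
                    u32 map = spp_get_access_bitmap(kvm, gpa_to_gfn(cur));
                    u32 subpage = (cur & 0xFFF) >> 7;   /* index within the 4KB page */

                    if (!(map & (1u << subpage)))
                            return false;
            }
            return true;
    }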