Here is a patch series adding EPT-Based Sub-page Write Protection support.
Introduction:
EPT-Based Sub-page Write Protection, referred to as SPP, is a capability that
allows Virtual Machine Monitors (VMMs) to specify write permissions for guest
physical memory at a sub-page (128-byte) granularity. When this capability is
utilized, the CPU enforces write-access permissions for sub-page regions of 4K
pages as specified by the VMM. EPT-based sub-page permissions are intended to
enable fine-grained memory write enforcement by a VMM for security (guest OS
monitoring) and for usages such as device virtualization and memory checkpointing.
SPPT is active when the "sub-page write protection" VM-execution control is 1.
SPPT looks up the guest physical addresses to derive a 64 bit "sub-page
permission" value containing sub-page write permissions. The lookup from
guest-physical addresses to the sub-page region permissions is determined by a
set of SPPT paging structures.
When the "sub-page write protection" VM-execution control is 1, the SPPT is used
to lookup write permission bits for the 128 byte sub-page regions containing in
the 4KB guest physical page. EPT specifies the 4KB page level privileges that
software is allowed when accessing the guest physical address, whereas SPPT
defines the write permissions for software at the 128 byte granularity regions
within a 4KB page. Write accesses prevented due to sub-page permissions looked
up via SPPT are reported as EPT violation VM exits. Similar to EPT, a logical
processor uses SPPT to lookup sub-page region write permissions for
guest-physical addresses only when those addresses are used to access memory.
______________________________________________________________________________
How SPP hardware works:
______________________________________________________________________________
Guest write access --> GPA --> Walk EPT --> EPT leaf entry -┐
┌-----------------------------------------------------------┘
└-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61)
|
└-> <false> --> EPT legacy behavior
|
|
└-> <true> --> if ept_leaf_entry.writable
|
└-> <true> --> Ignore SPP
|
└-> <false> --> GPA --> Walk SPP 4-level table--┐
|
┌------------<----------get-the-SPPT-pointer-from-VMCS-field----<------┘
|
Walk SPP L4E table
|
└┐--> entry misconfiguration ------------>----------┐<----------------┐
| | |
else | |
| | |
| ┌------------------SPP VMexit<-----------------┘ |
| | |
| └-> exit_qualification & sppt_misconfig --> sppt misconfig |
| | |
| └-> exit_qualification & sppt_miss --> sppt miss |
└--┐ |
| |
walk SPPT L3E--┐--> if-entry-misconfiguration------------>------------┘
| |
else |
| |
| |
walk SPPT L2E --┐--> if-entry-misconfiguration-------->-------┘
| |
else |
| |
| |
walk SPPT L1E --┐-> if-entry-misconfiguration--->----┘
|
else
|
└-> if sub-page writable
└-> <true> allow, write access
└-> <false> disallow, EPT violation
Patch description:
Patch 1: The design Doc of EPT-Based Sub-page Write Protection(SPP)
Patch 2: This patch adds reporting of the SPP capability from the VMX Procbased
MSR; according to the hardware spec, bit 23 is the control bit of the SPP
capability.
Patch 3: Add the new secondary processor-based VM-execution control bit which
is defined as "sub-page write permission"; as in the VMX Procbased MSR, bit 23
is the enable bit of SPP. We also introduce a module parameter "enable_ept_spp";
SPP is now active when "Sub-page Write Protection" in the Secondary VM-Execution
Controls is set and the module parameter is enabled with "spp=1".
Patch 4: Introduce the SPPTP and the SPP page table.
The sub-page permission table is referenced via a 64-bit control field called
the Sub-Page Permission Table Pointer (SPPTP), which contains a 4K-aligned
physical address. The index and encoding for this VMCS field is defined as
0x2030 at this time. The format of the SPPTP is shown in the figure below:
---------------------------------------------------------------|
| Bit | Contents |
:--------------------------------------------------------------|
| 11:0 | Reserved (0) |
| N-1:12 | Physical address of 4KB aligned SPPT L4E Table |
| 51:N | Reserved (0) |
| 63:52 | Reserved (0) |
---------------------------------------------------------------|
This patch introduces the SPP paging structures, whose root page is created at
KVM MMU page initialization. We also add an MMU page role bit "spp" to
distinguish an SPP page from an EPT page.
Patch 5: Define the SPPTP field in the new VMCS area, then write the SPPTP to the VMCS.
Patch 6: Introduce the SPP-induced VM exit and its handler.
Accesses using guest-physical addresses may cause SPP-induced VM exits due to an
SPPT misconfiguration or an SPPT miss. The basic VM exit reason code reported for
SPP-induced VM exits is 66.
The exit qualification below is also introduced for SPPT-induced VM exits.
| Bit | Contents |
| :---- | :---------------------------------------------------------------- |
| 10:0 | Reserved (0). |
| 11 | SPPT VM exit type. Set for SPPT Miss, cleared for SPPT Misconfig. |
| 12 | NMI unblocking due to IRET |
| 63:13 | Reserved (0) |
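For illustration, a minimal sketch (not part of the patches themselves) of how a
handler can tell the two causes apart from this exit qualification; the mask
mirrors the SPPT_INDUCED_EXIT_TYPE define that patch 6 adds:

/* Sketch: classify an SPP-induced VM exit from its exit qualification.
 * Bit 11 set   -> SPPT Miss (some SPPT entry was not present);
 * bit 11 clear -> SPPT Misconfiguration (an entry held an unsupported value).
 */
#define SPPT_INDUCED_EXIT_TYPE (1 << 11)

static bool spp_exit_is_miss(unsigned long exit_qualification)
{
        return exit_qualification & SPPT_INDUCED_EXIT_TYPE;
}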
Patch 7: Add handling of the EPT sub-page write protection fault.
A control bit in EPT leaf paging-structure entries is defined as Sub-Page
Permission (SPP bit). The bit position is 61; it is chosen from among the bits
that are currently ignored by the processor and available to software.
When hardware walks the SPP page table, if the sub-page region write
permission bit is set, the write is allowed; otherwise the write is disallowed
and results in an EPT violation.
We need to detect this case in the EPT violation handler, trigger a user-space
exit, and return the write-protected address (GVA) to user space (QEMU).
Patch 8: Introduce ioctls to set/get Sub-Page Write Protection.
We introduce two ioctls to let a user application set/get the sub-page write
protection bitmap per gfn; each gfn corresponds to one bitmap.
The user application (QEMU, or some other security control daemon) will set the
protection bitmap via this ioctl.
The API is defined as:
struct kvm_subpage {
__u64 base_gfn;
__u64 npages;
/* sub-page write-access bitmap array */
__u32 access_map[SUBPAGE_MAX_BITMAP];
}sp;
kvm_vm_ioctl(s, KVM_SUBPAGES_SET_ACCESS, &sp)
kvm_vm_ioctl(s, KVM_SUBPAGES_GET_ACCESS, &sp)
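As an illustration only (no userspace code is part of this series), a hedged
sketch of driving the set ioctl from an application; it assumes vm_fd is a KVM
VM file descriptor and that struct kvm_subpage and the ioctl numbers come from
the updated <linux/kvm.h>:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Write-protect only the first 128-byte sub-page of one guest page.
 * Bit i of access_map[0] set means sub-page i remains writable, so
 * clearing bit 0 write-protects bytes 0..127 of the page.
 */
static int protect_first_subpage(int vm_fd, __u64 gfn)
{
        struct kvm_subpage sp;

        memset(&sp, 0, sizeof(sp));
        sp.base_gfn = gfn;
        sp.npages = 1;
        sp.access_map[0] = 0xfffffffe;

        return ioctl(vm_fd, KVM_SUBPAGES_SET_ACCESS, &sp);
}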
Patch 9 ~ Patch 11: Set up the SPP page table and update the EPT leaf entry
indicated by the SPP enable bit. If the sub-page write permission VM-execution
control is set, treatment of write accesses to guest-physical addresses depends
on the state of the accumulated write-access bit (position 1) and the sub-page
permission bit (position 61) in the EPT leaf paging-structure entry.
Software updates the EPT leaf entry sub-page permission bit in
kvm_set_subpage (patch 7). If the EPT write-access bit is 0 and the SPP bit
is 1 in the leaf EPT paging-structure entry that maps a 4KB page, the
hardware looks up a VMM-managed Sub-Page Permission Table (SPPT), which is
also prepared by kvm_set_subpage (patch 8).
The hardware uses the guest-physical address and bits 11:7 of the address
accessed to look up the SPPT and fetch a write permission bit for the 128-byte
wide sub-page region being accessed within the 4K guest-physical page. If the
sub-page region write permission bit is set, the write is allowed; otherwise
the write is disallowed and results in an EPT violation.
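As a small illustrative sketch (not taken from the patches), the check the
hardware performs can be written as: bits 11:7 of the accessed address select
one of the 32 sub-pages, and bit 2*i of the SPPT L1E holds the write permission
for sub-page i:

/* Sketch: emulate the hardware sub-page write-permission check. */
static bool spp_write_allowed(u64 sppt_l1e, u64 gpa)
{
        unsigned int i = (gpa >> 7) & 0x1f;     /* sub-page index, 0..31 */

        return sppt_l1e & (1ULL << (2 * i));
}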
Guest-physical pages mapped via leaf EPT-paging-structures for which the
accumulated write-access bit and the SPP bits are both clear (0) generate EPT
violations on memory write accesses. Guest-physical pages mapped via
EPT-paging-structure for which the accumulated write-access bit is set (1) allow
writes, effectively ignoring the SPP bit on the leaf EPT-paging structure.
Software sets up SPP page table levels 4, 3 and 2 in the same way as the EPT
page structures, and fills level 1 from the 32-bit bitmap per 4K page, so each
4K page can be divided into 32 x 128-byte sub-pages.
The SPP L4E, L3E and L2E format is defined in the figure below.
________________________________________________________________________________
| Bit | Contents |
| :----- | :-------------------------------------------------------------------|
| 0 | Valid entry when set; indicates whether the entry is present |
| 11:1 | Reserved (0) |
| N-1:12 | Physical address of 4K SPPT LX-1 Table referenced by the entry |
| 51:N | Reserved (0) |
| 63:52 | Reserved (0) |
Note: N is the physical address width supported by the processor, X is the page level
The SPP L1E format is defined in the figure below.
____________________________________________________________________________
| Bit | Contents |
| :---- | :---------------------------------------------------------------- |
| 0+2i | Write permission for i-th 128 byte sub-page region. |
| 1+2i | Reserved (0). |
Note: `0<=i<=31`
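The 32-bit per-gfn bitmap passed through the ioctls expands into this L1E
layout by placing bit i of the bitmap at bit 2*i of the entry; a sketch of that
conversion (a later patch in this series adds a format_spp_spte() helper of the
same shape):

/* Sketch: expand a 32-bit sub-page write-access bitmap into the 64-bit
 * SPPT L1E format; the odd bit positions stay reserved (0).
 */
static u64 format_spp_spte(u32 spp_wp_bitmap)
{
        u64 new_spte = 0;
        int i;

        for (i = 0; i < 32; i++)
                if (spp_wp_bitmap & (1u << i))
                        new_spte |= 1ULL << (i * 2);

        return new_spte;
}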
Change log:
V1 -> V2:
1. Rebased to 4.20-rc1
2. Moved the VMCS change to a separate patch.
3. Code refinements and bug fixes.
Zhang Yi (11):
Documentation: Added EPT Subpage Protection Documentation.
  x86/cpufeature: Add Intel Sub-Page Protection to CPU features
KVM: VMX: Added VMX SPP feature flags and VM-Execution Controls.
KVM: VMX: Introduce the SPPTP and SPP page table.
KVM: VMX: Write the SPPTP to VMCS area.
  KVM: VMX: Introduce SPP-induced VM exit and its handler.
  KVM: VMX: Added handling of SPP write protection fault.
KVM: VMX: Introduce ioctls to set/get Sub-Page Write Protection.
KVM: VMX: Update the EPT leaf entry indicated with the SPP enable bit.
  KVM: VMX: Added setup of SPP page structure.
  KVM: VMX: Implement SPP page structure setup on SPP miss.
Documentation/virtual/kvm/spp_design_kvm.txt | 275 ++++++++++++++++++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/kvm_host.h | 19 +-
arch/x86/include/asm/vmx.h | 10 +
arch/x86/include/uapi/asm/vmx.h | 2 +
arch/x86/kernel/cpu/intel.c | 4 +
arch/x86/kvm/mmu.c | 334 ++++++++++++++++++++++++++-
arch/x86/kvm/mmu.h | 1 +
arch/x86/kvm/vmx.c | 105 +++++++++
arch/x86/kvm/x86.c | 124 +++++++++-
include/linux/kvm_host.h | 5 +
include/uapi/linux/kvm.h | 16 ++
12 files changed, 892 insertions(+), 4 deletions(-)
create mode 100644 Documentation/virtual/kvm/spp_design_kvm.txt
--
2.7.4
Signed-off-by: Zhang Yi <[email protected]>
---
Documentation/virtual/kvm/spp_design_kvm.txt | 275 +++++++++++++++++++++++++++
1 file changed, 275 insertions(+)
create mode 100644 Documentation/virtual/kvm/spp_design_kvm.txt
diff --git a/Documentation/virtual/kvm/spp_design_kvm.txt b/Documentation/virtual/kvm/spp_design_kvm.txt
new file mode 100644
index 0000000..8dc4530
--- /dev/null
+++ b/Documentation/virtual/kvm/spp_design_kvm.txt
@@ -0,0 +1,275 @@
+DRAFT: EPT-Based Sub-Page Protection (SPP) Design Doc for KVM
+=============================================================
+
+1. Overview
+
+EPT-based Sub-Page Protection (SPP) is a capability that allows Virtual
+Machine Monitors to specify write-protection for guest physical memory at
+a sub-page (128-byte) granularity. When this capability is utilized, the
+CPU enforces write-access permissions for sub-page regions of 4K pages
+as specified by the VMM.
+
+2. Operation of SPP
+
+Sub-Page Protection Table (SPPT) is introduced to manage sub-page
+write-access.
+
+SPPT is active when the "sub-page write protection" VM-execution control
+is 1. SPPT looks up the guest physical addresses to derive a 64 bit
+"sub-page permission" value containing sub-page write permissions. The
+lookup from guest-physical addresses to the sub-page region permissions
+is determined by a set of SPPT paging structures.
+
+When the "sub-page write protection" VM-execution control is 1, the SPPT
+is used to look up write permission bits for the 128-byte sub-page regions
+contained in the 4KB guest physical page. EPT specifies the 4KB page
+level privileges that software is allowed when accessing the guest
+physical address, whereas SPPT defines the write permissions for software
+at the 128 byte granularity regions within a 4KB page. Write accesses
+prevented due to sub-page permissions looked up via SPPT are reported as
+EPT violation VM exits. Similar to EPT, a logical processor uses SPPT to
+lookup sub-page region write permissions for guest-physical addresses
+only when those addresses are used to access memory.
+______________________________________________________________________________
+
+How SPP hardware works:
+______________________________________________________________________________
+
+Guest write access --> GPA --> Walk EPT --> EPT leaf entry -┐
+┌-----------------------------------------------------------┘
+└-> if VMexec_control.spp && ept_leaf_entry.spp_bit (bit 61)
+ |
+ └-> <false> --> EPT legacy behavior
+ |
+ |
+ └-> <true> --> if ept_leaf_entry.writable
+ |
+ └-> <true> --> Ignore SPP
+ |
+ └-> <false> --> GPA --> Walk SPP 4-level table--┐
+ |
+┌------------<----------get-the-SPPT-pointer-from-VMCS-field----<-----┘
+|
+Walk SPP L4E table
+|
+└┐--> entry misconfiguration ------------>----------┐<----------------┐
+ | | |
+else | |
+ | | |
+ | ┌------------------SPP VMexit<-----------------┘ |
+ | | |
+ | └-> exit_qualification & sppt_misconfig --> sppt misconfig |
+ | | |
+ | └-> exit_qualification & sppt_miss --> sppt miss |
+ └--┐ |
+ | |
+walk SPPT L3E--┐--> if-entry-misconfiguration------------>------------┘
+ | |
+ else |
+ | |
+ | |
+ walk SPPT L2E --┐--> if-entry-misconfiguration-------->-------┘
+ | |
+ else |
+ | |
+ | |
+ walk SPPT L1E --┐-> if-entry-misconfiguration--->----┘
+ |
+ else
+ |
+ └-> if sub-page writable
+ └-> <true> allow, write access
+ └-> <false> disallow, EPT violation
+______________________________________________________________________________
+
+3. Interfaces
+
+* Feature enabling
+
+Add "spp=on" to KVM module parameter to enable SPP feature, default is off.
+
+* Get/Set sub-page write access permission
+
+New KVM ioctl:
+
+`KVM_SUBPAGES_GET_ACCESS`:
+Get the sub-page write-access bitmap corresponding to a given range of contiguous gfns.
+
+`KVM_SUBPAGES_SET_ACCESS`:
+Set the sub-page write-access bitmap corresponding to a given range of contiguous gfns.
+
+```c
+/* for KVM_SUBPAGES_GET_ACCESS and KVM_SUBPAGES_SET_ACCESS */
+struct kvm_subpage_info {
+ __u64 gfn;
+ __u64 npages; /* number of 4K pages */
+ __u64 *access_map; /* sub-page write-access bitmap array */
+};
+
+#define KVM_SUBPAGES_GET_ACCESS _IOR(KVMIO, 0x49, struct kvm_subpage_info)
+#define KVM_SUBPAGES_SET_ACCESS _IOW(KVMIO, 0x4a, struct kvm_subpage_info)
+```
+
+4. SPPT initialization
+
+* SPPT root page allocation
+
+ The SPPT is referenced via a 64-bit control field called "sub-page
+ protection table pointer" (SPPTP, encoding 0x2030), which contains a
+ 4K-aligned physical address.
+
+ Like EPT, the SPPT is a 4-level table. So, as with EPT, when KVM
+ loads the MMU, we allocate a root page for the SPPT L4 table.
+
+* EPT leaf entry SPP bit
+
+ The SPP bit is 0 by default, which disables SPP for the page.
+
+5. Set/Get Sub-Page access bitmap for a batch of guest physical pages
+
+* To utilize the SPP feature, the system admin should set a sub-page write-access
+ bitmap via the SPP KVM ioctl `KVM_SUBPAGES_SET_ACCESS`, which prepares the following:
+
+ (1) Get the corresponding EPT leaf entry via the guest physical address.
+ (2) If it is a 4K page frame, set bit 61 to enable sub-page protection on this page.
+ (3) Set up the SPP page structure; the page structure format is listed below.
+
+ Format of the SPPT L4E, L3E, L2E:
+ | Bit | Contents |
+ | :----- | :------------------------------------------------------------------------|
+ | 0 | Valid entry when set; indicates whether the entry is present |
+ | 11:1 | Reserved (0) |
+ | N-1:12 | Physical address of 4KB aligned SPPT LX-1 Table referenced by this entry |
+ | 51:N | Reserved (0) |
+ | 63:52 | Reserved (0) |
+ Note: N is the physical address width supported by the processor. X is the page level
+
+ Format of the SPPT L1E:
+ | Bit | Contents |
+ | :---- | :---------------------------------------------------------------- |
+ | 0+2i | Write permission for i-th 128 byte sub-page region. |
+ | 1+2i | Reserved (0). |
+ Note: `0<=i<=31`
+
+ (4) Update the sub-page info in the memory slot structure.
+
+* Sub-page write access bitmap setting pseudo-code:
+
+```c
+static int kvm_mmu_set_subpages(struct kvm_vcpu *vcpu,
+                                struct kvm_subpage_info *spp_info)
+{
+        gfn_t *gfns = spp_info->gfns;
+        u64 *access_map = spp_info->access_map;
+
+        sanity_check();
+
+        /* SPP works when the page is unwritable */
+        if (set_ept_leaf_level_unwritable(gfn) == success)
+                if (kvm_mmu_setup_spp_structure(gfn) == success)
+                        set_subpage_slot_info(access_map);
+}
+```
+
+The user can get the sub-page info via the SPP KVM ioctl `KVM_SUBPAGES_GET_ACCESS`,
+which reads it from the memory slot structure corresponding to the specified gpa.
+
+* Sub-page get subpage info pseudo-code:
+
+```c
+static int kvm_mmu_get_subpages(struct kvm_vcpu *vcpu,
+                                struct kvm_subpage_info *spp_info)
+{
+        gfn_t *gfns = spp_info->gfns;
+
+        sanity_check(gfn);
+        spp_info = get_subpage_slot_info(gfn);
+}
+
+```
+
+6. SPPT-induced vmexits
+
+* SPP VM exits
+
+Accesses using guest physical addresses may cause VM exits due to an SPPT
+Misconfiguration or an SPPT Miss.
+
+An SPPT Misconfiguration vmexit occurs when, in the course of translating
+a guest physical address, the logical processor encounters a leaf EPT
+paging-structure entry mapping a 4KB page with SPP enabled, and during
+the SPPT lookup an SPPT paging-structure entry contains an unsupported
+value.
+
+An SPPT Miss vmexit occurs when, during the SPPT lookup, there is no
+SPPT misconfiguration but some level of the SPPT paging-structure
+entries is not present.
+
+NOTE: SPPT misconfigurations and SPPT misses can occur only due to an
+attempt to write memory with a guest physical address.
+
+* EPT violation vmexits due to SPPT
+
+Memory write accesses that are disallowed by the sub-page protection
+permissions specified in the SPPT are reported via EPT violation VM
+exits.
+
+7. SPPT-induced vmexits handling
+
+```c
+#define EXIT_REASON_SPP 66
+
+static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
+ ...
+ [EXIT_REASON_SPP] = handle_spp,
+ ...
+};
+```
+
+New exit qualification for SPPT-induced vmexits.
+
+| Bit | Contents |
+| :---- | :---------------------------------------------------------------- |
+| 10:0 | Reserved (0). |
+| 11 | SPPT VM exit type. Set for SPPT Miss, cleared for SPPT Misconfig. |
+| 12 | NMI unblocking due to IRET |
+| 63:13 | Reserved (0) |
+
+In addition to the exit qualification, Guest Linear Address and Guest
+Physical Address fields will be reported.
+
+* SPPT miss and misconfiguration
+
+Allocate a page for the SPPT entry and set the entry correctly.
+
+
+SPP VMexit handler Pseudo-code:
+```c
+static int handle_spp(struct kvm_vcpu *vcpu)
+{
+        unsigned long exit_qualification;
+
+        exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+        if (exit_qualification & SPP_EXIT_TYPE_BIT) {
+                /* SPPT Miss */
+                /*
+                 * We don't set SPP write access for the corresponding
+                 * GPA; leave it unwritable, so no need to construct an
+                 * SPP table here.
+                 */
+        } else {
+                /* SPPT Misconfig */
+                vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
+                vcpu->run->hw.hardware_exit_reason = EXIT_REASON_SPP;
+        }
+        return 0;
+}
+```
+
+* EPT violation vmexits due to SPPT
+
+When hardware walks the SPP page table, if the sub-page region write
+permission bit is set, the write is allowed; otherwise the write is
+disallowed and results in an EPT violation.
+
+We need to detect this case in the EPT violation handler, trigger a
+user-space exit, and return the write-protected address (GPA) to user
+space (QEMU).
--
2.7.4
The SPPT has a 4-level paging structure similar to that of EPT, except
for the L1E format.
The sub-page permission table is referenced via a 64-bit control field
called the Sub-Page Permission Table Pointer (SPPTP), which contains a
4K-aligned physical address. The index and encoding for this VMCS field
is defined as 0x2030 at this time.
The format of SPPTP is shown in below figure
-------------------------------------------------------------------------
| Bit | Contents |
| | |
:-----------------------------------------------------------------------|
| 11:0 | Reserved (0) |
| N-1:12 | Physical address of 4KB aligned SPPT L4E Table |
| 51:N | Reserved (0) |
| 63:52 | Reserved (0) |
------------------------------------------------------------------------|
Note: N is the physical address width supported by the processor.
This patch introduces the SPP paging structures, whose root page is
created at KVM MMU page initialization and freed when the MMU pages are
freed.
As with the EPT page table, we initialize the SPPT and write the SPPT
pointer into the VMCS field.
We also add an MMU page role bit "spp" to distinguish an SPP page from
an EPT page.
Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: He Chen <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 4 +++-
arch/x86/kvm/mmu.c | 33 ++++++++++++++++++++++++++++++++-
2 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55e51ff..46312b9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -270,7 +270,8 @@ union kvm_mmu_page_role {
unsigned smap_andnot_wp:1;
unsigned ad_disabled:1;
unsigned guest_mode:1;
- unsigned :6;
+ unsigned spp:1;
+ unsigned reserved:5;
/*
* This is left at the top of the word so that
@@ -397,6 +398,7 @@ struct kvm_mmu {
void (*update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
u64 *spte, const void *pte);
hpa_t root_hpa;
+ hpa_t sppt_root;
union kvm_mmu_role mmu_role;
u8 root_level;
u8 shadow_root_level;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cf5f572..d1f1fe1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2366,6 +2366,29 @@ static void clear_sp_write_flooding_count(u64 *spte)
__clear_sp_write_flooding_count(sp);
}
+static struct kvm_mmu_page *kvm_mmu_get_spp_page(struct kvm_vcpu *vcpu,
+ gfn_t gfn,
+ unsigned int level)
+
+{
+ struct kvm_mmu_page *sp;
+ union kvm_mmu_page_role role;
+
+ role = vcpu->arch.mmu->mmu_role.base;
+ role.level = level;
+ role.direct = true;
+ role.spp = true;
+
+ sp = kvm_mmu_alloc_page(vcpu, true);
+ sp->gfn = gfn;
+ sp->role = role;
+ hlist_add_head(&sp->hash_link,
+ &vcpu->kvm->arch.mmu_page_hash
+ [kvm_page_table_hashfn(gfn)]);
+ clear_page(sp->spt);
+ return sp;
+}
+
static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
gfn_t gfn,
gva_t gaddr,
@@ -3509,6 +3532,9 @@ void kvm_mmu_free_roots(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
(mmu->root_level >= PT64_ROOT_4LEVEL || mmu->direct_map)) {
mmu_free_root_page(vcpu->kvm, &mmu->root_hpa,
&invalid_list);
+ mmu_free_root_page(vcpu->kvm, &mmu->sppt_root,
+ &invalid_list);
+
} else {
for (i = 0; i < 4; ++i)
if (mmu->pae_root[i] != 0)
@@ -3538,7 +3564,7 @@ static int mmu_check_root(struct kvm_vcpu *vcpu, gfn_t root_gfn)
static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
{
- struct kvm_mmu_page *sp;
+ struct kvm_mmu_page *sp, *spp_sp;
unsigned i;
if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
@@ -3549,9 +3575,13 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
}
sp = kvm_mmu_get_page(vcpu, 0, 0,
vcpu->arch.mmu->shadow_root_level, 1, ACC_ALL);
+ spp_sp = kvm_mmu_get_spp_page(vcpu, 0,
+ vcpu->arch.mmu->shadow_root_level);
++sp->root_count;
+ ++spp_sp->root_count;
spin_unlock(&vcpu->kvm->mmu_lock);
vcpu->arch.mmu->root_hpa = __pa(sp->spt);
+ vcpu->arch.mmu->sppt_root = __pa(spp_sp->spt);
} else if (vcpu->arch.mmu->shadow_root_level == PT32E_ROOT_LEVEL) {
for (i = 0; i < 4; ++i) {
hpa_t root = vcpu->arch.mmu->pae_root[i];
@@ -4986,6 +5016,7 @@ void kvm_init_mmu(struct kvm_vcpu *vcpu, bool reset_roots)
uint i;
vcpu->arch.mmu->root_hpa = INVALID_PAGE;
+ vcpu->arch.mmu->sppt_root = INVALID_PAGE;
for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
vcpu->arch.mmu->prev_roots[i] = KVM_MMU_ROOT_INFO_INVALID;
--
2.7.4
As with the EPT page table, we initialize the SPPT and write the SPPT
pointer into the VMCS field.
Signed-off-by: Zhang Yi <[email protected]>
---
arch/x86/include/asm/vmx.h | 2 ++
arch/x86/kvm/vmx.c | 17 +++++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 2aa088f..bd4ec8a 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -217,6 +217,8 @@ enum vmcs_field {
XSS_EXIT_BITMAP_HIGH = 0x0000202D,
ENCLS_EXITING_BITMAP = 0x0000202E,
ENCLS_EXITING_BITMAP_HIGH = 0x0000202F,
+ SPPT_POINTER = 0x00002030,
+ SPPT_POINTER_HIGH = 0x00002031,
TSC_MULTIPLIER = 0x00002032,
TSC_MULTIPLIER_HIGH = 0x00002033,
GUEST_PHYSICAL_ADDRESS = 0x00002400,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f76d3fb..e96b4c7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -600,6 +600,7 @@ struct __packed vmcs12 {
u16 host_gs_selector;
u16 host_tr_selector;
u16 guest_pml_index;
+ u64 sppt_pointer;
};
/*
@@ -1158,6 +1159,7 @@ static const unsigned short vmcs_field_to_offset_table[] = {
FIELD64(VMREAD_BITMAP, vmread_bitmap),
FIELD64(VMWRITE_BITMAP, vmwrite_bitmap),
FIELD64(XSS_EXIT_BITMAP, xss_exit_bitmap),
+ FIELD64(SPPT_POINTER, sppt_pointer),
FIELD64(GUEST_PHYSICAL_ADDRESS, guest_physical_address),
FIELD64(VMCS_LINK_POINTER, vmcs_link_pointer),
FIELD64(GUEST_IA32_DEBUGCTL, guest_ia32_debugctl),
@@ -5348,11 +5350,17 @@ static u64 construct_eptp(struct kvm_vcpu *vcpu, unsigned long root_hpa)
return eptp;
}
+static inline u64 construct_spptp(unsigned long root_hpa)
+{
+ return root_hpa & PAGE_MASK;
+}
+
static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
{
struct kvm *kvm = vcpu->kvm;
unsigned long guest_cr3;
u64 eptp;
+ u64 spptp;
guest_cr3 = cr3;
if (enable_ept) {
@@ -5375,6 +5383,13 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
ept_load_pdptrs(vcpu);
}
+ if (vcpu->arch.mmu->sppt_root != INVALID_PAGE &&
+ enable_ept_spp) {
+ spptp = construct_spptp(vcpu->arch.mmu->sppt_root);
+ vmcs_write64(SPPT_POINTER, spptp);
+ vmx_flush_tlb(vcpu, true);
+ }
+
vmcs_writel(GUEST_CR3, guest_cr3);
}
@@ -10505,6 +10520,8 @@ static void dump_vmcs(void)
pr_err("PostedIntrVec = 0x%02x\n", vmcs_read16(POSTED_INTR_NV));
if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_EPT))
pr_err("EPT pointer = 0x%016llx\n", vmcs_read64(EPT_POINTER));
+ if ((secondary_exec_control & SECONDARY_EXEC_ENABLE_SPP))
+ pr_err("SPPT pointer = 0x%016llx\n", vmcs_read64(SPPT_POINTER));
n = vmcs_read32(CR3_TARGET_COUNT);
for (i = 0; i + 1 < n; i += 4)
pr_err("CR3 target%u=%016lx target%u=%016lx\n",
--
2.7.4
Accesses using guest-physical addresses may cause SPP-induced VM exits
due to an SPPT misconfiguration or an
SPPT miss. The basic VM exit reason code reported for SPP-induced VM
exits is 66.
An SPPT misconfiguration VM exit occurs when, in the course of
translating a guest-physical address, the logical processor encounters
a leaf EPT paging-structure entry mapping a 4KB page for which the
sub-page write permission control bit is set and during the SPPT lookup
an SPPT paging-structure entry contains an unsupported value.
An SPPT miss VM exit occurs when, in the course of translating a
guest-physical address, the logical processor encounters a leaf
EPT paging-structure entry for which the sub-page write permission
control bit is set, and during the SPPT lookup there is no SPPT
misconfiguration but some level of the SPPT paging-structure entries
is not present.
SPPT misconfigurations and SPPT misses can occur only due to an attempt
to write memory with a guest-physical address.
Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: He Chen <[email protected]>
---
arch/x86/include/asm/vmx.h | 7 +++++++
arch/x86/include/uapi/asm/vmx.h | 2 ++
arch/x86/kvm/vmx.c | 45 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 54 insertions(+)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index bd4ec8a..ee24eb2 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -539,6 +539,13 @@ struct vmx_msr_entry {
#define EPT_VIOLATION_GVA_TRANSLATED (1 << EPT_VIOLATION_GVA_TRANSLATED_BIT)
/*
+ * Exit Qualifications for SPPT-Induced VM Exits
+ */
+#define SPPT_INDUCED_EXIT_TYPE_BIT 11
+#define SPPT_INDUCED_EXIT_TYPE (1 << SPPT_INDUCED_EXIT_TYPE_BIT)
+#define SPPT_INTR_INFO_UNBLOCK_NMI INTR_INFO_UNBLOCK_NMI
+
+/*
* VM-instruction error numbers
*/
enum vm_instruction_error_number {
diff --git a/arch/x86/include/uapi/asm/vmx.h b/arch/x86/include/uapi/asm/vmx.h
index f0b0c90..ac67622 100644
--- a/arch/x86/include/uapi/asm/vmx.h
+++ b/arch/x86/include/uapi/asm/vmx.h
@@ -85,6 +85,7 @@
#define EXIT_REASON_PML_FULL 62
#define EXIT_REASON_XSAVES 63
#define EXIT_REASON_XRSTORS 64
+#define EXIT_REASON_SPP 66
#define VMX_EXIT_REASONS \
{ EXIT_REASON_EXCEPTION_NMI, "EXCEPTION_NMI" }, \
@@ -141,6 +142,7 @@
{ EXIT_REASON_ENCLS, "ENCLS" }, \
{ EXIT_REASON_RDSEED, "RDSEED" }, \
{ EXIT_REASON_PML_FULL, "PML_FULL" }, \
+ { EXIT_REASON_SPP, "SPP" }, \
{ EXIT_REASON_XSAVES, "XSAVES" }, \
{ EXIT_REASON_XRSTORS, "XRSTORS" }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e96b4c7..6634098 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9698,6 +9698,50 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
}
}
+static int handle_spp(struct kvm_vcpu *vcpu)
+{
+ unsigned long exit_qualification;
+
+ exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+
+ /*
+ * SPP VM exit happened while executing iret from NMI,
+ * "blocked by NMI" bit has to be set before next VM entry.
+ * There are errata that may cause this bit to not be set:
+ * AAK134, BY25.
+ */
+ if (!(to_vmx(vcpu)->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
+ (exit_qualification & SPPT_INTR_INFO_UNBLOCK_NMI))
+ vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
+ GUEST_INTR_STATE_NMI);
+
+ pr_debug("SPP: SPP exit_qualification=%lx\n", exit_qualification);
+
+ vcpu->arch.exit_qualification = exit_qualification;
+
+ if (exit_qualification & SPPT_INDUCED_EXIT_TYPE) {
+ /*
+ * SPPT Miss
+ * We don't set SPP write access for the corresponding
+ * GPA, if we haven't setup, we need to construct
+ * SPP table here.
+ */
+ pr_debug("SPP: %s: SPPT Miss!!!\n", __func__);
+ return 1;
+ }
+
+ /*
+ * SPPT Misconfig
+ * This is probably possible that your sppt table
+ * set as a incorrect format
+ */
+ WARN_ON(1);
+ vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
+ vcpu->run->hw.hardware_exit_reason = EXIT_REASON_SPP;
+ pr_alert("SPP: %s: SPPT Misconfiguration!!!\n", __func__);
+ return 0;
+}
+
static int handle_pml_full(struct kvm_vcpu *vcpu)
{
unsigned long exit_qualification;
@@ -9910,6 +9954,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
[EXIT_REASON_INVVPID] = handle_invvpid,
[EXIT_REASON_RDRAND] = handle_invalid_op,
[EXIT_REASON_RDSEED] = handle_invalid_op,
+ [EXIT_REASON_SPP] = handle_spp,
[EXIT_REASON_XSAVES] = handle_xsaves,
[EXIT_REASON_XRSTORS] = handle_xrstors,
[EXIT_REASON_PML_FULL] = handle_pml_full,
--
2.7.4
Add the new secondary processor-based VM-execution control bit which is
defined as "sub-page write permission"; as in the VMX Procbased MSR,
bit 23 is the enable bit of SPP.
Also introduce an enable_ept_spp module parameter to turn SPP on or off;
the default is off while enabling work is in progress.
Now SPP is active when "Sub-page Write Protection" in the Secondary
VM-Execution Controls is set and the module parameter is enabled with
"spp=on".
Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: He Chen <[email protected]>
---
arch/x86/include/asm/vmx.h | 1 +
arch/x86/kvm/vmx.c | 15 +++++++++++++++
2 files changed, 16 insertions(+)
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index ade0f15..2aa088f 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -78,6 +78,7 @@
#define SECONDARY_EXEC_RDSEED_EXITING 0x00010000
#define SECONDARY_EXEC_ENABLE_PML 0x00020000
#define SECONDARY_EXEC_XSAVES 0x00100000
+#define SECONDARY_EXEC_ENABLE_SPP 0x00800000
#define SECONDARY_EXEC_TSC_SCALING 0x02000000
#define PIN_BASED_EXT_INTR_MASK 0x00000001
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4555077..f76d3fb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -92,6 +92,9 @@ module_param_named(unrestricted_guest,
static bool __read_mostly enable_ept_ad_bits = 1;
module_param_named(eptad, enable_ept_ad_bits, bool, S_IRUGO);
+static bool __read_mostly enable_ept_spp;
+module_param_named(spp, enable_ept_spp, bool, S_IRUGO);
+
static bool __read_mostly emulate_invalid_guest_state = true;
module_param(emulate_invalid_guest_state, bool, S_IRUGO);
@@ -1941,6 +1944,11 @@ static inline bool cpu_has_vmx_pml(void)
return vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_ENABLE_PML;
}
+static inline bool cpu_has_vmx_ept_spp(void)
+{
+ return vmcs_config.cpu_based_2nd_exec_ctrl & SECONDARY_EXEC_ENABLE_SPP;
+}
+
static inline bool cpu_has_vmx_tsc_scaling(void)
{
return vmcs_config.cpu_based_2nd_exec_ctrl &
@@ -4583,6 +4591,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
SECONDARY_EXEC_RDSEED_EXITING |
SECONDARY_EXEC_RDRAND_EXITING |
SECONDARY_EXEC_ENABLE_PML |
+ SECONDARY_EXEC_ENABLE_SPP |
SECONDARY_EXEC_TSC_SCALING |
SECONDARY_EXEC_ENABLE_VMFUNC |
SECONDARY_EXEC_ENCLS_EXITING;
@@ -6486,6 +6495,9 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
if (!enable_pml)
exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
+ if (!enable_ept_spp)
+ exec_control &= ~SECONDARY_EXEC_ENABLE_SPP;
+
if (vmx_xsaves_supported()) {
/* Exposing XSAVES only when XSAVE is exposed */
bool xsaves_enabled =
@@ -7927,6 +7939,9 @@ static __init int hardware_setup(void)
if (!cpu_has_vmx_unrestricted_guest() || !enable_ept)
enable_unrestricted_guest = 0;
+ if (!cpu_has_vmx_ept_spp() || !enable_ept)
+ enable_ept_spp = 0;
+
if (!cpu_has_vmx_flexpriority())
flexpriority_enabled = 0;
--
2.7.4
Add reporting of the SPP capability from the VMX Procbased MSR;
according to the hardware spec, bit 23 is the control bit of the SPP
capability.
Define X86_FEATURE_SPP under the Intel x86 VT-x CPU features.
Define X86_VMX_FEATURE_PROC_CTLS2_SPP among the Intel VMX MSR-indicated
features, and detect the SPP capability via this MSR.
Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: He Chen <[email protected]>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/intel.c | 4 ++++
2 files changed, 5 insertions(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 28c4a50..e22567e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -228,6 +228,7 @@
#define X86_FEATURE_FLEXPRIORITY ( 8*32+ 2) /* Intel FlexPriority */
#define X86_FEATURE_EPT ( 8*32+ 3) /* Intel Extended Page Table */
#define X86_FEATURE_VPID ( 8*32+ 4) /* Intel Virtual Processor ID */
+#define X86_FEATURE_SPP ( 8*32+ 5) /* Intel EPT-based Sub-Page Write Protection */
#define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer VMMCALL to VMCALL */
#define X86_FEATURE_XENPV ( 8*32+16) /* "" Xen paravirtual guest */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index fc3c07f..b55156c 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -476,6 +476,7 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
#define X86_VMX_FEATURE_PROC_CTLS2_EPT 0x00000002
#define X86_VMX_FEATURE_PROC_CTLS2_VPID 0x00000020
#define x86_VMX_FEATURE_EPT_CAP_AD 0x00200000
+#define X86_VMX_FEATURE_PROC_CTLS2_SPP 0x00800000
u32 vmx_msr_low, vmx_msr_high, msr_ctl, msr_ctl2;
u32 msr_vpid_cap, msr_ept_cap;
@@ -486,6 +487,7 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_EPT);
clear_cpu_cap(c, X86_FEATURE_VPID);
clear_cpu_cap(c, X86_FEATURE_EPT_AD);
+ clear_cpu_cap(c, X86_FEATURE_SPP);
rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, vmx_msr_low, vmx_msr_high);
msr_ctl = vmx_msr_high | vmx_msr_low;
@@ -509,6 +511,8 @@ static void detect_vmx_virtcap(struct cpuinfo_x86 *c)
}
if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_VPID)
set_cpu_cap(c, X86_FEATURE_VPID);
+ if (msr_ctl2 & X86_VMX_FEATURE_PROC_CTLS2_SPP)
+ set_cpu_cap(c, X86_FEATURE_SPP);
}
}
--
2.7.4
If the sub-page write permission VM-execution control is set,
treatment of write accesses to guest-physical addresses
depends on the state of the accumulated write-access bit (position 1)
and the sub-page permission bit (position 61) in the EPT leaf
paging-structure entry.
Software updates the EPT leaf entry sub-page permission bit in
kvm_set_subpage. If the EPT write-access bit is 0 and the SPP bit
is 1 in the leaf EPT paging-structure entry that maps a 4KB page,
the hardware looks up a VMM-managed Sub-Page Permission Table
(SPPT), which is also prepared by kvm_set_subpage.
Signed-off-by: Zhang Yi <[email protected]>
---
arch/x86/kvm/mmu.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 100 insertions(+)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b1773c6..d512125 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1668,6 +1668,87 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu)
return 0;
}
+static bool __rmap_open_subpage_bit(struct kvm *kvm,
+ struct kvm_rmap_head *rmap_head)
+{
+ struct rmap_iterator iter;
+ bool flush = false;
+ u64 *sptep;
+ u64 spte;
+
+ for_each_rmap_spte(rmap_head, &iter, sptep) {
+ /*
+ * SPP works only when the page is unwritable
+ * and SPP bit is set
+ */
+ flush |= spte_write_protect(sptep, false);
+ spte = *sptep | PT_SPP_MASK;
+ flush |= mmu_spte_update(sptep, spte);
+ }
+
+ return flush;
+}
+
+static int kvm_mmu_open_subpage_write_protect(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
+ gfn_t gfn)
+{
+ struct kvm_rmap_head *rmap_head;
+ bool flush = false;
+
+ /*
+ * we only support spp in a normal 4K level 1 page frame
+ * If it a huge page, we drop it.
+ */
+ rmap_head = __gfn_to_rmap(gfn, PT_PAGE_TABLE_LEVEL, slot);
+
+ if (!rmap_head->val)
+ return -EFAULT;
+
+ flush |= __rmap_open_subpage_bit(kvm, rmap_head);
+
+ if (flush)
+ kvm_flush_remote_tlbs(kvm);
+
+ return 0;
+}
+
+static bool __rmap_clear_subpage_bit(struct kvm *kvm,
+ struct kvm_rmap_head *rmap_head)
+{
+ struct rmap_iterator iter;
+ bool flush = false;
+ u64 *sptep;
+ u64 spte;
+
+ for_each_rmap_spte(rmap_head, &iter, sptep) {
+ spte = (*sptep & ~PT_SPP_MASK) | PT_WRITABLE_MASK;
+ flush |= mmu_spte_update(sptep, spte);
+ }
+
+ return flush;
+}
+
+static int kvm_mmu_clear_subpage_write_protect(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
+ gfn_t gfn)
+{
+ struct kvm_rmap_head *rmap_head;
+ bool flush = false;
+
+ rmap_head = __gfn_to_rmap(gfn, PT_PAGE_TABLE_LEVEL, slot);
+
+ if (!rmap_head->val)
+ return -EFAULT;
+
+ flush |= __rmap_clear_subpage_bit(kvm, rmap_head);
+
+ if (flush)
+ kvm_flush_remote_tlbs(kvm);
+
+ return 0;
+}
+
bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
struct kvm_memory_slot *slot, u64 gfn)
{
@@ -4175,12 +4256,31 @@ int kvm_mmu_set_subpages(struct kvm *kvm, struct kvm_subpage *spp_info)
int npages = spp_info->npages;
struct kvm_memory_slot *slot;
u32 *wp_map;
+ int ret;
int i;
for (i = 0; i < npages; i++, gfn++) {
slot = gfn_to_memslot(kvm, gfn);
if (!slot)
return -EFAULT;
+
+ /*
+ * open SPP bit in EPT leaf entry to write protect the
+ * sub-pages in corresponding page
+ */
+ if (access != (u32)((1ULL << 32) - 1))
+ ret = kvm_mmu_open_subpage_write_protect(kvm,
+ slot, gfn);
+ else
+ ret = kvm_mmu_clear_subpage_write_protect(kvm,
+ slot, gfn);
+
+ if (ret) {
+ pr_info("SPP ,didn't get the gfn:%llx from EPT leaf level1\n"
+ "Current we didn't support huge page on SPP\n"
+ "Please try to disable the huge page\n", gfn);
+ return -EFAULT;
+ }
wp_map = gfn_to_subpage_wp_info(slot, gfn);
*wp_map = access;
}
--
2.7.4
The hardware uses the guest-physical address and bits 11:7 of the
address accessed to lookup the SPPT to fetch a write permission bit for
the 128 byte wide sub-page region being accessed within the 4K
guest-physical page. If the sub-page region write permission bit is set,
the write is allowed; otherwise the write is disallowed and results in
an EPT violation.
Guest-physical pages mapped via leaf EPT-paging-structures for which the
accumulated write-access bit and the SPP bits are both clear (0) generate
EPT violations on memory write accesses. Guest-physical pages mapped via
EPT-paging-structure for which the accumulated write-access bit is set
(1) allow writes, effectively ignoring the SPP bit on the leaf EPT-paging
structure.
Software sets up SPP page table levels 4, 3 and 2 in the same way as the
EPT page structures, and fills level 1 from the 32-bit bitmap per 4K
page, so each 4K page can be divided into 32 x 128-byte sub-pages.
Signed-off-by: Zhang Yi <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 4 ++
arch/x86/kvm/mmu.c | 123 +++++++++++++++++++++++++++++++++++++++-
2 files changed, 125 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3218d91..ce6d258 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1402,6 +1402,10 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u64 error_code,
void *insn, int insn_len);
+
+int kvm_mmu_setup_spp_structure(struct kvm_vcpu *vcpu,
+ u32 access_map, gfn_t gfn);
+
void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3, bool skip_tlb_flush);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d512125..287ee62 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -206,6 +206,11 @@ static const union kvm_mmu_page_role mmu_base_role_mask = {
({ spte = mmu_spte_get_lockless(_walker.sptep); 1; }); \
__shadow_walk_next(&(_walker), spte))
+#define for_each_shadow_spp_entry(_vcpu, _addr, _walker) \
+ for (shadow_spp_walk_init(&(_walker), _vcpu, _addr); \
+ shadow_walk_okay(&(_walker)); \
+ shadow_walk_next(&(_walker)))
+
static struct kmem_cache *pte_list_desc_cache;
static struct kmem_cache *mmu_page_header_cache;
static struct percpu_counter kvm_total_used_mmu_pages;
@@ -476,6 +481,11 @@ static int is_shadow_present_pte(u64 pte)
return (pte != 0) && !is_mmio_spte(pte);
}
+static int is_spp_mide_page_present(u64 pte)
+{
+ return pte & PT_PRESENT_MASK;
+}
+
static int is_large_pte(u64 pte)
{
return pte & PT_PAGE_SIZE_MASK;
@@ -495,6 +505,11 @@ static bool is_executable_pte(u64 spte)
return (spte & (shadow_x_mask | shadow_nx_mask)) == shadow_x_mask;
}
+static bool is_spp_spte(struct kvm_mmu_page *sp)
+{
+ return sp->role.spp;
+}
+
static kvm_pfn_t spte_to_pfn(u64 pte)
{
return (pte & PT64_BASE_ADDR_MASK) >> PAGE_SHIFT;
@@ -2606,6 +2621,16 @@ static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator,
addr);
}
+static void shadow_spp_walk_init(struct kvm_shadow_walk_iterator *iterator,
+ struct kvm_vcpu *vcpu, u64 addr)
+{
+ iterator->addr = addr;
+ iterator->shadow_addr = vcpu->arch.mmu->sppt_root;
+
+ /* SPP Table is a 4-level paging structure */
+ iterator->level = 4;
+}
+
static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
{
if (iterator->level < PT_PAGE_TABLE_LEVEL)
@@ -2656,6 +2681,18 @@ static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
mark_unsync(sptep);
}
+static void link_spp_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+ struct kvm_mmu_page *sp)
+{
+ u64 spte;
+
+ spte = __pa(sp->spt) | PT_PRESENT_MASK;
+
+ mmu_spte_set(sptep, spte);
+
+ mmu_page_add_parent_pte(vcpu, sp, sptep);
+}
+
static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
unsigned direct_access)
{
@@ -2686,7 +2723,13 @@ static bool mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
pte = *spte;
if (is_shadow_present_pte(pte)) {
- if (is_last_spte(pte, sp->role.level)) {
+ if (is_spp_spte(sp)) {
+ if (sp->role.level == PT_PAGE_TABLE_LEVEL)
+ //spp page do not need to release rmap.
+ return true;
+ child = page_header(pte & PT64_BASE_ADDR_MASK);
+ drop_parent_pte(child, spte);
+ } else if (is_last_spte(pte, sp->role.level)) {
drop_spte(kvm, spte);
if (is_large_pte(pte))
--kvm->stat.lpages;
@@ -4231,6 +4274,77 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
return RET_PF_RETRY;
}
+static u64 format_spp_spte(u32 spp_wp_bitmap)
+{
+ u64 new_spte = 0;
+ int i = 0;
+
+ /*
+ * One 4K page contains 32 sub-pages, in SPP table L4E, old bits
+ * are reserved, so we need to transfer u32 subpage write
+ * protect bitmap to u64 SPP L4E format.
+ */
+ while (i < 32) {
+ if (spp_wp_bitmap & (1ULL << i))
+ new_spte |= 1ULL << (i * 2);
+
+ i++;
+ }
+
+ return new_spte;
+}
+
+static void mmu_spp_spte_set(u64 *sptep, u64 new_spte)
+{
+ __set_spte(sptep, new_spte);
+}
+
+int kvm_mmu_setup_spp_structure(struct kvm_vcpu *vcpu,
+ u32 access_map, gfn_t gfn)
+{
+ struct kvm_shadow_walk_iterator iter;
+ struct kvm_mmu_page *sp;
+ gfn_t pseudo_gfn;
+ u64 old_spte, spp_spte;
+ struct kvm *kvm = vcpu->kvm;
+
+ spin_lock(&kvm->mmu_lock);
+
+ /* direct_map spp start */
+
+ if (!VALID_PAGE(vcpu->arch.mmu->sppt_root))
+ goto out_unlock;
+
+ for_each_shadow_spp_entry(vcpu, (u64)gfn << PAGE_SHIFT, iter) {
+ if (iter.level == PT_PAGE_TABLE_LEVEL) {
+ spp_spte = format_spp_spte(access_map);
+ old_spte = mmu_spte_get_lockless(iter.sptep);
+ if (old_spte != spp_spte) {
+ mmu_spp_spte_set(iter.sptep, spp_spte);
+ kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
+ }
+ break;
+ }
+
+ if (!is_spp_mide_page_present(*iter.sptep)) {
+ u64 base_addr = iter.addr;
+
+ base_addr &= PT64_LVL_ADDR_MASK(iter.level);
+ pseudo_gfn = base_addr >> PAGE_SHIFT;
+ sp = kvm_mmu_get_spp_page(vcpu, pseudo_gfn,
+ iter.level - 1);
+ link_spp_shadow_page(vcpu, iter.sptep, sp);
+ }
+ }
+
+ spin_unlock(&kvm->mmu_lock);
+ return 0;
+
+out_unlock:
+ spin_unlock(&kvm->mmu_lock);
+ return -EFAULT;
+}
+
int kvm_mmu_get_subpages(struct kvm *kvm, struct kvm_subpage *spp_info)
{
u32 *access = spp_info->access_map;
@@ -4255,9 +4369,10 @@ int kvm_mmu_set_subpages(struct kvm *kvm, struct kvm_subpage *spp_info)
gfn_t gfn = spp_info->base_gfn;
int npages = spp_info->npages;
struct kvm_memory_slot *slot;
+ struct kvm_vcpu *vcpu;
u32 *wp_map;
int ret;
- int i;
+ int i, j;
for (i = 0; i < npages; i++, gfn++) {
slot = gfn_to_memslot(kvm, gfn);
@@ -4281,6 +4396,10 @@ int kvm_mmu_set_subpages(struct kvm *kvm, struct kvm_subpage *spp_info)
"Please try to disable the huge page\n", gfn);
return -EFAULT;
}
+
+ kvm_for_each_vcpu(j, vcpu, kvm)
+ kvm_mmu_setup_spp_structure(vcpu, access, gfn);
+
wp_map = gfn_to_subpage_wp_info(slot, gfn);
*wp_map = access;
}
--
2.7.4
A control bit in EPT leaf paging-structure entries is defined as
"Sub-Page Permission" (SPP bit). The bit position is 61.
When hardware walks the SPP page table, if the sub-page region write
permission bit is set, the write is allowed; otherwise the write is
disallowed and results in an EPT violation.
We need to detect this case in the EPT violation handler, trigger a
user-space exit, and return the write-protected address (GPA) to user
space (QEMU).
Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: He Chen <[email protected]>
---
arch/x86/kvm/mmu.c | 19 +++++++++++++++++++
arch/x86/kvm/mmu.h | 1 +
include/uapi/linux/kvm.h | 5 +++++
3 files changed, 25 insertions(+)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d1f1fe1..d077693 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3378,6 +3378,21 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
if ((error_code & PFERR_WRITE_MASK) &&
spte_can_locklessly_be_made_writable(spte))
{
+ /*
+ * Record write protect fault caused by
+ * Sub-page Protection
+ */
+ if (spte & PT_SPP_MASK) {
+ fault_handled = true;
+
+ vcpu->run->exit_reason = KVM_EXIT_SPP;
+ vcpu->run->spp.addr = gva;
+ kvm_skip_emulated_instruction(vcpu);
+
+ /* Let QEMU decide how to handle this. */
+ break;
+ }
+
new_spte |= PT_WRITABLE_MASK;
/*
@@ -5343,6 +5358,10 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u64 error_code,
r = vcpu->arch.mmu->page_fault(vcpu, cr2,
lower_32_bits(error_code),
false);
+
+ if (vcpu->run->exit_reason == KVM_EXIT_SPP)
+ return 0;
+
WARN_ON(r == RET_PF_INVALID);
}
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index c7b3331..b41e9e9 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -26,6 +26,7 @@
#define PT_PAGE_SIZE_MASK (1ULL << PT_PAGE_SIZE_SHIFT)
#define PT_PAT_MASK (1ULL << 7)
#define PT_GLOBAL_MASK (1ULL << 8)
+#define PT_SPP_MASK (1ULL << 61)
#define PT64_NX_SHIFT 63
#define PT64_NX_MASK (1ULL << PT64_NX_SHIFT)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2b7a652..01174f8 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -235,6 +235,7 @@ struct kvm_hyperv_exit {
#define KVM_EXIT_S390_STSI 25
#define KVM_EXIT_IOAPIC_EOI 26
#define KVM_EXIT_HYPERV 27
+#define KVM_EXIT_SPP 28
/* For KVM_EXIT_INTERNAL_ERROR */
/* Emulate instruction failed. */
@@ -390,6 +391,10 @@ struct kvm_run {
struct {
__u8 vector;
} eoi;
+ /* KVM_EXIT_SPP */
+ struct {
+ __u64 addr;
+ } spp;
/* KVM_EXIT_HYPERV */
struct kvm_hyperv_exit hyperv;
/* Fix the size of the union. */
--
2.7.4
We introduce two ioctls to let a user application set/get the sub-page
write protection bitmap per gfn; each gfn corresponds to one bitmap.
The user application (QEMU, or some other security control daemon) will
set the protection bitmap via this ioctl.
The API is defined as:
struct kvm_subpage {
__u64 base_gfn;
__u64 npages;
/* sub-page write-access bitmap array */
__u32 access_map[SUBPAGE_MAX_BITMAP];
}sp;
kvm_vm_ioctl(s, KVM_SUBPAGES_SET_ACCESS, &sp)
kvm_vm_ioctl(s, KVM_SUBPAGES_GET_ACCESS, &sp)
Signed-off-by: Zhang Yi <[email protected]>
Signed-off-by: He Chen <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 9 +++
arch/x86/kvm/mmu.c | 49 ++++++++++++++++
arch/x86/kvm/vmx.c | 20 +++++++
arch/x86/kvm/x86.c | 124 +++++++++++++++++++++++++++++++++++++++-
include/linux/kvm_host.h | 5 ++
include/uapi/linux/kvm.h | 11 ++++
6 files changed, 217 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 46312b9..3218d91 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -397,6 +397,8 @@ struct kvm_mmu {
void (*invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa);
void (*update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
u64 *spte, const void *pte);
+ int (*get_subpages)(struct kvm *kvm, struct kvm_subpage *spp_info);
+ int (*set_subpages)(struct kvm *kvm, struct kvm_subpage *spp_info);
hpa_t root_hpa;
hpa_t sppt_root;
union kvm_mmu_role mmu_role;
@@ -784,6 +786,7 @@ struct kvm_lpage_info {
struct kvm_arch_memory_slot {
struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
+ u32 *subpage_wp_info;
struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
unsigned short *gfn_track[KVM_PAGE_TRACK_MAX];
};
@@ -1187,6 +1190,9 @@ struct kvm_x86_ops {
int (*nested_enable_evmcs)(struct kvm_vcpu *vcpu,
uint16_t *vmcs_version);
+
+ int (*get_subpages)(struct kvm *kvm, struct kvm_subpage *spp_info);
+ int (*set_subpages)(struct kvm *kvm, struct kvm_subpage *spp_info);
};
struct kvm_arch_async_pf {
@@ -1400,6 +1406,9 @@ void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3, bool skip_tlb_flush);
+int kvm_mmu_get_subpages(struct kvm *kvm, struct kvm_subpage *spp_info);
+int kvm_mmu_set_subpages(struct kvm *kvm, struct kvm_subpage *spp_info);
+
void kvm_enable_tdp(void);
void kvm_disable_tdp(void);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d077693..b1773c6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1430,6 +1430,15 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
return sptep;
}
+static u32 *gfn_to_subpage_wp_info(struct kvm_memory_slot *slot,
+ gfn_t gfn)
+{
+ unsigned long idx;
+
+ idx = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);
+ return &slot->arch.subpage_wp_info[idx];
+}
+
#define for_each_rmap_spte(_rmap_head_, _iter_, _spte_) \
for (_spte_ = rmap_get_first(_rmap_head_, _iter_); \
_spte_; _spte_ = rmap_get_next(_iter_))
@@ -4141,6 +4150,44 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
return RET_PF_RETRY;
}
+int kvm_mmu_get_subpages(struct kvm *kvm, struct kvm_subpage *spp_info)
+{
+ u32 *access = spp_info->access_map;
+ gfn_t gfn = spp_info->base_gfn;
+ int npages = spp_info->npages;
+ struct kvm_memory_slot *slot;
+ int i;
+
+ for (i = 0; i < npages; i++, gfn++) {
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!slot)
+ return -EFAULT;
+ access[i] = *gfn_to_subpage_wp_info(slot, gfn);
+ }
+
+ return i;
+}
+
+int kvm_mmu_set_subpages(struct kvm *kvm, struct kvm_subpage *spp_info)
+{
+ u32 access = spp_info->access_map[0];
+ gfn_t gfn = spp_info->base_gfn;
+ int npages = spp_info->npages;
+ struct kvm_memory_slot *slot;
+ u32 *wp_map;
+ int i;
+
+ for (i = 0; i < npages; i++, gfn++) {
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!slot)
+ return -EFAULT;
+ wp_map = gfn_to_subpage_wp_info(slot, gfn);
+ *wp_map = access;
+ }
+
+ return i;
+}
+
static void nonpaging_init_context(struct kvm_vcpu *vcpu,
struct kvm_mmu *context)
{
@@ -4835,6 +4882,8 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
context->get_cr3 = get_cr3;
context->get_pdptr = kvm_pdptr_read;
context->inject_page_fault = kvm_inject_page_fault;
+ context->get_subpages = kvm_x86_ops->get_subpages;
+ context->set_subpages = kvm_x86_ops->set_subpages;
if (!is_paging(vcpu)) {
context->nx = false;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6634098..b660812 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8028,6 +8028,11 @@ static __init int hardware_setup(void)
kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
}
+ if (!enable_ept_spp) {
+ kvm_x86_ops->get_subpages = NULL;
+ kvm_x86_ops->set_subpages = NULL;
+ }
+
if (!cpu_has_vmx_preemption_timer())
kvm_x86_ops->request_immediate_exit = __kvm_request_immediate_exit;
@@ -15037,6 +15042,18 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
return 0;
}
+static int vmx_get_subpages(struct kvm *kvm,
+ struct kvm_subpage *spp_info)
+{
+ return kvm_get_subpages(kvm, spp_info);
+}
+
+static int vmx_set_subpages(struct kvm *kvm,
+ struct kvm_subpage *spp_info)
+{
+ return kvm_set_subpages(kvm, spp_info);
+}
+
static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
@@ -15184,6 +15201,9 @@ static struct kvm_x86_ops vmx_x86_ops __ro_after_init = {
.enable_smi_window = enable_smi_window,
.nested_enable_evmcs = nested_enable_evmcs,
+
+ .get_subpages = vmx_get_subpages,
+ .set_subpages = vmx_set_subpages,
};
static void vmx_cleanup_l1d_flush(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5cd5647..fa36858 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4507,6 +4507,44 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
return r;
}
+static int kvm_vm_ioctl_get_subpages(struct kvm *kvm,
+ struct kvm_subpage *spp_info)
+{
+ return kvm_arch_get_subpages(kvm, spp_info);
+}
+
+static int kvm_vm_ioctl_set_subpages(struct kvm *kvm,
+ struct kvm_subpage *spp_info)
+{
+ return kvm_arch_set_subpages(kvm, spp_info);
+}
+
+int kvm_get_subpages(struct kvm *kvm,
+ struct kvm_subpage *spp_info)
+{
+ int ret;
+
+ mutex_lock(&kvm->slots_lock);
+ ret = kvm_mmu_get_subpages(kvm, spp_info);
+ mutex_unlock(&kvm->slots_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_get_subpages);
+
+int kvm_set_subpages(struct kvm *kvm,
+ struct kvm_subpage *spp_info)
+{
+ int ret;
+
+ mutex_lock(&kvm->slots_lock);
+ ret = kvm_mmu_set_subpages(kvm, spp_info);
+ mutex_unlock(&kvm->slots_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_set_subpages);
+
long kvm_arch_vm_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -4811,6 +4849,39 @@ long kvm_arch_vm_ioctl(struct file *filp,
if (copy_from_user(&hvevfd, argp, sizeof(hvevfd)))
goto out;
r = kvm_vm_ioctl_hv_eventfd(kvm, &hvevfd);
+ }
+ case KVM_SUBPAGES_GET_ACCESS: {
+ struct kvm_subpage spp_info;
+
+ r = -EFAULT;
+ if (copy_from_user(&spp_info, argp, sizeof(spp_info)))
+ goto out;
+
+ r = -EINVAL;
+ if (spp_info.npages == 0 ||
+ spp_info.npages > SUBPAGE_MAX_BITMAP)
+ goto out;
+
+ r = kvm_vm_ioctl_get_subpages(kvm, &spp_info);
+ if (copy_to_user(argp, &spp_info, sizeof(spp_info))) {
+ r = -EFAULT;
+ goto out;
+ }
+ break;
+ }
+ case KVM_SUBPAGES_SET_ACCESS: {
+ struct kvm_subpage spp_info;
+
+ r = -EFAULT;
+ if (copy_from_user(&spp_info, argp, sizeof(spp_info)))
+ goto out;
+
+ r = -EINVAL;
+ if (spp_info.npages == 0 ||
+ spp_info.npages > SUBPAGE_MAX_BITMAP)
+ goto out;
+
+ r = kvm_vm_ioctl_set_subpages(kvm, &spp_info);
break;
}
default:
@@ -9152,6 +9223,34 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_hv_destroy_vm(kvm);
}
+int kvm_subpage_create_memslot(struct kvm_memory_slot *slot,
+ unsigned long npages)
+{
+ int lpages;
+
+ lpages = gfn_to_index(slot->base_gfn + npages - 1,
+ slot->base_gfn, 1) + 1;
+
+ slot->arch.subpage_wp_info =
+ kvzalloc(lpages * sizeof(*slot->arch.subpage_wp_info),
+ GFP_KERNEL);
+
+ if (!slot->arch.subpage_wp_info)
+ return -ENOMEM;
+
+ return 0;
+}
+
+void kvm_subpage_free_memslot(struct kvm_memory_slot *free,
+ struct kvm_memory_slot *dont)
+{
+ if (!dont || free->arch.subpage_wp_info !=
+ dont->arch.subpage_wp_info) {
+ kvfree(free->arch.subpage_wp_info);
+ free->arch.subpage_wp_info = NULL;
+ }
+}
+
void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
struct kvm_memory_slot *dont)
{
@@ -9173,6 +9272,7 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
}
kvm_page_track_free_memslot(free, dont);
+ kvm_subpage_free_memslot(free, dont);
}
int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
@@ -9225,8 +9325,12 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
if (kvm_page_track_create_memslot(slot, npages))
goto out_free;
- return 0;
+ if (kvm_subpage_create_memslot(slot, npages))
+ goto out_free_page_track;
+ return 0;
+out_free_page_track:
+ kvm_page_track_free_memslot(slot, NULL);
out_free:
for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
kvfree(slot->arch.rmap[i]);
@@ -9713,6 +9817,24 @@ int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq,
return kvm_x86_ops->update_pi_irte(kvm, host_irq, guest_irq, set);
}
+int kvm_arch_get_subpages(struct kvm *kvm,
+ struct kvm_subpage *spp_info)
+{
+ if (!kvm_x86_ops->get_subpages)
+ return -EINVAL;
+
+ return kvm_x86_ops->get_subpages(kvm, spp_info);
+}
+
+int kvm_arch_set_subpages(struct kvm *kvm,
+ struct kvm_subpage *spp_info)
+{
+ if (!kvm_x86_ops->set_subpages)
+ return -EINVAL;
+
+ return kvm_x86_ops->set_subpages(kvm, spp_info);
+}
+
bool kvm_vector_hashing_enabled(void)
{
return vector_hashing;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c926698..7f29f97 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -816,6 +816,11 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
+int kvm_get_subpages(struct kvm *kvm, struct kvm_subpage *spp_info);
+int kvm_set_subpages(struct kvm *kvm, struct kvm_subpage *spp_info);
+int kvm_arch_get_subpages(struct kvm *kvm, struct kvm_subpage *spp_info);
+int kvm_arch_set_subpages(struct kvm *kvm, struct kvm_subpage *spp_info);
+
#ifndef __KVM_HAVE_ARCH_VM_ALLOC
/*
* All architectures that want to use vzalloc currently also
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 01174f8..3fd6d14 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -102,6 +102,15 @@ struct kvm_userspace_memory_region {
__u64 userspace_addr; /* start of the userspace allocated memory */
};
+/* for KVM_SUBPAGES_GET_ACCESS and KVM_SUBPAGES_SET_ACCESS */
+#define SUBPAGE_MAX_BITMAP 128
+struct kvm_subpage {
+ __u64 base_gfn;
+ __u64 npages;
+ /* sub-page write-access bitmap array */
+ __u32 access_map[SUBPAGE_MAX_BITMAP];
+};
+
/*
* The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace,
* other bits are reserved for kvm internal use which are defined in
@@ -1229,6 +1238,8 @@ struct kvm_vfio_spapr_tce {
struct kvm_userspace_memory_region)
#define KVM_SET_TSS_ADDR _IO(KVMIO, 0x47)
#define KVM_SET_IDENTITY_MAP_ADDR _IOW(KVMIO, 0x48, __u64)
+#define KVM_SUBPAGES_GET_ACCESS _IOR(KVMIO, 0x49, __u64)
+#define KVM_SUBPAGES_SET_ACCESS _IOW(KVMIO, 0x4a, __u64)
/* enable ucontrol for s390 */
struct kvm_s390_ucas_mapping {
--
2.7.4
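As a quick illustration of the new ioctl API (not part of the series itself), here is a minimal userspace sketch of how a VMM could drive KVM_SUBPAGES_SET_ACCESS. It assumes headers regenerated from the patched include/uapi/linux/kvm.h above; the helper name and the bit convention (bit n of an access_map word makes the n-th 128-byte region of that page writable) are assumptions made for the example.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>	/* patched uapi: struct kvm_subpage, KVM_SUBPAGES_SET_ACCESS */

/* Hypothetical helper: write-protect the first 128-byte region of the
 * guest page at 'gfn' while leaving the other 31 regions writable. */
static int spp_protect_first_region(int vm_fd, unsigned long long gfn)
{
	struct kvm_subpage spp;

	memset(&spp, 0, sizeof(spp));
	spp.base_gfn = gfn;
	spp.npages = 1;
	/* Assumed convention: bit n set means region n is writable. */
	spp.access_map[0] = ~0x1u;	/* clear only bit 0 (bytes 0..127) */

	return ioctl(vm_fd, KVM_SUBPAGES_SET_ACCESS, &spp);
}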
We should also set up the SPP page structures when we catch
an SPP miss; in some cases, such as vCPU hotplug, the SPP page
table has to be updated from the SPP miss handler.
Signed-off-by: Zhang Yi <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/mmu.c | 12 ++++++++++++
arch/x86/kvm/vmx.c | 8 ++++++++
3 files changed, 22 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ce6d258..a09ea39 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1406,6 +1406,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u64 error_code,
int kvm_mmu_setup_spp_structure(struct kvm_vcpu *vcpu,
u32 access_map, gfn_t gfn);
+int kvm_mmu_get_spp_access_map(struct kvm *kvm, u32 *access_map, gfn_t gfn);
+
void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva);
void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid);
void kvm_mmu_new_cr3(struct kvm_vcpu *vcpu, gpa_t new_cr3, bool skip_tlb_flush);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 287ee62..01cf85e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4299,6 +4299,17 @@ static void mmu_spp_spte_set(u64 *sptep, u64 new_spte)
__set_spte(sptep, new_spte);
}
+int kvm_mmu_get_spp_access_map(struct kvm *kvm, u32 *access_map, gfn_t gfn)
+{
+ struct kvm_memory_slot *slot;
+
+ slot = gfn_to_memslot(kvm, gfn);
+ *access_map = *gfn_to_subpage_wp_info(slot, gfn);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_get_spp_access_map);
+
int kvm_mmu_setup_spp_structure(struct kvm_vcpu *vcpu,
u32 access_map, gfn_t gfn)
{
@@ -4344,6 +4355,7 @@ int kvm_mmu_setup_spp_structure(struct kvm_vcpu *vcpu,
spin_unlock(&kvm->mmu_lock);
return -EFAULT;
}
+EXPORT_SYMBOL_GPL(kvm_mmu_setup_spp_structure);
int kvm_mmu_get_subpages(struct kvm *kvm, struct kvm_subpage *spp_info)
{
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b660812..b0ab645 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9706,6 +9706,9 @@ static int handle_invpcid(struct kvm_vcpu *vcpu)
static int handle_spp(struct kvm_vcpu *vcpu)
{
unsigned long exit_qualification;
+ gpa_t gpa;
+ gfn_t gfn;
+ u32 map;
exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
@@ -9732,6 +9735,11 @@ static int handle_spp(struct kvm_vcpu *vcpu)
* SPP table here.
*/
pr_debug("SPP: %s: SPPT Miss!!!\n", __func__);
+
+ gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+ gfn = gpa >> PAGE_SHIFT;
+ kvm_mmu_get_spp_access_map(vcpu->kvm, &map, gfn);
+ kvm_mmu_setup_spp_structure(vcpu, map, gfn);
return 1;
}
--
2.7.4
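To make the access_map word that this miss handler re-installs concrete: a 4KB page is covered by 32 sub-page regions of 128 bytes, so a single 32-bit map word describes a whole page. Below is a small sketch, under the assumption that bit n covers bytes n*128 through n*128+127 and that a set bit means the region is writable.

#include <stdbool.h>
#include <stdint.h>

#define SPP_SUBPAGE_SIZE	128
#define SPP_REGIONS_PER_PAGE	(4096 / SPP_SUBPAGE_SIZE)	/* 32 */

/* Which 128-byte sub-page region does a guest-physical address fall into? */
static inline unsigned int spp_region_of(uint64_t gpa)
{
	return (gpa & 0xfff) / SPP_SUBPAGE_SIZE;
}

/* Would a write to 'gpa' be allowed under this page's access_map word? */
static inline bool spp_write_allowed(uint32_t access_map, uint64_t gpa)
{
	return access_map & (1u << spp_region_of(gpa));
}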
On 30/11/18 08:52, Zhang Yi wrote:
> Here is a patch-series which adding EPT-Based Sub-page Write Protection Support.
>
> Introduction:
>
> EPT-Based Sub-page Write Protection referred to as SPP, it is a capability which
> allow Virtual Machine Monitors(VMM) to specify write-permission for guest
> physical memory at a sub-page(128 byte) granularity. When this capability is
> utilized, the CPU enforces write-access permissions for sub-page regions of 4K
> pages as specified by the VMM. EPT-based sub-page permissions is intended to
> enable fine-grained memory write enforcement by a VMM for security(guest OS
> monitoring) and usages such as device virtualization and memory check-point.
>
> SPPT is active when the "sub-page write protection" VM-execution control is 1.
> SPPT looks up the guest physical addresses to derive a 64 bit "sub-page
> permission" value containing sub-page write permissions. The lookup from
> guest-physical addresses to the sub-page region permissions is determined by a
> set of SPPT paging structures.
>
> When the "sub-page write protection" VM-execution control is 1, the SPPT is used
> to lookup write permission bits for the 128 byte sub-page regions containing in
> the 4KB guest physical page. EPT specifies the 4KB page level privileges that
> software is allowed when accessing the guest physical address, whereas SPPT
> defines the write permissions for software at the 128 byte granularity regions
> within a 4KB page. Write accesses prevented due to sub-page permissions looked
> up via SPPT are reported as EPT violation VM exits. Similar to EPT, a logical
> processor uses SPPT to lookup sub-page region write permissions for
> guest-physical addresses only when those addresses are used to access memory.
Hi,
I think the right thing to do here would be to first get VM
introspection in KVM, as SPP is mostly an introspection feature and it
should be controlled by the introspector rather than the KVM userspace.
Mihai, if you resubmit, I promise that I will look at it promptly.
Paolo
Hi Paolo,
On Fri, 2018-11-30 at 11:07 +0100, Paolo Bonzini wrote:
> On 30/11/18 08:52, Zhang Yi wrote:
> > [...]
>
> Hi,
>
> I think the right thing to do here would be to first get VM
> introspection in KVM, as SPP is mostly an introspection feature and it
> should be controlled by the introspector rather than the KVM userspace.
>
> Mihai, if you resubmit, I promise that I will look at it promptly.
I'm currently traveling until Wednesday, but when I get into the
office I will see about preparing a new patch set and sending it to
the list before Christmas.
Regards,
--
Mihai Donțu
On 2018-12-03 at 05:56:13 +0200, Mihai Donțu wrote:
> Hi Paolo,
>
> On Fri, 2018-11-30 at 11:07 +0100, Paolo Bonzini wrote:
> > On 30/11/18 08:52, Zhang Yi wrote:
> > > [...]
> >
> > Hi,
> >
> > I think the right thing to do here would be to first get VM
> > introspection in KVM, as SPP is mostly an introspection feature and it
> > should be controlled by the introspector rather than the KVM userspace.
> >
> > Mihai, if you resubmit, I promise that I will look at it promptly.
Thanks for the review, Paolo. What do you think about us cooking up some
use cases for QEMU or kvmtool, or even some other kernel hypercalls?
SPP is not solely an introspection-dependent feature.
>
> I'm currently traveling until Wednesday, but when I get into the
> office I will see about preparing a new patch set and sending it to
> the list before Christmas.
Thanks, Mihai. Please include me in the new VMI patch set.
>
> Regards,
>
> --
> Mihai Donțu
>
On 04/12/18 07:35, Yi Zhang wrote:
> On 2018-12-03 at 05:56:13 +0200, Mihai Donțu wrote:
>>> Hi,
>>>
>>> I think the right thing to do here would be to first get VM
>>> introspection in KVM, as SPP is mostly an introspection feature and it
>>> should be controlled by the introspector rather than the KVM userspace.
>>>
>>> Mihai, if you resubmit, I promise that I will look at it promptly.
> Thanks for the review, Paolo. What do you think about us cooking up some
> use cases for QEMU or kvmtool, or even some other kernel hypercalls?
That's up to you. If you can find a use case, I'll certainly consider
this independent patch set and the ioctl API.
Paolo
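One concrete shape such a use case could take, sticking to the ioctl API already posted: a monitoring tool that inspects which 128-byte regions of a guest page are currently writable. This is only a sketch against the patched uapi above; the helper is hypothetical and the bit convention is the same assumption as before (bit n set = region n writable).

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>	/* patched uapi: struct kvm_subpage, KVM_SUBPAGES_GET_ACCESS */

/* Hypothetical helper: dump the write permissions of one guest page. */
static void spp_dump_page(int vm_fd, unsigned long long gfn)
{
	struct kvm_subpage spp;
	unsigned int region;

	memset(&spp, 0, sizeof(spp));
	spp.base_gfn = gfn;
	spp.npages = 1;

	if (ioctl(vm_fd, KVM_SUBPAGES_GET_ACCESS, &spp) < 0) {
		perror("KVM_SUBPAGES_GET_ACCESS");
		return;
	}

	for (region = 0; region < 32; region++)
		printf("gfn 0x%llx region %2u: %s\n", gfn, region,
		       (spp.access_map[0] & (1u << region)) ?
		       "writable" : "write-protected");
}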