... to avoid having a stale value when handling an EPT misconfig for MMIO
regions.
MMIO regions that are not passed through to the guest are handled via
EPT misconfigs. The first time a given MMIO page is touched, it causes an
EPT violation; KVM then marks the EPT entry to cause an EPT misconfig
instead, so any subsequent access to the entry generates an EPT
misconfig.
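As a standalone illustration (not part of this patch and not KVM code; the
names are made up, only the violation-then-misconfig behaviour is the point):

#include <stdbool.h>
#include <stdio.h>

/*
 * Toy model: the first touch of an emulated-MMIO page takes an EPT
 * violation; the handler then marks the mapping so that every later
 * access exits with an EPT misconfig instead.
 */
enum exit_reason { EXIT_EPT_VIOLATION, EXIT_EPT_MISCONFIG };

struct mmio_page {
	bool marked_misconfig;	/* set after the first faulting access */
};

static enum exit_reason access_mmio(struct mmio_page *page)
{
	if (!page->marked_misconfig) {
		page->marked_misconfig = true;	/* models installing the MMIO SPTE */
		return EXIT_EPT_VIOLATION;
	}
	return EXIT_EPT_MISCONFIG;
}

int main(void)
{
	struct mmio_page page = { .marked_misconfig = false };
	enum exit_reason first = access_mmio(&page);
	enum exit_reason second = access_mmio(&page);

	/* Prints "0 1": one violation, then misconfigs from there on. */
	printf("%d %d\n", first, second);
	return 0;
}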
Things get slightly complicated with nested guest handling for MMIO
regions that are not passed through from L0 (i.e. emulated by L0
user-space).
An EPT violation for one of these MMIO regions from L2 exits to the L0
hypervisor. L0 then looks at the EPT12 mapping maintained by the L1
hypervisor, realizes it is not present (or not sufficient to serve the
request), and injects an EPT violation into L1 so that L1 can update its
EPT mappings. The EXIT_QUALIFICATION value for L1 comes from the
exit_qualification variable in "struct vcpu". The problem is that this
variable is only updated on an EPT violation and not on an EPT misconfig.
So if an EPT violation caused by a read happens first and an EPT misconfig
caused by a write happens afterwards, L0 still holds the exit_qualification
value from the previous read instead of the write, and ends up injecting an
EPT violation into the L1 hypervisor with an out-of-date EXIT_QUALIFICATION.
The EPT violation that is injected from L0 into L1 needs to have the correct
EXIT_QUALIFICATION, especially in the access bits, because the individual
access bits for MMIO EPT entries are only updated on an actual access of
that specific type. So in the example above, the L1 hypervisor keeps
updating only the read bit in its EPT and then resumes the L2 guest. The L2
guest then causes another exit, where L0 *again* injects an EPT violation
into L1 with *again* an out-of-date exit_qualification that indicates a
read and not a write. This ping-pong keeps happening without making any
forward progress.
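To make the livelock concrete, here is a toy standalone model (not KVM code;
the names are made up) of what happens when L0 keeps injecting the access
bits cached at the last EPT violation:

#include <stdio.h>

/*
 * Toy model: L0 re-injects the access bits cached at the last EPT
 * violation (a read), so L1 only ever grants read permission and the
 * L2 write never makes progress.
 */
#define ACC_READ  (1u << 0)
#define ACC_WRITE (1u << 1)

int main(void)
{
	unsigned int cached_qual = ACC_READ;	/* set by the earlier read violation */
	unsigned int l1_ept_perms = 0;
	int exits = 0;

	/* L2 keeps retrying a write to the MMIO page. */
	while (!(l1_ept_perms & ACC_WRITE) && exits < 5) {
		unsigned int injected_qual = cached_qual;	/* stale: says "read" */

		l1_ept_perms |= injected_qual;	/* L1 fixes up only what is reported */
		exits++;
	}

	printf("write still not permitted after %d exits\n", exits);
	return 0;
}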
The behavior of mapping MMIO regions changed in:
commit a340b3e229b24 ("kvm: Map PFN-type memory regions as writable (if possible)")
... where an EPT violation for a read also fixes up the write bits to
avoid another EPT violation, which by accident works around the bug
described above.
This commit fixes the situation and ensures that the access bits in the
exit_qualification are up to date. That way, even an L1 hypervisor running
a KVM version from before the commit mentioned above still works.
(The description above assumes that EPT is available and used by the L1
hypervisor, and that the L1 hypervisor passes the MMIO region through to
the L2 guest while the region itself is emulated by L0 user-space.)
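As a standalone sketch of the approach (illustrative only, not the kernel
function and not part of this patch; the bit positions follow the
architectural EPT-violation EXIT_QUALIFICATION layout and mirror the diff
below): keep only bits 8:7 of the cached value (0x180 instead of 0x187) and
rebuild the access bits 2:0 from the current fault.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define EPT_VIOLATION_ACC_READ   (1ull << 0)
#define EPT_VIOLATION_ACC_WRITE  (1ull << 1)
#define EPT_VIOLATION_ACC_INSTR  (1ull << 2)

static uint64_t build_exit_qual(uint64_t cached, bool write_fault,
				bool user_fault, bool fetch_fault,
				unsigned int pte_access)
{
	uint64_t qual = cached & 0x180;		/* keep only bits 8:7 */

	if (write_fault)
		qual |= EPT_VIOLATION_ACC_WRITE;
	if (user_fault)
		qual |= EPT_VIOLATION_ACC_READ;
	if (fetch_fault)
		qual |= EPT_VIOLATION_ACC_INSTR;

	/* bits 5:3 come from the walk of the guest EPT page tables */
	qual |= (uint64_t)(pte_access & 0x7) << 3;
	return qual;
}

int main(void)
{
	/* A stale "read" qualification does not leak into a write fault. */
	printf("%#llx\n", (unsigned long long)
	       build_exit_qual(EPT_VIOLATION_ACC_READ, true, false, false, 0x3));
	return 0;
}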
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: KarimAllah Ahmed <[email protected]>
---
arch/x86/kvm/paging_tmpl.h | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 5abae72..6288e9d 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -452,14 +452,21 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
* done by is_rsvd_bits_set() above.
*
* We set up the value of exit_qualification to inject:
- * [2:0] - Derive from [2:0] of real exit_qualification at EPT violation
+ * [2:0] - Derive from the access bits. The exit_qualification might be
+ * out of date if it is serving an EPT misconfiguration.
* [5:3] - Calculated by the page walk of the guest EPT page tables
* [7:8] - Derived from [7:8] of real exit_qualification
*
* The other bits are set to 0.
*/
if (!(errcode & PFERR_RSVD_MASK)) {
- vcpu->arch.exit_qualification &= 0x187;
+ vcpu->arch.exit_qualification &= 0x180;
+ if (write_fault)
+ vcpu->arch.exit_qualification |= EPT_VIOLATION_ACC_WRITE;
+ if (user_fault)
+ vcpu->arch.exit_qualification |= EPT_VIOLATION_ACC_READ;
+ if (fetch_fault)
+ vcpu->arch.exit_qualification |= EPT_VIOLATION_ACC_INSTR;
vcpu->arch.exit_qualification |= (pte_access & 0x7) << 3;
}
#endif
--
2.7.4
On 28/02/2018 19:06, KarimAllah Ahmed wrote:
> [...]
This looks okay. Would it be possible to add a kvm-unit-tests testcase
for this?
Thanks,
Paolo
Validate that a write MMIO access that follows a read MMIO access has the
correct access type captured in the exit qualification.
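For reference, the expectation being validated, as a minimal sketch
(illustrative only, not part of the test; the helper is made up, the bit
values mirror the EPT_VLT_RD/EPT_VLT_WR checks used below):

/*
 * The read that faults first must report the read bit; the write that
 * follows must report the write bit, even though L0 handled it via an
 * EPT misconfig internally.
 */
#define EPT_VLT_RD	(1ull << 0)	/* exit qualification: data read  */
#define EPT_VLT_WR	(1ull << 1)	/* exit qualification: data write */

static int mmio_access_bits_ok(unsigned long read_qual, unsigned long write_qual)
{
	return (read_qual & EPT_VLT_RD) && (write_qual & EPT_VLT_WR);
}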
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: KarimAllah Ahmed <[email protected]>
Message-Id: <[email protected]>
---
x86/vmx_tests.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 48 insertions(+), 4 deletions(-)
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 598dd88..a72af1a 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -7,6 +7,7 @@
#include "msr.h"
#include "processor.h"
#include "vm.h"
+#include "pci.h"
#include "fwcfg.h"
#include "isr.h"
#include "desc.h"
@@ -28,6 +29,8 @@ unsigned long *pml4;
u64 eptp;
void *data_page1, *data_page2;
+phys_addr_t pci_physaddr;
+
void *pml_log;
#define PML_INDEX 512
@@ -1041,6 +1044,9 @@ static int apic_version;
static int ept_init_common(bool have_ad)
{
+ int ret;
+ struct pci_dev pcidev;
+
if (setup_ept(have_ad))
return VMX_TEST_EXIT;
data_page1 = alloc_page();
@@ -1053,6 +1059,13 @@ static int ept_init_common(bool have_ad)
EPT_RA | EPT_WA | EPT_EA);
apic_version = apic_read(APIC_LVR);
+
+ ret = pci_find_dev(PCI_VENDOR_ID_REDHAT, PCI_DEVICE_ID_REDHAT_TEST);
+ if (ret != PCIDEVADDR_INVALID) {
+ pci_dev_init(&pcidev, ret);
+ pci_physaddr = pcidev.resource[PCI_TESTDEV_BAR_MEM];
+ }
+
return VMX_TEST_START;
}
@@ -1101,6 +1114,16 @@ t1:
vmcall();
*((u32 *)data_page1) = MAGIC_VAL_2;
report("EPT violation - paging structure", vmx_get_test_stage() == 5);
+
+ // MMIO Read/Write
+ vmx_set_test_stage(5);
+ vmcall();
+
+ *(u32 volatile *)pci_physaddr;
+ report("MMIO EPT violation - read", vmx_get_test_stage() == 6);
+
+ *(u32 volatile *)pci_physaddr = MAGIC_VAL_1;
+ report("MMIO EPT violation - write", vmx_get_test_stage() == 7);
}
static void ept_main()
@@ -1108,12 +1131,12 @@ static void ept_main()
ept_common();
// Test EPT access to L1 MMIO
- vmx_set_test_stage(6);
+ vmx_set_test_stage(7);
report("EPT - MMIO access", *((u32 *)0xfee00030UL) == apic_version);
// Test invalid operand for INVEPT
vmcall();
- report("EPT - unsupported INVEPT", vmx_get_test_stage() == 7);
+ report("EPT - unsupported INVEPT", vmx_get_test_stage() == 8);
}
bool invept_test(int type, u64 eptp)
@@ -1187,7 +1210,7 @@ static int ept_exit_handler_common(bool have_ad)
ulong reason;
u32 insn_len;
u32 exit_qual;
- static unsigned long data_page1_pte, data_page1_pte_pte;
+ static unsigned long data_page1_pte, data_page1_pte_pte, memaddr_pte;
guest_rip = vmcs_read(GUEST_RIP);
guest_cr3 = vmcs_read(GUEST_CR3);
@@ -1249,7 +1272,12 @@ static int ept_exit_handler_common(bool have_ad)
data_page1_pte_pte & ~EPT_PRESENT);
ept_sync(INVEPT_SINGLE, eptp);
break;
- case 6:
+ case 5:
+ install_ept(pml4, (unsigned long)pci_physaddr,
+ (unsigned long)pci_physaddr, 0);
+ ept_sync(INVEPT_SINGLE, eptp);
+ break;
+ case 7:
if (!invept_test(0, eptp))
vmx_inc_test_stage();
break;
@@ -1305,6 +1333,22 @@ static int ept_exit_handler_common(bool have_ad)
data_page1_pte_pte | (EPT_PRESENT));
ept_sync(INVEPT_SINGLE, eptp);
break;
+ case 5:
+ if (exit_qual & EPT_VLT_RD)
+ vmx_inc_test_stage();
+ TEST_ASSERT(get_ept_pte(pml4, (unsigned long)pci_physaddr,
+ 1, &memaddr_pte));
+ set_ept_pte(pml4, memaddr_pte, 1, memaddr_pte | EPT_RA);
+ ept_sync(INVEPT_SINGLE, eptp);
+ break;
+ case 6:
+ if (exit_qual & EPT_VLT_WR)
+ vmx_inc_test_stage();
+ TEST_ASSERT(get_ept_pte(pml4, (unsigned long)pci_physaddr,
+ 1, &memaddr_pte));
+ set_ept_pte(pml4, memaddr_pte, 1, memaddr_pte | EPT_RA | EPT_WA);
+ ept_sync(INVEPT_SINGLE, eptp);
+ break;
default:
// Should not reach here
report("ERROR : unexpected stage, %d", false,
--
2.7.4
On Fri, 2018-03-02 at 18:41 +0100, Paolo Bonzini wrote:
> On 28/02/2018 19:06, KarimAllah Ahmed wrote:
> > [...]
>
> This looks okay. Would it be possible to add a kvm-unit-tests testcase
> for this?
Yup, makes sense. Just sent out a patch for kvm-unit-tests.
Thanks.
On Sun, 2018-03-04 at 10:17 +0000, Raslan, KarimAllah wrote:
> On Fri, 2018-03-02 at 18:41 +0100, Paolo Bonzini wrote:
> >
> > On 28/02/2018 19:06, KarimAllah Ahmed wrote:
> > > [...]
> >
> > This looks okay. Would it be possible to add a kvm-unit-tests testcase
> > for this?
>
> Yup, makes sense. Just sent out a patch for kvm-unit-tests.
Was the kvm-unit-test that I posted sufficient?
On Mon, 2018-03-12 at 08:52 +0000, Raslan, KarimAllah wrote:
> On Sun, 2018-03-04 at 10:17 +0000, Raslan, KarimAllah wrote:
> >
> > On Fri, 2018-03-02 at 18:41 +0100, Paolo Bonzini wrote:
> > > [...]
> >
> > Yup, makes sense. Just sent out a patch for kvm-unit-tests.
>
> Was the kvm-unit-test that I posted sufficient?
Never mind, I just noticed that Radim already pulled this fix to the
kvm/queue.
Thanks Radim :)
On 04/03/2018 11:16, KarimAllah Ahmed wrote:
> Validate that a write MMIO access that follows a read MMIO access would
> have the correct access captured in the exit qualification.
> [...]
Pushed, thanks!
Paolo