2021-10-04 18:32:48

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH v2 00/16] KVM: arm64: MMIO guard PV services

This is the second version of this series, initially posted at [1],
that aims at letting a guest express what it considers as MMIO, and
only letting such accesses through to userspace. Together with guest
memory being made (mostly) inaccessible to the host kernel and
userspace, this allows the implementation of a hardened IO subsystem.

A lot has been fixed/revamped/improved since the initial posting,
although I am still not pleased with the ioremap plugging on the guest
side. I'll take any idea to get rid of it!

The series is based on 5.15-rc3.

[1] https://lore.kernel.org/kvmarm/[email protected]

Marc Zyngier (16):
KVM: arm64: Generalise VM features into a set of flags
KVM: arm64: Check for PTE valitity when checking for
executable/cacheable
KVM: arm64: Turn kvm_pgtable_stage2_set_owner into
kvm_pgtable_stage2_annotate
KVM: arm64: Add MMIO checking infrastructure
KVM: arm64: Plumb MMIO checking into the fault handling
KVM: arm64: Force a full unmap on vcpu reinit
KVM: arm64: Wire MMIO guard hypercalls
KVM: arm64: Add tracepoint for failed MMIO guard check
KVM: arm64: Advertise a capability for MMIO guard
KVM: arm64: Add some documentation for the MMIO guard feature
firmware/smccc: Call arch-specific hook on discovering KVM services
mm/vmalloc: Add arch-specific callbacks to track io{remap,unmap}
physical pages
arm64: Implement ioremap/iounmap hooks calling into KVM's MMIO guard
arm64: Enroll into KVM's MMIO guard if required
arm64: Add a helper to retrieve the PTE of a fixmap
arm64: Register earlycon fixmap with the MMIO guard

.../admin-guide/kernel-parameters.txt | 3 +
Documentation/virt/kvm/arm/index.rst | 1 +
Documentation/virt/kvm/arm/mmio-guard.rst | 74 ++++++++
arch/arm/include/asm/hypervisor.h | 1 +
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/fixmap.h | 2 +
arch/arm64/include/asm/hypervisor.h | 2 +
arch/arm64/include/asm/kvm_host.h | 14 +-
arch/arm64/include/asm/kvm_mmu.h | 5 +
arch/arm64/include/asm/kvm_pgtable.h | 12 +-
arch/arm64/kernel/setup.c | 6 +
arch/arm64/kvm/arm.c | 30 ++--
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 +-
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 11 +-
arch/arm64/kvm/hyp/nvhe/setup.c | 10 +-
arch/arm64/kvm/hyp/pgtable.c | 29 ++--
arch/arm64/kvm/hypercalls.c | 38 ++++
arch/arm64/kvm/mmio.c | 20 ++-
arch/arm64/kvm/mmu.c | 111 ++++++++++++
arch/arm64/kvm/psci.c | 8 +
arch/arm64/kvm/trace_arm.h | 17 ++
arch/arm64/mm/ioremap.c | 162 ++++++++++++++++++
arch/arm64/mm/mmu.c | 15 ++
drivers/firmware/smccc/kvm_guest.c | 4 +
include/linux/arm-smccc.h | 28 +++
include/linux/io.h | 2 +
include/uapi/linux/kvm.h | 1 +
mm/Kconfig | 5 +
mm/vmalloc.c | 12 +-
29 files changed, 575 insertions(+), 51 deletions(-)
create mode 100644 Documentation/virt/kvm/arm/mmio-guard.rst

--
2.30.2


2021-10-04 18:34:16

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH v2 08/16] KVM: arm64: Add tracepoint for failed MMIO guard check

In order to make debugging easier, expose a new tracepoint
that triggers when an MMIO guard check fails.
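
With the format string below, the new event renders along these
lines (register values made up for illustration):

  kvm_failed_mmio_check: PC: ffffffc010234a8c IPA: 9000008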

Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/kvm/mmu.c | 4 +++-
arch/arm64/kvm/trace_arm.h | 17 +++++++++++++++++
2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 2470a55ca675..8c38d856533f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1275,8 +1275,10 @@ bool kvm_check_ioguard_page(struct kvm_vcpu *vcpu, gpa_t ipa)
ret = __check_ioguard_page(vcpu, ipa & PAGE_MASK);
spin_unlock(&vcpu->kvm->mmu_lock);

- if (!ret)
+ if (!ret) {
+ trace_kvm_failed_mmio_check(*vcpu_pc(vcpu), ipa);
kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
+ }

return ret;
}
diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
index 33e4e7dd2719..e40cfeb251ad 100644
--- a/arch/arm64/kvm/trace_arm.h
+++ b/arch/arm64/kvm/trace_arm.h
@@ -89,6 +89,23 @@ TRACE_EVENT(kvm_access_fault,
TP_printk("IPA: %lx", __entry->ipa)
);

+TRACE_EVENT(kvm_failed_mmio_check,
+ TP_PROTO(unsigned long vcpu_pc, unsigned long ipa),
+ TP_ARGS(vcpu_pc, ipa),
+
+ TP_STRUCT__entry(
+ __field( unsigned long, vcpu_pc )
+ __field( unsigned long, ipa )
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu_pc = vcpu_pc;
+ __entry->ipa = ipa;
+ ),
+
+ TP_printk("PC: %lx IPA: %lx", __entry->vcpu_pc, __entry->ipa)
+);
+
TRACE_EVENT(kvm_irq_line,
TP_PROTO(unsigned int type, int vcpu_idx, int irq_num, int level),
TP_ARGS(type, vcpu_idx, irq_num, level),
--
2.30.2

2021-10-04 18:43:28

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH v2 10/16] KVM: arm64: Add some documentation for the MMIO guard feature

Document the hypercalls used for the MMIO guard infrastructure.
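
For illustration, a guest could drive these hypercalls with the
standard SMCCC helpers, along these lines (an untested sketch;
example_mmio_guard_map() is a made-up helper, and the function IDs
are the ones this series defines):

#include <linux/arm-smccc.h>

static int example_mmio_guard_map(phys_addr_t ipa, u64 mair_idx)
{
	struct arm_smccc_res res;

	/* Check that the protection granule matches our page size */
	arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID,
			  &res);
	if (res.a0 != PAGE_SIZE)
		return -ENXIO;

	/* Opt in to the MMIO guard */
	arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID,
			  &res);
	if (res.a0 != SMCCC_RET_SUCCESS)
		return -ENXIO;

	/* Allow MMIO accesses to a single granule at 'ipa' */
	arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID,
			  ipa, mair_idx, &res);
	return res.a0 == SMCCC_RET_SUCCESS ? 0 : -ENXIO;
}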

Signed-off-by: Marc Zyngier <[email protected]>
---
Documentation/virt/kvm/arm/index.rst | 1 +
Documentation/virt/kvm/arm/mmio-guard.rst | 74 +++++++++++++++++++++++
2 files changed, 75 insertions(+)
create mode 100644 Documentation/virt/kvm/arm/mmio-guard.rst

diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
index 78a9b670aafe..e77a0ee2e2d4 100644
--- a/Documentation/virt/kvm/arm/index.rst
+++ b/Documentation/virt/kvm/arm/index.rst
@@ -11,3 +11,4 @@ ARM
psci
pvtime
ptp_kvm
+ mmio-guard
diff --git a/Documentation/virt/kvm/arm/mmio-guard.rst b/Documentation/virt/kvm/arm/mmio-guard.rst
new file mode 100644
index 000000000000..8b3c852c5d92
--- /dev/null
+++ b/Documentation/virt/kvm/arm/mmio-guard.rst
@@ -0,0 +1,74 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+KVM MMIO guard
+==============
+
+KVM implements device emulation by handling translation faults to any
+IPA range that is not contained in a memory slot. Such a translation
+fault is in most cases passed on to userspace (or in rare cases to the
+host kernel) with the address, size and possibly data of the access
+for emulation.
+
+Should the guest exit with an address that is not one that corresponds
+to an emulatable device, userspace may take measures that are not the
+most graceful as far as the guest is concerned (such as terminating it
+or delivering a fatal exception).
+
+There is also an element of trust: by forwarding the request to
+userspace, the kernel assumes that the guest trusts userspace to do
+the right thing.
+
+The KVM MMIO guard offers a way to mitigate this last point: a guest
+can request that only certain regions of the IPA space are valid as
+MMIO. Only accesses to these regions are handled as MMIO, and any
+other access will result in an exception being delivered to the guest.
+
+This relies on a set of hypercalls defined in the KVM-specific range,
+using the HVC64 calling convention.
+
+* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO
+
+ ============== ======== ================================
+ Function ID: (uint32) 0xC6000002
+ Arguments: none
+ Return Values: (int64) NOT_SUPPORTED(-1) on error, or
+ (uint64) Protection Granule (PG) size in
+ bytes (r0)
+ ============== ======== ================================
+
+* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL
+
+ ============== ======== ==============================
+ Function ID: (uint32) 0xC6000003
+ Arguments: none
+ Return Values: (int64) NOT_SUPPORTED(-1) on error, or
+ RET_SUCCESS(0) (r0)
+ ============== ======== ==============================
+
+* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP
+
+ ============== ======== ====================================
+ Function ID: (uint32) 0xC6000004
+ Arguments: (uint64) The base of the PG-sized IPA range
+ that is allowed to be accessed as
+ MMIO. Must be aligned to the PG size
+ (r1)
+ (uint64) Index in the MAIR_EL1 register
+ providing the memory attribute that
+ is used by the guest (r2)
+ Return Values: (int64) NOT_SUPPORTED(-1) on error, or
+ RET_SUCCESS(0) (r0)
+ ============== ======== ====================================
+
+* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP
+
+ ============== ======== ======================================
+ Function ID: (uint32) 0xC6000005
+ Arguments: (uint64) The base of the PG-sized IPA range
+ that is to be removed from the
+ MMIO guard. Must be aligned to the PG
+ size and have been previously mapped (r1)
+ Return Values: (int64) NOT_SUPPORTED(-1) on error, or
+ RET_SUCCESS(0) (r0)
+ ============== ======== ======================================
--
2.30.2

2021-10-04 18:43:53

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH v2 11/16] firmware/smccc: Call arch-specific hook on discovering KVM services

arm64 will soon require its own callback to initialise services
that are only available on this architecture. Introduce a hook
that can be overridden by the architecture.

Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm/include/asm/hypervisor.h | 1 +
arch/arm64/include/asm/hypervisor.h | 1 +
drivers/firmware/smccc/kvm_guest.c | 4 ++++
3 files changed, 6 insertions(+)

diff --git a/arch/arm/include/asm/hypervisor.h b/arch/arm/include/asm/hypervisor.h
index bd61502b9715..8133c8c81a35 100644
--- a/arch/arm/include/asm/hypervisor.h
+++ b/arch/arm/include/asm/hypervisor.h
@@ -6,5 +6,6 @@

void kvm_init_hyp_services(void);
bool kvm_arm_hyp_service_available(u32 func_id);
+void kvm_arm_init_hyp_services(void);

#endif
diff --git a/arch/arm64/include/asm/hypervisor.h b/arch/arm64/include/asm/hypervisor.h
index 0ae427f352c8..8e77f411903f 100644
--- a/arch/arm64/include/asm/hypervisor.h
+++ b/arch/arm64/include/asm/hypervisor.h
@@ -6,5 +6,6 @@

void kvm_init_hyp_services(void);
bool kvm_arm_hyp_service_available(u32 func_id);
+void kvm_arm_init_hyp_services(void);

#endif
diff --git a/drivers/firmware/smccc/kvm_guest.c b/drivers/firmware/smccc/kvm_guest.c
index 2d3e866decaa..56169e73252a 100644
--- a/drivers/firmware/smccc/kvm_guest.c
+++ b/drivers/firmware/smccc/kvm_guest.c
@@ -9,6 +9,8 @@

#include <asm/hypervisor.h>

+void __weak kvm_arm_init_hyp_services(void) {}
+
static DECLARE_BITMAP(__kvm_arm_hyp_services, ARM_SMCCC_KVM_NUM_FUNCS) __ro_after_init = { };

void __init kvm_init_hyp_services(void)
@@ -38,6 +40,8 @@ void __init kvm_init_hyp_services(void)

pr_info("hypervisor services detected (0x%08lx 0x%08lx 0x%08lx 0x%08lx)\n",
res.a3, res.a2, res.a1, res.a0);
+
+ kvm_arm_init_hyp_services();
}

bool kvm_arm_hyp_service_available(u32 func_id)
--
2.30.2

2021-10-04 18:46:27

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH v2 13/16] arm64: Implement ioremap/iounmap hooks calling into KVM's MMIO guard

Implement the previously defined ioremap/iounmap hooks for arm64,
calling into KVM's MMIO guard if available.
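
Note that the hooks refcount each page in an xarray, so overlapping
ioremap()s of the same physical page only result in a single MAP
hypercall, and the page is only unregistered once the last mapping
goes away. A hypothetical driver sequence (addresses made up):

static void example_shared_page(void)
{
	/* Both mappings hit the same PAGE_SIZE'd physical page */
	void __iomem *a = ioremap(0x9000000, 0x100); /* MAP issued */
	void __iomem *b = ioremap(0x9000080, 0x100); /* refcount only */

	iounmap(a);		/* refcount drops, page stays guarded */
	iounmap(b);		/* last user gone, UNMAP issued */
}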

Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/mm/ioremap.c | 112 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 112 insertions(+)

diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
index b7c81dacabf0..5334cbdc9f64 100644
--- a/arch/arm64/mm/ioremap.c
+++ b/arch/arm64/mm/ioremap.c
@@ -9,13 +9,125 @@
* Copyright (C) 2012 ARM Ltd.
*/

+#define pr_fmt(fmt) "ioremap: " fmt
+
#include <linux/export.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
+#include <linux/slab.h>
#include <linux/io.h>
+#include <linux/arm-smccc.h>

#include <asm/fixmap.h>
#include <asm/tlbflush.h>
+#include <asm/hypervisor.h>
+
+struct ioremap_guard_ref {
+ refcount_t count;
+};
+
+static DEFINE_STATIC_KEY_FALSE(ioremap_guard_key);
+static DEFINE_XARRAY(ioremap_guard_array);
+static DEFINE_MUTEX(ioremap_guard_lock);
+
+void ioremap_phys_range_hook(phys_addr_t phys_addr, size_t size, pgprot_t prot)
+{
+ if (!static_branch_unlikely(&ioremap_guard_key))
+ return;
+
+ if (pfn_valid(__phys_to_pfn(phys_addr)))
+ return;
+
+ mutex_lock(&ioremap_guard_lock);
+
+ while (size) {
+ u64 pfn = phys_addr >> PAGE_SHIFT;
+ struct ioremap_guard_ref *ref;
+ struct arm_smccc_res res;
+
+ ref = xa_load(&ioremap_guard_array, pfn);
+ if (ref) {
+ refcount_inc(&ref->count);
+ goto next;
+ }
+
+ /*
+ * It is acceptable for the allocation to fail, especially
+ * if trying to ioremap something very early on, like with
+ * earlycon, which happens long before kmem_cache_init.
+ * This page will be permanently accessible, similar to a
+ * saturated refcount.
+ */
+ ref = kzalloc(sizeof(*ref), GFP_KERNEL);
+ if (ref) {
+ refcount_set(&ref->count, 1);
+ if (xa_err(xa_store(&ioremap_guard_array, pfn, ref,
+ GFP_KERNEL))) {
+ kfree(ref);
+ ref = NULL;
+ }
+ }
+
+ arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID,
+ phys_addr, prot, &res);
+ if (res.a0 != SMCCC_RET_SUCCESS) {
+ pr_warn_ratelimited("Failed to register %llx\n",
+ phys_addr);
+ xa_erase(&ioremap_guard_array, pfn);
+ kfree(ref);
+ goto out;
+ }
+
+ next:
+ size -= PAGE_SIZE;
+ phys_addr += PAGE_SIZE;
+ }
+out:
+ mutex_unlock(&ioremap_guard_lock);
+}
+
+void iounmap_phys_range_hook(phys_addr_t phys_addr, size_t size)
+{
+ if (!static_branch_unlikely(&ioremap_guard_key))
+ return;
+
+ VM_BUG_ON(phys_addr & ~PAGE_MASK || size & ~PAGE_MASK);
+
+ mutex_lock(&ioremap_guard_lock);
+
+ while (size) {
+ u64 pfn = phys_addr >> PAGE_SHIFT;
+ struct ioremap_guard_ref *ref;
+ struct arm_smccc_res res;
+
+ ref = xa_load(&ioremap_guard_array, pfn);
+ if (!ref) {
+ pr_warn_ratelimited("%llx not tracked, left mapped\n",
+ phys_addr);
+ goto next;
+ }
+
+ if (!refcount_dec_and_test(&ref->count))
+ goto next;
+
+ xa_erase(&ioremap_guard_array, pfn);
+ kfree(ref);
+
+ arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID,
+ phys_addr, &res);
+ if (res.a0 != SMCCC_RET_SUCCESS) {
+ pr_warn_ratelimited("Failed to unregister %llx\n",
+ phys_addr);
+ goto out;
+ }
+
+ next:
+ size -= PAGE_SIZE;
+ phys_addr += PAGE_SIZE;
+ }
+out:
+ mutex_unlock(&ioremap_guard_lock);
+}

static void __iomem *__ioremap_caller(phys_addr_t phys_addr, size_t size,
pgprot_t prot, void *caller)
--
2.30.2

2021-10-04 23:09:43

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH v2 02/16] KVM: arm64: Check for PTE valitity when checking for executable/cacheable

Don't blindly assume that the PTE is valid when checking whether
it describes an executable or cacheable mapping.

This makes sure that we don't issue CMOs for invalid mappings.

Suggested-by: Will Deacon <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/kvm/hyp/pgtable.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index f8ceebe4982e..6bbfd952f0c5 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -638,12 +638,12 @@ static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
{
u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
- return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
+ return kvm_pte_valid(pte) && memattr == KVM_S2_MEMATTR(pgt, NORMAL);
}

static bool stage2_pte_executable(kvm_pte_t pte)
{
- return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
+ return kvm_pte_valid(pte) && !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
}

static bool stage2_leaf_mapping_allowed(u64 addr, u64 end, u32 level,
@@ -688,8 +688,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
/* Perform CMOs before installation of the guest stage-2 PTE */
if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
- granule);
-
+ granule);
if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);

@@ -1091,7 +1090,7 @@ static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
kvm_pte_t pte = *ptep;
kvm_pte_t *pte_follow;

- if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
+ if (!stage2_pte_cacheable(pgt, pte))
return 0;

pte_follow = kvm_pte_follow(pte, mm_ops);
--
2.30.2

2021-10-04 23:09:47

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH v2 01/16] KVM: arm64: Generalise VM features into a set of flags

We currently deal with a set of booleans for VM features,
while they could be better represented as a set of flags
contained in an unsigned long, similarly to what we are
doing on the CPU side.

Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 12 +++++++-----
arch/arm64/kvm/arm.c | 5 +++--
arch/arm64/kvm/mmio.c | 3 ++-
3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f8be56d5342b..f63ca8fb4e58 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -122,7 +122,10 @@ struct kvm_arch {
* should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
* supported.
*/
- bool return_nisv_io_abort_to_user;
+#define KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER 0
+ /* Memory Tagging Extension enabled for the guest */
+#define KVM_ARCH_FLAG_MTE_ENABLED 1
+ unsigned long flags;

/*
* VM-wide PMU filter, implemented as a bitmap and big enough for
@@ -133,9 +136,6 @@ struct kvm_arch {

u8 pfr0_csv2;
u8 pfr0_csv3;
-
- /* Memory Tagging Extension enabled for the guest */
- bool mte_enabled;
};

struct kvm_vcpu_fault_info {
@@ -786,7 +786,9 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
#define kvm_arm_vcpu_sve_finalized(vcpu) \
((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)

-#define kvm_has_mte(kvm) (system_supports_mte() && (kvm)->arch.mte_enabled)
+#define kvm_has_mte(kvm) \
+ (system_supports_mte() && \
+ test_bit(KVM_ARCH_FLAG_MTE_ENABLED, &(kvm)->arch.flags))
#define kvm_vcpu_has_pmu(vcpu) \
(test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fe102cd2e518..ed9c89ec0b4f 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -89,7 +89,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
switch (cap->cap) {
case KVM_CAP_ARM_NISV_TO_USER:
r = 0;
- kvm->arch.return_nisv_io_abort_to_user = true;
+ set_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
+ &kvm->arch.flags);
break;
case KVM_CAP_ARM_MTE:
mutex_lock(&kvm->lock);
@@ -97,7 +98,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
r = -EINVAL;
} else {
r = 0;
- kvm->arch.mte_enabled = true;
+ set_bit(KVM_ARCH_FLAG_MTE_ENABLED, &kvm->arch.flags);
}
mutex_unlock(&kvm->lock);
break;
diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
index 3e2d8ba11a02..3dd38a151d2a 100644
--- a/arch/arm64/kvm/mmio.c
+++ b/arch/arm64/kvm/mmio.c
@@ -135,7 +135,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
* volunteered to do so, and bail out otherwise.
*/
if (!kvm_vcpu_dabt_isvalid(vcpu)) {
- if (vcpu->kvm->arch.return_nisv_io_abort_to_user) {
+ if (test_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
+ &vcpu->kvm->arch.flags)) {
run->exit_reason = KVM_EXIT_ARM_NISV;
run->arm_nisv.esr_iss = kvm_vcpu_dabt_iss_nisv_sanitized(vcpu);
run->arm_nisv.fault_ipa = fault_ipa;
--
2.30.2

2021-10-04 23:10:46

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH v2 05/16] KVM: arm64: Plumb MMIO checking into the fault handling

Plumb the MMIO checking code into the MMIO fault handling code.
Nothing allows a region to be registered yet, so there should be
no functional change either.
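
As an illustration of the boundary check below: with 4kB pages, an
8-byte access at IPA 0x8000ffc ends at 0x8001003, so the first and
last bytes of the access land in different pages, and both pages
must pass the MMIO guard check. A sketch of the condition:

static bool example_crosses_page(phys_addr_t fault_ipa, unsigned int len)
{
	/* 0x8000ffc & PAGE_MASK = 0x8000000, 0x8001003 & PAGE_MASK = 0x8001000 */
	return ((fault_ipa + len - 1) & PAGE_MASK) != (fault_ipa & PAGE_MASK);
}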

Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/kvm/mmio.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
index 3dd38a151d2a..dfa375823264 100644
--- a/arch/arm64/kvm/mmio.c
+++ b/arch/arm64/kvm/mmio.c
@@ -6,6 +6,7 @@

#include <linux/kvm_host.h>
#include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>
#include <trace/events/kvm.h>

#include "trace.h"
@@ -135,6 +136,13 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
* volunteered to do so, and bail out otherwise.
*/
if (!kvm_vcpu_dabt_isvalid(vcpu)) {
+ /* With MMIO guard enabled, the guest should know better */
+ if (test_bit(KVM_ARCH_FLAG_MMIO_GUARD,
+ &vcpu->kvm->arch.flags)) {
+ kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
+ return 1;
+ }
+
if (test_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
&vcpu->kvm->arch.flags)) {
run->exit_reason = KVM_EXIT_ARM_NISV;
@@ -156,6 +164,15 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
len = kvm_vcpu_dabt_get_as(vcpu);
rt = kvm_vcpu_dabt_get_rd(vcpu);

+ /* Check failed? Return to the guest for debriefing... */
+ if (!kvm_check_ioguard_page(vcpu, fault_ipa))
+ return 1;
+
+ /* If we cross a page boundary, check that too... */
+ if (((fault_ipa + len - 1) & PAGE_MASK) != (fault_ipa & PAGE_MASK) &&
+ !kvm_check_ioguard_page(vcpu, fault_ipa + len - 1))
+ return 1;
+
if (is_write) {
data = vcpu_data_guest_to_host(vcpu, vcpu_get_reg(vcpu, rt),
len);
--
2.30.2

2021-10-04 23:23:46

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH v2 15/16] arm64: Add a helper to retrieve the PTE of a fixmap

In order to transfer the early mapping state into KVM's MMIO
guard infrastructure, provide a small helper that will retrieve
the associated PTE.

Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/include/asm/fixmap.h | 2 ++
arch/arm64/mm/mmu.c | 15 +++++++++++++++
2 files changed, 17 insertions(+)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index 4335800201c9..1aae625b944f 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -105,6 +105,8 @@ void __init early_fixmap_init(void);

extern void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t prot);

+extern pte_t *__get_fixmap_pte(enum fixed_addresses idx);
+
#include <asm-generic/fixmap.h>

#endif /* !__ASSEMBLY__ */
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index cfd9deb347c3..e7029db5b540 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1286,6 +1286,21 @@ void __set_fixmap(enum fixed_addresses idx,
}
}

+pte_t *__get_fixmap_pte(enum fixed_addresses idx)
+{
+ unsigned long addr = __fix_to_virt(idx);
+ pte_t *ptep;
+
+ BUG_ON(idx <= FIX_HOLE || idx >= __end_of_fixed_addresses);
+
+ ptep = fixmap_pte(addr);
+
+ if (!pte_valid(*ptep))
+ return NULL;
+
+ return ptep;
+}
+
void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
{
const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
--
2.30.2

2021-10-04 23:23:47

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH v2 16/16] arm64: Register earlycon fixmap with the MMIO guard

On initialising the MMIO guard infrastructure, register the
earlycon mapping if present.

Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/mm/ioremap.c | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
index 62b715ea31e5..0e10b788500c 100644
--- a/arch/arm64/mm/ioremap.c
+++ b/arch/arm64/mm/ioremap.c
@@ -39,6 +39,17 @@ static int __init ioremap_guard_setup(char *str)
}
early_param("ioremap_guard", ioremap_guard_setup);

+static void fixup_fixmap(void)
+{
+ pte_t *ptep = __get_fixmap_pte(FIX_EARLYCON_MEM_BASE);
+
+ if (!ptep)
+ return;
+
+ ioremap_phys_range_hook(__pte_to_phys(*ptep), PAGE_SIZE,
+ __pgprot(pte_val(*ptep) & PTE_ATTRINDX_MASK));
+}
+
void kvm_init_ioremap_services(void)
{
struct arm_smccc_res res;
@@ -62,6 +73,7 @@ void kvm_init_ioremap_services(void)
&res);
if (res.a0 == SMCCC_RET_SUCCESS) {
static_branch_enable(&ioremap_guard_key);
+ fixup_fixmap();
pr_info("Using KVM MMIO guard for ioremap\n");
} else {
pr_warn("KVM MMIO guard registration failed (%ld)\n", res.a0);
--
2.30.2

2021-10-04 23:24:56

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH v2 14/16] arm64: Enroll into KVM's MMIO guard if required

Should a guest desire to enroll into the MMIO guard, allow it to
do so with a command-line option.

Signed-off-by: Marc Zyngier <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 3 ++
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/hypervisor.h | 1 +
arch/arm64/kernel/setup.c | 6 +++
arch/arm64/mm/ioremap.c | 38 +++++++++++++++++++
5 files changed, 49 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 91ba391f9b32..62bb25b412a7 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2076,6 +2076,9 @@
1 - Bypass the IOMMU for DMA.
unset - Use value of CONFIG_IOMMU_DEFAULT_PASSTHROUGH.

+ ioremap_guard [ARM64] enable the KVM MMIO guard functionality
+ if available.
+
io7= [HW] IO7 for Marvel-based Alpha systems
See comment before marvel_specify_io7 in
arch/alpha/kernel/core_marvel.c.
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5c7ae4c3954b..4982d21d3c47 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -25,6 +25,7 @@ config ARM64
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_GIGANTIC_PAGE
+ select ARCH_HAS_IOREMAP_PHYS_HOOKS
select ARCH_HAS_KCOV
select ARCH_HAS_KEEPINITRD
select ARCH_HAS_MEMBARRIER_SYNC_CORE
diff --git a/arch/arm64/include/asm/hypervisor.h b/arch/arm64/include/asm/hypervisor.h
index 8e77f411903f..b130c7b82eaa 100644
--- a/arch/arm64/include/asm/hypervisor.h
+++ b/arch/arm64/include/asm/hypervisor.h
@@ -7,5 +7,6 @@
void kvm_init_hyp_services(void);
bool kvm_arm_hyp_service_available(u32 func_id);
void kvm_arm_init_hyp_services(void);
+void kvm_init_ioremap_services(void);

#endif
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index be5f85b0a24d..c325647f675f 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -49,6 +49,7 @@
#include <asm/tlbflush.h>
#include <asm/traps.h>
#include <asm/efi.h>
+#include <asm/hypervisor.h>
#include <asm/xen/hypervisor.h>
#include <asm/mmu_context.h>

@@ -445,3 +446,8 @@ static int __init register_arm64_panic_block(void)
return 0;
}
device_initcall(register_arm64_panic_block);
+
+void kvm_arm_init_hyp_services(void)
+{
+ kvm_init_ioremap_services();
+}
diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
index 5334cbdc9f64..62b715ea31e5 100644
--- a/arch/arm64/mm/ioremap.c
+++ b/arch/arm64/mm/ioremap.c
@@ -30,6 +30,44 @@ static DEFINE_STATIC_KEY_FALSE(ioremap_guard_key);
static DEFINE_XARRAY(ioremap_guard_array);
static DEFINE_MUTEX(ioremap_guard_lock);

+static bool ioremap_guard;
+static int __init ioremap_guard_setup(char *str)
+{
+ ioremap_guard = true;
+
+ return 0;
+}
+early_param("ioremap_guard", ioremap_guard_setup);
+
+void kvm_init_ioremap_services(void)
+{
+ struct arm_smccc_res res;
+
+ if (!ioremap_guard)
+ return;
+
+ /* We need all the functions to be implemented */
+ if (!kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO) ||
+ !kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL) ||
+ !kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP) ||
+ !kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP))
+ return;
+
+ arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID,
+ &res);
+ if (res.a0 != PAGE_SIZE)
+ return;
+
+ arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID,
+ &res);
+ if (res.a0 == SMCCC_RET_SUCCESS) {
+ static_branch_enable(&ioremap_guard_key);
+ pr_info("Using KVM MMIO guard for ioremap\n");
+ } else {
+ pr_warn("KVM MMIO guard registration failed (%ld)\n", res.a0);
+ }
+}
+
void ioremap_phys_range_hook(phys_addr_t phys_addr, size_t size, pgprot_t prot)
{
if (!static_branch_unlikely(&ioremap_guard_key))
--
2.30.2

2021-10-05 00:19:02

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH v2 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate

kvm_pgtable_stage2_set_owner() could be generalised into a way
to store up to 63 bits in the page tables, as long as we don't
set bit 0.

Let's just do that.
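
For illustration, the owner-id case simply becomes one particular
encoding of an annotation. A hypothetical caller wanting the old
behaviour could do (sketch, reusing the mask that moves to setup.c):

#include <linux/bitfield.h>

#define KVM_INVALID_PTE_OWNER_MASK	GENMASK(9, 2)

static int example_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
			     void *mc, u8 owner_id)
{
	/* Bit 0 (PTE_VALID) stays clear, making this a legal annotation */
	kvm_pte_t annotation = FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);

	return kvm_pgtable_stage2_annotate(pgt, addr, size, mc, annotation);
}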

Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/include/asm/kvm_pgtable.h | 12 ++++++-----
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 +-
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 11 ++++------
arch/arm64/kvm/hyp/nvhe/setup.c | 10 +++++++++-
arch/arm64/kvm/hyp/pgtable.c | 20 ++++++-------------
5 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 027783829584..d4d3ae0b5edb 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -329,14 +329,16 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
void *mc);

/**
- * kvm_pgtable_stage2_set_owner() - Unmap and annotate pages in the IPA space to
- * track ownership.
+ * kvm_pgtable_stage2_annotate() - Unmap and annotate pages in the IPA space
+ * to track ownership (and more).
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Base intermediate physical address to annotate.
* @size: Size of the annotated range.
* @mc: Cache of pre-allocated and zeroed memory from which to allocate
* page-table pages.
- * @owner_id: Unique identifier for the owner of the page.
+ * @annotation: A 63 bit value that will be stored in the page tables.
+ * @annotation[0] must be 0, and @annotation[63:1] is stored
+ * in the page tables.
*
* By default, all page-tables are owned by identifier 0. This function can be
* used to mark portions of the IPA space as owned by other entities. When a
@@ -345,8 +347,8 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
*
* Return: 0 on success, negative error code on failure.
*/
-int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
- void *mc, u8 owner_id);
+int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
+ void *mc, kvm_pte_t annotation);

/**
* kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index b58c910babaf..9d2ca173ea9a 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -53,7 +53,7 @@ int __pkvm_host_share_hyp(u64 pfn);

bool addr_is_memory(phys_addr_t phys);
int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
-int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id);
+int host_stage2_annotate_locked(phys_addr_t addr, u64 size, kvm_pte_t owner_id);
int kvm_host_prepare_stage2(void *pgt_pool_base);
void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index bacd493a4eac..8cd0c3bdb911 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -286,17 +286,14 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
enum kvm_pgtable_prot prot)
{
- hyp_assert_lock_held(&host_kvm.lock);
-
return host_stage2_try(__host_stage2_idmap, addr, addr + size, prot);
}

-int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
+int host_stage2_annotate_locked(phys_addr_t addr, u64 size,
+ kvm_pte_t annotation)
{
- hyp_assert_lock_held(&host_kvm.lock);
-
- return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_kvm.pgt,
- addr, size, &host_s2_pool, owner_id);
+ return host_stage2_try(kvm_pgtable_stage2_annotate, &host_kvm.pgt,
+ addr, size, &host_s2_pool, annotation);
}

static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 57c27846320f..d489fb9b8c29 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -159,6 +159,13 @@ static void hpool_put_page(void *addr)
hyp_put_page(&hpool, addr);
}

+#define KVM_INVALID_PTE_OWNER_MASK GENMASK(9, 2)
+
+static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
+{
+ return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
+}
+
static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag,
@@ -186,7 +193,8 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
switch (state) {
case PKVM_PAGE_OWNED:
- return host_stage2_set_owner_locked(phys, PAGE_SIZE, pkvm_hyp_id);
+ return host_stage2_annotate_locked(phys, PAGE_SIZE,
+ kvm_init_invalid_leaf_owner(pkvm_hyp_id));
case PKVM_PAGE_SHARED_OWNED:
prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_BORROWED);
break;
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 6bbfd952f0c5..4201e512da48 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -46,9 +46,6 @@
KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
KVM_PTE_LEAF_ATTR_HI_S2_XN)

-#define KVM_INVALID_PTE_OWNER_MASK GENMASK(9, 2)
-#define KVM_MAX_OWNER_ID 1
-
struct kvm_pgtable_walk_data {
struct kvm_pgtable *pgt;
struct kvm_pgtable_walker *walker;
@@ -167,11 +164,6 @@ static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
return pte;
}

-static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
-{
- return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
-}
-
static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag)
@@ -503,7 +495,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
struct stage2_map_data {
u64 phys;
kvm_pte_t attr;
- u8 owner_id;
+ u64 annotation;

kvm_pte_t *anchor;
kvm_pte_t *childp;
@@ -670,7 +662,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
if (kvm_phys_is_valid(phys))
new = kvm_init_valid_leaf_pte(phys, data->attr, level);
else
- new = kvm_init_invalid_leaf_owner(data->owner_id);
+ new = data->annotation;

if (stage2_pte_is_counted(old)) {
/*
@@ -859,8 +851,8 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
return ret;
}

-int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
- void *mc, u8 owner_id)
+int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
+ void *mc, kvm_pte_t annotation)
{
int ret;
struct stage2_map_data map_data = {
@@ -868,8 +860,8 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
.mmu = pgt->mmu,
.memcache = mc,
.mm_ops = pgt->mm_ops,
- .owner_id = owner_id,
.force_pte = true,
+ .annotation = annotation,
};
struct kvm_pgtable_walker walker = {
.cb = stage2_map_walker,
@@ -879,7 +871,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
.arg = &map_data,
};

- if (owner_id > KVM_MAX_OWNER_ID)
+ if (annotation & PTE_VALID)
return -EINVAL;

ret = kvm_pgtable_walk(pgt, addr, size, &walker);
--
2.30.2

2021-10-05 00:19:24

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH v2 07/16] KVM: arm64: Wire MMIO guard hypercalls

Plumb in the hypercall interface to allow a guest to discover,
enroll, map and unmap MMIO regions.

Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/kvm/hypercalls.c | 28 ++++++++++++++++++++++++++++
include/linux/arm-smccc.h | 28 ++++++++++++++++++++++++++++
2 files changed, 56 insertions(+)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 30da78f72b3b..c39aab55ecae 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -5,6 +5,7 @@
#include <linux/kvm_host.h>

#include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>

#include <kvm/arm_hypercalls.h>
#include <kvm/arm_psci.h>
@@ -129,10 +130,37 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
+ /* Only advertise MMIO guard to 64bit guests */
+ if (!vcpu_mode_is_32bit(vcpu)) {
+ val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO);
+ val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL);
+ val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP);
+ val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP);
+ }
break;
case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
kvm_ptp_get_time(vcpu, val);
break;
+ case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID:
+ if (!vcpu_mode_is_32bit(vcpu))
+ val[0] = PAGE_SIZE;
+ break;
+ case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID:
+ if (!vcpu_mode_is_32bit(vcpu)) {
+ set_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags);
+ val[0] = SMCCC_RET_SUCCESS;
+ }
+ break;
+ case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID:
+ if (!vcpu_mode_is_32bit(vcpu) &&
+ kvm_install_ioguard_page(vcpu, vcpu_get_reg(vcpu, 1)))
+ val[0] = SMCCC_RET_SUCCESS;
+ break;
+ case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID:
+ if (!vcpu_mode_is_32bit(vcpu) &&
+ kvm_remove_ioguard_page(vcpu, vcpu_get_reg(vcpu, 1)))
+ val[0] = SMCCC_RET_SUCCESS;
+ break;
case ARM_SMCCC_TRNG_VERSION:
case ARM_SMCCC_TRNG_FEATURES:
case ARM_SMCCC_TRNG_GET_UUID:
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 7d1cabe15262..4aab2078d8d3 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -107,6 +107,10 @@
/* KVM "vendor specific" services */
#define ARM_SMCCC_KVM_FUNC_FEATURES 0
#define ARM_SMCCC_KVM_FUNC_PTP 1
+#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO 2
+#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL 3
+#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP 4
+#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP 5
#define ARM_SMCCC_KVM_FUNC_FEATURES_2 127
#define ARM_SMCCC_KVM_NUM_FUNCS 128

@@ -133,6 +137,30 @@
#define KVM_PTP_VIRT_COUNTER 0
#define KVM_PTP_PHYS_COUNTER 1

+#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_64, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO)
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_64, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL)
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_64, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP)
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID \
+ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
+ ARM_SMCCC_SMC_64, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP)
+
/* Paravirtualised time calls (defined by ARM DEN0057A) */
#define ARM_SMCCC_HV_PV_TIME_FEATURES \
ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
--
2.30.2

2021-10-05 00:20:07

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH v2 06/16] KVM: arm64: Force a full unmap on vcpu reinit

As we now keep information in the S2PT, we must be careful not
to keep it across a VM reboot, which could otherwise lead to
interesting problems.

Make sure that the S2 is completely discarded on reset of
a vcpu, and remove the flag that enforces the MMIO check.

Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/kvm/psci.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 74c47d420253..6c9cb041f764 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -12,6 +12,7 @@

#include <asm/cputype.h>
#include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>

#include <kvm/arm_psci.h>
#include <kvm/arm_hypercalls.h>
@@ -180,6 +181,13 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
tmp->arch.power_off = true;
kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);

+ /*
+ * If the MMIO guard was enabled, we pay the price of a full
+ * unmap to get back to a sane state (and clear the flag).
+ */
+ if (test_and_clear_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags))
+ stage2_unmap_vm(vcpu->kvm);
+
memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
vcpu->run->system_event.type = type;
vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
--
2.30.2

2021-10-05 00:20:13

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH v2 12/16] mm/vmalloc: Add arch-specific callbacks to track io{remap,unmap} physical pages

Add a pair of hooks (ioremap_phys_range_hook/iounmap_phys_range_hook)
that can be implemented by an architecture. Contrary to the existing
arch_sync_kernel_mappings(), this one tracks things at the physical
address level.

This is especially useful in virtualised environments where
the guest has to tell the host whether (and how) it intends to use
an MMIO device.
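
For an architecture opting in, the contract is to select
ARCH_HAS_IOREMAP_PHYS_HOOKS and to provide both hooks. A do-nothing
skeleton (purely illustrative) would look like:

#include <linux/io.h>

void ioremap_phys_range_hook(phys_addr_t phys_addr, size_t size, pgprot_t prot)
{
	/* Called after a successful ioremap_page_range() */
}

void iounmap_phys_range_hook(phys_addr_t phys_addr, size_t size)
{
	/* Called when a VM_IOREMAP area is torn down */
}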

Signed-off-by: Marc Zyngier <[email protected]>
---
include/linux/io.h | 2 ++
mm/Kconfig | 5 +++++
mm/vmalloc.c | 12 +++++++++++-
3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/io.h b/include/linux/io.h
index 9595151d800d..84eac81e8834 100644
--- a/include/linux/io.h
+++ b/include/linux/io.h
@@ -21,6 +21,8 @@ void __ioread32_copy(void *to, const void __iomem *from, size_t count);
void __iowrite64_copy(void __iomem *to, const void *from, size_t count);

#ifdef CONFIG_MMU
+void ioremap_phys_range_hook(phys_addr_t phys_addr, size_t size, pgprot_t prot);
+void iounmap_phys_range_hook(phys_addr_t phys_addr, size_t size);
int ioremap_page_range(unsigned long addr, unsigned long end,
phys_addr_t phys_addr, pgprot_t prot);
#else
diff --git a/mm/Kconfig b/mm/Kconfig
index d16ba9249bc5..a154803836b7 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -894,6 +894,11 @@ config IO_MAPPING
config SECRETMEM
def_bool ARCH_HAS_SET_DIRECT_MAP && !EMBEDDED

+# Some architectures want callbacks for all IO mappings in order to
+# track the physical addresses that get used as devices.
+config ARCH_HAS_IOREMAP_PHYS_HOOKS
+ bool
+
source "mm/damon/Kconfig"

endmenu
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d77830ff604c..babcf3a75502 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -38,6 +38,7 @@
#include <linux/pgtable.h>
#include <linux/uaccess.h>
#include <linux/hugetlb.h>
+#include <linux/io.h>
#include <asm/tlbflush.h>
#include <asm/shmparam.h>

@@ -316,9 +317,14 @@ int ioremap_page_range(unsigned long addr, unsigned long end,
{
int err;

- err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
+ prot = pgprot_nx(prot);
+ err = vmap_range_noflush(addr, end, phys_addr, prot,
ioremap_max_page_shift);
flush_cache_vmap(addr, end);
+
+ if (IS_ENABLED(CONFIG_ARCH_HAS_IOREMAP_PHYS_HOOKS) && !err)
+ ioremap_phys_range_hook(phys_addr, end - addr, prot);
+
return err;
}

@@ -2608,6 +2614,10 @@ static void __vunmap(const void *addr, int deallocate_pages)

kasan_poison_vmalloc(area->addr, get_vm_area_size(area));

+ if (IS_ENABLED(CONFIG_ARCH_HAS_IOREMAP_PHYS_HOOKS) &&
+ area->flags & VM_IOREMAP)
+ iounmap_phys_range_hook(area->phys_addr, get_vm_area_size(area));
+
vm_remove_mappings(area, deallocate_pages);

if (deallocate_pages) {
--
2.30.2

2021-10-05 00:21:06

by Marc Zyngier

[permalink] [raw]
Subject: [PATCH v2 09/16] KVM: arm64: Advertise a capability for MMIO guard

In order for userspace to find out whether the MMIO guard is
exposed to a guest, expose a capability that says so.

We take this opportunity to make it incompatible with the NISV
option, as combining the two would be rather counter-productive!
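
From the VMM side, the capability can be probed with the usual
KVM_CHECK_EXTENSION dance on the VM file descriptor, e.g. (sketch):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Returns a non-zero value when the MMIO guard can be exposed */
static int mmio_guard_exposed(int vm_fd)
{
	/* Reads as 0 once KVM_CAP_ARM_NISV_TO_USER has been enabled */
	return ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_MMIO_GUARD);
}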

Signed-off-by: Marc Zyngier <[email protected]>
---
arch/arm64/kvm/arm.c | 29 ++++++++++++++++++-----------
arch/arm64/kvm/hypercalls.c | 14 ++++++++++++--
include/uapi/linux/kvm.h | 1 +
3 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index ed9c89ec0b4f..1c9a7abe2728 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -81,32 +81,33 @@ int kvm_arch_check_processor_compat(void *opaque)
int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
struct kvm_enable_cap *cap)
{
- int r;
+ int r = -EINVAL;

if (cap->flags)
return -EINVAL;

+ mutex_lock(&kvm->lock);
+
switch (cap->cap) {
case KVM_CAP_ARM_NISV_TO_USER:
- r = 0;
- set_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
- &kvm->arch.flags);
+ /* This is incompatible with MMIO guard */
+ if (!test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &kvm->arch.flags)) {
+ r = 0;
+ set_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
+ &kvm->arch.flags);
+ }
break;
case KVM_CAP_ARM_MTE:
- mutex_lock(&kvm->lock);
- if (!system_supports_mte() || kvm->created_vcpus) {
- r = -EINVAL;
- } else {
+ if (system_supports_mte() && !kvm->created_vcpus) {
r = 0;
set_bit(KVM_ARCH_FLAG_MTE_ENABLED, &kvm->arch.flags);
}
- mutex_unlock(&kvm->lock);
break;
default:
- r = -EINVAL;
break;
}

+ mutex_unlock(&kvm->lock);
return r;
}

@@ -211,13 +212,19 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_IMMEDIATE_EXIT:
case KVM_CAP_VCPU_EVENTS:
case KVM_CAP_ARM_IRQ_LINE_LAYOUT_2:
- case KVM_CAP_ARM_NISV_TO_USER:
case KVM_CAP_ARM_INJECT_EXT_DABT:
case KVM_CAP_SET_GUEST_DEBUG:
case KVM_CAP_VCPU_ATTRIBUTES:
case KVM_CAP_PTP_KVM:
r = 1;
break;
+ case KVM_CAP_ARM_NISV_TO_USER:
+ r = !test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &kvm->arch.flags);
+ break;
+ case KVM_CAP_ARM_MMIO_GUARD:
+ r = !test_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
+ &kvm->arch.flags);
+ break;
case KVM_CAP_SET_GUEST_DEBUG2:
return KVM_GUESTDBG_VALID_MASK;
case KVM_CAP_ARM_SET_DEVICE_ADDR:
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index c39aab55ecae..e4fade6a96f6 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -59,6 +59,14 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
val[3] = lower_32_bits(cycles);
}

+static bool mmio_guard_allowed(struct kvm_vcpu *vcpu)
+{
+ return (!test_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
+ &vcpu->kvm->arch.flags) &&
+ !vcpu_mode_is_32bit(vcpu));
+
+}
+
int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
{
u32 func_id = smccc_get_function(vcpu);
@@ -131,7 +139,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
/* Only advertise MMIO guard to 64bit guests */
- if (!vcpu_mode_is_32bit(vcpu)) {
+ if (mmio_guard_allowed(vcpu)) {
val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO);
val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL);
val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP);
@@ -146,10 +154,12 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
val[0] = PAGE_SIZE;
break;
case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID:
- if (!vcpu_mode_is_32bit(vcpu)) {
+ mutex_lock(&vcpu->kvm->lock);
+ if (mmio_guard_allowed(vcpu)) {
set_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags);
val[0] = SMCCC_RET_SUCCESS;
}
+ mutex_unlock(&vcpu->kvm->lock);
break;
case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID:
if (!vcpu_mode_is_32bit(vcpu) &&
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a067410ebea5..ef171186e7be 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1112,6 +1112,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_BINARY_STATS_FD 203
#define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
#define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_ARM_MMIO_GUARD 206

#ifdef KVM_CAP_IRQ_ROUTING

--
2.30.2

2021-10-06 10:26:42

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 01/16] KVM: arm64: Generalise VM features into a set of flags

On Mon, Oct 04, 2021 at 06:48:34PM +0100, Marc Zyngier wrote:
> We currently deal with a set of booleans for VM features,
> while they could be better represented as a set of flags
> contained in an unsigned long, similarly to what we are
> doing on the CPU side.
>
> Signed-off-by: Marc Zyngier <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 12 +++++++-----
> arch/arm64/kvm/arm.c | 5 +++--
> arch/arm64/kvm/mmio.c | 3 ++-
> 3 files changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index f8be56d5342b..f63ca8fb4e58 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -122,7 +122,10 @@ struct kvm_arch {
> * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is
> * supported.
> */
> - bool return_nisv_io_abort_to_user;
> +#define KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER 0
> + /* Memory Tagging Extension enabled for the guest */
> +#define KVM_ARCH_FLAG_MTE_ENABLED 1
> + unsigned long flags;
>
> /*
> * VM-wide PMU filter, implemented as a bitmap and big enough for
> @@ -133,9 +136,6 @@ struct kvm_arch {
>
> u8 pfr0_csv2;
> u8 pfr0_csv3;
> -
> - /* Memory Tagging Extension enabled for the guest */
> - bool mte_enabled;
> };
>
> struct kvm_vcpu_fault_info {
> @@ -786,7 +786,9 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
> #define kvm_arm_vcpu_sve_finalized(vcpu) \
> ((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
>
> -#define kvm_has_mte(kvm) (system_supports_mte() && (kvm)->arch.mte_enabled)
> +#define kvm_has_mte(kvm) \
> + (system_supports_mte() && \
> + test_bit(KVM_ARCH_FLAG_MTE_ENABLED, &(kvm)->arch.flags))
> #define kvm_vcpu_has_pmu(vcpu) \
> (test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index fe102cd2e518..ed9c89ec0b4f 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -89,7 +89,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> switch (cap->cap) {
> case KVM_CAP_ARM_NISV_TO_USER:
> r = 0;
> - kvm->arch.return_nisv_io_abort_to_user = true;
> + set_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
> + &kvm->arch.flags);
> break;
> case KVM_CAP_ARM_MTE:
> mutex_lock(&kvm->lock);
> @@ -97,7 +98,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> r = -EINVAL;
> } else {
> r = 0;
> - kvm->arch.mte_enabled = true;
> + set_bit(KVM_ARCH_FLAG_MTE_ENABLED, &kvm->arch.flags);
> }
> mutex_unlock(&kvm->lock);
> break;
> diff --git a/arch/arm64/kvm/mmio.c b/arch/arm64/kvm/mmio.c
> index 3e2d8ba11a02..3dd38a151d2a 100644
> --- a/arch/arm64/kvm/mmio.c
> +++ b/arch/arm64/kvm/mmio.c
> @@ -135,7 +135,8 @@ int io_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> * volunteered to do so, and bail out otherwise.
> */
> if (!kvm_vcpu_dabt_isvalid(vcpu)) {
> - if (vcpu->kvm->arch.return_nisv_io_abort_to_user) {
> + if (test_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
> + &vcpu->kvm->arch.flags)) {
> run->exit_reason = KVM_EXIT_ARM_NISV;
> run->arm_nisv.esr_iss = kvm_vcpu_dabt_iss_nisv_sanitized(vcpu);
> run->arm_nisv.fault_ipa = fault_ipa;
> --
> 2.30.2
>

Maybe a kvm_arm_has_feature() helper would be nice to avoid all the
&vcpu->kvm->arch.flags types of references getting scattered around.
Something along these lines (untested sketch):
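
static inline bool kvm_arm_has_feature(struct kvm *kvm, unsigned long flag)
{
	return test_bit(flag, &kvm->arch.flags);
}

and the test in io_mem_abort() would then read

	if (kvm_arm_has_feature(vcpu->kvm, KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER)) {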

Reviewed-by: Andrew Jones <[email protected]>

Thanks,
drew

2021-10-06 10:43:55

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 02/16] KVM: arm64: Check for PTE valitity when checking for executable/cacheable


Hi Marc,

sed s/valitity/validity/ <<<"$SUBJECT"

On Mon, Oct 04, 2021 at 06:48:35PM +0100, Marc Zyngier wrote:
> Don't blindly assume that the PTE is valid when checking whether
> it describes an executable or cacheable mapping.
>
> This makes sure that we don't issue CMOs for invalid mappings.
>
> Suggested-by: Will Deacon <[email protected]>
> Signed-off-by: Marc Zyngier <[email protected]>
> ---
> arch/arm64/kvm/hyp/pgtable.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index f8ceebe4982e..6bbfd952f0c5 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -638,12 +638,12 @@ static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
> static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
> {
> u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
> - return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
> + return kvm_pte_valid(pte) && memattr == KVM_S2_MEMATTR(pgt, NORMAL);
> }
>
> static bool stage2_pte_executable(kvm_pte_t pte)
> {
> - return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
> + return kvm_pte_valid(pte) && !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
> }
>
> static bool stage2_leaf_mapping_allowed(u64 addr, u64 end, u32 level,
> @@ -688,8 +688,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> /* Perform CMOs before installation of the guest stage-2 PTE */
> if (mm_ops->dcache_clean_inval_poc && stage2_pte_cacheable(pgt, new))
> mm_ops->dcache_clean_inval_poc(kvm_pte_follow(new, mm_ops),
> - granule);
> -
> + granule);
> if (mm_ops->icache_inval_pou && stage2_pte_executable(new))
> mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
>
> @@ -1091,7 +1090,7 @@ static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> kvm_pte_t pte = *ptep;
> kvm_pte_t *pte_follow;
>
> - if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pgt, pte))
> + if (!stage2_pte_cacheable(pgt, pte))
> return 0;
>
> pte_follow = kvm_pte_follow(pte, mm_ops);
> --
> 2.30.2
>

Reviewed-by: Andrew Jones <[email protected]>

2021-10-06 11:06:19

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate

On Mon, Oct 04, 2021 at 06:48:36PM +0100, Marc Zyngier wrote:
> kvm_pgtable_stage2_set_owner() could be generalised into a way
> to store up to 63 bits in the page tables, as long as we don't
> set bit 0.
>
> Let's just do that.
>
> Signed-off-by: Marc Zyngier <[email protected]>
> ---
> arch/arm64/include/asm/kvm_pgtable.h | 12 ++++++-----
> arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 +-
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 11 ++++------
> arch/arm64/kvm/hyp/nvhe/setup.c | 10 +++++++++-
> arch/arm64/kvm/hyp/pgtable.c | 20 ++++++-------------
> 5 files changed, 27 insertions(+), 28 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 027783829584..d4d3ae0b5edb 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -329,14 +329,16 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> void *mc);
>
> /**
> - * kvm_pgtable_stage2_set_owner() - Unmap and annotate pages in the IPA space to
> - * track ownership.
> + * kvm_pgtable_stage2_annotate() - Unmap and annotate pages in the IPA space
> + * to track ownership (and more).
> * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
> * @addr: Base intermediate physical address to annotate.
> * @size: Size of the annotated range.
> * @mc: Cache of pre-allocated and zeroed memory from which to allocate
> * page-table pages.
> - * @owner_id: Unique identifier for the owner of the page.
> + * @annotation: A 63 bit value that will be stored in the page tables.
> + * @annotation[0] must be 0, and @annotation[63:1] is stored
> + * in the page tables.
> *
> * By default, all page-tables are owned by identifier 0. This function can be
> * used to mark portions of the IPA space as owned by other entities. When a
> @@ -345,8 +347,8 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> *
> * Return: 0 on success, negative error code on failure.
> */
> -int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
> - void *mc, u8 owner_id);
> +int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
> + void *mc, kvm_pte_t annotation);
>
> /**
> * kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index b58c910babaf..9d2ca173ea9a 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -53,7 +53,7 @@ int __pkvm_host_share_hyp(u64 pfn);
>
> bool addr_is_memory(phys_addr_t phys);
> int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
> -int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id);
> +int host_stage2_annotate_locked(phys_addr_t addr, u64 size, kvm_pte_t owner_id);
> int kvm_host_prepare_stage2(void *pgt_pool_base);
> void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index bacd493a4eac..8cd0c3bdb911 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -286,17 +286,14 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
> int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
> enum kvm_pgtable_prot prot)
> {
> - hyp_assert_lock_held(&host_kvm.lock);
> -
> return host_stage2_try(__host_stage2_idmap, addr, addr + size, prot);
> }
>
> -int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
> +int host_stage2_annotate_locked(phys_addr_t addr, u64 size,
> + kvm_pte_t annotation)
> {
> - hyp_assert_lock_held(&host_kvm.lock);

Hi Marc,

Why are the lock asserts getting dropped?

> -
> - return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_kvm.pgt,
> - addr, size, &host_s2_pool, owner_id);
> + return host_stage2_try(kvm_pgtable_stage2_annotate, &host_kvm.pgt,
> + addr, size, &host_s2_pool, annotation);
> }
>
> static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 57c27846320f..d489fb9b8c29 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -159,6 +159,13 @@ static void hpool_put_page(void *addr)
> hyp_put_page(&hpool, addr);
> }
>
> +#define KVM_INVALID_PTE_OWNER_MASK GENMASK(9, 2)
> +
> +static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
> +{
> + return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
> +}
> +
> static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
> kvm_pte_t *ptep,
> enum kvm_pgtable_walk_flags flag,
> @@ -186,7 +193,8 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
> state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
> switch (state) {
> case PKVM_PAGE_OWNED:
> - return host_stage2_set_owner_locked(phys, PAGE_SIZE, pkvm_hyp_id);
> + return host_stage2_annotate_locked(phys, PAGE_SIZE,
> + kvm_init_invalid_leaf_owner(pkvm_hyp_id));
> case PKVM_PAGE_SHARED_OWNED:
> prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_BORROWED);
> break;
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 6bbfd952f0c5..4201e512da48 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -46,9 +46,6 @@
> KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
> KVM_PTE_LEAF_ATTR_HI_S2_XN)
>
> -#define KVM_INVALID_PTE_OWNER_MASK GENMASK(9, 2)
> -#define KVM_MAX_OWNER_ID 1
> -
> struct kvm_pgtable_walk_data {
> struct kvm_pgtable *pgt;
> struct kvm_pgtable_walker *walker;
> @@ -167,11 +164,6 @@ static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
> return pte;
> }
>
> -static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
> -{
> - return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
> -}
> -
> static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
> u32 level, kvm_pte_t *ptep,
> enum kvm_pgtable_walk_flags flag)
> @@ -503,7 +495,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
> struct stage2_map_data {
> u64 phys;
> kvm_pte_t attr;
> - u8 owner_id;
> + u64 annotation;
>
> kvm_pte_t *anchor;
> kvm_pte_t *childp;
> @@ -670,7 +662,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> if (kvm_phys_is_valid(phys))
> new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> else
> - new = kvm_init_invalid_leaf_owner(data->owner_id);
> + new = data->annotation;
>
> if (stage2_pte_is_counted(old)) {
> /*
> @@ -859,8 +851,8 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> return ret;
> }
>
> -int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
> - void *mc, u8 owner_id)
> +int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
> + void *mc, kvm_pte_t annotation)
> {
> int ret;
> struct stage2_map_data map_data = {
> @@ -868,8 +860,8 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
> .mmu = pgt->mmu,
> .memcache = mc,
> .mm_ops = pgt->mm_ops,
> - .owner_id = owner_id,
> .force_pte = true,
> + .annotation = annotation,
> };
> struct kvm_pgtable_walker walker = {
> .cb = stage2_map_walker,
> @@ -879,7 +871,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
> .arg = &map_data,
> };
>
> - if (owner_id > KVM_MAX_OWNER_ID)
> + if (annotation & PTE_VALID)
> return -EINVAL;
>
> ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> --
> 2.30.2
>

Thanks,
drew
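
For context on the encoding under review: an invalid stage-2 PTE has
bit 0 (the valid bit) clear, which is what frees the remaining 63 bits
for an arbitrary annotation; the old owner-ID scheme becomes just one
possible encoding. A minimal userspace sketch of that invariant (the
kvm_pte_t typedef and the main() harness are illustrative, not from
the series):

#include <stdint.h>
#include <stdio.h>

typedef uint64_t kvm_pte_t;

#define PTE_VALID                   ((kvm_pte_t)1)
/* GENMASK(9, 2): bits [9:2] carry the owner ID */
#define KVM_INVALID_PTE_OWNER_MASK  (((kvm_pte_t)0xff) << 2)

static kvm_pte_t kvm_init_invalid_leaf_owner(uint8_t owner_id)
{
        /* open-coded FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id) */
        return ((kvm_pte_t)owner_id << 2) & KVM_INVALID_PTE_OWNER_MASK;
}

static int annotate(kvm_pte_t annotation)
{
        /* mirrors the rejection in kvm_pgtable_stage2_annotate() */
        if (annotation & PTE_VALID)
                return -1;
        printf("stored annotation 0x%llx\n", (unsigned long long)annotation);
        return 0;
}

int main(void)
{
        return annotate(kvm_init_invalid_leaf_owner(1));
}

Any annotation built this way keeps bit 0 clear, which is exactly the
check the new kvm_pgtable_stage2_annotate() enforces with PTE_VALID.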

2021-10-06 11:23:47

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 03/16] KVM: arm64: Turn kvm_pgtable_stage2_set_owner into kvm_pgtable_stage2_annotate

On Wed, Oct 06, 2021 at 01:02:11PM +0200, Andrew Jones wrote:
> On Mon, Oct 04, 2021 at 06:48:36PM +0100, Marc Zyngier wrote:
> > kvm_pgtable_stage2_set_owner() could be generalised into a way
> > to store up to 63 bits in the page tables, as long as we don't
> > set bit 0.
> >
> > Let's just do that.
> >
> > Signed-off-by: Marc Zyngier <[email protected]>
> > ---
> > arch/arm64/include/asm/kvm_pgtable.h | 12 ++++++-----
> > arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 +-
> > arch/arm64/kvm/hyp/nvhe/mem_protect.c | 11 ++++------
> > arch/arm64/kvm/hyp/nvhe/setup.c | 10 +++++++++-
> > arch/arm64/kvm/hyp/pgtable.c | 20 ++++++-------------
> > 5 files changed, 27 insertions(+), 28 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > index 027783829584..d4d3ae0b5edb 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -329,14 +329,16 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > void *mc);
> >
> > /**
> > - * kvm_pgtable_stage2_set_owner() - Unmap and annotate pages in the IPA space to
> > - * track ownership.
> > + * kvm_pgtable_stage2_annotate() - Unmap and annotate pages in the IPA space
> > + * to track ownership (and more).
> > * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
> > * @addr: Base intermediate physical address to annotate.
> > * @size: Size of the annotated range.
> > * @mc: Cache of pre-allocated and zeroed memory from which to allocate
> > * page-table pages.
> > - * @owner_id: Unique identifier for the owner of the page.
> > + * @annotation: A 63-bit value to store in the page tables.
> > + * @annotation[0] must be 0; @annotation[63:1] is stored
> > + * verbatim in the page tables.
> > *
> > * By default, all page-tables are owned by identifier 0. This function can be
> > * used to mark portions of the IPA space as owned by other entities. When a
> > @@ -345,8 +347,8 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > *
> > * Return: 0 on success, negative error code on failure.
> > */
> > -int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > - void *mc, u8 owner_id);
> > +int kvm_pgtable_stage2_annotate(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > + void *mc, kvm_pte_t annotation);
> >
> > /**
> > * kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > index b58c910babaf..9d2ca173ea9a 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > @@ -53,7 +53,7 @@ int __pkvm_host_share_hyp(u64 pfn);
> >
> > bool addr_is_memory(phys_addr_t phys);
> > int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
> > -int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id);
> > +int host_stage2_annotate_locked(phys_addr_t addr, u64 size, kvm_pte_t annotation);
> > int kvm_host_prepare_stage2(void *pgt_pool_base);
> > void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
> >
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index bacd493a4eac..8cd0c3bdb911 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -286,17 +286,14 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
> > int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
> > enum kvm_pgtable_prot prot)
> > {
> > - hyp_assert_lock_held(&host_kvm.lock);
> > -
> > return host_stage2_try(__host_stage2_idmap, addr, addr + size, prot);
> > }
> >
> > -int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
> > +int host_stage2_annotate_locked(phys_addr_t addr, u64 size,
> > + kvm_pte_t annotation)
> > {
> > - hyp_assert_lock_held(&host_kvm.lock);
>
> Hi Marc,
>
> Why are the lock asserts getting dropped?

Ah, I see. host_stage2_try already has the same assert.

Reviewed-by: Andrew Jones <[email protected]>

Thanks,
drew
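
For readers tracking the assert: host_stage2_try() is the common
wrapper around both page-table operations, so asserting there once
covers every caller. Roughly (a sketch of the in-tree helper, with
the retry path slightly simplified):

#define host_stage2_try(fn, ...)                                \
        ({                                                      \
                int __ret;                                      \
                hyp_assert_lock_held(&host_kvm.lock);           \
                __ret = fn(__VA_ARGS__);                        \
                if (__ret == -ENOMEM) {                         \
                        /* reclaim a page, then retry once */   \
                        __ret = host_stage2_unmap_dev_all();    \
                        if (!__ret)                             \
                                __ret = fn(__VA_ARGS__);        \
                }                                               \
                __ret;                                          \
        })

Both host_stage2_idmap_locked() and host_stage2_annotate_locked() go
through this macro, so the dropped asserts are not lost, just hoisted.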

2021-10-07 13:52:38

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 06/16] KVM: arm64: Force a full unmap on vcpu reinit

On Mon, Oct 04, 2021 at 06:48:39PM +0100, Marc Zyngier wrote:
> As we now keep information in the S2PT, we must be careful not
> to keep it across a VM reboot, which could otherwise lead to
> interesting problems.
>
> Make sure that the S2 is completely discarded on reset of
> a vcpu, and remove the flag that enforces the MMIO check.
>
> Signed-off-by: Marc Zyngier <[email protected]>
> ---
> arch/arm64/kvm/psci.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index 74c47d420253..6c9cb041f764 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -12,6 +12,7 @@
>
> #include <asm/cputype.h>
> #include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
>
> #include <kvm/arm_psci.h>
> #include <kvm/arm_hypercalls.h>
> @@ -180,6 +181,13 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
> tmp->arch.power_off = true;
> kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
>
> + /*
> + * If the MMIO guard was enabled, we pay the price of a full
> + * unmap to get back to a sane state (and clear the flag).
> + */
> + if (test_and_clear_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags))
> + stage2_unmap_vm(vcpu->kvm);
> +
> memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
> vcpu->run->system_event.type = type;
> vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> --
> 2.30.2
>

Reviewed-by: Andrew Jones <[email protected]>

2021-10-07 13:58:05

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 07/16] KVM: arm64: Wire MMIO guard hypercalls

On Mon, Oct 04, 2021 at 06:48:40PM +0100, Marc Zyngier wrote:
> Plumb in the hypercall interface to allow a guest to discover,
> enroll, map and unmap MMIO regions.
>
> Signed-off-by: Marc Zyngier <[email protected]>
> ---
> arch/arm64/kvm/hypercalls.c | 28 ++++++++++++++++++++++++++++
> include/linux/arm-smccc.h | 28 ++++++++++++++++++++++++++++
> 2 files changed, 56 insertions(+)
>
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index 30da78f72b3b..c39aab55ecae 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -5,6 +5,7 @@
> #include <linux/kvm_host.h>
>
> #include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
>
> #include <kvm/arm_hypercalls.h>
> #include <kvm/arm_psci.h>
> @@ -129,10 +130,37 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
> val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
> val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
> + /* Only advertise MMIO guard to 64bit guests */
> + if (!vcpu_mode_is_32bit(vcpu)) {
> + val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO);
> + val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL);
> + val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP);
> + val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP);
> + }
> break;
> case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
> kvm_ptp_get_time(vcpu, val);
> break;
> + case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID:
> + if (!vcpu_mode_is_32bit(vcpu))
> + val[0] = PAGE_SIZE;
> + break;
> + case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID:
> + if (!vcpu_mode_is_32bit(vcpu)) {
> + set_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags);
> + val[0] = SMCCC_RET_SUCCESS;
> + }
> + break;
> + case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID:
> + if (!vcpu_mode_is_32bit(vcpu) &&
> + kvm_install_ioguard_page(vcpu, vcpu_get_reg(vcpu, 1)))
> + val[0] = SMCCC_RET_SUCCESS;
> + break;
> + case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID:
> + if (!vcpu_mode_is_32bit(vcpu) &&
> + kvm_remove_ioguard_page(vcpu, vcpu_get_reg(vcpu, 1)))
> + val[0] = SMCCC_RET_SUCCESS;
> + break;

Since these are all ARM_SMCCC_SMC_64 calls, can we do some sort of
refactoring first, similar to Oliver's "KVM: arm64: Clean up SMC64 PSCI
filtering for AArch32 guests", which would avoid the need for all the
!vcpu_mode_is_32bit's?

> case ARM_SMCCC_TRNG_VERSION:
> case ARM_SMCCC_TRNG_FEATURES:
> case ARM_SMCCC_TRNG_GET_UUID:
> diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
> index 7d1cabe15262..4aab2078d8d3 100644
> --- a/include/linux/arm-smccc.h
> +++ b/include/linux/arm-smccc.h
> @@ -107,6 +107,10 @@
> /* KVM "vendor specific" services */
> #define ARM_SMCCC_KVM_FUNC_FEATURES 0
> #define ARM_SMCCC_KVM_FUNC_PTP 1
> +#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO 2
> +#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL 3
> +#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP 4
> +#define ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP 5
> #define ARM_SMCCC_KVM_FUNC_FEATURES_2 127
> #define ARM_SMCCC_KVM_NUM_FUNCS 128
>
> @@ -133,6 +137,30 @@
> #define KVM_PTP_VIRT_COUNTER 0
> #define KVM_PTP_PHYS_COUNTER 1
>
> +#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID \
> + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
> + ARM_SMCCC_SMC_64, \
> + ARM_SMCCC_OWNER_VENDOR_HYP, \
> + ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO)
> +
> +#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID \
> + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
> + ARM_SMCCC_SMC_64, \
> + ARM_SMCCC_OWNER_VENDOR_HYP, \
> + ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL)
> +
> +#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID \
> + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
> + ARM_SMCCC_SMC_64, \
> + ARM_SMCCC_OWNER_VENDOR_HYP, \
> + ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP)
> +
> +#define ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID \
> + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
> + ARM_SMCCC_SMC_64, \
> + ARM_SMCCC_OWNER_VENDOR_HYP, \
> + ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP)
> +
> /* Paravirtualised time calls (defined by ARM DEN0057A) */
> #define ARM_SMCCC_HV_PV_TIME_FEATURES \
> ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
> --
> 2.30.2
>

Besides the refactoring suggestion,

Reviewed-by: Andrew Jones <[email protected]>

Thanks,
drew
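
A sketch of the refactoring suggested above (the helper name is made
up; ARM_SMCCC_IS_64() and vcpu_mode_is_32bit() are existing helpers):
classify the function ID once, instead of open-coding
!vcpu_mode_is_32bit() in every case.

#include <linux/arm-smccc.h>
#include <asm/kvm_emulate.h>

static bool kvm_hvc_call_allowed(struct kvm_vcpu *vcpu, u32 func_id)
{
        /* SMCCC bit 30 selects the 64-bit calling convention */
        if (ARM_SMCCC_IS_64(func_id) && vcpu_mode_is_32bit(vcpu))
                return false;

        return true;
}

kvm_hvc_call_handler() could then bail out (or return NOT_SUPPORTED)
before the switch, leaving the MMIO guard cases free of mode checks.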

2021-10-07 15:49:06

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 10/16] KVM: arm64: Add some documentation for the MMIO guard feature

On Mon, Oct 04, 2021 at 06:48:43PM +0100, Marc Zyngier wrote:
> Document the hypercalls used for the MMIO guard infrastructure.
>
> Signed-off-by: Marc Zyngier <[email protected]>
> ---
> Documentation/virt/kvm/arm/index.rst | 1 +
> Documentation/virt/kvm/arm/mmio-guard.rst | 74 +++++++++++++++++++++++
> 2 files changed, 75 insertions(+)
> create mode 100644 Documentation/virt/kvm/arm/mmio-guard.rst
>
> diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
> index 78a9b670aafe..e77a0ee2e2d4 100644
> --- a/Documentation/virt/kvm/arm/index.rst
> +++ b/Documentation/virt/kvm/arm/index.rst
> @@ -11,3 +11,4 @@ ARM
> psci
> pvtime
> ptp_kvm
> + mmio-guard
> diff --git a/Documentation/virt/kvm/arm/mmio-guard.rst b/Documentation/virt/kvm/arm/mmio-guard.rst
> new file mode 100644
> index 000000000000..8b3c852c5d92
> --- /dev/null
> +++ b/Documentation/virt/kvm/arm/mmio-guard.rst
> @@ -0,0 +1,74 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==============
> +KVM MMIO guard
> +==============
> +
> +KVM implements device emulation by handling translation faults to any
> +IPA range that is not contained in a memory slot. Such a translation
> +fault is in most cases passed on to userspace (or in rare cases to the
> +host kernel) with the address, size and possibly data of the access
> +for emulation.
> +
> +Should the guest exit with an address that is not one that corresponds
> +to an emulatable device, userspace may take measures that are not the
> +most graceful as far as the guest is concerned (such as terminating it
> +or delivering a fatal exception).
> +
> +There is also an element of trust: by forwarding the request to
> +userspace, the kernel assumes that the guest trusts userspace to do
> +the right thing.
> +
> +The KVM MMIO guard offers a way to mitigate this last point: a guest
> +can request that only certain regions of the IPA space are valid as
> +MMIO. Only these regions will be handled as MMIO, and any other
> +access will result in an exception being delivered to the guest.
> +
> +This relies on a set of hypercalls defined in the KVM-specific range,
> +using the HVC64 calling convention.
> +
> +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO
> +
> + ============== ======== ================================
> + Function ID: (uint32) 0xC6000002
> + Arguments: none
> + Return Values: (int64) NOT_SUPPORTED(-1) on error, or
> + (uint64) Protection Granule (PG) size in
> + bytes (r0)
> + ============== ======== ================================
> +
> +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL
> +
> + ============== ======== ==============================
> + Function ID: (uint32) 0xC6000003
> + Arguments: none
> + Return Values: (int64) NOT_SUPPORTED(-1) on error, or
> + RET_SUCCESS(0) (r0)
> + ============== ======== ==============================
> +
> +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP
> +
> + ============== ======== ====================================
> + Function ID: (uint32) 0xC6000004
> + Arguments: (uint64) The base of the PG-sized IPA range
> + that is allowed to be accessed as
> + MMIO. Must be aligned to the PG size
> + (r1)
> + (uint64) Index in the MAIR_EL1 register
> + providing the memory attribute that
> + is used by the guest (r2)
^^ some tabs got in here

> + Return Values: (int64) NOT_SUPPORTED(-1) on error, or
> + RET_SUCCESS(0) (r0)
> + ============== ======== ====================================
> +
> +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP
> +
> + ============== ======== ======================================
> + Function ID: (uint32) 0xC6000005
> + Arguments: (uint64) The base of a PG-sized IPA range
> + that was previously mapped. Must be
> + aligned to the PG size (r1)
> + Return Values: (int64) NOT_SUPPORTED(-1) on error, or
> + RET_SUCCESS(0) (r0)
> + ============== ======== ======================================
> --
> 2.30.2
>

Otherwise,

Reviewed-by: Andrew Jones <[email protected]>

Thanks,
drew
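
Translated into code, a round-trip through the two data-path
hypercalls would look like this (a sketch following the ABI tables
above; the helper names are made up and error handling is minimal):

#include <linux/arm-smccc.h>
#include <linux/errno.h>

static int mmio_guard_map_one(phys_addr_t ipa, u64 mair_idx)
{
        struct arm_smccc_res res;

        /* r1 = PG-aligned base, r2 = MAIR_EL1 index */
        arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID,
                          ipa, mair_idx, &res);
        return res.a0 == SMCCC_RET_SUCCESS ? 0 : -EINVAL;
}

static int mmio_guard_unmap_one(phys_addr_t ipa)
{
        struct arm_smccc_res res;

        /* r1 = previously mapped PG-aligned base */
        arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID,
                          ipa, &res);
        return res.a0 == SMCCC_RET_SUCCESS ? 0 : -EINVAL;
}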

2021-10-07 15:50:04

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 10/16] KVM: arm64: Add some documentation for the MMIO guard feature

On Mon, Oct 04, 2021 at 06:48:43PM +0100, Marc Zyngier wrote:
> Document the hypercalls used for the MMIO guard infrastructure.
>
> Signed-off-by: Marc Zyngier <[email protected]>
> ---
> Documentation/virt/kvm/arm/index.rst | 1 +
> Documentation/virt/kvm/arm/mmio-guard.rst | 74 +++++++++++++++++++++++
> 2 files changed, 75 insertions(+)
> create mode 100644 Documentation/virt/kvm/arm/mmio-guard.rst
>
> diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
> index 78a9b670aafe..e77a0ee2e2d4 100644
> --- a/Documentation/virt/kvm/arm/index.rst
> +++ b/Documentation/virt/kvm/arm/index.rst
> @@ -11,3 +11,4 @@ ARM
> psci
> pvtime
> ptp_kvm
> + mmio-guard
> diff --git a/Documentation/virt/kvm/arm/mmio-guard.rst b/Documentation/virt/kvm/arm/mmio-guard.rst
> new file mode 100644
> index 000000000000..8b3c852c5d92
> --- /dev/null
> +++ b/Documentation/virt/kvm/arm/mmio-guard.rst
> @@ -0,0 +1,74 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==============
> +KVM MMIO guard
> +==============
> +
> +KVM implements device emulation by handling translation faults to any
> +IPA range that is not contained in a memory slot. Such a translation
> +fault is in most cases passed on to userspace (or in rare cases to the
> +host kernel) with the address, size and possibly data of the access
> +for emulation.
> +
> +Should the guest exit with an address that is not one that corresponds
> +to an emulatable device, userspace may take measures that are not the
> +most graceful as far as the guest is concerned (such as terminating it
> +or delivering a fatal exception).
> +
> +There is also an element of trust: by forwarding the request to
> +userspace, the kernel assumes that the guest trusts userspace to do
> +the right thing.
> +
> +The KVM MMIO guard offers a way to mitigate this last point: a guest
> +can request that only certain regions of the IPA space are valid as
> +MMIO. Only these regions will be handled as MMIO, and any other
> +access will result in an exception being delivered to the guest.
> +
> +This relies on a set of hypercalls defined in the KVM-specific range,
> +using the HVC64 calling convention.
> +
> +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO
> +
> + ============== ======== ================================
> + Function ID: (uint32) 0xC6000002
> + Arguments: none
> + Return Values: (int64) NOT_SUPPORTED(-1) on error, or
> + (uint64) Protection Granule (PG) size in
> + bytes (r0)
> + ============== ======== ================================
> +
> +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL
> +
> + ============== ======== ==============================
> + Function ID: (uint32) 0xC6000003
> + Arguments: none
> + Return Values: (int64) NOT_SUPPORTED(-1) on error, or
> + RET_SUCCESS(0) (r0)
> + ============== ======== ==============================
> +
> +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP
> +
> + ============== ======== ====================================
> + Function ID: (uint32) 0xC6000004
> + Arguments: (uint64) The base of the PG-sized IPA range
> + that is allowed to be accessed as
> + MMIO. Must be aligned to the PG size
> + (r1)
> + (uint64) Index in the MAIR_EL1 register
> + providing the memory attribute that
> + is used by the guest (r2)

Already gave my r-b for this document, but after double checking I see
that this r2 argument doesn't appear to be used by the implementation
yet. Is this left over from an older design or reserved for later use?

Thanks,
drew


> + Return Values: (int64) NOT_SUPPORTED(-1) on error, or
> + RET_SUCCESS(0) (r0)
> + ============== ======== ====================================
> +
> +* ARM_SMCCC_KVM_FUNC_MMIO_GUARD_UNMAP
> +
> + ============== ======== ======================================
> + Function ID: (uint32) 0xC6000005
> + Arguments: (uint64) The base of a PG-sized IPA range
> + that was previously mapped. Must be
> + aligned to the PG size (r1)
> + Return Values: (int64) NOT_SUPPORTED(-1) on error, or
> + RET_SUCCESS(0) (r0)
> + ============== ======== ======================================
> --
> 2.30.2
>
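
Whatever the answer to the question above, a guest wanting to honour
the documented ABI could derive r2 from its own mapping attributes.
Assuming arm64's AttrIndx encoding in PTE bits [4:2]
(PTE_ATTRINDX_MASK) and that the usual mm headers are in scope, a
sketch would be:

#include <linux/bitfield.h>
#include <asm/pgtable-hwdef.h>

/* hypothetical helper: extract the MAIR_EL1 index from a pgprot */
static u64 mmio_guard_mair_index(pgprot_t prot)
{
        return FIELD_GET(PTE_ATTRINDX_MASK, pgprot_val(prot));
}

This is roughly the information patch 13 has in hand when it passes
prot in x2, which may explain the apparent mismatch.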

2021-10-07 15:50:32

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 09/16] KVM: arm64: Advertise a capability for MMIO guard

On Mon, Oct 04, 2021 at 06:48:42PM +0100, Marc Zyngier wrote:
> In order for userspace to find out whether the MMIO guard is
> exposed to a guest, expose a capability that says so.
>
> We take this opportunity to make it incompatible with the NISV
> option, as that would be rather counter-productive!
>
> Signed-off-by: Marc Zyngier <[email protected]>
> ---
> arch/arm64/kvm/arm.c | 29 ++++++++++++++++++-----------
> arch/arm64/kvm/hypercalls.c | 14 ++++++++++++--
> include/uapi/linux/kvm.h | 1 +
> 3 files changed, 31 insertions(+), 13 deletions(-)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index ed9c89ec0b4f..1c9a7abe2728 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -81,32 +81,33 @@ int kvm_arch_check_processor_compat(void *opaque)
> int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> struct kvm_enable_cap *cap)
> {
> - int r;
> + int r = -EINVAL;
>
> if (cap->flags)
> return -EINVAL;
>
> + mutex_lock(&kvm->lock);
> +
> switch (cap->cap) {
> case KVM_CAP_ARM_NISV_TO_USER:
> - r = 0;
> - set_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
> - &kvm->arch.flags);
> + /* This is incompatible with MMIO guard */
> + if (!test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &kvm->arch.flags)) {

But KVM_ARCH_FLAG_MMIO_GUARD will never be set at VM creation time, which
is the traditional time to probe and enable capabilities: the guest
hasn't run yet, so it hasn't had a chance to issue the hypercall that
enables the MMIO guard.

> + r = 0;
> + set_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
> + &kvm->arch.flags);
> + }
> break;
> case KVM_CAP_ARM_MTE:
> - mutex_lock(&kvm->lock);
> - if (!system_supports_mte() || kvm->created_vcpus) {
> - r = -EINVAL;
> - } else {
> + if (system_supports_mte() && !kvm->created_vcpus) {
> r = 0;
> set_bit(KVM_ARCH_FLAG_MTE_ENABLED, &kvm->arch.flags);
> }
> - mutex_unlock(&kvm->lock);
> break;
> default:
> - r = -EINVAL;
> break;
> }
>
> + mutex_unlock(&kvm->lock);
> return r;
> }
>
> @@ -211,13 +212,19 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> case KVM_CAP_IMMEDIATE_EXIT:
> case KVM_CAP_VCPU_EVENTS:
> case KVM_CAP_ARM_IRQ_LINE_LAYOUT_2:
> - case KVM_CAP_ARM_NISV_TO_USER:
> case KVM_CAP_ARM_INJECT_EXT_DABT:
> case KVM_CAP_SET_GUEST_DEBUG:
> case KVM_CAP_VCPU_ATTRIBUTES:
> case KVM_CAP_PTP_KVM:
> r = 1;
> break;
> + case KVM_CAP_ARM_NISV_TO_USER:
> + r = !test_bit(KVM_ARCH_FLAG_MMIO_GUARD, &kvm->arch.flags);
> + break;
> + case KVM_CAP_ARM_MMIO_GUARD:
> + r = !test_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
> + &kvm->arch.flags);
> + break;
> case KVM_CAP_SET_GUEST_DEBUG2:
> return KVM_GUESTDBG_VALID_MASK;
> case KVM_CAP_ARM_SET_DEVICE_ADDR:
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index c39aab55ecae..e4fade6a96f6 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -59,6 +59,14 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> val[3] = lower_32_bits(cycles);
> }
>
> +static bool mmio_guard_allowed(struct kvm_vcpu *vcpu)
> +{
> + return (!test_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER,
> + &vcpu->kvm->arch.flags) &&
> + !vcpu_mode_is_32bit(vcpu));
> +
> +}
> +
> int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> {
> u32 func_id = smccc_get_function(vcpu);
> @@ -131,7 +139,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
> val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
> /* Only advertise MMIO guard to 64bit guests */
> - if (!vcpu_mode_is_32bit(vcpu)) {
> + if (mmio_guard_allowed(vcpu)) {
> val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_INFO);
> val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_ENROLL);
> val[0] |= BIT(ARM_SMCCC_KVM_FUNC_MMIO_GUARD_MAP);
> @@ -146,10 +154,12 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
> val[0] = PAGE_SIZE;
> break;
> case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_ENROLL_FUNC_ID:
> - if (!vcpu_mode_is_32bit(vcpu)) {
> + mutex_lock(&vcpu->kvm->lock);
> + if (mmio_guard_allowed(vcpu)) {
> set_bit(KVM_ARCH_FLAG_MMIO_GUARD, &vcpu->kvm->arch.flags);
> val[0] = SMCCC_RET_SUCCESS;
> }
> + mutex_unlock(&vcpu->kvm->lock);
> break;
> case ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID:
> if (!vcpu_mode_is_32bit(vcpu) &&
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index a067410ebea5..ef171186e7be 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1112,6 +1112,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_BINARY_STATS_FD 203
> #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
> #define KVM_CAP_ARM_MTE 205
> +#define KVM_CAP_ARM_MMIO_GUARD 206
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> --
> 2.30.2
>

Thanks,
drew
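
From the VMM side, the interaction can be probed like this
(hypothetical userspace sketch; it assumes a kernel with this series
applied, since KVM_CAP_ARM_MMIO_GUARD only exists there):

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static void probe_mmio_caps(int vm_fd)
{
        int nisv = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_NISV_TO_USER);
        int guard = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_MMIO_GUARD);

        /* with this patch, enabling one makes the other read back 0 */
        printf("NISV-to-user: %d, MMIO guard: %d\n", nisv, guard);
}

At creation time the MMIO guard flag can never be set (the guest
enrols later, via hypercall), which is the asymmetry pointed out
above.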

2021-10-07 16:00:26

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 13/16] arm64: Implement ioremap/iounmap hooks calling into KVM's MMIO guard

On Mon, Oct 04, 2021 at 06:48:46PM +0100, Marc Zyngier wrote:
> Implement the previously defined ioremap/iounmap hooks for arm64,
> calling into KVM's MMIO guard if available.
>
> Signed-off-by: Marc Zyngier <[email protected]>
> ---
> arch/arm64/mm/ioremap.c | 112 ++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 112 insertions(+)
>
> diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
> index b7c81dacabf0..5334cbdc9f64 100644
> --- a/arch/arm64/mm/ioremap.c
> +++ b/arch/arm64/mm/ioremap.c
> @@ -9,13 +9,125 @@
> * Copyright (C) 2012 ARM Ltd.
> */
>
> +#define pr_fmt(fmt) "ioremap: " fmt
> +
> #include <linux/export.h>
> #include <linux/mm.h>
> #include <linux/vmalloc.h>
> +#include <linux/slab.h>
> #include <linux/io.h>
> +#include <linux/arm-smccc.h>
>
> #include <asm/fixmap.h>
> #include <asm/tlbflush.h>
> +#include <asm/hypervisor.h>
> +
> +struct ioremap_guard_ref {
> + refcount_t count;
> +};
> +
> +static DEFINE_STATIC_KEY_FALSE(ioremap_guard_key);
> +static DEFINE_XARRAY(ioremap_guard_array);
> +static DEFINE_MUTEX(ioremap_guard_lock);
> +
> +void ioremap_phys_range_hook(phys_addr_t phys_addr, size_t size, pgprot_t prot)
> +{
> + if (!static_branch_unlikely(&ioremap_guard_key))
> + return;
> +
> + if (pfn_valid(__phys_to_pfn(phys_addr)))
> + return;
> +
> + mutex_lock(&ioremap_guard_lock);
> +
> + while (size) {
> + u64 pfn = phys_addr >> PAGE_SHIFT;
> + struct ioremap_guard_ref *ref;
> + struct arm_smccc_res res;
> +
> + ref = xa_load(&ioremap_guard_array, pfn);
> + if (ref) {
> + refcount_inc(&ref->count);
> + goto next;
> + }
> +
> + /*
> + * It is acceptable for the allocation to fail, especially
> + * if trying to ioremap something very early on, like with
> + * earlycon, which happens long before kmem_cache_init.
> + * This page will be permanently accessible, similar to a
> + * saturated refcount.
> + */
> + ref = kzalloc(sizeof(*ref), GFP_KERNEL);
> + if (ref) {
> + refcount_set(&ref->count, 1);
> + if (xa_err(xa_store(&ioremap_guard_array, pfn, ref,
> + GFP_KERNEL))) {
> + kfree(ref);
> + ref = NULL;
> + }
> + }
> +
> + arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_MAP_FUNC_ID,
> + phys_addr, prot, &res);

OK, I see this follows the document and passes prot in x2, even though the
hypercall implementation doesn't look at it [yet].

> + if (res.a0 != SMCCC_RET_SUCCESS) {
> + pr_warn_ratelimited("Failed to register %llx\n",
> + phys_addr);
> + xa_erase(&ioremap_guard_array, pfn);
> + kfree(ref);
> + goto out;
> + }
> +
> + next:
> + size -= PAGE_SIZE;
> + phys_addr += PAGE_SIZE;

Looks like we're assuming the guard granule to be PAGE_SIZE here. Looking
ahead at the next patch, I see it must be PAGE_SIZE, because if the info
hypercall doesn't have a matching value, then mmio guarding doesn't
happen at all. Maybe it should be documented that for this feature the
host and guest must have matching page sizes.

> + }
> +out:
> + mutex_unlock(&ioremap_guard_lock);
> +}
> +
> +void iounmap_phys_range_hook(phys_addr_t phys_addr, size_t size)
> +{
> + if (!static_branch_unlikely(&ioremap_guard_key))
> + return;
> +
> + VM_BUG_ON(phys_addr & ~PAGE_MASK || size & ~PAGE_MASK);
> +
> + mutex_lock(&ioremap_guard_lock);
> +
> + while (size) {
> + u64 pfn = phys_addr >> PAGE_SHIFT;
> + struct ioremap_guard_ref *ref;
> + struct arm_smccc_res res;
> +
> + ref = xa_load(&ioremap_guard_array, pfn);
> + if (!ref) {
> + pr_warn_ratelimited("%llx not tracked, left mapped\n",
> + phys_addr);
> + goto next;
> + }
> +
> + if (!refcount_dec_and_test(&ref->count))
> + goto next;
> +
> + xa_erase(&ioremap_guard_array, pfn);
> + kfree(ref);
> +
> + arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_UNMAP_FUNC_ID,
> + phys_addr, &res);
> + if (res.a0 != SMCCC_RET_SUCCESS) {
> + pr_warn_ratelimited("Failed to unregister %llx\n",
> + phys_addr);
> + goto out;
> + }
> +
> + next:
> + size -= PAGE_SIZE;
> + phys_addr += PAGE_SIZE;
> + }
> +out:
> + mutex_unlock(&ioremap_guard_lock);
> +}
>
> static void __iomem *__ioremap_caller(phys_addr_t phys_addr, size_t size,
> pgprot_t prot, void *caller)
> --
> 2.30.2
>

Thanks,
drew
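
The matching-page-size requirement is enforced on the guest side by
the INFO hypercall; a sketch of that gate (patch 14 is not quoted in
this thread, so take the exact shape as an assumption):

#include <linux/arm-smccc.h>

static bool mmio_guard_granule_ok(void)
{
        struct arm_smccc_res res;

        arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_KVM_MMIO_GUARD_INFO_FUNC_ID,
                          &res);
        /* NOT_SUPPORTED(-1) or a foreign granule both fail the check */
        return res.a0 == PAGE_SIZE;
}

When this check fails the guest simply never enrols, and the hooks
above stay behind the disabled static key.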

2021-10-07 20:48:45

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v2 00/16] KVM: arm64: MMIO guard PV services

On Mon, Oct 04, 2021 at 06:48:33PM +0100, Marc Zyngier wrote:
> This is the second version of this series initially posted at [1] that
> aims at letting a guest express what it considers as MMIO, and only
> let this through to userspace. Together with the guest memory made
> (mostly) inaccessible to the host kernel and userspace, this allows an
> implementation of a hardened IO subsystem.
>
> A lot has been fixed/revamped/improved since the initial posting,
> although I am still not pleased with the ioremap plugging on the guest
> side. I'll take any idea to get rid of it!
>

Pros/cons of the hooks

Pros:
1) The VM only needs a kernel that supports the feature (and a
kernel command line that enables it)
2) All the ioremapped MMIO ranges are permitted immediately, rather than
deferred until some other event (which would probably be too late in
many cases)

Cons:
1) Having to have hooks, which is never pretty
2) Adding the hypcalls to each ioremap, which adds some overhead
3) Having to reference count all the mappings, which adds even more overhead
4) Not giving the owner of the VM control over what MMIO is permitted
(Well, maybe the VM owner just needs to blacklist drivers for anything
that it doesn't want.)

I don't think any of the cons are too bad, and Pro 2 more or less makes
the hooks a necessity.

Thanks,
drew