2017-08-25 08:32:20

by Florent Revest

Subject: [RFC 00/11] KVM, EFI, arm64: EFI Runtime Services Sandboxing

Hi,

This series implements a mechanism to sandbox EFI Runtime Services on arm64.
It can be enabled with CONFIG_EFI_SANDBOX. At boot it spawns an internal KVM
virtual machine that is run every time an EFI Runtime Service is called. This
limits the possible security and stability impact of the EFI runtime on the
kernel.

The patch set is split as follows:
- Patches 1 and 2: Give more control over HVC handling to KVM
- Patches 3 to 6: Introduce the concept of KVM "internal VMs"
- Patches 7 to 9: Reorder KVM and EFI initialization on ARM
- Patch 10: Introduce the EFI sandboxing VM and wrappers
- Patch 11: Work around some EFI Runtime Services relying on EL3

The sandboxing has been tested to work reliably (rtc and efivars) on a
SoftIron OverDrive 1000 box and on an ARMv8.3 model with VHE enabled. Normal
userspace KVM instances have also been tested to make sure they still work
correctly.

These patches apply cleanly on Linus' v4.13-rc6 tag and have no other
dependencies.

Florent Revest (11):
arm64: Add an SMCCC function IDs header
KVM: arm64: Return an Unknown ID on unhandled HVC
KVM: Allow VM lifecycle management without userspace
KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs
KVM: Expose VM/VCPU creation functions
KVM, arm64: Expose a VCPU initialization function
KVM: Allow initialization before the module target
KVM, arm, arm64: Initialize KVM's core earlier
EFI, arm, arm64: Enable EFI Runtime Services later
efi, arm64: Sandbox Runtime Services in a VM
KVM, arm64: Don't trap internal VMs SMC calls

arch/arm/include/asm/efi.h | 7 +
arch/arm/include/asm/kvm_coproc.h | 3 +
arch/arm/include/asm/kvm_host.h | 1 +
arch/arm/include/asm/kvm_psci.h | 1 +
arch/arm/kvm/coproc.c | 6 +
arch/arm/kvm/coproc_a15.c | 3 +-
arch/arm/kvm/coproc_a7.c | 3 +-
arch/arm64/include/asm/efi.h | 71 ++++
arch/arm64/include/asm/kvm_emulate.h | 3 +
arch/arm64/include/asm/kvm_host.h | 4 +
arch/arm64/include/asm/kvm_psci.h | 1 +
arch/arm64/kernel/asm-offsets.c | 3 +
arch/arm64/kvm/handle_exit.c | 27 +-
arch/arm64/kvm/sys_regs_generic_v8.c | 8 +-
arch/x86/include/asm/efi.h | 2 +
drivers/firmware/efi/Kconfig | 10 +
drivers/firmware/efi/Makefile | 1 +
drivers/firmware/efi/arm-runtime.c | 5 +-
drivers/firmware/efi/arm-sandbox-payload.S | 96 +++++
drivers/firmware/efi/arm-sandbox.c | 569 +++++++++++++++++++++++++++++
drivers/firmware/efi/efi.c | 3 +
include/linux/kvm_host.h | 4 +
include/linux/smccc_fn.h | 53 +++
include/uapi/linux/psci.h | 2 +
virt/kvm/arm/arm.c | 18 +-
virt/kvm/arm/mmu.c | 76 +++-
virt/kvm/arm/psci.c | 21 ++
virt/kvm/kvm_main.c | 102 ++++--
28 files changed, 1050 insertions(+), 53 deletions(-)
create mode 100644 drivers/firmware/efi/arm-sandbox-payload.S
create mode 100644 drivers/firmware/efi/arm-sandbox.c
create mode 100644 include/linux/smccc_fn.h

--
1.9.1



2017-08-25 08:32:32

by Florent Revest

Subject: [RFC 02/11] KVM: arm64: Return an Unknown ID on unhandled HVC

So far, when the KVM hypervisor received an HVC from a guest, it only
routed the hypercall to the PSCI call handler. If the function ID of the
hypercall was not supported by the PSCI code, a PSCI_RET_NOT_SUPPORTED
error code would be returned in x0.

This patch introduces a kvm_psci_is_call() check, performed before entering
the PSCI call handling code. The HVC is now only routed to the PSCI code if
its function ID falls within the ranges of PSCI functions defined by the
SMCCC (0x84000000-0x8400001f and 0xc4000000-0xc400001f).

If the function ID is not in those ranges, an Unknown Function Identifier
is returned in x0. This implements the behavior defined by the SMCCC and
paves the way for other HVC handlers.
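
For reference, the SMCCC lays out a function ID as follows:

  bit 31      1 for a Fast Call, 0 for a Yielding Call
  bit 30      calling convention: 0 for SMC32/HVC32, 1 for SMC64/HVC64
  bits 29:24  owning entity (0x04 is the Standard Secure Service range,
              to which PSCI belongs)
  bits 15:0   function number within the range

Hence 0x84000000-0x8400001f covers the 32-bit PSCI functions and
0xc4000000-0xc400001f their 64-bit counterparts.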

Signed-off-by: Florent Revest <[email protected]>
---
arch/arm/include/asm/kvm_psci.h | 1 +
arch/arm64/include/asm/kvm_psci.h | 1 +
arch/arm64/kvm/handle_exit.c | 24 ++++++++++++++++++------
include/uapi/linux/psci.h | 2 ++
virt/kvm/arm/psci.c | 21 +++++++++++++++++++++
5 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/kvm_psci.h b/arch/arm/include/asm/kvm_psci.h
index 6bda945..8dcd642 100644
--- a/arch/arm/include/asm/kvm_psci.h
+++ b/arch/arm/include/asm/kvm_psci.h
@@ -22,6 +22,7 @@
#define KVM_ARM_PSCI_0_2 2

int kvm_psci_version(struct kvm_vcpu *vcpu);
+bool kvm_psci_is_call(struct kvm_vcpu *vcpu);
int kvm_psci_call(struct kvm_vcpu *vcpu);

#endif /* __ARM_KVM_PSCI_H__ */
diff --git a/arch/arm64/include/asm/kvm_psci.h b/arch/arm64/include/asm/kvm_psci.h
index bc39e55..1a28809 100644
--- a/arch/arm64/include/asm/kvm_psci.h
+++ b/arch/arm64/include/asm/kvm_psci.h
@@ -22,6 +22,7 @@
#define KVM_ARM_PSCI_0_2 2

int kvm_psci_version(struct kvm_vcpu *vcpu);
+bool kvm_psci_is_call(struct kvm_vcpu *vcpu);
int kvm_psci_call(struct kvm_vcpu *vcpu);

#endif /* __ARM64_KVM_PSCI_H__ */
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 17d8a16..bc7ade5 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -21,6 +21,7 @@

#include <linux/kvm.h>
#include <linux/kvm_host.h>
+#include <linux/smccc_fn.h>

#include <asm/esr.h>
#include <asm/kvm_asm.h>
@@ -34,19 +35,30 @@

typedef int (*exit_handle_fn)(struct kvm_vcpu *, struct kvm_run *);

+/*
+ * handle_hvc - handle a guest hypercall
+ *
+ * @vcpu: the vcpu pointer
+ * @run: access to the kvm_run structure for results
+ *
+ * Route a given hypercall to the right HVC handler based on its function ID.
+ * If no corresponding handler is found, write an Unknown ID in x0 (cf. SMCCC).
+ *
+ * This function returns: > 0 (success), 0 (success but exit to user
+ * space), and < 0 (errors)
+ */
static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
- int ret;
+ int ret = 1;

trace_kvm_hvc_arm64(*vcpu_pc(vcpu), vcpu_get_reg(vcpu, 0),
kvm_vcpu_hvc_get_imm(vcpu));
vcpu->stat.hvc_exit_stat++;

- ret = kvm_psci_call(vcpu);
- if (ret < 0) {
- kvm_inject_undefined(vcpu);
- return 1;
- }
+ if (kvm_psci_is_call(vcpu))
+ ret = kvm_psci_call(vcpu);
+ else
+ vcpu_set_reg(vcpu, 0, SMCCC_STD_RET_UNKNOWN_ID);

return ret;
}
diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
index 3d7a0fc..79704fe 100644
--- a/include/uapi/linux/psci.h
+++ b/include/uapi/linux/psci.h
@@ -24,10 +24,12 @@
/* PSCI v0.2 interface */
#define PSCI_0_2_FN_BASE 0x84000000
#define PSCI_0_2_FN(n) (PSCI_0_2_FN_BASE + (n))
+#define PSCI_0_2_FN_END PSCI_0_2_FN(0x1F)
#define PSCI_0_2_64BIT 0x40000000
#define PSCI_0_2_FN64_BASE \
(PSCI_0_2_FN_BASE + PSCI_0_2_64BIT)
#define PSCI_0_2_FN64(n) (PSCI_0_2_FN64_BASE + (n))
+#define PSCI_0_2_FN64_END PSCI_0_2_FN64(0x1F)

#define PSCI_0_2_FN_PSCI_VERSION PSCI_0_2_FN(0)
#define PSCI_0_2_FN_CPU_SUSPEND PSCI_0_2_FN(1)
diff --git a/virt/kvm/arm/psci.c b/virt/kvm/arm/psci.c
index f1e363b..9602894 100644
--- a/virt/kvm/arm/psci.c
+++ b/virt/kvm/arm/psci.c
@@ -332,3 +332,24 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
return -EINVAL;
};
}
+
+/**
+ * kvm_psci_is_call - checks if a HVC function ID is in a PSCI range
+ * @vcpu: Pointer to the VCPU struct
+ *
+ * When a hypercall is received from a guest, the SMCCC defines a function ID
+ * as a value to be put in x0 to identify the destination of the call. The same
+ * document defines ranges of function IDs to be used by PSCI. This function
+ * checks whether a given vcpu is requesting a PSCI related handler.
+ *
+ * This function returns:
+ * - true if this HVC should be handled by kvm_psci_call
+ * - false if it shouldn't
+ */
+inline bool kvm_psci_is_call(struct kvm_vcpu *vcpu)
+{
+ unsigned long fn = vcpu_get_reg(vcpu, 0) & ~((u32) 0);
+
+ return ((fn >= PSCI_0_2_FN_BASE && fn <= PSCI_0_2_FN_END) ||
+ (fn >= PSCI_0_2_FN64_BASE && fn <= PSCI_0_2_FN64_END));
+}
--
1.9.1


2017-08-25 08:32:51

by Florent Revest

Subject: [RFC 05/11] KVM: Expose VM/VCPU creation functions

Now that KVM is capable of creating internal virtual machines, the rest of
the kernel needs an API to access this capability.

This patch exposes two functions for VM and VCPU creation in kvm_host.h:
- kvm_create_internal_vm: ensures that kvm->mm is kept NULL at VM creation
- kvm_vm_create_vcpu: a simple alias of kvm_vm_ioctl_create_vcpu, for clarity
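
For illustration, an in-kernel user could create such a VM roughly as in the
sketch below. kvm_get_vcpu_by_id is the existing helper from kvm_host.h;
error handling is elided and the return-value conventions are assumptions:

static struct kvm_vcpu *spawn_internal_vcpu(void)
{
	struct kvm *kvm;

	kvm = kvm_create_internal_vm(0);	/* kvm->mm stays NULL */
	if (IS_ERR(kvm))
		return NULL;

	if (kvm_vm_create_vcpu(kvm, 0) < 0)	/* VCPU id 0 */
		return NULL;

	return kvm_get_vcpu_by_id(kvm, 0);
}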

Signed-off-by: Florent Revest <[email protected]>
---
include/linux/kvm_host.h | 3 +++
virt/kvm/kvm_main.c | 10 ++++++++++
2 files changed, 13 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 21a6fd6..dd10d3b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -565,6 +565,9 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
struct module *module);
void kvm_exit(void);

+struct kvm *kvm_create_internal_vm(unsigned long type);
+int kvm_vm_create_vcpu(struct kvm *kvm, u32 id);
+
void kvm_get_kvm(struct kvm *kvm);
void kvm_put_kvm(struct kvm *kvm);

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2e7af1a..c1c8bb6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -733,6 +733,11 @@ static struct kvm *kvm_create_vm(unsigned long type, struct mm_struct *mm)
return ERR_PTR(r);
}

+struct kvm *kvm_create_internal_vm(unsigned long type)
+{
+ return kvm_create_vm(type, NULL);
+}
+
static void kvm_destroy_devices(struct kvm *kvm)
{
struct kvm_device *dev, *tmp;
@@ -2549,6 +2554,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
return r;
}

+int kvm_vm_create_vcpu(struct kvm *kvm, u32 id)
+{
+ return kvm_vm_ioctl_create_vcpu(kvm, id);
+}
+
static int kvm_vcpu_ioctl_set_sigmask(struct kvm_vcpu *vcpu, sigset_t *sigset)
{
if (sigset) {
--
1.9.1


2017-08-25 08:33:02

by Florent Revest

Subject: [RFC 09/11] EFI, arm, arm64: Enable EFI Runtime Services later

EFI Runtime Services on ARM are enabled very early in the boot process
although they aren't used until substantially later. This patch modifies
the EFI initialization sequence on ARM to enable Runtime Services just
before they are effectively needed (from a subsys initcall instead of an
early initcall).

The reason behind this change is that, eventually, a late Runtime Services
initialization could take advantage of KVM's internal virtual machines to
sandbox firmware code execution. Since KVM's core is only available
starting from the subsys target, this reordering is a prerequisite.
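
For context, built-in initcalls run in the following order, so a subsys
initcall still runs well before the module_init (device) level used for
KVM's userspace files later in this series:

  early -> pure (0) -> core (1) -> postcore (2) -> arch (3) -> subsys (4)
        -> fs (5) -> device (6, i.e. module_init when built-in) -> late (7)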

Signed-off-by: Florent Revest <[email protected]>
---
arch/arm/include/asm/efi.h | 2 ++
arch/arm64/include/asm/efi.h | 2 ++
arch/x86/include/asm/efi.h | 2 ++
drivers/firmware/efi/arm-runtime.c | 3 +--
drivers/firmware/efi/efi.c | 3 +++
5 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/efi.h b/arch/arm/include/asm/efi.h
index 17f1f1a..ed575ae 100644
--- a/arch/arm/include/asm/efi.h
+++ b/arch/arm/include/asm/efi.h
@@ -35,6 +35,8 @@
__f(args); \
})

+int efi_arch_late_enable_runtime_services(void);
+
#define ARCH_EFI_IRQ_FLAGS_MASK \
(PSR_J_BIT | PSR_E_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT | \
PSR_T_BIT | MODE_MASK)
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index 8f3043a..373d94d 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -37,6 +37,8 @@
kernel_neon_end(); \
})

+int efi_arch_late_enable_runtime_services(void);
+
#define ARCH_EFI_IRQ_FLAGS_MASK (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)

/* arch specific definitions used by the stub code */
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 796ff6c..869efbb 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -233,6 +233,8 @@ static inline bool efi_is_64bit(void)

extern bool efi_reboot_required(void);

+static inline int efi_arch_late_enable_runtime_services(void) { return 0; }
+
#else
static inline void parse_efi_setup(u64 phys_addr, u32 data_len) {}
static inline bool efi_reboot_required(void)
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index 1cc41c3..d94d240 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -115,7 +115,7 @@ static bool __init efi_virtmap_init(void)
* non-early mapping of the UEFI system table and virtual mappings for all
* EFI_MEMORY_RUNTIME regions.
*/
-static int __init arm_enable_runtime_services(void)
+int __init efi_arch_late_enable_runtime_services(void)
{
u64 mapsize;

@@ -154,7 +154,6 @@ static int __init arm_enable_runtime_services(void)

return 0;
}
-early_initcall(arm_enable_runtime_services);

void efi_virtmap_load(void)
{
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 045d6d3..2b447b4 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -33,6 +33,7 @@
#include <linux/memblock.h>

#include <asm/early_ioremap.h>
+#include <asm/efi.h>

struct efi __read_mostly efi = {
.mps = EFI_INVALID_TABLE_ADDR,
@@ -304,6 +305,8 @@ static int __init efisubsys_init(void)
{
int error;

+ efi_arch_late_enable_runtime_services();
+
if (!efi_enabled(EFI_BOOT))
return 0;

--
1.9.1


2017-08-25 08:33:34

by Florent Revest

Subject: [RFC 11/11] KVM, arm64: Don't trap internal VMs SMC calls

Internal virtual machines can be used to sandbox code such as EFI Runtime
Services. However, some implementations of those Runtime Services rely on
handlers placed in the Secure World (e.g. on the SoftIron OverDrive 1000)
and need access to SMC calls.

This patch modifies the Hypervisor Configuration Register to avoid trapping
SMC calls of internal virtual machines. Normal userspace VMs are not
affected by this patch.

Note: Letting Runtime Services VMs access EL3 without control can
potentially be a security threat on its own. An alternative would be to
forward SMC calls selectively from inside handle_smc, as sketched below.
However, this would require some level of knowledge of the SMC call
arguments and of the EFI Runtime Services implementations.
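
Such a selective forwarder could look roughly like the following sketch.
smc_fn_is_forwardable() is a hypothetical whitelist helper, only the first
four argument registers are shown, and <linux/arm-smccc.h> is assumed to be
included:

static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
	unsigned long fn = vcpu_get_reg(vcpu, 0);
	struct arm_smccc_res res;

	if (!vcpu->kvm->mm && smc_fn_is_forwardable(fn)) {
		/* Proxy the whitelisted call to EL3 for the internal VM */
		arm_smccc_smc(fn, vcpu_get_reg(vcpu, 1),
			      vcpu_get_reg(vcpu, 2), vcpu_get_reg(vcpu, 3),
			      0, 0, 0, 0, &res);
		vcpu_set_reg(vcpu, 0, res.a0);
		vcpu_set_reg(vcpu, 1, res.a1);
		vcpu_set_reg(vcpu, 2, res.a2);
		vcpu_set_reg(vcpu, 3, res.a3);
	} else {
		vcpu_set_reg(vcpu, 0, SMCCC_STD_RET_UNKNOWN_ID);
	}

	/* A trapped SMC does not advance the PC, skip the instruction */
	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
	return 1;
}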

Signed-off-by: Florent Revest <[email protected]>
---
arch/arm64/include/asm/kvm_emulate.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index fe39e68..4b46cd0 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -49,6 +49,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
vcpu->arch.hcr_el2 |= HCR_E2H;
if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
vcpu->arch.hcr_el2 &= ~HCR_RW;
+
+ if (!vcpu->kvm->mm)
+ vcpu->arch.hcr_el2 &= ~HCR_TSC;
}

static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu)
--
1.9.1


2017-08-25 08:33:29

by Florent Revest

Subject: [RFC 10/11] efi, arm64: Sandbox Runtime Services in a VM

EFI Runtime Services are binary blobs currently executed in a special
memory context but with the privileges of the kernel. This can potentially
cause security or stability issues (register corruption, for example).

This patch adds a CONFIG_EFI_SANDBOX option that can be used on arm64 to
enclose the Runtime Services in a virtual machine and limit the impact they
can potentially have on the kernel. This sandboxing can also be useful for
debugging, as exceptions caused by the firmware code can be recovered from
and examined.

When booting the machine, an internal KVM virtual machine is created with
physical and virtual addresses mirroring the host's EFI context.

One page of code and at least 16K of data pages are kept in low memory for
use by an internal (in-VM) assembly function call wrapper
(efi_sandbox_wrapper). This internal wrapper is called from external C
function wrappers (e.g. efi_sandbox_get_next_variable) which first fill the
VCPU registers with arguments and the data page with copies of memory
buffers.

When a Runtime Service returns, the internal wrapper issues an HVC to let
the host know the EFI status return value in x1. In case of an exception,
an internal handler also sends an HVC, with an EFI_ABORTED error code.

Details of the VCPU initialization, VM memory mapping and service call/ret
are extensively documented in arm-sandbox.c and arm-sandbox-payload.S.

Note: The current version of this patch could potentially cause argument
alignment problems when calling an EFI Runtime Service. Indeed, the buffer
arguments are just pushed onto a stack in a data page without any regard
for the ARM calling convention alignments.
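
For clarity, the complete path of a sandboxed call (taking get_time as an
example) is:

  efi.get_time()                       generic EFI code
   -> arch_efi_call_virt(p, get_time)  resolves to efi_sandbox_get_time
    -> efi_sandbox_push_arg_ptr()      copy buffer args into the data page
    -> efi_sandbox_call_iwrapper()     set x10/x11 and the PC, run the VCPU
     -> efi_sandbox_wrapper            in-VM: blr to the Runtime Service
     <- hvc #0                         SMCCC_FN_EFI_SANDBOX_RET, status in x1
    <- efi_sandbox_pop_arg_ptr()       copy buffers back for the host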

Signed-off-by: Florent Revest <[email protected]>
---
arch/arm/include/asm/efi.h | 5 +
arch/arm64/include/asm/efi.h | 69 ++++
arch/arm64/kernel/asm-offsets.c | 3 +
arch/arm64/kvm/handle_exit.c | 3 +
drivers/firmware/efi/Kconfig | 10 +
drivers/firmware/efi/Makefile | 1 +
drivers/firmware/efi/arm-runtime.c | 2 +
drivers/firmware/efi/arm-sandbox-payload.S | 96 +++++
drivers/firmware/efi/arm-sandbox.c | 569 +++++++++++++++++++++++++++++
include/linux/smccc_fn.h | 3 +
10 files changed, 761 insertions(+)
create mode 100644 drivers/firmware/efi/arm-sandbox-payload.S
create mode 100644 drivers/firmware/efi/arm-sandbox.c

diff --git a/arch/arm/include/asm/efi.h b/arch/arm/include/asm/efi.h
index ed575ae..524f0dd 100644
--- a/arch/arm/include/asm/efi.h
+++ b/arch/arm/include/asm/efi.h
@@ -35,6 +35,11 @@
__f(args); \
})

+struct kvm_vcpu;
+static inline void efi_arm_sandbox_init(struct mm_struct *efi_mm) { }
+static inline bool efi_sandbox_is_exit(struct kvm_vcpu *vcpu) { return false; }
+static inline int efi_sandbox_exit(struct kvm_vcpu *vcpu) { return -1; }
+
int efi_arch_late_enable_runtime_services(void);

#define ARCH_EFI_IRQ_FLAGS_MASK \
diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index 373d94d..f1c33cd 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -1,6 +1,8 @@
#ifndef _ASM_EFI_H
#define _ASM_EFI_H

+#include <linux/efi.h>
+
#include <asm/boot.h>
#include <asm/cpufeature.h>
#include <asm/io.h>
@@ -18,6 +20,10 @@
int efi_create_mapping(struct mm_struct *mm, efi_memory_desc_t *md);
int efi_set_mapping_permissions(struct mm_struct *mm, efi_memory_desc_t *md);

+struct kvm_vcpu;
+
+#ifndef CONFIG_EFI_SANDBOX
+
#define arch_efi_call_virt_setup() \
({ \
kernel_neon_begin(); \
@@ -37,6 +43,69 @@
kernel_neon_end(); \
})

+static inline void efi_arm_sandbox_init(struct mm_struct *efi_mm) { }
+static inline bool efi_sandbox_is_exit(struct kvm_vcpu *vcpu) { return false; }
+static inline int efi_sandbox_exit(struct kvm_vcpu *vcpu) { return -1; }
+
+#else
+
+void efi_arm_sandbox_init(struct mm_struct *efi_mm);
+bool efi_sandbox_is_exit(struct kvm_vcpu *vcpu);
+int efi_sandbox_exit(struct kvm_vcpu *vcpu);
+
+void efi_sandbox_excpt(void);
+void efi_sandbox_wrapper(void);
+void efi_sandbox_vectors(void);
+
+#define arch_efi_call_virt_setup() ({})
+#define arch_efi_call_virt(p, f, args...) efi_sandbox_##f(args)
+#define arch_efi_call_virt_teardown() ({})
+
+/*
+ * The following function wrappers are needed in order to serialize the
+ * variadic macro's arguments (arch_efi_call_virt(p, f, args...)) into the
+ * vcpu's registers. p is ignored since it is available inside the VM.
+ */
+
+efi_status_t efi_sandbox_get_time(efi_time_t *tm,
+ efi_time_cap_t *tc);
+efi_status_t efi_sandbox_set_time(efi_time_t *tm);
+efi_status_t efi_sandbox_get_wakeup_time(efi_bool_t *enabled,
+ efi_bool_t *pending,
+ efi_time_t *tm);
+efi_status_t efi_sandbox_set_wakeup_time(efi_bool_t enabled,
+ efi_time_t *tm);
+efi_status_t efi_sandbox_get_variable(efi_char16_t *name,
+ efi_guid_t *vendor,
+ u32 *attr,
+ unsigned long *data_size,
+ void *data);
+efi_status_t efi_sandbox_get_next_variable(unsigned long *name_size,
+ efi_char16_t *name,
+ efi_guid_t *vendor);
+efi_status_t efi_sandbox_set_variable(efi_char16_t *name,
+ efi_guid_t *vendor,
+ u32 attr,
+ unsigned long data_size,
+ void *data);
+efi_status_t efi_sandbox_query_variable_info(u32 attr,
+ u64 *storage_space,
+ u64 *remaining_space,
+ u64 *max_variable_size);
+efi_status_t efi_sandbox_get_next_high_mono_count(u32 *count);
+efi_status_t efi_sandbox_reset_system(int reset_type,
+ efi_status_t status,
+ unsigned long data_size,
+ efi_char16_t *data);
+efi_status_t efi_sandbox_update_capsule(efi_capsule_header_t **capsules,
+ unsigned long count,
+ unsigned long sg_list);
+efi_status_t efi_sandbox_query_capsule_caps(efi_capsule_header_t **capsules,
+ unsigned long count,
+ u64 *max_size,
+ int *reset_type);
+#endif
+
int efi_arch_late_enable_runtime_services(void);

#define ARCH_EFI_IRQ_FLAGS_MASK (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index b3bb7ef..c6d7ef7a 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -31,6 +31,7 @@
#include <asm/vdso_datapage.h>
#include <linux/kbuild.h>
#include <linux/arm-smccc.h>
+#include <linux/efi.h>

int main(void)
{
@@ -153,5 +154,7 @@ int main(void)
DEFINE(HIBERN_PBE_ADDR, offsetof(struct pbe, address));
DEFINE(HIBERN_PBE_NEXT, offsetof(struct pbe, next));
DEFINE(ARM64_FTR_SYSVAL, offsetof(struct arm64_ftr_reg, sys_val));
+
+ DEFINE(EFI_SYSTAB_RUNTIME, offsetof(efi_system_table_t, runtime));
return 0;
}
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index bc7ade5..389402a 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -23,6 +23,7 @@
#include <linux/kvm_host.h>
#include <linux/smccc_fn.h>

+#include <asm/efi.h>
#include <asm/esr.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_coproc.h>
@@ -57,6 +58,8 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)

if (kvm_psci_is_call(vcpu))
ret = kvm_psci_call(vcpu);
+ else if (efi_sandbox_is_exit(vcpu))
+ ret = efi_sandbox_exit(vcpu);
else
vcpu_set_reg(vcpu, 0, SMCCC_STD_RET_UNKNOWN_ID);

diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig
index 394db40..a441acc 100644
--- a/drivers/firmware/efi/Kconfig
+++ b/drivers/firmware/efi/Kconfig
@@ -151,6 +151,16 @@ config APPLE_PROPERTIES

If unsure, say Y if you have a Mac. Otherwise N.

+config EFI_SANDBOX
+ bool "EFI Runtime Services sandboxing"
+ depends on EFI && KVM && ARM64
+ default n
+ help
+ Select this option to sandbox Runtime Services in a virtual machine
+ and limit their possible impact on security and stability.
+
+ Say Y here to enable the runtime services sandbox. If unsure, say N.
+
endmenu

config UEFI_CPER
diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile
index 0329d31..2da9ad4 100644
--- a/drivers/firmware/efi/Makefile
+++ b/drivers/firmware/efi/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_EFI_DEV_PATH_PARSER) += dev-path-parser.o
obj-$(CONFIG_APPLE_PROPERTIES) += apple-properties.o

arm-obj-$(CONFIG_EFI) := arm-init.o arm-runtime.o
+arm-obj-$(CONFIG_EFI_SANDBOX) += arm-sandbox.o arm-sandbox-payload.o
obj-$(CONFIG_ARM) += $(arm-obj-y)
obj-$(CONFIG_ARM64) += $(arm-obj-y)
obj-$(CONFIG_EFI_CAPSULE_LOADER) += capsule-loader.o
diff --git a/drivers/firmware/efi/arm-runtime.c b/drivers/firmware/efi/arm-runtime.c
index d94d240..81ce0de 100644
--- a/drivers/firmware/efi/arm-runtime.c
+++ b/drivers/firmware/efi/arm-runtime.c
@@ -107,6 +107,8 @@ static bool __init efi_virtmap_init(void)
if (efi_memattr_apply_permissions(&efi_mm, efi_set_mapping_permissions))
return false;

+ efi_arm_sandbox_init(&efi_mm);
+
return true;
}

diff --git a/drivers/firmware/efi/arm-sandbox-payload.S b/drivers/firmware/efi/arm-sandbox-payload.S
new file mode 100644
index 0000000..0f34576
--- /dev/null
+++ b/drivers/firmware/efi/arm-sandbox-payload.S
@@ -0,0 +1,96 @@
+/*
+ * Copyright (C) 2017 ARM Ltd.
+ * Author: Florent Revest <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/linkage.h>
+#include <linux/smccc_fn.h>
+
+#include <asm/assembler.h>
+#include <asm/asm-offsets.h>
+
+#define EFI_ABORTED 0x8000000000000015
+
+/*
+ * The following assembly is meant to be a page of payload code mapped at the
+ * address 0x0 of the EFI Runtime Services sandbox memory. Hence, align to page:
+ */
+.align PAGE_SHIFT
+
+/*
+ * efi_sandbox_excpt - Exception handler
+ *
+ * Exit the virtual machine cleanly with an EFI status that allows recovering
+ * and debugging in case of an exception.
+ */
+ENTRY(efi_sandbox_excpt)
+ ldr x1, =EFI_ABORTED /* Error code for debugging */
+ mov x0, SMCCC_FN_EFI_SANDBOX_RET /* Fill the Function ID reg */
+ hvc #0 /* Notify the host of the ret */
+
+efi_sandbox_excpt_safeguard:
+ b efi_sandbox_excpt_safeguard /* Should never be executed */
+ENDPROC(efi_sandbox_excpt)
+
+/*
+ * efi_sandbox_wrapper - Runtime Service internal wrapper
+ *
+ * When called, the registers are expected to be in the following state:
+ * - x0, x1, x2...: EFI parameters according to the ARM calling convention
+ * - x10: Virtual address of the UEFI System Table
+ * - x11: Offset of the wanted service in the runtime services table
+ *
+ * When this wrapper returns with an HVC, the registers should be as follow:
+ * - x0: HVC Function ID (SMCCC_FN_EFI_SANDBOX_RET) according to SMCCC
+ * - x1: The EFI status returned by the runtime service
+ */
+ENTRY(efi_sandbox_wrapper)
+ ldr x10, [x10, #EFI_SYSTAB_RUNTIME] /* Get systab.runtime */
+ ldr x10, [x10, x11] /* Get function pointer */
+
+ blr x10 /* Call the runtime service */
+
+ mov x1, x0 /* Keep ret in x1 */
+ mov x0, SMCCC_FN_EFI_SANDBOX_RET /* Fill the Function ID reg */
+ hvc #0 /* Notify the host of the ret */
+
+efi_sandbox_wrapper_safeguard:
+ b efi_sandbox_wrapper_safeguard /* Should never be executed */
+ENDPROC(efi_sandbox_wrapper)
+
+/*
+ * efi_sandbox_vectors - Exception vectors table
+ *
+ * Always call efi_sandbox_excpt (mapped at 0x0) in case of an exception.
+ */
+.align 11
+ENTRY(efi_sandbox_vectors)
+ ventry efi_sandbox_excpt
+ ventry efi_sandbox_excpt
+ ventry efi_sandbox_excpt
+ ventry efi_sandbox_excpt
+
+ ventry efi_sandbox_excpt
+ ventry efi_sandbox_excpt
+ ventry efi_sandbox_excpt
+ ventry efi_sandbox_excpt
+
+ ventry efi_sandbox_excpt
+ ventry efi_sandbox_excpt
+ ventry efi_sandbox_excpt
+ ventry efi_sandbox_excpt
+
+ ventry efi_sandbox_excpt
+ ventry efi_sandbox_excpt
+ ventry efi_sandbox_excpt
+ ventry efi_sandbox_excpt
+END(efi_sandbox_vectors)
diff --git a/drivers/firmware/efi/arm-sandbox.c b/drivers/firmware/efi/arm-sandbox.c
new file mode 100644
index 0000000..009037d
--- /dev/null
+++ b/drivers/firmware/efi/arm-sandbox.c
@@ -0,0 +1,569 @@
+/*
+ * Copyright (C) 2017 ARM Ltd.
+ * Author: Florent Revest <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/smccc_fn.h>
+
+#include <asm/efi.h>
+#include <asm/pgtable-hwdef.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+#include <asm/sysreg.h>
+
+/*
+ * RUNTIME SERVICES SANDBOX
+ *
+ * If CONFIG_EFI_SANDBOX is enabled, an internal virtual machine with one VCPU
+ * is spawned using KVM early in the boot process. The memory (both stage 2 and
+ * stage 1) and registers (general purpose, system, etc.) are set up from the
+ * host code in efi_arm_sandbox_init to prepare a suitable execution environment
+ * for EFI. The VCPU's system registers (SCTLR & TCR) are configured as follows.
+ */
+
+static struct kvm_vcpu *sandbox_vcpu;
+
+/* Translation Control Register - Enable TTBR0 only, with host's page size */
+
+#ifdef CONFIG_ARM64_64K_PAGES
+#define EFI_SANDBOX_TCR_TG TCR_TG0_64K
+#elif defined(CONFIG_ARM64_16K_PAGES)
+#define EFI_SANDBOX_TCR_TG TCR_TG0_16K
+#else /* CONFIG_ARM64_4K_PAGES */
+#define EFI_SANDBOX_TCR_TG TCR_TG0_4K
+#endif
+
+#define EFI_SANDBOX_TCR_EPD1 (1UL << 23)
+
+#define EFI_SANDBOX_TCR_IPS ((read_sysreg(id_aa64mmfr0_el1) & 7) << 32)
+
+#define EFI_SANDBOX_TCR (TCR_T0SZ(VA_BITS) | TCR_IRGN0_WBWA | \
+ TCR_ORGN0_WBWA | TCR_SH0_INNER | \
+ EFI_SANDBOX_TCR_EPD1 | EFI_SANDBOX_TCR_TG | \
+ EFI_SANDBOX_TCR_IPS)
+
+/* System Control Register - Enables MMU, Cache, Stack Alignment and I-Cache */
+
+#define SCTLR_EL1_RES1 ((1UL << 11) | (1UL << 20) | (1UL << 22) | \
+ (1UL << 23) | (1UL << 28) | (1UL << 29))
+#define EFI_SANDBOX_SCTLR (SCTLR_ELx_M | SCTLR_ELx_C | SCTLR_ELx_SA | \
+ SCTLR_ELx_I | SCTLR_EL1_RES1)
+
+/*
+ * GUEST'S PHYSICAL MEMORY MAPPING
+ *
+ * Quite a few pages are idmapped from the host physical memory to the guest
+ * intermediary physical memory. The addresses of the GPAs are then those of the
+ * HPAs. Those pages cover:
+ *
+ * - The code page:
+ * A code payload (the content of arm-sandbox-payload.S) offers an internal
+ * wrapper to call a Runtime Service then leave the VM with a hypercall and
+ * also an exception handler.
+ *
+ * - The data page(s):
+ * A range of writable memory is reserved to pass internal copies of buffer
+ * args (maintained as a stack growing positively) and also for the VCPU's own
+ * stack (growing negatively). The EFI Runtime Services need a VCPU stack size
+ * of 8K, so the total writable memory must be at least 16K long which can
+ * represent a different number of pages depending on our PAGE_SIZE.
+ *
+ * - The page tables:
+ * Since the host takes care of the guest's stage1 virtual mapping (cf. next
+ * section), stage1 page tables need to be created from the host kernel and
+ * then mapped into the guest.
+ *
+ * - Various EFI segments:
+ * The rest of the physical memory can be made of any number of memory ranges
+ * given by EFI memory descriptors at boot time. Those regions can possibly be
+ * non-contiguous and cover everything needed to run the Runtime Services (i.e.
+ * the code, data and IO).
+ *
+ * The above stage2 mapping is achieved using several memslots, one for each
+ * contiguous area of idmapped memory.
+ */
+
+static void *efi_data_pages;
+#define EFI_SDBX_DATA_PAGES_SIZE PAGE_ALIGN(SZ_16K)
+
+#define EFI_SDBX_BASE_MEMSLOT KVM_USER_MEM_SLOTS
+
+static void idmap_hpa_to_gpa(struct kvm *kvm, unsigned long sz, phys_addr_t pa)
+{
+ static int slot_nb = EFI_SDBX_BASE_MEMSLOT;
+ struct kvm_userspace_memory_region mem = {
+ .slot = slot_nb,
+ .memory_size = sz,
+ .guest_phys_addr = pa,
+ };
+
+ kvm_set_memory_region(kvm, &mem);
+
+ slot_nb++;
+}
+
+/*
+ * GUEST'S VIRTUAL MEMORY MAPPING
+ *
+ * 0x0 +------+
+ * | Code | The code and data pages can be accessed from the
+ * PAGE_SIZE +------+ virtual low memory. The code page is the first page of
+ * | Data | VM and the data pages follow starting from the 2nd page
+ * ... +------+
+ * | // |
+ * Higher mem +------+
+ * | EFI2 | The rest of the memory is still dedicated to EFI. The
+ * +------+ mapping is identical to the virtual mapping used by the
+ * | // | host, so the kernel's EFI arch specific code should
+ * +------+ already have configured EFI for this mapping using a
+ * | EFI1 | call to SetVirtualAddressMap (SVAM).
+ * +------+
+ * | // |
+ * +------+
+ *
+ * The above stage1 is achieved by extending, mapping and using efi_mm in the VM
+ */
+
+#define EFI_SDBX_CODE_PAGE_GVA 0x0
+#define EFI_SDBX_EXCEPTION_GVA EFI_SDBX_CODE_PAGE_GVA
+#define EFI_SDBX_WRAPPER_GVA (efi_sandbox_wrapper-efi_sandbox_excpt)
+#define EFI_SDBX_VECTORS_GVA (efi_sandbox_vectors-efi_sandbox_excpt)
+
+#define EFI_SDBX_DATA_PAGE_GVA PAGE_SIZE
+#define EFI_SDBX_STACK_POINTER_GVA (PAGE_SIZE + EFI_SDBX_DATA_PAGES_SIZE)
+
+static void map_ptes_to_guest(struct kvm *kvm, pmd_t *pmd)
+{
+ pte_t *pte = pte_offset_kernel(pmd, 0UL);
+
+ idmap_hpa_to_gpa(kvm, PTRS_PER_PTE * sizeof(pte), virt_to_phys(pte));
+}
+
+static void map_pmds_to_guest(struct kvm *kvm, pud_t *pud)
+{
+ pmd_t *pmd = pmd_offset(pud, 0UL);
+ unsigned int i;
+
+ idmap_hpa_to_gpa(kvm, PTRS_PER_PTE * sizeof(pmd), virt_to_phys(pmd));
+ for (i = 0; i < PTRS_PER_PMD; i++, pmd++) {
+ if (!pmd_none(*pmd) && !pmd_sect(*pmd) && !pmd_bad(*pmd))
+ map_ptes_to_guest(kvm, pmd);
+ }
+}
+
+static void map_puds_to_guest(struct kvm *kvm, pgd_t *pgd)
+{
+ pud_t *pud = pud_offset(pgd, 0UL);
+ unsigned int i;
+
+ idmap_hpa_to_gpa(kvm, PTRS_PER_PTE * sizeof(pud), virt_to_phys(pud));
+ for (i = 0; i < PTRS_PER_PUD; i++, pud++) {
+ if (!pud_none(*pud) && !pud_sect(*pud) && !pud_bad(*pud))
+ map_pmds_to_guest(kvm, pud);
+ }
+}
+
+static void map_pgds_to_guest(struct kvm *kvm, struct mm_struct *mm)
+{
+ pgd_t *pgd = pgd_offset(mm, 0UL);
+ unsigned int i;
+
+ idmap_hpa_to_gpa(kvm, PTRS_PER_PTE * sizeof(pgd), virt_to_phys(pgd));
+ for (i = 0; i < PTRS_PER_PGD; i++, pgd++) {
+ if (!pgd_none(*pgd) && !pgd_bad(*pgd))
+ map_puds_to_guest(kvm, pgd);
+ }
+}
+
+/*
+ * RUNTIME SERVICE EXECUTION AND RETURN
+ *
+ * When the kernel or a module needs access to an EFI Runtime Service, the arch
+ * agnostic part of EFI enters "arch_efi_call_virt" which resolves in a call to
+ * one of the various efi_sandbox_* external wrappers at the end of this file.
+ *
+ * No two runtime services are ever executed at the same time thanks to the
+ * efi_runtime_lock semaphore, hence we don't have to worry about interlocking
+ * accesses to the VCPU.
+ *
+ * However, those external wrappers need to wait for the end of the execution of
+ * the runtime service in the VM. This is achieved thanks to the internal
+ * wrapper "efi_sandbox_wrapper" which issues an HVC when done.
+ *
+ * The HVC is caught by efi_sandbox_exit which, if appropriate, finishes the
+ * execution of kvm_arch_vcpu_ioctl_run. Therefore, the VM is only running when
+ * needed.
+ *
+ * EFI return values are written in x0 according to the ARM calling convention.
+ * However, since the VM is left using a hypercall, x0 contains the function
+ * ID according to the ARM SMCCC, so the Runtime Service's ret value is to be
+ * found in the x1 register.
+ */
+
+#define efi_sandbox_call_iwrapper(func) \
+({ \
+ vcpu_set_reg(sandbox_vcpu, 10, (unsigned long int)efi.systab); \
+ vcpu_set_reg(sandbox_vcpu, 11, offsetof(efi_runtime_services_t, func));\
+ *vcpu_pc(sandbox_vcpu) = EFI_SDBX_WRAPPER_GVA; \
+ \
+ if (!vcpu_load(sandbox_vcpu)) { \
+ kvm_arch_vcpu_ioctl_run(sandbox_vcpu, sandbox_vcpu->run); \
+ vcpu_put(sandbox_vcpu); \
+ } \
+ \
+ vcpu_get_reg(sandbox_vcpu, 1); \
+})
+
+inline bool efi_sandbox_is_exit(struct kvm_vcpu *vcpu)
+{
+ unsigned long fn = vcpu_get_reg(vcpu, 0) & ~((u32) 0);
+ return fn == SMCCC_FN_EFI_SANDBOX_RET;
+}
+
+/*
+ * If we receive a SMCCC_FN_EFI_SANDBOX_RET from our own sandbox, we can stop
+ * the current kvm_arch_vcpu_ioctl_run cleanly by returning 0. Otherwise we
+ * propagate an error code.
+ */
+int efi_sandbox_exit(struct kvm_vcpu *origin_vcpu)
+{
+ if (origin_vcpu == sandbox_vcpu)
+ return 0;
+
+ return -EINVAL;
+}
+
+/*
+ * RUNTIME SERVICE ARGUMENTS PASSING
+ *
+ * The Runtime Services expect arguments to be passed in their registers using
+ * the usual ARM calling convention. Simple arguments such as integer values can
+ * just be serialized into the VCPU from the host. However, some of the Services
+ * expect pointers to buffers or data structures to access or return complex
+ * data. Those buffers are copied into the data pages defined above before
+ * executing the function and then copied back for host's usage. This avoids
+ * having to map kernel's pages to the guest.
+ */
+
+#define efi_sandbox_set_arg(arg_num, val) \
+ vcpu_set_reg(sandbox_vcpu, arg_num, (unsigned long int)val)
+
+#define efi_sandbox_push_arg_ptr(arg_num, val, arg_size, offset) \
+({ \
+ if (val) { \
+ phys_addr_t gva = EFI_SDBX_DATA_PAGE_GVA + offset; \
+ memcpy(efi_data_pages + offset, val, (size_t)arg_size); \
+ vcpu_set_reg(sandbox_vcpu, arg_num, gva); \
+ offset += arg_size; \
+ } \
+})
+
+#define efi_sandbox_pop_arg_ptr(val, arg_size, offset) \
+({ \
+ if (val) { \
+ offset -= arg_size; \
+ memcpy(val, efi_data_pages + offset, arg_size); \
+ } \
+})
+
+/*
+ * Boot-time initialization
+ */
+
+void __init efi_arm_sandbox_init(struct mm_struct *efi_mm)
+{
+ efi_memory_desc_t *md;
+ struct kvm *kvm;
+ struct kvm_vcpu_init init;
+ phys_addr_t efi_code_page_pa, efi_data_pages_pa;
+
+ /* Create and configure the sandboxing context */
+ kvm = kvm_create_internal_vm(0);
+ kvm_vm_create_vcpu(kvm, 0);
+ sandbox_vcpu = kvm_get_vcpu_by_id(kvm, 0);
+ kvm_vcpu_preferred_target(&init);
+ kvm_arm_vcpu_init(sandbox_vcpu, &init);
+
+ vcpu_sys_reg(sandbox_vcpu, TTBR0_EL1) = virt_to_phys(efi_mm->pgd);
+ vcpu_sys_reg(sandbox_vcpu, TCR_EL1) = EFI_SANDBOX_TCR;
+ vcpu_sys_reg(sandbox_vcpu, SCTLR_EL1) = EFI_SANDBOX_SCTLR;
+ vcpu_sys_reg(sandbox_vcpu, VBAR_EL1) = EFI_SDBX_VECTORS_GVA;
+ vcpu_gp_regs(sandbox_vcpu)->sp_el1 = EFI_SDBX_STACK_POINTER_GVA;
+
+ /* Create the physical memory mapping (stage2 translation) */
+ efi_code_page_pa = __pa_symbol(efi_sandbox_excpt);
+ idmap_hpa_to_gpa(kvm, PAGE_SIZE, efi_code_page_pa);
+
+ efi_data_pages = kmalloc(EFI_SDBX_DATA_PAGES_SIZE, GFP_KERNEL);
+ efi_data_pages_pa = virt_to_phys(efi_data_pages);
+ idmap_hpa_to_gpa(kvm, EFI_SDBX_DATA_PAGES_SIZE, efi_data_pages_pa);
+
+ for_each_efi_memory_desc(md) {
+ phys_addr_t phys = md->phys_addr;
+ size_t size = md->num_pages << EFI_PAGE_SHIFT;
+
+ if (md->attribute & EFI_MEMORY_RUNTIME)
+ idmap_hpa_to_gpa(kvm, size, phys);
+ }
+
+ /* Create the virtual memory mapping (stage1 translation) */
+ create_pgd_mapping(efi_mm, efi_code_page_pa, EFI_SDBX_CODE_PAGE_GVA,
+ PAGE_SIZE, PAGE_KERNEL_EXEC, true);
+
+ create_pgd_mapping(efi_mm, efi_data_pages_pa, EFI_SDBX_DATA_PAGE_GVA,
+ EFI_SDBX_DATA_PAGES_SIZE, PAGE_KERNEL, true);
+
+ map_pgds_to_guest(kvm, efi_mm);
+}
+
+/*
+ * Runtime Services External Wrappers
+ */
+
+efi_status_t efi_sandbox_get_time(efi_time_t *tm,
+ efi_time_cap_t *tc)
+{
+ unsigned long offset = 0;
+ efi_status_t ret;
+
+ efi_sandbox_push_arg_ptr(0, tm, sizeof(efi_time_t), offset);
+ efi_sandbox_push_arg_ptr(1, tc, sizeof(efi_time_cap_t), offset);
+
+ ret = efi_sandbox_call_iwrapper(get_time);
+
+ efi_sandbox_pop_arg_ptr(tc, sizeof(efi_time_cap_t), offset);
+ efi_sandbox_pop_arg_ptr(tm, sizeof(efi_time_t), offset);
+
+ return ret;
+}
+
+efi_status_t efi_sandbox_set_time(efi_time_t *tm)
+{
+ unsigned long offset = 0;
+ efi_status_t ret;
+
+ efi_sandbox_push_arg_ptr(0, tm, sizeof(efi_time_t), offset);
+
+ ret = efi_sandbox_call_iwrapper(set_time);
+
+ efi_sandbox_pop_arg_ptr(tm, sizeof(efi_time_t), offset);
+
+ return ret;
+}
+
+efi_status_t efi_sandbox_get_wakeup_time(efi_bool_t *enabled,
+ efi_bool_t *pending,
+ efi_time_t *tm)
+{
+ unsigned long offset = 0;
+ efi_status_t ret;
+
+ efi_sandbox_push_arg_ptr(0, enabled, sizeof(efi_bool_t), offset);
+ efi_sandbox_push_arg_ptr(1, pending, sizeof(efi_bool_t), offset);
+ efi_sandbox_push_arg_ptr(2, tm, sizeof(efi_time_t), offset);
+
+ ret = efi_sandbox_call_iwrapper(get_wakeup_time);
+
+ efi_sandbox_pop_arg_ptr(tm, sizeof(efi_time_t), offset);
+ efi_sandbox_pop_arg_ptr(pending, sizeof(efi_bool_t), offset);
+ efi_sandbox_pop_arg_ptr(enabled, sizeof(efi_bool_t), offset);
+
+ return ret;
+
+}
+
+efi_status_t efi_sandbox_set_wakeup_time(efi_bool_t enabled,
+ efi_time_t *tm)
+{
+ unsigned long offset = 0;
+ efi_status_t ret;
+
+ efi_sandbox_set_arg(0, enabled);
+ efi_sandbox_push_arg_ptr(1, tm, sizeof(efi_time_t), offset);
+
+ ret = efi_sandbox_call_iwrapper(set_wakeup_time);
+
+ efi_sandbox_pop_arg_ptr(tm, sizeof(efi_time_t), offset);
+
+ return ret;
+}
+
+efi_status_t efi_sandbox_get_variable(efi_char16_t *name,
+ efi_guid_t *vendor,
+ u32 *attr,
+ unsigned long *data_size,
+ void *data)
+{
+ unsigned long offset = 0;
+ efi_status_t ret;
+
+ efi_sandbox_push_arg_ptr(0, name, 1024 * sizeof(efi_char16_t), offset);
+ efi_sandbox_push_arg_ptr(1, vendor, sizeof(efi_guid_t), offset);
+ efi_sandbox_push_arg_ptr(2, attr, sizeof(u32), offset);
+ efi_sandbox_push_arg_ptr(3, data_size, sizeof(unsigned long), offset);
+ efi_sandbox_push_arg_ptr(4, data, *data_size, offset);
+
+ ret = efi_sandbox_call_iwrapper(get_variable);
+
+ efi_sandbox_pop_arg_ptr(data, *data_size, offset);
+ efi_sandbox_pop_arg_ptr(data_size, sizeof(unsigned long), offset);
+ efi_sandbox_pop_arg_ptr(attr, sizeof(u32), offset);
+ efi_sandbox_pop_arg_ptr(vendor, sizeof(efi_guid_t), offset);
+ efi_sandbox_pop_arg_ptr(name, 1024 * sizeof(efi_char16_t), offset);
+
+ return ret;
+}
+
+
+efi_status_t efi_sandbox_get_next_variable(unsigned long *name_size,
+ efi_char16_t *name,
+ efi_guid_t *vendor)
+{
+ unsigned long offset = 0;
+ efi_status_t ret;
+
+ efi_sandbox_push_arg_ptr(0, name_size, sizeof(unsigned long), offset);
+ efi_sandbox_push_arg_ptr(1, name, 1024 * sizeof(efi_char16_t), offset);
+ efi_sandbox_push_arg_ptr(2, vendor, sizeof(efi_guid_t), offset);
+
+ ret = efi_sandbox_call_iwrapper(get_next_variable);
+
+ efi_sandbox_pop_arg_ptr(vendor, sizeof(efi_guid_t), offset);
+ efi_sandbox_pop_arg_ptr(name, 1024 * sizeof(efi_char16_t), offset);
+ efi_sandbox_pop_arg_ptr(name_size, sizeof(unsigned long), offset);
+
+ return ret;
+}
+
+efi_status_t efi_sandbox_set_variable(efi_char16_t *name,
+ efi_guid_t *vendor,
+ u32 attr,
+ unsigned long data_size,
+ void *data)
+{
+ unsigned long offset = 0;
+ efi_status_t ret;
+
+ efi_sandbox_push_arg_ptr(0, name, 1024 * sizeof(efi_char16_t), offset);
+ efi_sandbox_push_arg_ptr(1, vendor, sizeof(efi_guid_t), offset);
+ efi_sandbox_set_arg(2, attr);
+ efi_sandbox_set_arg(3, data_size);
+ efi_sandbox_push_arg_ptr(4, data, data_size, offset);
+
+ ret = efi_sandbox_call_iwrapper(set_variable);
+
+ efi_sandbox_pop_arg_ptr(data, data_size, offset);
+ efi_sandbox_pop_arg_ptr(vendor, sizeof(efi_guid_t), offset);
+ efi_sandbox_pop_arg_ptr(name, 1024 * sizeof(efi_char16_t), offset);
+
+ return ret;
+}
+
+efi_status_t efi_sandbox_query_variable_info(u32 attr,
+ u64 *storage_space,
+ u64 *remaining_space,
+ u64 *max_variable_size)
+{
+ unsigned long offset = 0;
+ efi_status_t ret;
+
+ efi_sandbox_push_arg_ptr(1, storage_space, sizeof(u64), offset);
+ efi_sandbox_push_arg_ptr(2, remaining_space, sizeof(u64), offset);
+ efi_sandbox_push_arg_ptr(3, max_variable_size, sizeof(u64), offset);
+
+ ret = efi_sandbox_call_iwrapper(query_variable_info);
+
+ efi_sandbox_pop_arg_ptr(max_variable_size, sizeof(u64), offset);
+ efi_sandbox_pop_arg_ptr(remaining_space, sizeof(u64), offset);
+ efi_sandbox_pop_arg_ptr(storage_space, sizeof(u64), offset);
+
+ return ret;
+}
+
+efi_status_t efi_sandbox_get_next_high_mono_count(u32 *count)
+{
+ unsigned long offset = 0;
+ efi_status_t ret;
+
+ efi_sandbox_push_arg_ptr(0, count, sizeof(u32), offset);
+
+ ret = efi_sandbox_call_iwrapper(get_next_high_mono_count);
+
+ efi_sandbox_pop_arg_ptr(count, sizeof(u32), offset);
+
+ return ret;
+}
+
+efi_status_t efi_sandbox_reset_system(int reset_type,
+ efi_status_t status,
+ unsigned long data_size,
+ efi_char16_t *data)
+{
+ unsigned long offset = 0;
+ efi_status_t ret;
+
+ efi_sandbox_set_arg(0, reset_type);
+ efi_sandbox_set_arg(1, status);
+ efi_sandbox_set_arg(2, data_size);
+ efi_sandbox_push_arg_ptr(3, data, data_size, offset);
+
+ ret = efi_sandbox_call_iwrapper(reset_system);
+
+ efi_sandbox_pop_arg_ptr(data, data_size, offset);
+
+ return ret;
+}
+
+efi_status_t efi_sandbox_update_capsule(efi_capsule_header_t **capsules,
+ unsigned long count,
+ unsigned long sg_list)
+{
+ unsigned long offset = 0;
+ efi_status_t ret;
+
+ efi_sandbox_push_arg_ptr(0, capsules, sizeof(efi_capsule_header_t *),
+ offset);
+ efi_sandbox_set_arg(1, count);
+ efi_sandbox_set_arg(2, sg_list);
+
+ ret = efi_sandbox_call_iwrapper(update_capsule);
+
+ efi_sandbox_pop_arg_ptr(capsules, sizeof(efi_capsule_header_t *),
+ offset);
+
+ return ret;
+}
+
+efi_status_t efi_sandbox_query_capsule_caps(efi_capsule_header_t **capsules,
+ unsigned long count,
+ u64 *max_size,
+ int *reset_type)
+{
+ unsigned long offset = 0;
+ efi_status_t ret;
+
+ efi_sandbox_push_arg_ptr(0, capsules, sizeof(efi_capsule_header_t *),
+ offset);
+ efi_sandbox_set_arg(1, count);
+ efi_sandbox_push_arg_ptr(2, max_size, sizeof(u64), offset);
+ efi_sandbox_push_arg_ptr(3, reset_type, sizeof(int), offset);
+
+ ret = efi_sandbox_call_iwrapper(query_capsule_caps);
+
+ efi_sandbox_pop_arg_ptr(reset_type, sizeof(int), offset);
+ efi_sandbox_pop_arg_ptr(max_size, sizeof(u64), offset);
+ efi_sandbox_pop_arg_ptr(capsules, sizeof(efi_capsule_header_t *),
+ offset);
+
+ return ret;
+}
diff --git a/include/linux/smccc_fn.h b/include/linux/smccc_fn.h
index f08145d..d5fe672 100644
--- a/include/linux/smccc_fn.h
+++ b/include/linux/smccc_fn.h
@@ -47,4 +47,7 @@
#define SMCCC64_VDR_HYP_FN_BASE 0xc6000000
#define SMCCC64_VDR_HYP_FN(n) (SMCCC64_VDR_HYP_FN_BASE + (n))

+/* EFI Sandboxing services */
+#define SMCCC_FN_EFI_SANDBOX_RET SMCCC64_VDR_HYP_FN(0)
+
#endif /* __LINUX_SMCCC_FN_H */
--
1.9.1


2017-08-25 08:35:22

by Florent Revest

Subject: [RFC 08/11] KVM, arm, arm64: Initialize KVM's core earlier

In order to use internal VMs early in the boot process, the arm_init
function, in charge of initializing KVM, is split into two parts:
- A subsys_initcall target initializing KVM's core only
- A module_init target initializing KVM's userspace-facing files

The initialization of KVM system registers, an implicit dependency of VM
execution on arm and arm64, is also rescheduled so that it takes effect as
soon as KVM's core is initialized.

Signed-off-by: Florent Revest <[email protected]>
---
arch/arm/include/asm/kvm_coproc.h | 3 +++
arch/arm/include/asm/kvm_host.h | 1 +
arch/arm/kvm/coproc.c | 6 ++++++
arch/arm/kvm/coproc_a15.c | 3 +--
arch/arm/kvm/coproc_a7.c | 3 +--
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/kvm/sys_regs_generic_v8.c | 8 ++++++--
virt/kvm/arm/arm.c | 13 +++++++++++--
8 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/arch/arm/include/asm/kvm_coproc.h b/arch/arm/include/asm/kvm_coproc.h
index e74ab0f..1502723 100644
--- a/arch/arm/include/asm/kvm_coproc.h
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -45,4 +45,7 @@ struct kvm_coproc_target_table {
int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu);
+
+int coproc_a15_init(void);
+int coproc_a7_init(void);
#endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 127e2dd..fb94666 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -287,6 +287,7 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}

+void kvm_arm_init_sys_reg(void);
static inline void kvm_arm_init_debug(void) {}
static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
static inline void kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {}
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 6d1d2e2..28bc397 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -1369,3 +1369,9 @@ void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
if (vcpu_cp15(vcpu, num) == 0x42424242)
panic("Didn't reset vcpu_cp15(vcpu, %zi)", num);
}
+
+void kvm_arm_init_sys_reg(void)
+{
+ coproc_a7_init();
+ coproc_a15_init();
+}
diff --git a/arch/arm/kvm/coproc_a15.c b/arch/arm/kvm/coproc_a15.c
index a713675..83102a3 100644
--- a/arch/arm/kvm/coproc_a15.c
+++ b/arch/arm/kvm/coproc_a15.c
@@ -43,9 +43,8 @@
.num = ARRAY_SIZE(a15_regs),
};

-static int __init coproc_a15_init(void)
+int coproc_a15_init(void)
{
kvm_register_target_coproc_table(&a15_target_table);
return 0;
}
-late_initcall(coproc_a15_init);
diff --git a/arch/arm/kvm/coproc_a7.c b/arch/arm/kvm/coproc_a7.c
index b19e46d..b365ac0 100644
--- a/arch/arm/kvm/coproc_a7.c
+++ b/arch/arm/kvm/coproc_a7.c
@@ -46,9 +46,8 @@
.num = ARRAY_SIZE(a7_regs),
};

-static int __init coproc_a7_init(void)
+int coproc_a7_init(void)
{
kvm_register_target_coproc_table(&a7_target_table);
return 0;
}
-late_initcall(coproc_a7_init);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 07b7460..e360bb3 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -374,6 +374,7 @@ static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}

int kvm_arm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init);

+void kvm_arm_init_sys_reg(void);
void kvm_arm_init_debug(void);
void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/sys_regs_generic_v8.c b/arch/arm64/kvm/sys_regs_generic_v8.c
index 969ade1..0fe755d 100644
--- a/arch/arm64/kvm/sys_regs_generic_v8.c
+++ b/arch/arm64/kvm/sys_regs_generic_v8.c
@@ -72,7 +72,7 @@ static void reset_actlr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
},
};

-static int __init sys_reg_genericv8_init(void)
+static int sys_reg_genericv8_init(void)
{
unsigned int i;

@@ -95,4 +95,8 @@ static int __init sys_reg_genericv8_init(void)

return 0;
}
-late_initcall(sys_reg_genericv8_init);
+
+void kvm_arm_init_sys_reg(void)
+{
+ sys_reg_genericv8_init();
+}
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index aa29a5d..7d0aa4f 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1451,6 +1451,8 @@ int kvm_arch_init(void *opaque)
int err;
int ret, cpu;

+ kvm_arm_init_sys_reg();
+
if (!is_hyp_mode_available()) {
kvm_err("HYP mode not available\n");
return -ENODEV;
@@ -1496,8 +1498,15 @@ void kvm_arch_exit(void)

static int arm_init(void)
{
- int rc = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
+ int rc = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, NULL);
return rc;
}

-module_init(arm_init);
+subsys_initcall(arm_init);
+
+static int arm_module_init(void)
+{
+ return kvm_set_module(THIS_MODULE);
+}
+
+module_init(arm_module_init);
--
1.9.1


2017-08-25 08:32:47

by Florent Revest

Subject: [RFC 04/11] KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs

Usual KVM virtual machines map guest physical addresses from a process's
userspace memory. However, with the new concept of internal VMs, a virtual
machine can be created from the kernel, without any link to a userspace
context. Hence, some of KVM's architecture-specific code needs to be
modified to take this kind of VM into account.

The approach chosen with this patch is to let internal VMs idmap physical
addresses into intermediate physical addresses by calling
kvm_set_memory_region with a kvm_userspace_memory_region whose
guest_phys_addr field points both to the original PAs and to the IPAs. The
userspace_addr field of this struct is therefore ignored for internal VMs.

This patch extends the capabilities of the arm and arm64 stage2 MMU code
to handle internal VMs. Three things are changed:

- Various parts of the MMU code which are related to a userspace context
are now only executed if kvm->mm is present.

- When this pointer is NULL, struct kvm_userspace_memory_regions are
treated by internal_vm_prep_mem as idmaps of physical memory.

- A set of 256 additional private memslots is now reserved on arm64 for
internal VMs' memory idmapping.

Note: this patch should have pretty much no performance impact on the
critical path of traditional VMs, since only one unlikely branch had to be
added to the page fault handler.
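
Concretely, an internal VM can then idmap a physical range with a sketch
like the one below, which mirrors the helper used later in this series; the
slot just has to be taken in the new private area:

static void idmap_hpa_to_gpa(struct kvm *kvm, unsigned long size,
			     phys_addr_t pa)
{
	struct kvm_userspace_memory_region mem = {
		.slot = KVM_USER_MEM_SLOTS,	/* first private slot */
		.guest_phys_addr = pa,		/* GPA == HPA (idmap) */
		.memory_size = size,
		/* .userspace_addr is ignored for internal VMs */
	};

	kvm_set_memory_region(kvm, &mem);
}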

Signed-off-by: Florent Revest <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 1 +
virt/kvm/arm/mmu.c | 76 +++++++++++++++++++++++++++++++++++++--
2 files changed, 74 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d686300..65aab35 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -32,6 +32,7 @@
#define __KVM_HAVE_ARCH_INTC_INITIALIZED

#define KVM_USER_MEM_SLOTS 512
+#define KVM_PRIVATE_MEM_SLOTS 256
#define KVM_HALT_POLL_NS_DEFAULT 500000

#include <kvm/arm_vgic.h>
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 2ea21da..1d2d3df 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -772,6 +772,11 @@ static void stage2_unmap_memslot(struct kvm *kvm,
phys_addr_t size = PAGE_SIZE * memslot->npages;
hva_t reg_end = hva + size;

+ if (unlikely(!kvm->mm)) {
+ unmap_stage2_range(kvm, addr, size);
+ return;
+ }
+
/*
* A memory region could potentially cover multiple VMAs, and any holes
* between them, so iterate over all of them to find out if we should
@@ -819,7 +824,8 @@ void stage2_unmap_vm(struct kvm *kvm)
int idx;

idx = srcu_read_lock(&kvm->srcu);
- down_read(&current->mm->mmap_sem);
+ if (likely(kvm->mm))
+ down_read(&current->mm->mmap_sem);
spin_lock(&kvm->mmu_lock);

slots = kvm_memslots(kvm);
@@ -827,7 +833,8 @@ void stage2_unmap_vm(struct kvm *kvm)
stage2_unmap_memslot(kvm, memslot);

spin_unlock(&kvm->mmu_lock);
- up_read(&current->mm->mmap_sem);
+ if (likely(kvm->mm))
+ up_read(&current->mm->mmap_sem);
srcu_read_unlock(&kvm->srcu, idx);
}

@@ -1303,6 +1310,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
}

+ if (unlikely(!kvm->mm)) {
+ kvm_err("Unexpected internal VM page fault\n");
+ kvm_inject_vabt(vcpu);
+ return 0;
+ }
+
/* Let's check if we will get back a huge page backed by hugetlbfs */
down_read(&current->mm->mmap_sem);
vma = find_vma_intersection(current->mm, hva, hva + 1);
@@ -1850,6 +1863,54 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
kvm_mmu_wp_memory_region(kvm, mem->slot);
}

+/*
+ * internal_vm_prep_mem - maps a range of hpa to gpa at stage2
+ *
+ * While userspace VMs manage gpas using hvas, internal virtual machines need a
+ * way to map physical addresses to a guest. In order to avoid code duplication,
+ * the kvm_set_memory_region call is kept for internal VMs, however it usually
+ * expects a struct kvm_userspace_memory_region with a userspace_addr field.
+ * With internal VMs, this field is ignored and the physical memory pointed
+ * to by guest_phys_addr can only be idmapped.
+ */
+static int internal_vm_prep_mem(struct kvm *kvm,
+ const struct kvm_userspace_memory_region *mem)
+{
+ phys_addr_t addr, end;
+ unsigned long pfn;
+ int ret = 0;
+ struct kvm_mmu_memory_cache cache = { 0 };
+
+ end = mem->guest_phys_addr + mem->memory_size;
+ pfn = __phys_to_pfn(mem->guest_phys_addr);
+ addr = mem->guest_phys_addr;
+
+ for (; addr < end; addr += PAGE_SIZE) {
+ pte_t pte = pfn_pte(pfn, PAGE_S2);
+
+ pte = kvm_s2pte_mkwrite(pte);
+
+ ret = mmu_topup_memory_cache(&cache,
+ KVM_MMU_CACHE_MIN_PAGES,
+ KVM_NR_MEM_OBJS);
+ if (ret) {
+ mmu_free_memory_cache(&cache);
+ return ret;
+ }
+ spin_lock(&kvm->mmu_lock);
+ ret = stage2_set_pte(kvm, &cache, addr, &pte, 0);
+ spin_unlock(&kvm->mmu_lock);
+ if (ret) {
+ mmu_free_memory_cache(&cache);
+ return ret;
+ }
+
+ pfn++;
+ }
+
+ return ret;
+}
+
int kvm_arch_prepare_memory_region(struct kvm *kvm,
struct kvm_memory_slot *memslot,
const struct kvm_userspace_memory_region *mem,
@@ -1872,6 +1933,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
(KVM_PHYS_SIZE >> PAGE_SHIFT))
return -EFAULT;

+ if (unlikely(!kvm->mm)) {
+ ret = internal_vm_prep_mem(kvm, mem);
+ if (ret)
+ goto out;
+ goto out_internal_vm;
+ }
+
down_read(&current->mm->mmap_sem);
/*
* A memory region could potentially cover multiple VMAs, and any holes
@@ -1930,6 +1998,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
hva = vm_end;
} while (hva < reg_end);

+out_internal_vm:
if (change == KVM_MR_FLAGS_ONLY)
goto out;

@@ -1940,7 +2009,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
stage2_flush_memslot(kvm, memslot);
spin_unlock(&kvm->mmu_lock);
out:
- up_read(&current->mm->mmap_sem);
+ if (kvm->mm)
+ up_read(&current->mm->mmap_sem);
return ret;
}

--
1.9.1

2017-08-25 08:35:47

by Florent Revest

[permalink] [raw]
Subject: [RFC 07/11] KVM: Allow initialization before the module target

The kvm_init function has been designed to be executed during the
module_init target. It requires a struct module pointer to be used as
the owner of the /dev/* files and also tries to register /dev/kvm with a
function (misc_register) that can only be used late in the boot process.

This patch modifies kvm_init to execute this late initialization code
conditionally, only in the context of a module_init. It also offers a
kvm_set_module function to handle /dev/kvm registration and ownership of
the device files once the module target is reached.

On its own, this patch does not change any behavior. However, it allows
certain architectures to initialize the core of KVM earlier in the boot
process (e.g. in a subsys_initcall) and then initialize the
userspace-facing files in a module_init target. This can be useful to
create internal VMs before the userspace APIs are available.

Signed-off-by: Florent Revest <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 28 ++++++++++++++++++++--------
2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index dd10d3b..15a0a8d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -563,6 +563,7 @@ static inline void kvm_irqfd_exit(void)
#endif
int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
struct module *module);
+int kvm_set_module(struct module *module);
void kvm_exit(void);

struct kvm *kvm_create_internal_vm(unsigned long type);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c1c8bb6..3c9cb00 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4086,14 +4086,10 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
if (r)
goto out_free;

- kvm_chardev_ops.owner = module;
- kvm_vm_fops.owner = module;
- kvm_vcpu_fops.owner = module;
-
- r = misc_register(&kvm_dev);
- if (r) {
- pr_err("kvm: misc device register failed\n");
- goto out_unreg;
+ if (module) {
+ r = kvm_set_module(module);
+ if (r)
+ goto out_unreg;
}

register_syscore_ops(&kvm_syscore_ops);
@@ -4136,6 +4132,22 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
}
EXPORT_SYMBOL_GPL(kvm_init);

+int kvm_set_module(struct module *module)
+{
+ int r;
+
+ kvm_chardev_ops.owner = module;
+ kvm_vm_fops.owner = module;
+ kvm_vcpu_fops.owner = module;
+
+ r = misc_register(&kvm_dev);
+ if (r)
+ pr_err("kvm: misc device register failed\n");
+
+ return r;
+}
+EXPORT_SYMBOL_GPL(kvm_set_module);
+
void kvm_exit(void)
{
debugfs_remove_recursive(kvm_debugfs_dir);
--
1.9.1

2017-08-25 08:36:51

by Florent Revest

[permalink] [raw]
Subject: [RFC 06/11] KVM, arm64: Expose a VCPU initialization function

KVM's core now offers internal virtual machine capabilities; however, on
ARM the KVM_ARM_VCPU_INIT ioctl also has to be used to initialize a
virtual CPU.

This patch exposes a kvm_arm_vcpu_init() function to the rest of the kernel
on arm64 so that it can be used for arm64 internal VM initialization.

This function used to be named kvm_arch_vcpu_ioctl_vcpu_init(), but the
"ioctl" part of the name wasn't consistent with the rest of the KVM arm
ioctl handlers, and it isn't relevant to the usage of internal VMs. The
function has therefore been renamed to make it less misleading.
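
For illustration, an in-kernel caller could then initialize a VCPU along
these lines (the target choice and surrounding function are assumptions,
not taken from this series):

static int internal_vcpu_setup(struct kvm_vcpu *vcpu)
{
	struct kvm_vcpu_init init = {
		.target = KVM_ARM_TARGET_GENERIC_V8,	/* assumed target */
	};

	/* Same path as the KVM_ARM_VCPU_INIT ioctl, minus the ioctl */
	return kvm_arm_vcpu_init(vcpu, &init);
}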

Signed-off-by: Florent Revest <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 2 ++
virt/kvm/arm/arm.c | 5 ++---
2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 65aab35..07b7460 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -372,6 +372,8 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}

+int kvm_arm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init);
+
void kvm_arm_init_debug(void);
void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index a39a1e1..aa29a5d 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -888,8 +888,7 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
}


-static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
- struct kvm_vcpu_init *init)
+int kvm_arm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
{
int ret;

@@ -973,7 +972,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
if (copy_from_user(&init, argp, sizeof(init)))
return -EFAULT;

- return kvm_arch_vcpu_ioctl_vcpu_init(vcpu, &init);
+ return kvm_arm_vcpu_init(vcpu, &init);
}
case KVM_SET_ONE_REG:
case KVM_GET_ONE_REG: {
--
1.9.1

2017-08-25 08:37:21

by Florent Revest

[permalink] [raw]
Subject: [RFC 03/11] KVM: Allow VM lifecycle management without userspace

The current codebase of KVM makes many assumptions regarding the origin of
the virtual machine being executed or configured. Indeed, the KVM API
implementation has been written with userspace usage in mind and lots of
userspace-specific code is used (namely preempt_notifiers, eventfd, mmu
notifiers, current->mm...).

The aim of this patch is to make the KVM API (create_vm, create_vcpu,
etc.) usable from a kernel context. A simple trick is used to distinguish
userspace VMs (coming from QEMU or LKVM...) from internal VMs (coming
from other subsystems, for example for sandboxing purposes):
- When a VM is created from an ioctl, kvm->mm is set to current->mm
- When a VM is created from the kernel, kvm->mm must be set to NULL

This ensures that no userspace program can create internal VMs and makes
it easy to check whether a given VM is attached to a process or is
internal.

This patch simply encloses the userspace-specific pieces of code of
kvm_main in conditions checking whether kvm->mm is present, and modifies
the prototype of kvm_create_vm to accept a NULL mm.
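
Conceptually, the two creation paths now differ only in the mm argument
they pass (sketch only; an exported in-kernel wrapper is introduced by a
later patch in this series):

/* ioctl path: the VM belongs to the calling process */
kvm = kvm_create_vm(type, current->mm);

/* in-kernel path: no userspace context, kvm->mm stays NULL */
kvm = kvm_create_vm(type, NULL);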

Signed-off-by: Florent Revest <[email protected]>
---
virt/kvm/kvm_main.c | 64 ++++++++++++++++++++++++++++++++++-------------------
1 file changed, 41 insertions(+), 23 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 15252d7..2e7af1a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -154,7 +154,8 @@ int vcpu_load(struct kvm_vcpu *vcpu)
if (mutex_lock_killable(&vcpu->mutex))
return -EINTR;
cpu = get_cpu();
- preempt_notifier_register(&vcpu->preempt_notifier);
+ if (vcpu->kvm->mm)
+ preempt_notifier_register(&vcpu->preempt_notifier);
kvm_arch_vcpu_load(vcpu, cpu);
put_cpu();
return 0;
@@ -165,7 +166,8 @@ void vcpu_put(struct kvm_vcpu *vcpu)
{
preempt_disable();
kvm_arch_vcpu_put(vcpu);
- preempt_notifier_unregister(&vcpu->preempt_notifier);
+ if (vcpu->kvm->mm)
+ preempt_notifier_unregister(&vcpu->preempt_notifier);
preempt_enable();
mutex_unlock(&vcpu->mutex);
}
@@ -640,7 +642,7 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
return 0;
}

-static struct kvm *kvm_create_vm(unsigned long type)
+static struct kvm *kvm_create_vm(unsigned long type, struct mm_struct *mm)
{
int r, i;
struct kvm *kvm = kvm_arch_alloc_vm();
@@ -649,9 +651,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
return ERR_PTR(-ENOMEM);

spin_lock_init(&kvm->mmu_lock);
- mmgrab(current->mm);
- kvm->mm = current->mm;
- kvm_eventfd_init(kvm);
+ kvm->mm = mm;
+ if (mm) {
+ mmgrab(current->mm);
+ kvm_eventfd_init(kvm);
+ }
mutex_init(&kvm->lock);
mutex_init(&kvm->irq_lock);
mutex_init(&kvm->slots_lock);
@@ -697,15 +701,18 @@ static struct kvm *kvm_create_vm(unsigned long type)
goto out_err;
}

- r = kvm_init_mmu_notifier(kvm);
- if (r)
- goto out_err;
+ if (mm) {
+ r = kvm_init_mmu_notifier(kvm);
+ if (r)
+ goto out_err;
+ }

spin_lock(&kvm_lock);
list_add(&kvm->vm_list, &vm_list);
spin_unlock(&kvm_lock);

- preempt_notifier_inc();
+ if (mm)
+ preempt_notifier_inc();

return kvm;

@@ -721,7 +728,8 @@ static struct kvm *kvm_create_vm(unsigned long type)
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
kvm_arch_free_vm(kvm);
- mmdrop(current->mm);
+ if (mm)
+ mmdrop(mm);
return ERR_PTR(r);
}

@@ -772,9 +780,11 @@ static void kvm_destroy_vm(struct kvm *kvm)
cleanup_srcu_struct(&kvm->irq_srcu);
cleanup_srcu_struct(&kvm->srcu);
kvm_arch_free_vm(kvm);
- preempt_notifier_dec();
+ if (mm)
+ preempt_notifier_dec();
hardware_disable_all();
- mmdrop(mm);
+ if (mm)
+ mmdrop(mm);
}

void kvm_get_kvm(struct kvm *kvm)
@@ -1269,6 +1279,9 @@ unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn)
if (kvm_is_error_hva(addr))
return PAGE_SIZE;

+ if (!kvm->mm)
+ return PAGE_SIZE;
+
down_read(&current->mm->mmap_sem);
vma = find_vma(current->mm, addr);
if (!vma)
@@ -2486,9 +2499,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
if (r)
goto vcpu_destroy;

- r = kvm_create_vcpu_debugfs(vcpu);
- if (r)
- goto vcpu_destroy;
+ if (kvm->mm) {
+ r = kvm_create_vcpu_debugfs(vcpu);
+ if (r)
+ goto vcpu_destroy;
+ }

mutex_lock(&kvm->lock);
if (kvm_get_vcpu_by_id(kvm, id)) {
@@ -2499,11 +2514,13 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
BUG_ON(kvm->vcpus[atomic_read(&kvm->online_vcpus)]);

/* Now it's all set up, let userspace reach it */
- kvm_get_kvm(kvm);
- r = create_vcpu_fd(vcpu);
- if (r < 0) {
- kvm_put_kvm(kvm);
- goto unlock_vcpu_destroy;
+ if (kvm->mm) {
+ kvm_get_kvm(kvm);
+ r = create_vcpu_fd(vcpu);
+ if (r < 0) {
+ kvm_put_kvm(kvm);
+ goto unlock_vcpu_destroy;
+ }
}

kvm->vcpus[atomic_read(&kvm->online_vcpus)] = vcpu;
@@ -2521,7 +2538,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)

unlock_vcpu_destroy:
mutex_unlock(&kvm->lock);
- debugfs_remove_recursive(vcpu->debugfs_dentry);
+ if (kvm->mm)
+ debugfs_remove_recursive(vcpu->debugfs_dentry);
vcpu_destroy:
kvm_arch_vcpu_destroy(vcpu);
vcpu_decrement:
@@ -3191,7 +3209,7 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
struct kvm *kvm;
struct file *file;

- kvm = kvm_create_vm(type);
+ kvm = kvm_create_vm(type, current->mm);
if (IS_ERR(kvm))
return PTR_ERR(kvm);
#ifdef CONFIG_KVM_MMIO
--
1.9.1

2017-08-25 08:38:19

by Florent Revest

[permalink] [raw]
Subject: [RFC 01/11] arm64: Add an SMCCC function IDs header

The ARM SMC Calling Convention (DEN 0028B) introduces function IDs for
hypercalls, given in the x0 register during an SMC or HVC from a guest.

The document defines ranges of function IDs targeting different kinds of
hypervisors or supervisors.

Two ID ranges are of particular interest for the kernel:
- Standard hypervisor service calls
- Vendor specific hypervisor service calls

This patch introduces a couple of macros that are useful when working
with SMCCC. They provide defines of those ID ranges to be used for HVC
handling (in KVM) or for issuing calls (e.g. to leverage paravirtualized
services).

The document also defines standard return values to be written into x0
after hypercall handling. Once again, those macros can potentially be
used from both the hypervisor and the guest.
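
As an illustration, a hypervisor-side dispatcher could use the macros
roughly as follows (the function number and handler shape are made up
for the example):

static unsigned long handle_vendor_hvc(u32 fn_id)
{
	switch (fn_id) {
	case SMCCC64_VDR_HYP_FN(0):	/* hypothetical vendor call 0 */
		return SMCCC_STD_RET_SUCCESS;
	default:
		return SMCCC_STD_RET_UNKNOWN_ID;
	}
}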

Signed-off-by: Florent Revest <[email protected]>
---
include/linux/smccc_fn.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)
create mode 100644 include/linux/smccc_fn.h

diff --git a/include/linux/smccc_fn.h b/include/linux/smccc_fn.h
new file mode 100644
index 0000000..f08145d
--- /dev/null
+++ b/include/linux/smccc_fn.h
@@ -0,0 +1,50 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * Copyright (C) 2017 ARM Limited
+ */
+
+#ifndef __LINUX_SMCCC_FN_H
+#define __LINUX_SMCCC_FN_H
+
+/*
+ * Standard return values
+ */
+
+#define SMCCC_STD_RET_SUCCESS 0
+#define SMCCC_STD_RET_UNKNOWN_ID -1
+
+
+/*
+ * SMC32
+ */
+
+/* Standard hypervisor services interface */
+#define SMCCC32_STD_HYP_FN_BASE 0x85000000
+#define SMCCC32_STD_HYP_FN(n) (SMCCC32_STD_HYP_FN_BASE + (n))
+
+/* Vendor specific hypervisor services interface */
+#define SMCCC32_VDR_HYP_FN_BASE 0x86000000
+#define SMCCC32_VDR_HYP_FN(n) (SMCCC32_VDR_HYP_FN_BASE + (n))
+
+
+/*
+ * SMC64
+ */
+
+/* Standard hypervisor services interface */
+#define SMCCC64_STD_HYP_FN_BASE 0xc5000000
+#define SMCCC64_STD_HYP_FN(n) (SMCCC64_STD_HYP_FN_BASE + (n))
+
+/* Vendor specific hypervisor services interface */
+#define SMCCC64_VDR_HYP_FN_BASE 0xc6000000
+#define SMCCC64_VDR_HYP_FN(n) (SMCCC64_VDR_HYP_FN_BASE + (n))
+
+#endif /* __LINUX_SMCCC_FN_H */
--
1.9.1

2017-08-25 09:40:53

by Florent Revest

[permalink] [raw]
Subject: Re: [RFC 00/11] KVM, EFI, arm64: EFI Runtime Services Sandboxing

Hi,

I just realised that my email client was not configured correctly; the
confidentiality disclaimer at the bottom of my emails obviously doesn't
apply. Sorry about that.

Florent

2017-08-31 09:23:12

by Christoffer Dall

[permalink] [raw]
Subject: Re: [RFC 04/11] KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs

Hi Florent,

I'd like for the UEFI folks and arm64 kernel maintainers to express
their views on this overall approach before I do an in-depth review, but
I have some random comments based on reading this patch:

On Fri, Aug 25, 2017 at 09:31:34AM +0100, Florent Revest wrote:
> Usual KVM virtual machines map guest's physical addresses from a process
> userspace memory. However, with the new concept of internal VMs, a virtual
> machine can be created from the kernel, without any link to a userspace
> context. Hence, some of the KVM's architecture-specific code needs to be
> modified to take this kind of VMs into account.
>
> The approach chosen with this patch is to let internal VMs idmap physical
> addresses into intermediary physical addresses by calling
> kvm_set_memory_region with a kvm_userspace_memory_region where the
> guest_phys_addr field points both to the original PAs and to the IPAs. The
> userspace_addr field of this struct is therefore ignored with internal VMs.
>
> This patch extends the capabilities of the arm and arm64 stage2 MMU code
> to handle internal VMs. Three things are changed:
>
> - Various parts of the MMU code which are related to a userspace context
> are now only executed if kvm->mm is present.
>
> - When this pointer is NULL, struct kvm_userspace_memory_regions are
> treated by internal_vm_prep_mem as idmaps of physical memory.
>
> - A set of 256 additional private memslots is now reserved on arm64 for the
> usage of internal VMs memory idmapping.
>
> Note: this patch should have pretty much no performance impact on the
> critical path of traditional VMs since only one unlikely branch had to be
> added to the page fault handler.
>
> Signed-off-by: Florent Revest <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 1 +
> virt/kvm/arm/mmu.c | 76 +++++++++++++++++++++++++++++++++++++--
> 2 files changed, 74 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index d686300..65aab35 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -32,6 +32,7 @@
> #define __KVM_HAVE_ARCH_INTC_INITIALIZED
>
> #define KVM_USER_MEM_SLOTS 512
> +#define KVM_PRIVATE_MEM_SLOTS 256
> #define KVM_HALT_POLL_NS_DEFAULT 500000
>
> #include <kvm/arm_vgic.h>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 2ea21da..1d2d3df 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -772,6 +772,11 @@ static void stage2_unmap_memslot(struct kvm *kvm,
> phys_addr_t size = PAGE_SIZE * memslot->npages;
> hva_t reg_end = hva + size;
>
> + if (unlikely(!kvm->mm)) {

I think you should consider using a predicate so that it's clear that
this is for in-kernel VMs and not just some random situation where mm
can be NULL.

> + unmap_stage2_range(kvm, addr, size);
> + return;
> + }
> +
> /*
> * A memory region could potentially cover multiple VMAs, and any holes
> * between them, so iterate over all of them to find out if we should
> @@ -819,7 +824,8 @@ void stage2_unmap_vm(struct kvm *kvm)
> int idx;
>
> idx = srcu_read_lock(&kvm->srcu);
> - down_read(&current->mm->mmap_sem);
> + if (likely(kvm->mm))
> + down_read(&current->mm->mmap_sem);
> spin_lock(&kvm->mmu_lock);
>
> slots = kvm_memslots(kvm);
> @@ -827,7 +833,8 @@ void stage2_unmap_vm(struct kvm *kvm)
> stage2_unmap_memslot(kvm, memslot);
>
> spin_unlock(&kvm->mmu_lock);
> - up_read(&current->mm->mmap_sem);
> + if (likely(kvm->mm))
> + up_read(&current->mm->mmap_sem);
> srcu_read_unlock(&kvm->srcu, idx);
> }
>
> @@ -1303,6 +1310,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> return -EFAULT;
> }
>
> + if (unlikely(!kvm->mm)) {
> + kvm_err("Unexpected internal VM page fault\n");
> + kvm_inject_vabt(vcpu);
> + return 0;
> + }
> +

So it's unclear to me why we don't need any special casing in
kvm_handle_guest_abort, related to MMIO exits etc. You probably assume
that we will never do emulation, but that should be described and
addressed somewhere before I can critically review this patch.

> /* Let's check if we will get back a huge page backed by hugetlbfs */
> down_read(&current->mm->mmap_sem);
> vma = find_vma_intersection(current->mm, hva, hva + 1);
> @@ -1850,6 +1863,54 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> kvm_mmu_wp_memory_region(kvm, mem->slot);
> }
>
> +/*
> + * internal_vm_prep_mem - maps a range of hpa to gpa at stage2
> + *
> + * While userspace VMs manage gpas using hvas, internal virtual machines need a
> + * way to map physical addresses to a guest. In order to avoid code duplication,
> + * the kvm_set_memory_region call is kept for internal VMs, however it usually
> + * expects a struct kvm_userspace_memory_region with a userspace_addr field.
> + * With internal VMs, this field is ignored and physical memory memory pointed
> + * by guest_phys_addr can only be idmapped.
> + */
> +static int internal_vm_prep_mem(struct kvm *kvm,
> + const struct kvm_userspace_memory_region *mem)
> +{
> + phys_addr_t addr, end;
> + unsigned long pfn;
> + int ret;
> + struct kvm_mmu_memory_cache cache = { 0 };
> +
> + end = mem->guest_phys_addr + mem->memory_size;
> + pfn = __phys_to_pfn(mem->guest_phys_addr);
> + addr = mem->guest_phys_addr;

My main concern here is that we don't do any checks on this region and
we could be mapping device memory here as well. Are we intending that
to be ok, and are we then relying on the guest to use proper memory
attributes?

> +
> + for (; addr < end; addr += PAGE_SIZE) {
> + pte_t pte = pfn_pte(pfn, PAGE_S2);
> +
> + pte = kvm_s2pte_mkwrite(pte);
> +
> + ret = mmu_topup_memory_cache(&cache,
> + KVM_MMU_CACHE_MIN_PAGES,
> + KVM_NR_MEM_OBJS);

You should be able to allocate all you need up front instead of doing it
in sequences.

> + if (ret) {
> + mmu_free_memory_cache(&cache);
> + return ret;
> + }
> + spin_lock(&kvm->mmu_lock);
> + ret = stage2_set_pte(kvm, &cache, addr, &pte, 0);
> + spin_unlock(&kvm->mmu_lock);

Since you're likely to allocate some large contiguous chunks here, can
you have a look at using section mappings?

> + if (ret) {
> + mmu_free_memory_cache(&cache);
> + return ret;
> + }
> +
> + pfn++;
> + }
> +
> + return ret;
> +}
> +
> int kvm_arch_prepare_memory_region(struct kvm *kvm,
> struct kvm_memory_slot *memslot,
> const struct kvm_userspace_memory_region *mem,
> @@ -1872,6 +1933,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> (KVM_PHYS_SIZE >> PAGE_SHIFT))
> return -EFAULT;
>
> + if (unlikely(!kvm->mm)) {
> + ret = internal_vm_prep_mem(kvm, mem);
> + if (ret)
> + goto out;
> + goto out_internal_vm;
> + }
> +
> down_read(&current->mm->mmap_sem);
> /*
> * A memory region could potentially cover multiple VMAs, and any holes
> @@ -1930,6 +1998,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> hva = vm_end;
> } while (hva < reg_end);
>
> +out_internal_vm:
> if (change == KVM_MR_FLAGS_ONLY)
> goto out;
>
> @@ -1940,7 +2009,8 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> stage2_flush_memslot(kvm, memslot);
> spin_unlock(&kvm->mmu_lock);
> out:
> - up_read(&current->mm->mmap_sem);
> + if (kvm->mm)
> + up_read(&current->mm->mmap_sem);
> return ret;
> }
>
> --
> 1.9.1
>

Thanks,
-Christoffer

2017-08-31 09:26:21

by Christoffer Dall

[permalink] [raw]
Subject: Re: [RFC 00/11] KVM, EFI, arm64: EFI Runtime Services Sandboxing

Hi Florent,

On Fri, Aug 25, 2017 at 09:31:30AM +0100, Florent Revest wrote:
> Hi,
>
> This series implements a mechanism to sandbox EFI Runtime Services on arm64.
> It can be enabled with CONFIG_EFI_SANDBOX. At boot it spawns an internal KVM
> virtual machine that is ran everytime an EFI Runtime Service is called. This
> limits the possible security and stability impact of EFI runtime on the kernel.

I wonder if this should be split into two series; one that sets up
anything you may need from KVM, and another one that uses that for UEFI.

There's a lot of intertwined KVM and UEFI logic and assumptions in patch
10, which makes this series a bit hard to read.

I'd like some documentation (in the series and in
Documentation/virtual/kvm) of how this works, and which hidden
assumptions there are. For example, how do you ensure you never attempt
to return to userspace? How many VCPUs do you support? Do you support
any form of virtual interrupts? How about timers? Can a VM access
physical devices? How do you debug and trace something like this? Can
the VM be monitored from userspace?

These feel like fundamental questions to me that need addressing before
I can competently review the code.

I think a slightly more concrete motivation, and outlining the example of
the broken UEFI on Seattle, would help pave the way for these patches.


Thanks,
-Christoffer

>
> The patch set is split as follow:
> - Patches 1 and 2: Give more control over HVC handling to KVM
> - Patches 3 to 6: Introduce the concept of KVM "internal VMs"
> - Patches 7 to 9: Reorder KVM and EFI initialization on ARM
> - Patch 10: Introduces the EFI sandboxing VM and wrappers
> - Patch 11: Workarounds some EFI Runtime Services relying on EL3
>
> The sandboxing has been tested to work reliably (rtc and efivars) on a
> SoftIron OverDrive 1000 box and on a ARMv8.3 model with VHE enabled. Normal
> userspace KVM instance have also been tested to still work correctly.
>
> Those patches apply cleanly on the Linus' v4.13-rc6 tag and have no other
> dependencies.
>
> Florent Revest (11):
> arm64: Add an SMCCC function IDs header
> KVM: arm64: Return an Unknown ID on unhandled HVC
> KVM: Allow VM lifecycle management without userspace
> KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs
> KVM: Expose VM/VCPU creation functions
> KVM, arm64: Expose a VCPU initialization function
> KVM: Allow initialization before the module target
> KVM, arm, arm64: Initialize KVM's core earlier
> EFI, arm, arm64: Enable EFI Runtime Services later
> efi, arm64: Sandbox Runtime Services in a VM
> KVM, arm64: Don't trap internal VMs SMC calls
>
> arch/arm/include/asm/efi.h | 7 +
> arch/arm/include/asm/kvm_coproc.h | 3 +
> arch/arm/include/asm/kvm_host.h | 1 +
> arch/arm/include/asm/kvm_psci.h | 1 +
> arch/arm/kvm/coproc.c | 6 +
> arch/arm/kvm/coproc_a15.c | 3 +-
> arch/arm/kvm/coproc_a7.c | 3 +-
> arch/arm64/include/asm/efi.h | 71 ++++
> arch/arm64/include/asm/kvm_emulate.h | 3 +
> arch/arm64/include/asm/kvm_host.h | 4 +
> arch/arm64/include/asm/kvm_psci.h | 1 +
> arch/arm64/kernel/asm-offsets.c | 3 +
> arch/arm64/kvm/handle_exit.c | 27 +-
> arch/arm64/kvm/sys_regs_generic_v8.c | 8 +-
> arch/x86/include/asm/efi.h | 2 +
> drivers/firmware/efi/Kconfig | 10 +
> drivers/firmware/efi/Makefile | 1 +
> drivers/firmware/efi/arm-runtime.c | 5 +-
> drivers/firmware/efi/arm-sandbox-payload.S | 96 +++++
> drivers/firmware/efi/arm-sandbox.c | 569 +++++++++++++++++++++++++++++
> drivers/firmware/efi/efi.c | 3 +
> include/linux/kvm_host.h | 4 +
> include/linux/smccc_fn.h | 53 +++
> include/uapi/linux/psci.h | 2 +
> virt/kvm/arm/arm.c | 18 +-
> virt/kvm/arm/mmu.c | 76 +++-
> virt/kvm/arm/psci.c | 21 ++
> virt/kvm/kvm_main.c | 102 ++++--
> 28 files changed, 1050 insertions(+), 53 deletions(-)
> create mode 100644 drivers/firmware/efi/arm-sandbox-payload.S
> create mode 100644 drivers/firmware/efi/arm-sandbox.c
> create mode 100644 include/linux/smccc_fn.h
>
> --
> 1.9.1
>

2017-09-22 21:44:54

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [RFC 00/11] KVM, EFI, arm64: EFI Runtime Services Sandboxing

On 25 August 2017 at 01:31, Florent Revest <[email protected]> wrote:
> Hi,
>
> This series implements a mechanism to sandbox EFI Runtime Services on arm64.
> It can be enabled with CONFIG_EFI_SANDBOX. At boot it spawns an internal KVM
> virtual machine that is ran everytime an EFI Runtime Service is called. This
> limits the possible security and stability impact of EFI runtime on the kernel.
>
> The patch set is split as follow:
> - Patches 1 and 2: Give more control over HVC handling to KVM
> - Patches 3 to 6: Introduce the concept of KVM "internal VMs"
> - Patches 7 to 9: Reorder KVM and EFI initialization on ARM
> - Patch 10: Introduces the EFI sandboxing VM and wrappers
> - Patch 11: Workarounds some EFI Runtime Services relying on EL3
>
> The sandboxing has been tested to work reliably (rtc and efivars) on a
> SoftIron OverDrive 1000 box and on a ARMv8.3 model with VHE enabled. Normal
> userspace KVM instance have also been tested to still work correctly.
>
> Those patches apply cleanly on the Linus' v4.13-rc6 tag and have no other
> dependencies.
>
> Florent Revest (11):
> arm64: Add an SMCCC function IDs header
> KVM: arm64: Return an Unknown ID on unhandled HVC
> KVM: Allow VM lifecycle management without userspace
> KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs
> KVM: Expose VM/VCPU creation functions
> KVM, arm64: Expose a VCPU initialization function
> KVM: Allow initialization before the module target
> KVM, arm, arm64: Initialize KVM's core earlier
> EFI, arm, arm64: Enable EFI Runtime Services later
> efi, arm64: Sandbox Runtime Services in a VM
> KVM, arm64: Don't trap internal VMs SMC calls
>

Hello Florent,

This is really nice work. Thanks for contributing it.

From the EFI side, there are some minor concerns on my part regarding
the calling convention, and the fact that we can no longer invoke
runtime services from a kernel running at EL1, but those all seem
fixable. I will respond to the patches in question in greater detail
at a later time.

In the meantime, Christoffer has raised a number of valid concerns,
and those need to be addressed first before it makes sense to talk
about EFI specifics. I hope you will find more time to invest in this:
I would really love to have this feature upstream.

Regards,
Ard.

2017-09-26 21:14:52

by Florent Revest

[permalink] [raw]
Subject: Re: [RFC 04/11] KVM, arm, arm64: Offer PAs to IPAs idmapping to internal VMs

On Thu, 2017-08-31 at 11:23 +0200, Christoffer Dall wrote:
> > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> > index 2ea21da..1d2d3df 100644
> > --- a/virt/kvm/arm/mmu.c
> > +++ b/virt/kvm/arm/mmu.c
> > @@ -772,6 +772,11 @@ static void stage2_unmap_memslot(struct kvm *kvm,
> >         phys_addr_t size = PAGE_SIZE * memslot->npages;
> >         hva_t reg_end = hva + size;
> >
> > +       if (unlikely(!kvm->mm)) {
> I think you should consider using a predicate so that it's clear that
> this is for in-kernel VMs and not just some random situation where mm
> can be NULL.

Internal VMs should be the only case where kvm->mm is NULL. However, if
you'd prefer it otherwise, I'll make sure this condition is made
clearer.

> So it's unclear to me why we don't need any special casing in
> kvm_handle_guest_abort, related to MMIO exits etc.  You probably
> assume that we will never do emulation, but that should be described
> and addressed somewhere before I can critically review this patch.

This is indeed what I was assuming. This RFC does not allow MMIO with
internal VMs; I cannot think of a use case where it would be useful. I'll
make sure this is documented in an eventual later RFC.

> > +static int internal_vm_prep_mem(struct kvm *kvm,
> > +                               const struct kvm_userspace_memory_region *mem)
> > +{
> > +       phys_addr_t addr, end;
> > +       unsigned long pfn;
> > +       int ret;
> > +       struct kvm_mmu_memory_cache cache = { 0 };
> > +
> > +       end = mem->guest_phys_addr + mem->memory_size;
> > +       pfn = __phys_to_pfn(mem->guest_phys_addr);
> > +       addr = mem->guest_phys_addr;
> My main concern here is that we don't do any checks on this region
> and we could be mapping device memory here as well.  Are we intending
> that to be ok, and are we then relying on the guest to use proper
> memory attributes ?

Indeed, being able to map device memory is intended. It is needed for
Runtime Services sandboxing. It also relies on the guest being
correctly configured.

> > +
> > +       for (; addr < end; addr += PAGE_SIZE) {
> > +               pte_t pte = pfn_pte(pfn, PAGE_S2);
> > +
> > +               pte = kvm_s2pte_mkwrite(pte);
> > +
> > +               ret = mmu_topup_memory_cache(&cache,
> > +                                            KVM_MMU_CACHE_MIN_PAGES,
> > +                                            KVM_NR_MEM_OBJS);
> You should be able to allocate all you need up front instead of doing
> it in sequences.

Ok.

> >
> > +               if (ret) {
> > +                       mmu_free_memory_cache(&cache);
> > +                       return ret;
> > +               }
> > +               spin_lock(&kvm->mmu_lock);
> > +               ret = stage2_set_pte(kvm, &cache, addr, &pte, 0);
> > +               spin_unlock(&kvm->mmu_lock);
> Since you're likely to allocate some large contiguous chunks here,
> can you have a look at using section mappings?

Will do.

Thank you very much,
    Florent

2017-09-26 21:14:58

by Florent Revest

[permalink] [raw]
Subject: Re: [RFC 00/11] KVM, EFI, arm64: EFI Runtime Services Sandboxing

On Thu, 2017-08-31 at 11:26 +0200, Christoffer Dall wrote:
> I wonder if this should be split into two series; one that sets up
> anything you may need from KVM, and another one that uses that for
> UEFI.
>
> There's a lot KVM and UEFI intertwined logic and assumptions in patch
> 10, which makes this series a bit hard to read.

The way hypercalls are currently handled in handle_hvc required this
mixed patch. Would some kind of HVC subscription mechanism (e.g. a
function allowing a callback to be registered for a certain HVC function
ID) be suitable to have in KVM? This would allow patch 10/11 to keep the
KVM code intact.
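
Something like the following (all names hypothetical) is what I have in
mind:

/* Route one SMCCC function ID to a callback instead of hardcoding its
 * handling in handle_hvc()/kvm_psci_call().
 */
typedef int (*kvm_hvc_handler_t)(struct kvm_vcpu *vcpu);

int kvm_register_hvc_handler(u32 fn_id, kvm_hvc_handler_t handler);
void kvm_unregister_hvc_handler(u32 fn_id);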

> I'd like some documentation (in the series and in
> Documentation/virtual/kvm) of how this works, and which hidden
> assumptions there are. For example, how do you ensure you never
> attempt to return to userspace?

I don't think my code ensured this. I'd need to give it a second look.

> How many VCPUs do you support?

You can create as many VCPUs as you would in a "normal" VM. Also, each
VCPU can be run in a kthread.

> Do you support any form of virtual interrupts? How about timers?

No support for virtual interrupts or timers indeed. The EFI Runtime
Services sandboxing wouldn't require that.

> Can a VM access physical devices?

The very idea of Runtime Services sandboxing requires internal VMs to
have access to some of the physical devices.

> How do you debug and trace something like this? Can the VM be
> monitored from userspace?

There is nothing ready for that.

> These feel like fundamental questions to me that needs addressing
> before I can competently review the code.
>
> I think a slightly more concrete motivation and outlining the example
> of the broken UEFI on Seattle would help paving the way for these
> patches.

As far as I can remember, EFI Runtime Services on this platform have
already been reported to sometimes disable or enable interrupts. Maybe
someone at ARM has more details about the problem?

Thanks a lot for your review,
    Florent

2017-09-26 21:15:04

by Florent Revest

[permalink] [raw]
Subject: Re: [RFC 00/11] KVM, EFI, arm64: EFI Runtime Services Sandboxing

On Fri, 2017-09-22 at 14:44 -0700, Ard Biesheuvel wrote:
> From the EFI side, there are some minor concerns on my part regarding
> the calling convention, and the fact that we can no longer invoke
> runtime services from a kernel running at EL1, but those all seem
> fixable. I will respond to the patches in question in greater detail
> at a later time.

Indeed, this RFC currently breaks EFI Runtime Services at EL1. This
would need to be fixed in a new patchset.

Patch 10/11 also highlights that the current argument-passing method
does not respect alignment. The way arguments are currently pushed and
pulled makes the issue quite hard to fix. Any suggestions would be
welcome.

> In the mean time, Christoffer has raised a number for valid concerns,
> and those need to be addressed first before it makes sense to talk
> about EFI specifics. I hope you will find more time to invest in
> this: I would really love to have this feature upstream.

Unfortunately, I'm no longer working at ARM and my other projects keep
me very busy. I would also love to invest more time in this patchset to
get it upstream, but I'm really unsure when I will be able to find the
time for this.

Best,
    Florent