Nested virtualization is the ability to run a virtual machine inside another
virtual machine. In other words, it’s about running a hypervisor (the guest
hypervisor) on top of another hypervisor (the host hypervisor).
Supporting nested virtualization on ARM means that the hypervisor provides not
only an EL0/EL1 execution environment to VMs, as it usually does, but also the
virtualization extensions, including an EL2 execution environment. Once the
host hypervisor provides those execution environments to its VMs, the guest
hypervisor can naturally run its own VMs (nested VMs).
This series supports nested virtualization on arm64. ARM recently announced an
extension (ARMv8.3) which has support for nested virtualization[1]. This patch
set is based on the ARMv8.3 specification and has been tested on the FastModel
with the ARMv8.3 extension.
The whole patch set to support nested virtualization is huge (over 70
patches), so I categorized it into four parts: CPU, memory, VGIC, and timer
virtualization. This patch series is the first part.
The CPU virtualization patch series provides the basic nested virtualization
framework and instruction emulation, including the v8.1 VHE and v8.3 nested
virtualization features for VMs.
This patch series can itself be divided into four parts. Patches 1 to 5
introduce nested virtualization by discovering the hardware feature, adding a
kernel parameter and allowing userspace to set the initial CPU mode to EL2.
Patches 6 to 25 add support for the EL2 execution environment, the virtual
EL2, to a VM on the v8.0 architecture. We de-privilege the guest hypervisor
and emulate the virtual EL2 mode in EL1 using the hardware features provided
by ARMv8.3; the host hypervisor manages virtual EL2 register state for the
guest hypervisor along with shadow EL1 register state that reflects the
virtual EL2 register state, and uses the latter to run the guest hypervisor
in EL1 (see the sketch below).
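As a conceptual sketch of that flow (simplified from the context-switch code
in the patches below; the function and field names follow the series, but
this is not the verbatim implementation):

    /*
     * Before entering the VM: if the vcpu is in virtual EL2, point the
     * world-switch code at a shadow register array that encodes the
     * virtual EL2 state in EL1 format, so the guest hypervisor really
     * runs in EL1.
     */
    void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
    {
            struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;

            if (vcpu_mode_el2(vcpu)) {
                    flush_shadow_el1_sysregs(vcpu); /* EL2 -> shadow EL1 */
                    ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
            } else {
                    ctxt->hw_sys_regs = ctxt->sys_regs;
            }
    }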
Patches 26 to 33 add support for the virtual EL2 with the Virtualization Host
Extensions. These patches emulate the registers and bits newly defined in v8.1
and allow the virtual EL2 to access EL2 register state via EL1 register
accesses, as in the real EL2.
Patches 34 to 38 add support for nested virtualization under the virtual EL2.
These enable recursive nested virtualization.
This patch set is tested on the FastModel with the v8.3 extension for arm64
and on a cubietruck for arm32. On the FastModel, the host and guest kernels
are compiled with and without VHE, so there are four combinations. I was able
to boot SMP Linux in the nested VM in all four configurations and to run
hackbench. I also checked that regular VMs could boot when the nested
virtualization kernel parameter was not set. On the cubietruck, I verified
that regular VMs could boot as well.
I'll share my experiment setup shortly.
Even though this work has some limitations and TODOs, I'd appreciate early
feedback on this RFC. Specifically, I'm interested in:
- Overall design to manage vcpu context for the virtual EL2
- Verifying correct EL2 register configurations such as HCR_EL2, CPTR_EL2
(Patch 30 and 32)
- Patch organization and coding style
This patch series is based on kvm/next d38338e.
The whole patch series including memory, VGIC, and timer patches is available
here:
[email protected]:columbia/nesting-pub.git rfc-v2
Limitations:
- There are some cases where the target exception level of a VM is ambiguous when
emulating the eret instruction. I'm discussing this issue with Christoffer and
Marc. Meanwhile, I added a temporary patch (not included in this
series; f1beaba in the repo) and used a 4.10.0 kernel when testing the guest
hypervisor with VHE.
- Recursive nested virtualization is not tested yet.
- Other hypervisors (such as Xen) on KVM are not tested.
TODO:
- Submit memory, VGIC, and timer patches
- Evaluate regular VM performance to see if there's a negative impact.
- Test other hypervisors such as Xen on KVM
- Test recursive nested virtualization
v1-->v2:
- Added support for the virtual EL2 with VHE
- Rewrote commit messages and comments from the perspective of providing
execution environments to VMs, rather than from the perspective of the guest
hypervisor running in them.
- Fixed a few bugs to make it run on the FastModel.
- Tested on ARMv8.3 with four configurations. (host/guest. with/without VHE.)
- Rebased to kvm/next
[1] https://www.community.arm.com/processors/b/blog/posts/armv8-a-architecture-2016-additions
Christoffer Dall (7):
KVM: arm64: Add KVM nesting feature
KVM: arm64: Allow userspace to set PSR_MODE_EL2x
KVM: arm64: Add vcpu_mode_el2 primitive to support nesting
KVM: arm/arm64: Add a framework to prepare virtual EL2 execution
arm64: Add missing TCR hw defines
KVM: arm64: Create shadow EL1 registers
KVM: arm64: Trap EL1 VM register accesses in virtual EL2
Jintack Lim (31):
arm64: Add ARM64_HAS_NESTED_VIRT feature
KVM: arm/arm64: Enable nested virtualization via command-line
KVM: arm/arm64: Check if nested virtualization is in use
KVM: arm64: Add EL2 system registers to vcpu context
KVM: arm64: Add EL2 special registers to vcpu context
KVM: arm64: Add the shadow context for virtual EL2 execution
KVM: arm64: Set vcpu context depending on the guest exception level
KVM: arm64: Synchronize EL1 system registers on virtual EL2 entry and
exit
KVM: arm64: Move exception macros and enums to a common file
KVM: arm64: Support to inject exceptions to the virtual EL2
KVM: arm64: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
KVM: arm64: Trap CPACR_EL1 access in virtual EL2
KVM: arm64: Handle eret instruction traps
KVM: arm64: Set a handler for the system instruction traps
KVM: arm64: Handle PSCI call via smc from the guest
KVM: arm64: Inject HVC exceptions to the virtual EL2
KVM: arm64: Respect virtual HCR_EL2.TWX setting
KVM: arm64: Respect virtual CPTR_EL2.TFP setting
KVM: arm64: Add macros to support the virtual EL2 with VHE
KVM: arm64: Add EL2 registers defined in ARMv8.1 to vcpu context
KVM: arm64: Emulate EL12 register accesses from the virtual EL2
KVM: arm64: Support a VM with VHE considering EL0 of the VHE host
KVM: arm64: Allow the virtual EL2 to access EL2 states without trap
KVM: arm64: Manage the shadow states when virtual E2H bit enabled
KVM: arm64: Trap and emulate CPTR_EL2 accesses via CPACR_EL1 from the
virtual EL2 with VHE
KVM: arm64: Emulate appropriate VM control system registers
KVM: arm64: Respect the virtual HCR_EL2.NV bit setting
KVM: arm64: Respect the virtual HCR_EL2.NV bit setting for EL12
register traps
KVM: arm64: Respect virtual HCR_EL2.TVM and TRVM settings
KVM: arm64: Respect the virtual HCR_EL2.NV1 bit setting
KVM: arm64: Respect the virtual CPTR_EL2.TCPAC setting
Documentation/admin-guide/kernel-parameters.txt | 4 +
arch/arm/include/asm/kvm_emulate.h | 17 ++
arch/arm/include/asm/kvm_host.h | 15 +
arch/arm64/include/asm/cpucaps.h | 3 +-
arch/arm64/include/asm/esr.h | 1 +
arch/arm64/include/asm/kvm_arm.h | 2 +
arch/arm64/include/asm/kvm_coproc.h | 3 +-
arch/arm64/include/asm/kvm_emulate.h | 56 ++++
arch/arm64/include/asm/kvm_host.h | 64 ++++-
arch/arm64/include/asm/kvm_hyp.h | 24 --
arch/arm64/include/asm/pgtable-hwdef.h | 6 +
arch/arm64/include/asm/sysreg.h | 70 +++++
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kernel/cpufeature.c | 11 +
arch/arm64/kvm/Makefile | 5 +-
arch/arm64/kvm/context.c | 346 +++++++++++++++++++++++
arch/arm64/kvm/emulate-nested.c | 83 ++++++
arch/arm64/kvm/guest.c | 2 +
arch/arm64/kvm/handle_exit.c | 89 +++++-
arch/arm64/kvm/hyp/entry.S | 13 +
arch/arm64/kvm/hyp/hyp-entry.S | 2 +-
arch/arm64/kvm/hyp/switch.c | 33 ++-
arch/arm64/kvm/hyp/sysreg-sr.c | 117 ++++----
arch/arm64/kvm/inject_fault.c | 12 -
arch/arm64/kvm/nested.c | 63 +++++
arch/arm64/kvm/reset.c | 8 +
arch/arm64/kvm/sys_regs.c | 359 +++++++++++++++++++++++-
arch/arm64/kvm/sys_regs.h | 8 +
arch/arm64/kvm/trace.h | 43 ++-
virt/kvm/arm/arm.c | 20 ++
31 files changed, 1363 insertions(+), 118 deletions(-)
create mode 100644 arch/arm64/kvm/context.c
create mode 100644 arch/arm64/kvm/emulate-nested.c
create mode 100644 arch/arm64/kvm/nested.c
--
1.9.1
Add a new ARM64_HAS_NESTED_VIRT feature to indicate that the
CPU has the ARMv8.3 nested virtualization capability.
This will be used to support nested virtualization in KVM.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/include/asm/cpucaps.h | 3 ++-
arch/arm64/include/asm/sysreg.h | 1 +
arch/arm64/kernel/cpufeature.c | 11 +++++++++++
3 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 8d2272c..64df263 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -39,7 +39,8 @@
#define ARM64_WORKAROUND_QCOM_FALKOR_E1003 18
#define ARM64_WORKAROUND_858921 19
#define ARM64_WORKAROUND_CAVIUM_30115 20
+#define ARM64_HAS_NESTED_VIRT 21
-#define ARM64_NCAPS 21
+#define ARM64_NCAPS 22
#endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 16e44fa..737ca30 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -376,6 +376,7 @@
#define ID_AA64MMFR1_VMIDBITS_16 2
/* id_aa64mmfr2 */
+#define ID_AA64MMFR2_NV_SHIFT 24
#define ID_AA64MMFR2_LVA_SHIFT 16
#define ID_AA64MMFR2_IESB_SHIFT 12
#define ID_AA64MMFR2_LSM_SHIFT 8
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 94b8f7f..523f998 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -144,6 +144,7 @@
};
static const struct arm64_ftr_bits ftr_id_aa64mmfr2[] = {
+ ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64MMFR2_NV_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64MMFR2_LVA_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64MMFR2_IESB_SHIFT, 4, 0),
ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64MMFR2_LSM_SHIFT, 4, 0),
@@ -867,6 +868,16 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
.min_field_value = 0,
.matches = has_no_fpsimd,
},
+ {
+ .desc = "Nested Virtualization Support",
+ .capability = ARM64_HAS_NESTED_VIRT,
+ .def_scope = SCOPE_SYSTEM,
+ .matches = has_cpuid_feature,
+ .sys_reg = SYS_ID_AA64MMFR2_EL1,
+ .sign = FTR_UNSIGNED,
+ .field_pos = ID_AA64MMFR2_NV_SHIFT,
+ .min_field_value = 1,
+ },
{},
};
--
1.9.1
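For reference, later patches in this series ('Enable nested virtualization
via command-line' and 'Check if nested virtualization is in use') build on
this capability. A rough sketch of such a check, assuming a nested_param flag
populated from the kernel command line (the helper name and flag here are
illustrative, not the exact code):

    /*
     * Sketch only: nested virt is in use when the kernel parameter is
     * set, the CPUs advertise ARMv8.3 nested virtualization support,
     * and userspace asked for an EL2-capable vcpu.
     */
    static bool nested_virt_in_use(const struct kvm_vcpu *vcpu)
    {
            return nested_param &&
                   cpus_have_const_cap(ARM64_HAS_NESTED_VIRT) &&
                   test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
    }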
From: Christoffer Dall <[email protected]>
Set the initial exception level of the guest to EL2 if the nested
virtualization feature is enabled.
Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 2 +-
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kvm/reset.c | 8 ++++++++
3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index dcc4df8..6df0c7c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -39,7 +39,7 @@
#define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
-#define KVM_VCPU_MAX_FEATURES 4
+#define KVM_VCPU_MAX_FEATURES 5
#define KVM_REQ_SLEEP \
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 9f3ca24..4a71a72 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -99,6 +99,7 @@ struct kvm_regs {
#define KVM_ARM_VCPU_EL1_32BIT 1 /* CPU running a 32bit VM */
#define KVM_ARM_VCPU_PSCI_0_2 2 /* CPU uses PSCI v0.2 */
#define KVM_ARM_VCPU_PMU_V3 3 /* Support guest PMUv3 */
+#define KVM_ARM_VCPU_NESTED_VIRT 4 /* Support nested virtualization */
struct kvm_vcpu_init {
__u32 target;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 3256b92..1353516 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -41,6 +41,11 @@
PSR_F_BIT | PSR_D_BIT),
};
+static const struct kvm_regs default_regs_reset_el2 = {
+ .regs.pstate = (PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT |
+ PSR_F_BIT | PSR_D_BIT),
+};
+
static const struct kvm_regs default_regs_reset32 = {
.regs.pstate = (COMPAT_PSR_MODE_SVC | COMPAT_PSR_A_BIT |
COMPAT_PSR_I_BIT | COMPAT_PSR_F_BIT),
@@ -106,6 +111,9 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
if (!cpu_has_32bit_el1())
return -EINVAL;
cpu_reset = &default_regs_reset32;
+ } else if (test_bit(KVM_ARM_VCPU_NESTED_VIRT,
+ vcpu->arch.features)) {
+ cpu_reset = &default_regs_reset_el2;
} else {
cpu_reset = &default_regs_reset;
}
--
1.9.1
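To illustrate how a VMM would opt in (a hypothetical userspace snippet, not
part of this series; vm_fd and vcpu_fd are assumed to be open KVM file
descriptors):

    struct kvm_vcpu_init init = { };

    /* Query the preferred target for this host... */
    if (ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init) < 0)
            err(1, "KVM_ARM_PREFERRED_TARGET");

    /* ...and request that the vcpu resets into (virtual) EL2. */
    init.features[0] |= 1U << KVM_ARM_VCPU_NESTED_VIRT;

    if (ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init) < 0)
            err(1, "KVM_ARM_VCPU_INIT"); /* e.g. nesting not enabled */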
From: Christoffer Dall <[email protected]>
We were not allowing userspace to set a more privileged mode for the VCPU
than EL1, but now that we support nesting with a virtual EL2 mode, do
allow this!
Signed-off-by: Christoffer Dall <[email protected]>
---
arch/arm64/kvm/guest.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 5c7f657..5e673ae 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -117,6 +117,8 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
case PSR_MODE_EL0t:
case PSR_MODE_EL1t:
case PSR_MODE_EL1h:
+ case PSR_MODE_EL2h:
+ case PSR_MODE_EL2t:
break;
default:
err = -EINVAL;
--
1.9.1
To support virtual EL2 execution, we need to maintain the EL2 special
registers, such as SPSR_EL2, ELR_EL2 and SP_EL2, in the vcpu context.
Note that SP_EL2 is not accessible in EL2, so we don't need a trap
handler for this register.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
arch/arm64/include/asm/sysreg.h | 4 ++++
arch/arm64/kvm/sys_regs.c | 38 +++++++++++++++++++++++++++++++++-----
arch/arm64/kvm/sys_regs.h | 8 ++++++++
4 files changed, 57 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 1dc4ed6..57dccde 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -171,6 +171,15 @@ enum vcpu_sysreg {
NR_SYS_REGS /* Nothing after this line! */
};
+enum el2_special_regs {
+ __INVALID_EL2_SPECIAL_REG__,
+ SPSR_EL2, /* Saved Program Status Register (EL2) */
+ ELR_EL2, /* Exception Link Register (EL2) */
+ SP_EL2, /* Stack Pointer (EL2) */
+
+ NR_EL2_SPECIAL_REGS
+};
+
/* 32bit mapping */
#define c0_MPIDR (MPIDR_EL1 * 2) /* MultiProcessor ID Register */
#define c0_CSSELR (CSSELR_EL1 * 2)/* Cache Size Selection Register */
@@ -218,6 +227,8 @@ struct kvm_cpu_context {
u64 sys_regs[NR_SYS_REGS];
u32 copro[NR_COPRO_REGS];
};
+
+ u64 el2_special_regs[NR_EL2_SPECIAL_REGS];
};
typedef struct kvm_cpu_context kvm_cpu_context_t;
@@ -307,6 +318,7 @@ struct kvm_vcpu_arch {
#define vcpu_gp_regs(v) (&(v)->arch.ctxt.gp_regs)
#define vcpu_sys_reg(v,r) ((v)->arch.ctxt.sys_regs[(r)])
+#define vcpu_el2_sreg(v,r) ((v)->arch.ctxt.el2_special_regs[(r)])
/*
* CP14 and CP15 live in the same array, as they are backed by the
* same system registers.
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 9277c4a..98c32ef 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -268,6 +268,8 @@
#define SYS_DACR32_EL2 sys_reg(3, 4, 3, 0, 0)
+#define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0)
+#define SYS_ELR_EL2 sys_reg(3, 4, 4, 0, 1)
#define SYS_SP_EL1 sys_reg(3, 4, 4, 1, 0)
#define SYS_IFSR32_EL2 sys_reg(3, 4, 5, 0, 1)
@@ -332,6 +334,8 @@
#define SYS_CNTVOFF_EL2 sys_reg(3, 4, 14, 0, 3)
#define SYS_CNTHCTL_EL2 sys_reg(3, 4, 14, 1, 0)
+#define SYS_SP_EL2 sys_reg(3, 6, 4, 1, 0)
+
/* Common SCTLR_ELx flags. */
#define SCTLR_ELx_EE (1 << 25)
#define SCTLR_ELx_I (1 << 12)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1568f8b..2b3ed70 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -900,15 +900,33 @@ static inline void access_rw(struct sys_reg_params *p, u64 *sysreg)
*sysreg = p->regval;
}
+static u64 *get_special_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
+{
+ u64 reg = sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+ switch (reg) {
+ case SYS_SP_EL1:
+ return &vcpu->arch.ctxt.gp_regs.sp_el1;
+ case SYS_ELR_EL2:
+ return &vcpu_el2_sreg(vcpu, ELR_EL2);
+ case SYS_SPSR_EL2:
+ return &vcpu_el2_sreg(vcpu, SPSR_EL2);
+ default:
+ return NULL;
+ };
+}
+
static bool trap_el2_regs(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
- /* SP_EL1 is NOT maintained in sys_regs array */
- if (sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2) == SYS_SP_EL1)
- access_rw(p, &vcpu->arch.ctxt.gp_regs.sp_el1);
- else
- access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
+ u64 *sys_reg;
+
+ sys_reg = get_special_reg(vcpu, p);
+ if (!sys_reg)
+ sys_reg = &vcpu_sys_reg(vcpu, r->reg);
+
+ access_rw(p, sys_reg);
return true;
}
@@ -1116,6 +1134,8 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
+ { SYS_DESC(SYS_SPSR_EL2), trap_el2_regs, reset_special, SPSR_EL2, 0 },
+ { SYS_DESC(SYS_ELR_EL2), trap_el2_regs, reset_special, ELR_EL2, 0 },
{ SYS_DESC(SYS_SP_EL1), trap_el2_regs },
{ SYS_DESC(SYS_IFSR32_EL2), NULL, reset_unknown, IFSR32_EL2 },
@@ -1138,6 +1158,8 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
{ SYS_DESC(SYS_CNTVOFF_EL2), trap_el2_regs, reset_val, CNTVOFF_EL2, 0 },
{ SYS_DESC(SYS_CNTHCTL_EL2), trap_el2_regs, reset_val, CNTHCTL_EL2, 0 },
+
+ { SYS_DESC(SYS_SP_EL2), NULL, reset_special, SP_EL2, 0},
};
static bool trap_dbgidr(struct kvm_vcpu *vcpu,
@@ -2271,6 +2293,8 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
/* Catch someone adding a register without putting in reset entry. */
memset(&vcpu->arch.ctxt.sys_regs, 0x42, sizeof(vcpu->arch.ctxt.sys_regs));
+ memset(&vcpu->arch.ctxt.el2_special_regs, 0x42,
+ sizeof(vcpu->arch.ctxt.el2_special_regs));
/* Generic chip reset first (so target could override). */
reset_sys_reg_descs(vcpu, sys_reg_descs, ARRAY_SIZE(sys_reg_descs));
@@ -2281,4 +2305,8 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
for (num = 1; num < NR_SYS_REGS; num++)
if (vcpu_sys_reg(vcpu, num) == 0x4242424242424242)
panic("Didn't reset vcpu_sys_reg(%zi)", num);
+
+ for (num = 1; num < NR_EL2_SPECIAL_REGS; num++)
+ if (vcpu_el2_sreg(vcpu, num) == 0x4242424242424242)
+ panic("Didn't reset vcpu_el2_sreg(%zi)", num);
}
diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
index 060f534..827717b 100644
--- a/arch/arm64/kvm/sys_regs.h
+++ b/arch/arm64/kvm/sys_regs.h
@@ -99,6 +99,14 @@ static inline void reset_val(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r
vcpu_sys_reg(vcpu, r->reg) = r->val;
}
+static inline void reset_special(struct kvm_vcpu *vcpu,
+ const struct sys_reg_desc *r)
+{
+ BUG_ON(!r->reg);
+ BUG_ON(r->reg >= NR_EL2_SPECIAL_REGS);
+ vcpu_el2_sreg(vcpu, r->reg) = r->val;
+}
+
static inline int cmp_sys_reg(const struct sys_reg_desc *i1,
const struct sys_reg_desc *i2)
{
--
1.9.1
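For context, these vcpu context fields are what the eret emulation later in
the series ('KVM: arm64: Handle eret instruction traps') consumes; a sketch
of that consumer, under the assumption that it mirrors the series' handler:

    static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
    {
            /*
             * Emulate eret executed in virtual EL2: return to the address
             * in the emulated ELR_EL2 with the PSTATE saved in the
             * emulated SPSR_EL2, both maintained in the vcpu context by
             * this patch.
             */
            *vcpu_pc(vcpu) = vcpu_el2_sreg(vcpu, ELR_EL2);
            *vcpu_cpsr(vcpu) = vcpu_el2_sreg(vcpu, SPSR_EL2);

            return 1;
    }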
When running in virtual EL2 we use the shadow EL1 system register array
for the save/restore process, so that the hardware, and especially the
memory subsystem, behaves as code written for EL2 expects while really
running in EL1.
This works great for EL1 system register accesses that we trap, because
these accesses will be written into the virtual state for the EL1 system
registers used when eventually switching the VCPU mode to EL1.
However, there is a collection of EL1 system registers which we do not
trap, and as a consequence all save/restore operations of these
registers were happening locally in the shadow array, with no benefit to
software actually running in virtual EL1 at all.
To fix this, simply synchronize the shadow and real EL1 state for these
registers on entry/exit to/from virtual EL2 state.
Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/kvm/context.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 56 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index e965049..e1bc753 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -86,6 +86,58 @@ static void flush_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
s_sys_regs[CPACR_EL1] = cptr_to_cpacr(vcpu_sys_reg(vcpu, CPTR_EL2));
}
+
+/*
+ * List of EL0 and EL1 registers which we allow the virtual EL2 mode to access
+ * directly without trapping. This is possible because the impact of
+ * accessing those registers is the same regardless of the exception
+ * levels that are allowed.
+ */
+static const int el1_non_trap_regs[] = {
+ CNTKCTL_EL1,
+ CSSELR_EL1,
+ PAR_EL1,
+ TPIDR_EL0,
+ TPIDR_EL1,
+ TPIDRRO_EL0
+};
+
+/**
+ * copy_shadow_non_trap_el1_state
+ * @vcpu: The VCPU pointer
+ * @setup: True, if on the way to the guest (called from setup)
+ * False, if returning from the guest (called from restore)
+ *
+ * Some EL1 registers are accessed directly by the virtual EL2 mode because
+ * they in no way affect execution state in virtual EL2. However, we must
+ * still ensure that virtual EL2 observes the same state of the EL1 registers
+ * as the normal VM's EL1 mode, so copy this state as needed on setup/restore.
+ */
+static void copy_shadow_non_trap_el1_state(struct kvm_vcpu *vcpu, bool setup)
+{
+ u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(el1_non_trap_regs); i++) {
+ const int sr = el1_non_trap_regs[i];
+
+ if (setup)
+ s_sys_regs[sr] = vcpu_sys_reg(vcpu, sr);
+ else
+ vcpu_sys_reg(vcpu, sr) = s_sys_regs[sr];
+ }
+}
+
+static void sync_shadow_non_trap_el1_state(struct kvm_vcpu *vcpu)
+{
+ copy_shadow_non_trap_el1_state(vcpu, false);
+}
+
+static void flush_shadow_non_trap_el1_state(struct kvm_vcpu *vcpu)
+{
+ copy_shadow_non_trap_el1_state(vcpu, true);
+}
+
static void flush_shadow_special_regs(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
@@ -162,6 +214,7 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
if (unlikely(vcpu_mode_el2(vcpu))) {
flush_shadow_special_regs(vcpu);
flush_shadow_el1_sysregs(vcpu);
+ flush_shadow_non_trap_el1_state(vcpu);
ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
} else {
flush_special_regs(vcpu);
@@ -176,9 +229,10 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
*/
void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
{
- if (unlikely(vcpu_mode_el2(vcpu)))
+ if (unlikely(vcpu_mode_el2(vcpu))) {
sync_shadow_special_regs(vcpu);
- else
+ sync_shadow_non_trap_el1_state(vcpu);
+ } else
sync_special_regs(vcpu);
}
--
1.9.1
Forward traps due to FP/ASIMD register accesses to the virtual EL2 if
the virtual CPTR_EL2.TFP bit is set. Note that if the TFP bit is set,
then accesses to FP/ASIMD registers from EL2 as well as from NS EL0/1
will trap to EL2, so we don't check the VM's exception level.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/kernel/asm-offsets.c | 1 +
arch/arm64/kvm/handle_exit.c | 15 +++++++++++----
arch/arm64/kvm/hyp/entry.S | 13 +++++++++++++
arch/arm64/kvm/hyp/hyp-entry.S | 2 +-
4 files changed, 26 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index b3bb7ef..f5117a3 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -134,6 +134,7 @@ int main(void)
DEFINE(CPU_FP_REGS, offsetof(struct kvm_regs, fp_regs));
DEFINE(VCPU_FPEXC32_EL2, offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2]));
DEFINE(VCPU_HOST_CONTEXT, offsetof(struct kvm_vcpu, arch.host_cpu_context));
+ DEFINE(VIRTUAL_CPTR_EL2, offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[CPTR_EL2]));
#endif
#ifdef CONFIG_CPU_PM
DEFINE(CPU_SUSPEND_SZ, sizeof(struct cpu_suspend_ctx));
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 25ec824..d4e7b2b 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -84,11 +84,18 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
}
/*
- * Guest access to FP/ASIMD registers are routed to this handler only
- * when the system doesn't support FP/ASIMD.
+ * When the system supports FP/ASIMD and we are NOT running nested
+ * virtualization, FP/ASIMD traps are handled in EL2 directly.
+ * This handler handles the cases that do not belong to the above.
*/
-static int handle_no_fpsimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
+static int kvm_handle_fpasimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
+
+ /* This is for nested virtualization */
+ if (vcpu_sys_reg(vcpu, CPTR_EL2) & CPTR_EL2_TFP)
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
+ /* This is the case when the system doesn't support FP/ASIMD. */
kvm_inject_undefined(vcpu);
return 1;
}
@@ -220,7 +227,7 @@ static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
[ESR_ELx_EC_BREAKPT_LOW]= kvm_handle_guest_debug,
[ESR_ELx_EC_BKPT32] = kvm_handle_guest_debug,
[ESR_ELx_EC_BRK64] = kvm_handle_guest_debug,
- [ESR_ELx_EC_FP_ASIMD] = handle_no_fpsimd,
+ [ESR_ELx_EC_FP_ASIMD] = kvm_handle_fpasimd,
};
static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index 12ee62d..95af673 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -158,6 +158,19 @@ abort_guest_exit_end:
1: ret
ENDPROC(__guest_exit)
+ENTRY(__fpsimd_guest_trap)
+ // If virtual CPTR_EL2.TFP is set, then forward the trap to the
+ // virtual EL2. For the non-nested case, this bit is always 0.
+ mrs x1, tpidr_el2
+ ldr x0, [x1, #VIRTUAL_CPTR_EL2]
+ and x0, x0, #CPTR_EL2_TFP
+ cbnz x0, 1f
+ b __fpsimd_guest_restore
+1:
+ mov x0, #ARM_EXCEPTION_TRAP
+ b __guest_exit
+ENDPROC(__fpsimd_guest_trap)
+
ENTRY(__fpsimd_guest_restore)
stp x2, x3, [sp, #-16]!
stp x4, lr, [sp, #-16]!
diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
index 5170ce1..ab169fd 100644
--- a/arch/arm64/kvm/hyp/hyp-entry.S
+++ b/arch/arm64/kvm/hyp/hyp-entry.S
@@ -113,7 +113,7 @@ el1_trap:
*/
alternative_if_not ARM64_HAS_NO_FPSIMD
cmp x0, #ESR_ELx_EC_FP_ASIMD
- b.eq __fpsimd_guest_restore
+ b.eq __fpsimd_guest_trap
alternative_else_nop_endif
mrs x1, tpidr_el2
--
1.9.1
On VHE systems, EL0 of the host kernel is considered a part of the 'VHE
host': the execution of EL0 is affected by system registers set by the
VHE kernel, including the hypervisor. To emulate this for a VM, we use
the same set of system registers (i.e. the shadow registers) for virtual
EL2 and EL0 execution.
Note that the assumption so far has been that a hypervisor in a VM always
runs in virtual EL2, and that exception level changes from/to virtual
EL2 always go through the host hypervisor. With VHE support for a VM,
however, the exception level can change between EL0 and virtual EL2
without trapping to the host hypervisor. So, when returning from the VHE
host mode, set the vcpu mode depending on the physical exception level.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/kvm/context.c | 36 ++++++++++++++++++++++--------------
1 file changed, 22 insertions(+), 14 deletions(-)
diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index f3d3398..39bd92d 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -150,16 +150,18 @@ static void flush_shadow_special_regs(struct kvm_vcpu *vcpu)
struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
ctxt->hw_pstate = *vcpu_cpsr(vcpu) & ~PSR_MODE_MASK;
- /*
- * We can emulate the guest's configuration of which
- * stack pointer to use when executing in virtual EL2 by
- * using the equivalent feature in EL1 to point to
- * either the EL1 or EL0 stack pointer.
- */
- if ((*vcpu_cpsr(vcpu) & PSR_MODE_MASK) == PSR_MODE_EL2h)
- ctxt->hw_pstate |= PSR_MODE_EL1h;
- else
- ctxt->hw_pstate |= PSR_MODE_EL1t;
+ if (vcpu_mode_el2(vcpu)) {
+ /*
+ * We can emulate the guest's configuration of which
+ * stack pointer to use when executing in virtual EL2 by
+ * using the equivalent feature in EL1 to point to
+ * either the EL1 or EL0 stack pointer.
+ */
+ if ((*vcpu_cpsr(vcpu) & PSR_MODE_MASK) == PSR_MODE_EL2h)
+ ctxt->hw_pstate |= PSR_MODE_EL1h;
+ else
+ ctxt->hw_pstate |= PSR_MODE_EL1t;
+ }
ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
ctxt->hw_sp_el1 = vcpu_el2_sreg(vcpu, SP_EL2);
@@ -182,8 +184,14 @@ static void sync_shadow_special_regs(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
- *vcpu_cpsr(vcpu) &= PSR_MODE_MASK;
- *vcpu_cpsr(vcpu) |= ctxt->hw_pstate & ~PSR_MODE_MASK;
+ *vcpu_cpsr(vcpu) = ctxt->hw_pstate;
+ *vcpu_cpsr(vcpu) &= ~PSR_MODE_MASK;
+ /* Set vcpu exception level depending on the physical EL */
+ if ((ctxt->hw_pstate & PSR_MODE_MASK) == PSR_MODE_EL0t)
+ *vcpu_cpsr(vcpu) |= PSR_MODE_EL0t;
+ else
+ *vcpu_cpsr(vcpu) |= PSR_MODE_EL2h;
+
vcpu_el2_sreg(vcpu, SP_EL2) = ctxt->hw_sp_el1;
vcpu_el2_sreg(vcpu, ELR_EL2) = ctxt->hw_elr_el1;
vcpu_el2_sreg(vcpu, SPSR_EL2) = ctxt->hw_spsr_el1;
@@ -218,7 +226,7 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
- if (unlikely(vcpu_mode_el2(vcpu))) {
+ if (unlikely(is_hyp_ctxt(vcpu))) {
flush_shadow_special_regs(vcpu);
flush_shadow_el1_sysregs(vcpu);
flush_shadow_non_trap_el1_state(vcpu);
@@ -236,7 +244,7 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
*/
void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
{
- if (unlikely(vcpu_mode_el2(vcpu))) {
+ if (unlikely(is_hyp_ctxt(vcpu))) {
sync_shadow_special_regs(vcpu);
sync_shadow_non_trap_el1_state(vcpu);
} else
--
1.9.1
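The is_hyp_ctxt() predicate used above is introduced elsewhere in the series;
as a sketch of its intended semantics (my paraphrase, using names from the
series, not the verbatim code):

    /*
     * A vcpu runs in a "hyp context" either in virtual EL2 itself, or in
     * EL0 with the virtual E2H and TGE bits set (the VHE host's EL0); in
     * both cases the shadow (EL2) system register state must be in use.
     */
    static inline bool is_hyp_ctxt(struct kvm_vcpu *vcpu)
    {
            u64 vhcr = vcpu_sys_reg(vcpu, HCR_EL2);

            if (vcpu_mode_el2(vcpu))
                    return true;

            return (vhcr & HCR_E2H) && (vhcr & HCR_TGE) &&
                   ((*vcpu_cpsr(vcpu) & PSR_MODE_MASK) == PSR_MODE_EL0t);
    }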
Forward ELR_EL1, SPSR_EL1 and VBAR_EL1 traps to the virtual EL2 if the
virtual HCR_EL2.NV1 bit is set.
This is for recursive nested virtualization.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/kvm/sys_regs.c | 18 ++++++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index aeaac4e..a1274b7 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -23,6 +23,7 @@
#include <asm/types.h>
/* Hyp Configuration Register (HCR) bits */
+#define HCR_NV1 (UL(1) << 43)
#define HCR_NV (UL(1) << 42)
#define HCR_E2H (UL(1) << 34)
#define HCR_ID (UL(1) << 33)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 3e4ec5e..6f67666 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1031,6 +1031,15 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
return true;
}
+/* This function is to support recursive nested virtualization */
+static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
+{
+ if (!vcpu_mode_el2(vcpu) && (vcpu_sys_reg(vcpu, HCR_EL2) & HCR_NV1))
+ return true;
+
+ return false;
+}
+
static bool access_elr(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
@@ -1038,6 +1047,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
if (el12_reg(p) && forward_nv_traps(vcpu))
return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+ if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
access_rw(p, &vcpu->arch.ctxt.gp_regs.elr_el1);
return true;
}
@@ -1049,6 +1061,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
if (el12_reg(p) && forward_nv_traps(vcpu))
return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+ if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
access_rw(p, &vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1]);
return true;
}
@@ -1060,6 +1075,9 @@ static bool access_vbar(struct kvm_vcpu *vcpu,
if (el12_reg(p) && forward_nv_traps(vcpu))
return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+ if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
return true;
}
--
1.9.1
Forward traps due to the HCR_EL2.NV bit to the virtual EL2 if they do
not come from the virtual EL2 and the virtual HCR_EL2.NV bit is set.
This is for recursive nested virtualization.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/include/asm/kvm_arm.h | 1 +
arch/arm64/include/asm/kvm_coproc.h | 1 +
arch/arm64/kvm/handle_exit.c | 13 +++++++++++++
arch/arm64/kvm/sys_regs.c | 22 ++++++++++++++++++++++
4 files changed, 37 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 6e99978..aeaac4e 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -23,6 +23,7 @@
#include <asm/types.h>
/* Hyp Configuration Register (HCR) bits */
+#define HCR_NV (UL(1) << 42)
#define HCR_E2H (UL(1) << 34)
#define HCR_ID (UL(1) << 33)
#define HCR_CD (UL(1) << 32)
diff --git a/arch/arm64/include/asm/kvm_coproc.h b/arch/arm64/include/asm/kvm_coproc.h
index 1b3d21b..6223df6 100644
--- a/arch/arm64/include/asm/kvm_coproc.h
+++ b/arch/arm64/include/asm/kvm_coproc.h
@@ -44,6 +44,7 @@ void kvm_register_target_sys_reg_table(unsigned int target,
int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_handle_sys(struct kvm_vcpu *vcpu, struct kvm_run *run);
+bool forward_nv_traps(struct kvm_vcpu *vcpu);
#define kvm_coproc_table_init kvm_sys_reg_table_init
void kvm_sys_reg_table_init(void);
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index d4e7b2b..fccd9d6 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -61,6 +61,12 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
int ret;
+ /*
+ * Forward this trapped smc instruction to the virtual EL2.
+ */
+ if (forward_nv_traps(vcpu) && (vcpu_sys_reg(vcpu, HCR_EL2) & HCR_TSC))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
/* If imm is non-zero, it's not defined */
if (kvm_vcpu_hvc_get_imm(vcpu)) {
kvm_inject_undefined(vcpu);
@@ -197,6 +203,13 @@ static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
vcpu_el2_sreg(vcpu, SPSR_EL2));
/*
+ * Forward this trap to the virtual EL2 if the virtual HCR_EL2.NV
+ * bit is set.
+ */
+ if (forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
+ /*
* Note that the current exception level is always the virtual EL2,
* since we set HCR_EL2.NV bit only when entering the virtual EL2.
*/
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 910b50d..4fd7090 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -939,6 +939,14 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
return true;
}
+/* This function is to support recursive nested virtualization */
+bool forward_nv_traps(struct kvm_vcpu *vcpu)
+{
+ if (!vcpu_mode_el2(vcpu) && (vcpu_sys_reg(vcpu, HCR_EL2) & HCR_NV))
+ return true;
+ return false;
+}
+
static inline void access_rw(struct sys_reg_params *p, u64 *sysreg)
{
if (!p->is_write)
@@ -977,6 +985,13 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
{
u64 *sys_reg;
+ /*
+ * Forward this trap to the virtual EL2 if the virtual HCR_EL2.NV
+ * bit is set.
+ */
+ if (forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
sys_reg = get_special_reg(vcpu, p);
if (!sys_reg)
sys_reg = &vcpu_sys_reg(vcpu, r->reg);
@@ -1914,6 +1929,13 @@ static int emulate_sys_instr(struct kvm_vcpu *vcpu,
{
int ret = 0;
+ /*
+ * Forward this trap to the virtual EL2 if the virtual HCR_EL2.NV
+ * bit is set.
+ */
+ if (forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
/* TLB maintenance instructions*/
if (params->CRn == 0b1000)
ret = emulate_tlbi(vcpu, params);
--
1.9.1
Forward CPACR_EL1 traps to the virtual EL2 if virtual CPTR_EL2 is
configured to trap CPACR_EL1 accesses from EL1.
This is for recursive nested virtualization.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/kvm/sys_regs.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 6f67666..ba2966d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1091,6 +1091,11 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
if (el12_reg(p) && forward_nv_traps(vcpu))
return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+ /* Forward this trap to the virtual EL2 if CPTR_EL2.TCPAC is set */
+ if (!el12_reg(p) && !vcpu_mode_el2(vcpu) &&
+ (vcpu_sys_reg(vcpu, CPTR_EL2) & CPTR_EL2_TCPAC))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
/*
* When the virtual HCR_EL2.E2H == 1, an access to CPACR_EL1
* in the virtual EL2 is to access CPTR_EL2.
--
1.9.1
Forward the EL1 virtual memory register traps to the virtual EL2 if they
do not come from the virtual EL2 and the virtual HCR_EL2.TVM or TRVM bit
is set.
This is for recursive nested virtualization.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/kvm/sys_regs.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 3559cf7..3e4ec5e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -135,6 +135,27 @@ static inline bool el12_reg(struct sys_reg_params *p)
return (p->Op1 == 5);
}
+/* This function is to support recursive nested virtualization */
+static bool forward_vm_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
+{
+ u64 hcr_el2 = vcpu_sys_reg(vcpu, HCR_EL2);
+
+ /* If a trap comes from the virtual EL2, the host hypervisor handles it. */
+ if (vcpu_mode_el2(vcpu))
+ return false;
+
+ /*
+ * If the virtual HCR_EL2.TVM or TRVM bit is set, we need to forward
+ * this trap to the virtual EL2.
+ */
+ if ((hcr_el2 & HCR_TVM) && p->is_write)
+ return true;
+ else if ((hcr_el2 & HCR_TRVM) && !p->is_write)
+ return true;
+
+ return false;
+}
+
/*
* Generic accessor for VM registers. Only called as long as HCR_TVM
* is set. If the guest enables the MMU, we stop trapping the VM
@@ -152,6 +173,9 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
if (el12_reg(p) && forward_nv_traps(vcpu))
return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+ if (!el12_reg(p) && forward_vm_traps(vcpu, p))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
/*
* Redirect EL1 register accesses to the corresponding EL2 registers if
* they are meant to access EL2 registers.
--
1.9.1
When the virtual E2H bit is set, we can support EL2 register accesses
via EL1 registers from the virtual EL2 by trap-and-emulate. A better
alternative, however, is to allow the virtual EL2 to access EL2 register
state without trapping. This can easily be achieved by not trapping the
EL1 registers, since those registers already hold the EL2 register
state.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/kvm/hyp/switch.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index d513da9..fffd0c7 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -74,6 +74,7 @@ static hyp_alternate_select(__activate_traps_arch,
static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
{
u64 val;
+ u64 vhcr_el2;
/*
* We are about to set CPTR_EL2.TFP to trap all floating point
@@ -89,8 +90,26 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
write_sysreg(1 << 30, fpexc32_el2);
isb();
}
- if (vcpu_mode_el2(vcpu))
- val |= HCR_TVM | HCR_TRVM;
+
+ if (is_hyp_ctxt(vcpu)) {
+ /*
+ * For a guest hypervisor on v8.0, trap and emulate the EL1
+ * virtual memory control register accesses.
+ */
+ if (!vcpu_el2_e2h_is_set(vcpu))
+ val |= HCR_TVM | HCR_TRVM;
+ /*
+ * For a guest hypervisor on v8.1 (VHE), allow to access the
+ * EL1 virtual memory control registers natively. These accesses
+ * are to access EL2 register states.
+ * Note that we still need to respect the virtual HCR_EL2 state.
+ */
+ else {
+ vhcr_el2 = vcpu_sys_reg(vcpu, HCR_EL2);
+ val |= vhcr_el2 & (HCR_TVM | HCR_TRVM);
+ }
+ }
+
write_sysreg(val, hcr_el2);
/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
write_sysreg(1 << 15, hstr_el2);
--
1.9.1
In addition to EL2 register accesses, setting the NV bit also makes EL12
register accesses trap to EL2. To emulate this for the virtual EL2,
forward traps due to EL12 register accesses to the virtual EL2 if the
virtual HCR_EL2.NV bit is set.
This is for recursive nested virtualization.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/kvm/sys_regs.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 4fd7090..3559cf7 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -149,6 +149,9 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
int i;
const struct el1_el2_map *map;
+ if (el12_reg(p) && forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
/*
* Redirect EL1 register accesses to the corresponding EL2 registers if
* they are meant to access EL2 registers.
@@ -959,6 +962,9 @@ static bool access_cntkctl_el12(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
+ if (forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
return true;
}
@@ -1005,6 +1011,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
+ if (el12_reg(p) && forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
access_rw(p, &vcpu->arch.ctxt.gp_regs.elr_el1);
return true;
}
@@ -1013,6 +1022,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
+ if (el12_reg(p) && forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
access_rw(p, &vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1]);
return true;
}
@@ -1021,6 +1033,9 @@ static bool access_vbar(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
+ if (el12_reg(p) && forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
return true;
}
@@ -1031,6 +1046,9 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
{
u64 reg = sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+ if (el12_reg(p) && forward_nv_traps(vcpu))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
/*
* When the virtual HCR_EL2.E2H == 1, an access to CPACR_EL1
* in the virtual EL2 is to access CPTR_EL2.
--
1.9.1
With the HCR_EL2.NV bit set, accesses to EL12 registers in the virtual
EL2 trap to EL2. Handle those traps just like we do for EL1 registers.
One exception is CNTKCTL_EL12. We don't trap on CNTKCTL_EL1 for the
non-VHE virtual EL2 because we don't have to. However, accessing
CNTKCTL_EL12 will trap, since it's one of the EL12 registers controlled
by the HCR_EL2.NV bit. Therefore, add a handler for it and don't treat
it as a non-trap register when preparing the shadow context.
Move EL12 system register macros to a common place to reuse them.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/include/asm/kvm_hyp.h | 24 ------------------------
arch/arm64/include/asm/sysreg.h | 24 ++++++++++++++++++++++++
arch/arm64/kvm/context.c | 7 +++++++
arch/arm64/kvm/sys_regs.c | 25 +++++++++++++++++++++++++
4 files changed, 56 insertions(+), 24 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 4572a9b..353b895 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -73,30 +73,6 @@
#define read_sysreg_el1(r) read_sysreg_elx(r, _EL1, _EL12)
#define write_sysreg_el1(v,r) write_sysreg_elx(v, r, _EL1, _EL12)
-/* The VHE specific system registers and their encoding */
-#define sctlr_EL12 sys_reg(3, 5, 1, 0, 0)
-#define cpacr_EL12 sys_reg(3, 5, 1, 0, 2)
-#define ttbr0_EL12 sys_reg(3, 5, 2, 0, 0)
-#define ttbr1_EL12 sys_reg(3, 5, 2, 0, 1)
-#define tcr_EL12 sys_reg(3, 5, 2, 0, 2)
-#define afsr0_EL12 sys_reg(3, 5, 5, 1, 0)
-#define afsr1_EL12 sys_reg(3, 5, 5, 1, 1)
-#define esr_EL12 sys_reg(3, 5, 5, 2, 0)
-#define far_EL12 sys_reg(3, 5, 6, 0, 0)
-#define mair_EL12 sys_reg(3, 5, 10, 2, 0)
-#define amair_EL12 sys_reg(3, 5, 10, 3, 0)
-#define vbar_EL12 sys_reg(3, 5, 12, 0, 0)
-#define contextidr_EL12 sys_reg(3, 5, 13, 0, 1)
-#define cntkctl_EL12 sys_reg(3, 5, 14, 1, 0)
-#define cntp_tval_EL02 sys_reg(3, 5, 14, 2, 0)
-#define cntp_ctl_EL02 sys_reg(3, 5, 14, 2, 1)
-#define cntp_cval_EL02 sys_reg(3, 5, 14, 2, 2)
-#define cntv_tval_EL02 sys_reg(3, 5, 14, 3, 0)
-#define cntv_ctl_EL02 sys_reg(3, 5, 14, 3, 1)
-#define cntv_cval_EL02 sys_reg(3, 5, 14, 3, 2)
-#define spsr_EL12 sys_reg(3, 5, 4, 0, 0)
-#define elr_EL12 sys_reg(3, 5, 4, 0, 1)
-
/**
* hyp_alternate_select - Generates patchable code sequences that are
* used to switch between two implementations of a function, depending
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index b01c608..b8d4d0c 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -338,6 +338,30 @@
#define SYS_CNTVOFF_EL2 sys_reg(3, 4, 14, 0, 3)
#define SYS_CNTHCTL_EL2 sys_reg(3, 4, 14, 1, 0)
+/* The VHE specific system registers and their encoding */
+#define sctlr_EL12 sys_reg(3, 5, 1, 0, 0)
+#define cpacr_EL12 sys_reg(3, 5, 1, 0, 2)
+#define ttbr0_EL12 sys_reg(3, 5, 2, 0, 0)
+#define ttbr1_EL12 sys_reg(3, 5, 2, 0, 1)
+#define tcr_EL12 sys_reg(3, 5, 2, 0, 2)
+#define afsr0_EL12 sys_reg(3, 5, 5, 1, 0)
+#define afsr1_EL12 sys_reg(3, 5, 5, 1, 1)
+#define esr_EL12 sys_reg(3, 5, 5, 2, 0)
+#define far_EL12 sys_reg(3, 5, 6, 0, 0)
+#define mair_EL12 sys_reg(3, 5, 10, 2, 0)
+#define amair_EL12 sys_reg(3, 5, 10, 3, 0)
+#define vbar_EL12 sys_reg(3, 5, 12, 0, 0)
+#define contextidr_EL12 sys_reg(3, 5, 13, 0, 1)
+#define cntkctl_EL12 sys_reg(3, 5, 14, 1, 0)
+#define cntp_tval_EL02 sys_reg(3, 5, 14, 2, 0)
+#define cntp_ctl_EL02 sys_reg(3, 5, 14, 2, 1)
+#define cntp_cval_EL02 sys_reg(3, 5, 14, 2, 2)
+#define cntv_tval_EL02 sys_reg(3, 5, 14, 3, 0)
+#define cntv_ctl_EL02 sys_reg(3, 5, 14, 3, 1)
+#define cntv_cval_EL02 sys_reg(3, 5, 14, 3, 2)
+#define spsr_EL12 sys_reg(3, 5, 4, 0, 0)
+#define elr_EL12 sys_reg(3, 5, 4, 0, 1)
+
#define SYS_SP_EL2 sys_reg(3, 6, 4, 1, 0)
/* Common SCTLR_ELx flags. */
diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index e1bc753..f3d3398 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -121,6 +121,13 @@ static void copy_shadow_non_trap_el1_state(struct kvm_vcpu *vcpu, bool setup)
for (i = 0; i < ARRAY_SIZE(el1_non_trap_regs); i++) {
const int sr = el1_non_trap_regs[i];
+ /*
+ * We trap on cntkctl_el12 accesses from virtual EL2 as opposed
+ * to not trapping on cntkctl_el1 accesses.
+ */
+ if (vcpu_el2_e2h_is_set(vcpu) && sr == CNTKCTL_EL1)
+ continue;
+
if (setup)
s_sys_regs[sr] = vcpu_sys_reg(vcpu, sr);
else
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index b3e0cb8..2aa922c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -905,6 +905,14 @@ static inline void access_rw(struct sys_reg_params *p, u64 *sysreg)
*sysreg = p->regval;
}
+static bool access_cntkctl_el12(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
+ return true;
+}
+
static u64 *get_special_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
{
u64 reg = sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
@@ -1201,6 +1209,23 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
{ SYS_DESC(SYS_CNTVOFF_EL2), trap_el2_regs, reset_val, CNTVOFF_EL2, 0 },
{ SYS_DESC(SYS_CNTHCTL_EL2), trap_el2_regs, reset_val, CNTHCTL_EL2, 0 },
+ { SYS_DESC(sctlr_EL12), access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
+ { SYS_DESC(cpacr_EL12), access_cpacr, reset_val, CPACR_EL1, 0 },
+ { SYS_DESC(ttbr0_EL12), access_vm_reg, reset_unknown, TTBR0_EL1 },
+ { SYS_DESC(ttbr1_EL12), access_vm_reg, reset_unknown, TTBR1_EL1 },
+ { SYS_DESC(tcr_EL12), access_vm_reg, reset_val, TCR_EL1, 0 },
+ { SYS_DESC(spsr_EL12), access_spsr},
+ { SYS_DESC(elr_EL12), access_elr},
+ { SYS_DESC(afsr0_EL12), access_vm_reg, reset_unknown, AFSR0_EL1 },
+ { SYS_DESC(afsr1_EL12), access_vm_reg, reset_unknown, AFSR1_EL1 },
+ { SYS_DESC(esr_EL12), access_vm_reg, reset_unknown, ESR_EL1 },
+ { SYS_DESC(far_EL12), access_vm_reg, reset_unknown, FAR_EL1 },
+ { SYS_DESC(mair_EL12), access_vm_reg, reset_unknown, MAIR_EL1 },
+ { SYS_DESC(amair_EL12), access_vm_reg, reset_amair_el1, AMAIR_EL1 },
+ { SYS_DESC(vbar_EL12), access_vbar, reset_val, VBAR_EL1, 0 },
+ { SYS_DESC(contextidr_EL12), access_vm_reg, reset_val, CONTEXTIDR_EL1, 0 },
+ { SYS_DESC(cntkctl_EL12), access_cntkctl_el12, reset_val, CNTKCTL_EL1, 0 },
+
{ SYS_DESC(SYS_SP_EL2), NULL, reset_special, SP_EL2, 0},
};
--
1.9.1
Now that the virtual EL2 can access EL2 register state via EL1
registers, we need to take that into account when selecting the register
to emulate.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/kvm/sys_regs.c | 46 ++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 44 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 79980be..910b50d 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -110,6 +110,31 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
return true;
}
+struct el1_el2_map {
+ int el1;
+ int el2;
+};
+
+static const struct el1_el2_map vm_map[] = {
+ {SCTLR_EL1, SCTLR_EL2},
+ {TTBR0_EL1, TTBR0_EL2},
+ {TTBR1_EL1, TTBR1_EL2},
+ {TCR_EL1, TCR_EL2},
+ {ESR_EL1, ESR_EL2},
+ {FAR_EL1, FAR_EL2},
+ {AFSR0_EL1, AFSR0_EL2},
+ {AFSR1_EL1, AFSR1_EL2},
+ {MAIR_EL1, MAIR_EL2},
+ {AMAIR_EL1, AMAIR_EL2},
+ {CONTEXTIDR_EL1, CONTEXTIDR_EL2},
+};
+
+static inline bool el12_reg(struct sys_reg_params *p)
+{
+ /* All *_EL12 registers have Op1=5. */
+ return (p->Op1 == 5);
+}
+
/*
* Generic accessor for VM registers. Only called as long as HCR_TVM
* is set. If the guest enables the MMU, we stop trapping the VM
@@ -120,16 +145,33 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *r)
{
bool was_enabled = vcpu_has_cache_enabled(vcpu);
+ u64 *sysreg = &vcpu_sys_reg(vcpu, r->reg);
+ int i;
+ const struct el1_el2_map *map;
+
+ /*
+ * Redirect EL1 register accesses to the corresponding EL2 registers if
+ * they are meant to access EL2 registers.
+ */
+ if (vcpu_el2_e2h_is_set(vcpu) && !el12_reg(p)) {
+ for (i = 0; i < ARRAY_SIZE(vm_map); i++) {
+ map = &vm_map[i];
+ if (map->el1 == r->reg) {
+ sysreg = &vcpu_sys_reg(vcpu, map->el2);
+ break;
+ }
+ }
+ }
BUG_ON(!vcpu_mode_el2(vcpu) && !p->is_write);
if (!p->is_write) {
- p->regval = vcpu_sys_reg(vcpu, r->reg);
+ p->regval = *sysreg;
return true;
}
if (!p->is_aarch32) {
- vcpu_sys_reg(vcpu, r->reg) = p->regval;
+ *sysreg = p->regval;
} else {
if (!p->is_32bit)
vcpu_cp15_64_high(vcpu, r->reg) = upper_32_bits(p->regval);
--
1.9.1
While the EL1 virtual memory control registers can be accessed from the
virtual EL2 with VHE without trapping in order to manipulate the virtual
EL2 state, we can't do the same for CPTR_EL2, for an unfortunate reason.
This is because the top bit of CPTR_EL2, which is TCPAC, would be ignored
if CPTR_EL2 were accessed via CPACR_EL1 in the virtual EL2 without
trapping, since the top bit of CPACR_EL1 is RES0. Therefore we need to
trap CPACR_EL1 accesses from the virtual EL2 to emulate this bit
correctly.
Set the CPTR_EL2.TCPAC bit to trap CPACR_EL1 accesses and handle them in
the existing handler, considering that in the virtual EL2 with VHE they
could be meant to access CPTR_EL2 instead.
Note that the CPTR_EL2 format depends on the HCR_EL2.E2H bit. We always
keep it in the v8.0 format for convenience. Otherwise, we would need to
check the E2H bit and use different bit masks in entry.S, and we would
also need to check the E2H bit everywhere we access the virtual
CPTR_EL2. The downside of using the v8.0 format is that we need to
convert the format when copying state between CPTR_EL2 and CPACR_EL1 to
support the virtual EL2 with VHE. This decision is subject to change
depending on future discussion.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/include/asm/kvm_emulate.h | 2 ++
arch/arm64/kvm/context.c | 29 ++++++++++++++++++++++++++---
arch/arm64/kvm/hyp/switch.c | 2 ++
arch/arm64/kvm/sys_regs.c | 18 +++++++++++++++++-
4 files changed, 47 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 68aafbd..4776bfc 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -59,6 +59,8 @@ enum exception_type {
void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
+u64 cptr_to_cpacr(u64 cptr_el2);
+u64 cpacr_to_cptr(u64 cpacr_el1);
static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
{
diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index 9947bc8..a7811e1 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -66,7 +66,7 @@ static inline u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
<< TCR_IPS_SHIFT;
}
-static inline u64 cptr_to_cpacr(u64 cptr_el2)
+u64 cptr_to_cpacr(u64 cptr_el2)
{
u64 cpacr_el1 = 0;
@@ -78,6 +78,21 @@ static inline u64 cptr_to_cpacr(u64 cptr_el2)
return cpacr_el1;
}
+u64 cpacr_to_cptr(u64 cpacr_el1)
+{
+ u64 cptr_el2;
+
+ cptr_el2 = CPTR_EL2_DEFAULT;
+ if (!(cpacr_el1 & CPACR_EL1_FPEN))
+ cptr_el2 |= CPTR_EL2_TFP;
+ if (cpacr_el1 & CPACR_EL1_TTA)
+ cptr_el2 |= CPTR_EL2_TTA;
+ if (cpacr_el1 & CPTR_EL2_TCPAC)
+ cptr_el2 |= CPTR_EL2_TCPAC;
+
+ return cptr_el2;
+}
+
static void sync_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
{
u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
@@ -93,8 +108,12 @@ static void sync_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
for (i = 0; i < ARRAY_SIZE(vhe_map); i++) {
const struct el1_el2_map *map = &vhe_map[i];
+ u64 *el2_reg = &vcpu_sys_reg(vcpu, map->el2);
- vcpu_sys_reg(vcpu, map->el2) = s_sys_regs[map->el1];
+ /* We do trap-and-emulate CPACR_EL1 accesses. So, don't sync */
+ if (map->el2 == CPTR_EL2)
+ continue;
+ *el2_reg = s_sys_regs[map->el1];
}
}
@@ -138,8 +157,12 @@ static void flush_shadow_el1_sysregs_vhe(struct kvm_vcpu *vcpu)
*/
for (i = 0; i < ARRAY_SIZE(vhe_map); i++) {
const struct el1_el2_map *map = &vhe_map[i];
+ u64 *el1_reg = &s_sys_regs[map->el1];
- s_sys_regs[map->el1] = vcpu_sys_reg(vcpu, map->el2);
+ if (map->el2 == CPTR_EL2)
+ *el1_reg = cptr_to_cpacr(vcpu_sys_reg(vcpu, map->el2));
+ else
+ *el1_reg = vcpu_sys_reg(vcpu, map->el2);
}
}
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index fffd0c7..50c90f2 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -50,6 +50,8 @@ static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
val = read_sysreg(cpacr_el1);
val |= CPACR_EL1_TTA;
val &= ~CPACR_EL1_FPEN;
+ if (is_hyp_ctxt(vcpu))
+ val |= CPTR_EL2_TCPAC;
write_sysreg(val, cpacr_el1);
write_sysreg(__kvm_hyp_vector, vbar_el1);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2aa922c..79980be 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -972,7 +972,23 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
struct sys_reg_params *p,
const struct sys_reg_desc *r)
{
- access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
+ u64 reg = sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
+
+ /*
+ * When the virtual HCR_EL2.E2H == 1, an access to CPACR_EL1
+ * in the virtual EL2 is to access CPTR_EL2.
+ */
+ if (vcpu_el2_e2h_is_set(vcpu) && (reg == SYS_CPACR_EL1)) {
+ u64 *sysreg = &vcpu_sys_reg(vcpu, CPTR_EL2);
+
+ /* We keep the value in ARMv8.0 CPTR_EL2 format. */
+ if (!p->is_write)
+ p->regval = cptr_to_cpacr(*sysreg);
+ else
+ *sysreg = cpacr_to_cptr(p->regval);
+ } else /* CPACR_EL1 access with E2H == 0 or CPACR_EL12 access */
+ access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
+
return true;
}
--
1.9.1
When creating the shadow context for virtual EL2 execution, we can
directly copy the EL2 register state to the shadow EL1 register state if
the virtual HCR_EL2.E2H bit is set, because the EL1 and EL2 system
register formats are compatible when E2H=1.
Now that we allow the virtual EL2 to modify its EL2 registers without
trapping, via physical EL1 system register accesses, we need to reflect
the changes made to the EL1 system registers back to the virtual EL2
register state. This is not required for the virtual EL2 without VHE,
since such a virtual EL2 always uses the _EL2 accessors, which trap to
EL2.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/kvm/context.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 66 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index 39bd92d..9947bc8 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -39,6 +39,27 @@ struct el1_el2_map {
{ VBAR_EL1, VBAR_EL2 },
};
+/*
+ * List of EL1/EL2 register pairs through which EL2 registers are accessed
+ * from the virtual EL2 when the E2H bit is set.
+ */
+static const struct el1_el2_map vhe_map[] = {
+ { SCTLR_EL1, SCTLR_EL2 },
+ { CPACR_EL1, CPTR_EL2 },
+ { TTBR0_EL1, TTBR0_EL2 },
+ { TTBR1_EL1, TTBR1_EL2 },
+ { TCR_EL1, TCR_EL2 },
+ { AFSR0_EL1, AFSR0_EL2 },
+ { AFSR1_EL1, AFSR1_EL2 },
+ { ESR_EL1, ESR_EL2 },
+ { FAR_EL1, FAR_EL2 },
+ { MAIR_EL1, MAIR_EL2 },
+ { AMAIR_EL1, AMAIR_EL2 },
+ { VBAR_EL1, VBAR_EL2 },
+ { CONTEXTIDR_EL1, CONTEXTIDR_EL2 },
+ { CNTKCTL_EL1, CNTHCTL_EL2 },
+};
+
static inline u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
{
return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
@@ -57,7 +78,27 @@ static inline u64 cptr_to_cpacr(u64 cptr_el2)
return cpacr_el1;
}
-static void flush_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
+static void sync_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
+{
+ u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
+ int i;
+
+ /*
+ * In the virtual EL2 without VHE, no EL1 system registers other than
+ * those in el1_non_trap_regs[] can be changed without trapping. So we
+ * have nothing to sync on exit from a guest.
+ */
+ if (!vcpu_el2_e2h_is_set(vcpu))
+ return;
+
+ for (i = 0; i < ARRAY_SIZE(vhe_map); i++) {
+ const struct el1_el2_map *map = &vhe_map[i];
+
+ vcpu_sys_reg(vcpu, map->el2) = s_sys_regs[map->el1];
+ }
+}
+
+static void flush_shadow_el1_sysregs_nvhe(struct kvm_vcpu *vcpu)
{
u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
u64 tcr_el2;
@@ -86,6 +127,29 @@ static void flush_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
s_sys_regs[CPACR_EL1] = cptr_to_cpacr(vcpu_sys_reg(vcpu, CPTR_EL2));
}
+static void flush_shadow_el1_sysregs_vhe(struct kvm_vcpu *vcpu)
+{
+ u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
+ int i;
+
+ /*
+ * When the E2H bit is set, EL2 registers become compatible with the
+ * corresponding EL1 registers. So, no conversion is required.
+ */
+ for (i = 0; i < ARRAY_SIZE(vhe_map); i++) {
+ const struct el1_el2_map *map = &vhe_map[i];
+
+ s_sys_regs[map->el1] = vcpu_sys_reg(vcpu, map->el2);
+ }
+}
+
+static void flush_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
+{
+ if (vcpu_el2_e2h_is_set(vcpu))
+ flush_shadow_el1_sysregs_vhe(vcpu);
+ else
+ flush_shadow_el1_sysregs_nvhe(vcpu);
+}
/*
* List of EL0 and EL1 registers which we allow the virtual EL2 mode to access
@@ -247,6 +311,7 @@ void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
if (unlikely(is_hyp_ctxt(vcpu))) {
sync_shadow_special_regs(vcpu);
sync_shadow_non_trap_el1_state(vcpu);
+ sync_shadow_el1_sysregs(vcpu);
} else
sync_special_regs(vcpu);
}
--
1.9.1
ARMv8.1 added more EL2 registers: TTBR1_EL2, CONTEXTIDR_EL2, and three
EL2 virtual timer registers. Add the first two registers to vcpu context
and set their handlers. The timer registers and their handlers will be
added in a separate patch.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/include/asm/sysreg.h | 2 ++
arch/arm64/kvm/sys_regs.c | 2 ++
3 files changed, 6 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 53b0b33..373235c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -151,6 +151,7 @@ enum vcpu_sysreg {
HSTR_EL2, /* Hypervisor System Trap Register */
HACR_EL2, /* Hypervisor Auxiliary Control Register */
TTBR0_EL2, /* Translation Table Base Register 0 (EL2) */
+ TTBR1_EL2, /* Translation Table Base Register 1 (EL2) */
TCR_EL2, /* Translation Control Register (EL2) */
VTTBR_EL2, /* Virtualization Translation Table Base Register */
VTCR_EL2, /* Virtualization Translation Control Register */
@@ -164,6 +165,7 @@ enum vcpu_sysreg {
VBAR_EL2, /* Vector Base Address Register (EL2) */
RVBAR_EL2, /* Reset Vector Base Address Register */
RMR_EL2, /* Reset Management Register */
+ CONTEXTIDR_EL2, /* Context ID Register (EL2) */
TPIDR_EL2, /* EL2 Software Thread ID Register */
CNTVOFF_EL2, /* Counter-timer Virtual Offset register */
CNTHCTL_EL2, /* Counter-timer Hypervisor Control register */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 6373d3d..b01c608 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -264,6 +264,7 @@
#define SYS_HACR_EL2 sys_reg(3, 4, 1, 1, 7)
#define SYS_TTBR0_EL2 sys_reg(3, 4, 2, 0, 0)
+#define SYS_TTBR1_EL2 sys_reg(3, 4, 2, 0, 1)
#define SYS_TCR_EL2 sys_reg(3, 4, 2, 0, 2)
#define SYS_VTTBR_EL2 sys_reg(3, 4, 2, 1, 0)
#define SYS_VTCR_EL2 sys_reg(3, 4, 2, 1, 2)
@@ -331,6 +332,7 @@
#define SYS_ICH_LR14_EL2 __SYS__LR8_EL2(6)
#define SYS_ICH_LR15_EL2 __SYS__LR8_EL2(7)
+#define SYS_CONTEXTIDR_EL2 sys_reg(3, 4, 13, 0, 1)
#define SYS_TPIDR_EL2 sys_reg(3, 4, 13, 0, 2)
#define SYS_CNTVOFF_EL2 sys_reg(3, 4, 14, 0, 3)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index dbf5022..b3e0cb8 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1168,6 +1168,7 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
{ SYS_DESC(SYS_HACR_EL2), trap_el2_regs, reset_val, HACR_EL2, 0 },
{ SYS_DESC(SYS_TTBR0_EL2), trap_el2_regs, reset_val, TTBR0_EL2, 0 },
+ { SYS_DESC(SYS_TTBR1_EL2), trap_el2_regs, reset_val, TTBR1_EL2, 0 },
{ SYS_DESC(SYS_TCR_EL2), trap_el2_regs, reset_val, TCR_EL2, 0 },
{ SYS_DESC(SYS_VTTBR_EL2), trap_el2_regs, reset_val, VTTBR_EL2, 0 },
{ SYS_DESC(SYS_VTCR_EL2), trap_el2_regs, reset_val, VTCR_EL2, 0 },
@@ -1194,6 +1195,7 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
{ SYS_DESC(SYS_RVBAR_EL2), trap_el2_regs, reset_val, RVBAR_EL2, 0 },
{ SYS_DESC(SYS_RMR_EL2), trap_el2_regs, reset_val, RMR_EL2, 0 },
+ { SYS_DESC(SYS_CONTEXTIDR_EL2), trap_el2_regs, reset_val, CONTEXTIDR_EL2, 0 },
{ SYS_DESC(SYS_TPIDR_EL2), trap_el2_regs, reset_val, TPIDR_EL2, 0 },
{ SYS_DESC(SYS_CNTVOFF_EL2), trap_el2_regs, reset_val, CNTVOFF_EL2, 0 },
--
1.9.1
These macros will be used to support the virtual EL2 with VHE.
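For instance, a later patch can use is_hyp_ctxt() to pick which system
register context to run with (a sketch modeled on
kvm_arm_setup_shadow_state() in this series; illustrative only):

	if (is_hyp_ctxt(vcpu))
		ctxt->hw_sys_regs = ctxt->shadow_sys_regs;	/* virtual EL2 */
	else
		ctxt->hw_sys_regs = ctxt->sys_regs;		/* EL0/EL1 */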
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/include/asm/kvm_emulate.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 3017234..68aafbd 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -173,6 +173,30 @@ static inline bool vcpu_mode_el2(const struct kvm_vcpu *vcpu)
return mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t;
}
+static inline bool vcpu_el2_e2h_is_set(const struct kvm_vcpu *vcpu)
+{
+ return (vcpu_sys_reg(vcpu, HCR_EL2) & HCR_E2H);
+}
+
+static inline bool vcpu_el2_tge_is_set(const struct kvm_vcpu *vcpu)
+{
+ return (vcpu_sys_reg(vcpu, HCR_EL2) & HCR_TGE);
+}
+
+static inline bool is_hyp_ctxt(const struct kvm_vcpu *vcpu)
+{
+ /*
+ * We are in a hypervisor context if the vcpu mode is EL2 or if both
+ * the E2H and TGE bits are set. The latter means we are in the user
+ * space of a VHE kernel; the ARMv8.1 ARM describes this state as 'InHost'.
+ */
+ if (vcpu_mode_el2(vcpu) ||
+ (vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)))
+ return true;
+
+ return false;
+}
+
static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
{
return vcpu->arch.fault.esr_el2;
--
1.9.1
For the same reason we trap virtual memory register accesses at virtual
EL2, we trap CPACR_EL1 accesses too; otherwise the virtual EL2 mode
would access the EL1 system register state instead of the virtual EL2
one.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/kvm/hyp/switch.c | 10 +++++++---
arch/arm64/kvm/sys_regs.c | 10 +++++++++-
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index ec91cd08..d513da9 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -42,7 +42,8 @@ bool __hyp_text __fpsimd_enabled(void)
return __fpsimd_is_enabled()();
}
-static void __hyp_text __activate_traps_vhe(void)
+static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
{
u64 val;
@@ -54,12 +55,15 @@ static void __hyp_text __activate_traps_vhe(void)
write_sysreg(__kvm_hyp_vector, vbar_el1);
}
-static void __hyp_text __activate_traps_nvhe(void)
+static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
{
u64 val;
val = CPTR_EL2_DEFAULT;
val |= CPTR_EL2_TTA | CPTR_EL2_TFP;
+ if (vcpu_mode_el2(vcpu))
+ val |= CPTR_EL2_TCPAC;
write_sysreg(val, cptr_el2);
}
@@ -99,7 +103,7 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
write_sysreg(0, pmselr_el0);
write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
- __activate_traps_arch()();
+ __activate_traps_arch()(vcpu);
}
static void __hyp_text __deactivate_traps_vhe(void)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index b83fef2..7062645 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -960,6 +960,14 @@ static bool access_vbar(struct kvm_vcpu *vcpu,
return true;
}
+static bool access_cpacr(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
+ return true;
+}
+
/*
* Architected system registers.
* Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -1013,7 +1021,7 @@ static bool access_vbar(struct kvm_vcpu *vcpu,
{ SYS_DESC(SYS_MPIDR_EL1), NULL, reset_mpidr, MPIDR_EL1 },
{ SYS_DESC(SYS_SCTLR_EL1), access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
- { SYS_DESC(SYS_CPACR_EL1), NULL, reset_val, CPACR_EL1, 0 },
+ { SYS_DESC(SYS_CPACR_EL1), access_cpacr, reset_val, CPACR_EL1, 0 },
{ SYS_DESC(SYS_TTBR0_EL1), access_vm_reg, reset_unknown, TTBR0_EL1 },
{ SYS_DESC(SYS_TTBR1_EL1), access_vm_reg, reset_unknown, TTBR1_EL1 },
{ SYS_DESC(SYS_TCR_EL1), access_vm_reg, reset_val, TCR_EL1, 0 },
--
1.9.1
Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
they are not coming from the virtual EL2 and virtual HCR_EL2.TWX is set.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/kvm/handle_exit.c | 13 ++++++++++++-
arch/arm64/kvm/nested.c | 20 ++++++++++++++++++++
3 files changed, 33 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 46880c3..53b0b33 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -442,5 +442,6 @@ static inline void __cpu_init_stage2(void)
int __init kvmarm_nested_cfg(char *buf);
int init_nested_virt(void);
bool nested_virt_in_use(struct kvm_vcpu *vcpu);
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
#endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 8b398b2..25ec824 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -107,7 +107,18 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
*/
static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
- if (kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
+ bool is_wfe = !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE);
+
+ if (nested_virt_in_use(vcpu)) {
+ int ret = handle_wfx_nested(vcpu, is_wfe);
+
+ /* -EINVAL means "not handled here"; fall through to the normal path */
+ if (ret != -EINVAL)
+ return ret;
+ }
+
+ if (is_wfe) {
trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
vcpu->stat.wfe_exit_stat++;
kvm_vcpu_on_spin(vcpu);
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 9a05c76..042d304 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -18,6 +18,8 @@
#include <linux/kvm.h>
#include <linux/kvm_host.h>
+#include <asm/kvm_emulate.h>
+
static bool nested_param;
int __init kvmarm_nested_cfg(char *buf)
@@ -41,3 +43,21 @@ bool nested_virt_in_use(struct kvm_vcpu *vcpu)
return false;
}
+
+/*
+ * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
+ * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
+ * handle this.
+ */
+int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
+{
+ u64 hcr_el2 = vcpu_sys_reg(vcpu, HCR_EL2);
+
+ if (vcpu_mode_el2(vcpu))
+ return -EINVAL;
+
+ if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI)))
+ return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+
+ return -EINVAL;
+}
--
1.9.1
Now that the PSCI call is made via the smc instruction when nested
virtualization is enabled, it is clear that all hvc instructions from
the VM (including those from the virtual EL2) are supposed to be
handled in the virtual EL2.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/kvm/handle_exit.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 6cf6b93..8b398b2 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -42,6 +42,12 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
kvm_vcpu_hvc_get_imm(vcpu));
vcpu->stat.hvc_exit_stat++;
+ /* Forward hvc instructions to the virtual EL2 if the guest has EL2. */
+ if (nested_virt_in_use(vcpu)) {
+ kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
+ return 1;
+ }
+
ret = kvm_psci_call(vcpu);
if (ret < 0) {
kvm_inject_undefined(vcpu);
--
1.9.1
When the HCR_EL2.NV bit is set, eret instructions trap to EL2 with EC
code 0x1A. Emulate eret instructions by setting the pc and pstate.
Note that the current exception level is always the virtual EL2, since
we set the HCR_EL2.NV bit only when entering the virtual EL2. So, we
take the spsr and elr states from the virtual _EL2 registers.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/include/asm/esr.h | 1 +
arch/arm64/kvm/handle_exit.c | 16 ++++++++++++++++
arch/arm64/kvm/trace.h | 21 +++++++++++++++++++++
3 files changed, 38 insertions(+)
diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index e7d8e28..210fde6 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -43,6 +43,7 @@
#define ESR_ELx_EC_HVC64 (0x16)
#define ESR_ELx_EC_SMC64 (0x17)
#define ESR_ELx_EC_SYS64 (0x18)
+#define ESR_ELx_EC_ERET (0x1A)
/* Unallocated EC: 0x19 - 0x1E */
#define ESR_ELx_EC_IMP_DEF (0x1f)
#define ESR_ELx_EC_IABT_LOW (0x20)
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 17d8a16..9259881 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -147,6 +147,21 @@ static int kvm_handle_unknown_ec(struct kvm_vcpu *vcpu, struct kvm_run *run)
return 1;
}
+static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+ trace_kvm_nested_eret(vcpu, vcpu_el2_sreg(vcpu, ELR_EL2),
+ vcpu_el2_sreg(vcpu, SPSR_EL2));
+
+ /*
+ * Note that the current exception level is always the virtual EL2,
+ * since we set HCR_EL2.NV bit only when entering the virtual EL2.
+ */
+ *vcpu_pc(vcpu) = vcpu_el2_sreg(vcpu, ELR_EL2);
+ *vcpu_cpsr(vcpu) = vcpu_el2_sreg(vcpu, SPSR_EL2);
+
+ return 1;
+}
+
static exit_handle_fn arm_exit_handlers[] = {
[0 ... ESR_ELx_EC_MAX] = kvm_handle_unknown_ec,
[ESR_ELx_EC_WFx] = kvm_handle_wfx,
@@ -160,6 +175,7 @@ static int kvm_handle_unknown_ec(struct kvm_vcpu *vcpu, struct kvm_run *run)
[ESR_ELx_EC_HVC64] = handle_hvc,
[ESR_ELx_EC_SMC64] = handle_smc,
[ESR_ELx_EC_SYS64] = kvm_handle_sys_reg,
+ [ESR_ELx_EC_ERET] = kvm_handle_eret,
[ESR_ELx_EC_IABT_LOW] = kvm_handle_guest_abort,
[ESR_ELx_EC_DABT_LOW] = kvm_handle_guest_abort,
[ESR_ELx_EC_SOFTSTP_LOW]= kvm_handle_guest_debug,
diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
index 7c86cfb..5f40987 100644
--- a/arch/arm64/kvm/trace.h
+++ b/arch/arm64/kvm/trace.h
@@ -187,6 +187,27 @@
TP_printk("vcpu: %p, inject exception to vEL2: ESR_EL2 0x%lx, vector: 0x%016lx",
__entry->vcpu, __entry->esr_el2, __entry->pc)
);
+
+TRACE_EVENT(kvm_nested_eret,
+ TP_PROTO(struct kvm_vcpu *vcpu, unsigned long elr_el2,
+ unsigned long spsr_el2),
+ TP_ARGS(vcpu, elr_el2, spsr_el2),
+
+ TP_STRUCT__entry(
+ __field(struct kvm_vcpu *, vcpu)
+ __field(unsigned long, elr_el2)
+ __field(unsigned long, spsr_el2)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu = vcpu;
+ __entry->elr_el2 = elr_el2;
+ __entry->spsr_el2 = spsr_el2;
+ ),
+
+ TP_printk("vcpu: %p, eret to elr_el2: 0x%016lx, with spsr_el2: 0x%08lx",
+ __entry->vcpu, __entry->elr_el2, __entry->spsr_el2)
+);
#endif /* _TRACE_ARM64_KVM_H */
#undef TRACE_INCLUDE_PATH
--
1.9.1
When the HCR_EL2.NV bit is set, execution of the EL2 translation regime
address translation instructions and TLB maintenance instructions is
trapped to EL2. In addition, execution of the EL1 translation regime
address translation instructions and TLB maintenance instructions that
are only accessible from EL2 and above is trapped to EL2. In these
cases, ESR_EL2.EC will be set to 0x18.
Change the existing handler to handle those system instructions as well
as MRS/MSR instructions. Emulation of each system instruction will be
done in separate patches.
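To make the routing concrete, two example encodings (per the ARM ARM;
illustrative only): 'tlbi alle2' encodes as Op0=1, Op1=4, CRn=8, CRm=7,
Op2=0 and 'at s1e2r' as Op0=1, Op1=4, CRn=7, CRm=8, Op2=0, so the
handler can first dispatch on Op0 == 1 and then on CRn/CRm:

	if (params->CRn == 0b1000)		/* TLB maintenance */
		ret = emulate_tlbi(vcpu, params);
	else if (params->CRn == 0b0111 && params->CRm == 0b1000)
		ret = emulate_at(vcpu, params);	/* address translation */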
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/include/asm/kvm_coproc.h | 2 +-
arch/arm64/kvm/handle_exit.c | 2 +-
arch/arm64/kvm/sys_regs.c | 53 ++++++++++++++++++++++++++++++++-----
arch/arm64/kvm/trace.h | 2 +-
4 files changed, 50 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_coproc.h b/arch/arm64/include/asm/kvm_coproc.h
index 0b52377..1b3d21b 100644
--- a/arch/arm64/include/asm/kvm_coproc.h
+++ b/arch/arm64/include/asm/kvm_coproc.h
@@ -43,7 +43,7 @@ void kvm_register_target_sys_reg_table(unsigned int target,
int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
-int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_sys(struct kvm_vcpu *vcpu, struct kvm_run *run);
#define kvm_coproc_table_init kvm_sys_reg_table_init
void kvm_sys_reg_table_init(void);
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 9259881..d19e253 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -174,7 +174,7 @@ static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
[ESR_ELx_EC_SMC32] = handle_smc,
[ESR_ELx_EC_HVC64] = handle_hvc,
[ESR_ELx_EC_SMC64] = handle_smc,
- [ESR_ELx_EC_SYS64] = kvm_handle_sys_reg,
+ [ESR_ELx_EC_SYS64] = kvm_handle_sys,
[ESR_ELx_EC_ERET] = kvm_handle_eret,
[ESR_ELx_EC_IABT_LOW] = kvm_handle_guest_abort,
[ESR_ELx_EC_DABT_LOW] = kvm_handle_guest_abort,
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7062645..dbf5022 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1808,6 +1808,40 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
return 1;
}
+static int emulate_tlbi(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *params)
+{
+ /* TODO: support tlbi instruction emulation */
+ kvm_inject_undefined(vcpu);
+ return 1;
+}
+
+static int emulate_at(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *params)
+{
+ /* TODO: support address translation instruction emulation */
+ kvm_inject_undefined(vcpu);
+ return 1;
+}
+
+static int emulate_sys_instr(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *params)
+{
+ int ret = 0;
+
+ /* TLB maintenance instructions */
+ if (params->CRn == 0b1000)
+ ret = emulate_tlbi(vcpu, params);
+ /* Address Translation instructions */
+ else if (params->CRn == 0b0111 && params->CRm == 0b1000)
+ ret = emulate_at(vcpu, params);
+
+ if (ret)
+ kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+
+ return ret;
+}
+
static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
const struct sys_reg_desc *table, size_t num)
{
@@ -1819,18 +1853,19 @@ static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
}
/**
- * kvm_handle_sys_reg -- handles a mrs/msr trap on a guest sys_reg access
+ * kvm_handle_sys -- handles a system instruction or mrs/msr instruction
+ *                   trap on a guest execution
* @vcpu: The VCPU pointer
* @run: The kvm_run struct
*/
-int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run)
+int kvm_handle_sys(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
struct sys_reg_params params;
unsigned long esr = kvm_vcpu_get_hsr(vcpu);
int Rt = kvm_vcpu_sys_get_rt(vcpu);
int ret;
- trace_kvm_handle_sys_reg(esr);
+ trace_kvm_handle_sys(esr);
params.is_aarch32 = false;
params.is_32bit = false;
@@ -1842,10 +1877,16 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run)
params.regval = vcpu_get_reg(vcpu, Rt);
params.is_write = !(esr & 1);
- ret = emulate_sys_reg(vcpu, &params);
+ if (params.Op0 == 1) {
+ /* System instructions */
+ ret = emulate_sys_instr(vcpu, &params);
+ } else {
+ /* MRS/MSR instructions */
+ ret = emulate_sys_reg(vcpu, &params);
+ if (!params.is_write)
+ vcpu_set_reg(vcpu, Rt, params.regval);
+ }
- if (!params.is_write)
- vcpu_set_reg(vcpu, Rt, params.regval);
return ret;
}
diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
index 5f40987..192708e 100644
--- a/arch/arm64/kvm/trace.h
+++ b/arch/arm64/kvm/trace.h
@@ -134,7 +134,7 @@
TP_printk("%s %s reg %d (0x%08llx)", __entry->fn, __entry->is_write?"write to":"read from", __entry->reg, __entry->write_value)
);
-TRACE_EVENT(kvm_handle_sys_reg,
+TRACE_EVENT(kvm_handle_sys,
TP_PROTO(unsigned long hsr),
TP_ARGS(hsr),
--
1.9.1
VMs used to execute hvc #0 for the PSCI call if EL3 is not implemented.
However, once we provide the virtual EL2 mode to the VM, the host OS
inside the VM calls kvm_call_hyp(), which is also hvc #0, so it is hard
for the host hypervisor to differentiate between the two.
So, let the VM execute the smc instruction for the PSCI call. On
ARMv8.3, even if EL3 is not implemented, an smc instruction executed at
non-secure EL1 is trapped to EL2 if HCR_EL2.TSC==1, rather than being
treated as UNDEFINED. So, the host hypervisor can handle this PSCI call
without any confusion.
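The resulting split, roughly (a sketch; the hvc side is a separate
patch in this series):

	/* in handle_hvc(): forward to the guest hypervisor */
	if (nested_virt_in_use(vcpu))
		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));

	/* in handle_smc(): imm == 0 is the VM's PSCI call */
	ret = kvm_psci_call(vcpu);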
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/kvm/handle_exit.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index d19e253..6cf6b93 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -53,8 +53,28 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
- kvm_inject_undefined(vcpu);
- return 1;
+ int ret;
+
+ /* If imm is non-zero, it's not defined */
+ if (kvm_vcpu_hvc_get_imm(vcpu)) {
+ kvm_inject_undefined(vcpu);
+ return 1;
+ }
+
+ /*
+ * If imm is zero, it's a psci call.
+ * Note that on ARMv8.3, even if EL3 is not implemented, SMC executed
+ * at Non-secure EL1 is trapped to EL2 if HCR_EL2.TSC==1, rather than
+ * being treated as UNDEFINED.
+ */
+ ret = kvm_psci_call(vcpu);
+ if (ret < 0) {
+ kvm_inject_undefined(vcpu);
+ return 1;
+ }
+ kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+
+ return ret;
}
/*
--
1.9.1
For the same reason we trap virtual memory register accesses at virtual
EL2, we need to trap SPSR_EL1, ELR_EL1 and VBAR_EL1 accesses. ARMv8.3
introduces the HCR_EL2.NV1 bit to trap on those register accesses from
EL1. Do not set this bit until the whole nesting support is completed.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/include/asm/sysreg.h | 2 ++
arch/arm64/kvm/sys_regs.c | 29 ++++++++++++++++++++++++++++-
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 98c32ef..6373d3d 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -164,6 +164,8 @@
#define SYS_TTBR1_EL1 sys_reg(3, 0, 2, 0, 1)
#define SYS_TCR_EL1 sys_reg(3, 0, 2, 0, 2)
+#define SYS_SPSR_EL1 sys_reg(3, 0, 4, 0, 0)
+#define SYS_ELR_EL1 sys_reg(3, 0, 4, 0, 1)
#define SYS_ICC_PMR_EL1 sys_reg(3, 0, 4, 6, 0)
#define SYS_AFSR0_EL1 sys_reg(3, 0, 5, 1, 0)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index d8b1d4b..b83fef2 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -936,6 +936,30 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
return true;
}
+static bool access_elr(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ access_rw(p, &vcpu->arch.ctxt.gp_regs.elr_el1);
+ return true;
+}
+
+static bool access_spsr(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ access_rw(p, &vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1]);
+ return true;
+}
+
+static bool access_vbar(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
+ return true;
+}
+
/*
* Architected system registers.
* Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -994,6 +1018,9 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
{ SYS_DESC(SYS_TTBR1_EL1), access_vm_reg, reset_unknown, TTBR1_EL1 },
{ SYS_DESC(SYS_TCR_EL1), access_vm_reg, reset_val, TCR_EL1, 0 },
+ { SYS_DESC(SYS_SPSR_EL1), access_spsr },
+ { SYS_DESC(SYS_ELR_EL1), access_elr },
+
{ SYS_DESC(SYS_AFSR0_EL1), access_vm_reg, reset_unknown, AFSR0_EL1 },
{ SYS_DESC(SYS_AFSR1_EL1), access_vm_reg, reset_unknown, AFSR1_EL1 },
{ SYS_DESC(SYS_ESR_EL1), access_vm_reg, reset_unknown, ESR_EL1 },
@@ -1006,7 +1033,7 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
{ SYS_DESC(SYS_MAIR_EL1), access_vm_reg, reset_unknown, MAIR_EL1 },
{ SYS_DESC(SYS_AMAIR_EL1), access_vm_reg, reset_amair_el1, AMAIR_EL1 },
- { SYS_DESC(SYS_VBAR_EL1), NULL, reset_val, VBAR_EL1, 0 },
+ { SYS_DESC(SYS_VBAR_EL1), access_vbar, reset_val, VBAR_EL1, 0 },
{ SYS_DESC(SYS_ICC_IAR0_EL1), write_to_read_only },
{ SYS_DESC(SYS_ICC_EOIR0_EL1), read_from_write_only },
--
1.9.1
From: Christoffer Dall <[email protected]>
When running in virtual EL2 mode, we actually run the hardware in EL1
and therefore have to use the EL1 registers to ensure correct operation.
By setting HCR.TVM and HCR.TRVM we ensure that the virtual EL2 mode
doesn't shoot itself in the foot when setting up what it believes to be
a different mode's system register state (for example when preparing to
switch to a VM).
We can leverage the existing sysregs infrastructure to support trapped
accesses to these registers.
Signed-off-by: Christoffer Dall <[email protected]>
---
arch/arm64/kvm/hyp/switch.c | 2 ++
arch/arm64/kvm/sys_regs.c | 7 ++++++-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 945e79c..ec91cd08 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -85,6 +85,8 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
write_sysreg(1 << 30, fpexc32_el2);
isb();
}
+ if (vcpu_mode_el2(vcpu))
+ val |= HCR_TVM | HCR_TRVM;
write_sysreg(val, hcr_el2);
/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
write_sysreg(1 << 15, hstr_el2);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2b3ed70..d8b1d4b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -121,7 +121,12 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
{
bool was_enabled = vcpu_has_cache_enabled(vcpu);
- BUG_ON(!p->is_write);
+ BUG_ON(!vcpu_mode_el2(vcpu) && !p->is_write);
+
+ if (!p->is_write) {
+ p->regval = vcpu_sys_reg(vcpu, r->reg);
+ return true;
+ }
if (!p->is_aarch32) {
vcpu_sys_reg(vcpu, r->reg) = p->regval;
--
1.9.1
These macros and enums can be reused to inject exceptions
for nested virtualization.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/include/asm/kvm_emulate.h | 12 ++++++++++++
arch/arm64/kvm/inject_fault.c | 12 ------------
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 14c4ce9..94f98cc 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -31,6 +31,18 @@
#include <asm/cputype.h>
#include <asm/virt.h>
+#define CURRENT_EL_SP_EL0_VECTOR 0x0
+#define CURRENT_EL_SP_ELx_VECTOR 0x200
+#define LOWER_EL_AArch64_VECTOR 0x400
+#define LOWER_EL_AArch32_VECTOR 0x600
+
+enum exception_type {
+ except_type_sync = 0,
+ except_type_irq = 0x80,
+ except_type_fiq = 0x100,
+ except_type_serror = 0x180,
+};
+
unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index da6a8cf..94679fb 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -28,11 +28,6 @@
#define PSTATE_FAULT_BITS_64 (PSR_MODE_EL1h | PSR_A_BIT | PSR_F_BIT | \
PSR_I_BIT | PSR_D_BIT)
-#define CURRENT_EL_SP_EL0_VECTOR 0x0
-#define CURRENT_EL_SP_ELx_VECTOR 0x200
-#define LOWER_EL_AArch64_VECTOR 0x400
-#define LOWER_EL_AArch32_VECTOR 0x600
-
static void prepare_fault32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
{
unsigned long cpsr;
@@ -101,13 +96,6 @@ static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
*fsr = 0x14;
}
-enum exception_type {
- except_type_sync = 0,
- except_type_irq = 0x80,
- except_type_fiq = 0x100,
- except_type_serror = 0x180,
-};
-
static u64 get_except_vector(struct kvm_vcpu *vcpu, enum exception_type type)
{
u64 exc_offset;
--
1.9.1
Support injecting synchronous exceptions to the virtual EL2, as
described in ARM ARM AArch64.TakeException().
This can easily be extended to inject asynchronous exceptions to the
virtual EL2, but that will be added in a later patch when appropriate.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm/include/asm/kvm_emulate.h | 7 +++
arch/arm64/include/asm/kvm_emulate.h | 2 +
arch/arm64/kvm/Makefile | 1 +
arch/arm64/kvm/emulate-nested.c | 83 ++++++++++++++++++++++++++++++++++++
arch/arm64/kvm/trace.h | 20 +++++++++
5 files changed, 113 insertions(+)
create mode 100644 arch/arm64/kvm/emulate-nested.c
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 0a03b7d..29a4dec 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -47,6 +47,13 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
+static inline int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+ kvm_err("Unexpected call to %s for the non-nesting configuration\n",
+ __func__);
+ return -EINVAL;
+}
+
static inline void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu) { };
static inline void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu) { };
static inline void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt) { };
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 94f98cc..3017234 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -54,6 +54,8 @@ enum exception_type {
void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
+int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
+
void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 5762337..0263ef0 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -37,3 +37,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
kvm-$(CONFIG_KVM_ARM_HOST) += nested.o
+kvm-$(CONFIG_KVM_ARM_HOST) += emulate-nested.o
diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
new file mode 100644
index 0000000..48b84cc
--- /dev/null
+++ b/arch/arm64/kvm/emulate-nested.c
@@ -0,0 +1,83 @@
+/*
+ * Copyright (C) 2016 - Linaro and Columbia University
+ * Author: Jintack Lim <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+#include <asm/kvm_emulate.h>
+
+#include "trace.h"
+
+/* This is borrowed from get_except_vector in inject_fault.c */
+static u64 get_el2_except_vector(struct kvm_vcpu *vcpu,
+ enum exception_type type)
+{
+ u64 exc_offset;
+
+ switch (*vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT)) {
+ case PSR_MODE_EL2t:
+ exc_offset = CURRENT_EL_SP_EL0_VECTOR;
+ break;
+ case PSR_MODE_EL2h:
+ exc_offset = CURRENT_EL_SP_ELx_VECTOR;
+ break;
+ case PSR_MODE_EL1t:
+ case PSR_MODE_EL1h:
+ case PSR_MODE_EL0t:
+ exc_offset = LOWER_EL_AArch64_VECTOR;
+ break;
+ default:
+ kvm_err("Unexpected previous exception level: aarch32\n");
+ exc_offset = LOWER_EL_AArch32_VECTOR;
+ }
+
+ return vcpu_sys_reg(vcpu, VBAR_EL2) + exc_offset + type;
+}
+
+/*
+ * Emulate taking an exception to EL2.
+ * See ARM ARM J8.1.2 AArch64.TakeException()
+ */
+static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
+ enum exception_type type)
+{
+ int ret = 1;
+
+ if (!nested_virt_in_use(vcpu)) {
+ kvm_err("Unexpected call to %s for the non-nesting configuration\n",
+ __func__);
+ return -EINVAL;
+ }
+
+ vcpu_el2_sreg(vcpu, SPSR_EL2) = *vcpu_cpsr(vcpu);
+ vcpu_el2_sreg(vcpu, ELR_EL2) = *vcpu_pc(vcpu);
+ vcpu_sys_reg(vcpu, ESR_EL2) = esr_el2;
+
+ *vcpu_pc(vcpu) = get_el2_except_vector(vcpu, type);
+ /* On an exception, PSTATE.SP becomes 1 */
+ *vcpu_cpsr(vcpu) = PSR_MODE_EL2h;
+ *vcpu_cpsr(vcpu) |= (PSR_A_BIT | PSR_F_BIT | PSR_I_BIT | PSR_D_BIT);
+
+ trace_kvm_inject_nested_exception(vcpu, esr_el2, *vcpu_pc(vcpu));
+
+ return ret;
+}
+
+int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
+{
+ return kvm_inject_nested(vcpu, esr_el2, except_type_sync);
+}
diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
index 7fb0008..7c86cfb 100644
--- a/arch/arm64/kvm/trace.h
+++ b/arch/arm64/kvm/trace.h
@@ -167,6 +167,26 @@
);
+TRACE_EVENT(kvm_inject_nested_exception,
+ TP_PROTO(struct kvm_vcpu *vcpu, unsigned long esr_el2,
+ unsigned long pc),
+ TP_ARGS(vcpu, esr_el2, pc),
+
+ TP_STRUCT__entry(
+ __field(struct kvm_vcpu *, vcpu)
+ __field(unsigned long, esr_el2)
+ __field(unsigned long, pc)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu = vcpu;
+ __entry->esr_el2 = esr_el2;
+ __entry->pc = pc;
+ ),
+
+ TP_printk("vcpu: %p, inject exception to vEL2: ESR_EL2 0x%lx, vector: 0x%016lx",
+ __entry->vcpu, __entry->esr_el2, __entry->pc)
+);
#endif /* _TRACE_ARM64_KVM_H */
#undef TRACE_INCLUDE_PATH
--
1.9.1
From: Christoffer Dall <[email protected]>
When entering virtual EL2, we need to reflect virtual EL2 register
states to corresponding shadow EL1 registers. We can simply copy them if
their formats are identical. Otherwise, we need to convert EL2 register
state to EL1 register state.
When entering EL1/EL0, we need special care for MPIDR_EL1. A read of
this register returns the value of VMPIDR_EL2, so when a VM has the
virtual EL2, the value of MPIDR_EL1 should come from the virtual
VMPIDR_EL2.
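One concrete conversion this requires: TCR_EL2.PS lives in bits [18:16]
while TCR_EL1.IPS lives in bits [34:32], so the flush has to move the
field (field positions per the ARM ARM; this is what
tcr_el2_ips_to_tcr_el1_ps() in the diff below does):

	/* e.g. PS = 0b101 (48-bit PA) becomes IPS = 0b101 */
	return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
			<< TCR_IPS_SHIFT;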
Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/kvm/context.c | 81 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 81 insertions(+)
diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index 2645787..e965049 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -17,6 +17,74 @@
#include <linux/kvm_host.h>
#include <asm/kvm_emulate.h>
+#include <asm/esr.h>
+
+struct el1_el2_map {
+ enum vcpu_sysreg el1;
+ enum vcpu_sysreg el2;
+};
+
+/*
+ * List of EL2 registers which can be directly applied to EL1 registers to
+ * emulate running EL2 in EL1.
+ */
+static const struct el1_el2_map el1_el2_map[] = {
+ { AMAIR_EL1, AMAIR_EL2 },
+ { MAIR_EL1, MAIR_EL2 },
+ { TTBR0_EL1, TTBR0_EL2 },
+ { ACTLR_EL1, ACTLR_EL2 },
+ { AFSR0_EL1, AFSR0_EL2 },
+ { AFSR1_EL1, AFSR1_EL2 },
+ { SCTLR_EL1, SCTLR_EL2 },
+ { VBAR_EL1, VBAR_EL2 },
+};
+
+static inline u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
+{
+ return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
+ << TCR_IPS_SHIFT;
+}
+
+static inline u64 cptr_to_cpacr(u64 cptr_el2)
+{
+ u64 cpacr_el1 = 0;
+
+ if (!(cptr_el2 & CPTR_EL2_TFP))
+ cpacr_el1 |= CPACR_EL1_FPEN;
+ if (cptr_el2 & CPTR_EL2_TTA)
+ cpacr_el1 |= CPACR_EL1_TTA;
+
+ return cpacr_el1;
+}
+
+static void flush_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
+{
+ u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
+ u64 tcr_el2;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(el1_el2_map); i++) {
+ const struct el1_el2_map *map = &el1_el2_map[i];
+
+ s_sys_regs[map->el1] = vcpu_sys_reg(vcpu, map->el2);
+ }
+
+ tcr_el2 = vcpu_sys_reg(vcpu, TCR_EL2);
+ s_sys_regs[TCR_EL1] =
+ TCR_EPD1 | /* disable TTBR1_EL1 */
+ ((tcr_el2 & TCR_EL2_TBI) ? TCR_TBI0 : 0) |
+ tcr_el2_ips_to_tcr_el1_ps(tcr_el2) |
+ (tcr_el2 & TCR_EL2_TG0_MASK) |
+ (tcr_el2 & TCR_EL2_ORGN0_MASK) |
+ (tcr_el2 & TCR_EL2_IRGN0_MASK) |
+ (tcr_el2 & TCR_EL2_T0SZ_MASK);
+
+ /* Rely on separate VMID for VA context, always use ASID 0 */
+ s_sys_regs[TTBR0_EL1] &= ~GENMASK_ULL(63, 48);
+ s_sys_regs[TTBR1_EL1] = 0;
+
+ s_sys_regs[CPACR_EL1] = cptr_to_cpacr(vcpu_sys_reg(vcpu, CPTR_EL2));
+}
static void flush_shadow_special_regs(struct kvm_vcpu *vcpu)
{
@@ -72,6 +140,17 @@ static void sync_special_regs(struct kvm_vcpu *vcpu)
ctxt->gp_regs.spsr[KVM_SPSR_EL1] = ctxt->hw_spsr_el1;
}
+static void setup_mpidr_el1(struct kvm_vcpu *vcpu)
+{
+ /*
+ * A non-secure EL0 or EL1 read of MPIDR_EL1 returns
+ * the value of VMPIDR_EL2. For nested virtualization,
+ * it comes from the virtual VMPIDR_EL2.
+ */
+ if (nested_virt_in_use(vcpu))
+ vcpu_sys_reg(vcpu, MPIDR_EL1) = vcpu_sys_reg(vcpu, VMPIDR_EL2);
+}
+
/**
* kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
* @vcpu: The VCPU pointer
@@ -82,9 +161,11 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
if (unlikely(vcpu_mode_el2(vcpu))) {
flush_shadow_special_regs(vcpu);
+ flush_shadow_el1_sysregs(vcpu);
ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
} else {
flush_special_regs(vcpu);
+ setup_mpidr_el1(vcpu);
ctxt->hw_sys_regs = ctxt->sys_regs;
}
}
--
1.9.1
With the nested virtualization support, a hypervisor running inside a VM
(i.e. a guest hypervisor) is now deprivileged and runs in EL1 instead of
EL2. So, the host hypervisor manages the shadow context for the virtual
EL2 execution.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 57dccde..46880c3 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -229,6 +229,19 @@ struct kvm_cpu_context {
};
u64 el2_special_regs[NR_EL2_SPECIAL_REGS];
+
+ u64 shadow_sys_regs[NR_SYS_REGS]; /* only used for virtual EL2 */
+
+ /*
+ * hw_* will be written to the hardware when entering a VM.
+ * They have either the virtual EL2 or EL1/EL0 context depending
+ * on the vcpu mode.
+ */
+ u64 *hw_sys_regs;
+ u64 hw_sp_el1;
+ u64 hw_pstate;
+ u64 hw_elr_el1;
+ u64 hw_spsr_el1;
};
typedef struct kvm_cpu_context kvm_cpu_context_t;
--
1.9.1
From: Christoffer Dall <[email protected]>
Some bits of the TCR weren't defined, and since we're about to use them
in KVM, add the missing defines.
Signed-off-by: Christoffer Dall <[email protected]>
---
arch/arm64/include/asm/pgtable-hwdef.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index eb0c2bd..d26cab7 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -272,9 +272,15 @@
#define TCR_TG1_4K (UL(2) << TCR_TG1_SHIFT)
#define TCR_TG1_64K (UL(3) << TCR_TG1_SHIFT)
+#define TCR_IPS_SHIFT 32
+#define TCR_IPS_MASK (UL(7) << TCR_IPS_SHIFT)
+
#define TCR_ASID16 (UL(1) << 36)
#define TCR_TBI0 (UL(1) << 37)
#define TCR_HA (UL(1) << 39)
#define TCR_HD (UL(1) << 40)
+#define TCR_EPD1 (UL(1) << 23)
+#define TCR_EPD0 (UL(1) << 7)
+
#endif
--
1.9.1
From: Christoffer Dall <[email protected]>
Add functions setting up and restoring the guest's context on each entry
and exit. These functions will come in handy when we want to use
different context for normal EL0/EL1 and virtual EL2 execution.
No functional change yet.
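The intended per-run call ordering then looks like this (a sketch; it
matches the virt/kvm/arm/arm.c hunk below):

	kvm_arm_setup_shadow_state(vcpu);	/* flush vcpu state to hw_* */
	ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
	kvm_arm_restore_shadow_state(vcpu);	/* sync hw_* back to vcpu state */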
Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm/include/asm/kvm_emulate.h | 4 ++
arch/arm64/include/asm/kvm_emulate.h | 4 ++
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/kvm/context.c | 54 ++++++++++++++++
arch/arm64/kvm/hyp/sysreg-sr.c | 117 +++++++++++++++++++----------------
virt/kvm/arm/arm.c | 14 +++++
6 files changed, 140 insertions(+), 55 deletions(-)
create mode 100644 arch/arm64/kvm/context.c
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 399cd75e..0a03b7d 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -47,6 +47,10 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
+static inline void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu) { };
+static inline void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu) { };
+static inline void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt) { };
+
static inline bool kvm_condition_valid(const struct kvm_vcpu *vcpu)
{
return kvm_condition_valid32(vcpu);
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 5d6f3d0..14c4ce9 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -42,6 +42,10 @@
void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
+void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
+void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
+void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
+
static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index f513047..5762337 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -15,7 +15,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/e
kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arm.o $(KVM)/arm/mmu.o $(KVM)/arm/mmio.o
kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
-kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o
+kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o context.o
kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o
diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
new file mode 100644
index 0000000..bc43e66
--- /dev/null
+++ b/arch/arm64/kvm/context.c
@@ -0,0 +1,54 @@
+/*
+ * Copyright (C) 2016 - Linaro Ltd.
+ * Author: Christoffer Dall <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_emulate.h>
+
+/**
+ * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
+ * @vcpu: The VCPU pointer
+ */
+void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+
+ ctxt->hw_pstate = *vcpu_cpsr(vcpu);
+ ctxt->hw_sys_regs = ctxt->sys_regs;
+ ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
+ ctxt->hw_elr_el1 = ctxt->gp_regs.elr_el1;
+ ctxt->hw_spsr_el1 = ctxt->gp_regs.spsr[KVM_SPSR_EL1];
+}
+
+/**
+ * kvm_arm_restore_shadow_state -- write back shadow state from guest
+ * @vcpu: The VCPU pointer
+ */
+void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+
+ *vcpu_cpsr(vcpu) = ctxt->hw_pstate;
+ ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
+ ctxt->gp_regs.elr_el1 = ctxt->hw_elr_el1;
+ ctxt->gp_regs.spsr[KVM_SPSR_EL1] = ctxt->hw_spsr_el1;
+}
+
+void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)
+{
+ /* This is to set hw_sys_regs of host_cpu_context */
+ cpu_ctxt->hw_sys_regs = cpu_ctxt->sys_regs;
+}
diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
index 9341376..b7a67b1 100644
--- a/arch/arm64/kvm/hyp/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/sysreg-sr.c
@@ -19,6 +19,7 @@
#include <linux/kvm_host.h>
#include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
#include <asm/kvm_hyp.h>
/* Yes, this does nothing, on purpose */
@@ -33,39 +34,43 @@ static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }
static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
{
- ctxt->sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1);
- ctxt->sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0);
- ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
- ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1);
- ctxt->sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1);
+ u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
+
+ sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1);
+ sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0);
+ sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
+ sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1);
+ sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1);
ctxt->gp_regs.regs.sp = read_sysreg(sp_el0);
ctxt->gp_regs.regs.pc = read_sysreg_el2(elr);
- ctxt->gp_regs.regs.pstate = read_sysreg_el2(spsr);
+ ctxt->hw_pstate = read_sysreg_el2(spsr);
}
static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
{
- ctxt->sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2);
- ctxt->sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1);
- ctxt->sys_regs[SCTLR_EL1] = read_sysreg_el1(sctlr);
- ctxt->sys_regs[CPACR_EL1] = read_sysreg_el1(cpacr);
- ctxt->sys_regs[TTBR0_EL1] = read_sysreg_el1(ttbr0);
- ctxt->sys_regs[TTBR1_EL1] = read_sysreg_el1(ttbr1);
- ctxt->sys_regs[TCR_EL1] = read_sysreg_el1(tcr);
- ctxt->sys_regs[ESR_EL1] = read_sysreg_el1(esr);
- ctxt->sys_regs[AFSR0_EL1] = read_sysreg_el1(afsr0);
- ctxt->sys_regs[AFSR1_EL1] = read_sysreg_el1(afsr1);
- ctxt->sys_regs[FAR_EL1] = read_sysreg_el1(far);
- ctxt->sys_regs[MAIR_EL1] = read_sysreg_el1(mair);
- ctxt->sys_regs[VBAR_EL1] = read_sysreg_el1(vbar);
- ctxt->sys_regs[CONTEXTIDR_EL1] = read_sysreg_el1(contextidr);
- ctxt->sys_regs[AMAIR_EL1] = read_sysreg_el1(amair);
- ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg_el1(cntkctl);
- ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
-
- ctxt->gp_regs.sp_el1 = read_sysreg(sp_el1);
- ctxt->gp_regs.elr_el1 = read_sysreg_el1(elr);
- ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
+ u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
+
+ sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2);
+ sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1);
+ sys_regs[SCTLR_EL1] = read_sysreg_el1(sctlr);
+ sys_regs[CPACR_EL1] = read_sysreg_el1(cpacr);
+ sys_regs[TTBR0_EL1] = read_sysreg_el1(ttbr0);
+ sys_regs[TTBR1_EL1] = read_sysreg_el1(ttbr1);
+ sys_regs[TCR_EL1] = read_sysreg_el1(tcr);
+ sys_regs[ESR_EL1] = read_sysreg_el1(esr);
+ sys_regs[AFSR0_EL1] = read_sysreg_el1(afsr0);
+ sys_regs[AFSR1_EL1] = read_sysreg_el1(afsr1);
+ sys_regs[FAR_EL1] = read_sysreg_el1(far);
+ sys_regs[MAIR_EL1] = read_sysreg_el1(mair);
+ sys_regs[VBAR_EL1] = read_sysreg_el1(vbar);
+ sys_regs[CONTEXTIDR_EL1] = read_sysreg_el1(contextidr);
+ sys_regs[AMAIR_EL1] = read_sysreg_el1(amair);
+ sys_regs[CNTKCTL_EL1] = read_sysreg_el1(cntkctl);
+ sys_regs[PAR_EL1] = read_sysreg(par_el1);
+
+ ctxt->hw_sp_el1 = read_sysreg(sp_el1);
+ ctxt->hw_elr_el1 = read_sysreg_el1(elr);
+ ctxt->hw_spsr_el1 = read_sysreg_el1(spsr);
}
static hyp_alternate_select(__sysreg_call_save_host_state,
@@ -86,39 +91,43 @@ void __hyp_text __sysreg_save_guest_state(struct kvm_cpu_context *ctxt)
static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
{
- write_sysreg(ctxt->sys_regs[ACTLR_EL1], actlr_el1);
- write_sysreg(ctxt->sys_regs[TPIDR_EL0], tpidr_el0);
- write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
- write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1);
- write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1);
+ u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
+
+ write_sysreg(sys_regs[ACTLR_EL1], actlr_el1);
+ write_sysreg(sys_regs[TPIDR_EL0], tpidr_el0);
+ write_sysreg(sys_regs[TPIDRRO_EL0], tpidrro_el0);
+ write_sysreg(sys_regs[TPIDR_EL1], tpidr_el1);
+ write_sysreg(sys_regs[MDSCR_EL1], mdscr_el1);
write_sysreg(ctxt->gp_regs.regs.sp, sp_el0);
write_sysreg_el2(ctxt->gp_regs.regs.pc, elr);
- write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr);
+ write_sysreg_el2(ctxt->hw_pstate, spsr);
}
static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
{
- write_sysreg(ctxt->sys_regs[MPIDR_EL1], vmpidr_el2);
- write_sysreg(ctxt->sys_regs[CSSELR_EL1], csselr_el1);
- write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1], sctlr);
- write_sysreg_el1(ctxt->sys_regs[CPACR_EL1], cpacr);
- write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1], ttbr0);
- write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1], ttbr1);
- write_sysreg_el1(ctxt->sys_regs[TCR_EL1], tcr);
- write_sysreg_el1(ctxt->sys_regs[ESR_EL1], esr);
- write_sysreg_el1(ctxt->sys_regs[AFSR0_EL1], afsr0);
- write_sysreg_el1(ctxt->sys_regs[AFSR1_EL1], afsr1);
- write_sysreg_el1(ctxt->sys_regs[FAR_EL1], far);
- write_sysreg_el1(ctxt->sys_regs[MAIR_EL1], mair);
- write_sysreg_el1(ctxt->sys_regs[VBAR_EL1], vbar);
- write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL1],contextidr);
- write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1], amair);
- write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1], cntkctl);
- write_sysreg(ctxt->sys_regs[PAR_EL1], par_el1);
-
- write_sysreg(ctxt->gp_regs.sp_el1, sp_el1);
- write_sysreg_el1(ctxt->gp_regs.elr_el1, elr);
- write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
+ u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
+
+ write_sysreg(sys_regs[MPIDR_EL1], vmpidr_el2);
+ write_sysreg(sys_regs[CSSELR_EL1], csselr_el1);
+ write_sysreg_el1(sys_regs[SCTLR_EL1], sctlr);
+ write_sysreg_el1(sys_regs[CPACR_EL1], cpacr);
+ write_sysreg_el1(sys_regs[TTBR0_EL1], ttbr0);
+ write_sysreg_el1(sys_regs[TTBR1_EL1], ttbr1);
+ write_sysreg_el1(sys_regs[TCR_EL1], tcr);
+ write_sysreg_el1(sys_regs[ESR_EL1], esr);
+ write_sysreg_el1(sys_regs[AFSR0_EL1], afsr0);
+ write_sysreg_el1(sys_regs[AFSR1_EL1], afsr1);
+ write_sysreg_el1(sys_regs[FAR_EL1], far);
+ write_sysreg_el1(sys_regs[MAIR_EL1], mair);
+ write_sysreg_el1(sys_regs[VBAR_EL1], vbar);
+ write_sysreg_el1(sys_regs[CONTEXTIDR_EL1], contextidr);
+ write_sysreg_el1(sys_regs[AMAIR_EL1], amair);
+ write_sysreg_el1(sys_regs[CNTKCTL_EL1], cntkctl);
+ write_sysreg(sys_regs[PAR_EL1], par_el1);
+
+ write_sysreg(ctxt->hw_sp_el1, sp_el1);
+ write_sysreg_el1(ctxt->hw_elr_el1, elr);
+ write_sysreg_el1(ctxt->hw_spsr_el1, spsr);
}
static hyp_alternate_select(__sysreg_call_restore_host_state,
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 36aae3a..0ff2997 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -689,6 +689,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
}
kvm_arm_setup_debug(vcpu);
+ kvm_arm_setup_shadow_state(vcpu);
/**************************************************************
* Enter the guest
@@ -704,6 +705,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
* Back from guest
*************************************************************/
+ kvm_arm_restore_shadow_state(vcpu);
kvm_arm_clear_debug(vcpu);
/*
@@ -1334,6 +1336,16 @@ static void teardown_hyp_mode(void)
static int init_vhe_mode(void)
{
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ kvm_cpu_context_t *cpu_ctxt;
+
+ cpu_ctxt = per_cpu_ptr(kvm_host_cpu_state, cpu);
+
+ kvm_arm_init_cpu_context(cpu_ctxt);
+ }
+
kvm_info("VHE mode initialized successfully\n");
return 0;
}
@@ -1416,6 +1428,8 @@ static int init_hyp_mode(void)
kvm_err("Cannot map host CPU state: %d\n", err);
goto out_err;
}
+
+ kvm_arm_init_cpu_context(cpu_ctxt);
}
kvm_info("Hyp mode initialized successfully\n");
--
1.9.1
If the guest exception level is EL2, then write the shadow context of
the virtual EL2 to the hardware. Otherwise, use the regular EL0/EL1
context.
Note that the shadow context content will be prepared in subsequent
patches.
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/kvm/context.c | 74 +++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 64 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
index bc43e66..2645787 100644
--- a/arch/arm64/kvm/context.c
+++ b/arch/arm64/kvm/context.c
@@ -18,11 +18,29 @@
#include <linux/kvm_host.h>
#include <asm/kvm_emulate.h>
-/**
- * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
- * @vcpu: The VCPU pointer
- */
-void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
+static void flush_shadow_special_regs(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+
+ ctxt->hw_pstate = *vcpu_cpsr(vcpu) & ~PSR_MODE_MASK;
+ /*
+ * We can emulate the guest's configuration of which
+ * stack pointer to use when executing in virtual EL2 by
+ * using the equivalent feature in EL1 to point to
+ * either the EL1 or EL0 stack pointer.
+ */
+ if ((*vcpu_cpsr(vcpu) & PSR_MODE_MASK) == PSR_MODE_EL2h)
+ ctxt->hw_pstate |= PSR_MODE_EL1h;
+ else
+ ctxt->hw_pstate |= PSR_MODE_EL1t;
+
+ ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
+ ctxt->hw_sp_el1 = vcpu_el2_sreg(vcpu, SP_EL2);
+ ctxt->hw_elr_el1 = vcpu_el2_sreg(vcpu, ELR_EL2);
+ ctxt->hw_spsr_el1 = vcpu_el2_sreg(vcpu, SPSR_EL2);
+}
+
+static void flush_special_regs(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
@@ -33,11 +51,18 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
ctxt->hw_spsr_el1 = ctxt->gp_regs.spsr[KVM_SPSR_EL1];
}
-/**
- * kvm_arm_restore_shadow_state -- write back shadow state from guest
- * @vcpu: The VCPU pointer
- */
-void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
+static void sync_shadow_special_regs(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+
+ *vcpu_cpsr(vcpu) &= PSR_MODE_MASK;
+ *vcpu_cpsr(vcpu) |= ctxt->hw_pstate & ~PSR_MODE_MASK;
+ vcpu_el2_sreg(vcpu, SP_EL2) = ctxt->hw_sp_el1;
+ vcpu_el2_sreg(vcpu, ELR_EL2) = ctxt->hw_elr_el1;
+ vcpu_el2_sreg(vcpu, SPSR_EL2) = ctxt->hw_spsr_el1;
+}
+
+static void sync_special_regs(struct kvm_vcpu *vcpu)
{
struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
@@ -47,6 +72,35 @@ void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
ctxt->gp_regs.spsr[KVM_SPSR_EL1] = ctxt->hw_spsr_el1;
}
+/**
+ * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
+ * @vcpu: The VCPU pointer
+ */
+void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
+
+ if (unlikely(vcpu_mode_el2(vcpu))) {
+ flush_shadow_special_regs(vcpu);
+ ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
+ } else {
+ flush_special_regs(vcpu);
+ ctxt->hw_sys_regs = ctxt->sys_regs;
+ }
+}
+
+/**
+ * kvm_arm_restore_shadow_state -- write back shadow state from guest
+ * @vcpu: The VCPU pointer
+ */
+void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
+{
+ if (unlikely(vcpu_mode_el2(vcpu)))
+ sync_shadow_special_regs(vcpu);
+ else
+ sync_special_regs(vcpu);
+}
+
void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)
{
/* This is to set hw_sys_regs of host_cpu_context */
--
1.9.1
ARMv8.3 introduces a new bit in HCR_EL2: the NV bit. When this bit is
set, accessing EL2 registers from EL1 traps to EL2. In addition,
executing the following instructions at EL1 will trap to EL2: tlbi, at,
eret, and msr/mrs instructions accessing SP_EL1. Most of the
instructions that trap to EL2 with the NV bit set were UNDEFINED at EL1
prior to ARMv8.3; the only one that was not is eret.
This patch sets up a handler for EL2 registers and SP_EL1 register
accesses at EL1. The host hypervisor keeps those register values in
memory, and will emulate their behavior.
This patch doesn't set the NV bit yet. It will be set in a later patch
once nested virtualization support is completed.
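To illustrate, the emulation boils down to something like the sketch
below (a standalone sketch, not code from this patch; the type and
function names are made up):

#include <stdbool.h>
#include <stdint.h>

/* In-memory copy of one trapped EL2 (or SP_EL1) register. */
struct emulated_reg {
        uint64_t val;
};

/*
 * A trapped msr (write) or mrs (read) from EL1 never reaches the real
 * register; it is satisfied from memory instead.
 */
static uint64_t handle_trapped_access(struct emulated_reg *reg,
                                      bool is_write, uint64_t new_val)
{
        if (is_write) {
                reg->val = new_val;     /* guest hypervisor's write, kept in memory */
                return new_val;
        }
        return reg->val;                /* read returns the emulated value */
}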
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 30 +++++++++++++++++++-
arch/arm64/include/asm/sysreg.h | 37 +++++++++++++++++++++++++
arch/arm64/kvm/sys_regs.c | 58 +++++++++++++++++++++++++++++++++++++++
3 files changed, 124 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 86d4b6c..1dc4ed6 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -134,12 +134,40 @@ enum vcpu_sysreg {
PMSWINC_EL0, /* Software Increment Register */
PMUSERENR_EL0, /* User Enable Register */
- /* 32bit specific registers. Keep them at the end of the range */
+ /* 32bit specific registers. */
DACR32_EL2, /* Domain Access Control Register */
IFSR32_EL2, /* Instruction Fault Status Register */
FPEXC32_EL2, /* Floating-Point Exception Control Register */
DBGVCR32_EL2, /* Debug Vector Catch Register */
+ /* EL2 registers sorted ascending by Op0, Op1, CRn, CRm, Op2 */
+ VPIDR_EL2, /* Virtualization Processor ID Register */
+ VMPIDR_EL2, /* Virtualization Multiprocessor ID Register */
+ SCTLR_EL2, /* System Control Register (EL2) */
+ ACTLR_EL2, /* Auxiliary Control Register (EL2) */
+ HCR_EL2, /* Hypervisor Configuration Register */
+ MDCR_EL2, /* Monitor Debug Configuration Register (EL2) */
+ CPTR_EL2, /* Architectural Feature Trap Register (EL2) */
+ HSTR_EL2, /* Hypervisor System Trap Register */
+ HACR_EL2, /* Hypervisor Auxiliary Control Register */
+ TTBR0_EL2, /* Translation Table Base Register 0 (EL2) */
+ TCR_EL2, /* Translation Control Register (EL2) */
+ VTTBR_EL2, /* Virtualization Translation Table Base Register */
+ VTCR_EL2, /* Virtualization Translation Control Register */
+ AFSR0_EL2, /* Auxiliary Fault Status Register 0 (EL2) */
+ AFSR1_EL2, /* Auxiliary Fault Status Register 1 (EL2) */
+ ESR_EL2, /* Exception Syndrome Register (EL2) */
+ FAR_EL2, /* Fault Address Register (EL2) */
+ HPFAR_EL2, /* Hypervisor IPA Fault Address Register */
+ MAIR_EL2, /* Memory Attribute Indirection Register (EL2) */
+ AMAIR_EL2, /* Auxiliary Memory Attribute Indirection Register (EL2) */
+ VBAR_EL2, /* Vector Base Address Register (EL2) */
+ RVBAR_EL2, /* Reset Vector Base Address Register */
+ RMR_EL2, /* Reset Management Register */
+ TPIDR_EL2, /* EL2 Software Thread ID Register */
+ CNTVOFF_EL2, /* Counter-timer Virtual Offset register */
+ CNTHCTL_EL2, /* Counter-timer Hypervisor Control register */
+
NR_SYS_REGS /* Nothing after this line! */
};
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 737ca30..9277c4a 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -250,10 +250,42 @@
#define SYS_PMCCFILTR_EL0 sys_reg (3, 3, 14, 15, 7)
+#define SYS_VPIDR_EL2 sys_reg(3, 4, 0, 0, 0)
+#define SYS_VMPIDR_EL2 sys_reg(3, 4, 0, 0, 5)
+
+#define SYS_SCTLR_EL2 sys_reg(3, 4, 1, 0, 0)
+#define SYS_ACTLR_EL2 sys_reg(3, 4, 1, 0, 1)
+#define SYS_HCR_EL2 sys_reg(3, 4, 1, 1, 0)
+#define SYS_MDCR_EL2 sys_reg(3, 4, 1, 1, 1)
+#define SYS_CPTR_EL2 sys_reg(3, 4, 1, 1, 2)
+#define SYS_HSTR_EL2 sys_reg(3, 4, 1, 1, 3)
+#define SYS_HACR_EL2 sys_reg(3, 4, 1, 1, 7)
+
+#define SYS_TTBR0_EL2 sys_reg(3, 4, 2, 0, 0)
+#define SYS_TCR_EL2 sys_reg(3, 4, 2, 0, 2)
+#define SYS_VTTBR_EL2 sys_reg(3, 4, 2, 1, 0)
+#define SYS_VTCR_EL2 sys_reg(3, 4, 2, 1, 2)
+
#define SYS_DACR32_EL2 sys_reg(3, 4, 3, 0, 0)
+
+#define SYS_SP_EL1 sys_reg(3, 4, 4, 1, 0)
+
#define SYS_IFSR32_EL2 sys_reg(3, 4, 5, 0, 1)
+#define SYS_AFSR0_EL2 sys_reg(3, 4, 5, 1, 0)
+#define SYS_AFSR1_EL2 sys_reg(3, 4, 5, 1, 1)
+#define SYS_ESR_EL2 sys_reg(3, 4, 5, 2, 0)
#define SYS_FPEXC32_EL2 sys_reg(3, 4, 5, 3, 0)
+#define SYS_FAR_EL2 sys_reg(3, 4, 6, 0, 0)
+#define SYS_HPFAR_EL2 sys_reg(3, 4, 6, 0, 4)
+
+#define SYS_MAIR_EL2 sys_reg(3, 4, 10, 2, 0)
+#define SYS_AMAIR_EL2 sys_reg(3, 4, 10, 3, 0)
+
+#define SYS_VBAR_EL2 sys_reg(3, 4, 12, 0, 0)
+#define SYS_RVBAR_EL2 sys_reg(3, 4, 12, 0, 1)
+#define SYS_RMR_EL2 sys_reg(3, 4, 12, 0, 2)
+
#define __SYS__AP0Rx_EL2(x) sys_reg(3, 4, 12, 8, x)
#define SYS_ICH_AP0R0_EL2 __SYS__AP0Rx_EL2(0)
#define SYS_ICH_AP0R1_EL2 __SYS__AP0Rx_EL2(1)
@@ -295,6 +327,11 @@
#define SYS_ICH_LR14_EL2 __SYS__LR8_EL2(6)
#define SYS_ICH_LR15_EL2 __SYS__LR8_EL2(7)
+#define SYS_TPIDR_EL2 sys_reg(3, 4, 13, 0, 2)
+
+#define SYS_CNTVOFF_EL2 sys_reg(3, 4, 14, 0, 3)
+#define SYS_CNTHCTL_EL2 sys_reg(3, 4, 14, 1, 0)
+
/* Common SCTLR_ELx flags. */
#define SCTLR_ELx_EE (1 << 25)
#define SCTLR_ELx_I (1 << 12)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7786288..1568f8b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -892,6 +892,27 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
return true;
}
+static inline void access_rw(struct sys_reg_params *p, u64 *sysreg)
+{
+ if (!p->is_write)
+ p->regval = *sysreg;
+ else
+ *sysreg = p->regval;
+}
+
+static bool trap_el2_regs(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+ /* SP_EL1 is NOT maintained in sys_regs array */
+ if (sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2) == SYS_SP_EL1)
+ access_rw(p, &vcpu->arch.ctxt.gp_regs.sp_el1);
+ else
+ access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
+
+ return true;
+}
+
/*
* Architected system registers.
* Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -1077,9 +1098,46 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
*/
{ SYS_DESC(SYS_PMCCFILTR_EL0), access_pmu_evtyper, reset_val, PMCCFILTR_EL0, 0 },
+ { SYS_DESC(SYS_VPIDR_EL2), trap_el2_regs, reset_val, VPIDR_EL2, 0 },
+ { SYS_DESC(SYS_VMPIDR_EL2), trap_el2_regs, reset_val, VMPIDR_EL2, 0 },
+
+ { SYS_DESC(SYS_SCTLR_EL2), trap_el2_regs, reset_val, SCTLR_EL2, 0 },
+ { SYS_DESC(SYS_ACTLR_EL2), trap_el2_regs, reset_val, ACTLR_EL2, 0 },
+ { SYS_DESC(SYS_HCR_EL2), trap_el2_regs, reset_val, HCR_EL2, 0 },
+ { SYS_DESC(SYS_MDCR_EL2), trap_el2_regs, reset_val, MDCR_EL2, 0 },
+ { SYS_DESC(SYS_CPTR_EL2), trap_el2_regs, reset_val, CPTR_EL2, 0 },
+ { SYS_DESC(SYS_HSTR_EL2), trap_el2_regs, reset_val, HSTR_EL2, 0 },
+ { SYS_DESC(SYS_HACR_EL2), trap_el2_regs, reset_val, HACR_EL2, 0 },
+
+ { SYS_DESC(SYS_TTBR0_EL2), trap_el2_regs, reset_val, TTBR0_EL2, 0 },
+ { SYS_DESC(SYS_TCR_EL2), trap_el2_regs, reset_val, TCR_EL2, 0 },
+ { SYS_DESC(SYS_VTTBR_EL2), trap_el2_regs, reset_val, VTTBR_EL2, 0 },
+ { SYS_DESC(SYS_VTCR_EL2), trap_el2_regs, reset_val, VTCR_EL2, 0 },
+
{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
+
+ { SYS_DESC(SYS_SP_EL1), trap_el2_regs },
+
{ SYS_DESC(SYS_IFSR32_EL2), NULL, reset_unknown, IFSR32_EL2 },
+ { SYS_DESC(SYS_AFSR0_EL2), trap_el2_regs, reset_val, AFSR0_EL2, 0 },
+ { SYS_DESC(SYS_AFSR1_EL2), trap_el2_regs, reset_val, AFSR1_EL2, 0 },
+ { SYS_DESC(SYS_ESR_EL2), trap_el2_regs, reset_val, ESR_EL2, 0 },
{ SYS_DESC(SYS_FPEXC32_EL2), NULL, reset_val, FPEXC32_EL2, 0x70 },
+
+ { SYS_DESC(SYS_FAR_EL2), trap_el2_regs, reset_val, FAR_EL2, 0 },
+ { SYS_DESC(SYS_HPFAR_EL2), trap_el2_regs, reset_val, HPFAR_EL2, 0 },
+
+ { SYS_DESC(SYS_MAIR_EL2), trap_el2_regs, reset_val, MAIR_EL2, 0 },
+ { SYS_DESC(SYS_AMAIR_EL2), trap_el2_regs, reset_val, AMAIR_EL2, 0 },
+
+ { SYS_DESC(SYS_VBAR_EL2), trap_el2_regs, reset_val, VBAR_EL2, 0 },
+ { SYS_DESC(SYS_RVBAR_EL2), trap_el2_regs, reset_val, RVBAR_EL2, 0 },
+ { SYS_DESC(SYS_RMR_EL2), trap_el2_regs, reset_val, RMR_EL2, 0 },
+
+ { SYS_DESC(SYS_TPIDR_EL2), trap_el2_regs, reset_val, TPIDR_EL2, 0 },
+
+ { SYS_DESC(SYS_CNTVOFF_EL2), trap_el2_regs, reset_val, CNTVOFF_EL2, 0 },
+ { SYS_DESC(SYS_CNTHCTL_EL2), trap_el2_regs, reset_val, CNTHCTL_EL2, 0 },
};
static bool trap_dbgidr(struct kvm_vcpu *vcpu,
--
1.9.1
From: Christoffer Dall <[email protected]>
When running a nested hypervisor, we occasionally have to figure out
whether the mode we are switching into is the virtual EL2 mode or a
regular EL0/EL1 mode.
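For reference, the check itself is simple; below is a standalone model
(the PSTATE.M[3:0] encodings are from the ARMv8 ARM, the helper name is
invented, and the real helper is in the patch below):

#include <stdbool.h>
#include <stdint.h>

#define PSR_MODE_EL2t 0x8       /* EL2, using SP_EL0 */
#define PSR_MODE_EL2h 0x9       /* EL2, using SP_EL2 */
#define PSR_MODE_MASK 0xf       /* PSTATE.M[3:0] */

static bool mode_is_virtual_el2(uint64_t cpsr)
{
        uint64_t mode = cpsr & PSR_MODE_MASK;

        /* Both EL2 encodings count; everything else is EL0/EL1. */
        return mode == PSR_MODE_EL2t || mode == PSR_MODE_EL2h;
}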
Signed-off-by: Christoffer Dall <[email protected]>
---
arch/arm/include/asm/kvm_emulate.h | 6 ++++++
arch/arm64/include/asm/kvm_emulate.h | 12 ++++++++++++
2 files changed, 18 insertions(+)
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 9a8a45a..399cd75e 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -77,6 +77,12 @@ static inline bool vcpu_mode_is_32bit(const struct kvm_vcpu *vcpu)
return 1;
}
+/* We don't support nesting on arm */
+static inline bool vcpu_mode_el2(const struct kvm_vcpu *vcpu)
+{
+ return false;
+}
+
static inline unsigned long *vcpu_pc(struct kvm_vcpu *vcpu)
{
return &vcpu->arch.ctxt.gp_regs.usr_regs.ARM_pc;
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index fe39e68..5d6f3d0 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -143,6 +143,18 @@ static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
return mode != PSR_MODE_EL0t;
}
+static inline bool vcpu_mode_el2(const struct kvm_vcpu *vcpu)
+{
+ u32 mode;
+
+ if (vcpu_mode_is_32bit(vcpu))
+ return false;
+
+ mode = *vcpu_cpsr(vcpu) & PSR_MODE_MASK;
+
+ return mode == PSR_MODE_EL2h || mode == PSR_MODE_EL2t;
+}
+
static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
{
return vcpu->arch.fault.esr_el2;
--
1.9.1
Nested virtualization is in use only if all three conditions are met:
- The architecture supports nested virtualization.
- The kernel parameter is set.
- Userspace sets the nested virtualization feature for the vcpu (see
  the sketch below).
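To illustrate the third condition, here is a rough sketch of the
userspace side (error handling trimmed; KVM_ARM_VCPU_NESTED_VIRT is the
feature bit introduced earlier in this series):

#include <linux/kvm.h>
#include <sys/ioctl.h>

static int vcpu_init_with_nested(int vm_fd, int vcpu_fd)
{
        struct kvm_vcpu_init init;

        /* Ask KVM for the preferred target CPU type on this host. */
        if (ioctl(vm_fd, KVM_ARM_PREFERRED_TARGET, &init) < 0)
                return -1;

        /* Request the nested virtualization feature for this vcpu. */
        init.features[0] |= 1U << KVM_ARM_VCPU_NESTED_VIRT;

        return ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, &init);
}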
Signed-off-by: Jintack Lim <[email protected]>
---
arch/arm/include/asm/kvm_host.h | 11 +++++++++++
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/kvm/nested.c | 17 +++++++++++++++++
virt/kvm/arm/arm.c | 4 ++++
4 files changed, 34 insertions(+)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 00b0f97..7e9e6c8 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -303,4 +303,15 @@ static inline int __init kvmarm_nested_cfg(char *buf)
{
return 0;
}
+
+static inline int init_nested_virt(void)
+{
+ return 0;
+}
+
+static inline bool nested_virt_in_use(struct kvm_vcpu *vcpu)
+{
+ return false;
+}
+
#endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 6df0c7c..86d4b6c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -387,5 +387,7 @@ static inline void __cpu_init_stage2(void)
}
int __init kvmarm_nested_cfg(char *buf);
+int init_nested_virt(void);
+bool nested_virt_in_use(struct kvm_vcpu *vcpu);
#endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 79f38da..9a05c76 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -24,3 +24,20 @@ int __init kvmarm_nested_cfg(char *buf)
{
return strtobool(buf, &nested_param);
}
+
+int init_nested_virt(void)
+{
+ if (nested_param && cpus_have_const_cap(ARM64_HAS_NESTED_VIRT))
+ kvm_info("Nested virtualization is supported\n");
+
+ return 0;
+}
+
+bool nested_virt_in_use(struct kvm_vcpu *vcpu)
+{
+ if (nested_param && cpus_have_const_cap(ARM64_HAS_NESTED_VIRT)
+ && test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features))
+ return true;
+
+ return false;
+}
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 1c1c772..36aae3a 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1478,6 +1478,10 @@ int kvm_arch_init(void *opaque)
if (err)
goto out_err;
+ err = init_nested_virt();
+ if (err)
+ goto out_hyp;
+
err = init_subsystems();
if (err)
goto out_hyp;
--
1.9.1
Add a new kernel parameter (kvm-arm.nested) to enable KVM/ARM nested
virtualization support. On the 32-bit arm architecture this parameter
is ignored, since nested virtualization is not supported there.
Note that this kernel parameter has no effect until nested
virtualization support is completed; it is added first so that the
following patches can use it.
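For reference, once the support is complete the host would be booted
with the parameter on its kernel command line, e.g.:

        kvm-arm.nested=1

strtobool() also accepts the y/Y and n/N spellings.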
Signed-off-by: Jintack Lim <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 4 ++++
arch/arm/include/asm/kvm_host.h | 4 ++++
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/kvm/Makefile | 2 ++
arch/arm64/kvm/nested.c | 26 +++++++++++++++++++++++++
virt/kvm/arm/arm.c | 2 ++
6 files changed, 40 insertions(+)
create mode 100644 arch/arm64/kvm/nested.c
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index aa8341e..8fb152d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1841,6 +1841,10 @@
[KVM,ARM] Trap guest accesses to GICv3 common
system registers
+ kvm-arm.nested=
+ [KVM,ARM] Allow nested virtualization in KVM/ARM.
+ Default is 0 (disabled)
+
kvm-intel.ept= [KVM,Intel] Disable extended page tables
(virtualized MMU) support on capable Intel chips.
Default is 1 (enabled)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 127e2dd..00b0f97 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -299,4 +299,8 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
+static inline int __init kvmarm_nested_cfg(char *buf)
+{
+ return 0;
+}
#endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 0c4fd1f..dcc4df8 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -386,4 +386,6 @@ static inline void __cpu_init_stage2(void)
"PARange is %d bits, unsupported configuration!", parange);
}
+int __init kvmarm_nested_cfg(char *buf);
+
#endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 5d98100..f513047 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -35,3 +35,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-debug.o
kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
+
+kvm-$(CONFIG_KVM_ARM_HOST) += nested.o
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
new file mode 100644
index 0000000..79f38da
--- /dev/null
+++ b/arch/arm64/kvm/nested.c
@@ -0,0 +1,26 @@
+/*
+ * Copyright (C) 2017 - Columbia University and Linaro Ltd.
+ * Author: Jintack Lim <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+
+static bool nested_param;
+
+int __init kvmarm_nested_cfg(char *buf)
+{
+ return strtobool(buf, &nested_param);
+}
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index a39a1e1..1c1c772 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -67,6 +67,8 @@
static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
+early_param("kvm-arm.nested", kvmarm_nested_cfg);
+
static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
{
BUG_ON(preemptible());
--
1.9.1
On Tue, Jul 18, 2017 at 12:58 PM, Jintack Lim <[email protected]> wrote:
> [...]
> I'll share my experiment setup shortly.
I summarized my experiment setup here.
https://github.com/columbia/nesting-pub/wiki/Nested-virtualization-on-ARM-setup
>
> Even though this work has some limitations and TODOs, I'd appreciate early
> feedback on this RFC. Specifically, I'm interested in:
>
> - Overall design to manage vcpu context for the virtual EL2
> - Verifying correct EL2 register configurations such as HCR_EL2, CPTR_EL2
> (Patch 30 and 32)
> - Patch organization and coding style
I also wonder which is better when the hardware and/or KVM do not
support nested virtualization but userspace still requests the nested
virtualization option: giving an error, or silently launching a
regular VM.
>
> This patch series is based on kvm/next d38338e.
> The whole patch series including memory, VGIC, and timer patches is available
> here:
>
> [email protected]:columbia/nesting-pub.git rfc-v2
>
> Limitations:
> - There are some cases that the target exception level of a VM is ambiguous when
> emulating eret instruction. I'm discussing this issue with Christoffer and
> Marc. Meanwhile, I added a temporary patch (not included in this
> series. f1beaba in the repo) and used 4.10.0 kernel when testing the guest
> hypervisor with VHE.
> - Recursive nested virtualization is not tested yet.
> - Other hypervisors (such as Xen) on KVM are not tested.
>
> TODO:
> - Submit memory, VGIC, and timer patches
> - Evaluate regular VM performance to see if there's a negative impact.
> - Test other hypervisors such as Xen on KVM
> - Test recursive nested virtualization
>
> v1-->v2:
> - Added support for the virtual EL2 with VHE
> - Rewrote commit messages and comments from the perspective of supporting
> execution environments to VMs, rather than from the perspective of the guest
> hypervisor running in them.
> - Fixed a few bugs to make it run on the FastModel.
> - Tested on ARMv8.3 with four configurations. (host/guest. with/without VHE.)
> - Rebased to kvm/next
>
> [1] https://www.community.arm.com/processors/b/blog/posts/armv8-a-architecture-2016-additions
>
> Christoffer Dall (7):
> KVM: arm64: Add KVM nesting feature
> KVM: arm64: Allow userspace to set PSR_MODE_EL2x
> KVM: arm64: Add vcpu_mode_el2 primitive to support nesting
> KVM: arm/arm64: Add a framework to prepare virtual EL2 execution
> arm64: Add missing TCR hw defines
> KVM: arm64: Create shadow EL1 registers
> KVM: arm64: Trap EL1 VM register accesses in virtual EL2
>
> Jintack Lim (31):
> arm64: Add ARM64_HAS_NESTED_VIRT feature
> KVM: arm/arm64: Enable nested virtualization via command-line
> KVM: arm/arm64: Check if nested virtualization is in use
> KVM: arm64: Add EL2 system registers to vcpu context
> KVM: arm64: Add EL2 special registers to vcpu context
> KVM: arm64: Add the shadow context for virtual EL2 execution
> KVM: arm64: Set vcpu context depending on the guest exception level
> KVM: arm64: Synchronize EL1 system registers on virtual EL2 entry and
> exit
> KVM: arm64: Move exception macros and enums to a common file
> KVM: arm64: Support to inject exceptions to the virtual EL2
> KVM: arm64: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
> KVM: arm64: Trap CPACR_EL1 access in virtual EL2
> KVM: arm64: Handle eret instruction traps
> KVM: arm64: Set a handler for the system instruction traps
> KVM: arm64: Handle PSCI call via smc from the guest
> KVM: arm64: Inject HVC exceptions to the virtual EL2
> KVM: arm64: Respect virtual HCR_EL2.TWX setting
> KVM: arm64: Respect virtual CPTR_EL2.TFP setting
> KVM: arm64: Add macros to support the virtual EL2 with VHE
> KVM: arm64: Add EL2 registers defined in ARMv8.1 to vcpu context
> KVM: arm64: Emulate EL12 register accesses from the virtual EL2
> KVM: arm64: Support a VM with VHE considering EL0 of the VHE host
> KVM: arm64: Allow the virtual EL2 to access EL2 states without trap
> KVM: arm64: Manage the shadow states when virtual E2H bit enabled
> KVM: arm64: Trap and emulate CPTR_EL2 accesses via CPACR_EL1 from the
> virtual EL2 with VHE
> KVM: arm64: Emulate appropriate VM control system registers
> KVM: arm64: Respect the virtual HCR_EL2.NV bit setting
> KVM: arm64: Respect the virtual HCR_EL2.NV bit setting for EL12
> register traps
> KVM: arm64: Respect virtual HCR_EL2.TVM and TRVM settings
> KVM: arm64: Respect the virtual HCR_EL2.NV1 bit setting
> KVM: arm64: Respect the virtual CPTR_EL2.TCPAC setting
>
> Documentation/admin-guide/kernel-parameters.txt | 4 +
> arch/arm/include/asm/kvm_emulate.h | 17 ++
> arch/arm/include/asm/kvm_host.h | 15 +
> arch/arm64/include/asm/cpucaps.h | 3 +-
> arch/arm64/include/asm/esr.h | 1 +
> arch/arm64/include/asm/kvm_arm.h | 2 +
> arch/arm64/include/asm/kvm_coproc.h | 3 +-
> arch/arm64/include/asm/kvm_emulate.h | 56 ++++
> arch/arm64/include/asm/kvm_host.h | 64 ++++-
> arch/arm64/include/asm/kvm_hyp.h | 24 --
> arch/arm64/include/asm/pgtable-hwdef.h | 6 +
> arch/arm64/include/asm/sysreg.h | 70 +++++
> arch/arm64/include/uapi/asm/kvm.h | 1 +
> arch/arm64/kernel/asm-offsets.c | 1 +
> arch/arm64/kernel/cpufeature.c | 11 +
> arch/arm64/kvm/Makefile | 5 +-
> arch/arm64/kvm/context.c | 346 +++++++++++++++++++++++
> arch/arm64/kvm/emulate-nested.c | 83 ++++++
> arch/arm64/kvm/guest.c | 2 +
> arch/arm64/kvm/handle_exit.c | 89 +++++-
> arch/arm64/kvm/hyp/entry.S | 13 +
> arch/arm64/kvm/hyp/hyp-entry.S | 2 +-
> arch/arm64/kvm/hyp/switch.c | 33 ++-
> arch/arm64/kvm/hyp/sysreg-sr.c | 117 ++++----
> arch/arm64/kvm/inject_fault.c | 12 -
> arch/arm64/kvm/nested.c | 63 +++++
> arch/arm64/kvm/reset.c | 8 +
> arch/arm64/kvm/sys_regs.c | 359 +++++++++++++++++++++++-
> arch/arm64/kvm/sys_regs.h | 8 +
> arch/arm64/kvm/trace.h | 43 ++-
> virt/kvm/arm/arm.c | 20 ++
> 31 files changed, 1363 insertions(+), 118 deletions(-)
> create mode 100644 arch/arm64/kvm/context.c
> create mode 100644 arch/arm64/kvm/emulate-nested.c
> create mode 100644 arch/arm64/kvm/nested.c
>
> --
> 1.9.1
>
On Tue, Jul 18, 2017 at 12:59 PM, Jintack Lim <[email protected]> wrote:
> Forward ELR_EL1, SPSR_EL1 and VBAR_EL1 traps to the virtual EL2 if the
> virtual HCR_EL2.NV1 bit is set.
>
> This is for recursive nested virtualization.
>
> Signed-off-by: Jintack Lim <[email protected]>
This should be linaro e-mail address. Will fix it.
> ---
> arch/arm64/include/asm/kvm_arm.h | 1 +
> arch/arm64/kvm/sys_regs.c | 18 ++++++++++++++++++
> 2 files changed, 19 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index aeaac4e..a1274b7 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -23,6 +23,7 @@
> #include <asm/types.h>
>
> /* Hyp Configuration Register (HCR) bits */
> +#define HCR_NV1 (UL(1) << 43)
> #define HCR_NV (UL(1) << 42)
> #define HCR_E2H (UL(1) << 34)
> #define HCR_ID (UL(1) << 33)
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 3e4ec5e..6f67666 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1031,6 +1031,15 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
> return true;
> }
>
> +/* This function is to support the recursive nested virtualization */
> +static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
> +{
> + if (!vcpu_mode_el2(vcpu) && (vcpu_sys_reg(vcpu, HCR_EL2) & HCR_NV1))
> + return true;
> +
> + return false;
> +}
> +
> static bool access_elr(struct kvm_vcpu *vcpu,
> struct sys_reg_params *p,
> const struct sys_reg_desc *r)
> @@ -1038,6 +1047,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
> if (el12_reg(p) && forward_nv_traps(vcpu))
> return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>
> + if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
> access_rw(p, &vcpu->arch.ctxt.gp_regs.elr_el1);
> return true;
> }
> @@ -1049,6 +1061,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
> if (el12_reg(p) && forward_nv_traps(vcpu))
> return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>
> + if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
> access_rw(p, &vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1]);
> return true;
> }
> @@ -1060,6 +1075,9 @@ static bool access_vbar(struct kvm_vcpu *vcpu,
> if (el12_reg(p) && forward_nv_traps(vcpu))
> return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>
> + if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
> access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
> return true;
> }
> --
> 1.9.1
>
Hi Jintack,
On Tue, Jul 18, 2017 at 10:23:05PM -0400, Jintack Lim wrote:
> On Tue, Jul 18, 2017 at 12:58 PM, Jintack Lim <[email protected]> wrote:
> > [...]
> > I'll share my experiment setup shortly.
>
> I summarized my experiment setup here.
>
> https://github.com/columbia/nesting-pub/wiki/Nested-virtualization-on-ARM-setup
>
Thanks for sharing this.
> > [...]
>
> I also wonder which is better when the hardware and/or KVM do not
> support nested virtualization but userspace still requests the nested
> virtualization option: giving an error, or silently launching a
> regular VM.
>
I think KVM should complain to userspace if userspace tries to set a
feature it does not support, and I think userspace should give as
meaningful an error message as possible to the user when that happens.
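Something along these lines at KVM_ARM_VCPU_INIT time, perhaps (an
untested sketch; the capability and feature bit are the ones introduced
in this series, and the kvm-arm.nested parameter check would live
behind a helper):

        if (test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features) &&
            !cpus_have_const_cap(ARM64_HAS_NESTED_VIRT))
                return -EINVAL; /* userspace can then print a useful message */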
Thanks,
-Christoffer
On Wed, Jul 19, 2017 at 4:49 AM, Christoffer Dall <[email protected]> wrote:
> Hi Jintack,
>
> On Tue, Jul 18, 2017 at 10:23:05PM -0400, Jintack Lim wrote:
>> [...]
>> I also wonder which is better when the hardware and/or KVM do not
>> support nested virtualization but userspace still requests the nested
>> virtualization option: giving an error, or silently launching a
>> regular VM.
>>
>
> I think KVM should complain to userspace if userspace tries to set a
> feature it does not support, and I think userspace should give as
> meaningful an error message as possible to the user when that happens.
>
Ok, thanks. I'll work this out.
> Thanks,
> -Christoffer
Jintack Lim <[email protected]> writes:
...
>>
>> I'll share my experiment setup shortly.
>
> I summarized my experiment setup here.
>
> https://github.com/columbia/nesting-pub/wiki/Nested-virtualization-on-ARM-setup
Thanks, Jintack! I was able to test L2 boot up with these instructions.
Next, I will try to run some simple tests. Any suggestions on reducing the L2 bootup
time in my test setup? I think I will try to make the L2 kernel print
fewer messages, and maybe just get rid of some of the userspace services.
I also applied the patch to reduce the timer frequency, btw.
Bandan
On Fri, Jul 28, 2017 at 4:13 PM, Bandan Das <[email protected]> wrote:
> Jintack Lim <[email protected]> writes:
> ...
>
> Thanks, Jintack! I was able to test L2 boot up with these instructions.
Thanks for the confirmation!
>
> Next, I will try to run some simple tests. Any suggestions on reducing the L2 bootup
> time in my test setup? I think I will try to make the L2 kernel print
> fewer messages, and maybe just get rid of some of the userspace services.
> I also applied the patch to reduce the timer frequency, btw.
I think you can try these kernel parameters: "loglevel=1", with which
the kernel prints (almost) nothing during the boot process (though the
init process will still print something), or "console=none", with
which you don't see anything but the login message. I didn't use them
because I wanted to see the L2 boot messages as soon as possible :)
Thanks,
Jintack
On Tue, Jul 18, 2017 at 11:58:36AM -0500, Jintack Lim wrote:
> From: Christoffer Dall <[email protected]>
>
> Add functions setting up and restoring the guest's context on each entry
> and exit. These functions will come in handy when we want to use
> different context for normal EL0/EL1 and virtual EL2 execution.
>
> No functional change yet.
>
> Signed-off-by: Christoffer Dall <[email protected]>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm/include/asm/kvm_emulate.h | 4 ++
> arch/arm64/include/asm/kvm_emulate.h | 4 ++
> arch/arm64/kvm/Makefile | 2 +-
> arch/arm64/kvm/context.c | 54 ++++++++++++++++
> arch/arm64/kvm/hyp/sysreg-sr.c | 117 +++++++++++++++++++----------------
> virt/kvm/arm/arm.c | 14 +++++
> 6 files changed, 140 insertions(+), 55 deletions(-)
> create mode 100644 arch/arm64/kvm/context.c
>
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index 399cd75e..0a03b7d 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -47,6 +47,10 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
> void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
> void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
>
> +static inline void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu) { };
> +static inline void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu) { };
> +static inline void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt) { };
> +
> static inline bool kvm_condition_valid(const struct kvm_vcpu *vcpu)
> {
> return kvm_condition_valid32(vcpu);
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 5d6f3d0..14c4ce9 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -42,6 +42,10 @@
> void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
> void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
>
> +void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
> +void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
> +void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
> +
> static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
> {
> vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index f513047..5762337 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -15,7 +15,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/e
> kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arm.o $(KVM)/arm/mmu.o $(KVM)/arm/mmio.o
> kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
>
> -kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o
> +kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o context.o
> kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
> kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o
> kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o
> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
> new file mode 100644
> index 0000000..bc43e66
> --- /dev/null
> +++ b/arch/arm64/kvm/context.c
> @@ -0,0 +1,54 @@
> +/*
> + * Copyright (C) 2016 - Linaro Ltd.
> + * Author: Christoffer Dall <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_emulate.h>
> +
> +/**
> + * kvm_arm_setup_shadow_state -- prepare shadow state based on emulated mode
> + * @vcpu: The VCPU pointer
> + */
> +void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +
> + ctxt->hw_pstate = *vcpu_cpsr(vcpu);
> + ctxt->hw_sys_regs = ctxt->sys_regs;
> + ctxt->hw_sp_el1 = ctxt->gp_regs.sp_el1;
> + ctxt->hw_elr_el1 = ctxt->gp_regs.elr_el1;
> + ctxt->hw_spsr_el1 = ctxt->gp_regs.spsr[KVM_SPSR_EL1];
> +}
> +
> +/**
> + * kvm_arm_restore_shadow_state -- write back shadow state from guest
> + * @vcpu: The VCPU pointer
> + */
> +void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> +
> + *vcpu_cpsr(vcpu) = ctxt->hw_pstate;
> + ctxt->gp_regs.sp_el1 = ctxt->hw_sp_el1;
> + ctxt->gp_regs.elr_el1 = ctxt->hw_elr_el1;
> + ctxt->gp_regs.spsr[KVM_SPSR_EL1] = ctxt->hw_spsr_el1;
> +}
> +
> +void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt)
> +{
> + /* This is to set hw_sys_regs of host_cpu_context */
> + cpu_ctxt->hw_sys_regs = cpu_ctxt->sys_regs;
> +}
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index 9341376..b7a67b1 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -19,6 +19,7 @@
> #include <linux/kvm_host.h>
>
> #include <asm/kvm_asm.h>
> +#include <asm/kvm_emulate.h>
> #include <asm/kvm_hyp.h>
>
> /* Yes, this does nothing, on purpose */
> @@ -33,39 +34,43 @@ static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { }
>
> static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt)
> {
> - ctxt->sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1);
> - ctxt->sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0);
> - ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
> - ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1);
> - ctxt->sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1);
> + u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
> +
> + sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1);
> + sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0);
> + sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
> + sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1);
> + sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1);
> ctxt->gp_regs.regs.sp = read_sysreg(sp_el0);
> ctxt->gp_regs.regs.pc = read_sysreg_el2(elr);
> - ctxt->gp_regs.regs.pstate = read_sysreg_el2(spsr);
> + ctxt->hw_pstate = read_sysreg_el2(spsr);
> }
>
> static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
> {
> - ctxt->sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2);
> - ctxt->sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1);
> - ctxt->sys_regs[SCTLR_EL1] = read_sysreg_el1(sctlr);
> - ctxt->sys_regs[CPACR_EL1] = read_sysreg_el1(cpacr);
> - ctxt->sys_regs[TTBR0_EL1] = read_sysreg_el1(ttbr0);
> - ctxt->sys_regs[TTBR1_EL1] = read_sysreg_el1(ttbr1);
> - ctxt->sys_regs[TCR_EL1] = read_sysreg_el1(tcr);
> - ctxt->sys_regs[ESR_EL1] = read_sysreg_el1(esr);
> - ctxt->sys_regs[AFSR0_EL1] = read_sysreg_el1(afsr0);
> - ctxt->sys_regs[AFSR1_EL1] = read_sysreg_el1(afsr1);
> - ctxt->sys_regs[FAR_EL1] = read_sysreg_el1(far);
> - ctxt->sys_regs[MAIR_EL1] = read_sysreg_el1(mair);
> - ctxt->sys_regs[VBAR_EL1] = read_sysreg_el1(vbar);
> - ctxt->sys_regs[CONTEXTIDR_EL1] = read_sysreg_el1(contextidr);
> - ctxt->sys_regs[AMAIR_EL1] = read_sysreg_el1(amair);
> - ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg_el1(cntkctl);
> - ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
> -
> - ctxt->gp_regs.sp_el1 = read_sysreg(sp_el1);
> - ctxt->gp_regs.elr_el1 = read_sysreg_el1(elr);
> - ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr);
> + u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
> +
> + sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2);
> + sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1);
> + sys_regs[SCTLR_EL1] = read_sysreg_el1(sctlr);
> + sys_regs[CPACR_EL1] = read_sysreg_el1(cpacr);
> + sys_regs[TTBR0_EL1] = read_sysreg_el1(ttbr0);
> + sys_regs[TTBR1_EL1] = read_sysreg_el1(ttbr1);
> + sys_regs[TCR_EL1] = read_sysreg_el1(tcr);
> + sys_regs[ESR_EL1] = read_sysreg_el1(esr);
> + sys_regs[AFSR0_EL1] = read_sysreg_el1(afsr0);
> + sys_regs[AFSR1_EL1] = read_sysreg_el1(afsr1);
> + sys_regs[FAR_EL1] = read_sysreg_el1(far);
> + sys_regs[MAIR_EL1] = read_sysreg_el1(mair);
> + sys_regs[VBAR_EL1] = read_sysreg_el1(vbar);
> + sys_regs[CONTEXTIDR_EL1] = read_sysreg_el1(contextidr);
> + sys_regs[AMAIR_EL1] = read_sysreg_el1(amair);
> + sys_regs[CNTKCTL_EL1] = read_sysreg_el1(cntkctl);
> + sys_regs[PAR_EL1] = read_sysreg(par_el1);
> +
> + ctxt->hw_sp_el1 = read_sysreg(sp_el1);
> + ctxt->hw_elr_el1 = read_sysreg_el1(elr);
> + ctxt->hw_spsr_el1 = read_sysreg_el1(spsr);
> }
>
> static hyp_alternate_select(__sysreg_call_save_host_state,
> @@ -86,39 +91,43 @@ void __hyp_text __sysreg_save_guest_state(struct kvm_cpu_context *ctxt)
>
> static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt)
> {
> - write_sysreg(ctxt->sys_regs[ACTLR_EL1], actlr_el1);
> - write_sysreg(ctxt->sys_regs[TPIDR_EL0], tpidr_el0);
> - write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0);
> - write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1);
> - write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1);
> + u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
> +
> + write_sysreg(sys_regs[ACTLR_EL1], actlr_el1);
> + write_sysreg(sys_regs[TPIDR_EL0], tpidr_el0);
> + write_sysreg(sys_regs[TPIDRRO_EL0], tpidrro_el0);
> + write_sysreg(sys_regs[TPIDR_EL1], tpidr_el1);
> + write_sysreg(sys_regs[MDSCR_EL1], mdscr_el1);
> write_sysreg(ctxt->gp_regs.regs.sp, sp_el0);
> write_sysreg_el2(ctxt->gp_regs.regs.pc, elr);
> - write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr);
> + write_sysreg_el2(ctxt->hw_pstate, spsr);
> }
>
> static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
> {
> - write_sysreg(ctxt->sys_regs[MPIDR_EL1], vmpidr_el2);
> - write_sysreg(ctxt->sys_regs[CSSELR_EL1], csselr_el1);
> - write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1], sctlr);
> - write_sysreg_el1(ctxt->sys_regs[CPACR_EL1], cpacr);
> - write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1], ttbr0);
> - write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1], ttbr1);
> - write_sysreg_el1(ctxt->sys_regs[TCR_EL1], tcr);
> - write_sysreg_el1(ctxt->sys_regs[ESR_EL1], esr);
> - write_sysreg_el1(ctxt->sys_regs[AFSR0_EL1], afsr0);
> - write_sysreg_el1(ctxt->sys_regs[AFSR1_EL1], afsr1);
> - write_sysreg_el1(ctxt->sys_regs[FAR_EL1], far);
> - write_sysreg_el1(ctxt->sys_regs[MAIR_EL1], mair);
> - write_sysreg_el1(ctxt->sys_regs[VBAR_EL1], vbar);
> - write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL1],contextidr);
> - write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1], amair);
> - write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1], cntkctl);
> - write_sysreg(ctxt->sys_regs[PAR_EL1], par_el1);
> -
> - write_sysreg(ctxt->gp_regs.sp_el1, sp_el1);
> - write_sysreg_el1(ctxt->gp_regs.elr_el1, elr);
> - write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr);
> + u64 *sys_regs = kern_hyp_va(ctxt->hw_sys_regs);
> +
> + write_sysreg(sys_regs[MPIDR_EL1], vmpidr_el2);
> + write_sysreg(sys_regs[CSSELR_EL1], csselr_el1);
> + write_sysreg_el1(sys_regs[SCTLR_EL1], sctlr);
> + write_sysreg_el1(sys_regs[CPACR_EL1], cpacr);
> + write_sysreg_el1(sys_regs[TTBR0_EL1], ttbr0);
> + write_sysreg_el1(sys_regs[TTBR1_EL1], ttbr1);
> + write_sysreg_el1(sys_regs[TCR_EL1], tcr);
> + write_sysreg_el1(sys_regs[ESR_EL1], esr);
> + write_sysreg_el1(sys_regs[AFSR0_EL1], afsr0);
> + write_sysreg_el1(sys_regs[AFSR1_EL1], afsr1);
> + write_sysreg_el1(sys_regs[FAR_EL1], far);
> + write_sysreg_el1(sys_regs[MAIR_EL1], mair);
> + write_sysreg_el1(sys_regs[VBAR_EL1], vbar);
> + write_sysreg_el1(sys_regs[CONTEXTIDR_EL1], contextidr);
> + write_sysreg_el1(sys_regs[AMAIR_EL1], amair);
> + write_sysreg_el1(sys_regs[CNTKCTL_EL1], cntkctl);
> + write_sysreg(sys_regs[PAR_EL1], par_el1);
> +
> + write_sysreg(ctxt->hw_sp_el1, sp_el1);
> + write_sysreg_el1(ctxt->hw_elr_el1, elr);
> + write_sysreg_el1(ctxt->hw_spsr_el1, spsr);
> }
>
> static hyp_alternate_select(__sysreg_call_restore_host_state,
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 36aae3a..0ff2997 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -689,6 +689,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> }
>
> kvm_arm_setup_debug(vcpu);
> + kvm_arm_setup_shadow_state(vcpu);
>
> /**************************************************************
> * Enter the guest
> @@ -704,6 +705,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> * Back from guest
> *************************************************************/
>
> + kvm_arm_restore_shadow_state(vcpu);
If we want to optimize this a bit, we could consider making these calls
static inlines, which either do nothing (nesting not enabled via the
cmdline) or call the shadow state functions, selected using a static
key.
Of course, for that to work, the hw_ register values should be changed to
pointers (in the hyp VA space) so that the save/restore code
reads/writes directly to the correct backing store and no extra work has
to be done in the save/restore path when not using nesting.
That would actually also optimize the common case even when nesting is
enabled via the cmdline, because we would only have to change the hw
pointers when emulating an exception to vEL2 and when trapping ERET from
virtual EL2; the rest of the time we wouldn't need to do any extra
work, at least for the sysregs and special regs.
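Something like this, perhaps (a completely untested sketch; the
kvm_nesting_used key and the double-underscore names are made up):

	static DEFINE_STATIC_KEY_FALSE(kvm_nesting_used);

	static inline void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
	{
		/* Patched to a NOP unless the key was enabled at init */
		if (static_branch_unlikely(&kvm_nesting_used))
			__kvm_arm_setup_shadow_state(vcpu);
	}

	static inline void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
	{
		if (static_branch_unlikely(&kvm_nesting_used))
			__kvm_arm_restore_shadow_state(vcpu);
	}

with static_branch_enable(&kvm_nesting_used) called once when the
cmdline parameter is parsed and the hardware capability is confirmed.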
Thanks,
-Christoffer
> kvm_arm_clear_debug(vcpu);
>
> /*
> @@ -1334,6 +1336,16 @@ static void teardown_hyp_mode(void)
>
> static int init_vhe_mode(void)
> {
> + int cpu;
> +
> + for_each_possible_cpu(cpu) {
> + kvm_cpu_context_t *cpu_ctxt;
> +
> + cpu_ctxt = per_cpu_ptr(kvm_host_cpu_state, cpu);
> +
> + kvm_arm_init_cpu_context(cpu_ctxt);
> + }
> +
> kvm_info("VHE mode initialized successfully\n");
> return 0;
> }
> @@ -1416,6 +1428,8 @@ static int init_hyp_mode(void)
> kvm_err("Cannot map host CPU state: %d\n", err);
> goto out_err;
> }
> +
> + kvm_arm_init_cpu_context(cpu_ctxt);
> }
>
> kvm_info("Hyp mode initialized successfully\n");
> --
> 1.9.1
>
On Tue, Jul 18, 2017 at 11:58:30AM -0500, Jintack Lim wrote:
> Nested virtualization is in use only if all three conditions are met:
> - The architecture supports nested virtualization.
> - The kernel parameter is set.
> - The userspace uses the nested virtualization feature.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm/include/asm/kvm_host.h | 11 +++++++++++
> arch/arm64/include/asm/kvm_host.h | 2 ++
> arch/arm64/kvm/nested.c | 17 +++++++++++++++++
> virt/kvm/arm/arm.c | 4 ++++
> 4 files changed, 34 insertions(+)
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 00b0f97..7e9e6c8 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -303,4 +303,15 @@ static inline int __init kvmarm_nested_cfg(char *buf)
> {
> return 0;
> }
> +
> +static inline int init_nested_virt(void)
> +{
> + return 0;
> +}
> +
> +static inline bool nested_virt_in_use(struct kvm_vcpu *vcpu)
> +{
> + return false;
> +}
> +
> #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 6df0c7c..86d4b6c 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -387,5 +387,7 @@ static inline void __cpu_init_stage2(void)
> }
>
> int __init kvmarm_nested_cfg(char *buf);
> +int init_nested_virt(void);
> +bool nested_virt_in_use(struct kvm_vcpu *vcpu);
>
> #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 79f38da..9a05c76 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -24,3 +24,20 @@ int __init kvmarm_nested_cfg(char *buf)
> {
> return strtobool(buf, &nested_param);
> }
> +
> +int init_nested_virt(void)
> +{
> + if (nested_param && cpus_have_const_cap(ARM64_HAS_NESTED_VIRT))
> + kvm_info("Nested virtualization is supported\n");
> +
> + return 0;
> +}
> +
> +bool nested_virt_in_use(struct kvm_vcpu *vcpu)
> +{
> + if (nested_param && cpus_have_const_cap(ARM64_HAS_NESTED_VIRT)
> + && test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features))
> + return true;
> +
> + return false;
> +}
after reading through a lot of your patches, I feel like vm_has_el2()
would be a more elegant name, but it's not a strict requirement to
change it.
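For example (just to illustrate the name at a hypothetical call site;
the handler name here is made up):

	if (vm_has_el2(vcpu))
		ret = handle_trap_from_virtual_el2(vcpu);

reads a little more naturally than nested_virt_in_use() does.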
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 1c1c772..36aae3a 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -1478,6 +1478,10 @@ int kvm_arch_init(void *opaque)
> if (err)
> goto out_err;
>
> + err = init_nested_virt();
> + if (err)
> + return err;
> +
> err = init_subsystems();
> if (err)
> goto out_hyp;
> --
> 1.9.1
>
Thanks,
-Christoffer
On Tue, Jul 18, 2017 at 11:58:28AM -0500, Jintack Lim wrote:
> Add a new kernel parameter(kvm-arm.nested) to enable KVM/ARM nested
> virtualization support. This kernel parameter on arm architecture is
> ignored since nested virtualization is not supported on arm.
>
> Note that this kernel parameter will not have any impact until nested
> virtualization support is completed. Just add this parameter first to
> use it when implementing nested virtualization support.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 4 ++++
> arch/arm/include/asm/kvm_host.h | 4 ++++
> arch/arm64/include/asm/kvm_host.h | 2 ++
> arch/arm64/kvm/Makefile | 2 ++
> arch/arm64/kvm/nested.c | 26 +++++++++++++++++++++++++
> virt/kvm/arm/arm.c | 2 ++
> 6 files changed, 40 insertions(+)
> create mode 100644 arch/arm64/kvm/nested.c
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index aa8341e..8fb152d 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1841,6 +1841,10 @@
> [KVM,ARM] Trap guest accesses to GICv3 common
> system registers
>
> + kvm-arm.nested=
> + [KVM,ARM] Allow nested virtualization in KVM/ARM.
> + Default is 0 (disabled)
We may want to say "on systems that support it" or something like that
here as well.
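e.g. something like:

	kvm-arm.nested=
			[KVM,ARM] Allow nested virtualization in KVM/ARM on
			systems that support it.
			Default is 0 (disabled)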
> +
> kvm-intel.ept= [KVM,Intel] Disable extended page tables
> (virtualized MMU) support on capable Intel chips.
> Default is 1 (enabled)
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 127e2dd..00b0f97 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -299,4 +299,8 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
> int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
> struct kvm_device_attr *attr);
>
> +static inline int __init kvmarm_nested_cfg(char *buf)
> +{
> + return 0;
> +}
> #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 0c4fd1f..dcc4df8 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -386,4 +386,6 @@ static inline void __cpu_init_stage2(void)
> "PARange is %d bits, unsupported configuration!", parange);
> }
>
> +int __init kvmarm_nested_cfg(char *buf);
> +
> #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 5d98100..f513047 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -35,3 +35,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-debug.o
> kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
> kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
> kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
> +
> +kvm-$(CONFIG_KVM_ARM_HOST) += nested.o
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> new file mode 100644
> index 0000000..79f38da
> --- /dev/null
> +++ b/arch/arm64/kvm/nested.c
> @@ -0,0 +1,26 @@
> +/*
> + * Copyright (C) 2017 - Columbia University and Linaro Ltd.
> + * Author: Jintack Lim <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +
> +static bool nested_param;
> +
> +int __init kvmarm_nested_cfg(char *buf)
> +{
> + return strtobool(buf, &nested_param);
> +}
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index a39a1e1..1c1c772 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -67,6 +67,8 @@
>
> static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
>
> +early_param("kvm-arm.nested", kvmarm_nested_cfg);
> +
> static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
> {
> BUG_ON(preemptible());
> --
> 1.9.1
>
On Tue, Jul 18, 2017 at 11:58:34AM -0500, Jintack Lim wrote:
> To support the virtual EL2 execution, we need to maintain the EL2
> special registers such as SPSR_EL2, ELR_EL2 and SP_EL2 in vcpu context.
>
> Note that SP_EL2 is not accessible in EL2, so we don't need a trap
> handler for this register.
Actually, it's not accessible *in the MRS/MSR instruction* but it is of
course accessible as the current stack pointer (which is why you need
the state, but not the trap handler).
Otherwise, the patch looks good.
Thanks,
-Christoffer
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
> arch/arm64/include/asm/sysreg.h | 4 ++++
> arch/arm64/kvm/sys_regs.c | 38 +++++++++++++++++++++++++++++++++-----
> arch/arm64/kvm/sys_regs.h | 8 ++++++++
> 4 files changed, 57 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 1dc4ed6..57dccde 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -171,6 +171,15 @@ enum vcpu_sysreg {
> NR_SYS_REGS /* Nothing after this line! */
> };
>
> +enum el2_special_regs {
> + __INVALID_EL2_SPECIAL_REG__,
> + SPSR_EL2, /* Saved Program Status Register (EL2) */
> + ELR_EL2, /* Exception Link Register (EL2) */
> + SP_EL2, /* Stack Pointer (EL2) */
> +
> + NR_EL2_SPECIAL_REGS
> +};
> +
> /* 32bit mapping */
> #define c0_MPIDR (MPIDR_EL1 * 2) /* MultiProcessor ID Register */
> #define c0_CSSELR (CSSELR_EL1 * 2)/* Cache Size Selection Register */
> @@ -218,6 +227,8 @@ struct kvm_cpu_context {
> u64 sys_regs[NR_SYS_REGS];
> u32 copro[NR_COPRO_REGS];
> };
> +
> + u64 el2_special_regs[NR_EL2_SPECIAL_REGS];
> };
>
> typedef struct kvm_cpu_context kvm_cpu_context_t;
> @@ -307,6 +318,7 @@ struct kvm_vcpu_arch {
>
> #define vcpu_gp_regs(v) (&(v)->arch.ctxt.gp_regs)
> #define vcpu_sys_reg(v,r) ((v)->arch.ctxt.sys_regs[(r)])
> +#define vcpu_el2_sreg(v,r) ((v)->arch.ctxt.el2_special_regs[(r)])
> /*
> * CP14 and CP15 live in the same array, as they are backed by the
> * same system registers.
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 9277c4a..98c32ef 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -268,6 +268,8 @@
>
> #define SYS_DACR32_EL2 sys_reg(3, 4, 3, 0, 0)
>
> +#define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0)
> +#define SYS_ELR_EL2 sys_reg(3, 4, 4, 0, 1)
> #define SYS_SP_EL1 sys_reg(3, 4, 4, 1, 0)
>
> #define SYS_IFSR32_EL2 sys_reg(3, 4, 5, 0, 1)
> @@ -332,6 +334,8 @@
> #define SYS_CNTVOFF_EL2 sys_reg(3, 4, 14, 0, 3)
> #define SYS_CNTHCTL_EL2 sys_reg(3, 4, 14, 1, 0)
>
> +#define SYS_SP_EL2 sys_reg(3, 6, 4, 1, 0)
> +
> /* Common SCTLR_ELx flags. */
> #define SCTLR_ELx_EE (1 << 25)
> #define SCTLR_ELx_I (1 << 12)
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 1568f8b..2b3ed70 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -900,15 +900,33 @@ static inline void access_rw(struct sys_reg_params *p, u64 *sysreg)
> *sysreg = p->regval;
> }
>
> +static u64 *get_special_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
> +{
> + u64 reg = sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +
> + switch (reg) {
> + case SYS_SP_EL1:
> + return &vcpu->arch.ctxt.gp_regs.sp_el1;
> + case SYS_ELR_EL2:
> + return &vcpu_el2_sreg(vcpu, ELR_EL2);
> + case SYS_SPSR_EL2:
> + return &vcpu_el2_sreg(vcpu, SPSR_EL2);
> + default:
> + return NULL;
> + };
> +}
> +
> static bool trap_el2_regs(struct kvm_vcpu *vcpu,
> struct sys_reg_params *p,
> const struct sys_reg_desc *r)
> {
> - /* SP_EL1 is NOT maintained in sys_regs array */
> - if (sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2) == SYS_SP_EL1)
> - access_rw(p, &vcpu->arch.ctxt.gp_regs.sp_el1);
> - else
> - access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
> + u64 *sys_reg;
> +
> + sys_reg = get_special_reg(vcpu, p);
> + if (!sys_reg)
> + sys_reg = &vcpu_sys_reg(vcpu, r->reg);
> +
> + access_rw(p, sys_reg);
>
> return true;
> }
> @@ -1116,6 +1134,8 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
>
> { SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
>
> + { SYS_DESC(SYS_SPSR_EL2), trap_el2_regs, reset_special, SPSR_EL2, 0 },
> + { SYS_DESC(SYS_ELR_EL2), trap_el2_regs, reset_special, ELR_EL2, 0 },
> { SYS_DESC(SYS_SP_EL1), trap_el2_regs },
>
> { SYS_DESC(SYS_IFSR32_EL2), NULL, reset_unknown, IFSR32_EL2 },
> @@ -1138,6 +1158,8 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
>
> { SYS_DESC(SYS_CNTVOFF_EL2), trap_el2_regs, reset_val, CNTVOFF_EL2, 0 },
> { SYS_DESC(SYS_CNTHCTL_EL2), trap_el2_regs, reset_val, CNTHCTL_EL2, 0 },
> +
> + { SYS_DESC(SYS_SP_EL2), NULL, reset_special, SP_EL2, 0},
> };
>
> static bool trap_dbgidr(struct kvm_vcpu *vcpu,
> @@ -2271,6 +2293,8 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
>
> /* Catch someone adding a register without putting in reset entry. */
> memset(&vcpu->arch.ctxt.sys_regs, 0x42, sizeof(vcpu->arch.ctxt.sys_regs));
> + memset(&vcpu->arch.ctxt.el2_special_regs, 0x42,
> + sizeof(vcpu->arch.ctxt.el2_special_regs));
>
> /* Generic chip reset first (so target could override). */
> reset_sys_reg_descs(vcpu, sys_reg_descs, ARRAY_SIZE(sys_reg_descs));
> @@ -2281,4 +2305,8 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
> for (num = 1; num < NR_SYS_REGS; num++)
> if (vcpu_sys_reg(vcpu, num) == 0x4242424242424242)
> panic("Didn't reset vcpu_sys_reg(%zi)", num);
> +
> + for (num = 1; num < NR_EL2_SPECIAL_REGS; num++)
> + if (vcpu_el2_sreg(vcpu, num) == 0x4242424242424242)
> + panic("Didn't reset vcpu_el2_sreg(%zi)", num);
> }
> diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
> index 060f534..827717b 100644
> --- a/arch/arm64/kvm/sys_regs.h
> +++ b/arch/arm64/kvm/sys_regs.h
> @@ -99,6 +99,14 @@ static inline void reset_val(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r
> vcpu_sys_reg(vcpu, r->reg) = r->val;
> }
>
> +static inline void reset_special(struct kvm_vcpu *vcpu,
> + const struct sys_reg_desc *r)
> +{
> + BUG_ON(!r->reg);
> + BUG_ON(r->reg >= NR_EL2_SPECIAL_REGS);
> + vcpu_el2_sreg(vcpu, r->reg) = r->val;
> +}
> +
> static inline int cmp_sys_reg(const struct sys_reg_desc *i1,
> const struct sys_reg_desc *i2)
> {
> --
> 1.9.1
>
On Tue, Jul 18, 2017 at 11:58:40AM -0500, Jintack Lim wrote:
> When running in virtual EL2 we use the shadow EL1 systerm register array
> for the save/restore process, so that hardware and especially the memory
> subsystem behaves as code written for EL2 expects while really running
> in EL1.
>
> This works great for EL1 system register accesses that we trap, because
> these accesses will be written into the virtual state for the EL1 system
> registers used when eventually switching the VCPU mode to EL1.
>
> However, there was a collection of EL1 system registers which we do not
> trap, and as a consequence all save/restore operations of these
> registers were happening locally in the shadow array, with no benefit to
> software actually running in virtual EL1 at all.
>
> To fix this, simply synchronize the shadow and real EL1 state for these
> registers on entry/exit to/from virtual EL2 state.
>
> Signed-off-by: Christoffer Dall <[email protected]>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/kvm/context.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
> index e965049..e1bc753 100644
> --- a/arch/arm64/kvm/context.c
> +++ b/arch/arm64/kvm/context.c
> @@ -86,6 +86,58 @@ static void flush_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
> s_sys_regs[CPACR_EL1] = cptr_to_cpacr(vcpu_sys_reg(vcpu, CPTR_EL2));
> }
>
> +
> +/*
> + * List of EL0 and EL1 registers which we allow the virtual EL2 mode to access
> + * directly without trapping. This is possible because the impact of
> + * accessing those registers are the same regardless of the exception
> + * levels that are allowed.
I don't understand this last sentence...
> + */
> +static const int el1_non_trap_regs[] = {
> + CNTKCTL_EL1,
> + CSSELR_EL1,
> + PAR_EL1,
> + TPIDR_EL0,
> + TPIDR_EL1,
> + TPIDRRO_EL0
> +};
> +
> +/**
> + * copy_shadow_non_trap_el1_state
> + * @vcpu: The VCPU pointer
> + * @setup: True, if on the way to the guest (called from setup)
should setup be called flush?
then we could do:
	if (flush) {
		...
	} else { /* sync */
		...
	}
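i.e., roughly (untested, just the quoted function with the rename
applied):

	static void copy_shadow_non_trap_el1_state(struct kvm_vcpu *vcpu,
						   bool flush)
	{
		u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
		int i;

		for (i = 0; i < ARRAY_SIZE(el1_non_trap_regs); i++) {
			const int sr = el1_non_trap_regs[i];

			if (flush)
				s_sys_regs[sr] = vcpu_sys_reg(vcpu, sr);
			else	/* sync */
				vcpu_sys_reg(vcpu, sr) = s_sys_regs[sr];
		}
	}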
> + * False, if returning form the guet (calld from restore)
s/returning form the guet (calld/returning from the guest (called/
> + *
> + * Some EL1 registers are accessed directly by the virtual EL2 mode because
> + * they in no way affect execution state in virtual EL2. However, we must
> + * still ensure that virtual EL2 observes the same state of the EL1 registers
> + * as the normal VM's EL1 mode, so copy this state as needed on setup/restore.
> + */
Perhaps this could be written more clearly as:
/*
* Synchronize the state of EL1 registers directly accessible by virtual
* EL2 between the shadow sys_regs array and the VCPU's EL1 state
* before/after the world switch code copies the shadow state to/from
* hardware registers.
*/
> +static void copy_shadow_non_trap_el1_state(struct kvm_vcpu *vcpu, bool setup)
> +{
> + u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(el1_non_trap_regs); i++) {
> + const int sr = el1_non_trap_regs[i];
> +
> + if (setup)
> + s_sys_regs[sr] = vcpu_sys_reg(vcpu, sr);
> + else
> + vcpu_sys_reg(vcpu, sr) = s_sys_regs[sr];
> + }
> +}
> +
> +static void sync_shadow_non_trap_el1_state(struct kvm_vcpu *vcpu)
> +{
> + copy_shadow_non_trap_el1_state(vcpu, false);
> +}
> +
> +static void flush_shadow_non_trap_el1_state(struct kvm_vcpu *vcpu)
> +{
> + copy_shadow_non_trap_el1_state(vcpu, true);
> +}
> +
> static void flush_shadow_special_regs(struct kvm_vcpu *vcpu)
> {
> struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
> @@ -162,6 +214,7 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
> if (unlikely(vcpu_mode_el2(vcpu))) {
> flush_shadow_special_regs(vcpu);
> flush_shadow_el1_sysregs(vcpu);
> + flush_shadow_non_trap_el1_state(vcpu);
> ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
> } else {
> flush_special_regs(vcpu);
> @@ -176,9 +229,10 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
> */
> void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
> {
> - if (unlikely(vcpu_mode_el2(vcpu)))
> + if (unlikely(vcpu_mode_el2(vcpu))) {
> sync_shadow_special_regs(vcpu);
> - else
> + sync_shadow_non_trap_el1_state(vcpu);
> + } else
> sync_special_regs(vcpu);
> }
>
> --
> 1.9.1
>
On Tue, Jul 18, 2017 at 11:58:46AM -0500, Jintack Lim wrote:
> When HCR.NV bit is set, eret instructions trap to EL2 with EC code 0x1A.
> Emulate eret instructions by setting pc and pstate.
It may be worth noting in the commit message that this is all we have to
do, because the rest of the logic will then discover that the mode could
change from virtual EL2 to EL1 and will set up the hw registers etc. when
changing modes.
>
> Note that the current exception level is always the virtual EL2, since
> we set HCR_EL2.NV bit only when entering the virtual EL2. So, we take
> spsr and elr states from the virtual _EL2 registers.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/include/asm/esr.h | 1 +
> arch/arm64/kvm/handle_exit.c | 16 ++++++++++++++++
> arch/arm64/kvm/trace.h | 21 +++++++++++++++++++++
> 3 files changed, 38 insertions(+)
>
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index e7d8e28..210fde6 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -43,6 +43,7 @@
> #define ESR_ELx_EC_HVC64 (0x16)
> #define ESR_ELx_EC_SMC64 (0x17)
> #define ESR_ELx_EC_SYS64 (0x18)
> +#define ESR_ELx_EC_ERET (0x1A)
> /* Unallocated EC: 0x19 - 0x1E */
> #define ESR_ELx_EC_IMP_DEF (0x1f)
> #define ESR_ELx_EC_IABT_LOW (0x20)
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 17d8a16..9259881 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -147,6 +147,21 @@ static int kvm_handle_unknown_ec(struct kvm_vcpu *vcpu, struct kvm_run *run)
> return 1;
> }
>
> +static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +{
> + trace_kvm_nested_eret(vcpu, vcpu_el2_sreg(vcpu, ELR_EL2),
> + vcpu_el2_sreg(vcpu, SPSR_EL2));
> +
> + /*
> + * Note that the current exception level is always the virtual EL2,
> + * since we set HCR_EL2.NV bit only when entering the virtual EL2.
> + */
> + *vcpu_pc(vcpu) = vcpu_el2_sreg(vcpu, ELR_EL2);
> + *vcpu_cpsr(vcpu) = vcpu_el2_sreg(vcpu, SPSR_EL2);
> +
> + return 1;
> +}
> +
> static exit_handle_fn arm_exit_handlers[] = {
> [0 ... ESR_ELx_EC_MAX] = kvm_handle_unknown_ec,
> [ESR_ELx_EC_WFx] = kvm_handle_wfx,
> @@ -160,6 +175,7 @@ static int kvm_handle_unknown_ec(struct kvm_vcpu *vcpu, struct kvm_run *run)
> [ESR_ELx_EC_HVC64] = handle_hvc,
> [ESR_ELx_EC_SMC64] = handle_smc,
> [ESR_ELx_EC_SYS64] = kvm_handle_sys_reg,
> + [ESR_ELx_EC_ERET] = kvm_handle_eret,
> [ESR_ELx_EC_IABT_LOW] = kvm_handle_guest_abort,
> [ESR_ELx_EC_DABT_LOW] = kvm_handle_guest_abort,
> [ESR_ELx_EC_SOFTSTP_LOW]= kvm_handle_guest_debug,
> diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
> index 7c86cfb..5f40987 100644
> --- a/arch/arm64/kvm/trace.h
> +++ b/arch/arm64/kvm/trace.h
> @@ -187,6 +187,27 @@
> TP_printk("vcpu: %p, inject exception to vEL2: ESR_EL2 0x%lx, vector: 0x%016lx",
> __entry->vcpu, __entry->esr_el2, __entry->pc)
> );
> +
> +TRACE_EVENT(kvm_nested_eret,
> + TP_PROTO(struct kvm_vcpu *vcpu, unsigned long elr_el2,
> + unsigned long spsr_el2),
> + TP_ARGS(vcpu, elr_el2, spsr_el2),
> +
> + TP_STRUCT__entry(
> + __field(struct kvm_vcpu *, vcpu)
> + __field(unsigned long, elr_el2)
> + __field(unsigned long, spsr_el2)
> + ),
> +
> + TP_fast_assign(
> + __entry->vcpu = vcpu;
> + __entry->elr_el2 = elr_el2;
> + __entry->spsr_el2 = spsr_el2;
> + ),
> +
> + TP_printk("vcpu: %p, eret to elr_el2: 0x%016lx, with spsr_el2: 0x%08lx",
> + __entry->vcpu, __entry->elr_el2, __entry->spsr_el2)
> +);
> #endif /* _TRACE_ARM64_KVM_H */
>
> #undef TRACE_INCLUDE_PATH
> --
> 1.9.1
>
Otherwise this patch looks good.
Thanks,
-Christoffer
On Tue, Jul 18, 2017 at 11:58:47AM -0500, Jintack Lim wrote:
> When HCR.NV bit is set, execution of the EL2 translation regime address
> aranslation instructions and TLB maintenance instructions are trapped to
translation
> EL2. In addition, execution of the EL1 translation regime address
> aranslation instructions and TLB maintenance instructions that are only
translation
> accessible from EL2 and above are trapped to EL2. In these cases,
> ESR_EL2.EC will be set to 0x18.
>
> Change the existing handler to handle those system instructions as well
> as MRS/MSR instructions. Emulation of each system instructions will be
> done in separate patches.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/include/asm/kvm_coproc.h | 2 +-
> arch/arm64/kvm/handle_exit.c | 2 +-
> arch/arm64/kvm/sys_regs.c | 53 ++++++++++++++++++++++++++++++++-----
> arch/arm64/kvm/trace.h | 2 +-
> 4 files changed, 50 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_coproc.h b/arch/arm64/include/asm/kvm_coproc.h
> index 0b52377..1b3d21b 100644
> --- a/arch/arm64/include/asm/kvm_coproc.h
> +++ b/arch/arm64/include/asm/kvm_coproc.h
> @@ -43,7 +43,7 @@ void kvm_register_target_sys_reg_table(unsigned int target,
> int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
> int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
> int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
> -int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run);
> +int kvm_handle_sys(struct kvm_vcpu *vcpu, struct kvm_run *run);
>
> #define kvm_coproc_table_init kvm_sys_reg_table_init
> void kvm_sys_reg_table_init(void);
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 9259881..d19e253 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -174,7 +174,7 @@ static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
> [ESR_ELx_EC_SMC32] = handle_smc,
> [ESR_ELx_EC_HVC64] = handle_hvc,
> [ESR_ELx_EC_SMC64] = handle_smc,
> - [ESR_ELx_EC_SYS64] = kvm_handle_sys_reg,
> + [ESR_ELx_EC_SYS64] = kvm_handle_sys,
> [ESR_ELx_EC_ERET] = kvm_handle_eret,
> [ESR_ELx_EC_IABT_LOW] = kvm_handle_guest_abort,
> [ESR_ELx_EC_DABT_LOW] = kvm_handle_guest_abort,
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 7062645..dbf5022 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1808,6 +1808,40 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
> return 1;
> }
>
> +static int emulate_tlbi(struct kvm_vcpu *vcpu,
> + struct sys_reg_params *params)
> +{
> + /* TODO: support tlbi instruction emulation*/
> + kvm_inject_undefined(vcpu);
> + return 1;
> +}
> +
> +static int emulate_at(struct kvm_vcpu *vcpu,
> + struct sys_reg_params *params)
> +{
> + /* TODO: support address translation instruction emulation */
> + kvm_inject_undefined(vcpu);
> + return 1;
> +}
> +
> +static int emulate_sys_instr(struct kvm_vcpu *vcpu,
> + struct sys_reg_params *params)
> +{
> + int ret = 0;
> +
> + /* TLB maintenance instructions*/
> + if (params->CRn == 0b1000)
> + ret = emulate_tlbi(vcpu, params);
> + /* Address Translation instructions */
> + else if (params->CRn == 0b0111 && params->CRm == 0b1000)
> + ret = emulate_at(vcpu, params);
there are some style issues here. I think it would be nicer to do:
	if (x) {
		/* Foo */
		do_something();
	} else if (y) {
		/* Bar */
		do_something_else();
	}
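Applied to this function, that would be something like (untested):

	if (params->CRn == 0b1000) {
		/* TLB maintenance instructions */
		ret = emulate_tlbi(vcpu, params);
	} else if (params->CRn == 0b0111 && params->CRm == 0b1000) {
		/* Address Translation instructions */
		ret = emulate_at(vcpu, params);
	}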
can you remind me why we wouldn't see anything other than these
particular two classes of instructions here?
> +
> + if (ret)
> + kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> +
> + return ret;
> +}
> +
> static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
> const struct sys_reg_desc *table, size_t num)
> {
> @@ -1819,18 +1853,19 @@ static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
> }
>
> /**
> - * kvm_handle_sys_reg -- handles a mrs/msr trap on a guest sys_reg access
> + * kvm_handle_sys-- handles a system instruction or mrs/msr instruction trap
> + on a guest execution
> * @vcpu: The VCPU pointer
> * @run: The kvm_run struct
> */
> -int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +int kvm_handle_sys(struct kvm_vcpu *vcpu, struct kvm_run *run)
> {
> struct sys_reg_params params;
> unsigned long esr = kvm_vcpu_get_hsr(vcpu);
> int Rt = kvm_vcpu_sys_get_rt(vcpu);
> int ret;
>
> - trace_kvm_handle_sys_reg(esr);
> + trace_kvm_handle_sys(esr);
>
> params.is_aarch32 = false;
> params.is_32bit = false;
> @@ -1842,10 +1877,16 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run)
> params.regval = vcpu_get_reg(vcpu, Rt);
> params.is_write = !(esr & 1);
>
> - ret = emulate_sys_reg(vcpu, ¶ms);
> + if (params.Op0 == 1) {
> + /* System instructions */
> + ret = emulate_sys_instr(vcpu, ¶ms);
> + } else {
> + /* MRS/MSR instructions */
> + ret = emulate_sys_reg(vcpu, ¶ms);
> + if (!params.is_write)
> + vcpu_set_reg(vcpu, Rt, params.regval);
> + }
>
> - if (!params.is_write)
> - vcpu_set_reg(vcpu, Rt, params.regval);
> return ret;
> }
>
> diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
> index 5f40987..192708e 100644
> --- a/arch/arm64/kvm/trace.h
> +++ b/arch/arm64/kvm/trace.h
> @@ -134,7 +134,7 @@
> TP_printk("%s %s reg %d (0x%08llx)", __entry->fn, __entry->is_write?"write to":"read from", __entry->reg, __entry->write_value)
> );
>
> -TRACE_EVENT(kvm_handle_sys_reg,
> +TRACE_EVENT(kvm_handle_sys,
> TP_PROTO(unsigned long hsr),
> TP_ARGS(hsr),
>
> --
> 1.9.1
>
Thanks,
-Christoffer
On Tue, Jul 18, 2017 at 11:58:48AM -0500, Jintack Lim wrote:
> VMs used to execute hvc #0 for the psci call if EL3 is not implemented.
> However, when we come to provide the virtual EL2 mode to the VM, the
> host OS inside the VM calls kvm_call_hyp() which is also hvc #0. So,
> it's hard to differentiate between them from the host hypervisor's point
> of view.
This is a bit confusing. I think you should just refer to the fact that
the architecture requires HVC calls to be handled at EL2, and when
emulating EL2 inside the VM, HVC calls from the VM are handled by the VM
itself, and therefore we add support for SMC as the conduit for PSCI
calls.
>
> So, let the VM execute smc instruction for the psci call. On ARMv8.3,
> even if EL3 is not implemented, a smc instruction executed at non-secure
> EL1 is trapped to EL2 if HCR_EL2.TSC==1, rather than being treated as
> UNDEFINED. So, the host hypervisor can handle this psci call without any
> confusion.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/kvm/handle_exit.c | 24 ++++++++++++++++++++++--
> 1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index d19e253..6cf6b93 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -53,8 +53,28 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
>
> static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
> {
> - kvm_inject_undefined(vcpu);
> - return 1;
> + int ret;
> +
> + /* If imm is non-zero, it's not defined */
> + if (kvm_vcpu_hvc_get_imm(vcpu)) {
> + kvm_inject_undefined(vcpu);
> + return 1;
> + }
> +
> + /*
> + * If imm is zero, it's a psci call.
That's a necessary, but not sufficient, condition. So we should
say "it may be a PSCI call", or "we check if it's a PSCI call"...
> + * Note that on ARMv8.3, even if EL3 is not implemented, SMC executed
> + * at Non-secure EL1 is trapped to EL2 if HCR_EL2.TSC==1, rather than
> + * being treated as UNDEFINED.
> + */
> + ret = kvm_psci_call(vcpu);
> + if (ret < 0) {
> + kvm_inject_undefined(vcpu);
> + return 1;
> + }
> + kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> +
> + return ret;
> }
>
> /*
> --
> 1.9.1
>
Thanks,
-Christoffer
On Tue, Jul 18, 2017 at 11:58:50AM -0500, Jintack Lim wrote:
> Forward exceptions due to WFI or WFE instructions to the virtual EL2 if
> they are not coming from the virtual EL2 and virtual HCR_EL2.TWX is set.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 1 +
> arch/arm64/kvm/handle_exit.c | 13 ++++++++++++-
> arch/arm64/kvm/nested.c | 20 ++++++++++++++++++++
> 3 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 46880c3..53b0b33 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -442,5 +442,6 @@ static inline void __cpu_init_stage2(void)
> int __init kvmarm_nested_cfg(char *buf);
> int init_nested_virt(void);
> bool nested_virt_in_use(struct kvm_vcpu *vcpu);
> +int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe);
>
> #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 8b398b2..25ec824 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -107,7 +107,18 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
> */
> static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
> {
> - if (kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
> + bool is_wfe = !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE);
> +
> + if (nested_virt_in_use(vcpu)) {
> + int ret = handle_wfx_nested(vcpu, is_wfe);
> +
> + if (ret < 0 && ret != -EINVAL)
> + return ret;
> + else if (ret >= 0)
> + return ret;
This is very complicated and you're not documenting the return value of
handle_wfx_nested.
If you get rid of the defensive statement in kvm_inject_nested, you can
turn that one into a void, and handle_wfx_nested can become a bool, and
then this becomes:
	if (ret)
		return 1;
If for some reason you don't like that, you can still just do:
	if (ret == 1)
		return 1;
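i.e., with handle_wfx_nested turned into a bool that returns true when
the trap was forwarded to the virtual EL2 (untested sketch):

	bool handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
	{
		u64 hcr_el2 = vcpu_sys_reg(vcpu, HCR_EL2);

		/* WFx executed in virtual EL2 is never forwarded */
		if (vcpu_mode_el2(vcpu))
			return false;

		if ((is_wfe && (hcr_el2 & HCR_TWE)) ||
		    (!is_wfe && (hcr_el2 & HCR_TWI))) {
			kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
			return true;
		}

		return false;
	}

and the caller collapses to:

	if (nested_virt_in_use(vcpu) && handle_wfx_nested(vcpu, is_wfe))
		return 1;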
> + }
> +
> + if (is_wfe) {
> trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
> vcpu->stat.wfe_exit_stat++;
> kvm_vcpu_on_spin(vcpu);
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 9a05c76..042d304 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -18,6 +18,8 @@
> #include <linux/kvm.h>
> #include <linux/kvm_host.h>
>
> +#include <asm/kvm_emulate.h>
> +
> static bool nested_param;
>
> int __init kvmarm_nested_cfg(char *buf)
> @@ -41,3 +43,21 @@ bool nested_virt_in_use(struct kvm_vcpu *vcpu)
>
> return false;
> }
> +
> +/*
> + * Inject wfx to the virtual EL2 if this is not from the virtual EL2 and
> + * the virtual HCR_EL2.TWX is set. Otherwise, let the host hypervisor
> + * handle this.
> + */
> +int handle_wfx_nested(struct kvm_vcpu *vcpu, bool is_wfe)
> +{
> + u64 hcr_el2 = vcpu_sys_reg(vcpu, HCR_EL2);
> +
> + if (vcpu_mode_el2(vcpu))
> + return -EINVAL;
> +
> + if ((is_wfe && (hcr_el2 & HCR_TWE)) || (!is_wfe && (hcr_el2 & HCR_TWI)))
> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
> + return -EINVAL;
> +}
> --
> 1.9.1
>
On Tue, Jul 18, 2017 at 11:58:51AM -0500, Jintack Lim wrote:
> Forward traps due to FP/ASIMD register accesses to the virtual EL2 if
> virtual CPTR_EL2.TFP is set. Note that if TFP bit is set, then even
> accesses to FP/ASIMD register from EL2 as well as NS EL0/1 will trap to
> EL2. So, we don't check the VM's exception level.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/kernel/asm-offsets.c | 1 +
> arch/arm64/kvm/handle_exit.c | 15 +++++++++++----
> arch/arm64/kvm/hyp/entry.S | 13 +++++++++++++
> arch/arm64/kvm/hyp/hyp-entry.S | 2 +-
> 4 files changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index b3bb7ef..f5117a3 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -134,6 +134,7 @@ int main(void)
> DEFINE(CPU_FP_REGS, offsetof(struct kvm_regs, fp_regs));
> DEFINE(VCPU_FPEXC32_EL2, offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2]));
> DEFINE(VCPU_HOST_CONTEXT, offsetof(struct kvm_vcpu, arch.host_cpu_context));
> + DEFINE(VIRTUAL_CPTR_EL2, offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[CPTR_EL2]));
> #endif
> #ifdef CONFIG_CPU_PM
> DEFINE(CPU_SUSPEND_SZ, sizeof(struct cpu_suspend_ctx));
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 25ec824..d4e7b2b 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -84,11 +84,18 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
> }
>
> /*
> - * Guest access to FP/ASIMD registers are routed to this handler only
> - * when the system doesn't support FP/ASIMD.
> + * When the system supports FP/ASMID and we are NOT running nested
> + * virtualization, FP/ASMID traps are handled in EL2 directly.
> + * This handler handles the cases those are not belong to the above case.
The parser parses the cases where the sentence are not belong to the
above sentence, and then my head exploded ;)
> */
> -static int handle_no_fpsimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +static int kvm_handle_fpasimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
> {
> +
> + /* This is for nested virtualization */
> + if (vcpu_sys_reg(vcpu, CPTR_EL2) & CPTR_EL2_TFP)
> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
> + /* This is the case when the system doesn't support FP/ASIMD. */
> kvm_inject_undefined(vcpu);
> return 1;
> }
> @@ -220,7 +227,7 @@ static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
> [ESR_ELx_EC_BREAKPT_LOW]= kvm_handle_guest_debug,
> [ESR_ELx_EC_BKPT32] = kvm_handle_guest_debug,
> [ESR_ELx_EC_BRK64] = kvm_handle_guest_debug,
> - [ESR_ELx_EC_FP_ASIMD] = handle_no_fpsimd,
> + [ESR_ELx_EC_FP_ASIMD] = kvm_handle_fpasimd,
> };
>
> static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index 12ee62d..95af673 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -158,6 +158,19 @@ abort_guest_exit_end:
> 1: ret
> ENDPROC(__guest_exit)
>
> +ENTRY(__fpsimd_guest_trap)
> + // If virtual CPTR_EL2.TFP is set, then forward the trap to the
> + // virtual EL2. For the non-nested case, this bit is always 0.
> + mrs x1, tpidr_el2
> + ldr x0, [x1, #VIRTUAL_CPTR_EL2]
> + and x0, x0, #CPTR_EL2_TFP
> + cbnz x0, 1f
> + b __fpsimd_guest_restore
> +1:
> + mov x0, #ARM_EXCEPTION_TRAP
> + b __guest_exit
> +ENDPROC(__fpsimd_guest_trap)
> +
> ENTRY(__fpsimd_guest_restore)
> stp x2, x3, [sp, #-16]!
> stp x4, lr, [sp, #-16]!
> diff --git a/arch/arm64/kvm/hyp/hyp-entry.S b/arch/arm64/kvm/hyp/hyp-entry.S
> index 5170ce1..ab169fd 100644
> --- a/arch/arm64/kvm/hyp/hyp-entry.S
> +++ b/arch/arm64/kvm/hyp/hyp-entry.S
> @@ -113,7 +113,7 @@ el1_trap:
> */
> alternative_if_not ARM64_HAS_NO_FPSIMD
> cmp x0, #ESR_ELx_EC_FP_ASIMD
> - b.eq __fpsimd_guest_restore
> + b.eq __fpsimd_guest_trap
> alternative_else_nop_endif
>
> mrs x1, tpidr_el2
> --
> 1.9.1
>
Otherwise, I think this is correct.
Have you subjected your L1 and L2 VMs to a nice round of paranoia FP
testing?
Thanks,
-Christoffer
On Tue, Jul 18, 2017 at 11:58:42AM -0500, Jintack Lim wrote:
The subject should be changed to
"KVM: arm64: Support injecting exceptions to virtual EL2"
> Support inject synchronous exceptions to the virtual EL2 as
injecting
> described in ARM ARM AArch64.TakeException().
>
> This can be easily extended to support to inject asynchronous exceptions
> to the virtual EL2, but it will be added in a later patch when appropriate.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm/include/asm/kvm_emulate.h | 7 +++
> arch/arm64/include/asm/kvm_emulate.h | 2 +
> arch/arm64/kvm/Makefile | 1 +
> arch/arm64/kvm/emulate-nested.c | 83 ++++++++++++++++++++++++++++++++++++
> arch/arm64/kvm/trace.h | 20 +++++++++
> 5 files changed, 113 insertions(+)
> create mode 100644 arch/arm64/kvm/emulate-nested.c
>
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index 0a03b7d..29a4dec 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -47,6 +47,13 @@ static inline void vcpu_set_reg(struct kvm_vcpu *vcpu, u8 reg_num,
> void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
> void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
>
> +static inline int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
> +{
> + kvm_err("Unexpected call to %s for the non-nesting configuration\n",
> + __func__);
> + return -EINVAL;
> +}
> +
> static inline void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu) { };
> static inline void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu) { };
> static inline void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt) { };
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 94f98cc..3017234 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -54,6 +54,8 @@ enum exception_type {
> void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
> void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
>
> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2);
> +
> void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
> void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
> void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 5762337..0263ef0 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -37,3 +37,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
> kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>
> kvm-$(CONFIG_KVM_ARM_HOST) += nested.o
> +kvm-$(CONFIG_KVM_ARM_HOST) += emulate-nested.o
> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> new file mode 100644
> index 0000000..48b84cc
> --- /dev/null
> +++ b/arch/arm64/kvm/emulate-nested.c
> @@ -0,0 +1,83 @@
> +/*
> + * Copyright (C) 2016 - Linaro and Columbia University
> + * Author: Jintack Lim <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/kvm.h>
> +#include <linux/kvm_host.h>
> +
> +#include <asm/kvm_emulate.h>
> +
> +#include "trace.h"
> +
> +/* This is borrowed from get_except_vector in inject_fault.c */
not sure about the value of this comment. Is there room for code reuse
or is it just different?
> +static u64 get_el2_except_vector(struct kvm_vcpu *vcpu,
> + enum exception_type type)
> +{
> + u64 exc_offset;
> +
> + switch (*vcpu_cpsr(vcpu) & (PSR_MODE_MASK | PSR_MODE32_BIT)) {
> + case PSR_MODE_EL2t:
> + exc_offset = CURRENT_EL_SP_EL0_VECTOR;
> + break;
> + case PSR_MODE_EL2h:
> + exc_offset = CURRENT_EL_SP_ELx_VECTOR;
> + break;
> + case PSR_MODE_EL1t:
> + case PSR_MODE_EL1h:
> + case PSR_MODE_EL0t:
> + exc_offset = LOWER_EL_AArch64_VECTOR;
> + break;
> + default:
> + kvm_err("Unexpected previous exception level: aarch32\n");
Why?
> + exc_offset = LOWER_EL_AArch32_VECTOR;
> + }
> +
> + return vcpu_sys_reg(vcpu, VBAR_EL2) + exc_offset + type;
> +}
> +
> +/*
> + * Emulate taking an exception to EL2.
> + * See ARM ARM J8.1.2 AArch64.TakeException()
> + */
> +static int kvm_inject_nested(struct kvm_vcpu *vcpu, u64 esr_el2,
> + enum exception_type type)
> +{
> + int ret = 1;
> +
> + if (!nested_virt_in_use(vcpu)) {
> + kvm_err("Unexpected call to %s for the non-nesting configuration\n",
> + __func__);
> + return -EINVAL;
> + }
This feels like a strange assert-like check. Why are we being defensive
at this point?
> +
> + vcpu_el2_sreg(vcpu, SPSR_EL2) = *vcpu_cpsr(vcpu);
> + vcpu_el2_sreg(vcpu, ELR_EL2) = *vcpu_pc(vcpu);
> + vcpu_sys_reg(vcpu, ESR_EL2) = esr_el2;
> +
> + *vcpu_pc(vcpu) = get_el2_except_vector(vcpu, type);
> + /* On an exception, PSTATE.SP becomes 1 */
> + *vcpu_cpsr(vcpu) = PSR_MODE_EL2h;
> + *vcpu_cpsr(vcpu) |= (PSR_A_BIT | PSR_F_BIT | PSR_I_BIT | PSR_D_BIT);
> +
> + trace_kvm_inject_nested_exception(vcpu, esr_el2, *vcpu_pc(vcpu));
> +
> + return ret;
> +}
> +
> +int kvm_inject_nested_sync(struct kvm_vcpu *vcpu, u64 esr_el2)
> +{
> + return kvm_inject_nested(vcpu, esr_el2, except_type_sync);
> +}
> diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
> index 7fb0008..7c86cfb 100644
> --- a/arch/arm64/kvm/trace.h
> +++ b/arch/arm64/kvm/trace.h
> @@ -167,6 +167,26 @@
> );
>
>
> +TRACE_EVENT(kvm_inject_nested_exception,
> + TP_PROTO(struct kvm_vcpu *vcpu, unsigned long esr_el2,
> + unsigned long pc),
> + TP_ARGS(vcpu, esr_el2, pc),
> +
> + TP_STRUCT__entry(
> + __field(struct kvm_vcpu *, vcpu)
> + __field(unsigned long, esr_el2)
> + __field(unsigned long, pc)
> + ),
> +
> + TP_fast_assign(
> + __entry->vcpu = vcpu;
> + __entry->esr_el2 = esr_el2;
> + __entry->pc = pc;
> + ),
> +
> + TP_printk("vcpu: %p, inject exception to vEL2: ESR_EL2 0x%lx, vector: 0x%016lx",
> + __entry->vcpu, __entry->esr_el2, __entry->pc)
> +);
> #endif /* _TRACE_ARM64_KVM_H */
>
> #undef TRACE_INCLUDE_PATH
> --
> 1.9.1
>
Otherwise looks good.
Thanks,
-Christoffer
On Tue, Jul 18, 2017 at 11:58:30AM -0500, Jintack Lim wrote:
> Nested virtualization is in use only if all three conditions are met:
> - The architecture supports nested virtualization.
> - The kernel parameter is set.
> - The userspace uses the nested virtualization feature.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm/include/asm/kvm_host.h | 11 +++++++++++
> arch/arm64/include/asm/kvm_host.h | 2 ++
> arch/arm64/kvm/nested.c | 17 +++++++++++++++++
> virt/kvm/arm/arm.c | 4 ++++
> 4 files changed, 34 insertions(+)
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 00b0f97..7e9e6c8 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -303,4 +303,15 @@ static inline int __init kvmarm_nested_cfg(char *buf)
> {
> return 0;
> }
> +
> +static inline int init_nested_virt(void)
> +{
> + return 0;
> +}
> +
> +static inline bool nested_virt_in_use(struct kvm_vcpu *vcpu)
> +{
> + return false;
> +}
> +
> #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 6df0c7c..86d4b6c 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -387,5 +387,7 @@ static inline void __cpu_init_stage2(void)
> }
>
> int __init kvmarm_nested_cfg(char *buf);
> +int init_nested_virt(void);
> +bool nested_virt_in_use(struct kvm_vcpu *vcpu);
>
> #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 79f38da..9a05c76 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
> @@ -24,3 +24,20 @@ int __init kvmarm_nested_cfg(char *buf)
> {
> return strtobool(buf, &nested_param);
> }
> +
> +int init_nested_virt(void)
> +{
> + if (nested_param && cpus_have_const_cap(ARM64_HAS_NESTED_VIRT))
> + kvm_info("Nested virtualization is supported\n");
> +
> + return 0;
> +}
> +
> +bool nested_virt_in_use(struct kvm_vcpu *vcpu)
> +{
> + if (nested_param && cpus_have_const_cap(ARM64_HAS_NESTED_VIRT)
> + && test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features))
> + return true;
you could initialize a bool in init_nested_virt which you then check
here to avoid duplicating the logic.
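e.g. (untested; nested_virt_supported is a made-up name):

	static bool nested_virt_supported;

	int init_nested_virt(void)
	{
		nested_virt_supported = nested_param &&
				cpus_have_const_cap(ARM64_HAS_NESTED_VIRT);

		if (nested_virt_supported)
			kvm_info("Nested virtualization is supported\n");

		return 0;
	}

	bool nested_virt_in_use(struct kvm_vcpu *vcpu)
	{
		return nested_virt_supported &&
		       test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features);
	}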
> +
> + return false;
> +}
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 1c1c772..36aae3a 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -1478,6 +1478,10 @@ int kvm_arch_init(void *opaque)
> if (err)
> goto out_err;
>
> + err = init_nested_virt();
> + if (err)
> + return err;
> +
> err = init_subsystems();
> if (err)
> goto out_hyp;
> --
> 1.9.1
>
On Tue, Jul 18, 2017 at 11:58:54AM -0500, Jintack Lim wrote:
> With HCR_EL2.NV bit set, accesses to EL12 registers in the virtual EL2
> trap to EL2. Handle those traps just like we do for EL1 registers.
>
> One exception is CNTKCTL_EL12. We don't trap on CNTKCTL_EL1 for non-VHE
> virtual EL2 because we don't have to. However, accessing CNTKCTL_EL12
> will trap since it's one of the EL12 registers controlled by HCR_EL2.NV
> bit. Therefore, add a handler for it and don't treat it as a
> non-trap-registers when preparing a shadow context.
I'm sorry, I don't remember the details, and I can't tell from this
paragraph what the difference is between CNTKCTL_EL12 and the other
EL12 registers.
>
> Move EL12 system register macros to a common place to reuse them.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/include/asm/kvm_hyp.h | 24 ------------------------
> arch/arm64/include/asm/sysreg.h | 24 ++++++++++++++++++++++++
> arch/arm64/kvm/context.c | 7 +++++++
> arch/arm64/kvm/sys_regs.c | 25 +++++++++++++++++++++++++
> 4 files changed, 56 insertions(+), 24 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
> index 4572a9b..353b895 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -73,30 +73,6 @@
> #define read_sysreg_el1(r) read_sysreg_elx(r, _EL1, _EL12)
> #define write_sysreg_el1(v,r) write_sysreg_elx(v, r, _EL1, _EL12)
>
> -/* The VHE specific system registers and their encoding */
> -#define sctlr_EL12 sys_reg(3, 5, 1, 0, 0)
> -#define cpacr_EL12 sys_reg(3, 5, 1, 0, 2)
> -#define ttbr0_EL12 sys_reg(3, 5, 2, 0, 0)
> -#define ttbr1_EL12 sys_reg(3, 5, 2, 0, 1)
> -#define tcr_EL12 sys_reg(3, 5, 2, 0, 2)
> -#define afsr0_EL12 sys_reg(3, 5, 5, 1, 0)
> -#define afsr1_EL12 sys_reg(3, 5, 5, 1, 1)
> -#define esr_EL12 sys_reg(3, 5, 5, 2, 0)
> -#define far_EL12 sys_reg(3, 5, 6, 0, 0)
> -#define mair_EL12 sys_reg(3, 5, 10, 2, 0)
> -#define amair_EL12 sys_reg(3, 5, 10, 3, 0)
> -#define vbar_EL12 sys_reg(3, 5, 12, 0, 0)
> -#define contextidr_EL12 sys_reg(3, 5, 13, 0, 1)
> -#define cntkctl_EL12 sys_reg(3, 5, 14, 1, 0)
> -#define cntp_tval_EL02 sys_reg(3, 5, 14, 2, 0)
> -#define cntp_ctl_EL02 sys_reg(3, 5, 14, 2, 1)
> -#define cntp_cval_EL02 sys_reg(3, 5, 14, 2, 2)
> -#define cntv_tval_EL02 sys_reg(3, 5, 14, 3, 0)
> -#define cntv_ctl_EL02 sys_reg(3, 5, 14, 3, 1)
> -#define cntv_cval_EL02 sys_reg(3, 5, 14, 3, 2)
> -#define spsr_EL12 sys_reg(3, 5, 4, 0, 0)
> -#define elr_EL12 sys_reg(3, 5, 4, 0, 1)
> -
> /**
> * hyp_alternate_select - Generates patchable code sequences that are
> * used to switch between two implementations of a function, depending
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index b01c608..b8d4d0c 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -338,6 +338,30 @@
> #define SYS_CNTVOFF_EL2 sys_reg(3, 4, 14, 0, 3)
> #define SYS_CNTHCTL_EL2 sys_reg(3, 4, 14, 1, 0)
>
> +/* The VHE specific system registers and their encoding */
> +#define sctlr_EL12 sys_reg(3, 5, 1, 0, 0)
> +#define cpacr_EL12 sys_reg(3, 5, 1, 0, 2)
> +#define ttbr0_EL12 sys_reg(3, 5, 2, 0, 0)
> +#define ttbr1_EL12 sys_reg(3, 5, 2, 0, 1)
> +#define tcr_EL12 sys_reg(3, 5, 2, 0, 2)
> +#define afsr0_EL12 sys_reg(3, 5, 5, 1, 0)
> +#define afsr1_EL12 sys_reg(3, 5, 5, 1, 1)
> +#define esr_EL12 sys_reg(3, 5, 5, 2, 0)
> +#define far_EL12 sys_reg(3, 5, 6, 0, 0)
> +#define mair_EL12 sys_reg(3, 5, 10, 2, 0)
> +#define amair_EL12 sys_reg(3, 5, 10, 3, 0)
> +#define vbar_EL12 sys_reg(3, 5, 12, 0, 0)
> +#define contextidr_EL12 sys_reg(3, 5, 13, 0, 1)
> +#define cntkctl_EL12 sys_reg(3, 5, 14, 1, 0)
> +#define cntp_tval_EL02 sys_reg(3, 5, 14, 2, 0)
> +#define cntp_ctl_EL02 sys_reg(3, 5, 14, 2, 1)
> +#define cntp_cval_EL02 sys_reg(3, 5, 14, 2, 2)
> +#define cntv_tval_EL02 sys_reg(3, 5, 14, 3, 0)
> +#define cntv_ctl_EL02 sys_reg(3, 5, 14, 3, 1)
> +#define cntv_cval_EL02 sys_reg(3, 5, 14, 3, 2)
> +#define spsr_EL12 sys_reg(3, 5, 4, 0, 0)
> +#define elr_EL12 sys_reg(3, 5, 4, 0, 1)
> +
> #define SYS_SP_EL2 sys_reg(3, 6, 4, 1, 0)
>
> /* Common SCTLR_ELx flags. */
> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
> index e1bc753..f3d3398 100644
> --- a/arch/arm64/kvm/context.c
> +++ b/arch/arm64/kvm/context.c
> @@ -121,6 +121,13 @@ static void copy_shadow_non_trap_el1_state(struct kvm_vcpu *vcpu, bool setup)
> for (i = 0; i < ARRAY_SIZE(el1_non_trap_regs); i++) {
> const int sr = el1_non_trap_regs[i];
>
> + /*
> + * We trap on cntkctl_el12 accesses from virtual EL2 as suppose
as opposed to?
> + * to not trapping on cntlctl_el1 accesses.
> + */
> + if (vcpu_el2_e2h_is_set(vcpu) && sr == CNTKCTL_EL1)
> + continue;
> +
If the guest can still access CNTHCTL_EL2 via the CNTKCTL_EL1 system
register access encoding without trapping, why don't we need to
copy this here?
Is the point that for a VHE guest, we don't copy vcpu_sys_reg(vcpu,
CNTKCTL_EL1) to the hardware CNTKCTL_EL1, but we copy vcpu_sys_reg(vcpu,
CNTHCTL_EL2) into CNTKCTL_EL1 during the world switch instead?
Thanks,
-Christoffer
> if (setup)
> s_sys_regs[sr] = vcpu_sys_reg(vcpu, sr);
> else
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index b3e0cb8..2aa922c 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -905,6 +905,14 @@ static inline void access_rw(struct sys_reg_params *p, u64 *sysreg)
> *sysreg = p->regval;
> }
>
> +static bool access_cntkctl_el12(struct kvm_vcpu *vcpu,
> + struct sys_reg_params *p,
> + const struct sys_reg_desc *r)
> +{
> + access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
> + return true;
> +}
> +
> static u64 *get_special_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
> {
> u64 reg = sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> @@ -1201,6 +1209,23 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
> { SYS_DESC(SYS_CNTVOFF_EL2), trap_el2_regs, reset_val, CNTVOFF_EL2, 0 },
> { SYS_DESC(SYS_CNTHCTL_EL2), trap_el2_regs, reset_val, CNTHCTL_EL2, 0 },
>
> + { SYS_DESC(sctlr_EL12), access_vm_reg, reset_val, SCTLR_EL1, 0x00C50078 },
> + { SYS_DESC(cpacr_EL12), access_cpacr, reset_val, CPACR_EL1, 0 },
> + { SYS_DESC(ttbr0_EL12), access_vm_reg, reset_unknown, TTBR0_EL1 },
> + { SYS_DESC(ttbr1_EL12), access_vm_reg, reset_unknown, TTBR1_EL1 },
> + { SYS_DESC(tcr_EL12), access_vm_reg, reset_val, TCR_EL1, 0 },
> + { SYS_DESC(spsr_EL12), access_spsr},
> + { SYS_DESC(elr_EL12), access_elr},
> + { SYS_DESC(afsr0_EL12), access_vm_reg, reset_unknown, AFSR0_EL1 },
> + { SYS_DESC(afsr1_EL12), access_vm_reg, reset_unknown, AFSR1_EL1 },
> + { SYS_DESC(esr_EL12), access_vm_reg, reset_unknown, ESR_EL1 },
> + { SYS_DESC(far_EL12), access_vm_reg, reset_unknown, FAR_EL1 },
> + { SYS_DESC(mair_EL12), access_vm_reg, reset_unknown, MAIR_EL1 },
> + { SYS_DESC(amair_EL12), access_vm_reg, reset_amair_el1, AMAIR_EL1 },
> + { SYS_DESC(vbar_EL12), access_vbar, reset_val, VBAR_EL1, 0 },
> + { SYS_DESC(contextidr_EL12), access_vm_reg, reset_val, CONTEXTIDR_EL1, 0 },
> + { SYS_DESC(cntkctl_EL12), access_cntkctl_el12, reset_val, CNTKCTL_EL1, 0 },
> +
> { SYS_DESC(SYS_SP_EL2), NULL, reset_special, SP_EL2, 0},
> };
>
> --
> 1.9.1
>
On Tue, Jul 18, 2017 at 11:58:55AM -0500, Jintack Lim wrote:
nit: The subject is a little hard to understand.
> On VHE systems, EL0 of the host kernel is considered as a part of 'VHE
> host'; The execution of EL0 is affected by system registers set by the
> VHE kernel including the hypervisor. To emulate this for a VM, we use
> the same set of system registers (i.e. shadow registers) for the virtual
> EL2 and EL0 execution.
when the VM sets HCR_EL2.TGE and HCR_EL2.E2H.
>
> Note that the assumption so far is that a hypervisor in a VM always runs
> in the virtual EL2, and the exception level change from/to the virtual
> EL2 always goes through the host hypervisor. With VHE support for a VM,
> however, the exception level can be changed from EL0 to virtual EL2
> without trapping to the host hypervisor. So, when returning from the VHE
> host mode, set the vcpu mode depending on the physical exception level.
I think there are two changes in this patch which aren't described
properly in the commit message.
First, on entry to a VM that runs in hypervisor context, virtual EL2 or
EL0 with virtual TGE+E2H, we have to either set the physical CPU mode
to EL1 or EL0, for the two cases respectively, where before we would
only ever run in virtual EL2 and would always choose EL1.
Second, on exit from a VM that runs in hypervisor context, virtual EL2 or
EL0 with virtual TGE+E2H, we can no longer assume that we run in virtual
EL2, but must consider the hardware state to understand if the exception
from the VM happened from virtual EL2 or from EL0 in the guest
hypervisor's context.
Maybe that helps.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/kvm/context.c | 36 ++++++++++++++++++++++--------------
> 1 file changed, 22 insertions(+), 14 deletions(-)
>
> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
> index f3d3398..39bd92d 100644
> --- a/arch/arm64/kvm/context.c
> +++ b/arch/arm64/kvm/context.c
> @@ -150,16 +150,18 @@ static void flush_shadow_special_regs(struct kvm_vcpu *vcpu)
> struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>
> ctxt->hw_pstate = *vcpu_cpsr(vcpu) & ~PSR_MODE_MASK;
> - /*
> - * We can emulate the guest's configuration of which
> - * stack pointer to use when executing in virtual EL2 by
> - * using the equivalent feature in EL1 to point to
> - * either the EL1 or EL0 stack pointer.
> - */
> - if ((*vcpu_cpsr(vcpu) & PSR_MODE_MASK) == PSR_MODE_EL2h)
> - ctxt->hw_pstate |= PSR_MODE_EL1h;
> - else
> - ctxt->hw_pstate |= PSR_MODE_EL1t;
> + if (vcpu_mode_el2(vcpu)) {
> + /*
> + * We can emulate the guest's configuration of which
> + * stack pointer to use when executing in virtual EL2 by
> + * using the equivalent feature in EL1 to point to
> + * either the EL1 or EL0 stack pointer.
> + */
> + if ((*vcpu_cpsr(vcpu) & PSR_MODE_MASK) == PSR_MODE_EL2h)
> + ctxt->hw_pstate |= PSR_MODE_EL1h;
> + else
> + ctxt->hw_pstate |= PSR_MODE_EL1t;
> + }
This looks funny, because now you don't set a mode unless
vcpu_mode_el2(vcpu) is true, which happens to work because the only
other choice is PSR_MODE_EL0t which happens to be 0.
>
> ctxt->hw_sys_regs = ctxt->shadow_sys_regs;
> ctxt->hw_sp_el1 = vcpu_el2_sreg(vcpu, SP_EL2);
> @@ -182,8 +184,14 @@ static void sync_shadow_special_regs(struct kvm_vcpu *vcpu)
> {
> struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>
> - *vcpu_cpsr(vcpu) &= PSR_MODE_MASK;
> - *vcpu_cpsr(vcpu) |= ctxt->hw_pstate & ~PSR_MODE_MASK;
> + *vcpu_cpsr(vcpu) = ctxt->hw_pstate;
> + *vcpu_cpsr(vcpu) &= ~PSR_MODE_MASK;
> + /* Set vcpu exception level depending on the physical EL */
> + if ((ctxt->hw_pstate & PSR_MODE_MASK) == PSR_MODE_EL0t)
> + *vcpu_cpsr(vcpu) |= PSR_MODE_EL0t;
> + else
> + *vcpu_cpsr(vcpu) |= PSR_MODE_EL2h;
> +
don't you need to distinguish between PSR_MODE_EL2h and PSR_MODE_EL2t
here?
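Something like this (untested sketch) would preserve the SP selection
both ways:

	u64 mode = ctxt->hw_pstate & PSR_MODE_MASK;

	if (mode == PSR_MODE_EL0t)
		*vcpu_cpsr(vcpu) |= PSR_MODE_EL0t;
	else if (mode == PSR_MODE_EL1t)
		*vcpu_cpsr(vcpu) |= PSR_MODE_EL2t;
	else	/* PSR_MODE_EL1h */
		*vcpu_cpsr(vcpu) |= PSR_MODE_EL2h;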
> vcpu_el2_sreg(vcpu, SP_EL2) = ctxt->hw_sp_el1;
> vcpu_el2_sreg(vcpu, ELR_EL2) = ctxt->hw_elr_el1;
> vcpu_el2_sreg(vcpu, SPSR_EL2) = ctxt->hw_spsr_el1;
> @@ -218,7 +226,7 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
> {
> struct kvm_cpu_context *ctxt = &vcpu->arch.ctxt;
>
> - if (unlikely(vcpu_mode_el2(vcpu))) {
> + if (unlikely(is_hyp_ctxt(vcpu))) {
> flush_shadow_special_regs(vcpu);
> flush_shadow_el1_sysregs(vcpu);
> flush_shadow_non_trap_el1_state(vcpu);
> @@ -236,7 +244,7 @@ void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu)
> */
> void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
> {
> - if (unlikely(vcpu_mode_el2(vcpu))) {
> + if (unlikely(is_hyp_ctxt(vcpu))) {
> sync_shadow_special_regs(vcpu);
> sync_shadow_non_trap_el1_state(vcpu);
> } else
> --
> 1.9.1
>
Thanks,
-Christoffer
On Tue, Jul 18, 2017 at 11:58:56AM -0500, Jintack Lim wrote:
> When the virtual E2H bit is set, we can support EL2 register accesses
> via EL1 registers from the virtual EL2 by doing trap-and-emulate. A
> better alternative, however, is to allow the virtual EL2 to access EL2
> register states without trap. This can be easily achieved by not traping
> EL1 registers since those registers already have EL2 register states.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/kvm/hyp/switch.c | 23 +++++++++++++++++++++--
> 1 file changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index d513da9..fffd0c7 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -74,6 +74,7 @@ static hyp_alternate_select(__activate_traps_arch,
> static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
> {
> u64 val;
> + u64 vhcr_el2;
>
> /*
> * We are about to set CPTR_EL2.TFP to trap all floating point
> @@ -89,8 +90,26 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
> write_sysreg(1 << 30, fpexc32_el2);
> isb();
> }
> - if (vcpu_mode_el2(vcpu))
> - val |= HCR_TVM | HCR_TRVM;
> +
> + if (is_hyp_ctxt(vcpu)) {
> + /*
> + * For a guest hypervisor on v8.0, trap and emulate the EL1
this should be for a non-VHE guest hypervisor, or a guest hypervisor
which doesn't set the E2H bit.
> + * virtual memory control register accesses.
> + */
> + if (!vcpu_el2_e2h_is_set(vcpu))
> + val |= HCR_TVM | HCR_TRVM;
> + /*
> + * For a guest hypervisor on v8.1 (VHE), allow to access the
Similarly here, it's not about the architecture level (you can have
kernels that are v8.3 aware as both the host and guest and run on v8.3
hardware), but still both of these cases are relevant.
> + * EL1 virtual memory control registers natively. These accesses
> + * are to access EL2 register states.
> + * Note that we stil need to respect the virtual HCR_EL2 state.
still
> + */
So this part could become:
/*
* For a VHE guest hypervisor, we allow it to access EL1
* virtual memory control registers directly.
*/
I don't actually understand why we want to respect the HCR_TVM and
HCR_TRVM bits when running in vEL2? Isn't that only if the VM runs in
EL0 in the hyp context?
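i.e. I would have expected something like (untested):

	if (!vcpu_el2_e2h_is_set(vcpu)) {
		/* non-VHE guest hypervisor: trap and emulate */
		val |= HCR_TVM | HCR_TRVM;
	} else if (!vcpu_mode_el2(vcpu)) {
		/*
		 * EL0 running in the VHE guest hypervisor's context:
		 * honor the virtual HCR_EL2 trap bits.
		 */
		val |= vcpu_sys_reg(vcpu, HCR_EL2) & (HCR_TVM | HCR_TRVM);
	}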
> + else {
> + vhcr_el2 = vcpu_sys_reg(vcpu, HCR_EL2);
> + val |= vhcr_el2 & (HCR_TVM | HCR_TRVM);
> + }
> + }
> +
There are also some style issues here, I would prefer:
if (foo) {
/* Foo */
foo();
} else {
/* Bar */
bar();
}
> write_sysreg(val, hcr_el2);
> /* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
> write_sysreg(1 << 15, hstr_el2);
> --
> 1.9.1
>
Thanks,
-Christoffer
On Tue, Jul 18, 2017 at 11:58:57AM -0500, Jintack Lim wrote:
In the subject: s/virtual E2H bit enabled/virtual E2H bit is set/
> When creating the shadow context for the virtual EL2 execution, we can
> directly copy the EL2 register states to the shadow EL1 register states
> if the virtual HCR_EL2.E2H bit is set. This is because EL1 and EL2
> system register formats compatible with E2H=1.
are compatible when HCR_EL2.E2H==1.
>
> Now that we allow the virtual EL2 modify its EL2 registers without trap
to modify
without trapping, via...
> via the physical EL1 system register accesses, we need to reflect the
> changes made to the EL1 system registers to the virtual EL2 register
> states. This is not required to the virtual EL2 without VHE, since the
for virtual EL2 without...
> virtual EL2 should always use _EL2 accessors, which traps to EL2.
s/should always use/always uses/
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/kvm/context.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 66 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
> index 39bd92d..9947bc8 100644
> --- a/arch/arm64/kvm/context.c
> +++ b/arch/arm64/kvm/context.c
> @@ -39,6 +39,27 @@ struct el1_el2_map {
> { VBAR_EL1, VBAR_EL2 },
> };
>
> +/*
> + * List of pair of EL1/EL2 registers which are used to access real EL2
> + * registers in EL2 with E2H bit set.
in EL1?
Maybe you can just say:
/*
* List of system registers that can be directly mapped between VHE
* EL2 system registers and EL1 system registers.
*/
> + */
> +static const struct el1_el2_map vhe_map[] = {
> + { SCTLR_EL1, SCTLR_EL2 },
> + { CPACR_EL1, CPTR_EL2 },
> + { TTBR0_EL1, TTBR0_EL2 },
> + { TTBR1_EL1, TTBR1_EL2 },
> + { TCR_EL1, TCR_EL2},
> + { AFSR0_EL1, AFSR0_EL2 },
> + { AFSR1_EL1, AFSR1_EL2 },
> + { ESR_EL1, ESR_EL2},
> + { FAR_EL1, FAR_EL2},
> + { MAIR_EL1, MAIR_EL2 },
> + { AMAIR_EL1, AMAIR_EL2 },
> + { VBAR_EL1, VBAR_EL2 },
> + { CONTEXTIDR_EL1, CONTEXTIDR_EL2 },
> + { CNTKCTL_EL1, CNTHCTL_EL2 },
> +};
> +
> static inline u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
> {
> return ((tcr_el2 & TCR_EL2_PS_MASK) >> TCR_EL2_PS_SHIFT)
> @@ -57,7 +78,27 @@ static inline u64 cptr_to_cpacr(u64 cptr_el2)
> return cpacr_el1;
> }
>
> -static void flush_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
> +static void sync_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
> +{
> + u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
> + int i;
> +
> + /*
> + * In the virtual EL2 without VHE no EL1 system registers can't be
no other EL1 system register than the ones in el1_non_trap_regs[] can be
changed without trapping to the host hypervisor
> + * changed without trap except el1_non_trap_regs[]. So we have nothing
> + * to sync on exit from a guest.
> + */
> + if (!vcpu_el2_e2h_is_set(vcpu))
> + return;
> +
> + for (i = 0; i < ARRAY_SIZE(vhe_map); i++) {
> + const struct el1_el2_map *map = &vhe_map[i];
> +
> + vcpu_sys_reg(vcpu, map->el2) = s_sys_regs[map->el1];
> + }
> +}
> +
> +static void flush_shadow_el1_sysregs_nvhe(struct kvm_vcpu *vcpu)
> {
> u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
> u64 tcr_el2;
> @@ -86,6 +127,29 @@ static void flush_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
> s_sys_regs[CPACR_EL1] = cptr_to_cpacr(vcpu_sys_reg(vcpu, CPTR_EL2));
> }
>
> +static void flush_shadow_el1_sysregs_vhe(struct kvm_vcpu *vcpu)
> +{
> + u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
> + int i;
> +
> + /*
> + * When e2h bit is set, EL2 registers becomes compatible
> + * with corrensponding EL1 registers. So, no conversion required.
> + */
> + for (i = 0; i < ARRAY_SIZE(vhe_map); i++) {
> + const struct el1_el2_map *map = &vhe_map[i];
> +
> + s_sys_regs[map->el1] = vcpu_sys_reg(vcpu, map->el2);
> + }
> +}
> +
> +static void flush_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
> +{
> + if (vcpu_el2_e2h_is_set(vcpu))
> + flush_shadow_el1_sysregs_vhe(vcpu);
> + else
> + flush_shadow_el1_sysregs_nvhe(vcpu);
> +}
>
> /*
> * List of EL0 and EL1 registers which we allow the virtual EL2 mode to access
> @@ -247,6 +311,7 @@ void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu)
> if (unlikely(is_hyp_ctxt(vcpu))) {
> sync_shadow_special_regs(vcpu);
> sync_shadow_non_trap_el1_state(vcpu);
> + sync_shadow_el1_sysregs(vcpu);
> } else
> sync_special_regs(vcpu);
> }
> --
> 1.9.1
>
Thanks,
-Christoffer
On Tue, Jul 18, 2017 at 11:58:58AM -0500, Jintack Lim wrote:
> While the EL1 virtual memory control registers can be accessed in the
> virtual EL2 with VHE without trap to manuplate the virtual EL2 states,
> we can't do that for CPTR_EL2 for an unfortunate reason.
>
> This is because the top bit of CPTR_EL2, which is TCPAC, will be ignored
> if it is accessed via CPACR_EL1 in the virtual EL2 without trap since
> the top bot of cpacr_el1 is RES0. Therefore we need to trap CPACR_EL1
top bit ?
> accesses from the virtual EL2 to emulate this bit correctly.
>
> Set CPTR_EL2.TCPAC bit to trap CPACR_EL1 accesses and handle them in the
> existing handler considering that they could be meant to access CPTR_EL2
> instead in the virtual EL2 with VHE.
>
> Note that CPTR_EL2 format depends on HCR_EL2.E2H bit. We always keep it
> in v8.0 format for the convenience. Otherwise, we need to check E2H bit
> and use different bit masks in the entry.S, and we also check E2H bit in
> all places we access virtual CPTR_EL2. The downside of using v8.0 format
> is to convert the format when copying states between CPTR_EL2 and
> CPACR_EL1 to support the virtual EL2 with VHE. The decision is subject
> to change depending on the future discussion.
I would remove the last sentence here for the actual commit message,
that is already implied by sending these patches for review.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/include/asm/kvm_emulate.h | 2 ++
> arch/arm64/kvm/context.c | 29 ++++++++++++++++++++++++++---
> arch/arm64/kvm/hyp/switch.c | 2 ++
> arch/arm64/kvm/sys_regs.c | 18 +++++++++++++++++-
> 4 files changed, 47 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 68aafbd..4776bfc 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -59,6 +59,8 @@ enum exception_type {
> void kvm_arm_setup_shadow_state(struct kvm_vcpu *vcpu);
> void kvm_arm_restore_shadow_state(struct kvm_vcpu *vcpu);
> void kvm_arm_init_cpu_context(kvm_cpu_context_t *cpu_ctxt);
> +u64 cptr_to_cpacr(u64 cptr_el2);
> +u64 cpacr_to_cptr(u64 cpacr_el1);
>
> static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
> {
> diff --git a/arch/arm64/kvm/context.c b/arch/arm64/kvm/context.c
> index 9947bc8..a7811e1 100644
> --- a/arch/arm64/kvm/context.c
> +++ b/arch/arm64/kvm/context.c
> @@ -66,7 +66,7 @@ static inline u64 tcr_el2_ips_to_tcr_el1_ps(u64 tcr_el2)
> << TCR_IPS_SHIFT;
> }
>
> -static inline u64 cptr_to_cpacr(u64 cptr_el2)
> +u64 cptr_to_cpacr(u64 cptr_el2)
> {
> u64 cpacr_el1 = 0;
>
> @@ -78,6 +78,21 @@ static inline u64 cptr_to_cpacr(u64 cptr_el2)
> return cpacr_el1;
> }
>
> +u64 cpacr_to_cptr(u64 cpacr_el1)
> +{
> + u64 cptr_el2;
> +
> + cptr_el2 = CPTR_EL2_DEFAULT;
> + if (!(cpacr_el1 & CPACR_EL1_FPEN))
> + cptr_el2 |= CPTR_EL2_TFP;
> + if (cpacr_el1 & CPACR_EL1_TTA)
> + cptr_el2 |= CPTR_EL2_TTA;
> + if (cpacr_el1 & CPTR_EL2_TCPAC)
> + cptr_el2 |= CPTR_EL2_TCPAC;
> +
> + return cptr_el2;
> +}
> +
> static void sync_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
> {
> u64 *s_sys_regs = vcpu->arch.ctxt.shadow_sys_regs;
> @@ -93,8 +108,12 @@ static void sync_shadow_el1_sysregs(struct kvm_vcpu *vcpu)
>
> for (i = 0; i < ARRAY_SIZE(vhe_map); i++) {
> const struct el1_el2_map *map = &vhe_map[i];
> + u64 *el2_reg = &vcpu_sys_reg(vcpu, map->el2);
>
> - vcpu_sys_reg(vcpu, map->el2) = s_sys_regs[map->el1];
> + /* We do trap-and-emulate CPACR_EL1 accesses. So, don't sync */
> + if (map->el2 == CPTR_EL2)
> + continue;
> + *el2_reg = s_sys_regs[map->el1];
> }
> }
>
> @@ -138,8 +157,12 @@ static void flush_shadow_el1_sysregs_vhe(struct kvm_vcpu *vcpu)
> */
> for (i = 0; i < ARRAY_SIZE(vhe_map); i++) {
> const struct el1_el2_map *map = &vhe_map[i];
> + u64 *el1_reg = &s_sys_regs[map->el1];
>
> - s_sys_regs[map->el1] = vcpu_sys_reg(vcpu, map->el2);
> + if (map->el2 == CPTR_EL2)
> + *el1_reg = cptr_to_cpacr(vcpu_sys_reg(vcpu, map->el2));
> + else
> + *el1_reg = vcpu_sys_reg(vcpu, map->el2);
nit: you could add a translation function to the map array and call that
if it's set, otherwise default to copying values as they are, something
like:
if (map->translate)
*el1_reg = map->translate(vcpu_sys_reg(vcpu, map->el2));
else
*el1_reg = vcpu_sys_reg(vcpu, map->el2);
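with the map entry extended along these lines (sketch, reusing
cptr_to_cpacr from this series):

	struct el1_el2_map {
		int el1;
		int el2;
		u64 (*translate)(u64 val);	/* NULL means copy as-is */
	};

	static const struct el1_el2_map vhe_map[] = {
		{ SCTLR_EL1, SCTLR_EL2, NULL },
		{ CPACR_EL1, CPTR_EL2, cptr_to_cpacr },
		/* ... */
	};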
> }
> }
>
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index fffd0c7..50c90f2 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -50,6 +50,8 @@ static void __hyp_text __activate_traps_vhe(struct kvm_vcpu *vcpu)
> val = read_sysreg(cpacr_el1);
> val |= CPACR_EL1_TTA;
> val &= ~CPACR_EL1_FPEN;
> + if (is_hyp_ctxt(vcpu))
> + val |= CPTR_EL2_TCPAC;
also, I think we'll forget why this gets set for hyp context here, so a
short comment would be nice.
what if the guest hypervisor has set CPTR_EL2.TCPAC and runs a VM? Don't
we also need to set CPTR_EL2.TCPAC in the hardware and forward the
exception to the VM in that case?
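i.e. something like (untested):

	/*
	 * If the guest hypervisor traps CPACR_EL1 accesses from its
	 * VM, make them trap to us so we can forward them.
	 */
	if (!is_hyp_ctxt(vcpu) &&
	    (vcpu_sys_reg(vcpu, CPTR_EL2) & CPTR_EL2_TCPAC))
		val |= CPTR_EL2_TCPAC;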
> write_sysreg(val, cpacr_el1);
>
> write_sysreg(__kvm_hyp_vector, vbar_el1);
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 2aa922c..79980be 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -972,7 +972,23 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
> struct sys_reg_params *p,
> const struct sys_reg_desc *r)
> {
> - access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
> + u64 reg = sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
> +
> + /*
> + * When the virtual HCR_EL2.E2H == 1, an access to CPACR_EL1
> + * in the virtual EL2 is to access CPTR_EL2.
> + */
> + if (vcpu_el2_e2h_is_set(vcpu) && (reg == SYS_CPACR_EL1)) {
you don't check here if we're in virtual EL2 mode, because you rely on
only ever getting here if we had is_hyp_ctxt() when entering the VM,
right?
> + u64 *sysreg = &vcpu_sys_reg(vcpu, CPTR_EL2);
> +
> + /* We keep the value in ARMv8.0 CPTR_EL2 format. */
> + if (!p->is_write)
> + p->regval = cptr_to_cpacr(*sysreg);
> + else
> + *sysreg = cpacr_to_cptr(p->regval);
> + } else /* CPACR_EL1 access with E2H == 0 or CPACR_EL12 access */
> + access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
> +
again, I think you can improve your commenting style to make it clear
which comment belongs to which block and only put a comment above the
entire if-statement if it applies to the logic as a whole.
the coding style also prefers that you use braces in both branches if
only one of the branches is a single statement.
> return true;
> }
>
> --
> 1.9.1
>
Thanks,
-Christoffer
On Tue, Jul 18, 2017 at 11:58:59AM -0500, Jintack Lim wrote:
> Now that the virtual EL2 can access EL2 register states via EL1
> registers, we need to consider it when selecting the register to
> emulate.
I don't really understand what this patch does from the commit message.
From looking at the code, it is trying to cater for the case where the
guest hypervisor configures the virtual hardware to trap on memory
control register accesses (for example during VM boot to solve the cache
issues) and we correspondingly set the VM control register trap bits,
and when actually handling the trap we need to take special care in the
VHE guest hypervisor case, because we'll have to redirect the register
access to the virtual EL2 registers instead of the virtual EL1 registers?
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/kvm/sys_regs.c | 46 ++++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 44 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 79980be..910b50d 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -110,6 +110,31 @@ static bool access_dcsw(struct kvm_vcpu *vcpu,
> return true;
> }
>
> +struct el1_el2_map {
> + int el1;
> + int el2;
> +};
> +
> +static const struct el1_el2_map vm_map[] = {
> + {SCTLR_EL1, SCTLR_EL2},
> + {TTBR0_EL1, TTBR0_EL2},
> + {TTBR1_EL1, TTBR1_EL2},
> + {TCR_EL1, TCR_EL2},
> + {ESR_EL1, ESR_EL2},
> + {FAR_EL1, FAR_EL2},
> + {AFSR0_EL1, AFSR0_EL2},
> + {AFSR1_EL1, AFSR1_EL2},
> + {MAIR_EL1, MAIR_EL2},
> + {AMAIR_EL1, AMAIR_EL2},
> + {CONTEXTIDR_EL1, CONTEXTIDR_EL2},
> +};
> +
> +static inline bool el12_reg(struct sys_reg_params *p)
let's call this is_el12_instr
> +{
> + /* All *_EL12 registers have Op1=5. */
s/registers/access instructions/
> + return (p->Op1 == 5);
> +}
> +
> /*
> * Generic accessor for VM registers. Only called as long as HCR_TVM
> * is set. If the guest enables the MMU, we stop trapping the VM
> @@ -120,16 +145,33 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
> const struct sys_reg_desc *r)
> {
> bool was_enabled = vcpu_has_cache_enabled(vcpu);
> + u64 *sysreg = &vcpu_sys_reg(vcpu, r->reg);
> + int i;
> + const struct el1_el2_map *map;
> +
> + /*
> + * Redirect EL1 register accesses to the corresponding EL2 registers if
> + * they are meant to access EL2 registers.
> + */
> + if (vcpu_el2_e2h_is_set(vcpu) && !el12_reg(p)) {
> + for (i = 0; i < ARRAY_SIZE(vm_map); i++) {
> + map = &vm_map[i];
> + if (map->el1 == r->reg) {
> + sysreg = &vcpu_sys_reg(vcpu, map->el2);
> + break;
> + }
> + }
> + }
>
> BUG_ON(!vcpu_mode_el2(vcpu) && !p->is_write);
>
> if (!p->is_write) {
> - p->regval = vcpu_sys_reg(vcpu, r->reg);
> + p->regval = *sysreg;
> return true;
> }
>
> if (!p->is_aarch32) {
> - vcpu_sys_reg(vcpu, r->reg) = p->regval;
> + *sysreg = p->regval;
> } else {
> if (!p->is_32bit)
> vcpu_cp15_64_high(vcpu, r->reg) = upper_32_bits(p->regval);
> --
> 1.9.1
>
Thanks,
-Christoffer
On Tue, Jul 18, 2017 at 11:59:01AM -0500, Jintack Lim wrote:
> In addition to EL2 register accesses, setting NV bit will also make EL12
> register accesses trap to EL2. To emulate this for the virtual EL2,
> forword traps due to EL12 register accessses to the virtual EL2 if the
> virtual HCR_EL2.NV bit is set.
>
> This is for recursive nested virtualization.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/kvm/sys_regs.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 4fd7090..3559cf7 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -149,6 +149,9 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
> int i;
> const struct el1_el2_map *map;
>
> + if (el12_reg(p) && forward_nv_traps(vcpu))
> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
> /*
> * Redirect EL1 register accesses to the corresponding EL2 registers if
> * they are meant to access EL2 registers.
> @@ -959,6 +962,9 @@ static bool access_cntkctl_el12(struct kvm_vcpu *vcpu,
> struct sys_reg_params *p,
> const struct sys_reg_desc *r)
> {
> + if (forward_nv_traps(vcpu))
> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
> access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
> return true;
> }
> @@ -1005,6 +1011,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
> struct sys_reg_params *p,
> const struct sys_reg_desc *r)
> {
> + if (el12_reg(p) && forward_nv_traps(vcpu))
> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
> access_rw(p, &vcpu->arch.ctxt.gp_regs.elr_el1);
> return true;
> }
> @@ -1013,6 +1022,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
> struct sys_reg_params *p,
> const struct sys_reg_desc *r)
> {
> + if (el12_reg(p) && forward_nv_traps(vcpu))
> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
> access_rw(p, &vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1]);
> return true;
> }
> @@ -1021,6 +1033,9 @@ static bool access_vbar(struct kvm_vcpu *vcpu,
> struct sys_reg_params *p,
> const struct sys_reg_desc *r)
> {
> + if (el12_reg(p) && forward_nv_traps(vcpu))
> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
> access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
> return true;
> }
> @@ -1031,6 +1046,9 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
> {
> u64 reg = sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
>
> + if (el12_reg(p) && forward_nv_traps(vcpu))
> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
> /*
> * When the virtual HCR_EL2.E2H == 1, an access to CPACR_EL1
> * in the virtual EL2 is to access CPTR_EL2.
> --
> 1.9.1
>
I'm wondering, instead of having all these handlers, could we add this at
a higher level, like kvm_handle_sys() instead?
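Something like this before dispatching to the individual handlers
(rough sketch, untested):

	/* All *_EL12 access instructions have Op1 == 5. */
	if (params.Op1 == 5 && forward_nv_traps(vcpu))
		return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));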
Thanks,
-Christoffer
On Tue, Jul 18, 2017 at 11:59:02AM -0500, Jintack Lim wrote:
> Forward the EL1 virtual memory register traps to the virtual EL2 if they
> are not coming from the virtual EL2 and the virtual HCR_EL2.TVM or TRVM
> bit is set.
I noticed that all these recursive patches don't change how we program
the physical HCR_EL2. Is that because we always fold the guest
hypervisor's configuration of the virtual HCR_EL2 into the physical one
when running the VM?
If so, perhaps we should add a single sentence in the commit messages
about that.
>
> This is for recursive nested virtualization.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/kvm/sys_regs.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 3559cf7..3e4ec5e 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -135,6 +135,27 @@ static inline bool el12_reg(struct sys_reg_params *p)
> return (p->Op1 == 5);
> }
>
> +/* This function is to support the recursive nested virtualization */
it's just 'recursive nested virtualization', not 'the recursive nested
virtualization', and I also think 'recursive virtualization' is
sufficient.
> +static bool forward_vm_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
> +{
> + u64 hcr_el2 = vcpu_sys_reg(vcpu, HCR_EL2);
> +
> + /* If a trap comes from the virtual EL2, the host hypervisor handles. */
> + if (vcpu_mode_el2(vcpu))
> + return false;
> +
> + /*
> + * If the virtual HCR_EL2.TVM or TRVM bit is set, we need to foward
> + * this trap to the virtual EL2.
> + */
> + if ((hcr_el2 & HCR_TVM) && p->is_write)
> + return true;
> + else if ((hcr_el2 & HCR_TRVM) && !p->is_write)
> + return true;
> +
> + return false;
> +}
> +
> /*
> * Generic accessor for VM registers. Only called as long as HCR_TVM
> * is set. If the guest enables the MMU, we stop trapping the VM
> @@ -152,6 +173,9 @@ static bool access_vm_reg(struct kvm_vcpu *vcpu,
> if (el12_reg(p) && forward_nv_traps(vcpu))
> return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>
> + if (!el12_reg(p) && forward_vm_traps(vcpu, p))
> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
why do you need the !el12_reg(p) check here?
> +
> /*
> * Redirect EL1 register accesses to the corresponding EL2 registers if
> * they are meant to access EL2 registers.
> --
> 1.9.1
>
Thanks,
-Christoffer
On Tue, Jul 18, 2017 at 11:59:03AM -0500, Jintack Lim wrote:
> Forward ELR_EL1, SPSR_EL1 and VBAR_EL1 traps to the virtual EL2 if the
> virtual HCR_EL2.NV bit is set.
>
> This is for recursive nested virtualization.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/include/asm/kvm_arm.h | 1 +
> arch/arm64/kvm/sys_regs.c | 18 ++++++++++++++++++
> 2 files changed, 19 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index aeaac4e..a1274b7 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -23,6 +23,7 @@
> #include <asm/types.h>
>
> /* Hyp Configuration Register (HCR) bits */
> +#define HCR_NV1 (UL(1) << 43)
> #define HCR_NV (UL(1) << 42)
> #define HCR_E2H (UL(1) << 34)
> #define HCR_ID (UL(1) << 33)
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 3e4ec5e..6f67666 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1031,6 +1031,15 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
> return true;
> }
>
> +/* This function is to support the recursive nested virtualization */
> +static bool forward_nv1_traps(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
> +{
> + if (!vcpu_mode_el2(vcpu) && (vcpu_sys_reg(vcpu, HCR_EL2) & HCR_NV1))
> + return true;
> +
> + return false;
> +}
> +
> static bool access_elr(struct kvm_vcpu *vcpu,
> struct sys_reg_params *p,
> const struct sys_reg_desc *r)
> @@ -1038,6 +1047,9 @@ static bool access_elr(struct kvm_vcpu *vcpu,
> if (el12_reg(p) && forward_nv_traps(vcpu))
> return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>
> + if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
> access_rw(p, &vcpu->arch.ctxt.gp_regs.elr_el1);
> return true;
> }
> @@ -1049,6 +1061,9 @@ static bool access_spsr(struct kvm_vcpu *vcpu,
> if (el12_reg(p) && forward_nv_traps(vcpu))
> return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>
> + if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
> access_rw(p, &vcpu->arch.ctxt.gp_regs.spsr[KVM_SPSR_EL1]);
> return true;
> }
> @@ -1060,6 +1075,9 @@ static bool access_vbar(struct kvm_vcpu *vcpu,
> if (el12_reg(p) && forward_nv_traps(vcpu))
> return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>
> + if (!el12_reg(p) && forward_nv1_traps(vcpu, p))
> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
> access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
> return true;
> }
> --
> 1.9.1
>
Will we ever trap on any of these if !el12_reg() && !forward_nv_traps()?
If not, do we need the !el12_reg() checks here?
Thanks,
-Christoffer
On Tue, Jul 18, 2017 at 11:59:04AM -0500, Jintack Lim wrote:
> Forward CPACR_EL1 traps to the virtual EL2 if virtual CPTR_EL2 is
> configured to trap CPACR_EL1 accesses from EL1.
>
> This is for recursive nested virtualization.
>
> Signed-off-by: Jintack Lim <[email protected]>
> ---
> arch/arm64/kvm/sys_regs.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 6f67666..ba2966d 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -1091,6 +1091,11 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
> if (el12_reg(p) && forward_nv_traps(vcpu))
> return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>
> + /* Forward this trap to the virtual EL2 if CPTR_EL2.TCPAC is set*/
> + if (!el12_reg(p) && !vcpu_mode_el2(vcpu) &&
> + (vcpu_sys_reg(vcpu, CPTR_EL2) & CPTR_EL2_TCPAC))
> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> +
I'm trying to understand what should happen if the VM is in EL1 and
accesses CPACR_EL12, but the guest hypervisor did not set
CPTR_EL2.TCPAC, why would we get here, and if there's a good reason why
we god here, is the EL12 access not supposed to undef at EL1 as opposed
to actually work, like it seems your code does when it doesn't take the
branch?
> /*
> * When the virtual HCR_EL2.E2H == 1, an access to CPACR_EL1
> * in the virtual EL2 is to access CPTR_EL2.
> --
> 1.9.1
>
Thanks,
-Christoffer
Hi Jintack,
On Tue, Jul 18, 2017 at 11:58:26AM -0500, Jintack Lim wrote:
> Nested virtualization is the ability to run a virtual machine inside another
> virtual machine. In other words, it’s about running a hypervisor (the guest
> hypervisor) on top of another hypervisor (the host hypervisor).
>
> Supporting nested virtualization on ARM means that the hypervisor provides not
> only EL0/EL1 execution environment to VMs as it usually does but also the
> virtualization extensions including EL2 execution environment. Once the host
> hypervisor provides those execution environments to the VMs, then the guest
> hypervisor can run its own VMs (nested VMs) naturally.
>
> This series supports nested virtualization on arm64. ARM recently announced an
> extension (ARMv8.3) which has support for nested virtualization[1]. This patch
> set is based on the ARMv8.3 specification and tested on the FastModel with
> ARMv8.3 extension.
>
> The whole patch set to support nested virtualization is huge over 70
> patches, so I categorized them into four parts: CPU, memory, VGIC, and timer
> virtualization. This patch series is the first part.
>
> CPU virtualization patch series provides basic nested virtualization framework
> and instruction emulations including v8.1 VHE feature and v8.3 nested
> virtualization feature for VMs.
>
> This patch series again can be divided into four parts. Patch 1 to 5 introduces
> nested virtualization by discovering hardware feature, adding a kernel
> parameter and allowing the userspace to set the initial CPU mode to EL2.
>
> Patch 6 to 25 are to support the EL2 execution environment, the virtual EL2, to
> a VM on v8.0 architecture. We de-privilege the guest hypervisor and emulate the
> virtual EL2 mode in EL1 using the hardware features provided by ARMv8.3; The
> host hypervisor manages virtual EL2 register state for the guest hypervisor
> and shadow EL1 register state that reflects the virtual EL2 register state to
> run the guest hypervisor in EL1.
>
> Patch 26 to 33 add support for the virtual EL2 with Virtualization Host
> Extensions. These patches emulate newly defined registers and bits in v8.1 and
> allow the virtual EL2 to access EL2 register states via EL1 register accesses
> as in the real EL2.
>
> Patch 34 to 38 are to support for the virtual EL2 with nested virtualization.
> These enable recursive nested virtualization.
>
> This patch set is tested on the FastModel with the v8.3 extension for arm64 and
> a cubietruck for arm32. On the FastModel, the host and the guest kernels are
> compiled with and without VHE, so there are four combinations. I was able to
> boot SMP Linux in the nested VM on all four configurations and able to run
> hackbench. I also checked that regular VMs could boot when the nested
> virtualization kernel parameter was not set. On the cubietruck, I also verified
> that regular VMs could boot as well.
>
> I'll share my experiment setup shortly.
>
> Even though this work has some limitations and TODOs, I'd appreciate early
> feedback on this RFC. Specifically, I'm interested in:
>
> - Overall design to manage vcpu context for the virtual EL2
> - Verifying correct EL2 register configurations such as HCR_EL2, CPTR_EL2
> (Patch 30 and 32)
> - Patch organization and coding style
>
> This patch series is based on kvm/next d38338e.
> The whole patch series including memory, VGIC, and timer patches is available
> here:
>
> [email protected]:columbia/nesting-pub.git rfc-v2
>
> Limitations:
> - There are some cases that the target exception level of a VM is ambiguous when
> emulating eret instruction. I'm discussing this issue with Christoffer and
> Marc. Meanwhile, I added a temporary patch (not included in this
> series. f1beaba in the repo) and used 4.10.0 kernel when testing the guest
> hypervisor with VHE.
> - Recursive nested virtualization is not tested yet.
> - Other hypervisors (such as Xen) on KVM are not tested.
>
> TODO:
> - Submit memory, VGIC, and timer patches
> - Evaluate regular VM performance to see if there's a negative impact.
> - Test other hypervisors such as Xen on KVM
> - Test recursive nested virtualization
>
I think this overall looks pretty good, and I think you can drop the RFC
tag from the next revision, assuming the remaining patch sets for
memory, vgic, and timers don't require some major controversial rework
of these patches.
Thanks,
-Christoffer
> v1-->v2:
> - Added support for the virtual EL2 with VHE
> - Rewrote commit messages and comments from the perspective of supporting
> execution environments to VMs, rather than from the perspective of the guest
> hypervisor running in them.
> - Fixed a few bugs to make it run on the FastModel.
> - Tested on ARMv8.3 with four configurations. (host/guest. with/without VHE.)
> - Rebased to kvm/next
>
> [1] https://www.community.arm.com/processors/b/blog/posts/armv8-a-architecture-2016-additions
>
> Christoffer Dall (7):
> KVM: arm64: Add KVM nesting feature
> KVM: arm64: Allow userspace to set PSR_MODE_EL2x
> KVM: arm64: Add vcpu_mode_el2 primitive to support nesting
> KVM: arm/arm64: Add a framework to prepare virtual EL2 execution
> arm64: Add missing TCR hw defines
> KVM: arm64: Create shadow EL1 registers
> KVM: arm64: Trap EL1 VM register accesses in virtual EL2
>
> Jintack Lim (31):
> arm64: Add ARM64_HAS_NESTED_VIRT feature
> KVM: arm/arm64: Enable nested virtualization via command-line
> KVM: arm/arm64: Check if nested virtualization is in use
> KVM: arm64: Add EL2 system registers to vcpu context
> KVM: arm64: Add EL2 special registers to vcpu context
> KVM: arm64: Add the shadow context for virtual EL2 execution
> KVM: arm64: Set vcpu context depending on the guest exception level
> KVM: arm64: Synchronize EL1 system registers on virtual EL2 entry and
> exit
> KVM: arm64: Move exception macros and enums to a common file
> KVM: arm64: Support to inject exceptions to the virtual EL2
> KVM: arm64: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2
> KVM: arm64: Trap CPACR_EL1 access in virtual EL2
> KVM: arm64: Handle eret instruction traps
> KVM: arm64: Set a handler for the system instruction traps
> KVM: arm64: Handle PSCI call via smc from the guest
> KVM: arm64: Inject HVC exceptions to the virtual EL2
> KVM: arm64: Respect virtual HCR_EL2.TWX setting
> KVM: arm64: Respect virtual CPTR_EL2.TFP setting
> KVM: arm64: Add macros to support the virtual EL2 with VHE
> KVM: arm64: Add EL2 registers defined in ARMv8.1 to vcpu context
> KVM: arm64: Emulate EL12 register accesses from the virtual EL2
> KVM: arm64: Support a VM with VHE considering EL0 of the VHE host
> KVM: arm64: Allow the virtual EL2 to access EL2 states without trap
> KVM: arm64: Manage the shadow states when virtual E2H bit enabled
> KVM: arm64: Trap and emulate CPTR_EL2 accesses via CPACR_EL1 from the
> virtual EL2 with VHE
> KVM: arm64: Emulate appropriate VM control system registers
> KVM: arm64: Respect the virtual HCR_EL2.NV bit setting
> KVM: arm64: Respect the virtual HCR_EL2.NV bit setting for EL12
> register traps
> KVM: arm64: Respect virtual HCR_EL2.TVM and TRVM settings
> KVM: arm64: Respect the virtual HCR_EL2.NV1 bit setting
> KVM: arm64: Respect the virtual CPTR_EL2.TCPAC setting
>
> Documentation/admin-guide/kernel-parameters.txt | 4 +
> arch/arm/include/asm/kvm_emulate.h | 17 ++
> arch/arm/include/asm/kvm_host.h | 15 +
> arch/arm64/include/asm/cpucaps.h | 3 +-
> arch/arm64/include/asm/esr.h | 1 +
> arch/arm64/include/asm/kvm_arm.h | 2 +
> arch/arm64/include/asm/kvm_coproc.h | 3 +-
> arch/arm64/include/asm/kvm_emulate.h | 56 ++++
> arch/arm64/include/asm/kvm_host.h | 64 ++++-
> arch/arm64/include/asm/kvm_hyp.h | 24 --
> arch/arm64/include/asm/pgtable-hwdef.h | 6 +
> arch/arm64/include/asm/sysreg.h | 70 +++++
> arch/arm64/include/uapi/asm/kvm.h | 1 +
> arch/arm64/kernel/asm-offsets.c | 1 +
> arch/arm64/kernel/cpufeature.c | 11 +
> arch/arm64/kvm/Makefile | 5 +-
> arch/arm64/kvm/context.c | 346 +++++++++++++++++++++++
> arch/arm64/kvm/emulate-nested.c | 83 ++++++
> arch/arm64/kvm/guest.c | 2 +
> arch/arm64/kvm/handle_exit.c | 89 +++++-
> arch/arm64/kvm/hyp/entry.S | 13 +
> arch/arm64/kvm/hyp/hyp-entry.S | 2 +-
> arch/arm64/kvm/hyp/switch.c | 33 ++-
> arch/arm64/kvm/hyp/sysreg-sr.c | 117 ++++----
> arch/arm64/kvm/inject_fault.c | 12 -
> arch/arm64/kvm/nested.c | 63 +++++
> arch/arm64/kvm/reset.c | 8 +
> arch/arm64/kvm/sys_regs.c | 359 +++++++++++++++++++++++-
> arch/arm64/kvm/sys_regs.h | 8 +
> arch/arm64/kvm/trace.h | 43 ++-
> virt/kvm/arm/arm.c | 20 ++
> 31 files changed, 1363 insertions(+), 118 deletions(-)
> create mode 100644 arch/arm64/kvm/context.c
> create mode 100644 arch/arm64/kvm/emulate-nested.c
> create mode 100644 arch/arm64/kvm/nested.c
>
> --
> 1.9.1
>
Hi Christoffer,
On Mon, Jul 31, 2017 at 9:00 AM, Christoffer Dall <[email protected]> wrote:
> Hi Jintack,
>
> On Tue, Jul 18, 2017 at 11:58:26AM -0500, Jintack Lim wrote:
>> [...]
>
> I think this overall looks pretty good, and I think you can drop the RFC
> tag from the next revision, assuming the remaining patch sets for
> memory, vgic, and timers don't require some major controversial rework
> of these patches.
Thank you for your thorough review. I'm happy that we can drop the RFC tag :).
Thanks,
Jintack
Hi Christoffer,
On Mon, Jul 31, 2017 at 8:59 AM, Christoffer Dall <[email protected]> wrote:
> On Tue, Jul 18, 2017 at 11:59:04AM -0500, Jintack Lim wrote:
>> Forward CPACR_EL1 traps to the virtual EL2 if virtual CPTR_EL2 is
>> configured to trap CPACR_EL1 accesses from EL1.
>>
>> This is for recursive nested virtualization.
>>
>> Signed-off-by: Jintack Lim <[email protected]>
>> ---
>> arch/arm64/kvm/sys_regs.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 6f67666..ba2966d 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -1091,6 +1091,11 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
>> if (el12_reg(p) && forward_nv_traps(vcpu))
>> return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>>
>> + /* Forward this trap to the virtual EL2 if CPTR_EL2.TCPAC is set*/
>> + if (!el12_reg(p) && !vcpu_mode_el2(vcpu) &&
>> + (vcpu_sys_reg(vcpu, CPTR_EL2) & CPTR_EL2_TCPAC))
>> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
>> +
>
> I'm trying to understand what should happen if the VM is in EL1 and
> accesses CPACR_EL12, but the guest hypervisor did not set
> CPTR_EL2.TCPAC, why would we get here, and if there's a good reason why
I guess what you meant is HCR_EL2.NV bit?
> we god here, is the EL12 access not supposed to undef at EL1 as opposed
> to actually work, like it seems your code does when it doesn't take the
> branch?
IIUC, we need to have this logic
if (el12_reg() && virtual HCR_EL2.NV == 0)
inject_undef();
This is a good point, and should be applied for all traps controlled by the NV bit.
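In real code that would be something like (untested):

	if (el12_reg(p) && !(vcpu_sys_reg(vcpu, HCR_EL2) & HCR_NV)) {
		kvm_inject_undefined(vcpu);
		return false;
	}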
>
>> /*
>> * When the virtual HCR_EL2.E2H == 1, an access to CPACR_EL1
>> * in the virtual EL2 is to access CPTR_EL2.
>> --
>> 1.9.1
>>
>
> Thanks,
> -Christoffer
On Tue, Aug 01, 2017 at 07:03:35AM -0400, Jintack Lim wrote:
> Hi Christoffer,
>
> On Mon, Jul 31, 2017 at 8:59 AM, Christoffer Dall <[email protected]> wrote:
> > On Tue, Jul 18, 2017 at 11:59:04AM -0500, Jintack Lim wrote:
> >> Forward CPACR_EL1 traps to the virtual EL2 if virtual CPTR_EL2 is
> >> configured to trap CPACR_EL1 accesses from EL1.
> >>
> >> This is for recursive nested virtualization.
> >>
> >> Signed-off-by: Jintack Lim <[email protected]>
> >> ---
> >> arch/arm64/kvm/sys_regs.c | 5 +++++
> >> 1 file changed, 5 insertions(+)
> >>
> >> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> >> index 6f67666..ba2966d 100644
> >> --- a/arch/arm64/kvm/sys_regs.c
> >> +++ b/arch/arm64/kvm/sys_regs.c
> >> @@ -1091,6 +1091,11 @@ static bool access_cpacr(struct kvm_vcpu *vcpu,
> >> if (el12_reg(p) && forward_nv_traps(vcpu))
> >> return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> >>
> >> + /* Forward this trap to the virtual EL2 if CPTR_EL2.TCPAC is set*/
> >> + if (!el12_reg(p) && !vcpu_mode_el2(vcpu) &&
> >> + (vcpu_sys_reg(vcpu, CPTR_EL2) & CPTR_EL2_TCPAC))
> >> + return kvm_inject_nested_sync(vcpu, kvm_vcpu_get_hsr(vcpu));
> >> +
> >
> > I'm trying to understand what should happen if the VM is in EL1 and
> > accesses CPACR_EL12, but the guest hypervisor did not set
> > CPTR_EL2.TCPAC, why would we get here, and if there's a good reason why
>
> I guess what you meant is HCR_EL2.NV bit?
>
No, if HCR_EL2.NV is set, then we obviously get here, due to traps on _EL12
registers.
But if that wasn't the case (that's the time you'd be evaluating this
if-statement), then you're checking as part of the if-statement if the
virtual CPTR_EL2.TCPAC is set. My question is, if the virtual
CPTR_EL2.TCPAC is not set, why would the physical one be set, which must
be the case if we're running this code, right?
> > we god here, is the EL12 access not supposed to undef at EL1 as opposed
I obviously meant *got* here.
> > to actually work, like it seems your code does when it doesn't take the
> > branch?
>
> IIUC, we need to have this logic
>
> if (el12_reg() && virtual HCR_EL2.NV == 0)
> inject_undef();
>
> This is a good point, and should be applied for all traps controlled by NV bit.
>
Yes, but can this ever happen?
Thanks,
-Christoffer
On Sun, Jul 30, 2017 at 3:59 PM, Christoffer Dall <[email protected]> wrote:
> On Tue, Jul 18, 2017 at 11:58:28AM -0500, Jintack Lim wrote:
>> Add a new kernel parameter(kvm-arm.nested) to enable KVM/ARM nested
>> virtualization support. This kernel parameter on arm architecture is
>> ignored since nested virtualization is not supported on arm.
>>
>> Note that this kernel parameter will not have any impact until nested
>> virtualization support is completed. Just add this parameter first to
>> use it when implementing nested virtualization support.
>>
>> Signed-off-by: Jintack Lim <[email protected]>
>> ---
>> Documentation/admin-guide/kernel-parameters.txt | 4 ++++
>> arch/arm/include/asm/kvm_host.h | 4 ++++
>> arch/arm64/include/asm/kvm_host.h | 2 ++
>> arch/arm64/kvm/Makefile | 2 ++
>> arch/arm64/kvm/nested.c | 26 +++++++++++++++++++++++++
>> virt/kvm/arm/arm.c | 2 ++
>> 6 files changed, 40 insertions(+)
>> create mode 100644 arch/arm64/kvm/nested.c
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index aa8341e..8fb152d 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -1841,6 +1841,10 @@
>> [KVM,ARM] Trap guest accesses to GICv3 common
>> system registers
>>
>> + kvm-arm.nested=
>> + [KVM,ARM] Allow nested virtualization in KVM/ARM.
>> + Default is 0 (disabled)
>
> We may want to say "on systems that support it" or something like that
> here as well.
>
Sounds good! Thanks.
>> +
>> kvm-intel.ept= [KVM,Intel] Disable extended page tables
>> (virtualized MMU) support on capable Intel chips.
>> Default is 1 (enabled)
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 127e2dd..00b0f97 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -299,4 +299,8 @@ int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
>> int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>> struct kvm_device_attr *attr);
>>
>> +static inline int __init kvmarm_nested_cfg(char *buf)
>> +{
>> + return 0;
>> +}
>> #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 0c4fd1f..dcc4df8 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -386,4 +386,6 @@ static inline void __cpu_init_stage2(void)
>> "PARange is %d bits, unsupported configuration!", parange);
>> }
>>
>> +int __init kvmarm_nested_cfg(char *buf);
>> +
>> #endif /* __ARM64_KVM_HOST_H__ */
>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> index 5d98100..f513047 100644
>> --- a/arch/arm64/kvm/Makefile
>> +++ b/arch/arm64/kvm/Makefile
>> @@ -35,3 +35,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-debug.o
>> kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
>> kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
>> kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
>> +
>> +kvm-$(CONFIG_KVM_ARM_HOST) += nested.o
>> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
>> new file mode 100644
>> index 0000000..79f38da
>> --- /dev/null
>> +++ b/arch/arm64/kvm/nested.c
>> @@ -0,0 +1,26 @@
>> +/*
>> + * Copyright (C) 2017 - Columbia University and Linaro Ltd.
>> + * Author: Jintack Lim <[email protected]>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include <linux/kvm.h>
>> +#include <linux/kvm_host.h>
>> +
>> +static bool nested_param;
>> +
>> +int __init kvmarm_nested_cfg(char *buf)
>> +{
>> + return strtobool(buf, &nested_param);
>> +}
>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index a39a1e1..1c1c772 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
>> @@ -67,6 +67,8 @@
>>
>> static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
>>
>> +early_param("kvm-arm.nested", kvmarm_nested_cfg);
>> +
>> static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
>> {
>> BUG_ON(preemptible());
>> --
>> 1.9.1
>>
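
For reference, with the patch above the parameter would be supplied on the
host kernel command line, e.g.:

	kvm-arm.nested=1

Since it is wired up via early_param(), it is parsed early during boot, and
strtobool() accepts the usual boolean spellings (1/0, y/n, on/off).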
On Sun, Jul 30, 2017 at 3:59 PM, Christoffer Dall <[email protected]> wrote:
> On Tue, Jul 18, 2017 at 11:58:30AM -0500, Jintack Lim wrote:
>> Nested virtualization is in use only if all three conditions are met:
>> - The architecture supports nested virtualization.
>> - The kernel parameter is set.
>> - The userspace uses the nested virtualization feature.
>>
>> Signed-off-by: Jintack Lim <[email protected]>
>> ---
>> arch/arm/include/asm/kvm_host.h | 11 +++++++++++
>> arch/arm64/include/asm/kvm_host.h | 2 ++
>> arch/arm64/kvm/nested.c | 17 +++++++++++++++++
>> virt/kvm/arm/arm.c | 4 ++++
>> 4 files changed, 34 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 00b0f97..7e9e6c8 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -303,4 +303,15 @@ static inline int __init kvmarm_nested_cfg(char *buf)
>> {
>> return 0;
>> }
>> +
>> +static inline int init_nested_virt(void)
>> +{
>> + return 0;
>> +}
>> +
>> +static inline bool nested_virt_in_use(struct kvm_vcpu *vcpu)
>> +{
>> + return false;
>> +}
>> +
>> #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 6df0c7c..86d4b6c 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -387,5 +387,7 @@ static inline void __cpu_init_stage2(void)
>> }
>>
>> int __init kvmarm_nested_cfg(char *buf);
>> +int init_nested_virt(void);
>> +bool nested_virt_in_use(struct kvm_vcpu *vcpu);
>>
>> #endif /* __ARM64_KVM_HOST_H__ */
>> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
>> index 79f38da..9a05c76 100644
>> --- a/arch/arm64/kvm/nested.c
>> +++ b/arch/arm64/kvm/nested.c
>> @@ -24,3 +24,20 @@ int __init kvmarm_nested_cfg(char *buf)
>> {
>> return strtobool(buf, &nested_param);
>> }
>> +
>> +int init_nested_virt(void)
>> +{
>> + if (nested_param && cpus_have_const_cap(ARM64_HAS_NESTED_VIRT))
>> + kvm_info("Nested virtualization is supported\n");
>> +
>> + return 0;
>> +}
>> +
>> +bool nested_virt_in_use(struct kvm_vcpu *vcpu)
>> +{
>> + if (nested_param && cpus_have_const_cap(ARM64_HAS_NESTED_VIRT)
>> + && test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features))
>> + return true;
>> +
>> + return false;
>> +}
>
> after reading through a lot of your patches, I feel like vm_has_el2()
> would be a more elegant name, but it's not a strict requirement to
> change it.
I think it's a nice name. Let me think about it :)
>
>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index 1c1c772..36aae3a 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
>> @@ -1478,6 +1478,10 @@ int kvm_arch_init(void *opaque)
>> if (err)
>> goto out_err;
>>
>> + err = init_nested_virt();
>> + if (err)
>> + return err;
>> +
>> err = init_subsystems();
>> if (err)
>> goto out_hyp;
>> --
>> 1.9.1
>>
>
> Thanks,
> -Christoffer
On Sun, Jul 30, 2017 at 3:59 PM, Christoffer Dall <[email protected]> wrote:
> On Tue, Jul 18, 2017 at 11:58:30AM -0500, Jintack Lim wrote:
>> Nested virtualization is in use only if all three conditions are met:
>> - The architecture supports nested virtualization.
>> - The kernel parameter is set.
>> - The userspace uses the nested virtualization feature.
>>
>> Signed-off-by: Jintack Lim <[email protected]>
>> ---
>> arch/arm/include/asm/kvm_host.h | 11 +++++++++++
>> arch/arm64/include/asm/kvm_host.h | 2 ++
>> arch/arm64/kvm/nested.c | 17 +++++++++++++++++
>> virt/kvm/arm/arm.c | 4 ++++
>> 4 files changed, 34 insertions(+)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index 00b0f97..7e9e6c8 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -303,4 +303,15 @@ static inline int __init kvmarm_nested_cfg(char *buf)
>> {
>> return 0;
>> }
>> +
>> +static inline int init_nested_virt(void)
>> +{
>> + return 0;
>> +}
>> +
>> +static inline bool nested_virt_in_use(struct kvm_vcpu *vcpu)
>> +{
>> + return false;
>> +}
>> +
>> #endif /* __ARM_KVM_HOST_H__ */
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 6df0c7c..86d4b6c 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -387,5 +387,7 @@ static inline void __cpu_init_stage2(void)
>> }
>>
>> int __init kvmarm_nested_cfg(char *buf);
>> +int init_nested_virt(void);
>> +bool nested_virt_in_use(struct kvm_vcpu *vcpu);
>>
>> #endif /* __ARM64_KVM_HOST_H__ */
>> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
>> index 79f38da..9a05c76 100644
>> --- a/arch/arm64/kvm/nested.c
>> +++ b/arch/arm64/kvm/nested.c
>> @@ -24,3 +24,20 @@ int __init kvmarm_nested_cfg(char *buf)
>> {
>> return strtobool(buf, &nested_param);
>> }
>> +
>> +int init_nested_virt(void)
>> +{
>> + if (nested_param && cpus_have_const_cap(ARM64_HAS_NESTED_VIRT))
>> + kvm_info("Nested virtualization is supported\n");
>> +
>> + return 0;
>> +}
>> +
>> +bool nested_virt_in_use(struct kvm_vcpu *vcpu)
>> +{
>> + if (nested_param && cpus_have_const_cap(ARM64_HAS_NESTED_VIRT)
>> + && test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features))
>> + return true;
>
> you could initialize a bool in init_nested_virt which you then check
> here to avoid duplicating the logic.
I can make a bool to check the kernel param and the capability. The
third condition is per-VM and given by the userspace, so we don't know it
when we initialize the host hypervisor. We could have a bool in
kvm_vcpu_arch or kvm_arch to cache all three conditions, if that
sounds ok.
>
>> +
>> + return false;
>> +}
>> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
>> index 1c1c772..36aae3a 100644
>> --- a/virt/kvm/arm/arm.c
>> +++ b/virt/kvm/arm/arm.c
>> @@ -1478,6 +1478,10 @@ int kvm_arch_init(void *opaque)
>> if (err)
>> goto out_err;
>>
>> + err = init_nested_virt();
>> + if (err)
>> + return err;
>> +
>> err = init_subsystems();
>> if (err)
>> goto out_hyp;
>> --
>> 1.9.1
>>
On Sun, Jul 30, 2017 at 3:59 PM, Christoffer Dall <[email protected]> wrote:
> On Tue, Jul 18, 2017 at 11:58:34AM -0500, Jintack Lim wrote:
>> To support the virtual EL2 execution, we need to maintain the EL2
>> special registers such as SPSR_EL2, ELR_EL2 and SP_EL2 in vcpu context.
>>
>> Note that SP_EL2 is not accessible in EL2, so we don't need a trap
>> handler for this register.
>
> Actually, it's not accessible *in the MRS/MSR instruction* but it is of
> course accessible as the current stack pointer (which is why you need
> the state, but not the trap handler).
That is correct. I'll fix the commit message.
>
> Otherwise, the patch looks good.
Thanks!
>
> Thanks,
> -Christoffer
>
>>
>> Signed-off-by: Jintack Lim <[email protected]>
>> ---
>> arch/arm64/include/asm/kvm_host.h | 12 ++++++++++++
>> arch/arm64/include/asm/sysreg.h | 4 ++++
>> arch/arm64/kvm/sys_regs.c | 38 +++++++++++++++++++++++++++++++++-----
>> arch/arm64/kvm/sys_regs.h | 8 ++++++++
>> 4 files changed, 57 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 1dc4ed6..57dccde 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -171,6 +171,15 @@ enum vcpu_sysreg {
>> NR_SYS_REGS /* Nothing after this line! */
>> };
>>
>> +enum el2_special_regs {
>> + __INVALID_EL2_SPECIAL_REG__,
>> + SPSR_EL2, /* Saved Program Status Register (EL2) */
>> + ELR_EL2, /* Exception Link Register (EL2) */
>> + SP_EL2, /* Stack Pointer (EL2) */
>> +
>> + NR_EL2_SPECIAL_REGS
>> +};
>> +
>> /* 32bit mapping */
>> #define c0_MPIDR (MPIDR_EL1 * 2) /* MultiProcessor ID Register */
>> #define c0_CSSELR (CSSELR_EL1 * 2)/* Cache Size Selection Register */
>> @@ -218,6 +227,8 @@ struct kvm_cpu_context {
>> u64 sys_regs[NR_SYS_REGS];
>> u32 copro[NR_COPRO_REGS];
>> };
>> +
>> + u64 el2_special_regs[NR_EL2_SPECIAL_REGS];
>> };
>>
>> typedef struct kvm_cpu_context kvm_cpu_context_t;
>> @@ -307,6 +318,7 @@ struct kvm_vcpu_arch {
>>
>> #define vcpu_gp_regs(v) (&(v)->arch.ctxt.gp_regs)
>> #define vcpu_sys_reg(v,r) ((v)->arch.ctxt.sys_regs[(r)])
>> +#define vcpu_el2_sreg(v,r) ((v)->arch.ctxt.el2_special_regs[(r)])
>> /*
>> * CP14 and CP15 live in the same array, as they are backed by the
>> * same system registers.
>> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
>> index 9277c4a..98c32ef 100644
>> --- a/arch/arm64/include/asm/sysreg.h
>> +++ b/arch/arm64/include/asm/sysreg.h
>> @@ -268,6 +268,8 @@
>>
>> #define SYS_DACR32_EL2 sys_reg(3, 4, 3, 0, 0)
>>
>> +#define SYS_SPSR_EL2 sys_reg(3, 4, 4, 0, 0)
>> +#define SYS_ELR_EL2 sys_reg(3, 4, 4, 0, 1)
>> #define SYS_SP_EL1 sys_reg(3, 4, 4, 1, 0)
>>
>> #define SYS_IFSR32_EL2 sys_reg(3, 4, 5, 0, 1)
>> @@ -332,6 +334,8 @@
>> #define SYS_CNTVOFF_EL2 sys_reg(3, 4, 14, 0, 3)
>> #define SYS_CNTHCTL_EL2 sys_reg(3, 4, 14, 1, 0)
>>
>> +#define SYS_SP_EL2 sys_reg(3, 6, 4, 1, 0)
>> +
>> /* Common SCTLR_ELx flags. */
>> #define SCTLR_ELx_EE (1 << 25)
>> #define SCTLR_ELx_I (1 << 12)
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 1568f8b..2b3ed70 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -900,15 +900,33 @@ static inline void access_rw(struct sys_reg_params *p, u64 *sysreg)
>> *sysreg = p->regval;
>> }
>>
>> +static u64 *get_special_reg(struct kvm_vcpu *vcpu, struct sys_reg_params *p)
>> +{
>> + u64 reg = sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2);
>> +
>> + switch (reg) {
>> + case SYS_SP_EL1:
>> + return &vcpu->arch.ctxt.gp_regs.sp_el1;
>> + case SYS_ELR_EL2:
>> + return &vcpu_el2_sreg(vcpu, ELR_EL2);
>> + case SYS_SPSR_EL2:
>> + return &vcpu_el2_sreg(vcpu, SPSR_EL2);
>> + default:
>> + return NULL;
>> + };
>> +}
>> +
>> static bool trap_el2_regs(struct kvm_vcpu *vcpu,
>> struct sys_reg_params *p,
>> const struct sys_reg_desc *r)
>> {
>> - /* SP_EL1 is NOT maintained in sys_regs array */
>> - if (sys_reg(p->Op0, p->Op1, p->CRn, p->CRm, p->Op2) == SYS_SP_EL1)
>> - access_rw(p, &vcpu->arch.ctxt.gp_regs.sp_el1);
>> - else
>> - access_rw(p, &vcpu_sys_reg(vcpu, r->reg));
>> + u64 *sys_reg;
>> +
>> + sys_reg = get_special_reg(vcpu, p);
>> + if (!sys_reg)
>> + sys_reg = &vcpu_sys_reg(vcpu, r->reg);
>> +
>> + access_rw(p, sys_reg);
>>
>> return true;
>> }
>> @@ -1116,6 +1134,8 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
>>
>> { SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
>>
>> + { SYS_DESC(SYS_SPSR_EL2), trap_el2_regs, reset_special, SPSR_EL2, 0 },
>> + { SYS_DESC(SYS_ELR_EL2), trap_el2_regs, reset_special, ELR_EL2, 0 },
>> { SYS_DESC(SYS_SP_EL1), trap_el2_regs },
>>
>> { SYS_DESC(SYS_IFSR32_EL2), NULL, reset_unknown, IFSR32_EL2 },
>> @@ -1138,6 +1158,8 @@ static bool trap_el2_regs(struct kvm_vcpu *vcpu,
>>
>> { SYS_DESC(SYS_CNTVOFF_EL2), trap_el2_regs, reset_val, CNTVOFF_EL2, 0 },
>> { SYS_DESC(SYS_CNTHCTL_EL2), trap_el2_regs, reset_val, CNTHCTL_EL2, 0 },
>> +
>> + { SYS_DESC(SYS_SP_EL2), NULL, reset_special, SP_EL2, 0},
>> };
>>
>> static bool trap_dbgidr(struct kvm_vcpu *vcpu,
>> @@ -2271,6 +2293,8 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
>>
>> /* Catch someone adding a register without putting in reset entry. */
>> memset(&vcpu->arch.ctxt.sys_regs, 0x42, sizeof(vcpu->arch.ctxt.sys_regs));
>> + memset(&vcpu->arch.ctxt.el2_special_regs, 0x42,
>> + sizeof(vcpu->arch.ctxt.el2_special_regs));
>>
>> /* Generic chip reset first (so target could override). */
>> reset_sys_reg_descs(vcpu, sys_reg_descs, ARRAY_SIZE(sys_reg_descs));
>> @@ -2281,4 +2305,8 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
>> for (num = 1; num < NR_SYS_REGS; num++)
>> if (vcpu_sys_reg(vcpu, num) == 0x4242424242424242)
>> panic("Didn't reset vcpu_sys_reg(%zi)", num);
>> +
>> + for (num = 1; num < NR_EL2_SPECIAL_REGS; num++)
>> + if (vcpu_el2_sreg(vcpu, num) == 0x4242424242424242)
>> + panic("Didn't reset vcpu_el2_sreg(%zi)", num);
>> }
>> diff --git a/arch/arm64/kvm/sys_regs.h b/arch/arm64/kvm/sys_regs.h
>> index 060f534..827717b 100644
>> --- a/arch/arm64/kvm/sys_regs.h
>> +++ b/arch/arm64/kvm/sys_regs.h
>> @@ -99,6 +99,14 @@ static inline void reset_val(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r
>> vcpu_sys_reg(vcpu, r->reg) = r->val;
>> }
>>
>> +static inline void reset_special(struct kvm_vcpu *vcpu,
>> + const struct sys_reg_desc *r)
>> +{
>> + BUG_ON(!r->reg);
>> + BUG_ON(r->reg >= NR_EL2_SPECIAL_REGS);
>> + vcpu_el2_sreg(vcpu, r->reg) = r->val;
>> +}
>> +
>> static inline int cmp_sys_reg(const struct sys_reg_desc *i1,
>> const struct sys_reg_desc *i2)
>> {
>> --
>> 1.9.1
>>
On Sun, Jul 30, 2017 at 4:00 PM, Christoffer Dall <[email protected]> wrote:
> On Tue, Jul 18, 2017 at 11:58:46AM -0500, Jintack Lim wrote:
>> When the HCR.NV bit is set, eret instructions trap to EL2 with EC code
>> 0x1A. Emulate eret instructions by setting pc and pstate.
>
> It may be worth noting in the commit message that this is all we have to
> do, because the rest of the logic will then discover that the mode could
> change from virtual EL2 to EL1 and will set up the hw registers etc. when
> changing modes.
Makes sense. I'll write it up in the commit message.
>
>>
>> Note that the current exception level is always the virtual EL2, since
>> we set the HCR_EL2.NV bit only when entering the virtual EL2. So, we take
>> the spsr and elr state from the virtual _EL2 registers.
>>
>> Signed-off-by: Jintack Lim <[email protected]>
>> ---
>> arch/arm64/include/asm/esr.h | 1 +
>> arch/arm64/kvm/handle_exit.c | 16 ++++++++++++++++
>> arch/arm64/kvm/trace.h | 21 +++++++++++++++++++++
>> 3 files changed, 38 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index e7d8e28..210fde6 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -43,6 +43,7 @@
>> #define ESR_ELx_EC_HVC64 (0x16)
>> #define ESR_ELx_EC_SMC64 (0x17)
>> #define ESR_ELx_EC_SYS64 (0x18)
>> +#define ESR_ELx_EC_ERET (0x1A)
>> /* Unallocated EC: 0x19 - 0x1E */
>> #define ESR_ELx_EC_IMP_DEF (0x1f)
>> #define ESR_ELx_EC_IABT_LOW (0x20)
>> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
>> index 17d8a16..9259881 100644
>> --- a/arch/arm64/kvm/handle_exit.c
>> +++ b/arch/arm64/kvm/handle_exit.c
>> @@ -147,6 +147,21 @@ static int kvm_handle_unknown_ec(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> return 1;
>> }
>>
>> +static int kvm_handle_eret(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> +{
>> + trace_kvm_nested_eret(vcpu, vcpu_el2_sreg(vcpu, ELR_EL2),
>> + vcpu_el2_sreg(vcpu, SPSR_EL2));
>> +
>> + /*
>> + * Note that the current exception level is always the virtual EL2,
>> + * since we set HCR_EL2.NV bit only when entering the virtual EL2.
>> + */
>> + *vcpu_pc(vcpu) = vcpu_el2_sreg(vcpu, ELR_EL2);
>> + *vcpu_cpsr(vcpu) = vcpu_el2_sreg(vcpu, SPSR_EL2);
>> +
>> + return 1;
>> +}
>> +
>> static exit_handle_fn arm_exit_handlers[] = {
>> [0 ... ESR_ELx_EC_MAX] = kvm_handle_unknown_ec,
>> [ESR_ELx_EC_WFx] = kvm_handle_wfx,
>> @@ -160,6 +175,7 @@ static int kvm_handle_unknown_ec(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> [ESR_ELx_EC_HVC64] = handle_hvc,
>> [ESR_ELx_EC_SMC64] = handle_smc,
>> [ESR_ELx_EC_SYS64] = kvm_handle_sys_reg,
>> + [ESR_ELx_EC_ERET] = kvm_handle_eret,
>> [ESR_ELx_EC_IABT_LOW] = kvm_handle_guest_abort,
>> [ESR_ELx_EC_DABT_LOW] = kvm_handle_guest_abort,
>> [ESR_ELx_EC_SOFTSTP_LOW]= kvm_handle_guest_debug,
>> diff --git a/arch/arm64/kvm/trace.h b/arch/arm64/kvm/trace.h
>> index 7c86cfb..5f40987 100644
>> --- a/arch/arm64/kvm/trace.h
>> +++ b/arch/arm64/kvm/trace.h
>> @@ -187,6 +187,27 @@
>> TP_printk("vcpu: %p, inject exception to vEL2: ESR_EL2 0x%lx, vector: 0x%016lx",
>> __entry->vcpu, __entry->esr_el2, __entry->pc)
>> );
>> +
>> +TRACE_EVENT(kvm_nested_eret,
>> + TP_PROTO(struct kvm_vcpu *vcpu, unsigned long elr_el2,
>> + unsigned long spsr_el2),
>> + TP_ARGS(vcpu, elr_el2, spsr_el2),
>> +
>> + TP_STRUCT__entry(
>> + __field(struct kvm_vcpu *, vcpu)
>> + __field(unsigned long, elr_el2)
>> + __field(unsigned long, spsr_el2)
>> + ),
>> +
>> + TP_fast_assign(
>> + __entry->vcpu = vcpu;
>> + __entry->elr_el2 = elr_el2;
>> + __entry->spsr_el2 = spsr_el2;
>> + ),
>> +
>> + TP_printk("vcpu: %p, eret to elr_el2: 0x%016lx, with spsr_el2: 0x%08lx",
>> + __entry->vcpu, __entry->elr_el2, __entry->spsr_el2)
>> +);
>> #endif /* _TRACE_ARM64_KVM_H */
>>
>> #undef TRACE_INCLUDE_PATH
>> --
>> 1.9.1
>>
>
> Otherwise this patch looks good.
>
> Thanks,
> -Christoffer
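
For a concrete picture of what kvm_handle_eret() emulates: returning from
the guest hypervisor to its own guest involves roughly the following
sequence on the guest hypervisor side (plain AArch64, nothing beyond the
architecture assumed here):

	msr	elr_el2, x0	// return address for the nested guest
	msr	spsr_el2, x1	// target PSTATE, e.g. EL1h
	eret			// with HCR_EL2.NV set, traps with EC 0x1A

The two MSRs themselves trap as system-register accesses (EC 0x18) and are
stored into the vcpu's virtual ELR_EL2/SPSR_EL2 state by the earlier patch;
kvm_handle_eret() then completes the return by copying that state into pc
and pstate.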
On Tue, Aug 01, 2017 at 10:07:40AM -0400, Jintack Lim wrote:
> On Sun, Jul 30, 2017 at 3:59 PM, Christoffer Dall <[email protected]> wrote:
> > On Tue, Jul 18, 2017 at 11:58:30AM -0500, Jintack Lim wrote:
> >> Nested virtualization is in use only if all three conditions are met:
> >> - The architecture supports nested virtualization.
> >> - The kernel parameter is set.
> >> - The userspace uses the nested virtualization feature.
> >>
> >> Signed-off-by: Jintack Lim <[email protected]>
> >> ---
> >> arch/arm/include/asm/kvm_host.h | 11 +++++++++++
> >> arch/arm64/include/asm/kvm_host.h | 2 ++
> >> arch/arm64/kvm/nested.c | 17 +++++++++++++++++
> >> virt/kvm/arm/arm.c | 4 ++++
> >> 4 files changed, 34 insertions(+)
> >>
> >> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >> index 00b0f97..7e9e6c8 100644
> >> --- a/arch/arm/include/asm/kvm_host.h
> >> +++ b/arch/arm/include/asm/kvm_host.h
> >> @@ -303,4 +303,15 @@ static inline int __init kvmarm_nested_cfg(char *buf)
> >> {
> >> return 0;
> >> }
> >> +
> >> +static inline int init_nested_virt(void)
> >> +{
> >> + return 0;
> >> +}
> >> +
> >> +static inline bool nested_virt_in_use(struct kvm_vcpu *vcpu)
> >> +{
> >> + return false;
> >> +}
> >> +
> >> #endif /* __ARM_KVM_HOST_H__ */
> >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >> index 6df0c7c..86d4b6c 100644
> >> --- a/arch/arm64/include/asm/kvm_host.h
> >> +++ b/arch/arm64/include/asm/kvm_host.h
> >> @@ -387,5 +387,7 @@ static inline void __cpu_init_stage2(void)
> >> }
> >>
> >> int __init kvmarm_nested_cfg(char *buf);
> >> +int init_nested_virt(void);
> >> +bool nested_virt_in_use(struct kvm_vcpu *vcpu);
> >>
> >> #endif /* __ARM64_KVM_HOST_H__ */
> >> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> >> index 79f38da..9a05c76 100644
> >> --- a/arch/arm64/kvm/nested.c
> >> +++ b/arch/arm64/kvm/nested.c
> >> @@ -24,3 +24,20 @@ int __init kvmarm_nested_cfg(char *buf)
> >> {
> >> return strtobool(buf, &nested_param);
> >> }
> >> +
> >> +int init_nested_virt(void)
> >> +{
> >> + if (nested_param && cpus_have_const_cap(ARM64_HAS_NESTED_VIRT))
> >> + kvm_info("Nested virtualization is supported\n");
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +bool nested_virt_in_use(struct kvm_vcpu *vcpu)
> >> +{
> >> + if (nested_param && cpus_have_const_cap(ARM64_HAS_NESTED_VIRT)
> >> + && test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features))
> >> + return true;
> >
> > you could initialize a bool in init_nested_virt which you then check
> > here to avoid duplicating the logic.
>
> I can make a bool to check the kernel param and the capability. The
> third condition is per-VM and given by the userspace, so we don't know it
> when we initialize the host hypervisor. We could have a bool in
> kvm_vcpu_arch or kvm_arch to cache all three conditions, if that
> sounds ok.
>
Yes, that sounds good to me.
Thanks,
-Christoffer
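
A minimal sketch of the caching Christoffer suggests, using only names
already present in the patch (the per-vcpu feature bit still has to be
checked at runtime, since it is only known once userspace has configured
the vcpu):

	/* Cached at init: kernel param AND hardware capability. */
	static bool nested_virt_available;

	int init_nested_virt(void)
	{
		nested_virt_available = nested_param &&
			cpus_have_const_cap(ARM64_HAS_NESTED_VIRT);
		if (nested_virt_available)
			kvm_info("Nested virtualization is supported\n");

		return 0;
	}

	bool nested_virt_in_use(struct kvm_vcpu *vcpu)
	{
		return nested_virt_available &&
			test_bit(KVM_ARM_VCPU_NESTED_VIRT,
				 vcpu->arch.features);
	}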