The cpucfg instruction can be used to query processor features. When it
is executed in VM mode it traps to the hypervisor, which uses this to
provide CPU features to the VM. On real hardware, cpucfg indices 0 - 20
are used. Here a dedicated index area, 0x40000000 -- 0x400000ff, is used
by the KVM hypervisor to provide PV features, and the area can be
extended for other hypervisors in the future. This area will never be
used by real hardware; it is only used by software.

Signed-off-by: Bibo Mao <[email protected]>
---
arch/loongarch/include/asm/inst.h | 1 +
arch/loongarch/include/asm/loongarch.h | 10 +++++
arch/loongarch/kvm/exit.c | 59 +++++++++++++++++++-------
3 files changed, 54 insertions(+), 16 deletions(-)
diff --git a/arch/loongarch/include/asm/inst.h b/arch/loongarch/include/asm/inst.h
index d8f637f9e400..ad120f924905 100644
--- a/arch/loongarch/include/asm/inst.h
+++ b/arch/loongarch/include/asm/inst.h
@@ -67,6 +67,7 @@ enum reg2_op {
revhd_op = 0x11,
extwh_op = 0x16,
extwb_op = 0x17,
+ cpucfg_op = 0x1b,
iocsrrdb_op = 0x19200,
iocsrrdh_op = 0x19201,
iocsrrdw_op = 0x19202,
diff --git a/arch/loongarch/include/asm/loongarch.h b/arch/loongarch/include/asm/loongarch.h
index 46366e783c84..a1d22e8b6f94 100644
--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -158,6 +158,16 @@
#define CPUCFG48_VFPU_CG BIT(2)
#define CPUCFG48_RAM_CG BIT(3)
+/*
+ * cpucfg index area: 0x40000000 -- 0x400000ff
+ * SW emulation for KVM hypervisor
+ */
+#define CPUCFG_KVM_BASE 0x40000000UL
+#define CPUCFG_KVM_SIZE 0x100
+#define CPUCFG_KVM_SIG CPUCFG_KVM_BASE
+#define KVM_SIGNATURE "KVM\0"
+#define CPUCFG_KVM_FEATURE (CPUCFG_KVM_BASE + 4)
+
#ifndef __ASSEMBLY__
/* CSR */
diff --git a/arch/loongarch/kvm/exit.c b/arch/loongarch/kvm/exit.c
index 923bbca9bd22..a8d3b652d3ea 100644
--- a/arch/loongarch/kvm/exit.c
+++ b/arch/loongarch/kvm/exit.c
@@ -206,10 +206,50 @@ int kvm_emu_idle(struct kvm_vcpu *vcpu)
return EMULATE_DONE;
}
-static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
+static int kvm_emu_cpucfg(struct kvm_vcpu *vcpu, larch_inst inst)
{
int rd, rj;
unsigned int index;
+ unsigned long plv;
+
+ rd = inst.reg2_format.rd;
+ rj = inst.reg2_format.rj;
+ ++vcpu->stat.cpucfg_exits;
+ index = vcpu->arch.gprs[rj];
+
+ /*
+ * By LoongArch Reference Manual 2.2.10.5
+ * Return value is 0 for undefined cpucfg index
+ *
+ * Disable preemption since hw gcsr is accessed
+ */
+ preempt_disable();
+ plv = kvm_read_hw_gcsr(LOONGARCH_CSR_CRMD) >> CSR_CRMD_PLV_SHIFT;
+ switch (index) {
+ case 0 ... (KVM_MAX_CPUCFG_REGS - 1):
+ vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
+ break;
+ case CPUCFG_KVM_SIG:
+ /*
+ * Cpucfg emulation between 0x40000000 -- 0x400000ff
+ * Return value with 0 if executed in user mode
+ */
+ if ((plv & CSR_CRMD_PLV) == PLV_KERN)
+ vcpu->arch.gprs[rd] = *(unsigned int *)KVM_SIGNATURE;
+ else
+ vcpu->arch.gprs[rd] = 0;
+ break;
+ default:
+ vcpu->arch.gprs[rd] = 0;
+ break;
+ }
+
+ preempt_enable();
+ return EMULATE_DONE;
+}
+
+static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
+{
unsigned long curr_pc;
larch_inst inst;
enum emulation_result er = EMULATE_DONE;
@@ -224,21 +264,8 @@ static int kvm_trap_handle_gspr(struct kvm_vcpu *vcpu)
er = EMULATE_FAIL;
switch (((inst.word >> 24) & 0xff)) {
case 0x0: /* CPUCFG GSPR */
- if (inst.reg2_format.opcode == 0x1B) {
- rd = inst.reg2_format.rd;
- rj = inst.reg2_format.rj;
- ++vcpu->stat.cpucfg_exits;
- index = vcpu->arch.gprs[rj];
- er = EMULATE_DONE;
- /*
- * By LoongArch Reference Manual 2.2.10.5
- * return value is 0 for undefined cpucfg index
- */
- if (index < KVM_MAX_CPUCFG_REGS)
- vcpu->arch.gprs[rd] = vcpu->arch.cpucfg[index];
- else
- vcpu->arch.gprs[rd] = 0;
- }
+ if (inst.reg2_format.opcode == cpucfg_op)
+ er = kvm_emu_cpucfg(vcpu, inst);
break;
case 0x4: /* CSR{RD,WR,XCHG} GSPR */
er = kvm_handle_csr(vcpu, inst);
--
2.39.3
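The index dispatch in kvm_emu_cpucfg() can be modelled in plain user-space C to make its behaviour easy to check. This is a minimal sketch, not the kernel code. KVM_MAX_CPUCFG_REGS = 21 (matching the 0 - 20 range in the commit message) and PLV_KERN = 0 are assumptions. fake_cpucfg[] and model_cpucfg() are hypothetical stand-ins for vcpu->arch.cpucfg[] and the real handler. The guest PLV is passed in rather than read from CRMD.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Constants taken from the patch. */
#define CPUCFG_KVM_BASE 0x40000000UL
#define CPUCFG_KVM_SIG  CPUCFG_KVM_BASE
#define KVM_SIGNATURE   "KVM\0"

/* Assumptions made only for this model. */
#define KVM_MAX_CPUCFG_REGS 21 /* hardware indices 0 - 20 */
#define PLV_KERN            0  /* privilege level 0 is kernel mode */

/* Hypothetical stand-in for vcpu->arch.cpucfg[]; contents are placeholders. */
static uint32_t fake_cpucfg[KVM_MAX_CPUCFG_REGS];

/*
 * Returns the value the handler would write to the guest's rd register
 * for a cpucfg executed with the given index at the given PLV.
 */
static unsigned long model_cpucfg(unsigned long index, unsigned int plv)
{
	uint32_t sig;

	assert(plv <= 3); /* PLV is a 2-bit field */

	if (index < KVM_MAX_CPUCFG_REGS)
		return fake_cpucfg[index];

	if (index == CPUCFG_KVM_SIG) {
		/* The signature is reported only to guest kernel mode. */
		if (plv != PLV_KERN)
			return 0;
		/* Same bytes as *(unsigned int *)KVM_SIGNATURE, without the cast. */
		memcpy(&sig, KVM_SIGNATURE, sizeof(sig));
		return sig;
	}

	/* Every other index, including CPUCFG_KVM_FEATURE, reads as 0. */
	return 0;
}
```

With this patch alone, CPUCFG_KVM_FEATURE (0x40000004) is not handled by the switch and therefore reads as 0, like any other undefined index. On a little-endian host such as LoongArch, the signature word is 0x004d564b, with 'K' (0x4b) in the lowest byte.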
On 3/15/24 16:07, Bibo Mao wrote:
> Instruction cpucfg can be used to get processor features. And there
> is trap exception when it is executed in VM mode, and also it is
> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
> is used. Here one specified area 0x40000000 -- 0x400000ff is used
> for KVM hypervisor to privide PV features, and the area can be extended
> for other hypervisors in future. This area will never be used for
> real HW, it is only used by software.
>
> Signed-off-by: Bibo Mao <[email protected]>
> ---
> arch/loongarch/include/asm/inst.h | 1 +
> arch/loongarch/include/asm/loongarch.h | 10 +++++
> arch/loongarch/kvm/exit.c | 59 +++++++++++++++++++-------
> 3 files changed, 54 insertions(+), 16 deletions(-)
>
Sorry for the late reply, but I think it may be a bit non-constructive
to repeatedly submit the same code without due explanation in response
to our previous review threads. Let me try to recollect some of the
details though...

If I remember correctly, during the previous reviews, it was mentioned
that the only upsides of using CPUCFG were:

- it was exactly identical to the x86 approach,
- it would not require access to the LoongArch Reference Manual Volume 3
to use, and
- it was plain old data.

But, for the first point, we don't have to follow x86 conventions after
all. The second reason might be compelling, but on the one hand that's
another problem orthogonal to the current one, and on the other hand
HVCL is:

- already effectively public, because this very patchset is public,
- trivial to implement even without access to the LVZ manual, because of
its striking similarity to SYSCALL, and
- a function call, so we reserve the possibility for hypervisors to
invoke logic for self-identification purposes, even if this is likely
overkill from today's perspective.

And, even if we decide that using HVCL for self-identification is
overkill after all, we still have another choice: IOCSR. We already read
LOONGARCH_IOCSR_FEATURES (0x8) for its bit 11 (IOCSRF_VM) to populate
the CPU_FEATURE_HYPERVISOR bit, so it's only natural to put the
identification word in the IOCSR space as well. As far as I can see, the
IOCSR space is plentiful and equally available for making reservations,
and reserving it can only be easier when it's done by a Loongson team.
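The existing IOCSR-based detection referred to above boils down to a single bit test on the feature word. The sketch below rests on a few assumptions. The raw 64-bit value of LOONGARCH_IOCSR_FEATURES is passed in as a parameter, because reading IOCSR space needs privileged instructions that are not modelled here. model_iocsr_is_vm() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the message above. */
#define LOONGARCH_IOCSR_FEATURES 0x8          /* IOCSR offset (not read by this model) */
#define IOCSRF_VM                (1ULL << 11) /* bit 11: running as a VM */

/* Returns true if the raw IOCSR feature word has the VM bit set. */
static bool model_iocsr_is_vm(uint64_t features)
{
	return (features & IOCSRF_VM) != 0;
}
```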

Finally, I've mentioned multiple times that varying CPUCFG behavior
based on PLV is not well documented in the manuals, and hence not
friendly to low-level developers. Developers of third-party firmware
and/or kernels do exist; I personally spoke to some of them at the
2023-11-18 3A6000 release event. For the varying-CPUCFG-behavior
approach to pass for me, at the very least, the LoongArch reference
manual must be amended to explicitly explain it and to reference
potential use cases.
--
WANG "xen0n" Xuerui
Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/
On 2024/3/24 3:02 AM, WANG Xuerui wrote:
> On 3/15/24 16:07, Bibo Mao wrote:
>> Instruction cpucfg can be used to get processor features. And there
>> is trap exception when it is executed in VM mode, and also it is
>> to provide cpu features to VM. On real hardware cpucfg area 0 - 20
>> is used. Here one specified area 0x40000000 -- 0x400000ff is used
>> for KVM hypervisor to privide PV features, and the area can be extended
>> for other hypervisors in future. This area will never be used for
>> real HW, it is only used by software.
>>
>> Signed-off-by: Bibo Mao <[email protected]>
>> ---
>> arch/loongarch/include/asm/inst.h | 1 +
>> arch/loongarch/include/asm/loongarch.h | 10 +++++
>> arch/loongarch/kvm/exit.c | 59 +++++++++++++++++++-------
>> 3 files changed, 54 insertions(+), 16 deletions(-)
>>
>
> Sorry for the late reply, but I think it may be a bit non-constructive
> to repeatedly submit the same code without due explanation in our
> previous review threads. Let me try to recollect some of the details
> though...
Because your review comments about the hypercall method are wrong, I do
not need to adopt them.
>
> If I remember correctly, during the previous reviews, it was mentioned
> that the only upsides of using CPUCFG were:
>
> - it was exactly identical to the x86 approach,
> - it would not require access to the LoongArch Reference Manual Volume 3
> to use, and
> - it was plain old data.
>
> But, for the first point, we don't have to follow x86 convention after
x86 virtualization is successfully and widely deployed in our lives and
products. It is normal to follow it if there are no obvious issues.
> all. The second reason might be compelling, but on the one hand that's
> another problem orthogonal to the current one, and on the other hand
> HVCL is:
>
> - already effectively public because of the fact that this very patchset
> is public,
> - its semantics is trivial to implement even without access to the LVZ
> manual, because of its striking similarity with SYSCALL, and
> - by being a function call, we reserve the possibility for hypervisors
> to invoke logic for self-identification purposes, even if this is likely
> overkill from today's perspective.
>
> And, even if we decide that using HVCL for self-identification is
> overkill after all, we still have another choice that's IOCSR. We
> already read LOONGARCH_IOCSR_FEATURES (0x8) for its bit 11 (IOCSRF_VM)
> to populate the CPU_FEATURE_HYPERVISOR bit, and it's only natural that
> we put the identification word in the IOCSR space. As far as I can see,
> the IOCSR space is plenty and equally available for making reservations;
> it can only be even easier when it's done by a Loongson team.
The IOCSR method is also possible. In terms of chip design, CPUCFG is
used for CPU features and IOCSR for device features. Here the CPUCFG
method is selected. I am the KVM LoongArch maintainer, and I can decide
which method to select if the method works well. Is that right?
If you are interested in KVM LoongArch, you can submit more patches and
become a maintainer, or write support for a new hypervisor such as
Xen/Xvisor, and use your method.
Also, if you are interested in the Linux kernel, there are some issues.
Can you help improve them?
1. T0-T7 are scratch registers in the SYSCALL ABI, which is what you
suggested. Is there information leaking to user space through the T0-T7
registers?
2. LoongArch KVM depends on AS_HAS_LVZ_EXTENSION, which requires the
latest binutils; this is also what you suggested. Some kernel developers
do not have the latest binutils. When they modify common KVM code,
LoongArch KVM fails to compile, but they cannot notice it because their
LoongArch cross-compiler is old and LoongArch KVM is disabled. This issue
can be seen at https://lkml.org/lkml/2023/11/15/828.
Regards
Bibo Mao
>
> Finally, I've mentioned multiple times, that varying CPUCFG behavior
> based on PLV is not something well documented on the manuals, hence not
> friendly to low-level developers. Devs of third-party firmware and/or
> kernels do exist, I've personally spoken to some of them on the
> 2023-11-18 3A6000 release event; in order for the varying CPUCFG
> behavior approach to pass for me, at the very least, the LoongArch
> reference manual must be amended to explicitly include an explanation of
> it, and a reference to potential use cases.
>
On Tue, 2024-04-02 at 09:43 +0800, maobibo wrote:
> > Sorry for the late reply, but I think it may be a bit non-constructive
> > to repeatedly submit the same code without due explanation in our
> > previous review threads. Let me try to recollect some of the details
> > though...
> Because your review comments about hypercall method is wrong, I need not
> adopt it.
Again it's unfair to say so considering the lack of LVZ documentation.
/* snip */
>
> 1. T0-T7 are scratch registers during SYSCALL ABI, this is what you
> suggest, does there exist information leaking to user space from T0-T7
> registers?
It's not a problem. When a syscall returns, RESTORE_ALL_AND_RET is
invoked even though T0-T7 were not saved. So a "junk" value will be read
from the leading PT_SIZE bytes of the kernel stack for this thread.

The leading PT_SIZE bytes of the kernel stack are dedicated to storing
the struct pt_regs representing the register file of the thread in
userspace.

Thus we can only read out the userspace T0-T7 values stored when the
same thread was last interrupted or trapped, or 0 (if the thread was
never interrupted or trapped before).

And it's impossible to read data used internally by the kernel, or data
belonging to another thread.

But indeed there is room for improvement here. Zeroing these registers
seems cleaner than reading out the junk values, and is also faster (move
$t0, $r0 is faster than ld.d $t0, $sp, PT_R12). Not sure if it's worth
violating Huacai's "keep things simple" aspiration though.
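The argument above can be illustrated with a toy model of a per-thread pt_regs frame. This is a simplified sketch, not the kernel's entry code. model_pt_regs, model_thread and the model_* functions are hypothetical. Only the GPR file is modelled, and nested interrupts and signal delivery are not captured. It uses the LoongArch register numbering in which $t0-$t7 are $r12-$r19.

```c
#include <stdint.h>
#include <string.h>

#define NR_GPRS 32
#define REG_T0  12 /* $t0 is $r12 */
#define REG_T7  19 /* $t7 is $r19 */

/* Simplified stand-in for struct pt_regs: only the GPR file. */
struct model_pt_regs {
	uint64_t regs[NR_GPRS];
};

/* Hypothetical thread: the pt_regs slot at the top of its kernel stack. */
struct model_thread {
	struct model_pt_regs frame;
};

/* Interrupt/exception entry: every GPR is saved into the frame. */
static void model_trap_entry(struct model_thread *t, const uint64_t user[NR_GPRS])
{
	memcpy(t->frame.regs, user, sizeof(t->frame.regs));
}

/* Syscall entry: T0-T7 are not saved (scratch in the syscall ABI). */
static void model_syscall_entry(struct model_thread *t, const uint64_t user[NR_GPRS])
{
	for (int i = 0; i < NR_GPRS; i++)
		if (i < REG_T0 || i > REG_T7)
			t->frame.regs[i] = user[i];
}

/* RESTORE_ALL-style return: all GPRs are loaded from this thread's frame. */
static void model_restore_all(const struct model_thread *t, uint64_t user[NR_GPRS])
{
	memcpy(user, t->frame.regs, sizeof(t->frame.regs));
}

/* Alternative discussed above: zero T0-T7 instead of reloading them. */
static void model_restore_zero_temps(const struct model_thread *t, uint64_t user[NR_GPRS])
{
	model_restore_all(t, user);
	for (int i = REG_T0; i <= REG_T7; i++)
		user[i] = 0;
}
```

In the model, a syscall that follows a trap returns the T0-T7 values saved at that trap, while the zeroing variant returns 0 for them; both paths only ever read the thread's own frame.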
--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University
On 2024/4/2 10:49 AM, Xi Ruoyao wrote:
> On Tue, 2024-04-02 at 09:43 +0800, maobibo wrote:
>>> Sorry for the late reply, but I think it may be a bit non-constructive
>>> to repeatedly submit the same code without due explanation in our
>>> previous review threads. Let me try to recollect some of the details
>>> though...
>> Because your review comments about hypercall method is wrong, I need not
>> adopt it.
>
> Again it's unfair to say so considering the lack of LVZ documentation.
>
> /* snip */
>
>>
>> 1. T0-T7 are scratch registers during SYSCALL ABI, this is what you
>> suggest, does there exist information leaking to user space from T0-T7
>> registers?
>
> It's not a problem. When syscall returns RESTORE_ALL_AND_RET is invoked
> despite T0-T7 are not saved. So a "junk" value will be read from the
> leading PT_SIZE bytes of the kernel stack for this thread.
To you it is a "junk" value; others may think it is useful.
There is another issue: the kernel restores T0-T7 and user space saves
T0-T7. Why are T0-T7 scratch registers rather than preserved registers
as on other architectures? What is the advantage of them being scratch
registers?
Regards
Bibo Mao
>
> The leading PT_SIZE bytes of the kernel stack is dedicated for storing
> the struct pt_regs representing the reg file of the thread in the
> userspace.
>
> Thus we may only read out the userspace T0-T7 value stored when the same
> thread was interrupted or trapped last time, or 0 (if the thread was
> never interrupted or trapped before).
>
> And it's impossible to read some data used by the kernel internally, or
> some data of another thread.
>
> But indeed there is some improvement here. Zeroing these registers
> seems cleaner than reading out the junk values, and also faster (move
> $t0, $r0 is faster than ld.d $t0, $sp, PT_R12). Not sure if it's worthy
> to violate Huacai's "keep things simple" aspiration though.
>
On 2024/4/2 10:49 AM, Xi Ruoyao wrote:
> On Tue, 2024-04-02 at 09:43 +0800, maobibo wrote:
>>> Sorry for the late reply, but I think it may be a bit non-constructive
>>> to repeatedly submit the same code without due explanation in our
>>> previous review threads. Let me try to recollect some of the details
>>> though...
>> Because your review comments about hypercall method is wrong, I need not
>> adopt it.
>
> Again it's unfair to say so considering the lack of LVZ documentation.
>
> /* snip */
>
>>
>> 1. T0-T7 are scratch registers during SYSCALL ABI, this is what you
>> suggest, does there exist information leaking to user space from T0-T7
>> registers?
>
> It's not a problem. When syscall returns RESTORE_ALL_AND_RET is invoked
> despite T0-T7 are not saved. So a "junk" value will be read from the
> leading PT_SIZE bytes of the kernel stack for this thread.
>
> The leading PT_SIZE bytes of the kernel stack is dedicated for storing
> the struct pt_regs representing the reg file of the thread in the
> userspace.
Not all syscalls use the leading PT_SIZE bytes of the kernel stack. It
is complicated when syscalls are combined with interrupts and signals.
>
> Thus we may only read out the userspace T0-T7 value stored when the same
> thread was interrupted or trapped last time, or 0 (if the thread was
> never interrupted or trapped before).
>
> And it's impossible to read some data used by the kernel internally, or
> some data of another thread.
Are you sure that it's impossible to read some data used by the kernel
internally?
Regards
Bibo Mao
>
> But indeed there is some improvement here. Zeroing these registers
> seems cleaner than reading out the junk values, and also faster (move
> $t0, $r0 is faster than ld.d $t0, $sp, PT_R12). Not sure if it's worthy
> to violate Huacai's "keep things simple" aspiration though.
>
On Tue, 2024-04-02 at 11:34 +0800, maobibo wrote:
> Are you sure that it's impossible to read some data used by the kernel
> internally?
Yes.
> There is another issue, since kernel restore T0-T7 registers and user
> space save T0-T7. Why T0-T7 is scratch registers rather than preserve
> registers like other architecture? What is the advantage if it is
> scratch registers?
I'd say "MIPS legacy." Note that MIPS also does not preserve temp
registers, and MIPS does not have the "info leak" issue either
(otherwise it would have been assigned a CVE by now).
I do agree it may be time to move away from the MIPS legacy and be more
similar to RISC-V etc. now...
In Glibc we can condition __SYSCALL_CLOBBERS with #if
__LINUX_KERNEL_VERSION > xxxxxxx to take advantage of this.
Huacai, Xuerui, what do you think?